Skip to content

feat: instrument plain-agents with opensearch-genai-observability-sdk-py#85

Open
vamsimanohar wants to merge 1 commit intomainfrom
feat/sdk-instrumented-agents
Open

feat: instrument plain-agents with opensearch-genai-observability-sdk-py#85
vamsimanohar wants to merge 1 commit intomainfrom
feat/sdk-instrumented-agents

Conversation

@vamsimanohar
Copy link
Member

@vamsimanohar vamsimanohar commented Mar 12, 2026

Summary

  • Replace manual OTel TracerProvider/exporter setup with the SDK's register() (one-line tracing config)
  • Replace tracer.start_as_current_span() + manual span.set_attribute() calls with observe() + enrich()
  • Net reduction of ~112 lines across orchestrator, events-agent, and weather-agent
  • All existing functionality preserved: fault injection, ASGI middleware, MCP integration, metrics, logging

Changes

File What changed
orchestrator/main.py setup_telemetry()register(), manual spans → observe() + enrich()
events-agent/main.py Same pattern, MCP tool call spans use observe() with manual MCP attributes
weather-agent/main.py setup_telemetry() split: register() for tracing + manual metrics/logs setup
weather-agent/server.py Manual span.set_attribute()enrich()
*/Dockerfile Replace opentelemetry-api + opentelemetry-sdk with opensearch-genai-observability-sdk-py>=0.2.6
weather-agent/pyproject.toml Same dependency change

Before (manual OTel)

def setup_telemetry(service_name, otlp_endpoint):
    resource = Resource.create({"service.name": service_name, ...})
    tracer_provider = TracerProvider(resource=resource)
    tracer_provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True))
    )
    trace.set_tracer_provider(tracer_provider)
    return trace.get_tracer(service_name)

# ... later in handler:
with tracer.start_as_current_span("chat", kind=SpanKind.INTERNAL) as chat_span:
    chat_span.set_attribute("gen_ai.operation.name", "chat")
    chat_span.set_attribute("gen_ai.system", SYSTEMS[model])
    chat_span.set_attribute("gen_ai.request.model", model)
    chat_span.set_attribute("gen_ai.usage.input_tokens", input_tokens)
    chat_span.set_attribute("gen_ai.usage.output_tokens", output_tokens)
    chat_span.set_attribute("gen_ai.response.finish_reasons", ["tool_calls"])

After (SDK)

register(endpoint="grpc://localhost:4317", service_name="travel-planner")

# ... later in handler:
with observe("planning", op=Op.CHAT):
    enrich(model=model, provider=provider,
           input_tokens=input_tokens, output_tokens=output_tokens,
           finish_reason="tool_calls")

Dependencies

Requires opensearch-genai-observability-sdk-py >= 0.2.6 (pending release from opensearch-project/genai-observability-sdk-py#8).

Test plan

  • Verify Docker builds succeed with SDK dependency
  • Run multi-agent-planner with docker compose up and verify traces appear in OpenSearch
  • Compare trace attributes between SDK and manual versions are equivalent
  • Verify fault injection still works as expected

@vamsimanohar vamsimanohar force-pushed the feat/sdk-instrumented-agents branch 2 times, most recently from 3e39abc to 1dce40d Compare March 12, 2026 23:53
@github-actions
Copy link

✅ Docs preview build completed successfully! All tests passed.

@codecov
Copy link

codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 18.51%. Comparing base (c65cb60) to head (1dce40d).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #85   +/-   ##
=======================================
  Coverage   18.51%   18.51%           
=======================================
  Files           3        3           
  Lines          54       54           
  Branches       19       18    -1     
=======================================
  Hits           10       10           
  Misses         44       44           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@kylehounslow kylehounslow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @vamsimanohar — the observe() + enrich() pattern reads much better than the manual span setup, and -115 net lines is a win.

Some things I noticed:


1. Bug fix

Good catch on tool_spanagent_span in the orchestrator's events-agent error handler. That was referencing the wrong scope.

2. service.version dropped from Resource

The original setup_telemetry() set service.version: "1.0.0" on the Resource. The SDK's register() only sets service.name, so service.version is gone from all spans. It's a standard OTel resource attribute — not GenAI-specific but useful for debugging. Is this a gap in the SDK's register() API we should address?

(I checked whether removing gen_ai.agent.id and gen_ai.agent.name from the Resource was a problem — it's not. Per the GenAI semconv, those are span-level attributes. The original code was over-promoting them to the Resource. This PR is actually more correct.)

3. gen_ai.system is deprecated

Both old and new code set gen_ai.system. The current semconv registry marks it deprecated in favor of gen_ai.provider.name. Not a regression so not blocking, but worth a follow-up on the SDK to emit gen_ai.provider.name.

4. gen_ai.agent.name missing on sub-agent invoke spans

The original orchestrator set gen_ai.agent.name on the weather/events invoke spans:

agent_span.set_attribute("gen_ai.agent.name", "weather-agent")

The PR uses observe("weather-agent", op=Op.INVOKE_AGENT) which sets the span name but not the gen_ai.agent.name attribute. Per semconv it's Conditionally Required on invoke_agent spans when the name is known. Quick fix — add an enrich() or manual set_attribute inside those blocks.

5. enrich() output placement

In the orchestrator and events-agent, the final enrich(output_messages=...) is called after exiting the with observe(...) block, which targets the parent/root span. The comments explain the intent. Is this the intended SDK pattern for enriching the parent? If so might be worth documenting.

6. Metrics Resource created separately

The metrics setup creates its own Resource.create({"service.name": "..."}) independently from register(). Trace and metrics Resources could drift. Minor — maybe a # TODO: unify when SDK supports metrics comment.

7. Lock file changes

The package-lock.json bumps (devalue, tar, yauzl, svgo) look like routine dep updates. Any reason to bundle them here vs. a separate PR?

8. Test plan

The PR description has a test plan with unchecked boxes — have you run through these locally?

  • Docker builds succeed with SDK dependency
  • Traces appear in OpenSearch with correct attributes
  • Trace attributes equivalent between SDK and manual versions
  • Fault injection still works

This review was co-authored with Claude Opus 4 (claude-4.6).

@vamsimanohar vamsimanohar force-pushed the feat/sdk-instrumented-agents branch 2 times, most recently from 9996f66 to c472565 Compare March 17, 2026 04:15
@github-actions
Copy link

✅ Docs preview build completed successfully! All tests passed.

Replace manual OTel TracerProvider/exporter setup and span.set_attribute()
calls with the SDK's register(), observe(), and enrich() APIs.

- register() replaces ~20-30 lines of TracerProvider + exporter config
- observe() as context manager replaces tracer.start_as_current_span()
- enrich() replaces dozens of span.set_attribute("gen_ai.*", ...) calls
- Net reduction of ~112 lines while preserving identical trace output
- Metrics and logging setup unchanged (SDK handles tracing only)
- Fault injection, ASGI middleware, MCP integration all preserved

Requires opensearch-genai-observability-sdk-py >= 0.2.6.

Signed-off-by: Vamshi Vijay Nakkirtha <vamsimanohar@gmail.com>
Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>
@vamsimanohar vamsimanohar force-pushed the feat/sdk-instrumented-agents branch from c472565 to 21175a6 Compare March 17, 2026 04:21
vamsimanohar added a commit to opensearch-project/genai-observability-sdk-py that referenced this pull request Mar 17, 2026
- Replace deprecated gen_ai.system with gen_ai.provider.name per
  OTel semconv v1.40.0
- Add service_version param to register() with OTEL_SERVICE_VERSION
  env var fallback, setting service.version on the OTel Resource

Addresses review comments from opensearch-project/observability-stack#85.

Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>
@vamsimanohar
Copy link
Member Author

Points 2 & 3 addressed in the SDK: opensearch-project/genai-observability-sdk-py#20

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants