feat: instrument plain-agents with opensearch-genai-observability-sdk-py by vamsimanohar · Pull Request #85 · opensearch-project/observability-stack

vamsimanohar · 2026-03-12T21:54:14Z

Summary

Replace manual OTel TracerProvider/exporter setup with the SDK's register() (one-line tracing config)
Replace tracer.start_as_current_span() + manual span.set_attribute() calls with observe() + enrich()
Net reduction of ~112 lines across orchestrator, events-agent, and weather-agent
All existing functionality preserved: fault injection, ASGI middleware, MCP integration, metrics, logging

Changes

File	What changed
`orchestrator/main.py`	`setup_telemetry()` → `register()`, manual spans → `observe()` + `enrich()`
`events-agent/main.py`	Same pattern, MCP tool call spans use `observe()` with manual MCP attributes
`weather-agent/main.py`	`setup_telemetry()` split: `register()` for tracing + manual metrics/logs setup
`weather-agent/server.py`	Manual `span.set_attribute()` → `enrich()`
`*/Dockerfile`	Replace `opentelemetry-api` + `opentelemetry-sdk` with `opensearch-genai-observability-sdk-py>=0.2.6`
`weather-agent/pyproject.toml`	Same dependency change

Before (manual OTel)

def setup_telemetry(service_name, otlp_endpoint):
    resource = Resource.create({"service.name": service_name, ...})
    tracer_provider = TracerProvider(resource=resource)
    tracer_provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True))
    )
    trace.set_tracer_provider(tracer_provider)
    return trace.get_tracer(service_name)

# ... later in handler:
with tracer.start_as_current_span("chat", kind=SpanKind.INTERNAL) as chat_span:
    chat_span.set_attribute("gen_ai.operation.name", "chat")
    chat_span.set_attribute("gen_ai.system", SYSTEMS[model])
    chat_span.set_attribute("gen_ai.request.model", model)
    chat_span.set_attribute("gen_ai.usage.input_tokens", input_tokens)
    chat_span.set_attribute("gen_ai.usage.output_tokens", output_tokens)
    chat_span.set_attribute("gen_ai.response.finish_reasons", ["tool_calls"])

After (SDK)

register(endpoint="grpc://localhost:4317", service_name="travel-planner")

# ... later in handler:
with observe("planning", op=Op.CHAT):
    enrich(model=model, provider=provider,
           input_tokens=input_tokens, output_tokens=output_tokens,
           finish_reason="tool_calls")

Dependencies

Requires opensearch-genai-observability-sdk-py >= 0.2.6 (pending release from opensearch-project/genai-observability-sdk-py#8).

Test plan

Verify Docker builds succeed with SDK dependency
Run multi-agent-planner with docker compose up and verify traces appear in OpenSearch
Compare trace attributes between SDK and manual versions are equivalent
Verify fault injection still works as expected

github-actions · 2026-03-12T23:54:36Z

✅ Docs preview build completed successfully! All tests passed.

codecov · 2026-03-12T23:55:13Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 18.51%. Comparing base (c65cb60) to head (1dce40d).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #85   +/-   ##
=======================================
  Coverage   18.51%   18.51%           
=======================================
  Files           3        3           
  Lines          54       54           
  Branches       19       18    -1     
=======================================
  Hits           10       10           
  Misses         44       44

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

kylehounslow

Nice work @vamsimanohar — the observe() + enrich() pattern reads much better than the manual span setup, and -115 net lines is a win.

Some things I noticed:

1. Bug fix

Good catch on tool_span → agent_span in the orchestrator's events-agent error handler. That was referencing the wrong scope.

2. `service.version` dropped from Resource

The original setup_telemetry() set service.version: "1.0.0" on the Resource. The SDK's register() only sets service.name, so service.version is gone from all spans. It's a standard OTel resource attribute — not GenAI-specific but useful for debugging. Is this a gap in the SDK's register() API we should address?

(I checked whether removing gen_ai.agent.id and gen_ai.agent.name from the Resource was a problem — it's not. Per the GenAI semconv, those are span-level attributes. The original code was over-promoting them to the Resource. This PR is actually more correct.)

3. `gen_ai.system` is deprecated

Both old and new code set gen_ai.system. The current semconv registry marks it deprecated in favor of gen_ai.provider.name. Not a regression so not blocking, but worth a follow-up on the SDK to emit gen_ai.provider.name.

4. `gen_ai.agent.name` missing on sub-agent invoke spans

The original orchestrator set gen_ai.agent.name on the weather/events invoke spans:

agent_span.set_attribute("gen_ai.agent.name", "weather-agent")

The PR uses observe("weather-agent", op=Op.INVOKE_AGENT) which sets the span name but not the gen_ai.agent.name attribute. Per semconv it's Conditionally Required on invoke_agent spans when the name is known. Quick fix — add an enrich() or manual set_attribute inside those blocks.

5. `enrich()` output placement

In the orchestrator and events-agent, the final enrich(output_messages=...) is called after exiting the with observe(...) block, which targets the parent/root span. The comments explain the intent. Is this the intended SDK pattern for enriching the parent? If so might be worth documenting.

6. Metrics Resource created separately

The metrics setup creates its own Resource.create({"service.name": "..."}) independently from register(). Trace and metrics Resources could drift. Minor — maybe a # TODO: unify when SDK supports metrics comment.

7. Lock file changes

The package-lock.json bumps (devalue, tar, yauzl, svgo) look like routine dep updates. Any reason to bundle them here vs. a separate PR?

8. Test plan

The PR description has a test plan with unchecked boxes — have you run through these locally?

Docker builds succeed with SDK dependency
Traces appear in OpenSearch with correct attributes
Trace attributes equivalent between SDK and manual versions
Fault injection still works

This review was co-authored with Claude Opus 4 (claude-4.6).

github-actions · 2026-03-17T04:16:02Z

✅ Docs preview build completed successfully! All tests passed.

Replace manual OTel TracerProvider/exporter setup and span.set_attribute() calls with the SDK's register(), observe(), and enrich() APIs. - register() replaces ~20-30 lines of TracerProvider + exporter config - observe() as context manager replaces tracer.start_as_current_span() - enrich() replaces dozens of span.set_attribute("gen_ai.*", ...) calls - Net reduction of ~112 lines while preserving identical trace output - Metrics and logging setup unchanged (SDK handles tracing only) - Fault injection, ASGI middleware, MCP integration all preserved Requires opensearch-genai-observability-sdk-py >= 0.2.6. Signed-off-by: Vamshi Vijay Nakkirtha <vamsimanohar@gmail.com> Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>

- Replace deprecated gen_ai.system with gen_ai.provider.name per OTel semconv v1.40.0 - Add service_version param to register() with OTEL_SERVICE_VERSION env var fallback, setting service.version on the OTel Resource Addresses review comments from opensearch-project/observability-stack#85. Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>

vamsimanohar · 2026-03-17T04:53:38Z

Points 2 & 3 addressed in the SDK: opensearch-project/genai-observability-sdk-py#20

vamsimanohar requested review from anirudha, goyamegh, kylehounslow and ps48 as code owners March 12, 2026 21:54

vamsimanohar force-pushed the feat/sdk-instrumented-agents branch 2 times, most recently from 3e39abc to 1dce40d Compare March 12, 2026 23:53

kylehounslow reviewed Mar 16, 2026

View reviewed changes

kylehounslow requested changes Mar 16, 2026

View reviewed changes

vamsimanohar force-pushed the feat/sdk-instrumented-agents branch 2 times, most recently from 9996f66 to c472565 Compare March 17, 2026 04:15

vamsimanohar force-pushed the feat/sdk-instrumented-agents branch from c472565 to 21175a6 Compare March 17, 2026 04:21

vamsimanohar mentioned this pull request Mar 17, 2026

fix: update semconv attrs and add service.version to register() opensearch-project/genai-observability-sdk-py#20

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: instrument plain-agents with opensearch-genai-observability-sdk-py#85

feat: instrument plain-agents with opensearch-genai-observability-sdk-py#85
vamsimanohar wants to merge 1 commit intomainfrom
feat/sdk-instrumented-agents

vamsimanohar commented Mar 12, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

codecov bot commented Mar 12, 2026

Uh oh!

kylehounslow left a comment

Uh oh!

github-actions bot commented Mar 17, 2026

Uh oh!

vamsimanohar commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

vamsimanohar commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Before (manual OTel)

After (SDK)

Dependencies

Test plan

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

codecov bot commented Mar 12, 2026

Codecov Report

Uh oh!

kylehounslow left a comment

Choose a reason for hiding this comment

1. Bug fix

2. service.version dropped from Resource

3. gen_ai.system is deprecated

4. gen_ai.agent.name missing on sub-agent invoke spans

5. enrich() output placement

6. Metrics Resource created separately

7. Lock file changes

8. Test plan

Uh oh!

github-actions bot commented Mar 17, 2026

Uh oh!

vamsimanohar commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vamsimanohar commented Mar 12, 2026 •

edited

Loading

2. `service.version` dropped from Resource

3. `gen_ai.system` is deprecated

4. `gen_ai.agent.name` missing on sub-agent invoke spans

5. `enrich()` output placement