feat: instrument plain-agents with opensearch-genai-observability-sdk-py#85
feat: instrument plain-agents with opensearch-genai-observability-sdk-py#85vamsimanohar wants to merge 1 commit intomainfrom
Conversation
3e39abc to
1dce40d
Compare
|
✅ Docs preview build completed successfully! All tests passed. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #85 +/- ##
=======================================
Coverage 18.51% 18.51%
=======================================
Files 3 3
Lines 54 54
Branches 19 18 -1
=======================================
Hits 10 10
Misses 44 44 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
kylehounslow
left a comment
There was a problem hiding this comment.
Nice work @vamsimanohar — the observe() + enrich() pattern reads much better than the manual span setup, and -115 net lines is a win.
Some things I noticed:
1. Bug fix
Good catch on tool_span → agent_span in the orchestrator's events-agent error handler. That was referencing the wrong scope.
2. service.version dropped from Resource
The original setup_telemetry() set service.version: "1.0.0" on the Resource. The SDK's register() only sets service.name, so service.version is gone from all spans. It's a standard OTel resource attribute — not GenAI-specific but useful for debugging. Is this a gap in the SDK's register() API we should address?
(I checked whether removing gen_ai.agent.id and gen_ai.agent.name from the Resource was a problem — it's not. Per the GenAI semconv, those are span-level attributes. The original code was over-promoting them to the Resource. This PR is actually more correct.)
3. gen_ai.system is deprecated
Both old and new code set gen_ai.system. The current semconv registry marks it deprecated in favor of gen_ai.provider.name. Not a regression so not blocking, but worth a follow-up on the SDK to emit gen_ai.provider.name.
4. gen_ai.agent.name missing on sub-agent invoke spans
The original orchestrator set gen_ai.agent.name on the weather/events invoke spans:
agent_span.set_attribute("gen_ai.agent.name", "weather-agent")The PR uses observe("weather-agent", op=Op.INVOKE_AGENT) which sets the span name but not the gen_ai.agent.name attribute. Per semconv it's Conditionally Required on invoke_agent spans when the name is known. Quick fix — add an enrich() or manual set_attribute inside those blocks.
5. enrich() output placement
In the orchestrator and events-agent, the final enrich(output_messages=...) is called after exiting the with observe(...) block, which targets the parent/root span. The comments explain the intent. Is this the intended SDK pattern for enriching the parent? If so might be worth documenting.
6. Metrics Resource created separately
The metrics setup creates its own Resource.create({"service.name": "..."}) independently from register(). Trace and metrics Resources could drift. Minor — maybe a # TODO: unify when SDK supports metrics comment.
7. Lock file changes
The package-lock.json bumps (devalue, tar, yauzl, svgo) look like routine dep updates. Any reason to bundle them here vs. a separate PR?
8. Test plan
The PR description has a test plan with unchecked boxes — have you run through these locally?
- Docker builds succeed with SDK dependency
- Traces appear in OpenSearch with correct attributes
- Trace attributes equivalent between SDK and manual versions
- Fault injection still works
This review was co-authored with Claude Opus 4 (claude-4.6).
9996f66 to
c472565
Compare
|
✅ Docs preview build completed successfully! All tests passed. |
Replace manual OTel TracerProvider/exporter setup and span.set_attribute()
calls with the SDK's register(), observe(), and enrich() APIs.
- register() replaces ~20-30 lines of TracerProvider + exporter config
- observe() as context manager replaces tracer.start_as_current_span()
- enrich() replaces dozens of span.set_attribute("gen_ai.*", ...) calls
- Net reduction of ~112 lines while preserving identical trace output
- Metrics and logging setup unchanged (SDK handles tracing only)
- Fault injection, ASGI middleware, MCP integration all preserved
Requires opensearch-genai-observability-sdk-py >= 0.2.6.
Signed-off-by: Vamshi Vijay Nakkirtha <vamsimanohar@gmail.com>
Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>
c472565 to
21175a6
Compare
- Replace deprecated gen_ai.system with gen_ai.provider.name per OTel semconv v1.40.0 - Add service_version param to register() with OTEL_SERVICE_VERSION env var fallback, setting service.version on the OTel Resource Addresses review comments from opensearch-project/observability-stack#85. Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>
|
Points 2 & 3 addressed in the SDK: opensearch-project/genai-observability-sdk-py#20 |
Summary
TracerProvider/exporter setup with the SDK'sregister()(one-line tracing config)tracer.start_as_current_span()+ manualspan.set_attribute()calls withobserve()+enrich()Changes
orchestrator/main.pysetup_telemetry()→register(), manual spans →observe()+enrich()events-agent/main.pyobserve()with manual MCP attributesweather-agent/main.pysetup_telemetry()split:register()for tracing + manual metrics/logs setupweather-agent/server.pyspan.set_attribute()→enrich()*/Dockerfileopentelemetry-api+opentelemetry-sdkwithopensearch-genai-observability-sdk-py>=0.2.6weather-agent/pyproject.tomlBefore (manual OTel)
After (SDK)
Dependencies
Requires
opensearch-genai-observability-sdk-py >= 0.2.6(pending release from opensearch-project/genai-observability-sdk-py#8).Test plan
docker compose upand verify traces appear in OpenSearch