
OpenTelemetry tracing support #786

Open
tewbo wants to merge 15 commits into ydb-platform:main from tewbo:otel-tracing-support

Conversation

@tewbo

@tewbo tewbo commented Mar 21, 2026

Pull request type

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

The YDB Python SDK does not provide built-in OpenTelemetry tracing support. There is a legacy integration with the OpenTracing API, which is a deprecated standard.

Issue Number: N/A

What is the new behavior?

Adds OpenTelemetry tracing support to the YDB Python SDK. When enabled via enable_tracing(), the SDK automatically creates spans for key operations:

  • ydb.CreateSession — session creation
  • ydb.ExecuteQuery — query execution (session and transaction level, both sync and async)
  • ydb.Commit / ydb.Rollback — transaction commit and rollback
  • ydb.Driver.Initialize — driver initialization

Each span includes standard attributes: db.system.name, db.namespace, server.address, server.port, ydb.session.id, ydb.node.id, ydb.tx.id.

W3C Trace Context (traceparent) is automatically propagated in gRPC metadata, enabling end-to-end distributed tracing between client and YDB server. Execute spans cover the full operation lifecycle, including streaming result iteration — not just the initial gRPC call. Errors are recorded on spans with error.type, db.response.status_code, and exception events.
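To make the propagation step concrete, here is a minimal standard-library sketch of the version-00 `traceparent` value defined by W3C Trace Context. It is not the code from this PR: `format_traceparent`, `parse_traceparent`, and `inject` are hypothetical helper names, and the list of tuples only stands in for gRPC metadata.

```python
import re

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Render a version-00 traceparent value from numeric IDs."""
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(value: str):
    """Return (trace_id, span_id, sampled), or None if the value is malformed."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, span_id, flags = (int(group, 16) for group in match.groups())
    if trace_id == 0 or span_id == 0:  # all-zero IDs are invalid in the spec
        return None
    return trace_id, span_id, bool(flags & 0x01)


def inject(metadata: list, trace_id: int, span_id: int) -> list:
    """Hypothetical helper: return metadata with a traceparent entry appended."""
    return metadata + [("traceparent", format_traceparent(trace_id, span_id))]
```

A server that understands the header can parse it and attach its own spans to the same trace, which is what makes the client and server sides line up in one trace view.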

Tracing is opt-in: install the extra with pip install ydb[tracing] and call enable_tracing(). Without it, SDK behavior is unchanged and all tracing code paths are no-ops.
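The opt-in, no-op-by-default behavior can be illustrated with a small standard-library sketch. This is an assumption about the general pattern, not the SDK's implementation: `RecordingTracer`, `enable_tracing_sketch`, and `span` are hypothetical names, and the real `enable_tracing()` signature is not shown in this PR description.

```python
from contextlib import contextmanager

_tracer = None  # stays None until tracing is explicitly enabled


class RecordingTracer:
    """Hypothetical stand-in for a real tracer; it only remembers span names."""

    def __init__(self):
        self.finished = []

    @contextmanager
    def start_span(self, name):
        try:
            yield name
        finally:
            # Record the span when the block exits, even on error.
            self.finished.append(name)


def enable_tracing_sketch(tracer):
    """Opt in: install the tracer that span() will use from now on."""
    global _tracer
    _tracer = tracer


@contextmanager
def span(name):
    """Open a span if tracing is enabled; otherwise do nothing."""
    if _tracer is None:
        yield None  # no-op path: nothing is created or recorded
        return
    with _tracer.start_span(name) as active:
        yield active
```

Call sites wrap their work in `with span("ydb.ExecuteQuery"):` unconditionally, so the instrumented code looks the same whether or not tracing was enabled.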

Other information

  • Includes unit tests for sync, async, error handling, parent-child relationships, context propagation, no-op mode, and concurrent span isolation

@tewbo tewbo marked this pull request as draft March 23, 2026 10:36
@KirillKurdyukov KirillKurdyukov self-requested a review March 23, 2026 13:47
@tewbo tewbo marked this pull request as ready for review March 24, 2026 07:02
Comment thread ydb/opentelemetry/_plugin.py Outdated
Comment thread setup.py Outdated
Comment thread examples/opentelemetry/example.py
Comment thread ydb/opentelemetry/_plugin.py Outdated
@tewbo tewbo marked this pull request as draft April 9, 2026 17:56
@tewbo tewbo marked this pull request as ready for review April 9, 2026 18:15
KirillKurdyukov and others added 3 commits April 20, 2026 14:38
Query-service retries now emit an umbrella INTERNAL ydb.RunWithRetry span
and a ydb.Try INTERNAL span per attempt. Each ydb.Try carries the
ydb.retry.backoff_ms attribute: the sleep preceding the attempt (0 for
the first one), so each attempt's span timeline includes its backoff.
Retriable exceptions are recorded on the owning ydb.Try span, and an
exception that escapes the whole retry loop (including an
asyncio.CancelledError hitting a backoff sleep) is recorded on the outer
ydb.RunWithRetry span.
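The span layout described in this commit can be sketched roughly as follows. This is a simplified illustration, not the code in ydb/retries.py: `Span`, `RetriableError`, and `run_with_retry` are hypothetical, spans are plain records appended to a list, and the exponential backoff schedule is an assumption.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Span:
    """Minimal span record used only for this illustration."""
    name: str
    attributes: dict = field(default_factory=dict)
    exceptions: list = field(default_factory=list)


class RetriableError(Exception):
    """Hypothetical marker for errors worth retrying."""


def run_with_retry(operation, spans, max_attempts=3, base_backoff_ms=10,
                   sleep=time.sleep):
    outer = Span("ydb.RunWithRetry")  # umbrella span for the whole loop
    spans.append(outer)
    try:
        for attempt in range(max_attempts):
            # The first attempt has no preceding sleep; later ones back off.
            delay_ms = 0 if attempt == 0 else base_backoff_ms * 2 ** (attempt - 1)
            attempt_span = Span("ydb.Try", {"ydb.retry.backoff_ms": delay_ms})
            spans.append(attempt_span)
            sleep(delay_ms / 1000)  # the sleep happens inside the attempt span
            try:
                return operation()
            except RetriableError as exc:
                attempt_span.exceptions.append(exc)  # recorded on the attempt
                if attempt == max_attempts - 1:
                    raise  # attempts exhausted: let it escape the loop
    except BaseException as exc:
        # Anything that escapes the loop, cancellation included, lands here.
        outer.exceptions.append(exc)
        raise
```

Passing `sleep=lambda seconds: None` makes the sketch easy to exercise in tests without real delays.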

CLIENT spans (ydb.CreateSession, ydb.ExecuteQuery, ydb.Commit,
ydb.Rollback) now also emit network.peer.address / network.peer.port
for the concrete node serving the session, while server.address /
server.port keep meaning the host from the connection string.

Also fixes a "Пр" typo in docs/opentelemetry.rst and corrects span names
(ydb.CommitTransaction -> ydb.Commit, ydb.RollbackTransaction -> ydb.Rollback).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move ydb.RunWithRetry / ydb.Try span emission directly into
retry_operation_sync / retry_operation_async in ydb/retries.py, and drop
the short-lived ydb.query._retries shim. Tracing is still no-op by
default, so there is no cost for the table-service callers that share
the same retry loop; we just stop duplicating the retry logic to add
spans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…p session.id/tx.id

RPC (CLIENT-kind) spans now carry the peer metadata from the discovery
endpoint map, not from the grpc-target string of the request:

  * network.peer.address = EndpointInfo.address (the node host)
  * network.peer.port    = EndpointInfo.port
  * ydb.node.dc          = EndpointInfo.location

To do that, EndpointOptions and Connection now also carry address/port/
location populated by resolver.endpoints_with_options(); sessions
resolve their peer tuple via driver._store.connections_by_node_id after
CreateSession returns, which is the right place to ask which node owns
this session.
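As a rough illustration of the lookup described above, the sketch below maps a node id to the three peer attributes. It is not the SDK's code: `EndpointInfoSketch` and `peer_attributes` are hypothetical, and a plain dict stands in for driver._store.connections_by_node_id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointInfoSketch:
    """Hypothetical stand-in for the discovery data kept per node."""
    address: str
    port: int
    location: str


def peer_attributes(connections_by_node_id: dict, node_id: int) -> dict:
    """Build CLIENT-span peer attributes for the node that owns a session."""
    endpoint = connections_by_node_id.get(node_id)
    if endpoint is None:
        return {}  # unknown node: emit no peer attributes rather than guess
    return {
        "network.peer.address": endpoint.address,
        "network.peer.port": endpoint.port,
        "ydb.node.dc": endpoint.location,
    }
```

Returning an empty dict for an unknown node is a design choice of this sketch; the PR text does not say how the SDK handles that case.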

Dropped the noisy ydb.session.id and ydb.tx.id attributes - they pollute
every span and are recoverable from trace context if really needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

3 participants