Report gRPC status code in client-computed stats #10805
Benchmarks
Startup
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 64 metrics, 7 unstable metrics.
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.61.0-SNAPSHOT~c7b581dd08, baseline=1.61.0-SNAPSHOT~7cff99444b
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.059 s) : 0, 1058921
Total [baseline] (8.897 s) : 0, 8897195
Agent [candidate] (1.064 s) : 0, 1064257
Total [candidate] (8.879 s) : 0, 8878941
section iast
Agent [baseline] (1.226 s) : 0, 1226066
Total [baseline] (9.56 s) : 0, 9560060
Agent [candidate] (1.224 s) : 0, 1223811
Total [candidate] (9.652 s) : 0, 9652123
gantt
title insecure-bank - break down per module: candidate=1.61.0-SNAPSHOT~c7b581dd08, baseline=1.61.0-SNAPSHOT~7cff99444b
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.211 ms) : 0, 1211
crashtracking [candidate] (1.209 ms) : 0, 1209
BytebuddyAgent [baseline] (626.943 ms) : 0, 626943
BytebuddyAgent [candidate] (631.714 ms) : 0, 631714
AgentMeter [baseline] (29.015 ms) : 0, 29015
AgentMeter [candidate] (29.4 ms) : 0, 29400
GlobalTracer [baseline] (256.756 ms) : 0, 256756
GlobalTracer [candidate] (258.464 ms) : 0, 258464
AppSec [baseline] (31.478 ms) : 0, 31478
AppSec [candidate] (31.763 ms) : 0, 31763
Debugger [baseline] (58.555 ms) : 0, 58555
Debugger [candidate] (59.004 ms) : 0, 59004
Remote Config [baseline] (615.073 µs) : 0, 615
Remote Config [candidate] (625.727 µs) : 0, 626
Telemetry [baseline] (8.651 ms) : 0, 8651
Telemetry [candidate] (8.692 ms) : 0, 8692
Flare Poller [baseline] (9.53 ms) : 0, 9530
Flare Poller [candidate] (7.224 ms) : 0, 7224
section iast
crashtracking [baseline] (1.197 ms) : 0, 1197
crashtracking [candidate] (1.192 ms) : 0, 1192
BytebuddyAgent [baseline] (795.527 ms) : 0, 795527
BytebuddyAgent [candidate] (793.893 ms) : 0, 793893
AgentMeter [baseline] (11.351 ms) : 0, 11351
AgentMeter [candidate] (11.298 ms) : 0, 11298
GlobalTracer [baseline] (247.229 ms) : 0, 247229
GlobalTracer [candidate] (247.26 ms) : 0, 247260
AppSec [baseline] (26.397 ms) : 0, 26397
AppSec [candidate] (26.316 ms) : 0, 26316
Debugger [baseline] (62.971 ms) : 0, 62971
Debugger [candidate] (62.385 ms) : 0, 62385
Remote Config [baseline] (513.386 µs) : 0, 513
Remote Config [candidate] (519.449 µs) : 0, 519
Telemetry [baseline] (14.783 ms) : 0, 14783
Telemetry [candidate] (14.927 ms) : 0, 14927
Flare Poller [baseline] (4.87 ms) : 0, 4870
Flare Poller [candidate] (4.894 ms) : 0, 4894
IAST [baseline] (25.157 ms) : 0, 25157
IAST [candidate] (25.142 ms) : 0, 25142
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.61.0-SNAPSHOT~c7b581dd08, baseline=1.61.0-SNAPSHOT~7cff99444b
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.059 s) : 0, 1058895
Total [baseline] (11.035 s) : 0, 11035082
Agent [candidate] (1.074 s) : 0, 1074038
Total [candidate] (11.117 s) : 0, 11117480
section appsec
Agent [baseline] (1.254 s) : 0, 1253761
Total [baseline] (11.144 s) : 0, 11143896
Agent [candidate] (1.251 s) : 0, 1250784
Total [candidate] (11.159 s) : 0, 11158870
section iast
Agent [baseline] (1.227 s) : 0, 1227461
Total [baseline] (11.334 s) : 0, 11334347
Agent [candidate] (1.239 s) : 0, 1238961
Total [candidate] (11.447 s) : 0, 11446779
section profiling
Agent [baseline] (1.188 s) : 0, 1187505
Total [baseline] (11.04 s) : 0, 11040279
Agent [candidate] (1.189 s) : 0, 1188884
Total [candidate] (11.037 s) : 0, 11036555
gantt
title petclinic - break down per module: candidate=1.61.0-SNAPSHOT~c7b581dd08, baseline=1.61.0-SNAPSHOT~7cff99444b
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.218 ms) : 0, 1218
crashtracking [candidate] (1.219 ms) : 0, 1219
BytebuddyAgent [baseline] (627.315 ms) : 0, 627315
BytebuddyAgent [candidate] (637.142 ms) : 0, 637142
AgentMeter [baseline] (29.138 ms) : 0, 29138
AgentMeter [candidate] (29.513 ms) : 0, 29513
GlobalTracer [baseline] (256.978 ms) : 0, 256978
GlobalTracer [candidate] (260.022 ms) : 0, 260022
AppSec [baseline] (31.495 ms) : 0, 31495
AppSec [candidate] (31.98 ms) : 0, 31980
Debugger [baseline] (59.527 ms) : 0, 59527
Debugger [candidate] (60.303 ms) : 0, 60303
Remote Config [baseline] (613.652 µs) : 0, 614
Remote Config [candidate] (629.637 µs) : 0, 630
Telemetry [baseline] (8.652 ms) : 0, 8652
Telemetry [candidate] (8.763 ms) : 0, 8763
Flare Poller [baseline] (8.0 ms) : 0, 8000
Flare Poller [candidate] (8.093 ms) : 0, 8093
section appsec
crashtracking [baseline] (1.224 ms) : 0, 1224
crashtracking [candidate] (1.198 ms) : 0, 1198
BytebuddyAgent [baseline] (662.917 ms) : 0, 662917
BytebuddyAgent [candidate] (660.75 ms) : 0, 660750
AgentMeter [baseline] (12.135 ms) : 0, 12135
AgentMeter [candidate] (12.075 ms) : 0, 12075
GlobalTracer [baseline] (259.569 ms) : 0, 259569
GlobalTracer [candidate] (259.282 ms) : 0, 259282
AppSec [baseline] (177.997 ms) : 0, 177997
AppSec [candidate] (177.734 ms) : 0, 177734
Debugger [baseline] (66.269 ms) : 0, 66269
Debugger [candidate] (66.069 ms) : 0, 66069
Remote Config [baseline] (580.652 µs) : 0, 581
Remote Config [candidate] (569.002 µs) : 0, 569
Telemetry [baseline] (8.961 ms) : 0, 8961
Telemetry [candidate] (9.084 ms) : 0, 9084
Flare Poller [baseline] (3.622 ms) : 0, 3622
Flare Poller [candidate] (3.674 ms) : 0, 3674
IAST [baseline] (24.115 ms) : 0, 24115
IAST [candidate] (24.068 ms) : 0, 24068
section iast
crashtracking [baseline] (1.196 ms) : 0, 1196
crashtracking [candidate] (1.211 ms) : 0, 1211
BytebuddyAgent [baseline] (795.492 ms) : 0, 795492
BytebuddyAgent [candidate] (803.37 ms) : 0, 803370
AgentMeter [baseline] (11.278 ms) : 0, 11278
AgentMeter [candidate] (11.594 ms) : 0, 11594
GlobalTracer [baseline] (247.509 ms) : 0, 247509
GlobalTracer [candidate] (249.601 ms) : 0, 249601
AppSec [baseline] (26.416 ms) : 0, 26416
AppSec [candidate] (26.712 ms) : 0, 26712
Debugger [baseline] (66.047 ms) : 0, 66047
Debugger [candidate] (65.151 ms) : 0, 65151
Remote Config [baseline] (540.992 µs) : 0, 541
Remote Config [candidate] (530.885 µs) : 0, 531
Telemetry [baseline] (13.437 ms) : 0, 13437
Telemetry [candidate] (14.396 ms) : 0, 14396
Flare Poller [baseline] (4.419 ms) : 0, 4419
Flare Poller [candidate] (4.772 ms) : 0, 4772
IAST [baseline] (25.088 ms) : 0, 25088
IAST [candidate] (25.456 ms) : 0, 25456
section profiling
crashtracking [baseline] (1.182 ms) : 0, 1182
crashtracking [candidate] (1.183 ms) : 0, 1183
BytebuddyAgent [baseline] (686.496 ms) : 0, 686496
BytebuddyAgent [candidate] (687.198 ms) : 0, 687198
AgentMeter [baseline] (8.671 ms) : 0, 8671
AgentMeter [candidate] (8.707 ms) : 0, 8707
GlobalTracer [baseline] (216.327 ms) : 0, 216327
GlobalTracer [candidate] (216.519 ms) : 0, 216519
AppSec [baseline] (32.034 ms) : 0, 32034
AppSec [candidate] (32.169 ms) : 0, 32169
Debugger [baseline] (64.704 ms) : 0, 64704
Debugger [candidate] (63.231 ms) : 0, 63231
Remote Config [baseline] (574.926 µs) : 0, 575
Remote Config [candidate] (589.598 µs) : 0, 590
Telemetry [baseline] (8.984 ms) : 0, 8984
Telemetry [candidate] (10.551 ms) : 0, 10551
Flare Poller [baseline] (3.495 ms) : 0, 3495
Flare Poller [candidate] (3.517 ms) : 0, 3517
ProfilingAgent [baseline] (93.979 ms) : 0, 93979
ProfilingAgent [candidate] (94.084 ms) : 0, 94084
Profiling [baseline] (94.556 ms) : 0, 94556
Profiling [candidate] (94.639 ms) : 0, 94639
Load
Summary: Found 1 performance improvements and 2 performance regressions! Performance is the same for 17 metrics, 16 unstable metrics.
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~c7b581dd08, baseline=1.61.0-SNAPSHOT~7cff99444b
dateFormat X
axisFormat %s
section baseline
no_agent (17.475 ms) : 17295, 17656
. : milestone, 17475,
appsec (18.842 ms) : 18651, 19033
. : milestone, 18842,
code_origins (18.789 ms) : 18605, 18972
. : milestone, 18789,
iast (18.056 ms) : 17873, 18239
. : milestone, 18056,
profiling (18.887 ms) : 18698, 19075
. : milestone, 18887,
tracing (17.743 ms) : 17565, 17922
. : milestone, 17743,
section candidate
no_agent (18.427 ms) : 18238, 18616
. : milestone, 18427,
appsec (18.626 ms) : 18436, 18815
. : milestone, 18626,
code_origins (17.932 ms) : 17755, 18109
. : milestone, 17932,
iast (18.445 ms) : 18260, 18629
. : milestone, 18445,
profiling (18.734 ms) : 18548, 18920
. : milestone, 18734,
tracing (17.799 ms) : 17625, 17974
. : milestone, 17799,
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~c7b581dd08, baseline=1.61.0-SNAPSHOT~7cff99444b
dateFormat X
axisFormat %s
section baseline
no_agent (1.188 ms) : 1177, 1200
. : milestone, 1188,
iast (3.235 ms) : 3192, 3277
. : milestone, 3235,
iast_FULL (5.79 ms) : 5733, 5848
. : milestone, 5790,
iast_GLOBAL (3.406 ms) : 3356, 3456
. : milestone, 3406,
profiling (2.084 ms) : 2065, 2103
. : milestone, 2084,
tracing (1.767 ms) : 1752, 1781
. : milestone, 1767,
section candidate
no_agent (1.177 ms) : 1166, 1188
. : milestone, 1177,
iast (3.133 ms) : 3091, 3176
. : milestone, 3133,
iast_FULL (5.875 ms) : 5815, 5935
. : milestone, 5875,
iast_GLOBAL (3.684 ms) : 3623, 3745
. : milestone, 3684,
profiling (2.192 ms) : 2171, 2212
. : milestone, 2192,
tracing (1.783 ms) : 1769, 1798
. : milestone, 1783,
Dacapo
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~c7b581dd08, baseline=1.61.0-SNAPSHOT~7cff99444b
dateFormat X
axisFormat %s
section baseline
no_agent (14.993 s) : 14993000, 14993000
. : milestone, 14993000,
appsec (14.869 s) : 14869000, 14869000
. : milestone, 14869000,
iast (18.112 s) : 18112000, 18112000
. : milestone, 18112000,
iast_GLOBAL (17.842 s) : 17842000, 17842000
. : milestone, 17842000,
profiling (15.028 s) : 15028000, 15028000
. : milestone, 15028000,
tracing (15.105 s) : 15105000, 15105000
. : milestone, 15105000,
section candidate
no_agent (14.981 s) : 14981000, 14981000
. : milestone, 14981000,
appsec (14.914 s) : 14914000, 14914000
. : milestone, 14914000,
iast (18.014 s) : 18014000, 18014000
. : milestone, 18014000,
iast_GLOBAL (17.662 s) : 17662000, 17662000
. : milestone, 17662000,
profiling (14.551 s) : 14551000, 14551000
. : milestone, 14551000,
tracing (15.049 s) : 15049000, 15049000
. : milestone, 15049000,
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~c7b581dd08, baseline=1.61.0-SNAPSHOT~7cff99444b
dateFormat X
axisFormat %s
section baseline
no_agent (1.479 ms) : 1468, 1491
. : milestone, 1479,
appsec (3.801 ms) : 3580, 4022
. : milestone, 3801,
iast (2.257 ms) : 2188, 2327
. : milestone, 2257,
iast_GLOBAL (2.307 ms) : 2237, 2377
. : milestone, 2307,
profiling (2.1 ms) : 2044, 2155
. : milestone, 2100,
tracing (2.078 ms) : 2023, 2132
. : milestone, 2078,
section candidate
no_agent (1.477 ms) : 1465, 1489
. : milestone, 1477,
appsec (3.794 ms) : 3573, 4015
. : milestone, 3794,
iast (2.265 ms) : 2195, 2334
. : milestone, 2265,
iast_GLOBAL (2.304 ms) : 2234, 2373
. : milestone, 2304,
profiling (2.1 ms) : 2044, 2157
. : milestone, 2100,
tracing (2.069 ms) : 2015, 2123
. : milestone, 2069,
# Conflicts:
#	dd-trace-core/src/main/java/datadog/trace/common/metrics/MetricKey.java
When client-computed stats (CCS) are enabled, the agent **merges** the stats it computes itself from raw spans with the stats pre-computed by the tracer. For gRPC spans, when the agent computes stats from raw spans (that is, without CCS), it resolves the status code from the span's tags via [`getGRPCStatusCode()`](https://github.com/DataDog/datadog-agent/blob/47938ea8c9b9894dcb03dc3f81cf2c6e408f1b6c/pkg/trace/stats/aggregation.go#L167-L221), which always returns a numeric string (e.g. `4`) or an empty string. With CCS enabled, the agent uses the [`GRPCStatusCode`](https://github.com/DataDog/datadog-agent/blob/47938ea8c9b9894dcb03dc3f81cf2c6e408f1b6c/pkg/trace/stats/aggregation.go#L160) value sent by the tracer without translation. This change makes the tracer mimic the agent's aggregation and send what the agent expects in [`NewAggregationFromGroup`](https://github.com/DataDog/datadog-agent/blob/47938ea8c9b9894dcb03dc3f81cf2c6e408f1b6c/pkg/trace/stats/aggregation.go#L146-L165). Protocol-wise, [ClientGroupedStats.GRPC_status_code](https://github.com/DataDog/datadog-agent/blob/47938ea8c9b9894dcb03dc3f81cf2c6e408f1b6c/pkg/proto/datadog/trace/stats.proto#L103) is a `string`.
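To make the "numeric string or empty string" contract concrete, here is a minimal Java sketch of that kind of normalisation. It is not the agent's code (which is Go) and not part of this PR: the class `GrpcStatusCodeNames` and its method are hypothetical, and the handling of out-of-range numbers is an assumption. Only the name-to-number table is the standard gRPC status code list.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration; not part of dd-trace-java or the agent. */
final class GrpcStatusCodeNames {
  // Canonical gRPC status codes; the array index is the numeric code (0..16).
  private static final String[] NAMES = {
    "OK", "CANCELLED", "UNKNOWN", "INVALID_ARGUMENT", "DEADLINE_EXCEEDED",
    "NOT_FOUND", "ALREADY_EXISTS", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED",
    "FAILED_PRECONDITION", "ABORTED", "OUT_OF_RANGE", "UNIMPLEMENTED",
    "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED"
  };

  private static final Map<String, String> BY_NAME = new HashMap<>();

  static {
    for (int i = 0; i < NAMES.length; i++) {
      BY_NAME.put(NAMES[i], Integer.toString(i));
    }
  }

  /** Returns a numeric string such as "4", or "" when the value is not recognised. */
  static String toNumericString(String tagValue) {
    if (tagValue == null) {
      return "";
    }
    String v = tagValue.trim();
    if (v.isEmpty()) {
      return "";
    }
    try {
      int code = Integer.parseInt(v);
      // Assumption: numbers outside the known range are treated as unknown.
      return (code >= 0 && code < NAMES.length) ? Integer.toString(code) : "";
    } catch (NumberFormatException notNumeric) {
      // Not a number: fall back to a lookup by status name.
      String byName = BY_NAME.get(v.toUpperCase(Locale.ROOT));
      return byName == null ? "" : byName;
    }
  }

  public static void main(String[] args) {
    System.out.println(toNumericString("DEADLINE_EXCEEDED")); // prints 4
  }
}
```

With a translation like this on the raw-span path, a tracer that sent `DEADLINE_EXCEEDED` in the stats payload would not line up with the agent's own `4`, which is why the tracer needs the numeric value.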
....84/src/main/java/datadog/trace/instrumentation/armeria/grpc/client/GrpcClientDecorator.java
....84/src/main/java/datadog/trace/instrumentation/armeria/grpc/server/GrpcServerDecorator.java
amarziali left a comment
Thanks for fixing that. It looks good. I left a minor comment.
/merge
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
devflow unqueued this merge request: It did not become mergeable within the expected time
/merge
The expected merge time in
PR can't be merged according to github policy
…nt-computed-stats
/merge
…nt-computed-stats
What Does This Do
Reports the gRPC status code via Client Computed Stats.
The agent has supported this status since 7.65.0 (DataDog/datadog-agent#34220), which is also the minimum version required for CCS.
The current grpc instrumentations capture the status code by name but not its numeric value, so this PR adds a new span tag carrying the numeric value, which the client-side aggregation then uses.
  span.setTag("status.code", status.getCode().name());
  span.setTag("grpc.status.code", status.getCode().name());
+ span.setTag("rpc.grpc.status_code", status.getCode().value());
This affects the grpc and armeria instrumentations.
Note: an additional system test will be added in DataDog/system-tests#6483.
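As a rough illustration of how the client aggregation could consume the new tag, the sketch below reads `rpc.grpc.status_code` from a plain tag map and turns it into the string form used in the stats payload. The class `GrpcStatsField`, its method, and the use of a `Map<String, Object>` for span tags are assumptions made for the example; the actual aggregator code in this PR may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; the real aggregator may look different. */
final class GrpcStatsField {
  // Tag name taken from the PR description.
  static final String NUMERIC_TAG = "rpc.grpc.status_code";

  /** Returns the numeric code as a string for the stats payload, or null when absent. */
  static String fromTags(Map<String, Object> tags) {
    Object value = tags.get(NUMERIC_TAG);
    if (value instanceof Number) {
      return Integer.toString(((Number) value).intValue());
    }
    // Assumption: spans without the numeric tag report no gRPC status code.
    return null;
  }

  public static void main(String[] args) {
    Map<String, Object> tags = new HashMap<>();
    tags.put("grpc.status.code", "DEADLINE_EXCEEDED"); // existing, name-based tag
    tags.put(NUMERIC_TAG, 4);                          // new, numeric tag
    System.out.println(fromTags(tags)); // prints 4
  }
}
```

Reading the numeric tag directly avoids carrying a name-to-number table in the tracer; the name-based tags stay untouched for anything that already relies on them.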
Motivation
Completeness of CCS.
Additional notes
When client-computed stats (CCS) are enabled, the agent merges the stats it computes itself from raw spans with the stats pre-computed by the tracer.
For gRPC spans, when the agent computes stats from raw spans (that is, without CCS), it resolves the status code from the span's tags via `getGRPCStatusCode()`, which always returns a numeric string (e.g. `4`) or an empty string. With CCS enabled, the agent uses the `GRPCStatusCode` value sent by the tracer without translation.
flowchart TB
subgraph tracer["dd-trace-java"]
span["gRPC span<br>grpc.status.code = 'DEADLINE_EXCEEDED'<br>rpc.grpc.status_code = 4"]
span -->|raw spans| v04["POST /v0.4/traces<br>msgpack"]
span --> agg["ConflatingMetricsAggregator<br>reads rpc.grpc.status_code<br>GRPCStatusCode = '4'"]
agg -->|pre-computed stats| v06["POST /v0.6/stats<br>msgpack · GRPCStatusCode: '4'"]
end
subgraph agent["datadog-agent"]
v04 --> agentPath["NewAggregationFromSpan<br>getGRPCStatusCode<br>meta[grpc.status.code]='DEADLINE_EXCEEDED' → '4'"]
v06 --> ccsPath["NewAggregationFromGroup<br>GRPCStatusCode → '4'"]
agentPath --> k1["key{GRPCStatusCode:'4',...}"]
ccsPath --> k2["key{GRPCStatusCode:'4',...}"]
end
This change makes the tracer mimic the agent's aggregation and send what the agent expects in `NewAggregationFromGroup`.
Protocol-wise, ClientGroupedStats.GRPC_status_code is a `string`.
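The reason both paths must produce the same string can be shown with a toy model of key-based merging. This is only a sketch: `StatsMergeSketch`, its `Key` class, the two-field key, and the resource name `Greeter/SayHello` are all hypothetical, and the agent's real aggregation key has more fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy model of merging hit counts by aggregation key; not the agent's implementation. */
final class StatsMergeSketch {

  /** Simplified aggregation key; the real key contains more dimensions. */
  static final class Key {
    final String resource;
    final String grpcStatusCode;

    Key(String resource, String grpcStatusCode) {
      this.resource = resource;
      this.grpcStatusCode = grpcStatusCode;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      return resource.equals(other.resource) && grpcStatusCode.equals(other.grpcStatusCode);
    }

    @Override
    public int hashCode() {
      return Objects.hash(resource, grpcStatusCode);
    }
  }

  /** Sums hit counts for equal keys; unequal keys end up in separate buckets. */
  static Map<Key, Long> merge(Map<Key, Long> agentSide, Map<Key, Long> tracerSide) {
    Map<Key, Long> merged = new HashMap<>(agentSide);
    tracerSide.forEach((key, hits) -> merged.merge(key, hits, Long::sum));
    return merged;
  }

  public static void main(String[] args) {
    Map<Key, Long> agentSide = new HashMap<>();
    agentSide.put(new Key("Greeter/SayHello", "4"), 3L);

    Map<Key, Long> tracerSide = new HashMap<>();
    tracerSide.put(new Key("Greeter/SayHello", "4"), 2L);

    // Both sides use "4", so the counts land in a single bucket.
    System.out.println(merge(agentSide, tracerSide).size()); // prints 1
  }
}
```

If the tracer side used `DEADLINE_EXCEEDED` instead of `4` in this model, the merge would yield two buckets for what is logically the same status.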