Events dropped due to telemetry buffer overflow under heavy tool usage #449

@anandgupta42

Problem

During sessions with high tool call volume, the event buffer overflows and drops events silently. The error message logged is:

N events dropped due to buffer overflow

This means event counts are understated for power users, and we may be missing important failure/error events during the heaviest usage periods.

Impact by version

CLI Version   Occurrences   Total Events Dropped   Avg Dropped/Occurrence   Max Single Drop
0.5.7         167           36,968                 221                      3,901
0.5.1         43            7,614                  177                      706
0.5.3         14            2,691                  192                      646
0.5.2         9             931                    103                      290
0.5.4         7             511                    73                       283
0.5.5         2             9                      5                        5

Total: ~48,800 events dropped across all versions in the last 7 days.

Observations

  • The problem is getting worse with newer versions: 0.5.7 accounts for 75% of all drops
  • The worst single overflow dropped 3,901 events in one batch
  • Overflows correlate with sessions that have very high tool_call volume (1,000+ calls in a session), especially sql_execute and todowrite heavy workflows
  • 0.5.5 had the fewest drops (avg 5), suggesting something regressed in 0.5.7
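
The per-version figures can be cross-checked with a few lines of Python. This is only a sanity check on the numbers copied from the table in this issue, not a query against the telemetry store. The listed rows sum to 48,724, slightly under the ~48,800 quoted above, so the quoted total may be rounded or may include versions not shown in the table (an assumption, not confirmed here).

```python
# Rows copied from the "Impact by version" table in this issue:
# version -> (occurrences, total events dropped)
rows = {
    "0.5.7": (167, 36_968),
    "0.5.1": (43, 7_614),
    "0.5.3": (14, 2_691),
    "0.5.2": (9, 931),
    "0.5.4": (7, 511),
    "0.5.5": (2, 9),
}

# Sum over the listed rows only; the issue's "~48,800" may cover more versions.
total_dropped = sum(dropped for _, dropped in rows.values())


def share(version: str) -> float:
    """Fraction of all listed drops attributed to one version."""
    return rows[version][1] / total_dropped


def avg_per_occurrence(version: str) -> float:
    """Average number of events dropped per overflow occurrence."""
    occurrences, dropped = rows[version]
    return dropped / occurrences


for version in rows:
    print(f"{version}: avg {avg_per_occurrence(version):.1f} per occurrence, "
          f"{share(version):.1%} of listed drops")
print("sum of listed rows:", total_dropped)
```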

Suggested investigation

  1. Check the telemetry buffer size and flush interval — the current settings can't keep up with heavy sql_execute loops
  2. Consider increasing buffer capacity or flushing more aggressively
  3. Consider back-pressure or sampling instead of silent drops
  4. Investigate why 0.5.7 is significantly worse than 0.5.5
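
To make items 2 and 3 concrete, here is a minimal sketch of a bounded buffer that flushes early when it passes a fill threshold and, when it does overflow, reports the drop count as its own event instead of losing it silently. Everything in it is hypothetical: the class name `TelemetryBuffer`, the `events_dropped` event type, and the `capacity` / `flush_threshold` parameters are illustrative assumptions, not the CLI's actual implementation or settings.

```python
from collections import deque
from typing import Callable, Deque, List


class TelemetryBuffer:
    """Hypothetical bounded event buffer; not the CLI's real telemetry code."""

    def __init__(self, capacity: int, flush_threshold: float,
                 sink: Callable[[List[dict]], None]) -> None:
        self.capacity = capacity
        # Flush once the buffer is this full, rather than waiting for a timer.
        self.flush_at = max(1, int(capacity * flush_threshold))
        self.sink = sink
        self.events: Deque[dict] = deque()
        self.dropped = 0  # drops since the last successful flush

    def record(self, event: dict) -> None:
        if len(self.events) >= self.capacity:
            # Buffer is full (for example, the sink has been failing):
            # count the drop so it can be reported later.
            self.dropped += 1
            return
        self.events.append(event)
        if len(self.events) >= self.flush_at:
            self.flush()

    def flush(self) -> None:
        if not self.events and not self.dropped:
            return
        batch = list(self.events)
        if self.dropped:
            # Surface the loss as an explicit event in the same batch.
            batch.append({"type": "events_dropped", "count": self.dropped})
        try:
            self.sink(batch)
        except Exception:
            return  # keep buffered events and the drop count for the next attempt
        self.events.clear()
        self.dropped = 0
```

In this sketch a full buffer only drains on an explicit `flush()` call, so a real implementation would still need a periodic flush timer. Sampling high-volume events such as sql_execute tool calls would be a separate policy applied in `record` before the capacity check.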
