
feat: add the tracegrind tool #8

Draft
art049 wants to merge 6 commits into master from tracegrind-tool

art049 commented Feb 5, 2026

No description provided.

codspeed-hq bot commented Feb 5, 2026

CodSpeed Performance Report

Merging this PR will create unknown performance changes

Comparing tracegrind-tool (38e24d9) with master (022ccc3)

Summary

🆕 64 new benchmarks
⏩ 56 skipped benchmarks¹

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| 🆕 test_valgrind[valgrind.codspeed, stress-ng --cpu 4 --cpu-ops 10, callgrind/no-inline] | N/A | 1.9 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, stress-ng --cpu 4 --cpu-ops 10, tracegrind/cache-sim+systime] | N/A | 59.9 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, stress-ng --cpu 4 --cpu-ops 10, tracegrind/cache-sim] | N/A | 60.1 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, stress-ng --cpu 4 --cpu-ops 10, callgrind/full-no-inline] | N/A | 3.1 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, stress-ng --cpu 4 --cpu-ops 10, tracegrind/default] | N/A | 57.8 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, stress-ng --cpu 4 --cpu-ops 10, callgrind/full-with-inline] | N/A | 3.3 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, stress-ng --cpu 4 --cpu-ops 10, callgrind/inline] | N/A | 2.1 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, python3 testdata/test.py, callgrind/no-inline] | N/A | 3.5 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, python3 testdata/test.py, tracegrind/default] | N/A | 11.7 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, python3 testdata/test.py, callgrind/full-no-inline] | N/A | 5.4 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, python3 testdata/test.py, callgrind/full-with-inline] | N/A | 5.5 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, python3 testdata/test.py, tracegrind/cache-sim] | N/A | 15.1 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, python3 testdata/test.py, tracegrind/cache-sim+systime] | N/A | 15.5 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, python3 testdata/test.py, callgrind/inline] | N/A | 3.7 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, testdata/take_strings-aarch64 varbinview_non_null, callgrind/no-inline] | N/A | 2.5 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, testdata/take_strings-aarch64 varbinview_non_null, tracegrind/cache-sim] | N/A | 9.1 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, testdata/take_strings-aarch64 varbinview_non_null, tracegrind/default] | N/A | 7.7 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, testdata/take_strings-aarch64 varbinview_non_null, callgrind/full-with-inline] | N/A | 7.2 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, testdata/take_strings-aarch64 varbinview_non_null, tracegrind/cache-sim+systime] | N/A | 9.5 s | N/A |
| 🆕 test_valgrind[valgrind.codspeed, testdata/take_strings-aarch64 varbinview_non_null, callgrind/full-no-inline] | N/A | 3 s | N/A |
| ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Footnotes

  1. 56 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in the CodSpeed app to remove them from the performance reports.
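To put the HEAD column in perspective, each tracegrind time can be divided by the callgrind/no-inline time for the same workload. The figures below are copied from the 20 rows shown above; choosing callgrind/no-inline as the baseline is our own choice for illustration, not something the report does.

```python
# HEAD times in seconds, copied from the 20 rows displayed in the report.
TIMES = {
    "stress-ng --cpu 4 --cpu-ops 10": {
        "callgrind/no-inline": 1.9,
        "tracegrind/default": 57.8,
        "tracegrind/cache-sim": 60.1,
        "tracegrind/cache-sim+systime": 59.9,
    },
    "python3 testdata/test.py": {
        "callgrind/no-inline": 3.5,
        "tracegrind/default": 11.7,
        "tracegrind/cache-sim": 15.1,
        "tracegrind/cache-sim+systime": 15.5,
    },
    "testdata/take_strings-aarch64 varbinview_non_null": {
        "callgrind/no-inline": 2.5,
        "tracegrind/default": 7.7,
        "tracegrind/cache-sim": 9.1,
        "tracegrind/cache-sim+systime": 9.5,
    },
}


def slowdowns(times, baseline="callgrind/no-inline"):
    """Return {workload: {config: time / baseline time}}, rounded to 0.1."""
    result = {}
    for workload, configs in times.items():
        base = configs[baseline]
        result[workload] = {
            name: round(seconds / base, 1)
            for name, seconds in configs.items()
            if name != baseline
        }
    return result


for workload, ratios in slowdowns(TIMES).items():
    print(workload, ratios)
```

On these rows tracegrind takes roughly 30x to 32x as long as callgrind/no-inline under stress-ng, and roughly 3x to 4.4x on the other two workloads. These are the times reported by the bot, not a controlled measurement.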

art049 and others added 4 commits February 6, 2026 00:20
Pure copy of callgrind/ to tracegrind/ with the symbol prefix renamed
from CLG_ to TG_ (expanding to vgTracegrind_), header guards updated, and
the public header renamed to tracegrind.h with TRACEGRIND_* macros.
No behavioral changes: the output is still identical to callgrind's.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
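The mechanical rename in the first commit can be sketched as a small set of anchored regex substitutions. This is a hypothetical illustration, assuming TG_ follows callgrind's CLG_(x) prefix-macro pattern (which the commit message implies by "expanding to vgTracegrind_"); the actual commit may have been produced with a different tool.

```python
import re

# Hypothetical rename table mirroring the commit message; the real
# commit may have been generated differently (e.g. with sed).
RENAMES = [
    (re.compile(r"\bCLG_"), "TG_"),                      # prefix macro: CLG_(x) -> TG_(x)
    (re.compile(r"\bvgCallgrind_"), "vgTracegrind_"),    # what the prefix macro expands to
    (re.compile(r"\bCALLGRIND_"), "TRACEGRIND_"),        # public client-request macros
    (re.compile(r"\bcallgrind\.h\b"), "tracegrind.h"),   # public header name
]


def rename(source: str) -> str:
    """Apply every rename in order and return the rewritten source text."""
    for pattern, replacement in RENAMES:
        source = pattern.sub(replacement, source)
    return source


print(rename("#define CLG_(str) VGAPPEND(vgCallgrind_,str)"))
```

The `\b` anchors leave identifiers that merely contain `CLG_` in the middle (such as `MY_CLG_VALUE`) untouched; by the same rule, a guard written with a leading underscore (for example `__CALLGRIND_H`) would not match and would need its own pattern.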
Replace callgrind's accumulated callgraph output with streaming CSV
trace data emitted at function ENTER/EXIT boundaries. Each row contains
delta counters since the last sample, enabling per-call cost attribution.

Key changes:
- dump.c: Replace callgraph output with CSV trace (trace_open/emit/close)
- callstack.c: Hook push/pop_call_stack to emit ENTER/EXIT samples
- threads.c: Add per-thread last_sample_cost for delta tracking
- global.h: Add trace_output struct and per-thread sample state
- main.c: Open trace at init, close at fini, update copyright

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
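The core idea of the commit above, emitting per-thread cost deltas at ENTER/EXIT boundaries, can be sketched in Python. This is not the C code in dump.c or callstack.c: the event names (Ir, Dr, Dw), the row layout and the global sequence counter are assumptions made for illustration, and push_call_stack/pop_call_stack only mirror the hooked functions by name.

```python
from dataclasses import dataclass, field

# Hypothetical cost events; the real columns depend on how tracegrind is run.
EVENTS = ("Ir", "Dr", "Dw")


def _zero_costs():
    return dict.fromkeys(EVENTS, 0)


@dataclass
class ThreadState:
    cost: dict = field(default_factory=_zero_costs)              # running totals
    last_sample_cost: dict = field(default_factory=_zero_costs)  # totals at the previous sample


class Tracer:
    def __init__(self):
        self.rows = []      # emitted samples: [seq, tid, event, fn, *deltas]
        self.seq = 0        # assumed to be global across threads
        self.threads = {}

    def _thread(self, tid):
        return self.threads.setdefault(tid, ThreadState())

    def add_cost(self, tid, **costs):
        """Accumulate cost while code runs (stands in for instrumentation)."""
        state = self._thread(tid)
        for name, value in costs.items():
            state.cost[name] += value

    def _emit(self, tid, event, fn):
        state = self._thread(tid)
        # Delta since this thread's previous sample, then advance the baseline.
        delta = [state.cost[e] - state.last_sample_cost[e] for e in EVENTS]
        state.last_sample_cost = dict(state.cost)
        self.rows.append([self.seq, tid, event, fn, *delta])
        self.seq += 1

    def push_call_stack(self, tid, fn):
        self._emit(tid, "ENTER", fn)

    def pop_call_stack(self, tid, fn):
        self._emit(tid, "EXIT", fn)
```

Because each row carries only the cost accrued since that thread's previous sample, summing a thread's deltas reproduces its running totals, and the cost of one call (including its callees) is the sum of the deltas after its ENTER up to and including its matching EXIT.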
Add --output-format=csv|msgpack option. The MsgPack format uses LZ4 block
compression, producing files roughly 12x smaller than CSV.

New files:
- tg_msgpack.c/h: MsgPack encoder (write-only)
- tg_lz4.c/h: LZ4 compression wrapper with VG_() adaptations
- lz4.c/h: Vendored LZ4 library (BSD-2-Clause)
- docs/tracegrind-msgpack-format.md: Format specification

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
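The commit above adds a write-only MessagePack encoder in C (tg_msgpack.c). As a rough guide to what such an encoder has to emit, here is a Python sketch of the encoding side following the MessagePack specification; it does not reproduce tg_msgpack.c or the layout in docs/tracegrind-msgpack-format.md, and it omits the LZ4 block compression step, which is not in Python's standard library.

```python
import struct


def pack(obj) -> bytes:
    """Encode None, bool, int, str, list/tuple and dict as MessagePack."""
    out = bytearray()
    _pack(obj, out)
    return bytes(out)


def _pack(obj, out: bytearray) -> None:
    if obj is None:
        out.append(0xC0)                             # nil
    elif obj is True:                                # check bools before ints
        out.append(0xC3)
    elif obj is False:
        out.append(0xC2)
    elif isinstance(obj, int):
        _pack_int(obj, out)
    elif isinstance(obj, str):
        data = obj.encode("utf-8")
        n = len(data)
        if n < 32:
            out.append(0xA0 | n)                     # fixstr
        elif n < 0x100:
            out += struct.pack(">BB", 0xD9, n)       # str 8
        elif n < 0x10000:
            out += struct.pack(">BH", 0xDA, n)       # str 16
        else:
            out += struct.pack(">BI", 0xDB, n)       # str 32
        out += data
    elif isinstance(obj, (list, tuple)):
        _pack_header(len(obj), out, 0x90, 0xDC, 0xDD)  # fixarray / array 16 / array 32
        for item in obj:
            _pack(item, out)
    elif isinstance(obj, dict):
        _pack_header(len(obj), out, 0x80, 0xDE, 0xDF)  # fixmap / map 16 / map 32
        for key, value in obj.items():
            _pack(key, out)
            _pack(value, out)
    else:
        raise TypeError(f"unsupported type: {type(obj).__name__}")


def _pack_header(n, out, fix, code16, code32):
    if n < 16:
        out.append(fix | n)
    elif n < 0x10000:
        out += struct.pack(">BH", code16, n)
    else:
        out += struct.pack(">BI", code32, n)


def _pack_int(v, out):
    if 0 <= v < 0x80:
        out.append(v)                                # positive fixint
    elif -32 <= v < 0:
        out += struct.pack(">b", v)                  # negative fixint (0xE0..0xFF)
    elif v >= 0:
        if v < 0x100:
            out += struct.pack(">BB", 0xCC, v)       # uint 8
        elif v < 0x10000:
            out += struct.pack(">BH", 0xCD, v)       # uint 16
        elif v < 0x1_0000_0000:
            out += struct.pack(">BI", 0xCE, v)       # uint 32
        else:
            out += struct.pack(">BQ", 0xCF, v)       # uint 64 (struct.error if too large)
    elif v >= -0x80:
        out += struct.pack(">Bb", 0xD0, v)           # int 8
    elif v >= -0x8000:
        out += struct.pack(">Bh", 0xD1, v)           # int 16
    elif v >= -0x8000_0000:
        out += struct.pack(">Bi", 0xD2, v)           # int 32
    else:
        out += struct.pack(">Bq", 0xD3, v)           # int 64
```

Picking the smallest encoding for each integer is what keeps small per-sample deltas compact before compression is applied.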
- Update msgpack format to version 2 with event_schemas
- Each event type (ENTER, EXIT, FORK) has its own column schema
- FORK events use minimal 4-element format: [seq, tid, event, child_pid]
- Remove CSV output format entirely (msgpack-only now)
- Add decode-trace.py script for debugging trace files
- Add fork detection via post-syscall handler for fork/clone/vfork

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
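The per-event schemas and the FORK row from the commit above can be illustrated with a small Python sketch: a post-syscall check that appends a FORK row, and a decoder that maps positional rows to named columns using each event's schema. Only the FORK layout [seq, tid, event, child_pid] comes from the commit message; the event codes and the ENTER/EXIT columns are hypothetical, and this is not the bundled decode-trace.py.

```python
# Hypothetical event codes and ENTER/EXIT columns. Only the FORK layout
# [seq, tid, event, child_pid] is taken from the commit message; the real
# column lists are defined by the v2 event_schemas.
ENTER, EXIT, FORK = 0, 1, 2

EVENT_SCHEMAS = {
    ENTER: ["seq", "tid", "event", "fn", "Ir"],
    EXIT: ["seq", "tid", "event", "fn", "Ir"],
    FORK: ["seq", "tid", "event", "child_pid"],
}

FORK_SYSCALLS = {"fork", "vfork", "clone"}


def on_post_syscall(rows, seq, tid, syscall, result):
    """Append a FORK row when a fork-like syscall returns a child pid.

    In the parent, fork/vfork/clone return the child's pid (> 0); the
    child sees 0, and errors are negative values in this sketch. A real
    handler must also tell thread-creating clone() calls apart from
    process forks; this sketch does not.
    """
    if syscall in FORK_SYSCALLS and result > 0:
        rows.append([seq, tid, FORK, result])
        return True
    return False


def decode(rows, schemas=EVENT_SCHEMAS):
    """Turn positional rows into dicts using the per-event column schema."""
    for row in rows:
        columns = schemas[row[2]]          # event code assumed at index 2
        if len(columns) != len(row):
            raise ValueError(f"row {row!r} does not match schema {columns!r}")
        yield dict(zip(columns, row))
```

Keeping one schema per event type lets FORK rows stay at four elements instead of being padded to the width of ENTER/EXIT rows.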
art049 and others added 2 commits February 6, 2026 00:23
Add tracegrind configurations to the benchmark suite:
- tracegrind/default: basic tracing
- tracegrind/cache-sim: with cache simulation
- tracegrind/cache-sim+systime: with cache sim and syscall timing

This allows direct performance comparison between callgrind and tracegrind.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
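The three configurations added in the commit above can be expressed as a mapping from configuration name to extra Valgrind flags. The commit names the configurations but not their flags, so the flag sets below are assumptions: --cache-sim and --collect-systime are callgrind options, assumed to carry over because tracegrind started as a copy of callgrind, and build_command is a hypothetical helper.

```python
import shlex

# Hypothetical flag sets; the commit does not list the actual flags.
TRACEGRIND_CONFIGS = {
    "tracegrind/default": [],
    "tracegrind/cache-sim": ["--cache-sim=yes"],
    "tracegrind/cache-sim+systime": ["--cache-sim=yes", "--collect-systime=yes"],
}


def build_command(config_name: str, program: str) -> list[str]:
    """Build the valgrind argv for one (configuration, program) benchmark."""
    tool = config_name.split("/", 1)[0]   # "tracegrind/cache-sim" -> "tracegrind"
    return [
        "valgrind",
        f"--tool={tool}",
        *TRACEGRIND_CONFIGS[config_name],
        *shlex.split(program),
    ]


print(build_command("tracegrind/cache-sim", "python3 testdata/test.py"))
```

Naming configurations as tool/variant keeps the tool recoverable from the name, which is what a tool-availability filter needs.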
Detect available tools at startup and only run benchmarks for tools
that are present. This fixes CI failures when running against upstream
Valgrind, which doesn't have tracegrind.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
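Startup tool detection, as described in the last commit, could look like the following sketch. It assumes the Valgrind launcher exits non-zero when the requested tool is not installed, and the helper names (valgrind_has_tool, available_tools, runnable_configs) are hypothetical; the probe is injectable so the filtering logic can be exercised without Valgrind present.

```python
import subprocess


def valgrind_has_tool(tool: str) -> bool:
    """Probe a tool by asking valgrind for its help text.

    Assumes the launcher exits non-zero when the tool is not installed.
    """
    try:
        result = subprocess.run(
            ["valgrind", f"--tool={tool}", "--help"],
            capture_output=True,
            timeout=60,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def available_tools(candidates, probe=valgrind_has_tool):
    """Run the probe once per tool at startup and keep the ones that work."""
    return {tool for tool in candidates if probe(tool)}


def runnable_configs(config_names, tools):
    """Keep only configurations whose tool prefix (e.g. "tracegrind/...") is available."""
    return [name for name in config_names if name.split("/", 1)[0] in tools]
```

Probing once at startup rather than per benchmark keeps the check cheap, and against upstream Valgrind the tracegrind configurations are simply dropped instead of failing.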