Session tracing and observability for AI coding agents, powered by Phoenix.
Every agent interaction — prompts, tool calls, file edits, shell commands, thinking steps — is captured automatically and sent to Phoenix for search, replay, and analysis.
| Agent | Mechanism | Status |
|---|---|---|
| Cursor | Hook events → JSONL buffer → flush on session end | ✅ Stable |
| Claude Code | Stop hook → JSONL transcript parser → flush | ✅ New |
Both agents produce identical span structures in Phoenix — same trace hierarchy, same attributes, same OpenInference conventions. You can compare sessions across agents in a single Phoenix project.
flowchart LR
subgraph cursor [Cursor IDE]
A[Hook event] -->|stdin JSON| B[trace-hook.sh]
end
subgraph claude [Claude Code]
G[Stop hook] -->|transcript path| H[claude-code-trace-hook.sh]
end
B -->|append| C["/tmp/cursor-traces.jsonl"]
B -->|"on stop/sessionEnd"| D["flush.py → CursorAdapter"]
H -->|read| I["~/.claude/projects/.../session.jsonl"]
H -->|parse| J["flush.py → ClaudeCodeAdapter"]
D --> E[core.py]
J --> E
E -->|Phoenix SDK| F[Phoenix]
F --> K[Traces & Sessions UI]
Hot path (~5 ms): Every Cursor hook event is piped to trace-hook.sh, which appends the raw JSON to a local buffer file.
Flush (on session end): When a session ends, flush.py runs in the background. The Cursor adapter reads the buffer, normalises events, and hands them to the core engine for span construction and Phoenix export.
Stop hook: Claude Code's Stop hook fires after each agent turn. The hook script extracts the transcript_path from the hook context and passes it to flush.py.
Transcript parsing: The Claude Code adapter reads the JSONL transcript, parses user messages, assistant content blocks (text, thinking, tool_use), and tool results into normalised events. Tool use/result pairs are linked automatically.
Both adapters produce NormalizedEvent objects that the core engine processes uniformly: turn assignment, span building with monotonic timestamps, parent-child relationships, and Phoenix export using OpenInference semantic conventions.
Result: Each conversation turn becomes a separate trace in Phoenix. All turns from the same session are grouped together, giving you a full conversational thread view — regardless of which agent generated them.
| Hook event | Span name | Content |
|---|---|---|
sessionStart |
session |
Composer mode, background agent flag |
beforeSubmitPrompt |
(first 120 chars of prompt) | Full prompt text, attachments |
afterAgentThought |
thinking |
Agent's reasoning text |
postToolUse |
tool:<name> |
Tool input and output |
postToolUseFailure |
tool:<name>.error |
Error message, failure type |
afterShellExecution |
shell |
Command and output |
afterMCPExecution |
mcp:<name> |
MCP tool input and result |
afterFileEdit |
edit:<filename> |
File path and edits |
afterAgentResponse |
response |
Agent's final response |
preCompact |
compaction |
Context usage stats |
subagentStop |
subagent:<type> |
Task, summary, tool count |
stop / sessionEnd |
session.end |
Status, reason, duration |
| Content block | Span name | Content |
|---|---|---|
| User message | (first 120 chars of prompt) | Full prompt text |
thinking |
thinking |
Model reasoning text |
tool_use + tool_result |
tool:<name> |
Tool input, output, duration |
text (assistant) |
response |
Final response text |
| Summary | session.end |
Session summary |
- macOS or Linux
- Docker (for local Phoenix — optional if you have an existing instance)
uv is installed automatically if not already present.
git clone https://github.com/mazzucci/coding-agent-insights.git
cd coding-agent-insights
bash install.shThe installer will:
- Install
uvif needed - Auto-detect Cursor (
~/.cursor/) and/or Claude Code (~/.claude/) - Copy hook scripts and configure each detected agent
- Ask how you want to connect to Phoenix:
- Local Docker — spins up Phoenix v13.15.0 (with gRPC on port 4317)
- Existing URL — connects to your Phoenix instance
- Skip — configure later
- Ask for a Phoenix project name (default:
coding-agent-insights)
After install, both agents will trace sessions automatically.
Open Phoenix at http://localhost:6006, start an agent conversation, and watch traces appear in the project.
Settings are in .coding-agent-insights.env in each agent's hooks directory:
PHOENIX_HOST="http://localhost:6006"
PHOENIX_PROJECT="coding-agent-insights"
AGENT_TYPE="cursor" # or "claude_code"
# TRACES_DEBUG="true"
# TRACES_SKIP="field1,field2"
# TRACES_LOG="/tmp/coding-agent-insights.log"| Variable | Default | Purpose |
|---|---|---|
PHOENIX_HOST |
http://localhost:6006 |
Phoenix server URL |
PHOENIX_PROJECT |
coding-agent-insights |
Phoenix project name |
AGENT_TYPE |
cursor |
Agent type (cursor / claude_code) |
TRACES_DEBUG |
(unset) | Set to true for debug logging |
TRACES_SKIP |
(unset) | Comma-separated field names to redact |
TRACES_LOG |
/tmp/coding-agent-insights.log |
Debug log file path |
Legacy CURSOR_TRACES_* env vars are still supported for backward compatibility.
Traces flush automatically. To flush manually:
# Cursor
uv run hooks/flush.py --agent cursor
# Claude Code
uv run hooks/flush.py --agent claude_code --transcript ~/.claude/projects/.../session.jsonlWith debug output:
TRACES_DEBUG=true uv run hooks/flush.py --agent cursorEach user turn (prompt + agent response cycle) becomes a trace. Tool calls, file edits, and thinking steps appear as child spans with proper input/output attribution.
All turns from the same conversation are grouped into a Phoenix session. The Sessions tab shows the conversational thread with first input and last output for each turn.
Both Cursor and Claude Code traces appear in the same Phoenix project. Compare how different agents handle the same tasks, analyse tool usage patterns, and identify which workflows are most effective.
Save exemplary traces to Phoenix datasets for future reference — proven prompt patterns, successful tool chains, or reference workflows. See the coding-agent-insights skill for programmatic examples.
hooks/
├── flush.py # Entrypoint: dispatches to adapter → core → Phoenix
├── core.py # Agent-agnostic engine: NormalizedEvent, turns, spans, posting
├── trace-hook.sh # Cursor: hot-path bash hook (~5ms)
├── claude-code-trace-hook.sh # Claude Code: Stop hook script
└── adapters/
├── __init__.py # Adapter registry
├── cursor.py # Cursor: buffer I/O + event normalisation
└── claude_code.py # Claude Code: JSONL transcript parser
tests/
└── test_flush.py # 35 tests covering core, both adapters, cross-adapter parity
install.sh # Multi-agent installer with auto-detection
uninstall.sh # Cleanup script
docker-compose.yml # Phoenix with HTTP (6006) + gRPC (4317)
Each adapter implements:
read_events()→list[NormalizedEvent]- Agent-specific I/O and event normalisation
The core engine handles everything else: turn assignment, span building with monotonic timestamps, parent-child relationships, session labels, and Phoenix export.
Adding a new agent: Create adapters/your_agent.py with a class that implements read_events() returning NormalizedEvent objects. Register it in adapters/__init__.py. The core engine handles the rest.
python tests/test_flush.pyThe test suite covers:
- Core engine (20 tests): turn assignment, sequencing, timestamps, parent-child relationships, redaction, edge cases
- Cursor adapter (6 tests): event normalisation, all hook types, atomic buffer drain
- Claude Code adapter (8 tests): transcript parsing, tool use/result pairing, thinking blocks, multi-turn, timestamps
- Cross-adapter parity (1 test): both adapters produce consistent span structures
- Fork the repo
- Make your changes
- Run
python tests/test_flush.py(all 35 tests must pass) - Submit a PR