
Recursive-Safeguarding/agent-trace-interp


agent-trace-interp

Early prototype for the Martian Interpretability Prize proposal: Synthesizing Probabilistic Programs for AI Agent Interpretability.

What this contains

  • src/trace_schema.py -- Pydantic models for structured agent traces (tool calls, file references, memory writes)
  • src/collect_traces.py -- ARES-based trace collector for coding agents on SWE-Bench Verified
  • src/analyze_traces.py -- basic trace statistics (step counts, file frequencies, tool distributions)
  • dsl/grammar.py -- probabilistic DSL grammar with temporal primitives (belief evolution, commitment detection, attention decay)
  • examples/example_trace.json -- hand-crafted trace matching the proposal's illustrative walkthrough
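
The real Pydantic models live in src/trace_schema.py and are not reproduced here. To illustrate what a structured trace record could look like, here is a minimal sketch that uses standard-library dataclasses instead of Pydantic. The class names (TraceStep, AgentTrace) and all field names are hypothetical and are not the repository's schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Literal, Optional

# Hypothetical step kinds mirroring the three kinds of events listed above.
# Note: dataclasses do NOT enforce Literal at runtime (Pydantic would).
StepKind = Literal["tool_call", "file_reference", "memory_write"]


@dataclass
class TraceStep:
    """One step of an agent trace (hypothetical fields)."""
    index: int
    kind: StepKind
    tool: Optional[str] = None                      # set for tool calls, e.g. "bash"
    files: list[str] = field(default_factory=list)  # files the step touched or referenced
    content: str = ""                               # command text, memory text, etc.


@dataclass
class AgentTrace:
    """A full trace for one task instance (hypothetical fields)."""
    instance_id: str
    steps: list[TraceStep] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into the nested TraceStep dataclasses.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> AgentTrace:
        data = json.loads(text)
        steps = [TraceStep(**s) for s in data.get("steps", [])]
        return cls(instance_id=data["instance_id"], steps=steps)


# Round-trip usage: serialize a trace and parse it back.
trace = AgentTrace("demo-1", [TraceStep(0, "tool_call", tool="bash", content="ls")])
assert AgentTrace.from_json(trace.to_json()) == trace
```

The main thing Pydantic adds over this sketch is validation at parse time (for example, rejecting an unknown step kind), which is a reasonable reason to prefer it for traces collected from real agents.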

Install

uv pip install -e .

For ARES trace collection (requires Docker):

uv pip install -e ".[ares]"

Usage

Analyze the example trace:

python -m src.analyze_traces examples/
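
As a rough sketch of the kind of statistics src/analyze_traces.py reports (step counts, file frequencies, tool distributions), the functions below assume that each JSON file in the directory holds one trace with a top-level "steps" list whose entries may carry "tool" and "files" keys. That layout, and the names load_traces and summarize_traces, are assumptions for illustration, not the script's actual format or API.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path


def load_traces(trace_dir: str) -> list[dict]:
    """Read every *.json file in trace_dir as one trace (assumed layout)."""
    return [json.loads(p.read_text()) for p in sorted(Path(trace_dir).glob("*.json"))]


def summarize_traces(traces: list[dict]) -> dict:
    """Compute step counts, tool distribution, and file frequencies.

    Assumes (hypothetically) each trace looks like
    {"steps": [{"tool": "bash", "files": ["a.py"]}, ...]}.
    """
    step_counts = [len(t.get("steps", [])) for t in traces]
    tools: Counter = Counter()
    files: Counter = Counter()
    for trace in traces:
        for step in trace.get("steps", []):
            if step.get("tool"):
                tools[step["tool"]] += 1
            files.update(step.get("files", []))
    return {
        "num_traces": len(traces),
        "mean_steps": sum(step_counts) / len(step_counts) if step_counts else 0.0,
        "tool_distribution": dict(tools),
        "top_files": files.most_common(5),
    }
```

Keeping file loading separate from the pure summary function makes the statistics easy to test without touching disk; calling summarize_traces(load_traces("examples/")) corresponds to the command above.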

Collect traces with ARES (requires Docker and an API key):

export CHAT_COMPLETION_API_KEY=your-key
python -m src.collect_traces --instances 0 1 2 --model openai/glm-4.7

Status

This is a runnable skeleton demonstrating the trace collection and DSL design components of the proposal. The GFlowNet synthesis pipeline and counterfactual validation framework are the research contributions to be developed during the grant period.

License

MIT
