AI Engineer @ LinkedIn · Independent AI Safety Evaluation Engineer · Inspect Contributor
San Francisco, USA
LinkedIn · Email
Currently preparing to move into an AI Safety Evals Engineer role by contributing to UK AISI's Inspect eval framework and by running independent evaluations and publishing their results.
Inspect Tools · tool-channel starter kit for Inspect evals
In current LLMs, the tool-definition channel is both an input that can exhaust context and an attack vector. This package provides schemas and solvers so researchers can quickly iterate on evaluations of this surface.
- `context_exhaustion` solver: quantifies score degradation as the model's `tools` parameter is saturated with realistic MCP schemas at controlled context depths.
- Corpus: a curated set of 1,239 real-world MCP tool schemas across 173 vendors.
- Roadmap: injection attacks (`inject_description`, `inject_shadow`).
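
As a rough illustration of what "saturating the `tools` parameter to a controlled context depth" can mean, here is a minimal sketch in plain Python. It is not the package's implementation: `estimate_tokens`, `fill_to_depth`, the 4-characters-per-token heuristic, and the toy corpus are all assumptions made for the example.

```python
import json
import random


def estimate_tokens(schema: dict) -> int:
    # Assumption: roughly 4 characters per token. A real run would use
    # the target model's tokenizer instead of this heuristic.
    return max(1, len(json.dumps(schema)) // 4)


def fill_to_depth(corpus: list, context_window: int, depth: float, seed: int = 0):
    """Pick schemas until about `depth` (0.0 to 1.0) of the window is used.

    Returns the chosen schemas and the estimated tokens they occupy.
    """
    budget = int(context_window * depth)
    pool = list(corpus)
    random.Random(seed).shuffle(pool)  # seeded, so the selection is reproducible

    chosen, used = [], 0
    for schema in pool:
        cost = estimate_tokens(schema)
        if used + cost > budget:
            continue  # this schema would overshoot the budget, try the next
        chosen.append(schema)
        used += cost
    return chosen, used


# Hypothetical toy corpus standing in for real MCP tool schemas.
toy_corpus = [
    {"name": f"tool_{i}", "description": "x" * (20 + i), "parameters": {}}
    for i in range(50)
]
tools, used = fill_to_depth(toy_corpus, context_window=1000, depth=0.5)
```

Sweeping `depth` over a fixed grid (for example 0.1, 0.25, 0.5, 0.9) and scoring the same task at each point gives the degradation curve described above.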
inspect_ai
- #3709 — vLLM chat-template controls for base-model evals
- #3969 — `pass_k` epoch reducer (τ-bench pass^k consistency metric)
- #4035 — Krippendorff's α metric for multi-judge agreement (open)
- #4269 — capture resolved sandbox runtime fingerprint in the eval log (open)
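
For context on the pass^k metric mentioned above: in τ-bench it is the probability that all k independent trials of a task succeed, estimated per task as C(c, k) / C(n, k) for c successes in n trials and then averaged over tasks. The sketch below shows that estimator only; the function names are illustrative and this is not the code from #3969.

```python
from math import comb


def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Estimate of pass^k for one task: C(c, k) / C(n, k)."""
    if not 1 <= k <= trials:
        raise ValueError("k must satisfy 1 <= k <= trials")
    # comb(c, k) is 0 when c < k, so tasks with too few successes score 0.
    return comb(successes, k) / comb(trials, k)


def mean_pass_hat_k(results: list, k: int) -> float:
    """Average the per-task estimate over tasks.

    `results` holds one list of booleans per task, one entry per trial.
    """
    per_task = [pass_hat_k(sum(r), len(r), k) for r in results]
    return sum(per_task) / len(per_task)
```

Unlike pass@k, which rewards any single success, this value can only fall or stay flat as k grows, which is why it reads as a consistency metric.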
inspect_evals
- #1429 — fix CodeIPI exfiltration scorer to check tool-result messages
- #1501 — `cyberseceval_4`: tolerate fenced / prose-wrapped judge JSON
- #1503 — fix `mean_of` `on_missing="skip"` to also skip `None`-valued samples
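
To illustrate the kind of problem behind the judge-JSON fix above, here is a minimal sketch of pulling a JSON object out of model output that wraps it in a code fence or surrounding prose. It uses only the standard library; `extract_json_object` is a hypothetical helper and is not the code merged in #1501.

```python
import json
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object found in `text`, or None."""
    # Fast path: the whole reply is already valid JSON.
    try:
        parsed = json.loads(text.strip())
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass

    # Fallback: try to decode starting at every "{" in the text.
    # raw_decode ignores trailing characters, so a closing fence or
    # trailing prose after the object does not matter.
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            parsed, _ = decoder.raw_decode(text, start)
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return None
```

Returning `None` rather than raising leaves it to the scorer to decide how an unparseable judge reply should be counted.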
inspect_scout
- #455 — resolve `ModelEvent` input refs from the `events_data` pool schema
Hit me up! Open to collaboration on evals, tooling, or if you just want to contact me :)



