Joey Esposito joesposito8

Joey Esposito

AI Engineer @ LinkedIn · Independent AI Safety Evaluation Engineer · Inspect Contributor
San Francisco, USA
LinkedIn · Email

Currently preparing to move into an AI Safety Evals Engineer role by contributing to UK AISI's Inspect eval framework and by building and publishing independent evaluations.

Projects

Inspect Tools · tool-channel starter kit for Inspect evals

In current LLMs, the tool-definition channel is both an input that can exhaust context and an attack vector. This package provides schemas and solvers for researchers to quickly iterate on evaluations related to this surface.

context_exhaustion solver: quantifies score degradation as the model's tools parameter is saturated with realistic MCP schemas at controlled context depths.
Corpus: a curated set of 1,239 real-world MCP tool schemas across 173 vendors.
Roadmap: injection attacks (inject_description, inject_shadow).
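
To make "controlled context depths" concrete, here is a minimal sketch of the selection step: pick tool schemas until a target fraction of the context window is used. It is not the package's code. The names `approx_tokens` and `fill_to_depth` are hypothetical, and the 4-characters-per-token estimate is a rough assumption standing in for a real tokenizer.

```python
import json


def approx_tokens(schema: dict) -> int:
    # Assumption: roughly 4 characters per token. A real run would use
    # the target model's tokenizer instead of this heuristic.
    return max(1, len(json.dumps(schema)) // 4)


def fill_to_depth(schemas: list[dict], context_window: int, depth: float):
    """Select schemas in order until `depth` (0.0 to 1.0) of the window is used.

    Returns the selected schemas and the estimated tokens they consume.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    budget = int(context_window * depth)
    selected, used = [], 0
    for schema in schemas:
        cost = approx_tokens(schema)
        if used + cost > budget:
            break  # stop at the first schema that would overshoot the budget
        selected.append(schema)
        used += cost
    return selected, used


# Hypothetical stand-in schemas; the real corpus holds MCP tool schemas.
demo_schemas = [
    {"name": f"tool_{i}", "description": "x" * 100} for i in range(50)
]
selected, used = fill_to_depth(demo_schemas, context_window=1000, depth=0.5)
print(len(selected), used)
```

Sweeping `depth` over a few fixed values and scoring the same task at each one would give the degradation curve described above.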

Selected contributions

inspect_ai

#3709 — vLLM chat-template controls for base-model evals
#3969 — pass_k epoch reducer (τ-bench pass^k consistency metric)
#4035 — Krippendorff's α metric for multi-judge agreement (open)
#4269 — capture resolved sandbox runtime fingerprint in the eval log (open)
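
For readers unfamiliar with the pass^k metric named in #3969: it estimates the probability that all k independent attempts at a task succeed, so it rewards consistency where pass@k rewards a single success. The sketch below is a standalone illustration of the unbiased estimator C(c, k) / C(n, k) as defined for τ-bench, not the code from the PR; the function name `pass_hat_k` is hypothetical.

```python
from math import comb


def pass_hat_k(num_trials: int, num_successes: int, k: int) -> float:
    """Estimate the chance that k trials drawn without replacement all succeed."""
    if not 0 < k <= num_trials:
        raise ValueError("k must satisfy 0 < k <= num_trials")
    if not 0 <= num_successes <= num_trials:
        raise ValueError("num_successes must be between 0 and num_trials")
    # comb(c, k) is 0 when k > c, so too few successes yields 0.0.
    return comb(num_successes, k) / comb(num_trials, k)


# 3 successes out of 4 epochs: 3 of the 6 possible pairs are all-success.
print(pass_hat_k(num_trials=4, num_successes=3, k=2))
```

Averaging this value over tasks gives a dataset-level score; it falls as k grows unless every epoch of a task passes.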

inspect_evals

#1429 — fix CodeIPI exfiltration scorer to check tool-result messages
#1501 — cyberseceval_4: tolerate fenced / prose-wrapped judge JSON
#1503 — fix mean_of on_missing="skip" to also skip None-valued samples
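
As an illustration of the problem behind #1501, judge models often wrap their JSON verdict in a Markdown fence or surrounding prose. The following is a minimal sketch of one tolerant parsing approach using only the standard library. It is not the code from the PR, and `extract_json` is a hypothetical name.

```python
import json
import re


def extract_json(text: str):
    """Return the first JSON object found in `text`, or None if there is none."""
    # Prefer the contents of a fenced block (with or without a "json" tag).
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    decoder = json.JSONDecoder()
    # Try each "{" as a possible start; raw_decode ignores trailing prose.
    for index, char in enumerate(candidate):
        if char == "{":
            try:
                obj, _end = decoder.raw_decode(candidate, index)
                return obj
            except json.JSONDecodeError:
                continue
    return None


print(extract_json('Sure! ```json\n{"score": 1}\n```'))
print(extract_json('Verdict: {"labels": [1, 2]} Hope that helps.'))
```

Returning `None` instead of raising lets the caller decide whether an unparseable verdict counts as a failed sample or a skipped one.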

inspect_scout

#455 — resolve ModelEvent input refs from the events_data pool schema

Contact

Hit me up! I'm open to collaborating on evals or tooling, or just to chat :)

Email · LinkedIn
