
Joey Esposito

AI Engineer @ LinkedIn · Independent AI Safety Evaluation Engineer · Inspect Contributor
San Francisco, USA
LinkedIn · Email


Currently preparing to transition into AI Safety Evals Engineer roles by contributing to UK AISI's Inspect eval framework and by building independent evaluations and publishing their results.

Projects

Inspect Tools  ·  tool-channel starter kit for Inspect evals

In current LLMs, the tool-definition channel is both an input that can exhaust the context window and an attack vector. This package provides schemas and solvers so researchers can iterate quickly on evaluations that target this surface.

  • context_exhaustion solver: quantifies score degradation as the model's tools parameter is saturated with realistic MCP schemas at controlled context depths.
  • Corpus: a curated set of 1,239 real-world MCP tool schemas across 173 vendors.
  • Roadmap: injection attacks (inject_description, inject_shadow).
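The context-exhaustion idea can be illustrated with a minimal sketch: pack tool schemas up to a target context depth, then express each depth's score as a fractional drop from a zero-tool baseline. This is not the package's implementation. `estimate_tokens`, `fill_to_depth`, and `relative_degradation` are hypothetical helpers, and the 4-characters-per-token estimate is an assumption; a real harness would count tokens with the target model's tokenizer.

```python
import json
from typing import Any


def estimate_tokens(schema: dict[str, Any]) -> int:
    """Rough token estimate (assumption): about 4 characters per token of serialized JSON."""
    return max(1, len(json.dumps(schema)) // 4)


def fill_to_depth(schemas: list[dict[str, Any]], target_tokens: int) -> list[dict[str, Any]]:
    """Add schemas in order, stopping before the first one that would exceed target_tokens."""
    selected: list[dict[str, Any]] = []
    used = 0
    for schema in schemas:
        cost = estimate_tokens(schema)
        if used + cost > target_tokens:
            break  # stop rather than skip, so each depth is a prefix of the same ordering
        selected.append(schema)
        used += cost
    return selected


def relative_degradation(baseline: float, scores: dict[int, float]) -> dict[int, float]:
    """Fractional score drop at each context depth, relative to the zero-tool baseline."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return {depth: (baseline - score) / baseline for depth, score in sorted(scores.items())}


if __name__ == "__main__":
    # Hypothetical toy schemas standing in for real MCP tool definitions.
    tools = [{"name": f"tool_{i}", "description": "x" * 396} for i in range(10)]
    print(len(fill_to_depth(tools, 250)), "schemas fit in a 250-token budget")
    print(relative_degradation(0.8, {1_000: 0.8, 8_000: 0.6}))
```

Stopping at the first schema that does not fit (instead of skipping it) keeps every depth a prefix of one fixed ordering, so differences between depths come from added context rather than from a reshuffled tool set.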

Selected contributions

inspect_ai

  • #3709 — vLLM chat-template controls for base-model evals
  • #3969 — pass_k epoch reducer (τ-bench pass^k consistency metric)
  • #4035 — Krippendorff's α metric for multi-judge agreement (open)
  • #4269 — capture resolved sandbox runtime fingerprint in the eval log (open)
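τ-bench's pass^k asks whether a task succeeds on all k of k independent attempts, so it falls as k grows (unlike pass@k). Given n epochs of which c passed, the per-task estimate is C(c, k) / C(n, k), averaged over tasks. The sketch below illustrates that estimator only; `pass_hat_k` and `mean_pass_hat_k` are hypothetical names, and this is not the reducer code from #3969.

```python
from math import comb


def pass_hat_k(num_trials: int, num_successes: int, k: int) -> float:
    """Unbiased estimate of P(all k i.i.d. attempts succeed) from n trials with c successes."""
    if not 0 <= num_successes <= num_trials:
        raise ValueError("need 0 <= successes <= trials")
    if not 1 <= k <= num_trials:
        raise ValueError("need 1 <= k <= trials")
    # comb(c, k) is 0 when c < k, so tasks with too few successes contribute 0.
    return comb(num_successes, k) / comb(num_trials, k)


def mean_pass_hat_k(results: list[list[bool]], k: int) -> float:
    """Average pass^k over tasks; each inner list holds one task's per-epoch outcomes."""
    return sum(pass_hat_k(len(r), sum(r), k) for r in results) / len(results)


if __name__ == "__main__":
    # Task 1 passed 3 of 4 epochs; task 2 passed all 4.
    outcomes = [[True, True, True, False], [True, True, True, True]]
    print(mean_pass_hat_k(outcomes, 1), mean_pass_hat_k(outcomes, 2))
```

For a task that passed 3 of 4 epochs, pass^1 is 3/4 but pass^2 is C(3,2)/C(4,2) = 3/6 = 0.5, which is exactly the consistency penalty the metric is meant to expose.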

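Krippendorff's α measures agreement among any number of judges and tolerates missing ratings. A minimal sketch for nominal labels uses the coincidence matrix o and its marginals n_c: α = 1 − (n − 1) · Σ_{c≠k} o_ck / Σ_{c≠k} n_c·n_k. This is not the metric from #4035, and the input layout (one row per sample, one entry per judge, `None` for a missing rating) is an assumption made for illustration.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence
from itertools import permutations
from typing import Optional


def krippendorff_alpha_nominal(units: Sequence[Sequence[Optional[Hashable]]]) -> float:
    """Krippendorff's alpha for nominal labels; None marks a missing rating."""
    # Coincidence matrix: every ordered pair of ratings within a unit, weighted 1 / (m_u - 1).
    coincidence: dict[tuple[Hashable, Hashable], float] = defaultdict(float)
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two ratings has no pairable values
        for a, b in permutations(values, 2):
            coincidence[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c and the total number of pairable values n.
    totals: dict[Hashable, float] = defaultdict(float)
    for (a, _b), weight in coincidence.items():
        totals[a] += weight
    n = sum(totals.values())

    # Nominal distance is 1 for any two different labels, 0 otherwise.
    observed = sum(w for (a, b), w in coincidence.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when every pairable rating has the same label")
    return 1.0 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Three samples, two judges each; the judges disagree on the first sample only.
    print(krippendorff_alpha_nominal([["a", "b"], ["a", "a"], ["b", "b"]]))
```

In the example, n = 6, the disagreeing cells of o sum to 2, and Σ_{c≠k} n_c·n_k = 3·3·2 = 18, so α = 1 − 5·2/18 = 4/9. Perfect agreement gives α = 1, and systematic disagreement can push α below 0.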
inspect_evals

  • #1429 — fix CodeIPI exfiltration scorer to check tool-result messages
  • #1501 — cyberseceval_4: tolerate fenced or prose-wrapped judge JSON
  • #1503 — fix mean_of on_missing="skip" to also skip None-valued samples
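Judge models often wrap their JSON verdict in a Markdown code fence or surround it with prose, which breaks a plain `json.loads`. A minimal sketch of a tolerant parser follows: it tries the contents of any fenced block first, then scans for the first `{` that starts a decodable JSON object using the standard library's `json.JSONDecoder.raw_decode`. `parse_judge_json` is a hypothetical helper, not the code merged in #1501.

```python
import json
import re
from typing import Any

# Matches a Markdown code fence (three backticks), with an optional "json" tag.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL | re.IGNORECASE)


def parse_judge_json(text: str) -> dict[str, Any]:
    """Return the first JSON object in a judge reply, tolerating code fences and prose."""
    # Prefer the contents of fenced blocks, then fall back to scanning the raw reply.
    candidates = [m.group(1) for m in _FENCE.finditer(text)] + [text]
    decoder = json.JSONDecoder()
    for candidate in candidates:
        for start, ch in enumerate(candidate):
            if ch != "{":
                continue
            try:
                # raw_decode parses one JSON value starting at `start` and ignores trailing text.
                obj, _end = decoder.raw_decode(candidate, start)
            except json.JSONDecodeError:
                continue
            if isinstance(obj, dict):
                return obj
    raise ValueError("no JSON object found in judge output")


if __name__ == "__main__":
    print(parse_judge_json('Verdict follows. {"score": 2, "reason": "refused"} Thanks!'))
```

Using `raw_decode` instead of a brace-matching regex lets the JSON parser itself decide where the object ends, so nested objects and braces inside string values are handled correctly.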

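The bug class behind #1503 is treating "key absent" and "key present but None" differently, so that a skip policy drops one but not the other. A minimal sketch of the intended semantics follows; `mean_of_key` and its `on_missing` values are hypothetical and do not reproduce the inspect_evals `mean_of` signature.

```python
from typing import Any, Literal, Optional


def mean_of_key(
    samples: list[dict[str, Any]],
    key: str,
    on_missing: Literal["skip", "error"] = "error",
) -> Optional[float]:
    """Mean of sample[key] over samples, treating a missing key and a None value alike."""
    values: list[float] = []
    for i, sample in enumerate(samples):
        value = sample.get(key)  # None for both an absent key and an explicit None
        if value is None:
            if on_missing == "skip":
                continue
            raise KeyError(f"sample {i} has no usable value for {key!r}")
        values.append(float(value))
    # Return None rather than dividing by zero when every sample was skipped.
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    samples = [{"score": 1.0}, {"score": None}, {}, {"score": 3.0}]
    print(mean_of_key(samples, "score", on_missing="skip"))
```

Routing both cases through a single `sample.get(key) is None` check is what keeps the skip policy consistent: a `None` score no longer slips through to `float(None)` or, worse, gets counted as a zero.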
inspect_scout

  • #455 — resolve ModelEvent input refs from the events_data pool schema

Contact

Hit me up! I'm open to collaborating on evals or tooling, or to just chatting :)

Email · LinkedIn
