deeptireddy-lab/agentspan-eval

agentspan-eval

Examples of evaluating AgentSpan-hosted agents. Each subfolder is a self-contained eval pattern: clone the repo, `cd` into the folder, and follow its README.

Examples

| Folder | What it shows |
|---|---|
| `correctnessEval/` | Built-in `CorrectnessEval` for batched, dataset-style runs. Combines tool-usage and output-contents checks with an LLM-as-judge semantic check for adversarial cases. |
| `assertionsEval/` | Fine-grained, imperative assertions on a single run: tool used, call order, output regex, max turns, no errors. |
| `expectEval/` | The same single-run checks as `assertionsEval/`, written as a fluent chain: `expect(result).used_tool(...).output_contains(...)...` |
| `multiagentEval/` | Evaluates a 3-agent pipeline (researcher >> writer >> editor) over a topic dataset. Catches end-to-end coherence issues such as "the editor strips key facts the researcher mentioned". |
| `semanticEval/` | LLM-as-judge checks on a single run: scores the output against natural-language criteria such as "friendly tone", "covers both topics", or "concise". |
| `mockEval/` | `mock_run` with a scripted event sequence: no LLM, no server, runs in under 1 s. Ideal for unit-test-style assertion coverage and regression checks. |
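To make the single-run checks concrete, here is a minimal plain-Python sketch of a fluent assertion chain run over a scripted event list, in the spirit of `assertionsEval/`, `expectEval/` and `mockEval/`. It does not use the agentspan SDK: `Event`, `RunRecord`, the event kinds (`"turn"`, `"tool_call"`, `"error"`) and the `Expect` class are hypothetical stand-ins, not AgentSpan's actual `expect` or `mock_run` API, so check each folder's README for the real calls.

```python
import re
from dataclasses import dataclass, field


# Hypothetical run types for illustration; NOT the agentspan SDK's classes.
@dataclass
class Event:
    kind: str       # assumed kinds: "turn", "tool_call", "error"
    name: str = ""  # tool name, used only for "tool_call" events


@dataclass
class RunRecord:
    events: list[Event] = field(default_factory=list)
    output: str = ""


def _check(condition: bool, message: str) -> None:
    # Raise explicitly so checks still fire under `python -O`.
    if not condition:
        raise AssertionError(message)


class Expect:
    """Hypothetical fluent checker: every method returns self so calls chain."""

    def __init__(self, run: RunRecord) -> None:
        self.run = run

    def _tools(self) -> list[str]:
        return [e.name for e in self.run.events if e.kind == "tool_call"]

    def used_tool(self, name: str) -> "Expect":
        _check(name in self._tools(), f"tool {name!r} was never called")
        return self

    def tool_order(self, *names: str) -> "Expect":
        # True if `names` occur as a subsequence of the tool calls;
        # `n in calls` advances the shared iterator past each match.
        calls = iter(self._tools())
        _check(all(n in calls for n in names), f"tools not called in order {names}")
        return self

    def output_contains(self, text: str) -> "Expect":
        _check(text in self.run.output, f"output does not contain {text!r}")
        return self

    def output_matches(self, pattern: str) -> "Expect":
        _check(re.search(pattern, self.run.output) is not None,
               f"output does not match {pattern!r}")
        return self

    def max_turns(self, limit: int) -> "Expect":
        turns = sum(1 for e in self.run.events if e.kind == "turn")
        _check(turns <= limit, f"{turns} turns exceeds limit {limit}")
        return self

    def no_errors(self) -> "Expect":
        errors = [e for e in self.run.events if e.kind == "error"]
        _check(not errors, f"{len(errors)} error event(s) in run")
        return self


def expect(run: RunRecord) -> Expect:
    return Expect(run)


# A scripted run: fixed events and output, so no LLM or server is involved.
scripted = RunRecord(
    events=[
        Event("turn"),
        Event("tool_call", "search"),
        Event("turn"),
        Event("tool_call", "summarize"),
    ],
    output="Paris is the capital of France.",
)

(
    expect(scripted)
    .used_tool("search")
    .tool_order("search", "summarize")
    .output_contains("Paris")
    .output_matches(r"capital of \w+")
    .max_turns(3)
    .no_errors()
)
print("all checks passed")
```

Running it prints `all checks passed`. A failing check raises `AssertionError` with a message naming what went wrong, which is what lets a pytest-style runner report it; returning `self` from each method is what makes the chain fluent.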

More examples are planned (`leaderboardEval/`, `humanFeedbackEval/`).

Prerequisites

  • An AgentSpan server running locally on port 6767 (or at the URL in `CONDUCTOR_SERVER_URL`)
  • The `agentspan` Python SDK installed
  • The LLM API key each example needs, set as an environment variable (e.g. `OPENAI_API_KEY`)
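Before running an example, a small preflight script can confirm these prerequisites. The sketch below is hypothetical and not part of this repo or the agentspan SDK: it falls back to `http://localhost:6767` when `CONDUCTOR_SERVER_URL` is unset (the scheme, and any path the SDK expects, are assumptions), only tests that a TCP connection to the server's host and port succeeds, and assumes the SDK's import name is `agentspan`.

```python
import importlib.util
import os
import socket
from collections.abc import Mapping
from urllib.parse import urlsplit

# Assumed default; the README only says the server runs locally on :6767.
DEFAULT_SERVER_URL = "http://localhost:6767"


def server_url(env: Mapping[str, str]) -> str:
    """CONDUCTOR_SERVER_URL if set and non-empty, else the assumed local default."""
    return env.get("CONDUCTOR_SERVER_URL") or DEFAULT_SERVER_URL


def server_reachable(url: str, timeout: float = 1.0) -> bool:
    """True if a TCP connection to the URL's host and port succeeds."""
    try:
        parts = urlsplit(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except (OSError, ValueError):  # refused/unreachable, or a malformed port
        return False


def sdk_installed(module: str = "agentspan") -> bool:
    """True if the module is importable; the import name is an assumption."""
    return importlib.util.find_spec(module) is not None


def missing_keys(env: Mapping[str, str], required: list[str]) -> list[str]:
    """Names from `required` that are unset or empty in `env`."""
    return [key for key in required if not env.get(key)]


def preflight(env: Mapping[str, str], required_keys: list[str]) -> list[str]:
    """Collect human-readable problems; an empty list means ready to run."""
    problems = []
    url = server_url(env)
    if not server_reachable(url):
        problems.append(f"no server reachable at {url}")
    if not sdk_installed():
        problems.append("agentspan package is not importable")
    problems += [f"{key} is not set" for key in missing_keys(env, required_keys)]
    return problems


if __name__ == "__main__":
    # OPENAI_API_KEY is one example; list whichever keys the chosen example needs.
    for problem in preflight(os.environ, ["OPENAI_API_KEY"]):
        print("preflight:", problem)
```

Run it from the example's folder: it prints one `preflight:` line per problem and nothing when everything is in place. Checking a raw TCP connection avoids guessing at AgentSpan's HTTP endpoints, at the cost of not confirming that the service is actually healthy.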
