eval_accuracy scorer gives passing scores to planted accuracy failures (overgeneralized and contradicted claims) #12

@zaridan

Summary

The `eval_accuracy` dimension scorer gives passing scores (≥ 70) to conversations planted with two specific accuracy failure types:

  • Overgeneralized claims (`AccuracyLabel.precision = "overgeneralized"`): the agent states something that goes beyond what the KB chunk supports — e.g. presenting a scoped rule as universally true.
  • Contradicted claims (`AccuracyLabel.status = "contradicted"`): the agent states something the KB explicitly contradicts.

Both failure types should produce a failing accuracy score. Instead, the scorer appears to evaluate whether the claim is plausible, not whether it is faithful to the specific KB chunk, and frequently awards passing scores.

Expected behavior

Conversations planted with `AccuracyLabel.precision = "overgeneralized"` or `AccuracyLabel.status = "contradicted"` should receive a failing accuracy score (< 70).
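This expectation can be written as a simple check. The sketch below is a minimal illustration, assuming a hypothetical stand-in for `AccuracyLabel` that carries only the two fields named in this issue; the real class, its other fields, and the scorer API may differ.

```python
from dataclasses import dataclass
from typing import Optional

PASS_THRESHOLD = 70  # per this issue: a score >= 70 counts as passing


@dataclass(frozen=True)
class AccuracyLabel:
    """Hypothetical stand-in for the real label type (assumption).

    Only the two fields mentioned in this issue are modeled.
    """
    precision: Optional[str] = None
    status: Optional[str] = None


def is_affected_failure(label: AccuracyLabel) -> bool:
    """True for the two planted failure types this issue covers."""
    return label.precision == "overgeneralized" or label.status == "contradicted"


def violates_expectation(label: AccuracyLabel, score: float) -> bool:
    """True when an affected planted failure received a passing score."""
    return is_affected_failure(label) and score >= PASS_THRESHOLD
```

For example, `violates_expectation(AccuracyLabel(status="contradicted"), 82)` returns `True` for a hypothetical score of 82, which is the kind of result reported here.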

Impact

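Inflated false-positive count in accuracy evaluations: planted failures that score ≥ 70 are counted as passes. Any eval run measuring detection rates for these failure types will therefore report a lower detection rate than a scorer faithful to the KB would achieve.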
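To quantify the effect, one could compute the detection rate over conversations known to contain planted failures. This is a sketch only: it assumes the scores are available as a plain list of numbers, and the function names and sample scores are illustrative, not observed results.

```python
from typing import Sequence

PASS_THRESHOLD = 70  # per this issue: a score >= 70 counts as passing


def missed_failures(scores: Sequence[float], threshold: float = PASS_THRESHOLD) -> int:
    """Count planted-failure conversations that received a passing score."""
    return sum(1 for s in scores if s >= threshold)


def detection_rate(scores: Sequence[float], threshold: float = PASS_THRESHOLD) -> float:
    """Fraction of planted-failure conversations scored below the threshold."""
    if not scores:
        raise ValueError("need at least one planted-failure score")
    return (len(scores) - missed_failures(scores, threshold)) / len(scores)


# Made-up scores for four conversations, each planted with a failure.
example_scores = [85, 40, 72, 90]
print(missed_failures(example_scores))  # 3
print(detection_rate(example_scores))   # 0.25
```

Every input to these functions is a conversation that should fail, so any value returned by `missed_failures` above zero is a false pass.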

Notes

Observed with Claude Haiku 4.5. The `exact` precision mode is not affected — failures planted via `AccuracyLabel.precision = "exact"` are correctly detected. Users relying on overgeneralized or contradicted planting for accuracy evaluation should treat their FP counts as unreliable until this is resolved.

Metadata

    Labels

    bug (Something isn't working), known-limitation (A known behavior boundary or limitation with no immediate fix planned)
