Evaluation DSL

The evaluation engine consumes a small JSON-based DSL to describe the expected state of the database after an agent run. Each expectation is expressed as an "assertion" evaluated against the diff between the before/after snapshots.

Anatomy of a spec

{
  "strict": true,
  "assertions": [
    {
      "diff_type": "added",
      "entity": "messages",
      "where": {
        "channelId": {"eq": 123},
        "body": {"contains": "hello"}
      },
      "expected_count": 1
    },
    {
      "diff_type": "changed",
      "entity": "issues",
      "where": {"id": {"eq": 42}},
      "expected_changes": {
        "status": {"to": {"eq": "Done"}}
      }
    }
  ]
}
  • diff_type – one of added, removed, changed.
  • entity – table name (as it appears in the service schema).
  • where – field predicates composed from the operator set below.
  • expected_count – optional; either an exact integer or a bound such as {"min": 1} or {"max": 2} on the number of matching results.
  • expected_changes – for changed assertions, maps each field to optional from/to predicates.
  • strict – top-level flag; when true, the engine fails if it observes additional field changes beyond expected_changes.
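
The expected_count and strict rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the engine's actual code: the helper names count_ok and check_changed are hypothetical, a changed row is assumed to be represented as {field: (old, new)}, min and max are assumed to be combinable, and only the eq operator is handled.

```python
from typing import Any, Dict, Tuple, Union

CountSpec = Union[int, Dict[str, int]]


def count_ok(expected: CountSpec, actual: int) -> bool:
    """Check a result count against an exact or bounded expected_count."""
    if isinstance(expected, int):
        return actual == expected
    # Bounded form: either key may be omitted (assumed to be combinable).
    lower = expected.get("min", 0)
    upper = expected.get("max")
    return actual >= lower and (upper is None or actual <= upper)


def check_changed(
    changes: Dict[str, Tuple[Any, Any]],
    expected_changes: Dict[str, Dict[str, Dict[str, Any]]],
    strict: bool,
) -> bool:
    """Validate one changed row; only the eq operator is handled here."""
    for field, spec in expected_changes.items():
        if field not in changes:
            return False
        old, new = changes[field]
        if "from" in spec and old != spec["from"]["eq"]:
            return False
        if "to" in spec and new != spec["to"]["eq"]:
            return False
    if strict:
        # Strict mode: no field may change beyond those listed.
        return set(changes) <= set(expected_changes)
    return True
```

With this sketch, a row whose status moved to "Done" but whose updatedAt also changed would pass the second assertion in the spec above when strict is false and fail when strict is true.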

Operators

Scalar comparisons:

Operator      Meaning
eq            equality
ne            inequality
gt / gte      greater than / greater than or equal
lt / lte      less than / less than or equal

Collection / string helpers:

Operator        Meaning
in / not_in     membership check on sequences
contains        substring match (case sensitive)
not_contains    substring not present
i_contains      substring match (case insensitive)
starts_with     prefix check
ends_with       suffix check
i_starts_with   prefix check (case insensitive)
i_ends_with     suffix check (case insensitive)
regex           regular expression match
has_any         any overlapping element in arrays
has_all         all elements present in arrays
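
One way to picture the scalar and collection/string operators is as a dispatch table from operator name to a two-argument function. The sketch below is illustrative only: OPERATORS and apply_operator are hypothetical names, regex is assumed to match anywhere in the string (re.search), and array elements are assumed to be hashable. The real semantics live in assertion.py.

```python
import re
from typing import Any, Callable, Dict

# value = field value from the diff, operand = value given in the spec.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda value, operand: value == operand,
    "ne": lambda value, operand: value != operand,
    "gt": lambda value, operand: value > operand,
    "gte": lambda value, operand: value >= operand,
    "lt": lambda value, operand: value < operand,
    "lte": lambda value, operand: value <= operand,
    "in": lambda value, operand: value in operand,
    "not_in": lambda value, operand: value not in operand,
    "contains": lambda value, operand: operand in value,
    "not_contains": lambda value, operand: operand not in value,
    "i_contains": lambda value, operand: operand.lower() in value.lower(),
    "starts_with": lambda value, operand: value.startswith(operand),
    "ends_with": lambda value, operand: value.endswith(operand),
    "i_starts_with": lambda value, operand: value.lower().startswith(operand.lower()),
    "i_ends_with": lambda value, operand: value.lower().endswith(operand.lower()),
    # Assumption: a match anywhere in the string counts.
    "regex": lambda value, operand: re.search(operand, value) is not None,
    "has_any": lambda value, operand: bool(set(value) & set(operand)),
    "has_all": lambda value, operand: set(operand) <= set(value),
}


def apply_operator(name: str, value: Any, operand: Any) -> bool:
    """Look up an operator by name and apply it to a field value."""
    return OPERATORS[name](value, operand)
```

For example, apply_operator("contains", "Hello world", "hello") is false under this sketch because contains is case sensitive, while the i_contains variant is true.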

Existence handling:

Operator   Meaning
exists     true = field is not NULL, false = field is NULL

Multiple operators in one predicate object are ANDed. Multiple where fields are ANDed. Dot paths are supported for nested objects (e.g. start.timeZone).
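
The AND semantics and dot-path lookup can be sketched as follows. This is a simplified illustration, not the engine's implementation: resolve and matches are hypothetical names, only eq, contains, and exists are handled, and a missing key is assumed to be treated the same as NULL.

```python
from typing import Any, Dict


def resolve(row: Dict[str, Any], path: str) -> Any:
    """Follow a dot path such as 'start.timeZone' through nested dicts."""
    current: Any = row
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # Assumption: a missing key behaves like NULL.
        current = current[key]
    return current


def matches(row: Dict[str, Any], where: Dict[str, Dict[str, Any]]) -> bool:
    """Return True only if every operator on every field passes (AND)."""
    for path, predicate in where.items():
        value = resolve(row, path)
        for op, operand in predicate.items():
            if op == "eq":
                ok = value == operand
            elif op == "contains":
                ok = isinstance(value, str) and operand in value
            elif op == "exists":
                ok = (value is not None) == operand
            else:
                raise ValueError(f"operator not covered by this sketch: {op}")
            if not ok:
                return False
    return True


row = {"channelId": 123, "body": "hello there", "start": {"timeZone": "UTC"}}
print(matches(row, {"channelId": {"eq": 123}, "body": {"contains": "hello"}}))
print(matches(row, {"start.timeZone": {"eq": "UTC"}}))
```

Both calls print True: the first because both where fields pass, the second because the dot path reaches the nested timeZone value.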

Resources

  • JSON schema: backend/src/platform/evaluationEngine/dsl_schema.json
  • Engine implementation: backend/src/platform/evaluationEngine/assertion.py

Sample test scenarios for Slack agents:

  • slack_bench_v2.json – test cases covering message sending, channel operations, reactions, and threading
  • slack_default.json – seed data