Reliability hardening: self-test harness, integration tests, SLA budgets, diagnostics #17

Merged
DaScient merged 2 commits into main from copilot/ensure-document-scanning-functionality
May 6, 2026

Copilot AI commented May 6, 2026

Summary

Closes the gap between "unit tests pass" and "I can prove the scanner is functioning on demand." Adds an end-to-end self-test runnable from the UI and CI, pipeline contracts/SLAs, result versioning, and operational tooling.

  • Self-test harness: assets/js/agent/self-test.js (PE.SelfTest.run()) + 3 bundled DXF fixtures (clean / non-compliant / sparse) and a golden expected.json (fact keys, must-flag/must-not-flag rule ids, count bands, score band).
  • Integration test: tests/integration.pipeline.test.js drives the real PE.Extractors → selectPacks → RuleEngine → score chain with no mocks, and asserts that the non-compliant fixture actually produces FLAGGED findings (door-width / egress).
  • Pipeline contracts: _assertExtractionShape / _assertFindingsShape throw structured PIPELINE_CONTRACT errors instead of corrupting downstream state.
  • SLA budgets: PE.Pipeline.SLA per-step (ingest 15s / extract 8s / evaluate 2s / total 30s) with WARN on breach; enforced by tests/pipeline.perf.test.js.
  • Result versioning: every result is stamped with runId, engineVersion, rulesVersion (FNV-1a fingerprint of pack id@version:ruleCount tuples), startedAt/completedAt/totalMs.
  • Rule-engine memoization: _buildContext is cached per evaluate() call (reset at the call boundary), so derived occupancy/sprinklered/construction context isn't recomputed per rule.
  • Crash resilience: tests/extractors.crash.test.js covers empty/garbage/non-DXF input, unknown check_fn, and null parameters; the rule engine is asserted to never emit a status outside {PASS, REVIEW, FLAGGED}.
  • Diagnostic bundle: PE.Log.exportRun(runId) + entriesForRun() produce a schema-versioned, redacted JSON bundle for support tickets.
  • In-app Diagnostics button: added to the AI Settings modal; shows per-fixture pass/fail, score, and counts. Same code path as headless CI.
  • Schema gate in npm test: tests/rules.schema.test.js invokes scripts/validate-rules.mjs.
  • Rules inventory: scripts/rules-report.mjs regenerates docs/RULES.md (23 packs / 153 rules); CI fails if stale (--check).
  • Health check: scripts/status.mjs runs the same selftest in Node and writes status.json; .github/workflows/health.yml runs it daily.
  • Service worker: bumped to plan-examiner-v5; pre-cache extended from 3 packs to all 22 active packs + selftest fixtures + self-test.js.
  • npm scripts: validate:rules, rules:report, rules:report:check, status.

```javascript
// Anyone can verify the scanner end-to-end, no dev tools required:
const { summary, results } = await PE.SelfTest.run();
// summary: { ok, total, passed, failed, durationMs, manifestVersion, ... }
// results[i]: { id, ok, score, counts:{PASS,REVIEW,FLAGGED}, flagged:[...], assertions:[...] }
```
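
To make the golden expected.json idea concrete, here is a minimal sketch of how one fixture's result could be checked against its expectations. The field names (`mustFlag`, `mustNotFlag`, `countBands`, `scoreBand`) and the finding shape `{ ruleId, status }` are assumptions for illustration, not the actual expected.json schema or PE.SelfTest internals; the rule ids are likewise illustrative.

```javascript
// Hypothetical sketch: expectation field names are assumptions, not the real expected.json schema.
const ALLOWED_STATUSES = new Set(['PASS', 'REVIEW', 'FLAGGED']);

// Inclusive band check, e.g. inBand(3, [1, 5]) is true.
function inBand(value, [min, max]) {
  return value >= min && value <= max;
}

function checkFixture(result, expected) {
  const assertions = [];
  const add = (name, ok) => assertions.push({ name, ok });

  // Every finding must use one of the three allowed statuses.
  add('status-domain', result.findings.every(f => ALLOWED_STATUSES.has(f.status)));

  // Tally findings per status.
  const counts = { PASS: 0, REVIEW: 0, FLAGGED: 0 };
  for (const f of result.findings) {
    if (f.status in counts) counts[f.status] += 1;
  }

  // must-flag / must-not-flag rule ids.
  const flagged = new Set(result.findings.filter(f => f.status === 'FLAGGED').map(f => f.ruleId));
  for (const id of expected.mustFlag || []) add('must-flag:' + id, flagged.has(id));
  for (const id of expected.mustNotFlag || []) add('must-not-flag:' + id, !flagged.has(id));

  // Count bands per status and an overall score band.
  for (const [status, band] of Object.entries(expected.countBands || {})) {
    add('count-band:' + status, inBand(counts[status], band));
  }
  if (expected.scoreBand) add('score-band', inBand(result.score, expected.scoreBand));

  return { ok: assertions.every(a => a.ok), counts, assertions };
}

// Example: a non-compliant fixture that must flag door-width and egress.
const report = checkFixture(
  {
    score: 42,
    findings: [
      { ruleId: 'door-width', status: 'FLAGGED' },
      { ruleId: 'egress', status: 'FLAGGED' },
      { ruleId: 'stairs', status: 'PASS' },
    ],
  },
  { mustFlag: ['door-width', 'egress'], mustNotFlag: ['stairs'], countBands: { FLAGGED: [1, 5] }, scoreBand: [0, 70] }
);
// report.ok === true
```

Using bands instead of exact counts keeps the golden file stable when a rule pack gains a rule that does not change the fixture's overall verdict.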

Out of scope (deferred per the original plan's "ship last"): PDF.js / Tesseract worker-ization. Per-pack golden .test.js files are subsumed by the cross-pack selftest fixtures, which exercise all 22 packs / 153 rules per run.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • New rule pack (new jurisdiction or code edition)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Accessibility improvement

Rule Pack Changes (if applicable)

  • Rule pack file: n/a — no rule packs modified
  • Jurisdiction / Code edition: n/a
  • Number of rules added/modified: 0 (auto-generated docs/RULES.md reflects current inventory)

Testing

  • Tested manually in Chrome/Firefox/Safari
  • Tested with a sample PDF plan
  • Tested with a sample DXF plan
  • Tested with a sample DOCX plan
  • Rule pack JSON validates (run python3 -c "import json; json.load(open('assets/data/rules/your-pack.json'))")

npm test: 96/96 passing (was 79; +17 new). node scripts/validate-rules.mjs: 0 errors. node scripts/status.mjs: 3/3 fixtures pass.

Checklist

  • My code follows the existing code style
  • I have performed a self-review of my changes
  • No new API keys, secrets, or credentials are committed
  • Relevant documentation has been updated

Screenshots (if applicable)

The new "Run Diagnostics" control lives in the AI Settings modal below "Test connection"; output is rendered in a monospace results panel that turns green on full pass / red on any failure.

Related Issues

DaScient marked this pull request as ready for review May 6, 2026 02:52
Copilot AI review requested due to automatic review settings May 6, 2026 02:52
DaScient merged commit 4cc4ea6 into main May 6, 2026
9 checks passed

Copilot AI left a comment


Pull request overview

This PR adds an end-to-end self-test/diagnostics harness and supporting CI/ops tooling to prove the scanner pipeline (extract → select packs → evaluate → score) is functioning deterministically on demand (in-app, in CI, and via scheduled health checks).
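
The pipeline hands data from stage to stage (extract → select packs → evaluate → score), and the summary says malformed handoffs throw structured PIPELINE_CONTRACT errors rather than corrupting downstream state. A minimal sketch of such guards, assuming an extraction result must be `{ facts: object }`, findings must be `{ ruleId: string, status: string }`, and the error carries `code`, `step`, and `detail` fields. The class and function names are hypothetical stand-ins for `_assertExtractionShape` / `_assertFindingsShape`, whose real checks may differ.

```javascript
// Hypothetical structured contract error; field names are assumptions.
class PipelineContractError extends Error {
  constructor(step, detail) {
    super(`PIPELINE_CONTRACT violation at "${step}": ${detail}`);
    this.name = 'PipelineContractError';
    this.code = 'PIPELINE_CONTRACT';
    this.step = step;
    this.detail = detail;
  }
}

function isPlainObject(v) {
  return v !== null && typeof v === 'object' && !Array.isArray(v);
}

// Throw instead of letting a malformed extraction flow into rule evaluation.
function assertExtractionShape(extraction) {
  if (!isPlainObject(extraction)) {
    throw new PipelineContractError('extract', 'extraction result is not an object');
  }
  if (!isPlainObject(extraction.facts)) {
    throw new PipelineContractError('extract', 'missing or non-object "facts"');
  }
  return extraction;
}

// Findings must be an array of { ruleId: string, status: string }.
function assertFindingsShape(findings) {
  if (!Array.isArray(findings)) {
    throw new PipelineContractError('evaluate', 'findings is not an array');
  }
  findings.forEach((f, i) => {
    if (!isPlainObject(f) || typeof f.ruleId !== 'string' || typeof f.status !== 'string') {
      throw new PipelineContractError('evaluate', `finding #${i} is malformed`);
    }
  });
  return findings;
}
```

Because the error exposes a stable `code` and the failing `step`, a caller (or the diagnostics panel) can branch on `err.code === 'PIPELINE_CONTRACT'` without parsing message text.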

Changes:

  • Adds PE.SelfTest plus bundled DXF fixtures + golden expectations, and integrates it into the UI (“Run Diagnostics”) and Node (scripts/status.mjs).
  • Introduces new integration/perf/crash-resilience tests and CI gates (rules schema validation + rules inventory freshness).
  • Hardens pipeline observability with SLA budgets, structured pipeline contracts, result stamping/versioning, and diagnostic bundle export helpers.
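
The per-step SLA budgets (ingest 15s / extract 8s / evaluate 2s / total 30s, WARN on breach) can be illustrated with a small timing wrapper. This is a sketch, not the actual PE.Pipeline.SLA implementation: the budget object shape, the caller-supplied `warn` callback, and the helper names `timedStep` / `slaBreaches` are assumptions. It warns rather than aborting, matching the summary's "WARN on breach".

```javascript
// Budgets (ms) from the PR summary; the object shape is an assumption.
const SLA = { ingest: 15000, extract: 8000, evaluate: 2000, total: 30000 };

// Run one step, record its duration, and warn on breach without failing the run.
async function timedStep(name, fn, timings, warn) {
  const t0 = Date.now();
  try {
    return await fn();
  } finally {
    const ms = Date.now() - t0;
    timings[name] = ms;
    if (SLA[name] !== undefined && ms > SLA[name]) {
      warn(`SLA breach: ${name} took ${ms}ms (budget ${SLA[name]}ms)`);
    }
  }
}

// Check already-measured timings, e.g. from a perf test over the selftest fixtures.
function slaBreaches(timings) {
  return Object.keys(SLA)
    .filter(step => timings[step] !== undefined && timings[step] > SLA[step])
    .map(step => ({ step, ms: timings[step], budgetMs: SLA[step] }));
}
```

A perf test can then assert `slaBreaches(timings).length === 0`, which turns the runtime WARN into a hard CI gate.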

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| tests/rules.schema.test.js | Adds a Node test that gates scripts/validate-rules.mjs via npm test. |
| tests/pipeline.perf.test.js | Enforces SLA-style performance ceilings using self-test fixtures + targeted checks. |
| tests/integration.pipeline.test.js | True end-to-end integration test wiring real extractors → rule engine → score against fixtures. |
| tests/extractors.crash.test.js | Adds crash-resilience matrix for extractors and rule engine output domain. |
| sw.js | Bumps service worker cache version and expands pre-cache list to include self-test + more packs. |
| scripts/status.mjs | Adds a headless health check that runs PE.SelfTest and writes status.json. |
| scripts/rules-report.mjs | Generates/validates docs/RULES.md rule-pack inventory (CI freshness check). |
| package.json | Adds new scripts: validate rules, generate/check rules report, and status. |
| index.html | Adds “Run Diagnostics” UI and loads self-test.js. |
| docs/RULES.md | Adds auto-generated rule pack inventory documentation. |
| assets/js/utils/log.js | Adds entriesForRun() and exportRun() for run-scoped diagnostic bundles. |
| assets/js/app.js | Wires the in-app “Run Diagnostics” button to PE.SelfTest.run(). |
| assets/js/agent/self-test.js | Implements the self-test harness runnable in browser + Node. |
| assets/js/agent/rule-engine.js | Adds per-evaluate context memoization to reduce redundant context derivation. |
| assets/js/agent/pipeline.js | Adds SLA budgets, pipeline contract assertions, result stamping/versioning, and rules fingerprinting. |
| assets/data/fixtures/selftest/clean-office.dxf | Adds compliant self-test DXF fixture. |
| assets/data/fixtures/selftest/non-compliant-assembly.dxf | Adds non-compliant DXF fixture to prove FLAGGED findings occur. |
| assets/data/fixtures/selftest/sparse-warehouse.dxf | Adds sparse DXF fixture to prove missing evidence yields REVIEW outcomes. |
| assets/data/fixtures/selftest/expected.json | Adds golden expectations for fixtures (bands, must-flag, must-not-flag, etc.). |
| .gitignore | Ignores generated status.json. |
| .github/workflows/health.yml | Adds scheduled workflow running scripts/status.mjs and uploading status.json. |
| .github/workflows/ci.yml | Adds CI step to ensure docs/RULES.md is up-to-date (--check). |
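
The rule-engine change (the summary describes `_buildContext` as cached per evaluate() call and reset at the call boundary) follows a simple pattern: derive the shared context lazily, once, inside each call. A minimal sketch, assuming rules are objects with an `id` and a `check(context)` function and that context depends only on `facts`. `buildContext`, its fields, and the mapping of a throwing rule to REVIEW are illustrative assumptions, not the actual rule-engine.js code.

```javascript
// Hypothetical context builder; the call counter exists only to demonstrate memoization.
function buildContext(facts) {
  buildContext.calls = (buildContext.calls || 0) + 1;
  return {
    occupancy: facts.occupancy || 'unknown',
    sprinklered: Boolean(facts.sprinklered),
    construction: facts.construction || 'unknown',
  };
}

function evaluate(rules, facts) {
  // Cache lives in this call's scope, so it resets automatically at the call boundary.
  let cached = null;
  const getContext = () => (cached === null ? (cached = buildContext(facts)) : cached);

  return rules.map(rule => {
    let status;
    try {
      status = rule.check(getContext()) ? 'PASS' : 'FLAGGED';
    } catch (err) {
      status = 'REVIEW'; // assumed: a throwing rule degrades to REVIEW instead of crashing
    }
    return { ruleId: rule.id, status };
  });
}
```

Scoping the cache to the call, rather than memoizing on `facts` globally, means two different plans evaluated back to back can never share stale context.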
Comments suppressed due to low confidence (1)

sw.js:12

  • STATIC_ASSETS uses absolute (leading '/') URLs. On GitHub Pages project sites (served under '/<repo>/'), these requests resolve to the origin root (e.g. '/assets/...') and will 404, causing pre-cache to fail and offline mode to be ineffective. Consider generating URLs relative to the service worker scope (e.g. omit the leading '/', or build them with new URL('assets/...', self.registration.scope)).
```javascript
var STATIC_ASSETS = [
  '/',
  '/index.html',
  '/assets/css/styles.css',
  '/assets/js/app.js',
  '/assets/js/agent/rule-engine.js',
  // … (excerpt ends here in the review)
];
```
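
The suggestion above is to resolve pre-cache entries against the service-worker scope instead of the origin root. A minimal sketch using the standard `URL` constructor: inside sw.js the scope would come from `self.registration.scope`, but here it is passed in so the function also runs under Node. `scopedAssets` is a hypothetical helper, and `https://example.github.io/plan-examiner/` is a made-up project-site URL.

```javascript
// Resolve asset paths against the service-worker scope rather than the origin root.
// Leading slashes are stripped so '/index.html' becomes scope-relative.
function scopedAssets(paths, scope) {
  return paths.map(p => new URL(p.replace(/^\/+/, ''), scope).href);
}

// Hypothetical project-site scope; inside sw.js this would be self.registration.scope.
const scope = 'https://example.github.io/plan-examiner/';
const STATIC_ASSETS = scopedAssets(['/', '/index.html', '/assets/css/styles.css'], scope);
// STATIC_ASSETS[0] === 'https://example.github.io/plan-examiner/'
// STATIC_ASSETS[1] === 'https://example.github.io/plan-examiner/index.html'
```

The resulting absolute URLs can be passed to `cache.addAll()` unchanged, and the same list works whether the site is served from the origin root or from a project subpath.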


Comment thread assets/js/utils/log.js
Comment on lines +266 to +267
```javascript
 * No raw file contents, no API keys (already redacted by _redactString
 * applied at record time). Safe to share.
```
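
The log.js comment relies on redaction happening at record time, so a run-scoped export is safe by construction. A minimal sketch of that arrangement, assuming log entries look like `{ runId, level, msg, at }` and a numeric `schemaVersion` field. The two secret patterns are illustrative only (the `sk-` key format is an assumption), and `redactString`, `record`, `entriesForRun`, and `exportRun` here are hypothetical stand-ins, not the actual `_redactString` or `PE.Log` API.

```javascript
// Illustrative patterns only; a real redactor needs one pattern per credential format it expects.
const SECRET_PATTERNS = [
  [/\bsk-[A-Za-z0-9_-]{16,}/g, '[REDACTED]'],                 // assumed "sk-..." style API keys
  [/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/gi, 'Bearer [REDACTED]'], // Authorization bearer tokens
];

function redactString(s) {
  return SECRET_PATTERNS.reduce((out, [re, repl]) => out.replace(re, repl), String(s));
}

const entries = [];

// Redact once, when the entry is recorded, so nothing downstream sees the raw secret.
function record(runId, level, msg) {
  entries.push({ runId, level, msg: redactString(msg), at: new Date().toISOString() });
}

function entriesForRun(runId) {
  return entries.filter(e => e.runId === runId);
}

// Schema-versioned bundle for a single run, suitable for attaching to a support ticket.
function exportRun(runId) {
  return JSON.stringify({ schemaVersion: 1, runId, entries: entriesForRun(runId) }, null, 2);
}
```

Redacting at record time rather than at export time means an in-memory log dump or a future export path cannot accidentally skip the scrub.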
Comment on lines 424 to 430
```javascript
sT0 = _now();
emit('select', 'running', 'Loading rule packs for ' + (formData.buildingCode || 'IBC 2021') + '…');
_stepLog('select', 'running', 'loading packs', { buildingCode: formData.buildingCode });
var packs;
try {
  packs = await selectPacks(formData.buildingCode || '2024 IBC', facts.buildingType);
  result.packs = packs;
  // … (excerpt ends here)
```
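
In the excerpt above, the progress message falls back to 'IBC 2021' while `selectPacks` falls back to '2024 IBC', so a run with no building code selected would announce one edition and load another. One way to keep them in sync is to resolve the default once. `DEFAULT_BUILDING_CODE` and `resolveBuildingCode` are hypothetical names, and which edition should be the default is the maintainers' call; the sketch uses '2024 IBC' only because that is the value `selectPacks` currently receives.

```javascript
// Hypothetical single source of truth for the fallback edition.
const DEFAULT_BUILDING_CODE = '2024 IBC';

// Missing, non-string, or blank codes fall back to the default.
function resolveBuildingCode(formData) {
  const code = formData && typeof formData.buildingCode === 'string' ? formData.buildingCode.trim() : '';
  return code || DEFAULT_BUILDING_CODE;
}

// The progress message and selectPacks() would then share one value:
//   var buildingCode = resolveBuildingCode(formData);
//   emit('select', 'running', 'Loading rule packs for ' + buildingCode + '…');
//   packs = await selectPacks(buildingCode, facts.buildingType);
```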
Comment on lines +267 to +272
```javascript
var result = {
  facts: {}, packs: [], findings: [], score: 0, summary: '', correctionLetter: '',
  runId: _newRunId(),
  engineVersion: ENGINE_VERSION,
  rulesVersion: null, // filled in once packs are selected
  startedAt: new Date().toISOString()
  // … (excerpt ends here)
};
```
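
`rulesVersion` starts as null and, per the summary, becomes an FNV-1a fingerprint of pack `id@version:ruleCount` tuples once packs are selected. A minimal sketch of 32-bit FNV-1a over such tuples, assuming packs look like `{ id, version, rules: [...] }` and that tuples are sorted and joined with `|` before hashing; the real separator, ordering, and hash width may differ. `fnv1a32` and `rulesFingerprint` are hypothetical names.

```javascript
// 32-bit FNV-1a: offset basis 0x811c9dc5, prime 0x01000193.
// Hashes UTF-16 code units, which matches byte-wise FNV-1a for ASCII input.
function fnv1a32(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply modulo 2^32
  }
  return hash >>> 0;
}

// Fingerprint of the selected packs as an 8-digit hex string.
function rulesFingerprint(packs) {
  const tuples = packs
    .map(p => `${p.id}@${p.version}:${(p.rules || []).length}`)
    .sort(); // assumed: sorting makes the fingerprint independent of pack load order
  return fnv1a32(tuples.join('|')).toString(16).padStart(8, '0');
}
```

FNV-1a is not cryptographic, but for detecting "the rule set changed between these two runs" a cheap, dependency-free hash that runs identically in the browser and Node is a reasonable fit.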
Comment thread scripts/rules-report.mjs
```javascript
lines.push('<!-- Auto-generated by scripts/rules-report.mjs. Do not edit by hand. -->');
lines.push('# Plan-Examiner Rule-Pack Inventory');
lines.push('');
lines.push(`Generated: ${new Date().toISOString().slice(0, 10)} `);
```
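
Because the report embeds `Generated: <today>`, a byte-for-byte `--check` would see a diff on any day after the file was generated, unless the comparison skips that line or the date comes from something stable. The excerpt does not show how `--check` handles this. A minimal sketch of the first option, with `normalizeReport` and `isReportStale` as hypothetical helpers that ignore `Generated:` lines when comparing the committed and regenerated text:

```javascript
// Drop the volatile date line (and normalize line endings) before comparing.
function normalizeReport(text) {
  return text
    .replace(/\r\n/g, '\n')
    .split('\n')
    .filter(line => !line.startsWith('Generated: '))
    .join('\n');
}

// True when the inventory content differs, regardless of the generation date.
function isReportStale(committed, regenerated) {
  return normalizeReport(committed) !== normalizeReport(regenerated);
}
```

In a `--check` mode, a script could read docs/RULES.md, regenerate the text in memory, and exit non-zero only when `isReportStale` returns true.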

Labels

None yet

Projects

None yet

3 participants