Reliability hardening: self-test harness, integration tests, SLA budgets, diagnostics #17
Merged
…ontracts Agent-Logs-Url: https://github.com/DaScient-Intelligence/Plan-Examiner/sessions/594db8e5-858f-4cf8-ba13-0eb0a71df477 Co-authored-by: DaScient <25983786+DaScient@users.noreply.github.com>
…atus, health workflow Agent-Logs-Url: https://github.com/DaScient-Intelligence/Plan-Examiner/sessions/594db8e5-858f-4cf8-ba13-0eb0a71df477 Co-authored-by: DaScient <25983786+DaScient@users.noreply.github.com>
Copilot created this pull request from a session on behalf of DaScient on May 6, 2026 02:51.
Pull request overview
This PR adds an end-to-end self-test/diagnostics harness and supporting CI/ops tooling to prove the scanner pipeline (extract → select packs → evaluate → score) is functioning deterministically on demand (in-app, in CI, and via scheduled health checks).
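Proving the pipeline "on demand" comes down to comparing each fixture run against stored golden expectations. The kind of expectations meant here are must-flag and must-not-flag rule ids plus a score band. The sketch below is minimal and rests on assumptions:

- `checkAgainstGolden` and the field names `mustFlag`, `mustNotFlag`, `scoreBand`, `ruleId` and `findings` are hypothetical.
- The real `expected.json` schema and `PE.SelfTest` internals may differ.
- Count-band and fact-key checks are omitted.

```javascript
// Hypothetical shapes (the real expected.json schema may differ):
//   result:   { score: number, findings: [{ ruleId: string, status: 'PASS'|'REVIEW'|'FLAGGED' }] }
//   expected: { mustFlag?: string[], mustNotFlag?: string[], scoreBand?: [min, max] }
function checkAgainstGolden(result, expected) {
  const failures = [];
  // Collect the ids of every rule that actually produced a FLAGGED finding.
  const flagged = new Set(
    result.findings.filter((f) => f.status === 'FLAGGED').map((f) => f.ruleId)
  );

  for (const id of expected.mustFlag || []) {
    if (!flagged.has(id)) failures.push(`expected ${id} to be FLAGGED`);
  }
  for (const id of expected.mustNotFlag || []) {
    if (flagged.has(id)) failures.push(`expected ${id} NOT to be FLAGGED`);
  }
  if (expected.scoreBand) {
    const [min, max] = expected.scoreBand;
    if (result.score < min || result.score > max) {
      failures.push(`score ${result.score} outside band [${min}, ${max}]`);
    }
  }
  return { pass: failures.length === 0, failures };
}

// Usage with hypothetical rule ids:
const result = {
  score: 62,
  findings: [
    { ruleId: 'example-door-width', status: 'FLAGGED' },
    { ruleId: 'example-sprinkler', status: 'PASS' },
  ],
};
console.log(checkAgainstGolden(result, {
  mustFlag: ['example-door-width'],
  mustNotFlag: ['example-sprinkler'],
  scoreBand: [40, 80],
}));
```

Returning every failure, rather than stopping at the first, gives a diagnostics panel a complete list to render.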
Changes:
- Adds `PE.SelfTest` plus bundled DXF fixtures and golden expectations, and integrates it into the UI (“Run Diagnostics”) and Node (`scripts/status.mjs`).
- Introduces new integration/perf/crash-resilience tests and CI gates (rules schema validation + rules inventory freshness).
- Hardens pipeline observability with SLA budgets, structured pipeline contracts, result stamping/versioning, and diagnostic bundle export helpers.
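Per-step SLA budgets that warn on a breach, rather than fail the run, can be sketched as a timing wrapper. The budgets are the ones this PR lists: ingest 15 s, extract 8 s, evaluate 2 s, total 30 s. Two parts of the sketch are assumptions:

- `timedStep` and the `SLA_MS` object shape are hypothetical; `PE.Pipeline.SLA` may be structured differently.
- The warning payload is invented for illustration.

```javascript
// Per-step budgets in milliseconds, from the PR description. The object shape
// is an assumption; PE.Pipeline.SLA may be structured differently.
const SLA_MS = { ingest: 15000, extract: 8000, evaluate: 2000, total: 30000 };

// Run one pipeline step, measure its wall-clock time, and report a WARN-level
// event (instead of failing the run) when the step exceeds its budget.
async function timedStep(step, fn, { budgets = SLA_MS, warn = console.warn } = {}) {
  const t0 = performance.now();
  try {
    return await fn();
  } finally {
    // Runs whether fn resolved or threw, so failing steps are timed too.
    const ms = performance.now() - t0;
    const budgetMs = budgets[step];
    if (budgetMs !== undefined && ms > budgetMs) {
      // Hypothetical event shape for the WARN entry.
      warn({ level: 'WARN', event: 'sla_breach', step, ms: Math.round(ms), budgetMs });
    }
  }
}

// Usage (extractFacts is a hypothetical stage function):
//   const facts = await timedStep('extract', () => extractFacts(file));
```

For both per-step and end-to-end checks, wrap each stage in its own `timedStep` call and the whole run in `timedStep('total', …)`.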
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/rules.schema.test.js | Adds a Node test that gates scripts/validate-rules.mjs via npm test. |
| tests/pipeline.perf.test.js | Enforces SLA-style performance ceilings using self-test fixtures + targeted checks. |
| tests/integration.pipeline.test.js | True end-to-end integration test wiring real extractors → rule engine → score against fixtures. |
| tests/extractors.crash.test.js | Adds crash-resilience matrix for extractors and rule engine output domain. |
| sw.js | Bumps service worker cache version and expands pre-cache list to include self-test + more packs. |
| scripts/status.mjs | Adds a headless health check that runs PE.SelfTest and writes status.json. |
| scripts/rules-report.mjs | Generates/validates docs/RULES.md rule-pack inventory (CI freshness check). |
| package.json | Adds new scripts: validate rules, generate/check rules report, and status. |
| index.html | Adds “Run Diagnostics” UI and loads self-test.js. |
| docs/RULES.md | Adds auto-generated rule pack inventory documentation. |
| assets/js/utils/log.js | Adds entriesForRun() and exportRun() for run-scoped diagnostic bundles. |
| assets/js/app.js | Wires the in-app “Run Diagnostics” button to PE.SelfTest.run(). |
| assets/js/agent/self-test.js | Implements the self-test harness runnable in browser + Node. |
| assets/js/agent/rule-engine.js | Adds per-evaluate context memoization to reduce redundant context derivation. |
| assets/js/agent/pipeline.js | Adds SLA budgets, pipeline contract assertions, result stamping/versioning, and rules fingerprinting. |
| assets/data/fixtures/selftest/clean-office.dxf | Adds compliant self-test DXF fixture. |
| assets/data/fixtures/selftest/non-compliant-assembly.dxf | Adds non-compliant DXF fixture to prove FLAGGED findings occur. |
| assets/data/fixtures/selftest/sparse-warehouse.dxf | Adds sparse DXF fixture to prove missing evidence yields REVIEW outcomes. |
| assets/data/fixtures/selftest/expected.json | Adds golden expectations for fixtures (bands, must-flag, must-not-flag, etc.). |
| .gitignore | Ignores generated status.json. |
| .github/workflows/health.yml | Adds scheduled workflow running scripts/status.mjs and uploading status.json. |
| .github/workflows/ci.yml | Adds CI step to ensure docs/RULES.md is up-to-date (--check). |
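Per the `rule-engine.js` row above, the engine now memoizes context derivation for the duration of each `evaluate()` call. The sketch below shows that pattern under stated assumptions:

- `evaluate(rules, facts)` is synchronous.
- Each rule exposes a `check(facts, ctx)` function.
- `buildContext`, `createEngine` and both signatures are hypothetical stand-ins for `_buildContext` and the real engine.

```javascript
// Hypothetical stand-in for _buildContext; the real occupancy / sprinklered /
// construction derivation is not shown in the PR. buildCount only exists to
// make the memoization observable.
let buildCount = 0;
function buildContext(facts) {
  buildCount += 1;
  return {
    occupancy: facts.occupancy || 'unknown',
    sprinklered: Boolean(facts.sprinklered),
    construction: facts.constructionType || 'unknown',
  };
}

function createEngine() {
  let ctxCache = null; // valid only for the duration of one evaluate() call

  function contextFor(facts) {
    if (ctxCache === null) ctxCache = buildContext(facts);
    return ctxCache;
  }

  function evaluate(rules, facts) {
    ctxCache = null; // reset at the call boundary so stale facts never leak in
    try {
      // Every rule shares the same derived context within this call.
      return rules.map((rule) => rule.check(facts, contextFor(facts)));
    } finally {
      ctxCache = null;
    }
  }

  return { evaluate };
}
```

The shared cache is safe only because `evaluate()` never yields. An `await` inside it would let a concurrent call reset the cache mid-evaluation.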
Comments suppressed due to low confidence (1)
sw.js:12
- `STATIC_ASSETS` uses absolute (leading `/`) URLs. On GitHub Pages project sites (served under `/<repo>/`), these requests resolve against the origin root (e.g. `/assets/...`) and will 404. Pre-caching then fails and offline mode is ineffective. Consider generating URLs relative to the service worker scope, e.g. omit the leading `/`, or build them with `new URL('assets/...', self.registration.scope)`.
```javascript
var STATIC_ASSETS = [
  '/',
  '/index.html',
  '/assets/css/styles.css',
  '/assets/js/app.js',
  '/assets/js/agent/rule-engine.js',
  // … remaining entries not shown in the review excerpt
];
```
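The reviewer's suggestion can be sketched as a helper that strips the leading `/` from each entry and resolves it against the registration scope with the standard `URL` constructor. Notes on the sketch:

- `scopedAssetUrls` and `CACHE_NAME` are hypothetical names.
- The install handler in the comment uses the standard Cache API (`caches.open`, `cache.addAll`).

```javascript
// Resolve root-relative pre-cache paths against the service worker scope, so
// '/assets/js/app.js' becomes '<scope>assets/js/app.js' and '/' becomes the
// scope itself. Assumes the scope ends with '/', as registration scopes do.
function scopedAssetUrls(scope, paths) {
  return paths.map((p) => new URL(p.replace(/^\/+/, ''), scope).href);
}

// Inside sw.js this would run in the install handler, e.g.:
//   self.addEventListener('install', (event) => {
//     const urls = scopedAssetUrls(self.registration.scope, STATIC_ASSETS);
//     event.waitUntil(caches.open(CACHE_NAME).then((cache) => cache.addAll(urls)));
//   });
// CACHE_NAME stands in for whatever cache-name constant sw.js already uses.

const scope = 'https://example.github.io/Plan-Examiner/';
console.log(scopedAssetUrls(scope, ['/', '/index.html', '/assets/js/app.js']));
```

The same list then works on a user site served at `/` and on a project site served at `/<repo>/`, because only the scope changes.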
Comment on lines +266 to +267

```javascript
 * No raw file contents, no API keys (already redacted by _redactString
 * applied at record time). Safe to share.
```
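The sketch below illustrates run-scoped export in the spirit of `entriesForRun()` / `exportRun(runId)`. It assumes:

- Every log entry carries a `runId`.
- Entries were already redacted when recorded, as the doc comment states.
- The entry fields, `record` and `BUNDLE_SCHEMA_VERSION` are placeholders, not the actual `PE.Log` format.

```javascript
// Hypothetical in-memory log. Per the doc comment, values are already redacted
// (by _redactString) at record time, so export performs no redaction itself.
const entries = [];

function record(runId, level, msg, data = {}) {
  entries.push({ runId, ts: new Date().toISOString(), level, msg, data });
}

// All entries belonging to one pipeline run, in recording order.
function entriesForRun(runId) {
  return entries.filter((e) => e.runId === runId);
}

// Placeholder schema version; bump it whenever the bundle layout changes.
const BUNDLE_SCHEMA_VERSION = 1;

// Build a self-describing JSON bundle for one run, suitable for attaching to
// a support ticket.
function exportRun(runId) {
  const runEntries = entriesForRun(runId);
  return JSON.stringify(
    {
      schemaVersion: BUNDLE_SCHEMA_VERSION,
      runId,
      exportedAt: new Date().toISOString(),
      entryCount: runEntries.length,
      entries: runEntries,
    },
    null,
    2
  );
}
```

A version number inside the bundle lets a support tool reject or migrate old exports instead of misreading them.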
Comment on lines 424 to 430

```javascript
sT0 = _now();
emit('select', 'running', 'Loading rule packs for ' + (formData.buildingCode || 'IBC 2021') + '…');
_stepLog('select', 'running', 'loading packs', { buildingCode: formData.buildingCode });
var packs;
try {
  // Note: this fallback ('2024 IBC') differs from the one in the progress message above ('IBC 2021').
  packs = await selectPacks(formData.buildingCode || '2024 IBC', facts.buildingType);
  result.packs = packs;
  // … (excerpt ends here)
```
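In the excerpt above, the progress message falls back to `'IBC 2021'` while `selectPacks` falls back to `'2024 IBC'`. A form without a building code can therefore report one code and load packs for another. One way to rule that out is to resolve the fallback once. Assumptions:

- `DEFAULT_BUILDING_CODE` and `resolveBuildingCode` are hypothetical.
- The PR does not state which default is intended.

```javascript
// Resolve the building-code fallback once so the progress message and the
// pack lookup cannot disagree. DEFAULT_BUILDING_CODE is a placeholder value;
// the PR does not say which default ('IBC 2021' or '2024 IBC') is intended.
const DEFAULT_BUILDING_CODE = '2024 IBC';

function resolveBuildingCode(formData) {
  // Same `||` semantics as the original: any falsy value uses the default.
  return (formData && formData.buildingCode) || DEFAULT_BUILDING_CODE;
}

// In the select step this would be used as, e.g.:
//   var buildingCode = resolveBuildingCode(formData);
//   emit('select', 'running', 'Loading rule packs for ' + buildingCode + '…');
//   packs = await selectPacks(buildingCode, facts.buildingType);
```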
Comment on lines +267 to +272

```javascript
var result = {
  facts: {}, packs: [], findings: [], score: 0, summary: '', correctionLetter: '',
  runId: _newRunId(),
  engineVersion: ENGINE_VERSION,
  rulesVersion: null, // filled in once packs are selected
  startedAt: new Date().toISOString()
  // … (excerpt ends here)
```
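`rulesVersion` is described as an FNV-1a fingerprint of pack `id@version:ruleCount` tuples. The sketch below uses the standard 32-bit FNV-1a parameters: offset basis `0x811c9dc5` and prime `0x01000193`. It may not match what `pipeline.js` does, because of these assumptions:

- The pack shape is `{ id, version, rules }`.
- Tuples are sorted before hashing.
- Tuples are joined with `|`.

```javascript
// 32-bit FNV-1a over the UTF-8 bytes of a string.
function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of new TextEncoder().encode(str)) {
    hash ^= byte; // FNV-1a: XOR first...
    hash = Math.imul(hash, 0x01000193) >>> 0; // ...then multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0;
}

// Hypothetical rulesVersion: fingerprint the selected packs as
// `id@version:ruleCount` tuples. Sorting and the '|' separator are choices
// made for this sketch so the result does not depend on pack load order.
function rulesFingerprint(packs) {
  const tuples = packs
    .map((p) => `${p.id}@${p.version}:${(p.rules || []).length}`)
    .sort();
  return fnv1a32(tuples.join('|')).toString(16).padStart(8, '0');
}
```

Bumping a pack's version or adding or removing a rule changes the fingerprint, so two results with the same `rulesVersion` were very likely scored against the same rule set. Drop the sort if load order is meant to be part of the version.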
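```javascript
lines.push('<!-- Auto-generated by scripts/rules-report.mjs. Do not edit by hand. -->');
lines.push('# Plan-Examiner Rule-Pack Inventory');
lines.push('');
// Note: if --check compares output byte-for-byte, this date changes daily, so
// docs/RULES.md could be reported stale even when no rules changed.
lines.push(`Generated: ${new Date().toISOString().slice(0, 10)} `);
```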
Summary
Closes the gap between "unit tests pass" and "I can prove the scanner is functioning on demand." Adds an end-to-end self-test runnable from the UI and CI, pipeline contracts/SLAs, result versioning, and operational tooling.
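The pipeline contracts this PR adds work as stage-boundary checks. Each check throws a structured error with a stable `code` (`PIPELINE_CONTRACT`) instead of letting malformed data flow downstream. A companion check keeps finding statuses inside the closed domain `{PASS, REVIEW, FLAGGED}`. The sketch below is minimal:

- `PipelineContractError`, `assertExtractionShape`, `assertFindingsShape` and the `ruleId` field are hypothetical.
- They stand in for `_assertExtractionShape` / `_assertFindingsShape`, whose real checks are not shown in the PR.

```javascript
// Structured error so callers can branch on `code` and `stage` instead of
// parsing message strings.
class PipelineContractError extends Error {
  constructor(stage, detail) {
    super(`PIPELINE_CONTRACT: ${stage}: ${detail}`);
    this.name = 'PipelineContractError';
    this.code = 'PIPELINE_CONTRACT';
    this.stage = stage;
    this.detail = detail;
  }
}

// The only statuses a finding may carry.
const STATUSES = new Set(['PASS', 'REVIEW', 'FLAGGED']);

// Hypothetical analogue of _assertExtractionShape: facts must be a plain object.
function assertExtractionShape(facts) {
  if (facts === null || typeof facts !== 'object' || Array.isArray(facts)) {
    throw new PipelineContractError('extract', 'facts must be an object');
  }
  return facts;
}

// Hypothetical analogue of _assertFindingsShape: every finding needs a rule id
// and a status from the closed set above.
function assertFindingsShape(findings) {
  if (!Array.isArray(findings)) {
    throw new PipelineContractError('evaluate', 'findings must be an array');
  }
  findings.forEach((f, i) => {
    if (!f || typeof f.ruleId !== 'string') {
      throw new PipelineContractError('evaluate', `finding[${i}] has no ruleId`);
    }
    if (!STATUSES.has(f.status)) {
      throw new PipelineContractError('evaluate', `finding[${i}] has status ${String(f.status)}`);
    }
  });
  return findings;
}
```

Failing loudly at the boundary makes a bad extractor or rule show up in the stage that produced the data, not as a wrong score later.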
- `assets/js/agent/self-test.js` (`PE.SelfTest.run()`) + 3 bundled DXF fixtures (clean / non-compliant / sparse) and a golden `expected.json` (fact keys, must-flag/must-not-flag rule ids, count bands, score band).
- `tests/integration.pipeline.test.js` drives the real `PE.Extractors` → `selectPacks` → `RuleEngine` → `score` chain with no mocks. It asserts that the non-compliant fixture actually produces FLAGGED findings (door width / egress).
- `_assertExtractionShape` / `_assertFindingsShape` throw structured `PIPELINE_CONTRACT` errors instead of corrupting downstream state.
- `PE.Pipeline.SLA` sets per-step budgets (ingest 15 s / extract 8 s / evaluate 2 s / total 30 s) with `WARN` on breach; enforced by `tests/pipeline.perf.test.js`.
- Each `result` is stamped with `runId`, `engineVersion`, `rulesVersion` (an FNV-1a fingerprint of pack `id@version:ruleCount` tuples), and `startedAt` / `completedAt` / `totalMs`.
- `_buildContext` is cached per `evaluate()` call (reset at the call boundary), so derived occupancy/sprinklered/construction context isn't recomputed for every rule.
- `tests/extractors.crash.test.js` covers empty/garbage/non-DXF input, unknown `check_fn`, and null parameters. The rule engine is asserted never to emit a status outside `{PASS, REVIEW, FLAGGED}`.
- `PE.Log.exportRun(runId)` + `entriesForRun()` produce a schema-versioned, redacted JSON bundle for support tickets.
- Rules schema validation is gated in `npm test`: `tests/rules.schema.test.js` invokes `scripts/validate-rules.mjs`.
- `scripts/rules-report.mjs` regenerates `docs/RULES.md` (23 packs / 153 rules); CI fails if it is stale (`--check`).
- `scripts/status.mjs` runs the same self-test in Node and writes `status.json`; `.github/workflows/health.yml` runs it daily.
- Service worker cache bumped to `plan-examiner-v5`. Pre-cache extended from 3 packs to all 22 active packs + self-test fixtures + `self-test.js`.
- New `package.json` scripts: `validate:rules`, `rules:report`, `rules:report:check`, `status`.

Out of scope (deferred per the original plan's "ship last"): PDF.js / Tesseract worker-ization. Per-pack golden `.test.js` files are subsumed by the cross-pack self-test fixtures, which exercise all 22 packs / 153 rules per run.

Type of Change
Rule Pack Changes (if applicable)
- (… `docs/RULES.md` reflects current inventory)

Testing

- (… `python3 -c "import json; json.load(open('assets/data/rules/your-pack.json'))"`)
- `npm test`: 96/96 passing (was 79; +17 new).
- `node scripts/validate-rules.mjs`: 0 errors.
- `node scripts/status.mjs`: 3/3 fixtures pass.

Checklist
Screenshots (if applicable)
The new "Run Diagnostics" control lives in the AI Settings modal below "Test connection"; output is rendered in a monospace results panel that turns green on full pass / red on any failure.
Related Issues