feat(research): gap validation protocol — automated sweep + decision matrix#3734
ryanklee wants to merge 2 commits into
…matrix Restore Phase 1 gap registry files (accidentally deleted in #3580) and build Phase 2 validation tooling: CLI `gap-validate.py` with 4-source automated sweep (OpenAlex patents, GitHub code search, Semantic Scholar, Papers with Code), 6-signal decision matrix (4-of-6 = high confidence), Phase 2 community probe scaffolding (forum + cold-email templates), and Phase 3 practitioner observation guide (7-question contextual inquiry protocol). Batch-validated GAP-001, GAP-003, GAP-007 — all high_confidence_novel. 16 tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 Walkthrough
This PR establishes a research gap validation platform with a curated registry of 18 research gaps, automated prior-art detection across four sources, scaffolding generators for outreach campaigns, practitioner observation protocols, and supporting utilities including a gap decay report tool and comprehensive test coverage.
Changes
Gap Validation and Registry System
Gap Decay Report Utility
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (2 warnings)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a0c13c2837
        text=True,
        timeout=SWEEP_TIMEOUT,
    )
    if proc.returncode == 0 and proc.stdout.strip():
Fail code-search signal when gh query errors
This branch only handles proc.returncode == 0; any non-zero gh api exit (auth missing, rate-limit, HTTP error) is silently treated as “no results,” which then falls through to a novel vote when unique_repos is empty. That can produce false novelty decisions from infrastructure/auth failures rather than real prior-art absence, so non-zero exits should return an inconclusive/error signal instead of being scored.
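A minimal sketch of the behavior this comment asks for, assuming the sweep reduces each source to a single vote. The `SweepSignal` type, the vote names `prior_art` and `inconclusive`, and both helper functions are hypothetical and not taken from `gap-validate.py`; only the `proc.returncode` / `proc.stdout` check mirrors the reviewed code.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class SweepSignal:
    """Hypothetical result type: one vote per source plus error context."""
    vote: str  # "novel", "prior_art", or "inconclusive" (names are assumptions)
    detail: str = ""


def classify_command_result(returncode: int, stdout: str, stderr: str) -> SweepSignal:
    # A non-zero exit (auth missing, rate limit, HTTP error) says nothing
    # about prior art, so it must not be scored as "novel".
    if returncode != 0:
        return SweepSignal("inconclusive", f"exit {returncode}: {stderr.strip()}")
    if stdout.strip():
        return SweepSignal("prior_art", "query returned results")
    # Only a successful query with empty output counts as evidence of novelty.
    return SweepSignal("novel", "query succeeded with no results")


def run_sweep_command(argv: list, timeout: float = 30.0) -> SweepSignal:
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing binary or timeout: also an infrastructure failure.
        return SweepSignal("inconclusive", f"could not run command: {exc}")
    return classify_command_result(proc.returncode, proc.stdout, proc.stderr)
```

Keeping the classification in a pure function lets the three outcomes be unit-tested without invoking `gh`.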
    if resp.status_code == 200:
        data = resp.json()
Treat non-200 API responses as inconclusive
The sweeper only processes status 200 and otherwise continues without recording an error, so 429/403/5xx responses are interpreted like empty search results and can be scored as novel. This can systematically inflate novelty confidence during transient API failures or throttling; non-200 responses should be surfaced as inconclusive/error signals instead of silently ignored.
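One way to express this rule, sketched under the assumption that each query can be summarized as a `(status_code, result_count)` pair. The function name and the vote strings are hypothetical; the actual sweeper may aggregate differently.

```python
from typing import Iterable, Tuple


def score_http_sweep(responses: Iterable[Tuple[int, int]]) -> str:
    """Score one source from (status_code, result_count) pairs, one per query."""
    attempted = failures = hits = 0
    for status_code, result_count in responses:
        attempted += 1
        if status_code != 200:
            # 429/403/5xx: the query did not really run, so record a failure
            # instead of treating it as an empty result set.
            failures += 1
        else:
            hits += result_count
    if hits > 0:
        return "prior_art"
    if attempted == 0 or failures > 0:
        # No trustworthy negative evidence: refuse to vote "novel".
        return "inconclusive"
    return "novel"
```

With this ordering, a confirmed hit still counts even if other queries were throttled, while a clean "novel" vote requires every query to have succeeded.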
  - gap_id: GAP-002
    title: Epistemic quality infrastructure
    request_ref: REQ-20260512-epistemic-quality-infrastructure
    disposition: execute
Enforce single active execute gap in registry
The registry declares wip_limit: 1 and the schema states exactly one gap may be disposition: execute, but this entry introduces a second execute gap alongside GAP-001. That breaks the documented invariant and makes downstream tooling/reporting ambiguous about which gap is the single active execution target.
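The invariant could be checked mechanically, for example in a test or a pre-merge script. The sketch below assumes the registry has already been loaded into a dict with the `wip_limit`, `gaps`, `gap_id`, and `disposition` keys seen in this PR; the function name is hypothetical.

```python
def find_wip_violations(registry: dict) -> list:
    """Return the IDs of all `execute` gaps if they exceed `wip_limit`, else []."""
    limit = registry.get("wip_limit", 1)
    active = [
        gap.get("gap_id", "<missing gap_id>")
        for gap in registry.get("gaps", [])
        if gap.get("disposition") == "execute"
    ]
    # Report every active gap so the operator can choose which one to demote.
    return active if len(active) > limit else []
```

For the registry in this PR, such a check would be expected to report both GAP-001 and GAP-002.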
    except httpx.HTTPError:
        pass
Stop swallowing patent sweep transport errors
This except block drops httpx transport failures and keeps scoring as if the query simply found no prior art. If one or more patent lookups fails due to timeout/DNS/network issues, the function can still fall through to a novel vote, which misclassifies infrastructure failure as evidence of novelty.
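A sketch of recording transport failures rather than passing over them. To stay self-contained it takes the fetch function and the exception types as parameters; in the script the caught type would presumably be `httpx.HTTPError`, and `sweep_queries`, `fetch`, and the returned dict shape are all hypothetical.

```python
from typing import Callable, Iterable, Tuple, Type


def sweep_queries(
    queries: Iterable[str],
    fetch: Callable[[str], int],
    transport_errors: Tuple[Type[BaseException], ...] = (OSError,),
) -> dict:
    """Run each query via `fetch` (returns a hit count) and keep error context."""
    errors = []
    hits = 0
    for query in queries:
        try:
            hits += fetch(query)
        except transport_errors as exc:
            # Record the failure instead of `pass`, so it can block a "novel" vote.
            errors.append(f"{query}: {exc}")
    if hits > 0:
        vote = "prior_art"
    elif errors:
        vote = "inconclusive"
    else:
        vote = "novel"
    return {"vote": vote, "hits": hits, "errors": errors}
```

The recorded `errors` list doubles as the error context that a reviewer would need to decide whether to rerun the sweep.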
    item = json.loads(line)
    all_results.append(item)
    source_urls.append(
        f"https://github.com/{item['repo']}/blob/main/{item['path']}"
Build GitHub evidence URLs from actual branch
Evidence links are hardcoded to /blob/main/, but many repositories use a different default branch (for example master or trunk). In those cases the stored URLs 404, so reviewers cannot verify the supposed prior-art evidence from sweep output.
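A possible shape for the URL builder, assuming each result item carries `repo` and `path` as in the snippet above and may optionally carry a ref (branch name or commit SHA). The `evidence_url` helper is hypothetical. The fallback relies on GitHub web URLs accepting `HEAD` as a ref that resolves to the default branch, which is my understanding of current behavior and worth verifying before relying on it.

```python
from typing import Optional
from urllib.parse import quote


def evidence_url(repo: str, path: str, ref: Optional[str] = None) -> str:
    """Build a blob URL without assuming the default branch is `main`."""
    # Prefer an explicit ref (ideally a commit SHA, which gives a permalink).
    # Fall back to HEAD, assumed here to resolve to the default branch.
    ref_part = quote(ref or "HEAD", safe="/")
    # quote() keeps "/" by default, so nested paths stay intact while
    # spaces and other unsafe characters are percent-encoded.
    return f"https://github.com/{repo}/blob/{ref_part}/{quote(path)}"
```

Pinning to a commit SHA when one is available also keeps the evidence stable if the file later changes or moves.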
    "filter": "type:patent",
    "per_page": 10,
Query OpenAlex with supported work types only
This request filters on type:patent, but OpenAlex’s documented work types do not include patent (they include values like article, preprint, report, standard, etc.). As a result, the patent sweep query is invalid and cannot return intended results, biasing the patents signal toward false novelty/inconclusive outcomes.
Actionable comments posted: 6
🧹 Nitpick comments (2)
scripts/gap-validate.py (2)
517-608: ⚡ Quick win
Avoid duplicating observation-guide content in code and docs.
The full guide is embedded here and also maintained in docs/research/gap-validation-observation-guide.md, creating drift risk. Load from the canonical doc file or keep one source-of-truth template.
Also applies to: 611-613
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/gap-validate.py` around lines 517 - 608, The OBSERVATION_GUIDE_CONTENT constant duplicates the canonical doc; replace the hardcoded multi-line string by loading the contents of the canonical markdown (docs/research/gap-validation-observation-guide.md) at runtime (or during script initialization) and assign that text to OBSERVATION_GUIDE_CONTENT (with a clear fallback that logs an error if the doc is missing). Locate OBSERVATION_GUIDE_CONTENT in scripts/gap-validate.py, implement a small helper (e.g., read_observation_guide or similar) to read the file once, and ensure any tests or downstream consumers still reference the same symbol.
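A minimal sketch of the helper described in the prompt above. The name `read_observation_guide` comes from the prompt's own suggestion; the signature, the `fallback` parameter, and the logging setup are assumptions.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def read_observation_guide(doc_path: Path, fallback: str = "") -> str:
    """Load the guide from the canonical markdown file, logging if it is missing."""
    try:
        return doc_path.read_text(encoding="utf-8")
    except OSError as exc:
        # Covers a missing file and permission problems alike.
        logger.error("Observation guide not readable at %s: %s", doc_path, exc)
        return fallback
```

The module could then assign `OBSERVATION_GUIDE_CONTENT = read_observation_guide(...)` once at import time, so existing tests and consumers keep referencing the same symbol.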
32-33: ⚡ Quick win
Make output directory configurable instead of hardcoding a personal path.
VAULT_OUTPUT_DIR points to a user-specific Documents path, which is brittle for CI and other operators. Prefer a CLI flag/env var with a sensible repo-local default.
Also applies to: 399-400, 505-507
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/gap-validate.py` around lines 32 - 33, Replace the hardcoded personal path assigned to VAULT_OUTPUT_DIR with a configurable option: read from an environment variable (e.g., GAP_VAULT_OUTPUT_DIR) or a CLI flag, falling back to a sensible repo-local default such as REPO_ROOT / "output" or REPO_ROOT / "vault_output"; update all other occurrences that use the same personal path (the other VAULT_OUTPUT_DIR assignments/usages around the later blocks) so they reference the new configurable variable or flag parsing logic instead of the hardcoded Path.home() location, and ensure any code that writes output creates the directory if missing.
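The resolution order suggested above (CLI flag, then environment variable, then a repo-local default) might look like the following sketch. `GAP_VAULT_OUTPUT_DIR` and the `output` directory name are the examples given in the prompt; `resolve_output_dir` and its parameters are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_output_dir(
    cli_value: Optional[str],
    repo_root: Path,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick the output directory: CLI flag, then env var, then repo-local default."""
    env = os.environ if env is None else env
    raw = cli_value or env.get("GAP_VAULT_OUTPUT_DIR")
    if raw:
        # expanduser() lets operators still point at a path under "~" explicitly.
        return Path(raw).expanduser()
    return repo_root / "output"


def ensure_output_dir(path: Path) -> Path:
    # Create the directory (and any parents) if it does not exist yet.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Passing `env` explicitly keeps the function testable without mutating the process environment.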
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/research/gap-portfolio-registry.yaml`:
- Line 4: The WIP invariant fails because wip_limit: 1 conflicts with two
records marked disposition: execute (GAP-001 and GAP-002 — also applies to the
entries at 18-18 and 30-30); fix by either increasing the wip_limit value to at
least 2 (update the wip_limit key) or changing one or more of those records'
disposition from execute to a non-active state (e.g., backlog/defer) so the
number of execute dispositions does not exceed wip_limit; update the wip_limit
or the disposition fields for GAP-001/GAP-002 (and the entries at 18-18 and
30-30) accordingly.
In `@scripts/gap-decay-report.py`:
- Around line 17-18: The load_registry function currently calls path.read_text
and yaml.safe_load without error handling; wrap the body of load_registry (the
function that takes path: Path = REGISTRY) in a try/except that catches file I/O
errors (FileNotFoundError, PermissionError, UnicodeDecodeError) and
yaml.YAMLError, log or raise a clear, user-friendly message that includes the
path and the underlying exception, and either return a sensible default (e.g.,
empty dict) or re-raise a custom exception to preserve upstream behavior; ensure
you reference REGISTRY and Path in the error text so it's easy to locate the
problematic file.
- Around line 34-36: The code directly accesses gap["gap_id"], gap["title"], and
gap["disposition"] which will raise KeyError for malformed records; update the
gap-processing logic (the block that builds the dict with
"gap_id"/"title"/"disposition") to validate presence of these keys before using
them—either use gap.get("gap_id")/get("title")/get("disposition") and detect
missing values, or explicitly check "gap_id" in gap etc., then log or raise a
clear error and skip the record (or provide a default) so the script won't crash
on missing fields.
- Line 28: The code calls datetime.fromisoformat(reviewed) to produce
reviewed_dt without validation; wrap that call in a try/except that catches
ValueError (and optionally TypeError) to handle invalid or empty last_reviewed
values, log a clear error mentioning the offending last_reviewed string and the
record identifier, and either skip that record or exit with a non-zero status
depending on desired behavior; update the code around reviewed_dt =
datetime.fromisoformat(reviewed) to perform this validation and error handling
(or use a safe parser like dateutil.parser.parse inside the same try/except) so
the script no longer crashes on malformed ISO dates.
In `@scripts/gap-validate.py`:
- Line 117: The current check using "if proc.returncode == 0 and
proc.stdout.strip()" silently treats failures or empty outputs as success/novel;
instead, treat any nonzero return code or empty stdout as an explicit
inconclusive result and attach the captured stderr/stdout as error context.
Update every similar branch (where you check proc.returncode == 0 or status_code
== 200) to: evaluate proc.returncode, proc.stdout, and proc.stderr, set the
result status to "inconclusive" when returncode != 0 or stdout is empty, and
include proc.stderr (and proc.stdout) in the returned/recorded error message so
external failures (auth/rate-limit/API errors) are not mis-scored as novel.
Ensure you apply this change for the occurrences around the symbols
proc.returncode, proc.stdout, proc.stderr and the analogous HTTP checks
(status_code) noted in the comment.
In `@tests/scripts/test_gap_validate.py`:
- Around line 140-143: The test test_guide_content_has_7_questions is counting
bold markers via content.count("**") instead of actual questions; update it to
scan OBSERVATION_GUIDE_CONTENT for real question items (e.g., use a regex or
line-based check against patterns like lines ending with '?' and/or lines
beginning with 'Q:' or a numbered question prefix) and count those matches, then
assert the count >= 7; modify the assertion in
test_guide_content_has_7_questions to use that more accurate question-match
logic against OBSERVATION_GUIDE_CONTENT.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 959703be-f7c1-42cd-a6c5-a34045c53ffc
📒 Files selected for processing (6)
docs/research/gap-portfolio-SCHEMA.md
docs/research/gap-portfolio-registry.yaml
docs/research/gap-validation-observation-guide.md
scripts/gap-decay-report.py
scripts/gap-validate.py
tests/scripts/test_gap_validate.py
    schema_version: 1
    registry_id: research-gap-portfolio-v1
    authority_case: CASE-20260509-RESEARCH-PO
    wip_limit: 1
WIP invariant is currently violated by two active execute gaps.
wip_limit: 1 conflicts with two records marked disposition: execute (GAP-001 and GAP-002). This breaks the registry contract and can invalidate downstream prioritization logic.
Also applies to: 18-18, 30-30
    def load_registry(path: Path = REGISTRY) -> dict:
        return yaml.safe_load(path.read_text(encoding="utf-8"))
Add error handling for file I/O and YAML parsing.
If the registry file doesn't exist or contains invalid YAML, the script will crash with an unhelpful stack trace. Wrapping in a try-except block would provide clearer error messages.
🛡️ Proposed fix to add error handling
     def load_registry(path: Path = REGISTRY) -> dict:
    +    try:
    -    return yaml.safe_load(path.read_text(encoding="utf-8"))
    +        return yaml.safe_load(path.read_text(encoding="utf-8"))
    +    except FileNotFoundError:
    +        print(f"Error: Registry file not found at {path}", file=sys.stderr)
    +        sys.exit(1)
    +    except yaml.YAMLError as e:
    +        print(f"Error: Invalid YAML in registry file: {e}", file=sys.stderr)
    +        sys.exit(1)
    +    except Exception as e:
    +        print(f"Error reading registry: {e}", file=sys.stderr)
    +        sys.exit(1)
    for gap in registry.get("gaps", []):
        halflife = gap.get("decay_rate_halflife_days", 365)
        reviewed = gap.get("last_reviewed", "2026-01-01")
        reviewed_dt = datetime.fromisoformat(reviewed)
Add error handling for date parsing.
If last_reviewed contains an invalid ISO date format, datetime.fromisoformat() will raise a ValueError and crash the script. Adding validation would provide clearer error messages and prevent script failure.
🛡️ Proposed fix to add date validation
    -        reviewed_dt = datetime.fromisoformat(reviewed)
    +        try:
    +            reviewed_dt = datetime.fromisoformat(reviewed)
    +        except ValueError:
    +            print(f"Warning: Invalid date format for {gap.get('gap_id', 'unknown')}, skipping", file=sys.stderr)
    +            continue
    "gap_id": gap["gap_id"],
    "title": gap["title"],
    "disposition": gap["disposition"],
Add validation for required gap fields.
Direct dictionary access without validation will raise KeyError if required fields (gap_id, title, disposition) are missing from a gap record. Consider validating these fields or using .get() with error handling.
🛡️ Proposed fix to validate required fields
    +        # Validate required fields
    +        required = ["gap_id", "title", "disposition"]
    +        missing = [f for f in required if f not in gap]
    +        if missing:
    +            print(f"Warning: Gap missing required fields {missing}, skipping", file=sys.stderr)
    +            continue
    +
             if expiry <= horizon:
                 expiring.append(
        text=True,
        timeout=SWEEP_TIMEOUT,
    )
    if proc.returncode == 0 and proc.stdout.strip():
External failures can be mis-scored as novel instead of inconclusive.
Several sweeps only handle success paths and otherwise continue with empty results, which can inflate novelty votes on auth/rate-limit/API failures. Treat non-200/nonzero responses as explicit inconclusive signals with captured error context.
    #!/bin/bash
    # Verify places where unsuccessful responses may be silently ignored.
    rg -n "returncode == 0|status_code == 200|except httpx.HTTPError|pass" scripts/gap-validate.py
Also applies to: 169-170, 227-228, 239-240, 305-306
    def test_guide_content_has_7_questions(self) -> None:
        content = gap_validate.OBSERVATION_GUIDE_CONTENT
        question_count = content.count("**")
        assert question_count >= 7
Question-count assertion is not actually counting questions.
On Line 142, content.count("**") counts bold markers, not question items, so this test can pass even when fewer than 7 questions exist.
Suggested fix
    +import re
     @@
     class TestObservationGuide:
         def test_guide_content_has_7_questions(self) -> None:
             content = gap_validate.OBSERVATION_GUIDE_CONTENT
    -        question_count = content.count("**")
    +        question_count = len(
    +            re.findall(r"(?m)^\s*(?:\d+\.|[-*])\s+.*\?\s*$", content)
    +        )
             assert question_count >= 7
Beta lane merge readiness check (2026-05-26T20:30Z)
All CI checks pass: test, lint, typecheck, security, rust-check, CodeQL (actions/c-cpp/js-ts/python/rust), pr-admission, authority-case-check, freeze-check, homage-visual-regression, web-build, vscode-build, secrets-scan, actionlint, review.
Only failing status: Merge state: UNSTABLE (non-required check failing).
Ready for merge queue pending autoqueue-admission resolution.
cc @ryanklee: needs manual merge queue enqueue or autoqueue-admission status investigation.
Governed stale-PR reconciliation note (task |
Summary
- scripts/gap-validate.py: automated sweep that scores research gaps against a decay/evidence/actionability rubric and produces a ranked decision matrix
- docs/research/gap-portfolio-registry.yaml: structured registry of research gaps with schema validation
- docs/research/gap-portfolio-SCHEMA.md: schema documentation for the registry format
- docs/research/gap-validation-observation-guide.md: observation protocol for validating gap claims
- scripts/gap-decay-report.py: decay monitoring for gap freshness
- tests/scripts/test_gap_validate.py: unit tests for the validation logic
Test plan
- uv run pytest tests/scripts/test_gap_validate.py -q passes
- uv run ruff check scripts/gap-validate.py scripts/gap-decay-report.py clean
- uv run python scripts/gap-validate.py --help shows usage
🤖 Generated with Claude Code