Skip to content

test(tdd): add failing security tests for SEC-01 through SEC-07 hardening#3290

Open
rysweet wants to merge 2 commits intomainfrom
tdd/security-hardening-3266-3282
Open

test(tdd): add failing security tests for SEC-01 through SEC-07 hardening#3290
rysweet wants to merge 2 commits intomainfrom
tdd/security-hardening-3266-3282

Conversation

@rysweet
Copy link
Owner

@rysweet rysweet commented Mar 18, 2026

Summary

TDD test files specifying the security hardening contract from the #3266/#3282 design specification. All tests intentionally fail against current code — they define the implementation target for the follow-up security PRs.

  • 86 tests collected | 81 fail | 4 pass (SEC-02 regression guards) | 1 skip

Test files added

File Security items covered Tests Status
tests/test_memory_auto_install_security.py SEC-02, SEC-04, SEC-07 14 3 pass, 10 fail, 1 skip
tests/recipes/test_run_recipe_by_name_working_dir.py SEC-03, SEC-05 33 all fail
tests/recipes/test_step_02b_output_validation.py SEC-01 39 all fail

What each set tests

SEC-01 (test_step_02b_output_validation.py) — validate_codebase_analysis() in new module src/amplihack/recipes/_validation.py:

  • Allowlist-only keys: {files, patterns, dependencies, entry_points, build_system, language, test_framework}
  • Max 64 KB JSON-serialised size
  • Max nesting depth 3
  • Rejects non-dict inputs, invalid JSON strings, empty dicts
  • Rejects XPiA probe keys (__system__, IMPORTANT)

SEC-03 (test_run_recipe_by_name_working_dir.py) — _validate_working_dir(path) helper:

  • Shell metacharacter rejection: ;, &, |, $, backtick, (, ), {, }, [, ], <, >, \\, ", '
  • Non-existent path rejection
  • File-instead-of-directory rejection
  • Returns pathlib.Path for valid directories
  • run_recipe_by_name calls validator before Rust runner

SEC-04 (test_memory_auto_install_security.py) — Hash verification:

  • --require-hashes flag in pip install command
  • src/amplihack/memory_auto_install_hashes.txt file existence and non-emptiness
  • Hash reference in pip command

SEC-05 (test_run_recipe_by_name_working_dir.py) — DeprecationWarning for unknown kwargs:

  • adapter="old" emits DeprecationWarning naming the kwarg
  • No warning for known params (user_context, dry_run, progress, etc.)

SEC-07 (test_memory_auto_install_security.py) — Error sanitization:

  • _sanitize_error() strips Unix and Windows absolute paths
  • _do_install calls _sanitize_error() on pip failure output

Test plan

  • Confirm all 81 tests fail on main (proving they are real TDD tests)
  • Implement SEC-07 (_sanitize_error) → 6 SEC-07 tests pass
  • Implement SEC-04 (hashes file + --require-hashes) → 4 SEC-04 tests pass
  • Implement SEC-03 (_validate_working_dir + integration into run_recipe_by_name) → 15 SEC-03 tests pass
  • Implement SEC-05 (DeprecationWarning in run_recipe_by_name) → 5 SEC-05 tests pass
  • Implement SEC-01 (validate_codebase_analysis in _validation.py) → 39 SEC-01 tests pass
  • All 86 tests green = implementation complete

🤖 Generated with Claude Code

Ubuntu and others added 2 commits March 18, 2026 01:58
…ning

Adds three TDD test files specifying the contract for unimplemented
security features from the #3266/#3282 design specification.

Tests are intentionally failing against current code:

tests/test_memory_auto_install_security.py
  - SEC-02: regression guard — sys.executable used in pip command (3 pass)
  - SEC-04: --require-hashes + hashes file + version pin (3 fail, 1 skip)
  - SEC-07: _sanitize_error() strips absolute paths from error output (6 fail)

tests/recipes/test_run_recipe_by_name_working_dir.py
  - SEC-03: _validate_working_dir() helper — metachar/nonexistent/file rejection (11 fail)
  - SEC-05: DeprecationWarning emitted for unknown **kwargs (5 fail)

tests/recipes/test_step_02b_output_validation.py
  - SEC-01: validate_codebase_analysis() — whitelist keys, 64KB limit,
            depth ≤ 3, type validation, empty dict rejection (39 fail)

Total: 86 collected | 81 fail | 4 pass | 1 skip

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

🤖 Auto-fixed version bump

The version in pyproject.toml has been automatically bumped to the next patch version.

If you need a minor or major version bump instead, please update pyproject.toml manually and push the change.

@github-actions
Copy link
Contributor

Repo Guardian - Passed

All 4 changed files were reviewed and no violations were found.

File Assessment
pyproject.toml Standard version bump — durable configuration file ✅
tests/recipes/test_run_recipe_by_name_working_dir.py TDD security tests for SEC-03/SEC-05 — permanent test suite ✅
tests/recipes/test_step_02b_output_validation.py TDD security tests for SEC-01 — permanent test suite ✅
tests/test_memory_auto_install_security.py TDD security tests for SEC-02/SEC-04/SEC-07 — permanent test suite ✅

No ephemeral content, meeting notes, one-off scripts, or temporal snapshots detected. All additions are durable security specifications appropriate for the repository.

Generated by Repo Guardian for issue #3290 ·

@github-actions
Copy link
Contributor

PR Triage Report

Risk: 🟡 Medium | Priority: High | Type: Test / Security (TDD foundation)

Summary

TDD test files specifying the security hardening contract for SEC-01 through SEC-07. 81 of 86 tests intentionally fail against current code — this is by design, establishing the implementation target for follow-up security PRs.

Coverage:

  • SEC-01: validate_codebase_analysis() — allowlist-only keys, size/depth limits, XPiA probe rejection
  • SEC-02/04/07: Hash verification (--require-hashes), error sanitization
  • SEC-03: _validate_working_dir() — shell metacharacter injection prevention
  • SEC-05: DeprecationWarning for unknown kwargs

CI Status

  • GitGuardian: ✅ Pass
  • Full CI suite: Not run (only GitGuardian)

Risk Factors

  1. Intentionally failing tests: Merging will cause CI test runs to report failures until implementation PRs (smart-orchestrator/default-workflow stalls at step-02b-analyze-codebase #3266/smart-orchestrator launch broken in Copilot CLI (API mismatch + PEP 668 auto-install) #3282 follow-ups) land
  2. Base SHA mismatch: Based on 7fd2fc190a77 — older than current main (c8608fe9) — branch needs rebase
  3. Security-critical scope: SEC-03 (working dir injection prevention) and SEC-01 (XPiA defense) address real attack vectors

Recommendation

⚠️ Rebase required before merge (base is behind main). Consider merging together with or immediately before the implementation PRs to avoid a CI red period. The security contracts defined here are well-specified and valuable — mark for triage:security-review.

Generated by PR Triage Agent ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant