fix: require .trained marker to load classifier, preventing use of ra…#904

Open
Pcmhacker-piro wants to merge 1 commit into
SdSarthak:mainfrom
Pcmhacker-piro:fix/guard-classifier-trained-marker
Conversation

@Pcmhacker-piro
Summary
Closes #335
The IntentClassifier's _has_trained_weights method checked only that a weight file existed (pytorch_model.bin or model.safetensors); it did not verify that the model had actually been fine-tuned. As a result, stub models with random weights (e.g., from download_model.py) could be loaded for inference, producing unreliable prompt injection detection.
The fix introduces a .trained marker file that the training pipeline creates after fine-tuning completes. The classifier now requires this marker to exist alongside the weight files before loading the model as a fine-tuned classifier. Without it, the system falls back to the deterministic heuristic classifier.
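
As a rough illustration of the check described above, here is a minimal sketch. It is not the code from this PR: the function names `has_trained_weights`, `mark_trained`, and `select_backend` are hypothetical, and the marker is assumed to be an empty file named `.trained` in the same directory as the weights. Only the weight file names and the marker name come from the description.

```python
from pathlib import Path

# Weight file names as listed in the PR description.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")
# Marker name from the PR description; an empty file is an assumption.
TRAINED_MARKER = ".trained"


def has_trained_weights(model_dir: Path) -> bool:
    """Return True only if a weight file and the .trained marker both exist."""
    has_weights = any((model_dir / name).is_file() for name in WEIGHT_FILES)
    has_marker = (model_dir / TRAINED_MARKER).is_file()
    return has_weights and has_marker


def mark_trained(model_dir: Path) -> None:
    """Hypothetical training-side helper: create the marker after fine-tuning."""
    (model_dir / TRAINED_MARKER).touch()


def select_backend(model_dir: Path) -> str:
    """Hypothetical dispatcher: use the model only when it is marked as trained."""
    return "fine_tuned" if has_trained_weights(model_dir) else "heuristic"
```

With this shape, a directory that holds only stub weights resolves to the heuristic path, and it switches to the fine-tuned path only after the training step has written the marker.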
Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor
  • Tests
  • Infra / CI

Checklist

  • I have read CONTRIBUTING.md
  • My code follows the project style
  • I have added/updated tests where relevant
  • Tests/lint pass locally (if available)
  • I have not committed .env or any secrets
  • I have updated documentation if needed

Changed Files

  • backend/app/modules/guard/intent_classifier.py
  • guard-sdk/src/aegisai_guard/intent_classifier.py
  • backend/tests/test_guard_explain.py

Commits

  • 67187e9 - fix: require .trained marker to load classifier, preventing use of random stub weights

Testing Performed

git diff --stat

3 files changed, 44 insertions(+), 6 deletions(-)

Final Status

  • Branch Name: fix/guard-classifier-trained-marker
  • Commit Hash: 67187e9
  • PR Created: No (PAT lacks createPullRequest scope — open manually at the PR link above)
  • Ready for Review: Yes

@Pcmhacker-piro
Author

Hi @SdSarthak,
I have fixed this issue. Please review it, and let me know if you find any bugs.

Development

Successfully merging this pull request may close these issues.

BUG : Guard classifier using random weights