Skip to content

fix: bind vault blobs to authenticated metadata#276

Open
saurabhhhcodes wants to merge 7 commits into
Abhash-Chakraborty:mainfrom
saurabhhhcodes:fix/vault-aead-metadata-264
Open

fix: bind vault blobs to authenticated metadata#276
saurabhhhcodes wants to merge 7 commits into
Abhash-Chakraborty:mainfrom
saurabhhhcodes:fix/vault-aead-metadata-264

Conversation

@saurabhhhcodes
Copy link
Copy Markdown

@saurabhhhcodes saurabhhhcodes commented May 28, 2026

Summary

  • bind vault AES-GCM encryption/decryption to canonical AEAD associated data: vault:v1, media_id, and file_hash
  • reject swapped encrypted blobs and tampered authenticated metadata during vault streaming
  • update the vault encryption design note to document AES-256-GCM as the active streaming implementation contract

Tests

  • backend/.codex-venv/bin/python -m py_compile src/find_api/core/crypto.py src/find_api/routers/vault.py tests/test_vault.py
  • backend/.codex-venv/bin/python -m pytest tests/test_vault.py -q
  • backend/.codex-venv/bin/python -m pytest tests/test_gallery.py tests/test_search.py tests/test_clusters.py tests/test_people.py tests/test_vault.py -q
  • git diff --check

Closes #264

Summary by CodeRabbit

  • New Features

    • Vault encryption now uses streaming AES‑GCM with mandatory associated authenticated data (AAD) derived from file metadata, so hidden files are encrypted/decrypted with metadata-bound authentication without loading entire files into memory.
  • Tests

    • Added tests verifying streaming decryption rejects tampered ciphertext or modified authenticated metadata.
  • Documentation

    • Updated vault encryption design to reflect AES‑GCM, mandatory AAD, blob format, and current operational behavior.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 99907146-80c6-4b16-a3e1-c4d36bbef7b8

📥 Commits

Reviewing files that changed from the base of the PR and between 306da90 and 218342b.

📒 Files selected for processing (1)
  • docs/plans/partial/vault-encryption-design.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/plans/partial/vault-encryption-design.md

📝 Walkthrough

Walkthrough

Extended vault encryption to bind AEAD associated data (version, media_id, file_hash) into AES-GCM authentication. Updated encrypt_file and decrypt_file_stream signatures to accept optional associated_data parameter, wired AAD into hide and stream operations, added tests verifying blob swap and metadata tampering detection, and updated design documentation to reflect the AES-256-GCM implementation with mandatory AEAD.

Changes

Vault encryption with authenticated associated data binding

Layer / File(s) Summary
AEAD associated-data contract and builder
backend/src/find_api/core/crypto.py
Defined _VAULT_BLOB_AAD_VERSION = "vault:v1" constant and added build_vault_aad(media_id, file_hash) -> bytes function to construct canonical UTF-8 encoded associated data incorporating version, media id, and file hash.
Crypto function AEAD integration
backend/src/find_api/core/crypto.py
Extended encrypt_file and decrypt_file_stream with optional associated_data: bytes | None = None parameter. When provided, both functions call authenticate_additional_data(associated_data) to bind the associated data into AES-GCM authentication context.
Vault hide and stream operations with AAD
backend/src/find_api/routers/vault.py
Imported build_vault_aad and updated hide_media and stream_hidden_media to compute vault AAD from media.id and media.file_hash, then pass it to encrypt_file and decrypt_file_stream respectively, binding encrypted blobs to their media records.
AEAD tampering and swap rejection tests
backend/tests/test_vault.py
Added test_swapped_encrypted_blob_rejected and test_tampered_authenticated_metadata_rejected to verify that AAD binding detects blob swaps between media items and rejects mutations to authenticated metadata (file_hash), both raising InvalidTag during decryption.
Design documentation alignment
docs/plans/partial/vault-encryption-design.md
Updated current implementation status to assert AES-256-GCM with metadata-bound AEAD; replaced ChaCha20-Poly1305 decision with AES-256-GCM for current implementation; corrected blob format to store `ciphertext

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant VaultAPI as vault.hide_media / vault.stream_hidden_media
  participant Crypto as find_api.core.crypto
  participant AESGCM as AES-256-GCM
  Client->>VaultAPI: hide media request
  VaultAPI->>VaultAPI: aad = build_vault_aad(media.id, media.file_hash)
  VaultAPI->>Crypto: encrypt_file(source, dest, associated_data=aad)
  Crypto->>AESGCM: authenticate_additional_data(aad) / encrypt stream
  Client->>VaultAPI: stream hidden media request
  VaultAPI->>VaultAPI: aad = build_vault_aad(media.id, media.file_hash)
  VaultAPI->>Crypto: decrypt_file_stream(iv, encrypted_path, associated_data=aad)
  Crypto->>AESGCM: authenticate_additional_data(aad) / decrypt & verify
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

testing, quality:clean

Suggested reviewers

  • Abhash-Chakraborty
  • macroscopeapp

Poem

🐰 A vault secured with locks of math,
AAD ties each photo to its path,
Swap two blobs and the tag will cry,
Tamper metadata and decryption dies,
Encryption and design now align! 🔐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: bind vault blobs to authenticated metadata' clearly and concisely describes the primary change: binding vault encryption to AEAD associated data.
Description check ✅ Passed The PR description covers the summary, testing steps, and issue reference. It follows most of the template sections but lacks explicit checklist confirmations and some optional details like detailed testing reproduction steps.
Linked Issues check ✅ Passed All main objectives from issue #264 are met: AES-256-GCM is formalized in design, AEAD binding is implemented with format marker and media/file metadata, tests verify blob swaps and tampering rejection, and existing vault flows remain functional.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #264: crypto.py adds AAD helpers, vault.py integrates them, test_vault.py adds security tests, and design.md updates the cipher documentation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 28, 2026

PR Context Summary

Suggested issue links

  • No strong issue match found yet.

Use Fixes #123 or Closes #123 in the PR body when one of the suggestions is the intended issue.
Manual rerun: Actions > PR Context Triage > Run workflow > set pr_number and force_review=true.

@macroscopeapp
Copy link
Copy Markdown
Contributor

macroscopeapp Bot commented May 28, 2026

Approvability

Verdict: Needs human review

This PR modifies core cryptographic functions to add AEAD metadata binding, which is security-sensitive code. Changes to encryption/decryption behavior in crypto modules warrant human review to verify the implementation is correct.

No code changes detected at 5210dd5. Prior analysis still applies.

You can customize Macroscope's approvability policy. Learn more.

@saurabhhhcodes
Copy link
Copy Markdown
Author

Thanks for the automated checks. I see CodeRabbit, GitGuardian, Macroscope correctness, and PR context triage are green, while Macroscope approvability correctly asks for human review because this touches cryptographic AEAD behavior.

Could a maintainer please review the security-sensitive crypto change when available? Also, before merge, could you mirror the linked issue scoring labels onto this PR (gssoc26, level:advanced, privacy / appropriate type labels)? The PR currently has no visible labels, so I want to avoid scoring ambiguity after merge.

@saurabhhhcodes
Copy link
Copy Markdown
Author

Quick review/scoring follow-up: this security/privacy PR is green from the automated checks I can see, but it still needs human review because it touches vault encryption/authenticated metadata.

If accepted, could you please mirror the linked issue labels onto the PR before merge? The linked issue is #264 (gssoc26, level:advanced, backend/privacy/security-related labels), and the PR currently has no visible scoring labels.

@saurabhhhcodes
Copy link
Copy Markdown
Author

Fixed the backend formatting failure by running Ruff formatting on the vault AEAD regression test file and pushing head 306da90. Local checks: python3 -m ruff format --check tests/test_vault.py and git diff --check both pass. GitHub Actions should rerun now.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@saurabhhhcodes
Copy link
Copy Markdown
Author

Refreshed this branch with the latest upstream main in commit fc3ff1b to clear the behind-state.

Merge result:

  • no conflicts
  • upstream-only docs/assets updates were brought in
  • vault AEAD implementation remains unchanged

Validation available locally:

  • git diff --check

I could not run backend pytest locally on this machine because the repo requires Python >=3.12,<3.13, while the available python3 is 3.9.6 and there is no uv/python3.12 binary installed here. GitHub checks should rerun on the refreshed branch.

@Abhash-Chakraborty Abhash-Chakraborty added bug Something is broken and needs to be fixed. gssoc26 Related to GirlScript Summer of Code 2026. level:advanced GSSoC difficulty level: advanced. Base contributor points: 55. backend FastAPI, database, storage, and API work privacy Data privacy, security boundaries, and user trust architecture High-level design decisions and technical direction type:bug Bug-fix PR. GSSoC type bonus: +20 points. under-review Maintainer needs to verify do-not-merge Blocks merging. Remove this label only when the PR is fully ready for production. labels May 29, 2026
@saurabhhhcodes
Copy link
Copy Markdown
Author

Refreshed this branch with latest upstream main in commit 746f116 to clear the behind state again.

Validation available locally:

  • git diff --check
  • python3 -m py_compile backend/src/find_api/core/crypto.py backend/src/find_api/routers/vault.py backend/tests/test_vault.py

Full backend pytest still needs the repo Python version (>=3.12,<3.13); this machine only has Python 3.9.6. The PR remains intentionally pending human review because it touches vault AEAD/security behavior.

@saurabhhhcodes
Copy link
Copy Markdown
Author

Refreshed this branch with latest upstream main in head 4da4705 to clear the behind state again.

Merge result:

  • no conflicts
  • upstream gallery/search/docs updates were brought in
  • vault AEAD authenticated-metadata implementation remains unchanged

Validation available locally:

  • python3 -m py_compile backend/src/find_api/core/crypto.py backend/src/find_api/routers/vault.py backend/tests/test_vault.py
  • git diff --check

The PR still intentionally needs human review because it touches vault encryption/authenticated metadata behavior.

@saurabhhhcodes
Copy link
Copy Markdown
Author

Refreshed this branch with latest upstream main in head 5210dd5 to clear the behind state again.\n\nMerge result:\n- no conflicts\n- upstream dependency-lock updates were brought in\n- vault AEAD authenticated-metadata implementation remains unchanged\n\nValidation available locally:\n- python3 -m py_compile backend/src/find_api/core/crypto.py backend/src/find_api/routers/vault.py backend/tests/test_vault.py\n- git diff --check\n- exact conflict-marker scan clean\n\nThe PR still intentionally needs human review because it touches vault encryption/authenticated metadata behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

architecture High-level design decisions and technical direction backend FastAPI, database, storage, and API work bug Something is broken and needs to be fixed. do-not-merge Blocks merging. Remove this label only when the PR is fully ready for production. gssoc26 Related to GirlScript Summer of Code 2026. level:advanced GSSoC difficulty level: advanced. Base contributor points: 55. privacy Data privacy, security boundaries, and user trust type:bug Bug-fix PR. GSSoC type bonus: +20 points. under-review Maintainer needs to verify

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: align vault encryption with authenticated metadata design

2 participants