Skip to content

add backend vault metadata and hidden media flags#284

Open
shubh-gitpush wants to merge 1 commit into
Abhash-Chakraborty:mainfrom
shubh-gitpush:flags
Open

add backend vault metadata and hidden media flags#284
shubh-gitpush wants to merge 1 commit into
Abhash-Chakraborty:mainfrom
shubh-gitpush:flags

Conversation

@shubh-gitpush
Copy link
Copy Markdown
Contributor

@shubh-gitpush shubh-gitpush commented May 28, 2026

Summary

This PR introduces backend support for hidden/encrypted media state and vault metadata, applies default filtering so hidden media is excluded from normal/public surfaces, keeps backward compatibility for existing records, and adds regression tests.

Fixes #184

Type of change

  • Bug fix
  • Feature
  • Documentation update
  • Refactor
  • CI / tooling

What changed

  1. Data Model and Schema
  • Added media-level vault state fields:
    • vault_state (default: visible)
    • hidden_at
    • encrypted_at
  • Added first-class SQLAlchemy models for vault tables:
    • VaultConfig
    • VaultMetadata
  • Extended vault metadata schema with non-sensitive encryption metadata:
    • encryption_algorithm
    • key_derivation
    • ciphertext_size

Files:

  • backend/src/find_api/models/media.py
  • backend/src/find_api/models/vault.py
  • backend/src/find_api/models/init.py
  1. Migration and Backward Compatibility
  • Added Alembic migration for new vault/media state fields and vault metadata columns.
  • Added startup schema normalization in DB init for existing PostgreSQL environments:
    • Safely adds missing columns/indexes
    • Backfills legacy rows (vault_state, hidden_at, encrypted_at)

Files:

  • backend/alembic/versions/20260528_vault_state_metadata.py
  • backend/src/find_api/core/database.py
  1. Hidden Media Exclusion from Public/Normal Surfaces
    Applied default exclusion (is_hidden = false) for normal/public flows.

Updated API behavior:

  • Gallery listing excludes hidden media.
  • Public media detail/thumbnail/like/reprocess/delete/bulk-delete only resolve visible media.
  • Clusters exclude hidden members, hide hidden-only clusters, and expose visible member counts.
  • People listing and person images exclude hidden media.
  • Search continues to enforce hidden filtering in SQL.

Files:

  • backend/src/find_api/routers/gallery.py
  • backend/src/find_api/routers/clusters.py
  • backend/src/find_api/routers/people.py
  • backend/src/find_api/routers/search.py
  1. Vault Flow State Persistence and Safety
  • Vault hide operation now persists explicit state transitions (vault_state, timestamps).
  • Vault rollback path restores visible state if post-encryption storage deletion fails.
  • API responses avoid exposing sensitive encryption material (e.g., keys/iv/verifier data).

File:

  • backend/src/find_api/routers/vault.py
  1. Worker Consistency
  • Background clustering and face clustering source sets now ignore hidden media.

File:

  • backend/src/find_api/workers/jobs.py

Screenshots / recordings (for UI changes)

Attach before/after screenshots or a short video.

How to test

Checklist

  • I linked the related issue
  • I ran required checks from CONTRIBUTING.md
  • I updated docs/env notes if needed
  • My PR is scoped to a single issue
  • I followed commit message conventions
  • I am not committing secrets or local artifacts

GSSoC'26 checklist

  • I requested issue assignment before starting
  • I have meaningful commits (no spam commits)
  • I am ready to explain my implementation in review comments

Summary by CodeRabbit

Release Notes

  • New Features
    • Introduced vault feature to hide and encrypt sensitive media securely
    • Hidden media automatically filtered from galleries, search results, clusters, and people views
    • Enhanced media metadata tracking for vault operations and timestamps

Review Change Stack

Copilot AI review requested due to automatic review settings May 28, 2026 14:26
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Warning

Review limit reached

@shubh-gitpush, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 33 minutes and 23 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1d51f316-5c70-42c3-bbf4-9c6280c458e0

📥 Commits

Reviewing files that changed from the base of the PR and between fa429f9 and 1f6c871.

📒 Files selected for processing (15)
  • backend/alembic/versions/20260528_vault_state_metadata.py
  • backend/src/find_api/core/database.py
  • backend/src/find_api/models/__init__.py
  • backend/src/find_api/models/media.py
  • backend/src/find_api/models/vault.py
  • backend/src/find_api/routers/clusters.py
  • backend/src/find_api/routers/gallery.py
  • backend/src/find_api/routers/people.py
  • backend/src/find_api/routers/vault.py
  • backend/src/find_api/workers/jobs.py
  • backend/tests/test_clusters.py
  • backend/tests/test_gallery.py
  • backend/tests/test_people.py
  • backend/tests/test_search.py
  • backend/tests/test_vault.py
📝 Walkthrough

Walkthrough

This PR establishes hidden media filtering as a first-class feature by introducing vault state columns and encryption metadata models, then applies consistent visibility constraints across all public API surfaces and background jobs while preserving encryption material secrecy in responses.

Changes

Hidden Media & Vault State Implementation

Layer / File(s) Summary
Vault data models and Media state columns
backend/src/find_api/models/vault.py, backend/src/find_api/models/media.py, backend/src/find_api/models/__init__.py
New VaultConfig singleton table stores cryptographic material; VaultMetadata per-media table holds encrypted paths and encryption parameters with cascading deletes. Media model gains vault_state (default "visible", indexed), hidden_at, and encrypted_at timestamp columns. Both vault models and media columns are exported for factory registration.
Alembic migration and database initialization
backend/alembic/versions/20260528_vault_state_metadata.py, backend/src/find_api/core/database.py
Alembic migration adds the new columns and encryption fields with dialect-safe downgrades using SQLite batch operations. init_db() mirrors the schema via DDL, creates an index on media.vault_state, and backfills vault_state from is_hidden and populates hidden_at/encrypted_at from existing timestamps based on hidden status.
Public media access helpers and refactoring
backend/src/find_api/routers/gallery.py
Introduces _public_media_query() helper to apply Media.is_hidden == False filtering and _load_public_media_or_404() helper to load a media record or raise HTTPException(404), establishing a reusable visibility pattern.
Gallery endpoint visibility enforcement
backend/src/find_api/routers/gallery.py
GET /gallery, /image/{id}, /image/{id}/thumbnail, /image/{id}/like, /image/{id}/reprocess, DELETE /image/{id}, and POST /images/bulk-delete endpoints now use public-media helpers or add visibility constraints; hidden media results in 404 or exclusion from bulk operations.
Cluster and people endpoint filtering
backend/src/find_api/routers/clusters.py, backend/src/find_api/routers/people.py
Cluster endpoints count member_count and derive sample_ids only from visible media, skipping clusters with zero visible members. People router omits person groups whose media is entirely hidden. Both changes ensure listings and counts reflect only non-hidden content.
Background job clustering exclusions
backend/src/find_api/workers/jobs.py
cluster_images() and cluster_faces() now add Media.is_hidden == False constraints to candidate queries, preventing hidden media embeddings from influencing cluster assignments.
Vault hide operation state management
backend/src/find_api/routers/vault.py
Hide operation now stores encryption parameters and ciphertext size in vault_metadata and updates Media with vault_state = "hidden_encrypted" plus UTC hidden_at/encrypted_at timestamps. Rollback path restores vault state fields to visible state on deletion failure.
Comprehensive hidden-media filtering tests
backend/tests/test_clusters.py, backend/tests/test_gallery.py, backend/tests/test_people.py, backend/tests/test_search.py, backend/tests/test_vault.py
Test helpers now accept is_hidden parameters to seed hidden media. New tests verify: hidden media excluded from gallery/cluster/people listing endpoints; hidden media does not influence clustering or search; vault list endpoint hides encryption material; OCR reranking and similarity clamping work correctly; hidden filter constraints present in executed SQL.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant APIEndpoint
  participant Gallery
  participant Database
  Client->>APIEndpoint: GET /image/{id}
  APIEndpoint->>Gallery: _load_public_media_or_404
  Gallery->>Database: SELECT * WHERE media_id=? AND is_hidden=false
  alt Media found and visible
    Database-->>Gallery: Media record
    Gallery-->>APIEndpoint: Media object
    APIEndpoint-->>Client: 200 Image response
  else Media hidden or not found
    Database-->>Gallery: No rows
    Gallery-->>APIEndpoint: HTTPException 404
    APIEndpoint-->>Client: 404 Not found
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Abhash-Chakraborty/Find#210: Extends existing hidden vault feature by adding media.vault_state/hidden_at/encrypted_at and new vault metadata columns that match the vault session flow foundation.
  • Abhash-Chakraborty/Find#251: Both modify backend/src/find_api/workers/jobs.py clustering flows to filter inputs and adjust DB-destructive update logic.
  • Abhash-Chakraborty/Find#223: Both modify backend/src/find_api/routers/clusters.py and people.py response building; main PR adds visible-member filtering while retrieved PR adds thumbnail fields.

Suggested labels

type:security, type:testing

Suggested reviewers

  • Abhash-Chakraborty

Poem

🐰 A vault of secrets now takes shape,
With hidden walls that help escape,
All secret paths are tucked away,
While public galleries hold the day,
No cipher text leaks from the API—
Encryption done with decency!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 34.88% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: adding vault metadata support and hidden media flag functionality to the backend.
Description check ✅ Passed The PR description provides comprehensive coverage of all major changes with organized sections, affected files, and links to the issue, following the repository template structure.
Linked Issues check ✅ Passed All acceptance criteria from issue #184 are met: vault metadata models added, hidden media filtering applied across APIs, migrations/backward compatibility implemented, tests added, and sensitive data not exposed.
Out of Scope Changes check ✅ Passed All changes directly support the stated objectives of issue #184; no unrelated modifications detected beyond the vault metadata, hidden media filtering, migrations, and tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 28, 2026

PR Context Summary

Suggested issue links

  • No strong issue match found yet.

Use Fixes #123 or Closes #123 in the PR body when one of the suggestions is the intended issue.
Manual rerun: Actions > PR Context Triage > Run workflow > set pr_number and force_review=true.

Comment thread backend/alembic/versions/20260528_vault_state_metadata.py
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 High

db.query(Face).update({Face.person_id: None}, synchronize_session=False)

The person_id reset at line 454 clears assignments for ALL faces in the database, but the new filtering at lines 391-396 only loads faces from non-hidden media for clustering. Faces from hidden media will have their person_id set to None and never reassigned, permanently losing their person associations. If a user hides an image, runs clustering, then unhides it, those faces will appear as unclustered noise.

🤖 Copy this AI Prompt to have your agent fix this:
In file backend/src/find_api/workers/jobs.py around line 454:

The `person_id` reset at line 454 clears assignments for ALL faces in the database, but the new filtering at lines 391-396 only loads faces from non-hidden media for clustering. Faces from hidden media will have their `person_id` set to `None` and never reassigned, permanently losing their person associations. If a user hides an image, runs clustering, then unhides it, those faces will appear as unclustered noise.

Evidence trail:
backend/src/find_api/workers/jobs.py lines 391-396 (filtered query with Media.is_hidden.is_(False)), line 432 (face_ids derived from face_rows), line 454 (unfiltered db.query(Face).update({Face.person_id: None})), lines 497-504 (only face_ids get person_id reassigned). Commit: REVIEWED_COMMIT.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium routers/clusters.py:64

When the first 5 member_ids are all hidden but later members are visible, sample_media returns empty because it only queries those first 5 IDs filtered by is_hidden.is_(False). The cluster passes the visible_member_count > 0 check but returns samples: [], causing cluster cards to display without preview images.

-        sample_media = (
-            db.query(Media)
-            .filter(Media.id.in_(sample_ids), Media.is_hidden.is_(False))
-            .all()
-        )
🤖 Copy this AI Prompt to have your agent fix this:
In file backend/src/find_api/routers/clusters.py around lines 64-68:

When the first 5 `member_ids` are all hidden but later members are visible, `sample_media` returns empty because it only queries those first 5 IDs filtered by `is_hidden.is_(False)`. The cluster passes the `visible_member_count > 0` check but returns `samples: []`, causing cluster cards to display without preview images.

Evidence trail:
backend/src/find_api/routers/clusters.py lines 52, 56-60, 61-62, 64-68 at REVIEWED_COMMIT. Line 52 takes first 5 IDs for sampling; lines 56-60 count visible members across ALL member_ids; lines 64-68 query sample_media using only first 5 IDs filtered by is_hidden. The mismatch between the visibility check scope (all members) and sample scope (first 5) allows a cluster to pass the visibility gate but have zero sample images.

@macroscopeapp
Copy link
Copy Markdown
Contributor

macroscopeapp Bot commented May 28, 2026

Approvability

Verdict: Needs human review

2 blocking correctness issues found. Schema changes add vault metadata columns and modify hidden media filtering across multiple endpoints and clustering jobs. An unresolved HIGH severity comment identifies a bug that permanently loses face-person associations for hidden media during clustering.

You can customize Macroscope's approvability policy. Learn more.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR strengthens “hidden/vault” privacy boundaries by adding vault state metadata to Media and ensuring hidden media is excluded from public-facing endpoints and background clustering/search behavior.

Changes:

  • Add vault_state, hidden_at, encrypted_at to Media plus vault persistence models (VaultConfig, VaultMetadata) and a corresponding Alembic migration.
  • Update API routes and workers to systematically exclude is_hidden media from gallery/people/clusters and clustering jobs.
  • Expand backend tests to cover hidden-media exclusion and to prevent leaking encryption material via vault listing.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
backend/tests/test_vault.py Adds test to ensure /api/vault/list response doesn’t expose encryption material.
backend/tests/test_search.py Enhances mocked search helper and adds tests for OCR boosting, similarity bounding, OCR inclusion, and hidden filtering.
backend/tests/test_people.py Expands seeding to support hidden media and adds tests to exclude hidden groups/media from people endpoints.
backend/tests/test_gallery.py Updates seeding for hidden media and adds tests ensuring hidden media isn’t exposed on gallery/media routes.
backend/tests/test_clusters.py Updates seeding and adds tests ensuring clusters exclude hidden members and hide hidden-only clusters.
backend/src/find_api/workers/jobs.py Excludes hidden media from image and face clustering jobs.
backend/src/find_api/routers/vault.py Persists additional vault metadata and sets/rolls back media vault timestamps/state.
backend/src/find_api/routers/people.py Skips people entries when no visible sample faces are available.
backend/src/find_api/routers/gallery.py Centralizes “public media” filtering and blocks hidden media from multiple public routes.
backend/src/find_api/routers/clusters.py Filters/adjusts cluster payloads to exclude hidden media, including member counts.
backend/src/find_api/models/vault.py Introduces SQLAlchemy models for vault config and per-media vault metadata.
backend/src/find_api/models/media.py Adds vault state and timestamps to the Media model.
backend/src/find_api/models/init.py Exposes vault models via package exports.
backend/src/find_api/core/database.py Registers vault models and creates/updates DB schema elements for new vault fields.
backend/alembic/versions/20260528_vault_state_metadata.py Adds migration for vault state/timestamps and vault metadata columns.

Comment on lines +322 to +326
item = response.json()[0]
assert "encrypted_path" not in item
assert "iv" not in item
assert "verifier_ciphertext" not in item
assert "salt" not in item
Comment thread backend/src/find_api/models/__init__.py Outdated
Comment on lines +13 to +20
"Media",
"Cluster",
"Face",
"Person",
"PersonFeedback",
"GeneralFeedback",
"VaultConfig",
"VaultMetadata",
Comment on lines +56 to +60
visible_member_count = (
db.query(Media.id)
.filter(Media.id.in_(cluster.member_ids or []), Media.is_hidden.is_(False))
.count()
)
Comment on lines +142 to +147
if not member_list:
raise HTTPException(404, "Cluster not found")

payload = _cluster_payload(cluster, members=member_list)
payload["member_count"] = len(member_list)
return payload
Comment on lines +313 to +316
count_sql = str(mock_db.execute.call_args_list[0].args[0])
result_sql = str(mock_db.execute.call_args_list[1].args[0])
assert "is_hidden = false" in count_sql
assert "is_hidden = false" in result_sql
@Abhash-Chakraborty Abhash-Chakraborty added gssoc26 Related to GirlScript Summer of Code 2026. level:advanced GSSoC difficulty level: advanced. Base contributor points: 55. backend FastAPI, database, storage, and API work api API contract, endpoint behavior, and response shape architecture High-level design decisions and technical direction privacy Data privacy, security boundaries, and user trust type:feature Feature PR. GSSoC type bonus. under-review Maintainer needs to verify do-not-merge Blocks merging. Remove this label only when the PR is fully ready for production. labels May 29, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/src/find_api/routers/clusters.py (1)

52-68: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Sample selection may omit visible members.

The current logic takes sample_ids from the first 5 entries in cluster.member_ids before filtering for visibility (line 52), then queries only those IDs for visible media (line 66). If the first 5 member IDs belong to hidden media, the samples list will be empty even when visible_member_count > 0, violating the contract that clusters with visible members should display visible samples.

Impact: A cluster with member_ids [hidden1, hidden2, hidden3, hidden4, hidden5, visible1, visible2] would show member_count: 2 but samples: [], breaking the UI thumbnail display.

🔧 Proposed fix: Select sample IDs from visible members
     result = []
     for cluster in clusters:
-        # Get sample images from cluster
-        sample_ids = (cluster.member_ids or [])[:5]
-        if not sample_ids:
-            continue
-
         visible_member_count = (
             db.query(Media.id)
             .filter(Media.id.in_(cluster.member_ids or []), Media.is_hidden.is_(False))
             .count()
         )
         if visible_member_count == 0:
             continue
 
+        # Get sample images from visible members only
+        sample_ids = (
+            db.query(Media.id)
+            .filter(Media.id.in_(cluster.member_ids or []), Media.is_hidden.is_(False))
+            .limit(5)
+            .all()
+        )
+        sample_ids = [row.id for row in sample_ids]
+
         sample_media = (
             db.query(Media)
             .filter(Media.id.in_(sample_ids), Media.is_hidden.is_(False))
             .all()
         )

This ensures samples are always drawn from the visible member pool.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/find_api/routers/clusters.py` around lines 52 - 68, The code
builds sample_ids from cluster.member_ids before filtering out hidden media,
which can yield empty samples even when visible_member_count > 0; change the
logic in the clusters router to first query for visible member IDs (e.g., query
Media.id filtering Media.id.in_(cluster.member_ids or []) and
Media.is_hidden.is_(False)), derive visible_ids and set visible_member_count =
len(visible_ids) (or use the DB count of that same query), then pick sample_ids
= visible_ids[:5] and finally query Media for those sample_ids to populate
sample_media; update references to sample_ids, visible_member_count, and
sample_media accordingly so samples always come from the visible member pool.
🧹 Nitpick comments (3)
backend/src/find_api/routers/gallery.py (1)

65-67: ⚡ Quick win

Move public-media DB helpers out of the router layer.

Line [65] to Line [74] adds persistence/query logic directly in gallery.py. Please move _public_media_query and _load_public_media_or_404 into an existing backend module/service and keep the router as a thin transport layer.

As per coding guidelines, **/*.py: "Keep FastAPI routers thin and place storage, queue, database, and ML logic in existing backend modules."

Also applies to: 69-74

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/find_api/routers/gallery.py` around lines 65 - 67, The router
contains persistence helpers _public_media_query and _load_public_media_or_404;
move these functions out of gallery.py into an existing backend
service/repository module (e.g., a media service) so the router is only a
transport layer, keeping their signatures (accepting db: Session and raising
HTTPException or returning Media) and using Media/Session as before; then update
gallery.py to import and call the moved functions (replace local definitions
with imports) and ensure any tests or callers reference the new module.
backend/src/find_api/routers/vault.py (2)

300-302: ⚡ Quick win

Extract hardcoded encryption parameters to constants.

The encryption algorithm and key derivation method are hardcoded string literals. Consider defining these as module-level constants or importing them from the crypto module to maintain a single source of truth and improve maintainability.

♻️ Proposed refactor to use constants

At the top of the file (after imports):

# Encryption metadata constants
ENCRYPTION_ALGORITHM = "AES-256-GCM"
KEY_DERIVATION_METHOD = "PBKDF2-HMAC-SHA256"

Then update the INSERT:

                 {
                     "media_id": media.id,
                     "encrypted_path": str(encrypted_path),
                     "iv": iv,
-                    "encryption_algorithm": "AES-256-GCM",
-                    "key_derivation": "PBKDF2-HMAC-SHA256",
+                    "encryption_algorithm": ENCRYPTION_ALGORITHM,
+                    "key_derivation": KEY_DERIVATION_METHOD,
                     "ciphertext_size": encrypted_path.stat().st_size,
                 },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/find_api/routers/vault.py` around lines 300 - 302, Define
module-level constants for the encryption parameters and replace the hardcoded
literals: add ENCRYPTION_ALGORITHM = "AES-256-GCM" and KEY_DERIVATION_METHOD =
"PBKDF2-HMAC-SHA256" near the top of the vault module (after imports or import
them from the crypto module if available), then update the code that builds the
metadata dictionary (the keys "encryption_algorithm" and "key_derivation" in the
insertion/response code around the encrypted_path handling) to use
ENCRYPTION_ALGORITHM and KEY_DERIVATION_METHOD instead of the string literals so
the values are centralized and maintainable.

262-327: ⚖️ Poor tradeoff

Consider extracting hide_media logic to a service module.

The hide_media endpoint contains significant database, storage, and cryptography logic (~65 lines). As per coding guidelines, routers should be kept thin with storage, database, and other complex logic placed in dedicated backend modules.

Consider extracting the core hide operation logic to a service module (e.g., backend/src/find_api/services/vault_service.py) to improve testability, separation of concerns, and adherence to the thin-router pattern.

As per coding guidelines: "Keep routers thin; put storage, queue, database, and ML logic in their existing modules in backend code."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/find_api/routers/vault.py` around lines 262 - 327, The hide_media
route implements too much business logic (DB, storage, crypto) and should be
refactored into a service; extract the core operation into a new function e.g.,
vault_service.hide_media_operation(session_token, media_id) that encapsulates
resolving the master key (using _get_cached_master_key), loading media and
metadata (_load_media_or_404, _load_vault_metadata), downloading
(download_file_to_path), encrypting (encrypt_file), storing vault metadata (the
INSERT logic), marking media hidden and committing/rolling back, deleting
original (delete_file) and invoking _rollback_hidden_state_after_delete_failure
on delete errors, and cleaning up temp files and encrypted_path on errors; then
make the router hide_media only resolve session_token (using
_resolve_session_token), call
vault_service.hide_media_operation(payload.session_token or resolved token,
payload.media_id) and return the simple {"status":"hidden","media_id":...}
response, preserving original exception-to-HTTP mappings and transaction/cleanup
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/alembic/versions/20260528_vault_state_metadata.py`:
- Around line 61-67: The backfill UPDATE uses a WHERE clause that will never
match because the new column vault_state is added with server_default="visible"
and nullable=False; change the migration so the correct values from is_hidden
are written: either move the UPDATE on vault_state (the op.execute(...) block)
to run before adding the vault_state column with server_default, or remove the
WHERE clause and perform an unconditional UPDATE of media setting vault_state =
CASE WHEN is_hidden THEN 'hidden_encrypted' ELSE 'visible' END so all rows are
updated; update the migration around the op.add_column call and the op.execute
call referencing vault_state and is_hidden accordingly.

In `@backend/src/find_api/core/database.py`:
- Around line 225-232: The UPDATE's WHERE clause never matches because
vault_state was created with DEFAULT 'visible'; change the condition to target
hidden rows that currently have the default/empty value so they become
'hidden_encrypted'. Modify the SQL in the UPDATE (the block shown in
backend/src/find_api/core/database.py) to use WHERE is_hidden = TRUE AND
(vault_state IS NULL OR vault_state = '' OR vault_state = 'visible') or simply
WHERE is_hidden = TRUE AND vault_state = 'visible' so hidden media are correctly
backfilled to 'hidden_encrypted' without touching intentionally set vault_state
values.

In `@backend/tests/test_clusters.py`:
- Around line 100-122: Add two unit tests for the cluster detail endpoint to
mirror the list tests: create a mixed cluster (visible + hidden) using
_seed_media and _seed_cluster, call client.get(f"/api/cluster/{cluster.id}") and
assert status_code == 200, assert response body['member_count'] == 1 and that
the visible sample id is present while the hidden sample id is not in response
body['samples']; then create a hidden-only cluster, call
client.get(f"/api/cluster/{cluster.id}") and assert it is not exposed (expect
status_code == 404 and/or response body indicates not found, matching list
behavior where hidden-only clusters produce total == 0). Ensure tests reference
the existing helper functions _seed_media and _seed_cluster and use the same
response field names ("member_count", "samples") used in the current tests.

In `@backend/tests/test_gallery.py`:
- Around line 295-310: Extend the existing
test_hidden_media_not_available_on_public_media_routes to also exercise the new
delete flows: call DELETE on /api/image/{hidden.id} (e.g.,
client.delete(f"/api/image/{hidden.id}")) and call POST on
/api/images/bulk-delete with a payload that includes the hidden.id (e.g.,
{"ids":[hidden.id]}) and assert both responses return 404; this mirrors the
existing detail/thumbnail/like/reprocess checks and ensures the DELETE
/api/image/{id} and POST /api/images/bulk-delete regressions are covered.

---

Outside diff comments:
In `@backend/src/find_api/routers/clusters.py`:
- Around line 52-68: The code builds sample_ids from cluster.member_ids before
filtering out hidden media, which can yield empty samples even when
visible_member_count > 0; change the logic in the clusters router to first query
for visible member IDs (e.g., query Media.id filtering
Media.id.in_(cluster.member_ids or []) and Media.is_hidden.is_(False)), derive
visible_ids and set visible_member_count = len(visible_ids) (or use the DB count
of that same query), then pick sample_ids = visible_ids[:5] and finally query
Media for those sample_ids to populate sample_media; update references to
sample_ids, visible_member_count, and sample_media accordingly so samples always
come from the visible member pool.

---

Nitpick comments:
In `@backend/src/find_api/routers/gallery.py`:
- Around line 65-67: The router contains persistence helpers _public_media_query
and _load_public_media_or_404; move these functions out of gallery.py into an
existing backend service/repository module (e.g., a media service) so the router
is only a transport layer, keeping their signatures (accepting db: Session and
raising HTTPException or returning Media) and using Media/Session as before;
then update gallery.py to import and call the moved functions (replace local
definitions with imports) and ensure any tests or callers reference the new
module.

In `@backend/src/find_api/routers/vault.py`:
- Around line 300-302: Define module-level constants for the encryption
parameters and replace the hardcoded literals: add ENCRYPTION_ALGORITHM =
"AES-256-GCM" and KEY_DERIVATION_METHOD = "PBKDF2-HMAC-SHA256" near the top of
the vault module (after imports or import them from the crypto module if
available), then update the code that builds the metadata dictionary (the keys
"encryption_algorithm" and "key_derivation" in the insertion/response code
around the encrypted_path handling) to use ENCRYPTION_ALGORITHM and
KEY_DERIVATION_METHOD instead of the string literals so the values are
centralized and maintainable.
- Around line 262-327: The hide_media route implements too much business logic
(DB, storage, crypto) and should be refactored into a service; extract the core
operation into a new function e.g.,
vault_service.hide_media_operation(session_token, media_id) that encapsulates
resolving the master key (using _get_cached_master_key), loading media and
metadata (_load_media_or_404, _load_vault_metadata), downloading
(download_file_to_path), encrypting (encrypt_file), storing vault metadata (the
INSERT logic), marking media hidden and committing/rolling back, deleting
original (delete_file) and invoking _rollback_hidden_state_after_delete_failure
on delete errors, and cleaning up temp files and encrypted_path on errors; then
make the router hide_media only resolve session_token (using
_resolve_session_token), call
vault_service.hide_media_operation(payload.session_token or resolved token,
payload.media_id) and return the simple {"status":"hidden","media_id":...}
response, preserving original exception-to-HTTP mappings and transaction/cleanup
behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 142f37bf-c08b-479f-bb84-ce24f0f8aaea

📥 Commits

Reviewing files that changed from the base of the PR and between 44a7d3a and fa429f9.

📒 Files selected for processing (15)
  • backend/alembic/versions/20260528_vault_state_metadata.py
  • backend/src/find_api/core/database.py
  • backend/src/find_api/models/__init__.py
  • backend/src/find_api/models/media.py
  • backend/src/find_api/models/vault.py
  • backend/src/find_api/routers/clusters.py
  • backend/src/find_api/routers/gallery.py
  • backend/src/find_api/routers/people.py
  • backend/src/find_api/routers/vault.py
  • backend/src/find_api/workers/jobs.py
  • backend/tests/test_clusters.py
  • backend/tests/test_gallery.py
  • backend/tests/test_people.py
  • backend/tests/test_search.py
  • backend/tests/test_vault.py

Comment thread backend/alembic/versions/20260528_vault_state_metadata.py
Comment thread backend/src/find_api/core/database.py
Comment thread backend/tests/test_clusters.py
Comment on lines +295 to +310
def test_hidden_media_not_available_on_public_media_routes(self, client, db):
hidden = _seed(
db, filename="hidden-detail.jpg", status="indexed", is_hidden=True
)

detail_response = client.get(f"/api/image/{hidden.id}")
thumbnail_response = client.get(
f"/api/image/{hidden.id}/thumbnail", follow_redirects=False
)
like_response = client.post(f"/api/image/{hidden.id}/like")
reprocess_response = client.post(f"/api/image/{hidden.id}/reprocess")

assert detail_response.status_code == 404
assert thumbnail_response.status_code == 404
assert like_response.status_code == 404
assert reprocess_response.status_code == 404
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add hidden-media coverage for delete endpoints.

This covers detail/thumbnail/like/reprocess, but the PR also changes DELETE /api/image/{id} and POST /api/images/bulk-delete to reject hidden media from public flows. A focused hidden-ID case there would close the main regression gap.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/tests/test_gallery.py` around lines 295 - 310, Extend the existing
test_hidden_media_not_available_on_public_media_routes to also exercise the new
delete flows: call DELETE on /api/image/{hidden.id} (e.g.,
client.delete(f"/api/image/{hidden.id}")) and call POST on
/api/images/bulk-delete with a payload that includes the hidden.id (e.g.,
{"ids":[hidden.id]}) and assert both responses return 404; this mirrors the
existing detail/thumbnail/like/reprocess checks and ensures the DELETE
/api/image/{id} and POST /api/images/bulk-delete regressions are covered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api API contract, endpoint behavior, and response shape architecture High-level design decisions and technical direction backend FastAPI, database, storage, and API work do-not-merge Blocks merging. Remove this label only when the PR is fully ready for production. gssoc26 Related to GirlScript Summer of Code 2026. level:advanced GSSoC difficulty level: advanced. Base contributor points: 55. privacy Data privacy, security boundaries, and user trust type:feature Feature PR. GSSoC type bonus. under-review Maintainer needs to verify

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: add backend vault metadata and hidden media flags

3 participants