fix(api): tolerate undecryptable rows in /api/settings/keys #52
Conversation
When ENCRYPTION_KEY is rotated (or a DB is restored from another env), existing api_key.* rows can no longer be decrypted. The previous implementation called decrypt() inside a list comprehension, so a single InvalidToken would raise out of the endpoint and return HTTP 500 — which in turn blanked the entire Settings page in the UI. Catch the exception per row, log a warning, and surface the row with `decryption_error: true` and a masked placeholder. The UI can then prompt re-entry instead of disappearing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Improves robustness of the Settings API by preventing a single undecryptable encrypted setting from crashing GET /api/settings/keys, enabling the Settings UI to load even when stale/invalid encrypted rows exist (e.g., after ENCRYPTION_KEY rotation).
Changes:
- Update `GET /api/settings/keys` to handle decryption failures per-row, returning a masked placeholder and `decryption_error: true` instead of raising.
- Add logging for failed decrypt attempts in `list_keys`.
- Add a unit test that seeds an undecryptable row and asserts the endpoint stays HTTP 200 and flags the broken entry.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/api/settings.py` | Wrap per-row decryption in try/except; return placeholder + `decryption_error` on failure. |
| `tests/test_settings_api.py` | Add coverage ensuring `/api/settings/keys` tolerates an undecryptable encrypted row. |
```python
try:
    entry["masked"] = _mask_key(decrypt(k.value))
except Exception as e:
    # Decryption typically fails when ENCRYPTION_KEY has been rotated
    # since this row was written. Surface the row in an error state
    # so the UI can prompt re-entry, instead of crashing the whole
    # endpoint and blanking the settings page.
    logger.warning("Failed to decrypt %s: %s", k.key, e)
    entry["masked"] = "****"
    entry["decryption_error"] = True
```
The `except Exception` here will also swallow non-rotation failures (e.g., `decrypt()` raises `RuntimeError` when `ENCRYPTION_KEY` is unset/invalid in `src/services/crypto.py`), causing this endpoint to return 200 with placeholder data and potentially masking a fatal misconfiguration. Consider narrowing the exception to the specific decryption failures you want to tolerate (e.g., `cryptography.fernet.InvalidToken`, possibly decode/type errors) and letting unexpected exceptions propagate (or be handled separately) so real bugs/config issues aren't hidden.
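As a sketch of the suggested narrowing — the helper name `row_to_entry` and the stand-in `_mask_key` are illustrative, not the real `src/api/settings.py`:

```python
import logging
from cryptography.fernet import Fernet, InvalidToken

logger = logging.getLogger(__name__)

def _mask_key(plaintext: str) -> str:
    # Illustrative stand-in for the real masking helper.
    return plaintext[:3] + "****"

def row_to_entry(decrypt, key_name, ciphertext):
    # Tolerate only the rotation-style failure (InvalidToken); anything
    # else -- e.g. RuntimeError from a missing ENCRYPTION_KEY -- propagates.
    entry = {"key": key_name}
    try:
        entry["masked"] = _mask_key(decrypt(ciphertext))
    except InvalidToken as e:
        logger.warning("Failed to decrypt %s: %s", key_name, e)
        entry["masked"] = "****"
        entry["decryption_error"] = True
    return entry

# A row written under an old key, read under the current key:
current = Fernet(Fernet.generate_key())
stale = Fernet(Fernet.generate_key()).encrypt(b"sk-old")
entry = row_to_entry(lambda v: current.decrypt(v).decode(), "api_key.google", stale)
print(entry["decryption_error"])  # True
```

With this shape, a misconfigured `ENCRYPTION_KEY` still fails loudly instead of every row quietly becoming a `decryption_error` placeholder.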
tests/test_settings_api.py
Outdated
```python
# Insert a row with raw bytes that are NOT a valid Fernet token.
# Use delete-then-insert in case a previous test left a google row behind.
from sqlalchemy import delete

async with async_session() as s:
    await s.execute(delete(Setting).where(Setting.key == "api_key.google"))
    s.add(Setting(key="api_key.google", value="not-a-valid-fernet-token", encrypted=True))
    await s.commit()
```
This test inserts `value="not-a-valid-fernet-token"` with `encrypted=True`, which isn't actually "encrypted with a different `ENCRYPTION_KEY`" (it's not a valid Fernet token at all). To better match the scenario described (key rotation), consider generating a syntactically valid Fernet token using a different Fernet key and inserting that instead, so the failure mode is specifically "valid token but cannot be decrypted with the current key."
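A minimal sketch of that fixture shape, assuming the `cryptography` package — a token that parses as Fernet but was minted under a different key:

```python
from cryptography.fernet import Fernet, InvalidToken

# Token written under a rotated-away key: parses fine, fails the HMAC check.
old_token = Fernet(Fernet.generate_key()).encrypt(b"sk-old-secret")

current = Fernet(Fernet.generate_key())
try:
    current.decrypt(old_token)
except InvalidToken:
    # This is the exact key-rotation failure mode, as opposed to the
    # malformed-input path that b"not-a-valid-fernet-token" exercises.
    print("InvalidToken: valid token, wrong key")
```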
- Narrow exception in `list_keys` from `Exception` to `InvalidToken` so config errors (e.g. `RuntimeError` from missing `ENCRYPTION_KEY` in `src/services/crypto.py`) propagate instead of being silently masked as a `decryption_error` row
- Replace the placeholder `not-a-valid-fernet-token` test fixture with a real Fernet token generated from a *different* key, so the test exercises the actual key-rotation scenario (valid token, signature fails) rather than a malformed-input path
- Re-seed `api_key.openai` with a fresh test-key token at the start of the tolerance test so the assertion isn't poisoned by stale rows in a shared dev DB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addressed both Copilot review comments in faa4dd0.
Failed jobs now persist a structured error_detail JSON blob alongside the
user-facing error_message. The keyword-based translation in
classify_error() loses information and can mistranslate (e.g. an
InvalidToken decrypt failure surfaces as "Model not found"); the new
field keeps the original exception class, raw message, stage, provider,
model, and timestamp so users can debug from the History page instead of
reading server logs.
- Add Job.error_detail Text column + Alembic migration
- src/errors.py: build_error_detail / serialize_error_detail / parse_error_detail
- transcribe.py: populate at refine/verify/metadata catch sites; resolve
provider_name and refine/metadata model BEFORE the try block so the
catch site does not re-query a possibly degraded session
- jobs.py: populate at top-level _process_job and _generate_meta_job
catch sites; expose error_detail in GET /api/jobs/{id}
- templating.py: register parse_error_detail Jinja filter
- job_status.html: add <details> "Show details" affordance
- i18n: history.show_details (en/ja)
- tests: 4 new tests covering field preservation, InvalidToken
distinction (#52 scenario), serialize/parse round-trip, and full
pipeline catch path
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… (#56)

* feat(observability): preserve raw API error in error_detail field (#54)

* fix(observability): address Copilot review on PR #56

  - `_process_job`: use `get_provider_name(job.provider)` for `error_detail.provider` so the internal "whisper" doesn't leak through where other catch sites record the user-facing "openai".
  - `_generate_meta_job`: pre-compute `provider_name` and resolve the effective model before the try block, then record exactly that pair in `error_detail`.
  - `transcribe.py` metadata step: pass the resolved `metadata_model` to `_run_metadata_generation` so the call, the cost log, and `error_detail` all reference the same model (it was re-querying `_get_model` internally).
  - `errors.build_error_detail`: emit `occurred_at` as second-precision ISO-8601 with a Z suffix (e.g. `2026-04-07T15:32:18Z`) instead of microsecond `+00:00`, for predictable UI rendering.
  - tests: assert the new `occurred_at` format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
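A hedged sketch of the helpers this commit describes — field names follow the commit message, and the real `src/errors.py` may differ:

```python
import json
from datetime import datetime, timezone

def build_error_detail(exc: BaseException, stage: str, provider: str, model: str) -> dict:
    # Preserve the raw failure alongside the translated error_message.
    return {
        "error_class": type(exc).__name__,
        "raw_message": str(exc),
        "stage": stage,
        "provider": provider,
        "model": model,
        # Second-precision ISO-8601 with a Z suffix, e.g. 2026-04-07T15:32:18Z
        "occurred_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }

def serialize_error_detail(detail: dict) -> str:
    # Stored in the Job.error_detail Text column as a JSON blob.
    return json.dumps(detail, ensure_ascii=False)

def parse_error_detail(raw: str) -> dict:
    # Registered as a Jinja filter so job_status.html can render it.
    return json.loads(raw)
```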
Summary
Why
Discovered while debugging: the dev server's Settings page rendered blank because `/api/settings/keys` returned 500. Root cause was a stale `api_key.openai` row that had been encrypted with a different `ENCRYPTION_KEY` and could no longer be decrypted (likely from a previous `.env` rotation).
The previous code:
```python
return [{"masked": _mask_key(decrypt(k.value)), ...} for k in keys]
```
A single `Fernet.InvalidToken` would raise out of the comprehension → 500 → entire Settings page unusable.
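The failure mode can be reproduced in isolation (illustrative names, not the endpoint's actual code; assumes `cryptography` is installed):

```python
from cryptography.fernet import Fernet, InvalidToken

# One stale row poisons the whole comprehension.
current = Fernet(Fernet.generate_key())
rows = [
    current.encrypt(b"sk-good"),                          # decryptable
    Fernet(Fernet.generate_key()).encrypt(b"sk-stale"),   # wrong key
]

try:
    masked = [current.decrypt(v).decode()[:3] + "****" for v in rows]
except InvalidToken:
    masked = None  # the endpoint's equivalent of HTTP 500
print(masked)  # None -- the good row is lost along with the bad one
```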
This is also the most likely root cause of the misleading "Model not found. Check Settings → LLM Models." error users see when an `api_key.*` row is undecryptable: `_get_credential` raises, the heuristic in `errors.py:131-137` mismatches, and the user gets a wrong direction.
Test plan
Follow-up
🤖 Generated with Claude Code