
feat: refactor item metadata database#73

Merged
aturret merged 7 commits into main from message-database-refactor
Mar 30, 2026

Conversation

@aturret
Owner

@aturret aturret commented Mar 30, 2026

Summary by CodeRabbit

  • New Features

    • MongoDB-backed cache for scraped content with configurable TTL and forced-refresh option.
    • New per-user setting “Force refresh cache” and UI toggle in /settings; bot buttons and enqueueing propagate this flag.
  • Configuration

    • Added DATABASE_CACHE_TTL (default 86400s) and optional MongoDB auth/URL env vars; docker-compose/template.env updated.
    • Alembic migration to add force_refresh_cache column and shared optional mongodb dependency group.
  • Tests

    • Added unit tests covering cache, connection, lifecycle, and settings behavior.
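
As a rough illustration of how these settings combine (a sketch; the function and its exact semantics are inferred from this summary, not taken from the PR's code):

```python
def should_query_cache(database_on: bool, ttl_seconds: int, force_refresh: bool) -> bool:
    """Decide whether a scrape may consult the MongoDB cache.

    Assumed semantics: caching requires the database feature to be on,
    a non-negative DATABASE_CACHE_TTL enables lookups (0 = never expire),
    and the per-user force_refresh_cache flag always bypasses the cache.
    """
    if not database_on or force_refresh:
        return False
    return ttl_seconds >= 0
```

With the default DATABASE_CACHE_TTL of 86400, a cached document younger than a day would be served; toggling "Force refresh cache" in /settings would bypass it.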

@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

Adds a MongoDB-backed scraped-content cache with TTL and versioning, moves MongoDB utilities into the shared package, adds per-user force-refresh cache toggles, and wires cache behavior through API, async-worker, and Telegram handlers with migration and tests.

Changes

**MongoDB Cache Infrastructure**
packages/shared/fastfetchbot_shared/database/mongodb/__init__.py, packages/shared/fastfetchbot_shared/database/mongodb/connection.py, packages/shared/fastfetchbot_shared/database/mongodb/cache.py, packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py
New shared MongoDB package: connection lifecycle (init_mongodb, close_mongodb), Beanie Metadata document + DatabaseMediaFile, save/find cache helpers with TTL and versioned inserts, and shared save_instances.

**Shared Scraper Core**
packages/shared/fastfetchbot_shared/services/scrapers/common.py
InfoExtractService gains database_cache_ttl and performs an optional pre-scrape cache lookup, returning cached payloads marked _cached.

**API Refactoring**
apps/api/src/database.py, apps/api/src/models/database_model.py, apps/api/src/services/scrapers/common.py, apps/api/src/config.py, apps/api/pyproject.toml
API now delegates MongoDB init/save to the shared package, re-exports shared Metadata/document_list, adds the DATABASE_CACHE_TTL setting, and depends on fastfetchbot-shared[mongodb] (direct beanie dependency removed).

**Async Worker Integration**
apps/async-worker/async_worker/config.py, apps/async-worker/async_worker/main.py, apps/async-worker/async_worker/services/enrichment.py, apps/async-worker/async_worker/tasks/scrape.py, apps/async-worker/pyproject.toml
Worker settings include DATABASE_CACHE_TTL and optional Mongo credentials; ARQ worker lifecycle hooks initialize/close MongoDB; enrich() may persist via save_metadata; scrape_and_enrich() accepts force_refresh_cache to bypass the cache and skips enrichment when a _cached payload is returned.

**User Settings System & Migration**
packages/shared/fastfetchbot_shared/database/models/user_setting.py, packages/shared/alembic/versions/0002_add_force_refresh_cache_column.py, apps/telegram-bot/core/services/user_settings.py
Adds a force_refresh_cache column to UserSetting with an Alembic migration, plus async service helpers get_force_refresh_cache / toggle_force_refresh_cache (reads default to False).

**Telegram Bot Handlers**
apps/telegram-bot/core/handlers/buttons.py, apps/telegram-bot/core/handlers/commands.py, apps/telegram-bot/core/handlers/url_process.py
Propagates per-user force_refresh_cache into button flows; the /settings UI now shows and toggles force-refresh; URL processing/auto-fetch forwards the flag to enqueue or API calls.

**Tests**
tests/unit/database/mongodb/*, tests/unit/async_worker/*, tests/unit/scrapers/test_common_cache.py, tests/unit/telegram_bot/*
Adds comprehensive unit tests for MongoDB connection/cache/models, worker lifecycle, enrichment/scrape cache behavior, shared scraper cache logic, Telegram handlers, and user-settings toggle behavior.

**Configuration & Docs**
CLAUDE.md, docker-compose.template.yml, template.env, packages/shared/pyproject.toml
Documents MongoDB config and TTL semantics, adds optional mongodb extras (beanie, motor), and updates compose/env templates for the optional MongoDB service and DATABASE_* variables.
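
The TTL check at the heart of the cache can be sketched as a standalone function, using timezone-aware datetimes as one of the review comments in this PR recommends. The function name and signature are hypothetical; the real check lives inside find_cached():

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(timestamp: datetime, ttl_seconds: int, now: Optional[datetime] = None) -> bool:
    """Return True if a cached document is still within its TTL.

    Assumed semantics from this PR: ttl_seconds == 0 means "never expire",
    and naive stored timestamps are treated as UTC.
    """
    if ttl_seconds == 0:
        return True
    if timestamp.tzinfo is None:
        # Drivers may hand back naive datetimes; normalize to UTC first.
        timestamp = timestamp.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - timestamp) <= timedelta(seconds=ttl_seconds)
```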

Sequence Diagram

sequenceDiagram
    participant Client
    participant Service as InfoExtractService
    participant Cache as MongoDB Cache
    participant Scraper as Scraper
    participant DB as MongoDB Store

    Client->>Service: get_item(url)
    alt Database caching enabled & database_cache_ttl >= 0
        Service->>Cache: find_cached(url, ttl_seconds)
        alt Cache hit
            Cache-->>Service: Metadata {_cached: True}
            Service-->>Client: return cached metadata
        else Cache miss or expired
            Service->>Scraper: scrape URL
            Scraper-->>Service: raw metadata
            Service->>DB: save_metadata(metadata)
            DB-->>Service: saved Metadata (version N)
            Service-->>Client: return fresh metadata
        end
    else Caching disabled or force refresh (ttl < 0)
        Service->>Scraper: scrape URL
        Scraper-->>Service: raw metadata
        Service-->>Client: return fresh metadata
    end
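
Per the diagram above, downstream consumers distinguish cache hits by the _cached marker so they can skip enrichment. A hypothetical sketch of that check (both function names are stand-ins, not the PR's actual code):

```python
def enrich(payload: dict) -> dict:
    """Stand-in for the worker's enrichment step (hypothetical)."""
    return {**payload, "enriched": True}


def handle_scrape_result(payload: dict) -> dict:
    """Skip enrichment for cache hits, mirroring scrape_and_enrich() as described."""
    if payload.get("_cached"):
        return payload  # cached payload was already enriched and persisted
    return enrich(payload)
```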

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐇 In burrows deep the rabbit reads the log,
A cache with versions snug beneath the bog.
TTL ticks softly, users choose to force,
Shared code unites the scraper's winding course.
Hooray for keys and tests and migration cheer—🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.78%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: refactor item metadata database' accurately describes the primary change: a comprehensive refactor of the database layer for storing scraped item metadata, moving from local implementations to a shared MongoDB layer with versioning, caching, and user settings management. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@codecov

codecov bot commented Mar 30, 2026

Codecov Report

❌ Patch coverage is 94.41341% with 10 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| apps/telegram-bot/core/handlers/url_process.py | 33.33% | 4 Missing ⚠️ |
| ...fastfetchbot_shared/database/mongodb/connection.py | 87.50% | 4 Missing ⚠️ |
| apps/async-worker/async_worker/config.py | 83.33% | 1 Missing ⚠️ |
| ...etchbot_shared/database/mongodb/models/metadata.py | 97.61% | 1 Missing ⚠️ |

Impacted file tree graph

@@           Coverage Diff           @@
##             main      #73   +/-   ##
=======================================
  Coverage        ?   79.98%           
=======================================
  Files           ?       67           
  Lines           ?     3447           
  Branches        ?        0           
=======================================
  Hits            ?     2757           
  Misses          ?      690           
  Partials        ?        0           
| Files with missing lines | Coverage Δ |
|---|---|
| apps/async-worker/async_worker/main.py | 100.00% <100.00%> (ø) |
| ...s/async-worker/async_worker/services/enrichment.py | 100.00% <100.00%> (ø) |
| apps/async-worker/async_worker/tasks/scrape.py | 100.00% <100.00%> (ø) |
| apps/telegram-bot/core/handlers/buttons.py | 50.76% <100.00%> (ø) |
| apps/telegram-bot/core/handlers/commands.py | 100.00% <100.00%> (ø) |
| apps/telegram-bot/core/services/user_settings.py | 100.00% <100.00%> (ø) |
| ...astfetchbot_shared/database/models/user_setting.py | 100.00% <100.00%> (ø) |
| ...ared/fastfetchbot_shared/database/mongodb/cache.py | 100.00% <100.00%> (ø) |
| ...ed/fastfetchbot_shared/services/scrapers/common.py | 95.23% <100.00%> (ø) |
| apps/async-worker/async_worker/config.py | 96.42% <83.33%> (ø) |
... and 3 more
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (8)
apps/api/src/services/scrapers/common.py (1)

78-82: Consider adding error handling around save_metadata() for resilience.

Unlike the async-worker's enrichment.py which wraps save_metadata() in a try/except block (logging errors but continuing), this call has no error handling. A database failure would cause the entire request to fail rather than returning successfully-scraped data.

If strict persistence is required, the current behavior is fine. Otherwise, consider wrapping for resilience:

♻️ Suggested fix for resilient persistence
         if self.store_database:
             logger.info("store in database")
-            from fastfetchbot_shared.database.mongodb.cache import save_metadata
-
-            await save_metadata(metadata_item)
+            try:
+                from fastfetchbot_shared.database.mongodb.cache import save_metadata
+                await save_metadata(metadata_item)
+            except Exception as e:
+                logger.error(f"Error saving to database: {e}")
         return metadata_item
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/services/scrapers/common.py` around lines 78 - 82, The call to
await save_metadata(metadata_item) inside the store_database branch lacks error
handling and can fail the whole request; wrap the await
save_metadata(metadata_item) call in a try/except around the block guarded by
self.store_database (the same place that imports save_metadata) and on exception
log the error via logger.error including context (e.g., metadata_item or a short
identifier) and continue without re-raising so scraping proceeds even if DB
persistence fails.
packages/shared/fastfetchbot_shared/database/mongodb/__init__.py (1)

15-23: Consider sorting __all__ alphabetically.

Ruff flags that __all__ is not sorted (RUF022). While functional, sorting improves readability and aligns with isort conventions.

♻️ Suggested fix
 __all__ = [
+    "DatabaseMediaFile",
+    "Metadata",
+    "close_mongodb",
+    "find_cached",
     "init_mongodb",
-    "close_mongodb",
     "save_instances",
-    "find_cached",
     "save_metadata",
-    "DatabaseMediaFile",
-    "Metadata",
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/fastfetchbot_shared/database/mongodb/__init__.py` around
lines 15 - 23, The __all__ export list is not alphabetically sorted (RUF022);
reorder the entries inside __all__ so they are in alphabetical order (e.g.,
DatabaseMediaFile, Metadata, close_mongodb, find_cached, init_mongodb,
save_instances, save_metadata) to satisfy Ruff and improve readability; update
the __all__ variable in the module where it is defined.
apps/async-worker/async_worker/tasks/scrape.py (1)

86-88: Prefer logger.exception() over manual traceback.format_exc().

Per coding guidelines, use logger.exception() which automatically includes the traceback, rather than manually formatting it.

♻️ Suggested fix
     except Exception as e:
-        logger.error(f"[{job_id}] Scrape failed: {e}")
-        logger.error(traceback.format_exc())
+        logger.exception(f"[{job_id}] Scrape failed: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/async-worker/async_worker/tasks/scrape.py` around lines 86 - 88, Replace
the manual traceback logging in the except block by calling logger.exception()
so the traceback is logged automatically; specifically, where you currently call
logger.error(f"[{job_id}] Scrape failed: {e}") followed by
logger.error(traceback.format_exc()), remove the traceback.format_exc() call and
use logger.exception(f"[{job_id}] Scrape failed") (or similar) to log the
exception and stack trace in one call — reference the logger object, the job_id
variable, and the existing except Exception as e block to locate and update the
code.
tests/unit/async_worker/test_enrichment_database.py (1)

71-85: Make the MongoDB failure test enforce the logging contract.

Right now this uses RuntimeError and only checks that enrich() returns. The test will still pass if the exception is swallowed without a log entry. Please switch to the shared external-service exception and assert logger.exception() or logger.error() is emitted.

As per coding guidelines, "Use custom typed exceptions (ScraperError, ScraperNetworkError, ScraperParseError, TelegraphPublishError, FileExportError, ExternalServiceError) instead of generic RuntimeError, ValueError, or Exception for domain errors" and "Never silently swallow exceptions — if catching an exception, either re-raise it or handle it explicitly with logging".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/async_worker/test_enrichment_database.py` around lines 71 - 85,
Replace the generic RuntimeError in test_mongodb_error_does_not_crash with the
shared domain exception (ExternalServiceError) and update the assertions to
verify the logging contract: patch
"fastfetchbot_shared.database.mongodb.cache.save_metadata" to raise
ExternalServiceError("MongoDB connection failed"), then capture logging (e.g.,
caplog or patch the module's logger) and assert that logger.exception() or
logger.error() was called with the error when calling enrich(...) with
store_database=True; keep the existing assert that result["title"] == "Test
Title" to ensure behavior remains unchanged.
tests/unit/scrapers/test_common_cache.py (1)

112-130: Model the cache-failure path with the shared exception type and assert the log.

Using RuntimeError here drifts from the production error contract, and this test still passes if get_item() swallows the cache exception silently. Please raise the same shared custom exception the cache layer emits and assert logger.exception()/logger.error() on the fallback path.

As per coding guidelines, "Use custom typed exceptions (ScraperError, ScraperNetworkError, ScraperParseError, TelegraphPublishError, FileExportError, ExternalServiceError) instead of generic RuntimeError, ValueError, or Exception for domain errors" and "Never silently swallow exceptions — if catching an exception, either re-raise it or handle it explicitly with logging".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/scrapers/test_common_cache.py` around lines 112 - 130, Replace the
generic RuntimeError in the test_cache_error_proceeds_to_scrape patch of
fastfetchbot_shared.database.mongodb.cache.find_cached with the shared
cache-layer exception type used in production (e.g., ExternalServiceError or the
specific shared exception your cache module raises), and add an assertion that
the service logs the cache failure (assert logger.exception(...) or
logger.error(...) was called) before falling back to scraper.get_item; locate
the test function test_cache_error_proceeds_to_scrape, the make_service-created
svc, the patched symbol fastfetchbot_shared.database.mongodb.cache.find_cached,
and the mock_scraper_instance.get_item to implement these changes.
CLAUDE.md (1)

181-204: Add blank lines around tables for proper Markdown rendering.

Per Markdown best practices and the static analysis hints, tables should be surrounded by blank lines.

📝 Suggested fix
 ### Database
 
 **MongoDB** (scraped content — optional, feature-gated):
+
 | Variable | Default | Description |
 |----------|---------|-------------|
 ...
 | `MONGODB_URL` | derived | Full MongoDB URI. Overrides host/port/credentials if set explicitly |
+
 MongoDB models and connection logic live in...
 
 **SQLite/PostgreSQL** (user settings — always enabled for the Telegram bot):
+
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `SETTINGS_DATABASE_URL` | `sqlite+aiosqlite:///data/fastfetchbot.db` | SQLAlchemy connection URL. Use `postgresql+asyncpg://...` for production |
+
 Alembic migrations live in...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` around lines 181 - 204, Add blank lines before and after each
Markdown table block so they render correctly: put an empty line above the
MongoDB table (the section starting with "**MongoDB** (scraped content —
optional, feature-gated):" and the table rows containing `DATABASE_ON`,
`DATABASE_CACHE_TTL`, etc.) and another empty line after the table; do the same
for the "SQLite/PostgreSQL" table (the section with `SETTINGS_DATABASE_URL`).
Ensure no other content sits directly adjacent to the table delimiters so both
tables are separated by blank lines from surrounding paragraphs and code blocks.
tests/unit/database/mongodb/test_cache.py (1)

45-50: Redundant assignment can be cleaned up.

Line 48 assigns a value that is immediately overwritten by line 50. The comment suggests this was a transitional approach.

♻️ Suggested cleanup
         with patch(
             "fastfetchbot_shared.database.mongodb.cache.Metadata"
         ) as MockMetadata:
-            MockMetadata.find.return_value = mock_find.sort.return_value.__class__()
-            # Simpler approach: patch the full chain
             MockMetadata.find.return_value = mock_find
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/database/mongodb/test_cache.py` around lines 45 - 50, Remove the
redundant assignment to MockMetadata.find.return_value that is immediately
overwritten: delete the line setting MockMetadata.find.return_value =
mock_find.sort.return_value.__class__() and the transitional comment, and keep
the simpler direct patch MockMetadata.find.return_value = mock_find so that the
test uses the intended mocked chain (references: MockMetadata, mock_find,
Metadata.find).
packages/shared/fastfetchbot_shared/database/mongodb/cache.py (1)

13-43: Replace datetime.utcnow() with datetime.now(timezone.utc) for Python 3.12+ compatibility.

datetime.utcnow() is deprecated in Python 3.12+ and will be removed in a future version. While the current implementation works, using timezone-aware datetimes prevents future compatibility issues.

♻️ Suggested improvement
-from datetime import datetime, timedelta
+from datetime import datetime, timedelta, timezone
 from typing import Optional
 
 from fastfetchbot_shared.database.mongodb.models.metadata import Metadata
 from fastfetchbot_shared.utils.logger import logger
 
 
 async def find_cached(url: str, ttl_seconds: int) -> Optional[Metadata]:
     ...
     # ttl_seconds == 0 means never expire
     if ttl_seconds != 0:
-        age = datetime.utcnow() - doc.timestamp
+        age = datetime.now(timezone.utc) - doc.timestamp.replace(tzinfo=timezone.utc)
         if age > timedelta(seconds=ttl_seconds):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/fastfetchbot_shared/database/mongodb/cache.py` around lines
13 - 43, find_cached uses datetime.utcnow() which is deprecated; change it to a
timezone-aware now() call (e.g. datetime.now(timezone.utc)) and update imports
to include timezone from datetime; also ensure the stored doc.timestamp is
timezone-aware or convert it to UTC-aware before subtracting (e.g.
doc.timestamp.astimezone(timezone.utc) or otherwise normalize) so the age
calculation in find_cached remains valid.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/async-worker/async_worker/config.py`:
- Around line 39-43: The MongoDB connection string is built by interpolating
MONGODB_USERNAME and MONGODB_PASSWORD directly into MONGODB_URL in config.py,
which will break for reserved URI characters; fix by importing
urllib.parse.quote_plus and percent-encoding the username and password (e.g.,
username = quote_plus(self.MONGODB_USERNAME), password =
quote_plus(self.MONGODB_PASSWORD)) before constructing self.MONGODB_URL using
the existing MONGODB_HOST and MONGODB_PORT, leaving the no-credentials branch
unchanged.

In `@CLAUDE.md`:
- Line 23: Replace the Unicode replacement characters in the snippet "���──
services/" with the intended box-drawing character (e.g., change to "├──
services/") or a plain ASCII alternative if the original glyph is unknown, then
save the CLAUDE.md file with UTF-8 encoding (no BOM) to prevent reintroduction
of the replacement character; verify the change by opening the file in a
UTF-8-aware editor and committing the fix.

In `@packages/shared/fastfetchbot_shared/database/mongodb/cache.py`:
- Around line 46-73: In save_metadata, validate the incoming metadata_item's url
before using it as the cache key: ensure metadata_item contains a non-empty,
well-formed URL (e.g., check metadata_item.get("url") is present and not just
empty/whitespace, optionally strip and basic-validate scheme/host) and raise a
clear exception or return an error when invalid; then proceed to compute
latest/version and insert into Metadata as before. Update the function's
behavior (and docstring) to explicitly reject or handle missing/invalid urls to
avoid collisions on the empty-string key.

In `@packages/shared/fastfetchbot_shared/database/mongodb/connection.py`:
- Around line 34-36: The code assumes list `instances` has at least one element
before doing `instance_type = type(instances[0])` and calling `await
instance_type.insert_many(instances)` which will raise IndexError for an empty
list; add a guard that checks `if isinstance(instances, list)` then if `not
instances` either return early (no-op) or raise a clear ValueError, otherwise
compute `instance_type = type(instances[0])` and call `await
instance_type.insert_many(instances)` so `insert_many` is never invoked with an
empty list.

In `@packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py`:
- Around line 57-59: Replace the fragile assert in Metadata.from_dict with an
explicit type check: in the from_dict method of the Metadata class, if obj is
not an instance of dict raise a typed exception from
fastfetchbot_shared.exceptions (use FastFetchBotError or add a ValidationError
subclass of FastFetchBotError) instead of using assert; update the unit test
that currently expects AssertionError (lines ~98-100) to expect the new typed
exception. Ensure the raised exception includes a clear message like "Expected
dict for Metadata.from_dict" and import the chosen exception class into the
metadata module.
- Around line 29-31: The Metadata model's fields text_length and content_length
lack explicit defaults so Pydantic v2 requires them at construction (causing
Metadata(**obj) in from_dict() to fail); update the field definitions in the
Metadata class to provide defaults (e.g., change text_length: Optional[int] =
Field(ge=0) and content_length: Optional[int] = Field(ge=0) to text_length:
Optional[int] = Field(default=None, ge=0) and content_length: Optional[int] =
Field(default=None, ge=0)) so from_dict() can construct Metadata objects and
then let prepare_for_insert() compute/override these values.

In `@packages/shared/fastfetchbot_shared/services/scrapers/common.py`:
- Line 92: The call to cached.model_dump is excluding a non-existent field
revision_id; update the model_dump call in the cached handling (the
cached.model_dump(...) invocation) to exclude only {"id"} instead of {"id",
"revision_id"}, and run/update the unit test (test_common_cache) to assert the
serialized output no longer expects revision_id to be excluded (i.e., adjust
expectations to match exclusion of only id). Ensure this aligns with the
Metadata model which uses version rather than revision_id.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a5c199b-b8b3-4b9a-8ca6-ba5a379d0d02

📥 Commits

Reviewing files that changed from the base of the PR and between c91e7d6 and d170784.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (37)
  • CLAUDE.md
  • apps/api/pyproject.toml
  • apps/api/src/config.py
  • apps/api/src/database.py
  • apps/api/src/models/database_model.py
  • apps/api/src/services/scrapers/common.py
  • apps/async-worker/async_worker/config.py
  • apps/async-worker/async_worker/main.py
  • apps/async-worker/async_worker/services/enrichment.py
  • apps/async-worker/async_worker/tasks/scrape.py
  • apps/async-worker/pyproject.toml
  • apps/telegram-bot/core/handlers/buttons.py
  • apps/telegram-bot/core/handlers/commands.py
  • apps/telegram-bot/core/handlers/url_process.py
  • apps/telegram-bot/core/services/user_settings.py
  • packages/shared/alembic/versions/0002_add_force_refresh_cache_column.py
  • packages/shared/fastfetchbot_shared/database/models/user_setting.py
  • packages/shared/fastfetchbot_shared/database/mongodb/__init__.py
  • packages/shared/fastfetchbot_shared/database/mongodb/cache.py
  • packages/shared/fastfetchbot_shared/database/mongodb/connection.py
  • packages/shared/fastfetchbot_shared/database/mongodb/models/__init__.py
  • packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py
  • packages/shared/fastfetchbot_shared/services/scrapers/common.py
  • packages/shared/pyproject.toml
  • tests/unit/async_worker/test_enrichment_database.py
  • tests/unit/async_worker/test_scrape_cache.py
  • tests/unit/async_worker/test_worker_lifecycle.py
  • tests/unit/database/__init__.py
  • tests/unit/database/mongodb/__init__.py
  • tests/unit/database/mongodb/test_cache.py
  • tests/unit/database/mongodb/test_connection.py
  • tests/unit/database/mongodb/test_metadata_model.py
  • tests/unit/scrapers/test_common_cache.py
  • tests/unit/telegram_bot/test_buttons_cache.py
  • tests/unit/telegram_bot/test_commands.py
  • tests/unit/telegram_bot/test_url_process_cache.py
  • tests/unit/telegram_bot/test_user_settings.py

Comment on lines 39 to +43
 if not self.MONGODB_URL:
-    self.MONGODB_URL = f"mongodb://{self.MONGODB_HOST}:{self.MONGODB_PORT}"
+    if self.MONGODB_USERNAME and self.MONGODB_PASSWORD:
+        self.MONGODB_URL = f"mongodb://{self.MONGODB_USERNAME}:{self.MONGODB_PASSWORD}@{self.MONGODB_HOST}:{self.MONGODB_PORT}"
+    else:
+        self.MONGODB_URL = f"mongodb://{self.MONGODB_HOST}:{self.MONGODB_PORT}"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Do MongoDB connection string usernames and passwords need percent-encoding for reserved URI characters such as @, :, /, and %?

💡 Result:

Yes, MongoDB connection string usernames and passwords need percent-encoding for reserved URI characters such as @ (%40), : (%3A), / (%2F), and % (%25). Official MongoDB documentation explicitly states that if the username or password includes characters like $ : / ? # [ ] @, they must be converted using percent encoding. Examples include D1fficultP%40ssw0rd for a password containing @, and %2Ftmp%2Fmongodb-27017.sock for a UNIX socket path with /. This applies to both mongodb:// and mongodb+srv:// formats, and is confirmed across multiple MongoDB docs versions and community examples.


🏁 Script executed:

# Check the actual code at lines 39-43 in config.py
cat -n apps/async-worker/async_worker/config.py | sed -n '35,50p'

Repository: aturret/FastFetchBot

Length of output: 679


🏁 Script executed:

# Check the connection.py file usage
if [ -f "packages/shared/fastfetchbot_shared/database/mongodb/connection.py" ]; then
  cat -n "packages/shared/fastfetchbot_shared/database/mongodb/connection.py" | head -30
else
  find . -name "connection.py" -path "*/mongodb/*" 2>/dev/null
fi

Repository: aturret/FastFetchBot

Length of output: 1217


🏁 Script executed:

# Search for other MongoDB URL construction patterns in the codebase
rg "mongodb://" --type py -B 2 -A 2

Repository: aturret/FastFetchBot

Length of output: 3748


Percent-encode MongoDB credentials in the connection string.

Raw interpolation of MONGODB_USERNAME and MONGODB_PASSWORD into the connection string will fail if credentials contain reserved URI characters like @, :, /, or %. MongoDB URIs require these characters to be percent-encoded. The URL is passed directly to AsyncIOMotorClient() at line 14 of packages/shared/fastfetchbot_shared/database/mongodb/connection.py, so invalid URIs will cause connection failures.

Suggested fix

Add the import:

from urllib.parse import quote_plus

Update the URL construction:

if not self.MONGODB_URL:
    if self.MONGODB_USERNAME and self.MONGODB_PASSWORD:
        username = quote_plus(self.MONGODB_USERNAME)
        password = quote_plus(self.MONGODB_PASSWORD)
        self.MONGODB_URL = (
            f"mongodb://{username}:{password}@"
            f"{self.MONGODB_HOST}:{self.MONGODB_PORT}"
        )
    else:
        self.MONGODB_URL = f"mongodb://{self.MONGODB_HOST}:{self.MONGODB_PORT}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/async-worker/async_worker/config.py` around lines 39 - 43, The MongoDB
connection string is built by interpolating MONGODB_USERNAME and
MONGODB_PASSWORD directly into MONGODB_URL in config.py, which will break for
reserved URI characters; fix by importing urllib.parse.quote_plus and
percent-encoding the username and password (e.g., username =
quote_plus(self.MONGODB_USERNAME), password = quote_plus(self.MONGODB_PASSWORD))
before constructing self.MONGODB_URL using the existing MONGODB_HOST and
MONGODB_PORT, leaving the no-credentials branch unchanged.
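
The suggested fix can be sketched as a small standalone helper (the name `build_mongodb_url` is illustrative, not part of the codebase):

```python
from urllib.parse import quote_plus


def build_mongodb_url(host, port, username=None, password=None):
    """Build a mongodb:// URL, percent-encoding any credentials."""
    if username and password:
        # quote_plus escapes reserved URI characters: @ -> %40, : -> %3A, / -> %2F
        return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}:{port}"
    return f"mongodb://{host}:{port}"


url = build_mongodb_url("localhost", 27017, "admin", "p@ss:w/rd")
# url == "mongodb://admin:p%40ss%3Aw%2Frd@localhost:27017"
```

The no-credentials branch stays byte-for-byte identical to the current behavior, so only URLs with a username and password change shape.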

Comment on lines +29 to +31
text_length: Optional[int] = Field(ge=0)
content: Optional[str] = None
content_length: Optional[int] = Field(ge=0)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Pydantic v2, if a field is annotated as Optional[int] but declared as Field(ge=0) without a default, is it still required during model validation?

💡 Result:

Yes, in Pydantic v2, a field annotated as Optional[int] (equivalent to int | None) with Field(ge=0) but no default value is still required during model validation. Model creation or validation will fail with a "Field required" error if the field is omitted from the input data. This is because Pydantic v2 no longer provides an implicit None default for Optional types (unlike v1); the field must be explicitly present in the input (e.g., as null/None or a value), and if present as None, it will validate successfully since None is allowed by the type annotation. The ge=0 constraint only applies if a non-None value is provided (which must then be an int >= 0). To make the field truly optional (omittable), provide = None or Field(default=None).


🏁 Script executed:

# First, check if the file exists and view the relevant section
cat -n packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py | head -60

Repository: aturret/FastFetchBot

Length of output: 2509


🏁 Script executed:

# Check the prepare_for_insert method and surrounding code
sed -n '40,60p' packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py

Repository: aturret/FastFetchBot

Length of output: 762


🏁 Script executed:

# Search for how Metadata is instantiated in the codebase
rg "Metadata\(" --type py -A 2 -B 2 | head -100

Repository: aturret/FastFetchBot

Length of output: 4263


🏁 Script executed:

# Check the test file mentioned in the comment
cat -n tests/unit/database/mongodb/test_metadata_model.py | sed -n '50,85p'

Repository: aturret/FastFetchBot

Length of output: 1483


text_length and content_length must have defaults to allow construction with incomplete metadata.

In Pydantic v2, Optional[int] with Field(ge=0) but no explicit default is still required during validation. The from_dict() method (line 59) calls Metadata(**obj) directly, which triggers validation before prepare_for_insert() can compute these fields. Raw metadata dicts without these keys will fail validation.

🔧 Suggested fix
-    text_length: Optional[int] = Field(ge=0)
+    text_length: Optional[int] = Field(default=0, ge=0)
     content: Optional[str] = None
-    content_length: Optional[int] = Field(ge=0)
+    content_length: Optional[int] = Field(default=0, ge=0)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py`
around lines 29 - 31, The Metadata model's fields text_length and content_length
lack explicit defaults so Pydantic v2 requires them at construction (causing
Metadata(**obj) in from_dict() to fail); update the field definitions in the
Metadata class to provide defaults (e.g., change text_length: Optional[int] =
Field(ge=0) and content_length: Optional[int] = Field(ge=0) to text_length:
Optional[int] = Field(default=None, ge=0) and content_length: Optional[int] =
Field(default=None, ge=0)) so from_dict() can construct Metadata objects and
then let prepare_for_insert() compute/override these values.
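
The behavior the web query confirms can be reproduced with a minimal model (assuming pydantic v2 is installed; `Meta` and `MetaFixed` are illustrative names, not the real `Metadata` document):

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Meta(BaseModel):
    # No default: Optional alone does not make the field omittable in v2.
    text_length: Optional[int] = Field(ge=0)


try:
    Meta()
except ValidationError as exc:
    error_types = [e["type"] for e in exc.errors()]
# error_types == ["missing"] -- the field is required despite Optional


class MetaFixed(BaseModel):
    # Explicit default makes the field omittable; ge=0 still applies to ints.
    text_length: Optional[int] = Field(default=None, ge=0)


m = MetaFixed()  # validates fine; text_length is None
```

With the explicit default, `from_dict()` can construct the model from incomplete dicts and leave it to `prepare_for_insert()` to fill the lengths in.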

Comment on lines +57 to +59
def from_dict(obj: Any) -> "Metadata":
assert isinstance(obj, dict)
return Metadata(**obj)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does Python remove assert statements when running with optimization (python -O), and is assert recommended for validating external/runtime inputs?

💡 Result:

Yes, Python removes assert statements when running with optimization (python -O). No, assert is not recommended for validating external/runtime inputs.


🏁 Script executed:

# Find and read the exceptions file to see available typed exceptions
find . -path "*fastfetchbot_shared/exceptions.py" -o -path "*fastfetchbot_shared/exceptions/*"

Repository: aturret/FastFetchBot

Length of output: 115


🏁 Script executed:

# Read the exceptions file
cat "packages/shared/fastfetchbot_shared/exceptions.py" 2>/dev/null || cat "packages/shared/fastfetchbot_shared/exceptions/__init__.py" 2>/dev/null || find . -name "exceptions.py" -path "*/fastfetchbot_shared/*" -exec cat {} \;

Repository: aturret/FastFetchBot

Length of output: 711


🏁 Script executed:

# Read the test file to see current expectations
cat "tests/unit/database/mongodb/test_metadata_model.py" 2>/dev/null | head -120

Repository: aturret/FastFetchBot

Length of output: 4025


🏁 Script executed:

# Read the actual metadata.py file for full context
cat "packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py" 2>/dev/null | head -80

Repository: aturret/FastFetchBot

Length of output: 2117


Replace assert isinstance(obj, dict) with an explicit type check that raises a typed exception.

assert is stripped under python -O, making this validation unreliable at runtime. Replace with an explicit isinstance check and raise a typed exception from fastfetchbot_shared.exceptions (e.g., FastFetchBotError or a new validation-specific exception extending it). Update the test at lines 98-100 to expect the typed exception instead of AssertionError.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py`
around lines 57 - 59, Replace the fragile assert in Metadata.from_dict with an
explicit type check: in the from_dict method of the Metadata class, if obj is
not an instance of dict raise a typed exception from
fastfetchbot_shared.exceptions (use FastFetchBotError or add a ValidationError
subclass of FastFetchBotError) instead of using assert; update the unit test
that currently expects AssertionError (lines ~98-100) to expect the new typed
exception. Ensure the raised exception includes a clear message like "Expected
dict for Metadata.from_dict" and import the chosen exception class into the
metadata module.
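
A minimal sketch of the explicit check (the exception classes below are stand-ins for the real ones in `fastfetchbot_shared.exceptions`):

```python
from typing import Any


class FastFetchBotError(Exception):
    """Stand-in for the shared base exception."""


class MetadataValidationError(FastFetchBotError):
    """Raised when input fails an explicit runtime type check."""


def from_dict(obj: Any) -> dict:
    # An explicit check survives `python -O`; an assert would be stripped.
    if not isinstance(obj, dict):
        raise MetadataValidationError(
            f"Expected dict for Metadata.from_dict, got {type(obj).__name__}"
        )
    return obj


try:
    from_dict(["not", "a", "dict"])
except MetadataValidationError as exc:
    message = str(exc)
```

The unit test would then catch `MetadataValidationError` instead of `AssertionError`, and the error message names both the expected and the actual type.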

@aturret aturret merged commit d46e5f5 into main Mar 30, 2026
4 of 5 checks passed