⚠️ Caution: Review failed. Pull request was closed or merged during review.

📝 Walkthrough

Adds a MongoDB-backed scraped-content cache with TTL and versioning, moves MongoDB utilities into the shared package, adds per-user force-refresh cache toggles, and wires cache behavior through the API, async-worker, and Telegram handlers, with a migration and tests.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Service as InfoExtractService
    participant Cache as MongoDB Cache
    participant Scraper as Scraper
    participant DB as MongoDB Store

    Client->>Service: get_item(url)
    alt Database caching enabled & database_cache_ttl >= 0
        Service->>Cache: find_cached(url, ttl_seconds)
        alt Cache hit
            Cache-->>Service: Metadata {_cached: True}
            Service-->>Client: return cached metadata
        else Cache miss or expired
            Service->>Scraper: scrape URL
            Scraper-->>Service: raw metadata
            Service->>DB: save_metadata(metadata)
            DB-->>Service: saved Metadata (version N)
            Service-->>Client: return fresh metadata
        end
    else Caching disabled or force refresh (ttl < 0)
        Service->>Scraper: scrape URL
        Scraper-->>Service: raw metadata
        Service-->>Client: return fresh metadata
    end
```
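The branching in the diagram can be sketched as a small async helper. This is an illustrative stand-in, not the project's actual code: the callables `find_cached`, `scrape`, and `save` mirror the diagram's participants, and the in-memory store stands in for MongoDB.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def get_item(
    url: str,
    ttl_seconds: int,
    find_cached: Callable[[str, int], Awaitable[Optional[dict]]],
    scrape: Callable[[str], Awaitable[dict]],
    save: Callable[[dict], Awaitable[None]],
) -> dict:
    # ttl_seconds < 0 means "force refresh": skip both lookup and persistence
    if ttl_seconds >= 0:
        cached = await find_cached(url, ttl_seconds)
        if cached is not None:
            # Cache hit: annotate the result so callers can tell it apart
            return {**cached, "_cached": True}
        fresh = await scrape(url)
        await save(fresh)  # miss: persist the freshly scraped metadata
        return fresh
    return await scrape(url)


# In-memory stand-ins for the MongoDB cache layer
_store: dict = {}


async def _find_cached(url: str, ttl: int) -> Optional[dict]:
    return _store.get(url)


async def _scrape(url: str) -> dict:
    return {"url": url, "title": "Example"}


async def _save(metadata: dict) -> None:
    _store[metadata["url"]] = metadata


async def demo():
    first = await get_item("https://example.com", 60, _find_cached, _scrape, _save)
    second = await get_item("https://example.com", 60, _find_cached, _scrape, _save)
    return first, second


first, second = asyncio.run(demo())
```

The first call takes the miss branch (scrape then save); the second returns the stored copy with `_cached: True`, matching the two `alt` paths in the diagram.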
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 7
🧹 Nitpick comments (8)
apps/api/src/services/scrapers/common.py (1)
78-82: Consider adding error handling around `save_metadata()` for resilience.

Unlike the async-worker's `enrichment.py`, which wraps `save_metadata()` in a try/except block (logging errors but continuing), this call has no error handling. A database failure would cause the entire request to fail rather than returning successfully-scraped data.

If strict persistence is required, the current behavior is fine. Otherwise, consider wrapping for resilience:
♻️ Suggested fix for resilient persistence
```diff
 if self.store_database:
     logger.info("store in database")
-    from fastfetchbot_shared.database.mongodb.cache import save_metadata
-
-    await save_metadata(metadata_item)
+    try:
+        from fastfetchbot_shared.database.mongodb.cache import save_metadata
+        await save_metadata(metadata_item)
+    except Exception as e:
+        logger.error(f"Error saving to database: {e}")
 return metadata_item
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/api/src/services/scrapers/common.py` around lines 78-82: the call to `await save_metadata(metadata_item)` inside the `store_database` branch lacks error handling and can fail the whole request; wrap the call in a try/except around the block guarded by `self.store_database` (the same place that imports `save_metadata`), log the error via `logger.error` with context (e.g., `metadata_item` or a short identifier), and continue without re-raising so scraping proceeds even if DB persistence fails.

packages/shared/fastfetchbot_shared/database/mongodb/__init__.py (1)
15-23: Consider sorting `__all__` alphabetically.

Ruff flags that `__all__` is not sorted (RUF022). While functional, sorting improves readability and aligns with isort conventions.

♻️ Suggested fix

```diff
 __all__ = [
+    "DatabaseMediaFile",
+    "Metadata",
+    "close_mongodb",
+    "find_cached",
     "init_mongodb",
-    "close_mongodb",
     "save_instances",
-    "find_cached",
     "save_metadata",
-    "DatabaseMediaFile",
-    "Metadata",
 ]
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/database/mongodb/__init__.py` around lines 15-23: the `__all__` export list is not alphabetically sorted (RUF022); reorder the entries so they are in alphabetical order (DatabaseMediaFile, Metadata, close_mongodb, find_cached, init_mongodb, save_instances, save_metadata) to satisfy Ruff and improve readability.

apps/async-worker/async_worker/tasks/scrape.py (1)
86-88: Prefer `logger.exception()` over manual `traceback.format_exc()`.

Per coding guidelines, use `logger.exception()`, which automatically includes the traceback, rather than manually formatting it.

♻️ Suggested fix

```diff
 except Exception as e:
-    logger.error(f"[{job_id}] Scrape failed: {e}")
-    logger.error(traceback.format_exc())
+    logger.exception(f"[{job_id}] Scrape failed: {e}")
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/async-worker/async_worker/tasks/scrape.py` around lines 86-88: replace the manual traceback logging in the except block with `logger.exception()` so the traceback is logged automatically; remove the `traceback.format_exc()` call and log the exception and stack trace in one call.

tests/unit/async_worker/test_enrichment_database.py (1)
71-85: Make the MongoDB failure test enforce the logging contract.

Right now this uses `RuntimeError` and only checks that `enrich()` returns. The test will still pass if the exception is swallowed without a log entry. Please switch to the shared external-service exception and assert that `logger.exception()` or `logger.error()` is emitted.

As per coding guidelines, "Use custom typed exceptions (`ScraperError`, `ScraperNetworkError`, `ScraperParseError`, `TelegraphPublishError`, `FileExportError`, `ExternalServiceError`) instead of generic `RuntimeError`, `ValueError`, or `Exception` for domain errors" and "Never silently swallow exceptions — if catching an exception, either re-raise it or handle it explicitly with logging".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/unit/async_worker/test_enrichment_database.py` around lines 71-85: replace the generic `RuntimeError` in `test_mongodb_error_does_not_crash` with the shared domain exception (`ExternalServiceError`): patch `fastfetchbot_shared.database.mongodb.cache.save_metadata` to raise `ExternalServiceError("MongoDB connection failed")`, capture logging (e.g., caplog or patch the module's logger), and assert that `logger.exception()` or `logger.error()` was called when calling `enrich(...)` with `store_database=True`; keep the existing assert that `result["title"] == "Test Title"` to ensure behavior remains unchanged.

tests/unit/scrapers/test_common_cache.py (1)
112-130: Model the cache-failure path with the shared exception type and assert the log.

Using `RuntimeError` here drifts from the production error contract, and this test still passes if `get_item()` swallows the cache exception silently. Please raise the same shared custom exception the cache layer emits and assert `logger.exception()`/`logger.error()` on the fallback path.

As per coding guidelines, "Use custom typed exceptions (`ScraperError`, `ScraperNetworkError`, `ScraperParseError`, `TelegraphPublishError`, `FileExportError`, `ExternalServiceError`) instead of generic `RuntimeError`, `ValueError`, or `Exception` for domain errors" and "Never silently swallow exceptions — if catching an exception, either re-raise it or handle it explicitly with logging".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/unit/scrapers/test_common_cache.py` around lines 112-130: replace the generic `RuntimeError` in the `test_cache_error_proceeds_to_scrape` patch of `fastfetchbot_shared.database.mongodb.cache.find_cached` with the shared cache-layer exception type used in production (e.g., `ExternalServiceError`, or the specific shared exception your cache module raises), and add an assertion that the service logs the cache failure (`logger.exception(...)` or `logger.error(...)`) before falling back to `scraper.get_item`.

CLAUDE.md (1)
181-204: Add blank lines around tables for proper Markdown rendering.

Per Markdown best practices and the static analysis hints, tables should be surrounded by blank lines.

📝 Suggested fix

```diff
 ### Database
 
 **MongoDB** (scraped content — optional, feature-gated):
+
 | Variable | Default | Description |
 |----------|---------|-------------|
 ...
 | `MONGODB_URL` | derived | Full MongoDB URI. Overrides host/port/credentials if set explicitly |
+
 MongoDB models and connection logic live in...
 
 **SQLite/PostgreSQL** (user settings — always enabled for the Telegram bot):
+
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `SETTINGS_DATABASE_URL` | `sqlite+aiosqlite:///data/fastfetchbot.db` | SQLAlchemy connection URL. Use `postgresql+asyncpg://...` for production |
+
 Alembic migrations live in...
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@CLAUDE.md` around lines 181-204: add blank lines before and after each Markdown table block so they render correctly: an empty line above the MongoDB table (the section starting with "**MongoDB** (scraped content — optional, feature-gated):") and another after it, and the same for the "SQLite/PostgreSQL" table (the section with `SETTINGS_DATABASE_URL`). Ensure no other content sits directly adjacent to the table delimiters.

tests/unit/database/mongodb/test_cache.py (1)
45-50: Redundant assignment can be cleaned up.

Line 48 assigns a value that is immediately overwritten by line 50. The comment suggests this was a transitional approach.

♻️ Suggested cleanup

```diff
 with patch(
     "fastfetchbot_shared.database.mongodb.cache.Metadata"
 ) as MockMetadata:
-    MockMetadata.find.return_value = mock_find.sort.return_value.__class__()
-    # Simpler approach: patch the full chain
     MockMetadata.find.return_value = mock_find
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/unit/database/mongodb/test_cache.py` around lines 45-50: remove the redundant assignment `MockMetadata.find.return_value = mock_find.sort.return_value.__class__()` and the transitional comment, keeping the direct patch `MockMetadata.find.return_value = mock_find` so the test uses the intended mocked chain.

packages/shared/fastfetchbot_shared/database/mongodb/cache.py (1)
13-43: Replace `datetime.utcnow()` with `datetime.now(timezone.utc)` for Python 3.12+ compatibility.

`datetime.utcnow()` is deprecated in Python 3.12+ and will be removed in a future version. While the current implementation works, using timezone-aware datetimes prevents future compatibility issues.

♻️ Suggested improvement

```diff
-from datetime import datetime, timedelta
+from datetime import datetime, timedelta, timezone
 from typing import Optional
 
 from fastfetchbot_shared.database.mongodb.models.metadata import Metadata
 from fastfetchbot_shared.utils.logger import logger
 
 
 async def find_cached(url: str, ttl_seconds: int) -> Optional[Metadata]:
     ...
     # ttl_seconds == 0 means never expire
     if ttl_seconds != 0:
-        age = datetime.utcnow() - doc.timestamp
+        age = datetime.now(timezone.utc) - doc.timestamp.replace(tzinfo=timezone.utc)
         if age > timedelta(seconds=ttl_seconds):
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/database/mongodb/cache.py` around lines 13-43: `find_cached` uses `datetime.utcnow()`, which is deprecated; change it to a timezone-aware call (e.g. `datetime.now(timezone.utc)`), update imports to include `timezone` from `datetime`, and ensure the stored `doc.timestamp` is timezone-aware or converted to UTC-aware before subtracting, so the age calculation in `find_cached` remains valid.
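The timezone-aware age check the diff aims for can be shown in a standalone sketch, with `ttl_seconds == 0` meaning "never expire" as in the comment. The function name and the normalization step are illustrative, not the module's actual code:

```python
from datetime import datetime, timedelta, timezone


def is_expired(timestamp: datetime, ttl_seconds: int) -> bool:
    # ttl_seconds == 0 means the cached document never expires
    if ttl_seconds == 0:
        return False
    # Documents written with datetime.utcnow() carry naive timestamps;
    # normalize them to UTC-aware before subtracting
    if timestamp.tzinfo is None:
        timestamp = timestamp.replace(tzinfo=timezone.utc)
    age = datetime.now(timezone.utc) - timestamp
    return age > timedelta(seconds=ttl_seconds)


fresh = datetime.now(timezone.utc)
stale_naive = datetime.utcnow() - timedelta(hours=2)  # simulates an old naive record
```

Normalizing the stored naive timestamp keeps the subtraction valid regardless of whether the document was written before or after the migration to timezone-aware datetimes.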
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/async-worker/async_worker/config.py`:
- Around line 39-43: The MongoDB connection string is built by interpolating
MONGODB_USERNAME and MONGODB_PASSWORD directly into MONGODB_URL in config.py,
which will break for reserved URI characters; fix by importing
urllib.parse.quote_plus and percent-encoding the username and password (e.g.,
username = quote_plus(self.MONGODB_USERNAME), password =
quote_plus(self.MONGODB_PASSWORD)) before constructing self.MONGODB_URL using
the existing MONGODB_HOST and MONGODB_PORT, leaving the no-credentials branch
unchanged.
In `@CLAUDE.md`:
- Line 23: Replace the Unicode replacement characters in the snippet "���──
services/" with the intended box-drawing character (e.g., change to "├──
services/") or a plain ASCII alternative if the original glyph is unknown, then
save the CLAUDE.md file with UTF-8 encoding (no BOM) to prevent reintroduction
of the replacement character; verify the change by opening the file in a
UTF-8-aware editor and committing the fix.
In `@packages/shared/fastfetchbot_shared/database/mongodb/cache.py`:
- Around line 46-73: In save_metadata, validate the incoming metadata_item's url
before using it as the cache key: ensure metadata_item contains a non-empty,
well-formed URL (e.g., check metadata_item.get("url") is present and not just
empty/whitespace, optionally strip and basic-validate scheme/host) and raise a
clear exception or return an error when invalid; then proceed to compute
latest/version and insert into Metadata as before. Update the function's
behavior (and docstring) to explicitly reject or handle missing/invalid urls to
avoid collisions on the empty-string key.
In `@packages/shared/fastfetchbot_shared/database/mongodb/connection.py`:
- Around line 34-36: The code assumes list `instances` has at least one element
before doing `instance_type = type(instances[0])` and calling `await
instance_type.insert_many(instances)` which will raise IndexError for an empty
list; add a guard that checks `if isinstance(instances, list)` then if `not
instances` either return early (no-op) or raise a clear ValueError, otherwise
compute `instance_type = type(instances[0])` and call `await
instance_type.insert_many(instances)` so `insert_many` is never invoked with an
empty list.
In `@packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py`:
- Around line 57-59: Replace the fragile assert in Metadata.from_dict with an
explicit type check: in the from_dict method of the Metadata class, if obj is
not an instance of dict raise a typed exception from
fastfetchbot_shared.exceptions (use FastFetchBotError or add a ValidationError
subclass of FastFetchBotError) instead of using assert; update the unit test
that currently expects AssertionError (lines ~98-100) to expect the new typed
exception. Ensure the raised exception includes a clear message like "Expected
dict for Metadata.from_dict" and import the chosen exception class into the
metadata module.
- Around line 29-31: The Metadata model's fields text_length and content_length
lack explicit defaults so Pydantic v2 requires them at construction (causing
Metadata(**obj) in from_dict() to fail); update the field definitions in the
Metadata class to provide defaults (e.g., change text_length: Optional[int] =
Field(ge=0) and content_length: Optional[int] = Field(ge=0) to text_length:
Optional[int] = Field(default=None, ge=0) and content_length: Optional[int] =
Field(default=None, ge=0)) so from_dict() can construct Metadata objects and
then let prepare_for_insert() compute/override these values.
In `@packages/shared/fastfetchbot_shared/services/scrapers/common.py`:
- Line 92: The call to cached.model_dump is excluding a non-existent field
revision_id; update the model_dump call in the cached handling (the
cached.model_dump(...) invocation) to exclude only {"id"} instead of {"id",
"revision_id"}, and run/update the unit test (test_common_cache) to assert the
serialized output no longer expects revision_id to be excluded (i.e., adjust
expectations to match exclusion of only id). Ensure this aligns with the
Metadata model which uses version rather than revision_id.
---
Nitpick comments:
In `@apps/api/src/services/scrapers/common.py`:
- Around line 78-82: The call to await save_metadata(metadata_item) inside the
store_database branch lacks error handling and can fail the whole request; wrap
the await save_metadata(metadata_item) call in a try/except around the block
guarded by self.store_database (the same place that imports save_metadata) and
on exception log the error via logger.error including context (e.g.,
metadata_item or a short identifier) and continue without re-raising so scraping
proceeds even if DB persistence fails.
In `@apps/async-worker/async_worker/tasks/scrape.py`:
- Around line 86-88: Replace the manual traceback logging in the except block by
calling logger.exception() so the traceback is logged automatically;
specifically, where you currently call logger.error(f"[{job_id}] Scrape failed:
{e}") followed by logger.error(traceback.format_exc()), remove the
traceback.format_exc() call and use logger.exception(f"[{job_id}] Scrape
failed") (or similar) to log the exception and stack trace in one call —
reference the logger object, the job_id variable, and the existing except
Exception as e block to locate and update the code.
In `@CLAUDE.md`:
- Around line 181-204: Add blank lines before and after each Markdown table
block so they render correctly: put an empty line above the MongoDB table (the
section starting with "**MongoDB** (scraped content — optional, feature-gated):"
and the table rows containing `DATABASE_ON`, `DATABASE_CACHE_TTL`, etc.) and
another empty line after the table; do the same for the "SQLite/PostgreSQL"
table (the section with `SETTINGS_DATABASE_URL`). Ensure no other content sits
directly adjacent to the table delimiters so both tables are separated by blank
lines from surrounding paragraphs and code blocks.
In `@packages/shared/fastfetchbot_shared/database/mongodb/__init__.py`:
- Around line 15-23: The __all__ export list is not alphabetically sorted
(RUF022); reorder the entries inside __all__ so they are in alphabetical order
(e.g., DatabaseMediaFile, Metadata, close_mongodb, find_cached, init_mongodb,
save_instances, save_metadata) to satisfy Ruff and improve readability; update
the __all__ variable in the module where it is defined.
In `@packages/shared/fastfetchbot_shared/database/mongodb/cache.py`:
- Around line 13-43: find_cached uses datetime.utcnow() which is deprecated;
change it to a timezone-aware now() call (e.g. datetime.now(timezone.utc)) and
update imports to include timezone from datetime; also ensure the stored
doc.timestamp is timezone-aware or convert it to UTC-aware before subtracting
(e.g. doc.timestamp.astimezone(timezone.utc) or otherwise normalize) so the age
calculation in find_cached remains valid.
In `@tests/unit/async_worker/test_enrichment_database.py`:
- Around line 71-85: Replace the generic RuntimeError in
test_mongodb_error_does_not_crash with the shared domain exception
(ExternalServiceError) and update the assertions to verify the logging contract:
patch "fastfetchbot_shared.database.mongodb.cache.save_metadata" to raise
ExternalServiceError("MongoDB connection failed"), then capture logging (e.g.,
caplog or patch the module's logger) and assert that logger.exception() or
logger.error() was called with the error when calling enrich(...) with
store_database=True; keep the existing assert that result["title"] == "Test
Title" to ensure behavior remains unchanged.
In `@tests/unit/database/mongodb/test_cache.py`:
- Around line 45-50: Remove the redundant assignment to
MockMetadata.find.return_value that is immediately overwritten: delete the line
setting MockMetadata.find.return_value = mock_find.sort.return_value.__class__()
and the transitional comment, and keep the simpler direct patch
MockMetadata.find.return_value = mock_find so that the test uses the intended
mocked chain (references: MockMetadata, mock_find, Metadata.find).
In `@tests/unit/scrapers/test_common_cache.py`:
- Around line 112-130: Replace the generic RuntimeError in the
test_cache_error_proceeds_to_scrape patch of
fastfetchbot_shared.database.mongodb.cache.find_cached with the shared
cache-layer exception type used in production (e.g., ExternalServiceError or the
specific shared exception your cache module raises), and add an assertion that
the service logs the cache failure (assert logger.exception(...) or
logger.error(...) was called) before falling back to scraper.get_item; locate
the test function test_cache_error_proceeds_to_scrape, the make_service-created
svc, the patched symbol fastfetchbot_shared.database.mongodb.cache.find_cached,
and the mock_scraper_instance.get_item to implement these changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6a5c199b-b8b3-4b9a-8ca6-ba5a379d0d02
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (37)
- CLAUDE.md
- apps/api/pyproject.toml
- apps/api/src/config.py
- apps/api/src/database.py
- apps/api/src/models/database_model.py
- apps/api/src/services/scrapers/common.py
- apps/async-worker/async_worker/config.py
- apps/async-worker/async_worker/main.py
- apps/async-worker/async_worker/services/enrichment.py
- apps/async-worker/async_worker/tasks/scrape.py
- apps/async-worker/pyproject.toml
- apps/telegram-bot/core/handlers/buttons.py
- apps/telegram-bot/core/handlers/commands.py
- apps/telegram-bot/core/handlers/url_process.py
- apps/telegram-bot/core/services/user_settings.py
- packages/shared/alembic/versions/0002_add_force_refresh_cache_column.py
- packages/shared/fastfetchbot_shared/database/models/user_setting.py
- packages/shared/fastfetchbot_shared/database/mongodb/__init__.py
- packages/shared/fastfetchbot_shared/database/mongodb/cache.py
- packages/shared/fastfetchbot_shared/database/mongodb/connection.py
- packages/shared/fastfetchbot_shared/database/mongodb/models/__init__.py
- packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py
- packages/shared/fastfetchbot_shared/services/scrapers/common.py
- packages/shared/pyproject.toml
- tests/unit/async_worker/test_enrichment_database.py
- tests/unit/async_worker/test_scrape_cache.py
- tests/unit/async_worker/test_worker_lifecycle.py
- tests/unit/database/__init__.py
- tests/unit/database/mongodb/__init__.py
- tests/unit/database/mongodb/test_cache.py
- tests/unit/database/mongodb/test_connection.py
- tests/unit/database/mongodb/test_metadata_model.py
- tests/unit/scrapers/test_common_cache.py
- tests/unit/telegram_bot/test_buttons_cache.py
- tests/unit/telegram_bot/test_commands.py
- tests/unit/telegram_bot/test_url_process_cache.py
- tests/unit/telegram_bot/test_user_settings.py
```diff
 if not self.MONGODB_URL:
-    self.MONGODB_URL = f"mongodb://{self.MONGODB_HOST}:{self.MONGODB_PORT}"
+    if self.MONGODB_USERNAME and self.MONGODB_PASSWORD:
+        self.MONGODB_URL = f"mongodb://{self.MONGODB_USERNAME}:{self.MONGODB_PASSWORD}@{self.MONGODB_HOST}:{self.MONGODB_PORT}"
+    else:
+        self.MONGODB_URL = f"mongodb://{self.MONGODB_HOST}:{self.MONGODB_PORT}"
```
🧩 Analysis chain
🌐 Web query:
Do MongoDB connection string usernames and passwords need percent-encoding for reserved URI characters such as @, :, /, and %?
💡 Result:
Yes, MongoDB connection string usernames and passwords need percent-encoding for reserved URI characters such as @ (%40), : (%3A), / (%2F), and % (%25). Official MongoDB documentation explicitly states that if the username or password includes characters like $ : / ? # [ ] @, they must be converted using percent encoding. Examples include D1fficultP%40ssw0rd for a password containing @, and %2Ftmp%2Fmongodb-27017.sock for a UNIX socket path with /. This applies to both mongodb:// and mongodb+srv:// formats, and is confirmed across multiple MongoDB docs versions and community examples.
Citations:
- 1: https://www.mongodb.com/docs/manual/reference/connection-string/
- 2: https://www.mongodb.com/docs/manual/reference/connection-string-options/
- 3: https://www.mongodb.com/docs/manual/reference/connection-string-formats/
🏁 Script executed:
```shell
# Check the actual code at lines 39-43 in config.py
cat -n apps/async-worker/async_worker/config.py | sed -n '35,50p'
```

Repository: aturret/FastFetchBot
Length of output: 679
🏁 Script executed:
```shell
# Check the connection.py file usage
if [ -f "packages/shared/fastfetchbot_shared/database/mongodb/connection.py" ]; then
  cat -n "packages/shared/fastfetchbot_shared/database/mongodb/connection.py" | head -30
else
  find . -name "connection.py" -path "*/mongodb/*" 2>/dev/null
fi
```

Repository: aturret/FastFetchBot
Length of output: 1217
🏁 Script executed:
```shell
# Search for other MongoDB URL construction patterns in the codebase
rg "mongodb://" --type py -B 2 -A 2
```

Repository: aturret/FastFetchBot
Length of output: 3748
Percent-encode MongoDB credentials in the connection string.
Raw interpolation of MONGODB_USERNAME and MONGODB_PASSWORD into the connection string will fail if credentials contain reserved URI characters like @, :, /, or %. MongoDB URIs require these characters to be percent-encoded. The URL is passed directly to AsyncIOMotorClient() at line 14 of packages/shared/fastfetchbot_shared/database/mongodb/connection.py, so invalid URIs will cause connection failures.
Suggested fix
Add the import:
```python
from urllib.parse import quote_plus
```

Update the URL construction:
```python
if not self.MONGODB_URL:
    if self.MONGODB_USERNAME and self.MONGODB_PASSWORD:
        username = quote_plus(self.MONGODB_USERNAME)
        password = quote_plus(self.MONGODB_PASSWORD)
        self.MONGODB_URL = (
            f"mongodb://{username}:{password}@"
            f"{self.MONGODB_HOST}:{self.MONGODB_PORT}"
        )
    else:
        self.MONGODB_URL = f"mongodb://{self.MONGODB_HOST}:{self.MONGODB_PORT}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/async-worker/async_worker/config.py` around lines 39 - 43, The MongoDB
connection string is built by interpolating MONGODB_USERNAME and
MONGODB_PASSWORD directly into MONGODB_URL in config.py, which will break for
reserved URI characters; fix by importing urllib.parse.quote_plus and
percent-encoding the username and password (e.g., username =
quote_plus(self.MONGODB_USERNAME), password = quote_plus(self.MONGODB_PASSWORD))
before constructing self.MONGODB_URL using the existing MONGODB_HOST and
MONGODB_PORT, leaving the no-credentials branch unchanged.
```python
text_length: Optional[int] = Field(ge=0)
content: Optional[str] = None
content_length: Optional[int] = Field(ge=0)
```
🧩 Analysis chain
🌐 Web query:
In Pydantic v2, if a field is annotated as `Optional[int]` but declared as `Field(ge=0)` without a default, is it still required during model validation?
💡 Result:
Yes, in Pydantic v2, a field annotated as Optional[int] (equivalent to int | None) with Field(ge=0) but no default value is still required during model validation. Model creation or validation will fail with a "Field required" error if the field is omitted from the input data. This is because Pydantic v2 no longer provides an implicit None default for Optional types (unlike v1); the field must be explicitly present in the input (e.g., as null/None or a value), and if present as None, it will validate successfully since None is allowed by the type annotation. The ge=0 constraint only applies if a non-None value is provided (which must then be an int >= 0). To make the field truly optional (omittable), provide = None or Field(default=None).
Citations:
- 1: https://docs.pydantic.dev/2.5/concepts/models/
- 2: https://stackoverflow.com/questions/74854903/not-required-in-pydantics-base-models
- 3: https://stackoverflow.com/questions/76690463/pydantic-2-0-ignores-optional-in-schema-and-requires-the-field-to-be-available
- 4: Understanding Changes in Pydantic Behavior: Why Optional Fields are Required in V2 pydantic/pydantic#8264
- 5: Optional fields raise validation errors: Field required [type=missing...] pydantic/pydantic#6463
🏁 Script executed:
```shell
# First, check if the file exists and view the relevant section
cat -n packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py | head -60
```

Repository: aturret/FastFetchBot
Length of output: 2509
🏁 Script executed:
```shell
# Check the prepare_for_insert method and surrounding code
sed -n '40,60p' packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py
```

Repository: aturret/FastFetchBot
Length of output: 762
🏁 Script executed:
```shell
# Search for how Metadata is instantiated in the codebase
rg "Metadata\(" --type py -A 2 -B 2 | head -100
```

Repository: aturret/FastFetchBot
Length of output: 4263
🏁 Script executed:
```shell
# Check the test file mentioned in the comment
cat -n tests/unit/database/mongodb/test_metadata_model.py | sed -n '50,85p'
```

Repository: aturret/FastFetchBot
Length of output: 1483
`text_length` and `content_length` must have defaults to allow construction with incomplete metadata.

In Pydantic v2, `Optional[int]` with `Field(ge=0)` but no explicit default is still required during validation. The `from_dict()` method (line 59) calls `Metadata(**obj)` directly, which triggers validation before `prepare_for_insert()` can compute these fields. Raw metadata dicts without these keys will fail validation.
🔧 Suggested fix
```diff
-text_length: Optional[int] = Field(ge=0)
+text_length: Optional[int] = Field(default=0, ge=0)
 content: Optional[str] = None
-content_length: Optional[int] = Field(ge=0)
+content_length: Optional[int] = Field(default=0, ge=0)
```

📝 Committable suggestion
+ content_length: Optional[int] = Field(default=0, ge=0)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
text_length: Optional[int] = Field(default=0, ge=0)
content: Optional[str] = None
content_length: Optional[int] = Field(default=0, ge=0)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py`
around lines 29 - 31, The Metadata model's fields text_length and content_length
lack explicit defaults so Pydantic v2 requires them at construction (causing
Metadata(**obj) in from_dict() to fail); update the field definitions in the
Metadata class to provide defaults (e.g., change text_length: Optional[int] =
Field(ge=0) and content_length: Optional[int] = Field(ge=0) to text_length:
Optional[int] = Field(default=None, ge=0) and content_length: Optional[int] =
Field(default=None, ge=0)) so from_dict() can construct Metadata objects and
then let prepare_for_insert() compute/override these values.
```python
def from_dict(obj: Any) -> "Metadata":
    assert isinstance(obj, dict)
    return Metadata(**obj)
```
🧩 Analysis chain
🌐 Web query:
Does Python remove assert statements when running with optimization (python -O), and is assert recommended for validating external/runtime inputs?
💡 Result:
Yes, Python removes assert statements when running with optimization (python -O). No, assert is not recommended for validating external/runtime inputs.
Citations:
- 1: https://docs.python.org/3/reference/simple_stmts.html?highlight=assert
- 2: https://docs.python.org/3/reference/simple_stmts.html
- 3: https://wiki.python.org/python/UsingAssertionsEffectively.html
🏁 Script executed:
```shell
# Find and read the exceptions file to see available typed exceptions
find . -path "*fastfetchbot_shared/exceptions.py" -o -path "*fastfetchbot_shared/exceptions/*"
```

Repository: aturret/FastFetchBot
Length of output: 115
🏁 Script executed:
# Read the exceptions file
cat "packages/shared/fastfetchbot_shared/exceptions.py" 2>/dev/null || cat "packages/shared/fastfetchbot_shared/exceptions/__init__.py" 2>/dev/null || find . -name "exceptions.py" -path "*/fastfetchbot_shared/*" -exec cat {} \;Repository: aturret/FastFetchBot
Length of output: 711
🏁 Script executed:
# Read the test file to see current expectations
cat "tests/unit/database/mongodb/test_metadata_model.py" 2>/dev/null | head -120Repository: aturret/FastFetchBot
Length of output: 4025
🏁 Script executed:
# Read the actual metadata.py file for full context
cat "packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py" 2>/dev/null | head -80Repository: aturret/FastFetchBot
Length of output: 2117
Replace `assert isinstance(obj, dict)` with an explicit type check that raises a typed exception.

`assert` is stripped under `python -O`, making this validation unreliable at runtime. Replace it with an explicit `isinstance` check and raise a typed exception from `fastfetchbot_shared.exceptions` (e.g., `FastFetchBotError` or a new validation-specific exception extending it). Update the test at lines 98-100 to expect the typed exception instead of `AssertionError`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/shared/fastfetchbot_shared/database/mongodb/models/metadata.py`
around lines 57 - 59, Replace the fragile assert in Metadata.from_dict with an
explicit type check: in the from_dict method of the Metadata class, if obj is
not an instance of dict raise a typed exception from
fastfetchbot_shared.exceptions (use FastFetchBotError or add a ValidationError
subclass of FastFetchBotError) instead of using assert; update the unit test
that currently expects AssertionError (lines ~98-100) to expect the new typed
exception. Ensure the raised exception includes a clear message like "Expected
dict for Metadata.from_dict" and import the chosen exception class into the
metadata module.
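The `-O` behavior is easy to verify with the standard library alone; this sketch runs the same snippet with and without optimization (the snippet itself is a made-up example, not the project's code):

```python
import subprocess
import sys

# A validation-by-assert pattern followed by code that should not be reached
snippet = "assert isinstance(42, dict), 'not a dict'\nprint('validation skipped')"

# Normal mode: the assert fires and the process exits non-zero
normal = subprocess.run(
    [sys.executable, "-c", snippet], capture_output=True, text=True
)

# Optimized mode: asserts are compiled out, so execution falls through
optimized = subprocess.run(
    [sys.executable, "-O", "-c", snippet], capture_output=True, text=True
)
```

Because the optimized run silently skips the check, input validation belongs in an explicit `isinstance` test that raises a real exception, as the comment above recommends.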
Summary by CodeRabbit
New Features
Configuration
Tests