feat: use pydantic-settings as default config format #71
📝 Walkthrough

This PR refactors configuration management across the entire codebase by replacing scattered module-level environment variable constants with centralized pydantic-settings classes.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
apps/telegram-bot/core/handlers/url_process.py (1)
103-117: ⚠️ Potential issue | 🟠 Major

Don’t report “no supported url” after successfully general-scraping it.

When `settings.GENERAL_SCRAPING_ON` is true, this branch fetches and sends the page, but then always overwrites the status with a failure message and returns. That also skips any later URLs in the same Telegram message.

Proposed fix
```diff
 if url_metadata["source"] == "unknown":
     if settings.GENERAL_SCRAPING_ON:
         await process_message.edit_text(
             text=f"Uncategorized url found. General webpage parser is on, Processing..."
         )
         await _fetch_and_send(
             url=url_metadata["url"],
             chat_id=message.chat_id,
             source=url_metadata.get("source", ""),
             content_type=url_metadata.get("content_type", ""),
         )
+        await process_message.delete()
+        continue
     await process_message.edit_text(
         text=f"For the {i + 1} th url, no supported url found."
     )
-    return
+    continue
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/telegram-bot/core/handlers/url_process.py` around lines 103 - 117, The branch handling url_metadata["source"] == "unknown" currently always sends a "no supported url" message after calling _fetch_and_send when settings.GENERAL_SCRAPING_ON is true; change the control flow so that when GENERAL_SCRAPING_ON is true you send the "Processing..." message, await _fetch_and_send(url=url_metadata["url"], chat_id=message.chat_id, source=..., content_type=...), and then exit/continue without sending the failure text or returning prematurely; only send the failure message and return when GENERAL_SCRAPING_ON is false (or when the fetch actually fails), keeping references to process_message.edit_text, _fetch_and_send, url_metadata, settings.GENERAL_SCRAPING_ON and the loop index i to locate and update the logic.

apps/telegram-bot/core/services/bot_app.py (1)
43-60: ⚠️ Potential issue | 🟠 Major

Fail fast instead of leaving `application` undefined.

If `TELEGRAM_BOT_TOKEN` is missing or empty, this module only logs and keeps loading. Later calls to `startup`, `shutdown`, `set_webhook`, or `process_telegram_update` will crash because `application` was never built.

Proposed fix
```diff
+application: Application | None = None
+
-if settings.TELEGRAM_BOT_TOKEN is not None:
+if settings.TELEGRAM_BOT_TOKEN:
     builder = (
         Application.builder()
         .token(settings.TELEGRAM_BOT_TOKEN)
         .arbitrary_callback_data(True)
         .connect_timeout(settings.TELEBOT_CONNECT_TIMEOUT)
@@
     if settings.TELEGRAM_BOT_MODE == "webhook":
         builder = builder.updater(None)
     application = builder.build()
 else:
-    logger.error("TELEGRAM_BOT_TOKEN is not set!")
+    raise RuntimeError("TELEGRAM_BOT_TOKEN is not set")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/telegram-bot/core/services/bot_app.py` around lines 43 - 60, The module currently logs and continues when settings.TELEGRAM_BOT_TOKEN is missing, leaving the variable application undefined; update the top-level initialization in bot_app.py to fail fast by raising an explicit exception or calling sys.exit with a clear error when settings.TELEGRAM_BOT_TOKEN is None/empty so application cannot remain undefined. Specifically, in the block that checks settings.TELEGRAM_BOT_TOKEN (which builds Application via Application.builder(), .token(...), etc.), replace or follow the logger.error("TELEGRAM_BOT_TOKEN is not set!") with a RuntimeError or SystemExit containing the same diagnostic message so subsequent functions like startup, shutdown, set_webhook, and process_telegram_update see a deterministic failure instead of an unbound application. Ensure the error message references TELEGRAM_BOT_TOKEN and keeps existing log text for clarity.
🧹 Nitpick comments (4)
apps/api/src/main.py (1)
12-12: Consider moving `SENTRY_DSN` to the settings object.

The `SENTRY_DSN` is hardcoded as an empty string here. Per coding guidelines, Sentry should be used for production error monitoring. Consider adding `SENTRY_DSN` to the `ApiSettings` class to allow configuration via environment variables, consistent with this PR's refactor pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/api/src/main.py` at line 12, Replace the hardcoded SENTRY_DSN variable with a configuration field on the ApiSettings class: add a SENTRY_DSN (string, default empty) attribute to ApiSettings, load it from environment like the other settings, and then use settings.SENTRY_DSN wherever SENTRY_DSN was referenced in main.py instead of the top-level SENTRY_DSN constant; ensure any initialization logic that configures sentry now reads from ApiSettings so the value can be provided via env vars.

apps/telegram-bot/core/main.py (1)
8-13: Validate `TELEGRAM_BOT_MODE` explicitly.

Right now any unexpected mode value falls back to polling. Consider raising a clear error for invalid values to avoid silent misconfiguration.
💡 Proposed refactor
```diff
-    if settings.TELEGRAM_BOT_MODE == "webhook":
+    if settings.TELEGRAM_BOT_MODE == "webhook":
         logger.info(f"Running in webhook mode on port {settings.TELEGRAM_BOT_PORT}")
         uvicorn.run(webhook_app, host="0.0.0.0", port=settings.TELEGRAM_BOT_PORT)
-    else:
+    elif settings.TELEGRAM_BOT_MODE == "polling":
         logger.info(f"Running in polling mode (HTTP server on port {settings.TELEGRAM_BOT_PORT} for callbacks)")
         uvicorn.run(callback_app, host="0.0.0.0", port=settings.TELEGRAM_BOT_PORT)
+    else:
+        raise ValueError(f"Invalid TELEGRAM_BOT_MODE: {settings.TELEGRAM_BOT_MODE}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/telegram-bot/core/main.py` around lines 8 - 13, Validate settings.TELEGRAM_BOT_MODE explicitly instead of defaulting to polling: check that settings.TELEGRAM_BOT_MODE is either "webhook" or "polling" before calling uvicorn.run; if it's "webhook" run webhook_app, if "polling" run callback_app, otherwise raise a clear error (e.g., ValueError) with a message that includes the invalid value and expected options; update the logging (logger.info) accordingly and reference TELEGRAM_BOT_PORT, webhook_app, callback_app, and uvicorn.run in the change.

tests/unit/test_telegraph.py (1)
117-132: Clarify empty string parsing behavior in tests.

Two tests (`test_no_token_list_creates_token` and `test_empty_token_list_creates_token`) both patch with `""` and expect token creation. The docstring at line 138 says "Empty string parses to None (falsy)", but this duplicates the previous test's scenario. Consider differentiating these tests; for example, one could test `None` explicitly if the settings support it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_telegraph.py` around lines 117 - 132, The two tests patching TELEGRAPH_TOKEN_LIST both use an empty string but should cover distinct cases; update one of them (e.g., test_empty_token_list_creates_token) to patch TELEGRAPH_TOKEN_LIST to None (or remove the patch so it resolves to None) to explicitly test the None case, keeping the other test with the empty-string patch to test falsy empty-string behavior; ensure references to AsyncTelegraphPoster, mock_poster.create_api_token, and the Telegraph(...) instantiation remain unchanged so the assertions still validate token creation vs. set_token calls.

apps/telegram-bot/core/config.py (1)
121-166: Prefer parsed computed fields over duplicate module globals.
`settings.TELEGRAM_CHANNEL_ID` and `settings.TELEGRAM_CHANNEL_ADMIN_LIST` stay as raw strings while the module exports parsed values with the same names. That leaves two config surfaces to keep straight and makes test patching or later config reloads easy to get wrong. Moving the parsed forms onto `TelegramBotSettings` would keep this refactor single-source.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/telegram-bot/core/config.py` around lines 121 - 166, The module currently leaves raw strings on settings (settings.TELEGRAM_CHANNEL_ID, settings.TELEGRAM_CHANNEL_ADMIN_LIST) while exporting parsed globals (TELEGRAM_CHANNEL_ID, TELEGRAM_CHANNEL_ADMIN_LIST), creating duplicate config surfaces; move the parsing logic (use _parse_channel_ids, _parse_debug_channel, _parse_admin_list, and ban_list_resolver) into the TelegramBotSettings dataclass (or construct) so the settings object returns already-parsed attributes (e.g., TelegramBotSettings.telegram_channel_id, .telegram_channel_admin_list, .telebot_debug_channel, .group_message_ban_list, .bot_message_ban_list) and remove the module-level parsed globals to ensure a single source of truth for config and simplify testing/patching.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/api/src/config.py`:
- Around line 14-17: The current API_KEY uses Field(default_factory=lambda:
secrets.token_urlsafe(32)) which auto-generates a new key on each restart and
breaks clients; update the config so production requires an explicit API_KEY
(remove or replace the default_factory) and implement a startup-time check that
detects when API_KEY was not provided and an auto-generated value is used, then
emit a clear warning via the app logger; reference the API_KEY Field and the
config/settings initializer (or __post_init__ / startup hook that constructs the
Settings) to perform the detection and logging so developers see when a
non-persistent key is in use.
In `@apps/api/src/services/amazon/s3.py`:
- Line 43: The function signature(s) that declare nullable string parameters
(e.g., parameters named bucket and file_name) currently use `bucket: str = None`
/ `file_name: str = None`; update those annotations to explicit optional types
(either `Optional[str]` with `from typing import Optional` or the PEP 604 style
`str | None`) and keep the default `= None`; locate and update the signatures in
the relevant functions/methods in s3.py (search for parameters named `bucket`
and `file_name`) and add the necessary import if you choose `Optional`.
In `@apps/telegram-bot/core/api_client.py`:
- Line 6: Update nullable parameter annotations to use explicit Optional (or PEP
604 union) types: change get_item signature's ban_list: list = None to ban_list:
Optional[list] = None (or ban_list: list | None = None) and do the same for the
corresponding nullable parameter in get_url_metadata; ensure you import
typing.Optional if using Optional and keep the default None unchanged.
In `@apps/telegram-bot/core/config.py`:
- Around line 15-27: Remove the process-local default token generation for
API_KEY and TELEGRAM_BOT_SECRET_TOKEN and require values from
environment/config: replace the Field(default_factory=...) usages so API_KEY and
TELEGRAM_BOT_SECRET_TOKEN are read from env (e.g., Field(..., env="API_KEY") /
Field(..., env="TELEGRAM_BOT_SECRET_TOKEN") or make them required Optional[str]
without a default and validate at startup), ensuring the symbols API_KEY and
TELEGRAM_BOT_SECRET_TOKEN in core/config.py no longer fall back to
secrets.token_urlsafe and instead rely on environment-provided secrets so tokens
are stable across processes and restarts.
In `@apps/telegram-bot/core/services/bot_app.py`:
- Around line 36-40: Remove the secret token from debug logs in set_webhook:
stop logging settings.TELEGRAM_BOT_SECRET_TOKEN and instead log only
non-sensitive contextual info (e.g., settings.TELEGRAM_WEBHOOK_URL or a
placeholder). Update the logger.debug call in the set_webhook function to omit
or redact settings.TELEGRAM_BOT_SECRET_TOKEN, leaving the rest of the call and
the await application.bot.set_webhook(...) unchanged.
In `@apps/worker/worker_core/config.py`:
- Line 17: The default DOWNLOAD_DIR = "/tmp" is insecure; change the default to
an app-owned directory (e.g., a subdirectory like "worker_downloads") and ensure
the process creates it with restrictive permissions (use pathlib/os to create
the directory if missing with mode 0o700 or call os.chmod to enforce 0700).
Update the config symbol DOWNLOAD_DIR and add initialization logic that ensures
the directory exists and has correct permissions at startup (fail-fast or
fallback with a clear error if the secure directory cannot be created).
In `@apps/worker/worker_core/tasks/pdf.py`:
- Around line 1-3: The file directly imports export_pdf from
fastfetchbot_file_export (line with export_pdf) which breaks the app-layer
boundary; replace that direct import with the shared async wrapper from
fastfetchbot_shared.services.file_export (the wrapper that submits a Celery
task, e.g. export_pdf_async or submit_pdf_export_task) and update usages in this
module (e.g., any calls to export_pdf) to call the async/shared wrapper instead
so the worker executes via the shared Celery task API rather than importing
fastfetchbot_file_export directly.
In `@packages/shared/fastfetchbot_shared/services/scrapers/config.py`:
- Around line 126-134: The property firecrawl_wait_for_int currently treats 0 as
falsy and returns 3000; change the logic to treat "0" (and integer 0) as a valid
value by only falling back when FIRECRAWL_WAIT_FOR is None or an empty string or
when parsing fails. In the firecrawl_wait_for_int property, first check if
self.FIRECRAWL_WAIT_FOR is None or equal to "" and return 3000 in that case,
otherwise parse int(self.FIRECRAWL_WAIT_FOR) and return that value (including
0); keep the existing except (ValueError, TypeError) to return 3000 on parse
errors.
- Around line 181-193: The _load_zhihu_cookies function opens zhihu_cookies.json
without an explicit encoding which can break on non-UTF-8 systems; update the
open(...) call inside _load_zhihu_cookies to include encoding="utf-8" (preserve
the try/except and return behavior) so JSON.load reads the file with UTF-8
encoding consistent with the XHS loader.
- Around line 140-144: The Jinja2 Environment currently created as JINJA2_ENV
(using Environment and FileSystemLoader with templates_directory) does not
enable autoescaping; update the Environment instantiation to enable autoescape
(e.g., use select_autoescape or set autoescape for HTML/XML templates) so HTML
templates that render scraped/user content are escaped by default; modify the
JINJA2_ENV creation (the Environment(...) call) to include autoescape behavior
for 'html' and 'xml' templates.
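The `firecrawl_wait_for_int` fix from the first bullet above can be sketched in isolation; the class is reduced to the single relevant field, so this is a sketch of the logic rather than the real settings class:

```python
from typing import Optional


class ScraperSettings:
    """Reduced sketch: only the field relevant to the wait_for parsing."""

    def __init__(self, FIRECRAWL_WAIT_FOR: Optional[str] = None):
        self.FIRECRAWL_WAIT_FOR = FIRECRAWL_WAIT_FOR

    @property
    def firecrawl_wait_for_int(self) -> int:
        # Only fall back when the value is missing or unparseable;
        # "0" (and 0) are valid values and must be preserved.
        if self.FIRECRAWL_WAIT_FOR is None or self.FIRECRAWL_WAIT_FOR == "":
            return 3000
        try:
            return int(self.FIRECRAWL_WAIT_FOR)
        except (ValueError, TypeError):
            return 3000


print(ScraperSettings("0").firecrawl_wait_for_int)   # → 0
print(ScraperSettings(None).firecrawl_wait_for_int)  # → 3000
```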
In `@packages/shared/fastfetchbot_shared/services/scrapers/xiaohongshu/adaptar.py`:
- Around line 53-57: The constructor currently calls .rstrip("/") on
(sign_server_endpoint or shared_settings.SIGN_SERVER_URL) which will raise
AttributeError if shared_settings.SIGN_SERVER_URL is None; change the logic in
XhsSinglePostAdapter.__init__ to first capture the raw value (e.g., raw_endpoint
= sign_server_endpoint if sign_server_endpoint is not None else
shared_settings.SIGN_SERVER_URL), then if raw_endpoint is falsy/None raise the
intended ValueError, otherwise set self.sign_server_endpoint =
raw_endpoint.rstrip("/"); this preserves the ValueError path and avoids calling
.rstrip on None.
- Line 10: Replace the incorrect import of shared_settings with the scraper
config settings by changing the import to use
fastfetchbot_shared.services.scrapers.config (so the module variable to use is
settings rather than shared_settings) and then update any usage of
shared_settings.SIGN_SERVER_URL to settings.SIGN_SERVER_URL (or
settings.XHS_SIGN_SERVER_URL if you prefer that constant name); specifically
update the import statement and the reference in the code that currently uses
shared_settings (e.g., the place where SIGN_SERVER_URL is read) to reference the
new settings symbol.
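The endpoint-handling fix from the first bullet above can be sketched like this; `SIGN_SERVER_URL` stands in for `shared_settings.SIGN_SERVER_URL`, and the class is stripped down to its constructor:

```python
from typing import Optional

# Stand-in for shared_settings.SIGN_SERVER_URL, which may legitimately be None.
SIGN_SERVER_URL: Optional[str] = None


class XhsSinglePostAdapter:
    def __init__(self, sign_server_endpoint: Optional[str] = None):
        # Capture the raw value first so .rstrip is never called on None.
        raw_endpoint = (
            sign_server_endpoint
            if sign_server_endpoint is not None
            else SIGN_SERVER_URL
        )
        if not raw_endpoint:
            raise ValueError("sign server endpoint is not configured")
        self.sign_server_endpoint = raw_endpoint.rstrip("/")


print(XhsSinglePostAdapter("http://sign.local/").sign_server_endpoint)  # → http://sign.local
```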
In `@packages/shared/fastfetchbot_shared/utils/pydantic_types.py`:
- Around line 6-20: The list-branch in _parse_comma_list and
_parse_optional_comma_list returns input lists unchanged, which allows
unstripped/empty elements; change the isinstance(v, list) handling to normalize
the list exactly like the string branch (strip each item and filter out empty
strings) so e.g. [" a ", "", "b"] becomes ["a","b"]; for
_parse_optional_comma_list return the normalized list or None when it becomes
empty to preserve its current None semantics.
In `@tests/unit/scrapers/test_scraper_config.py`:
- Around line 399-406: The test only asserts the raw env field is empty but must
also assert the actual fallback path is used; update
test_xhs_cookie_default_path_when_no_override to, after calling _reload_config,
verify that the resolved cookie path equals os.path.join(cfg.CONF_DIR,
"xhs_cookies.txt") by either calling the resolver function
(_load_xhs_cookies(cfg)) or inspecting the attribute that stores the resolved
path, while still keeping the existing assertion that
cfg.settings.XHS_COOKIE_PATH == "".
---
Outside diff comments:
In `@apps/telegram-bot/core/handlers/url_process.py`:
- Around line 103-117: The branch handling url_metadata["source"] == "unknown"
currently always sends a "no supported url" message after calling
_fetch_and_send when settings.GENERAL_SCRAPING_ON is true; change the control
flow so that when GENERAL_SCRAPING_ON is true you send the "Processing..."
message, await _fetch_and_send(url=url_metadata["url"], chat_id=message.chat_id,
source=..., content_type=...), and then exit/continue without sending the
failure text or returning prematurely; only send the failure message and return
when GENERAL_SCRAPING_ON is false (or when the fetch actually fails), keeping
references to process_message.edit_text, _fetch_and_send, url_metadata,
settings.GENERAL_SCRAPING_ON and the loop index i to locate and update the
logic.
In `@apps/telegram-bot/core/services/bot_app.py`:
- Around line 43-60: The module currently logs and continues when
settings.TELEGRAM_BOT_TOKEN is missing, leaving the variable application
undefined; update the top-level initialization in bot_app.py to fail fast by
raising an explicit exception or calling sys.exit with a clear error when
settings.TELEGRAM_BOT_TOKEN is None/empty so application cannot remain
undefined. Specifically, in the block that checks settings.TELEGRAM_BOT_TOKEN
(which builds Application via Application.builder(), .token(...), etc.), replace
or follow the logger.error("TELEGRAM_BOT_TOKEN is not set!") with a RuntimeError
or SystemExit containing the same diagnostic message so subsequent functions
like startup, shutdown, set_webhook, and process_telegram_update see a
deterministic failure instead of an unbound application. Ensure the error
message references TELEGRAM_BOT_TOKEN and keeps existing log text for clarity.
---
Nitpick comments:
In `@apps/api/src/main.py`:
- Line 12: Replace the hardcoded SENTRY_DSN variable with a configuration field
on the ApiSettings class: add a SENTRY_DSN (string, default empty) attribute to
ApiSettings, load it from environment like the other settings, and then use
settings.SENTRY_DSN wherever SENTRY_DSN was referenced in main.py instead of the
top-level SENTRY_DSN constant; ensure any initialization logic that configures
sentry now reads from ApiSettings so the value can be provided via env vars.
In `@apps/telegram-bot/core/config.py`:
- Around line 121-166: The module currently leaves raw strings on settings
(settings.TELEGRAM_CHANNEL_ID, settings.TELEGRAM_CHANNEL_ADMIN_LIST) while
exporting parsed globals (TELEGRAM_CHANNEL_ID, TELEGRAM_CHANNEL_ADMIN_LIST),
creating duplicate config surfaces; move the parsing logic (use
_parse_channel_ids, _parse_debug_channel, _parse_admin_list, and
ban_list_resolver) into the TelegramBotSettings dataclass (or construct) so the
settings object returns already-parsed attributes (e.g.,
TelegramBotSettings.telegram_channel_id, .telegram_channel_admin_list,
.telebot_debug_channel, .group_message_ban_list, .bot_message_ban_list) and
remove the module-level parsed globals to ensure a single source of truth for
config and simplify testing/patching.
In `@apps/telegram-bot/core/main.py`:
- Around line 8-13: Validate settings.TELEGRAM_BOT_MODE explicitly instead of
defaulting to polling: check that settings.TELEGRAM_BOT_MODE is either "webhook"
or "polling" before calling uvicorn.run; if it's "webhook" run webhook_app, if
"polling" run callback_app, otherwise raise a clear error (e.g., ValueError)
with a message that includes the invalid value and expected options; update the
logging (logger.info) accordingly and reference TELEGRAM_BOT_PORT, webhook_app,
callback_app, and uvicorn.run in the change.
In `@tests/unit/test_telegraph.py`:
- Around line 117-132: The two tests patching TELEGRAPH_TOKEN_LIST both use an
empty string but should cover distinct cases; update one of them (e.g.,
test_empty_token_list_creates_token) to patch TELEGRAPH_TOKEN_LIST to None (or
remove the patch so it resolves to None) to explicitly test the None case,
keeping the other test with the empty-string patch to test falsy empty-string
behavior; ensure references to AsyncTelegraphPoster,
mock_poster.create_api_token, and the Telegraph(...) instantiation remain
unchanged so the assertions still validate token creation vs. set_token calls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 21b24df1-ee81-47a2-b088-e434b22c9893
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (67)
- apps/api/src/auth.py
- apps/api/src/config.py
- apps/api/src/database.py
- apps/api/src/main.py
- apps/api/src/routers/inoreader.py
- apps/api/src/routers/scraper.py
- apps/api/src/services/amazon/s3.py
- apps/api/src/services/celery_client.py
- apps/api/src/services/file_export/audio_transcribe/__init__.py
- apps/api/src/services/file_export/document_export/pdf_export.py
- apps/api/src/services/file_export/video_download/__init__.py
- apps/api/src/services/inoreader/__init__.py
- apps/api/src/services/inoreader/process.py
- apps/api/src/services/scrapers/common.py
- apps/async-worker/async_worker/celery_client.py
- apps/async-worker/async_worker/config.py
- apps/async-worker/async_worker/main.py
- apps/async-worker/async_worker/services/enrichment.py
- apps/async-worker/async_worker/services/outbox.py
- apps/async-worker/async_worker/tasks/scrape.py
- apps/telegram-bot/core/api_client.py
- apps/telegram-bot/core/config.py
- apps/telegram-bot/core/database.py
- apps/telegram-bot/core/handlers/buttons.py
- apps/telegram-bot/core/handlers/messages.py
- apps/telegram-bot/core/handlers/url_process.py
- apps/telegram-bot/core/main.py
- apps/telegram-bot/core/queue_client.py
- apps/telegram-bot/core/services/bot_app.py
- apps/telegram-bot/core/services/message_sender.py
- apps/telegram-bot/core/services/outbox_consumer.py
- apps/telegram-bot/core/webhook/server.py
- apps/worker/worker_core/config.py
- apps/worker/worker_core/main.py
- apps/worker/worker_core/tasks/pdf.py
- apps/worker/worker_core/tasks/transcribe.py
- apps/worker/worker_core/tasks/video.py
- packages/shared/fastfetchbot_shared/config.py
- packages/shared/fastfetchbot_shared/services/scrapers/config.py
- packages/shared/fastfetchbot_shared/services/scrapers/general/base.py
- packages/shared/fastfetchbot_shared/services/scrapers/general/firecrawl.py
- packages/shared/fastfetchbot_shared/services/scrapers/general/firecrawl_client.py
- packages/shared/fastfetchbot_shared/services/scrapers/general/scraper.py
- packages/shared/fastfetchbot_shared/services/scrapers/general/zyte.py
- packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py
- packages/shared/fastfetchbot_shared/services/scrapers/reddit/__init__.py
- packages/shared/fastfetchbot_shared/services/scrapers/scraper_manager.py
- packages/shared/fastfetchbot_shared/services/scrapers/twitter/__init__.py
- packages/shared/fastfetchbot_shared/services/scrapers/xiaohongshu/adaptar.py
- packages/shared/fastfetchbot_shared/services/scrapers/zhihu/__init__.py
- packages/shared/fastfetchbot_shared/services/scrapers/zhihu/config.py
- packages/shared/fastfetchbot_shared/services/telegraph/__init__.py
- packages/shared/fastfetchbot_shared/utils/logger.py
- packages/shared/fastfetchbot_shared/utils/network.py
- packages/shared/fastfetchbot_shared/utils/parse.py
- packages/shared/fastfetchbot_shared/utils/pydantic_types.py
- packages/shared/pyproject.toml
- tests/unit/async_worker/test_enrichment.py
- tests/unit/scrapers/test_general_base.py
- tests/unit/scrapers/test_general_firecrawl.py
- tests/unit/scrapers/test_general_scraper.py
- tests/unit/scrapers/test_general_zyte.py
- tests/unit/scrapers/test_scraper_config.py
- tests/unit/scrapers/test_scraper_manager.py
- tests/unit/scrapers/test_xiaohongshu.py
- tests/unit/scrapers/test_zhihu.py
- tests/unit/test_telegraph.py
💤 Files with no reviewable changes (1)
- packages/shared/fastfetchbot_shared/utils/parse.py
```python
    # FastAPI
    BASE_URL: str = "localhost"
    API_KEY_NAME: str = "pwd"
    API_KEY: str = Field(default_factory=lambda: secrets.token_urlsafe(32))
```
Auto-generated `API_KEY` may cause authentication failures on restart.

If `API_KEY` is not set via an environment variable, `secrets.token_urlsafe(32)` generates a new random key each time the application starts. This means:

- Clients authenticated with the previous key will fail after a restart
- The key is ephemeral and not persisted anywhere

Consider either requiring `API_KEY` to be explicitly set in production, or logging a warning when using an auto-generated key.
🛡️ Proposed fix: warn on auto-generated key
```diff
+import warnings
+
 class ApiSettings(BaseSettings):
     model_config = SettingsConfigDict(extra="ignore")
     # FastAPI
     BASE_URL: str = "localhost"
     API_KEY_NAME: str = "pwd"
     API_KEY: str = Field(default_factory=lambda: secrets.token_urlsafe(32))
+
+    @model_validator(mode="after")
+    def _resolve_derived(self) -> "ApiSettings":
+        # Warn if API_KEY was auto-generated (check if it looks like a random token)
+        if not os.environ.get("API_KEY"):
+            warnings.warn(
+                "API_KEY not set; using auto-generated key that will change on restart",
+                UserWarning
+            )
         if not self.DOWNLOAD_DIR:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/config.py` around lines 14 - 17, The current API_KEY uses
Field(default_factory=lambda: secrets.token_urlsafe(32)) which auto-generates a
new key on each restart and breaks clients; update the config so production
requires an explicit API_KEY (remove or replace the default_factory) and
implement a startup-time check that detects when API_KEY was not provided and an
auto-generated value is used, then emit a clear warning via the app logger;
reference the API_KEY Field and the config/settings initializer (or
__post_init__ / startup hook that constructs the Settings) to perform the
detection and logging so developers see when a non-persistent key is in use.
```diff
 async def upload(
     staging_path: Path,
-    bucket: str = AWS_S3_BUCKET_NAME,
+    bucket: str = None,
```
Use explicit `Optional` or `| None` for nullable parameters.

Per PEP 484 and the static analysis hints, `bucket: str = None` and `file_name: str = None` should use explicit optional typing.
🔧 Proposed fix
```diff
+from typing import Optional
+
 async def upload(
     staging_path: Path,
-    bucket: str = None,
+    bucket: Optional[str] = None,
     suite: str = "test",
     release: str = datetime.now().strftime("%Y-%m-%d"),
-    file_name: str = None,
+    file_name: Optional[str] = None,
 ) -> str:
```

Also applies to: 46-46
🧰 Tools
🪛 Ruff (0.15.7)
[warning] 43-43: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/services/amazon/s3.py` at line 43, The function signature(s)
that declare nullable string parameters (e.g., parameters named bucket and
file_name) currently use `bucket: str = None` / `file_name: str = None`; update
those annotations to explicit optional types (either `Optional[str]` with `from
typing import Optional` or the PEP 604 style `str | None`) and keep the default
`= None`; locate and update the signatures in the relevant functions/methods in
s3.py (search for parameters named `bucket` and `file_name`) and add the
necessary import if you choose `Optional`.
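For illustration, here are the two explicit spellings the prompt mentions, with hypothetical function names standing in for the real `upload`:

```python
from typing import Optional


# Implicit Optional (what RUF013 flags): the annotation says str,
# but the default is None.
#   def upload(bucket: str = None): ...

def upload_a(bucket: Optional[str] = None) -> str:
    """PEP 484 style: Optional[str]."""
    return bucket or "default-bucket"


def upload_b(bucket: "str | None" = None) -> str:
    """PEP 604 style (quoted so it also parses on Python < 3.10)."""
    return bucket or "default-bucket"


print(upload_a())  # → default-bucket
```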
```python
from fastfetchbot_shared.utils.logger import logger


async def get_item(url: str, ban_list: list = None, **kwargs) -> dict:
```
Use explicit `Optional` or `| None` for nullable parameters.

Per PEP 484 and the static analysis hint, `ban_list: list = None` should use explicit optional typing. The same applies to line 22.
🔧 Proposed fix
```diff
+from typing import Optional
+
+
-async def get_item(url: str, ban_list: list = None, **kwargs) -> dict:
+async def get_item(url: str, ban_list: Optional[list] = None, **kwargs) -> dict:
```

And similarly for `get_url_metadata`:

```diff
-async def get_url_metadata(url: str, ban_list: list = None) -> dict:
+async def get_url_metadata(url: str, ban_list: Optional[list] = None) -> dict:
```

🧰 Tools
🪛 Ruff (0.15.7)
[warning] 6-6: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/telegram-bot/core/api_client.py` at line 6, Update nullable parameter
annotations to use explicit Optional (or PEP 604 union) types: change get_item
signature's ban_list: list = None to ban_list: Optional[list] = None (or
ban_list: list | None = None) and do the same for the corresponding nullable
parameter in get_url_metadata; ensure you import typing.Optional if using
Optional and keep the default None unchanged.
```python
    API_KEY_NAME: str = "pwd"
    API_KEY: str = Field(default_factory=lambda: secrets.token_urlsafe(32))

    # Bot's own BASE_URL
    BASE_URL: str = "localhost"

    # Telegram bot
    TELEGRAM_BOT_ON: bool = True
    TELEGRAM_BOT_MODE: str = "polling"
    TELEGRAM_BOT_TOKEN: Optional[str] = None
    TELEGRAM_BOT_SECRET_TOKEN: str = Field(
        default_factory=lambda: secrets.token_urlsafe(32)
    )
```
Do not generate shared auth tokens per process.
`API_KEY` and `TELEGRAM_BOT_SECRET_TOKEN` cross process boundaries. Falling back to `secrets.token_urlsafe(...)` means each bot/API instance gets a different value unless every deployment explicitly sets both env vars, which breaks bot→API auth and makes webhook auth unstable across restarts or replicas.
Proposed fix
```diff
-    API_KEY: str = Field(default_factory=lambda: secrets.token_urlsafe(32))
+    API_KEY: str
@@
-    TELEGRAM_BOT_SECRET_TOKEN: str = Field(
-        default_factory=lambda: secrets.token_urlsafe(32)
-    )
+    TELEGRAM_BOT_SECRET_TOKEN: str
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/telegram-bot/core/config.py` around lines 15 - 27, Remove the
process-local default token generation for API_KEY and TELEGRAM_BOT_SECRET_TOKEN
and require values from environment/config: replace the
Field(default_factory=...) usages so API_KEY and TELEGRAM_BOT_SECRET_TOKEN are
read from env (e.g., Field(..., env="API_KEY") / Field(...,
env="TELEGRAM_BOT_SECRET_TOKEN") or make them required Optional[str] without a
default and validate at startup), ensuring the symbols API_KEY and
TELEGRAM_BOT_SECRET_TOKEN in core/config.py no longer fall back to
secrets.token_urlsafe and instead rely on environment-provided secrets so tokens
are stable across processes and restarts.
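As a minimal sketch of the fail-fast pattern suggested above (the helper name and error message are illustrative, not the project's API), requiring the secret from the environment makes a missing value a startup error instead of a silently diverging per-process token:

```python
import os


def load_required_secret(name: str) -> str:
    """Fail fast at startup rather than minting a per-process fallback token."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} must be set in the environment so every replica shares the same value"
        )
    return value


# Normally the deployment provides this; set here only for the demo.
os.environ["API_KEY"] = "example-shared-key"
print(load_required_secret("API_KEY"))  # -> example-shared-key
```

With required fields, pydantic-settings raises a validation error at import time when the env var is absent, which achieves the same effect declaratively.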
 async def set_webhook() -> bool:
-    logger.debug(f"set_webhook: {TELEGRAM_WEBHOOK_URL}, secret_token: {TELEGRAM_BOT_SECRET_TOKEN}")
+    logger.debug(f"set_webhook: {settings.TELEGRAM_WEBHOOK_URL}, secret_token: {settings.TELEGRAM_BOT_SECRET_TOKEN}")
     return await application.bot.set_webhook(
-        url=TELEGRAM_WEBHOOK_URL, secret_token=TELEGRAM_BOT_SECRET_TOKEN
+        url=settings.TELEGRAM_WEBHOOK_URL, secret_token=settings.TELEGRAM_BOT_SECRET_TOKEN
     )
Remove the webhook secret from logs.
settings.TELEGRAM_BOT_SECRET_TOKEN is an authentication credential; emitting it here leaks the value into centralized logs and error reports.
Proposed fix
- logger.debug(f"set_webhook: {settings.TELEGRAM_WEBHOOK_URL}, secret_token: {settings.TELEGRAM_BOT_SECRET_TOKEN}")
+ logger.debug(
+ f"set_webhook: {settings.TELEGRAM_WEBHOOK_URL}, "
+ f"secret_token_configured={bool(settings.TELEGRAM_BOT_SECRET_TOKEN)}"
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/telegram-bot/core/services/bot_app.py` around lines 36 - 40, Remove the
secret token from debug logs in set_webhook: stop logging
settings.TELEGRAM_BOT_SECRET_TOKEN and instead log only non-sensitive contextual
info (e.g., settings.TELEGRAM_WEBHOOK_URL or a placeholder). Update the
logger.debug call in the set_webhook function to omit or redact
settings.TELEGRAM_BOT_SECRET_TOKEN, leaving the rest of the call and the await
application.bot.set_webhook(...) unchanged.
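A small sketch of the redaction idea (the helper name is hypothetical): log only whether the credential is configured, never its value.

```python
def describe_secret(value):
    """Report whether a credential is configured without leaking its value."""
    return "configured" if value else "missing"


# Safe to emit into centralized logs: the token itself never appears.
print(f"set_webhook: https://example.invalid/webhook, secret_token={describe_secret('s3cr3t')}")
```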
def _load_zhihu_cookies(conf_dir: str) -> Optional[dict]:
    zhihu_cookie_path = os.path.join(conf_dir, "zhihu_cookies.json")
    if os.path.exists(zhihu_cookie_path):
        try:
            with open(zhihu_cookie_path, "r") as f:
                return json.load(f)
        except json.JSONDecodeError:
            logger.error("Error: zhihu_cookies.json is not in a valid JSON format.")
            return None
        except FileNotFoundError:
            logger.error("Error: zhihu_cookies.json does not exist.")
            return None
    return None
Missing encoding parameter for file read.
Line 185 opens the Zhihu cookies file without specifying encoding, while Line 166 (XHS cookies) correctly uses encoding="utf-8". This inconsistency could cause issues with non-ASCII content on systems where the default encoding isn't UTF-8.
🔧 Proposed fix: add encoding parameter
if os.path.exists(zhihu_cookie_path):
try:
- with open(zhihu_cookie_path, "r") as f:
+ with open(zhihu_cookie_path, "r", encoding="utf-8") as f:
return json.load(f)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/shared/fastfetchbot_shared/services/scrapers/config.py` around lines
181 - 193, The _load_zhihu_cookies function opens zhihu_cookies.json without an
explicit encoding which can break on non-UTF-8 systems; update the open(...)
call inside _load_zhihu_cookies to include encoding="utf-8" (preserve the
try/except and return behavior) so json.load reads the file with UTF-8 encoding
consistent with the XHS loader.
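To illustrate why the explicit encoding matters (the cookie key below is a made-up example), a UTF-8 round trip works regardless of the platform's default locale encoding:

```python
import json
import os
import tempfile

# Round-trip a JSON file containing non-ASCII text with an explicit encoding,
# so the result does not depend on the platform's default locale encoding.
path = os.path.join(tempfile.mkdtemp(), "zhihu_cookies.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump({"session": "示例值"}, f, ensure_ascii=False)

with open(path, "r", encoding="utf-8") as f:
    cookies = json.load(f)

print(cookies["session"])  # -> 示例值
```

Without `encoding="utf-8"`, the read can fail or mangle text on systems whose default encoding is not UTF-8 (e.g. legacy Windows code pages).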
 import httpx

-from fastfetchbot_shared.config import SIGN_SERVER_URL
+from fastfetchbot_shared.config import settings as shared_settings
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Find SIGN_SERVER_URL definitions =="
rg -n --type=py '\bSIGN_SERVER_URL\b'
echo "== Find settings modules under fastfetchbot_shared =="
fd config.py packages/shared/fastfetchbot_shared
echo "== Find scraper modules importing settings =="
rg -n --type=py 'from fastfetchbot_shared(\.services\.scrapers)?\.config import settings|from \.config import .*settings' packages/shared/fastfetchbot_shared/services/scrapers

Repository: aturret/FastFetchBot
Length of output: 3434
Import SIGN_SERVER_URL from the scraper config module.
Line 10 imports from fastfetchbot_shared.config instead of the scraper config module. All other scrapers import from fastfetchbot_shared.services.scrapers.config. Change the import to:
from fastfetchbot_shared.services.scrapers.config import settings

Then update line 53 to use settings.SIGN_SERVER_URL (or XHS_SIGN_SERVER_URL if preferred), removing the shared_settings alias.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/shared/fastfetchbot_shared/services/scrapers/xiaohongshu/adaptar.py`
at line 10, Replace the incorrect import of shared_settings with the scraper
config settings by changing the import to use
fastfetchbot_shared.services.scrapers.config (so the module variable to use is
settings rather than shared_settings) and then update any usage of
shared_settings.SIGN_SERVER_URL to settings.SIGN_SERVER_URL (or
settings.XHS_SIGN_SERVER_URL if you prefer that constant name); specifically
update the import statement and the reference in the code that currently uses
shared_settings (e.g., the place where SIGN_SERVER_URL is read) to reference the
new settings symbol.
         self.sign_server_endpoint = (sign_server_endpoint or shared_settings.SIGN_SERVER_URL).rstrip("/")
         if not self.sign_server_endpoint:
             raise ValueError(
                 "XhsSinglePostAdapter requires a sign server URL. "
-                "Set SIGN_SERVER_URL in the environment or pass sign_server_endpoint explicitly."
+                "Set shared_settings.SIGN_SERVER_URL in the environment or pass sign_server_endpoint explicitly."
Guard nullable settings before calling .rstrip().
Line 53 can raise AttributeError if shared_settings.SIGN_SERVER_URL is None, preventing the intended ValueError path.
💡 Proposed fix
- self.sign_server_endpoint = (sign_server_endpoint or shared_settings.SIGN_SERVER_URL).rstrip("/")
+ configured_endpoint = sign_server_endpoint or shared_settings.SIGN_SERVER_URL or ""
+ self.sign_server_endpoint = configured_endpoint.rstrip("/")
if not self.sign_server_endpoint:
raise ValueError(
"XhsSinglePostAdapter requires a sign server URL. "
- "Set shared_settings.SIGN_SERVER_URL in the environment or pass sign_server_endpoint explicitly."
+ "Set SIGN_SERVER_URL in the environment or pass sign_server_endpoint explicitly."
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
        configured_endpoint = sign_server_endpoint or shared_settings.SIGN_SERVER_URL or ""
        self.sign_server_endpoint = configured_endpoint.rstrip("/")
        if not self.sign_server_endpoint:
            raise ValueError(
                "XhsSinglePostAdapter requires a sign server URL. "
                "Set SIGN_SERVER_URL in the environment or pass sign_server_endpoint explicitly."
            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/shared/fastfetchbot_shared/services/scrapers/xiaohongshu/adaptar.py`
around lines 53 - 57, The constructor currently calls .rstrip("/") on
(sign_server_endpoint or shared_settings.SIGN_SERVER_URL) which will raise
AttributeError if shared_settings.SIGN_SERVER_URL is None; change the logic in
XhsSinglePostAdapter.__init__ to first capture the raw value (e.g., raw_endpoint
= sign_server_endpoint if sign_server_endpoint is not None else
shared_settings.SIGN_SERVER_URL), then if raw_endpoint is falsy/None raise the
intended ValueError, otherwise set self.sign_server_endpoint =
raw_endpoint.rstrip("/"); this preserves the ValueError path and avoids calling
.rstrip on None.
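The coalesce-then-validate pattern can be shown in isolation (the function name and message here are illustrative, not the adapter's actual code): falling back to `""` before `.rstrip` guarantees the `ValueError` path is reached instead of an `AttributeError`.

```python
from typing import Optional


def resolve_endpoint(explicit: Optional[str], configured: Optional[str]) -> str:
    """Coalesce to "" before rstrip so a missing setting raises ValueError, not AttributeError."""
    endpoint = (explicit or configured or "").rstrip("/")
    if not endpoint:
        raise ValueError("a sign server URL is required")
    return endpoint


print(resolve_endpoint(None, "http://sign.example:8080/"))  # -> http://sign.example:8080
```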
def _parse_comma_list(v: str | list[str]) -> list[str]:
    """Parse a comma-separated string into a list of stripped, non-empty strings."""
    if isinstance(v, list):
        return v
    return [x.strip() for x in v.split(",") if x.strip()] if v else []


def _parse_optional_comma_list(v: str | list[str] | None) -> Optional[list[str]]:
    """Parse a comma-separated string into a list, returning None for empty input."""
    if v is None:
        return None
    if isinstance(v, list):
        return v or None
    result = [x.strip() for x in v.split(",") if x.strip()]
    return result or None
Normalize list inputs in the list branch too.
Both helpers strip and drop empty items for string input, but direct list input is returned unchanged. [" a ", "", "b"] will therefore keep invalid entries and diverge from the documented behavior.
Proposed fix
+def _normalize_list(values: list[str]) -> list[str]:
+ return [item.strip() for item in values if item and item.strip()]
+
+
def _parse_comma_list(v: str | list[str]) -> list[str]:
"""Parse a comma-separated string into a list of stripped, non-empty strings."""
if isinstance(v, list):
- return v
+ return _normalize_list(v)
return [x.strip() for x in v.split(",") if x.strip()] if v else []
@@
def _parse_optional_comma_list(v: str | list[str] | None) -> Optional[list[str]]:
"""Parse a comma-separated string into a list, returning None for empty input."""
if v is None:
return None
if isinstance(v, list):
- return v or None
+ result = _normalize_list(v)
+ return result or None
result = [x.strip() for x in v.split(",") if x.strip()]
return result or None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/shared/fastfetchbot_shared/utils/pydantic_types.py` around lines 6 -
20, The list-branch in _parse_comma_list and _parse_optional_comma_list returns
input lists unchanged, which allows unstripped/empty elements; change the
isinstance(v, list) handling to normalize the list exactly like the string
branch (strip each item and filter out empty strings) so e.g. [" a ", "", "b"]
becomes ["a","b"]; for _parse_optional_comma_list return the normalized list or
None when it becomes empty to preserve its current None semantics.
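A runnable sketch of the shared-normalization idea (function names here are illustrative; the proposed fix above uses `_normalize_list`): routing both input shapes through one helper makes `[" a ", "", "b"]` and `"a, ,b,"` produce the same result.

```python
def normalize_items(values):
    # Strip whitespace and drop empty entries, for both list and string inputs.
    return [item.strip() for item in values if item and item.strip()]


def parse_comma_list(v):
    if isinstance(v, list):
        return normalize_items(v)
    return normalize_items(v.split(",")) if v else []


print(parse_comma_list([" a ", "", "b"]))  # -> ['a', 'b']
print(parse_comma_list("a, ,b,"))          # -> ['a', 'b']
```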
     def test_xhs_cookie_default_path_when_no_override(self):
-        """When XHS_COOKIE_PATH is empty, uses CONF_DIR/xhs_cookies.txt."""
+        """When XHS_COOKIE_PATH is empty, _load_xhs_cookies uses CONF_DIR/xhs_cookies.txt."""
         cfg = _reload_config(
             path_exists_side_effect=lambda p: False,
             xhs_cookie_path_override="",
         )
-        expected = os.path.join(cfg.CONF_DIR, "xhs_cookies.txt")
-        assert cfg.xhs_cookie_path == expected
+        # The settings field stores the raw env value (empty string)
+        assert cfg.settings.XHS_COOKIE_PATH == ""
Keep asserting the fallback lookup, not just the raw env field.
This test no longer checks that an empty XHS_COOKIE_PATH falls back to CONF_DIR/xhs_cookies.txt; it only checks that the settings field is empty, so the behavior described in the docstring can now regress unnoticed.
Proposed fix
def test_xhs_cookie_default_path_when_no_override(self):
"""When XHS_COOKIE_PATH is empty, _load_xhs_cookies uses CONF_DIR/xhs_cookies.txt."""
+ seen_paths = []
cfg = _reload_config(
- path_exists_side_effect=lambda p: False,
+ path_exists_side_effect=lambda p: seen_paths.append(p) or False,
xhs_cookie_path_override="",
)
- # The settings field stores the raw env value (empty string)
assert cfg.settings.XHS_COOKIE_PATH == ""
+ assert os.path.join(cfg.settings.CONF_DIR, "xhs_cookies.txt") in seen_paths📝 Committable suggestion
    def test_xhs_cookie_default_path_when_no_override(self):
        """When XHS_COOKIE_PATH is empty, _load_xhs_cookies uses CONF_DIR/xhs_cookies.txt."""
        seen_paths = []
        cfg = _reload_config(
            path_exists_side_effect=lambda p: seen_paths.append(p) or False,
            xhs_cookie_path_override="",
        )
        assert cfg.settings.XHS_COOKIE_PATH == ""
        assert os.path.join(cfg.settings.CONF_DIR, "xhs_cookies.txt") in seen_paths
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/scrapers/test_scraper_config.py` around lines 399 - 406, The test
only asserts the raw env field is empty but must also assert the actual fallback
path is used; update test_xhs_cookie_default_path_when_no_override to, after
calling _reload_config, verify that the resolved cookie path equals
os.path.join(cfg.CONF_DIR, "xhs_cookies.txt") by either calling the resolver
function (_load_xhs_cookies(cfg)) or inspecting the attribute that stores the
resolved path, while still keeping the existing assertion that
cfg.settings.XHS_COOKIE_PATH == "".
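The recording trick used in the suggested test relies on `list.append` returning `None`; a standalone sketch (the stub name is illustrative):

```python
seen_paths = []


def fake_exists(p):
    # list.append returns None, so "append(p) or False" both records the
    # path and evaluates to False, making the stub report "does not exist".
    return seen_paths.append(p) or False


result = fake_exists("/conf/xhs_cookies.txt")
print(result, seen_paths)  # -> False ['/conf/xhs_cookies.txt']
```

This lets the test assert which fallback paths the loader actually probed, not just the raw env value.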
Summary by CodeRabbit
Refactor