Skip to content

feat: use pydantic-settings as default config format#71

Merged
aturret merged 1 commit intomainfrom
pydantic-config-refactor
Mar 26, 2026
Merged

feat: use pydantic-settings as default config format#71
aturret merged 1 commit intomainfrom
pydantic-config-refactor

Conversation

@aturret
Copy link
Copy Markdown
Owner

@aturret aturret commented Mar 26, 2026

Summary by CodeRabbit

Refactor

  • Configuration Management Overhaul - Centralized application settings from scattered constants to a unified, type-safe configuration system using Pydantic BaseSettings across all services. This improves configuration consistency and environment variable handling across the platform while maintaining full backward compatibility with existing deployments.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 26, 2026

📝 Walkthrough

Walkthrough

This PR refactors configuration management across the entire codebase by replacing scattered module-level environment variable constants with centralized Pydantic BaseSettings-based configuration classes. Each application and package now exports a single settings instance that provides typed, validated access to environment variables.

Changes

Cohort / File(s) Summary
Core Config Classes
apps/api/src/config.py, apps/async-worker/async_worker/config.py, apps/telegram-bot/core/config.py, apps/worker/worker_core/config.py, packages/shared/fastfetchbot_shared/config.py, packages/shared/fastfetchbot_shared/services/scrapers/config.py
Introduced BaseSettings-derived configuration classes (ApiSettings, AsyncWorkerSettings, TelegramBotSettings, WorkerSettings, SharedSettings, ScrapersSettings) with typed fields, defaults, and post-validation hooks for derived values. Removed module-level environment variable assignments, replacing them with settings instances.
API Layer Configuration Consumers
apps/api/src/auth.py, apps/api/src/database.py, apps/api/src/main.py, apps/api/src/routers/inoreader.py, apps/api/src/routers/scraper.py, apps/api/src/services/amazon/s3.py, apps/api/src/services/celery_client.py, apps/api/src/services/file_export/.../*, apps/api/src/services/inoreader/*, apps/api/src/services/scrapers/common.py
Updated to import and use settings from src.config instead of importing individual constants; all configuration references now access settings.* attributes.
Async-Worker Consumers
apps/async-worker/async_worker/main.py, apps/async-worker/async_worker/services/enrichment.py, apps/async-worker/async_worker/services/outbox.py, apps/async-worker/async_worker/tasks/scrape.py
Updated to source configuration from async_worker.config.settings instead of module-level constants for Celery, Redis, feature flags, and timeout settings.
Telegram Bot Consumers
apps/telegram-bot/core/api_client.py, apps/telegram-bot/core/database.py, apps/telegram-bot/core/handlers/buttons.py, apps/telegram-bot/core/handlers/messages.py, apps/telegram-bot/core/handlers/url_process.py, apps/telegram-bot/core/main.py, apps/telegram-bot/core/queue_client.py, apps/telegram-bot/core/services/bot_app.py, apps/telegram-bot/core/services/message_sender.py, apps/telegram-bot/core/services/outbox_consumer.py, apps/telegram-bot/core/webhook/server.py
Refactored to access core.config.settings for all configuration needs (API server URL, credentials, bot mode, timeouts, image limits, Telegram channel/admin IDs, database flags, Redis URLs).
Worker Task Consumers
apps/worker/worker_core/main.py, apps/worker/worker_core/tasks/pdf.py, apps/worker/worker_core/tasks/transcribe.py, apps/worker/worker_core/tasks/video.py
Updated Celery configuration and task implementations to source broker/backend/directory/API-key settings from worker_core.config.settings.
Shared Library Consumers
packages/shared/fastfetchbot_shared/utils/logger.py, packages/shared/fastfetchbot_shared/utils/network.py, packages/shared/fastfetchbot_shared/services/scrapers/general/base.py, packages/shared/fastfetchbot_shared/services/scrapers/general/firecrawl.py, packages/shared/fastfetchbot_shared/services/scrapers/general/firecrawl_client.py, packages/shared/fastfetchbot_shared/services/scrapers/general/scraper.py, packages/shared/fastfetchbot_shared/services/scrapers/general/zyte.py, packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py, packages/shared/fastfetchbot_shared/services/scrapers/reddit/__init__.py, packages/shared/fastfetchbot_shared/services/scrapers/scraper_manager.py, packages/shared/fastfetchbot_shared/services/scrapers/twitter/__init__.py, packages/shared/fastfetchbot_shared/services/scrapers/xiaohongshu/adaptar.py, packages/shared/fastfetchbot_shared/services/scrapers/zhihu/__init__.py, packages/shared/fastfetchbot_shared/services/scrapers/zhihu/config.py, packages/shared/fastfetchbot_shared/services/telegraph/__init__.py
Updated to use settings objects from their respective config modules for API keys, timeouts, URLs, cookies, and feature flags across logging, HTTP clients, and scraper implementations.
Pydantic Type Utilities & Dependencies
packages/shared/fastfetchbot_shared/utils/pydantic_types.py, packages/shared/pyproject.toml, packages/shared/fastfetchbot_shared/utils/parse.py
Added new CommaSeparatedList and OptionalCommaSeparatedList Pydantic-annotated types for configuration list parsing; added pydantic-settings>=2.0.0 dependency; removed get_env_bool helper function.
Unit Tests
tests/unit/async_worker/test_enrichment.py, tests/unit/scrapers/test_general_*.py, tests/unit/scrapers/test_scraper_*.py, tests/unit/test_telegraph.py, tests/unit/scrapers/test_zhihu.py, tests/unit/scrapers/test_xiaohongshu.py
Updated mock/patch targets to use centralized settings objects instead of module-level constants; adjusted test assertions to read configuration from settings.* attributes and updated reload/reload-testing logic.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • PR #67: Overlapping refactoring of scraper and telegraph configuration code to use shared settings; both modify the same config and scraper modules.
  • PR #69: Introduced async-worker configuration that this PR now refactors to use Pydantic BaseSettings; both modify async-worker config and Celery patterns.
  • PR #58: Relates to Telegram bot configuration changes including database flag renaming and settings-based configuration sourcing.

Poem

🐰 From scattered constants to one settings home,
Environment variables no longer roam—
Pydantic brings order, with validation's might,
Config now typed, validated, and right! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 32.53% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: use pydantic-settings as default config format' accurately summarizes the core change across all modified files—migrating from direct environment-variable imports and constants to a centralized pydantic_settings.BaseSettings configuration pattern.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch pydantic-config-refactor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 90.50445% with 32 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
apps/telegram-bot/core/config.py 90.72% 9 Missing ⚠️
apps/telegram-bot/core/handlers/url_process.py 0.00% 8 Missing ⚠️
apps/telegram-bot/core/services/bot_app.py 37.50% 5 Missing ⚠️
...shared/fastfetchbot_shared/utils/pydantic_types.py 80.00% 3 Missing ⚠️
apps/telegram-bot/core/api_client.py 33.33% 2 Missing ⚠️
apps/telegram-bot/core/database.py 50.00% 1 Missing ⚠️
apps/telegram-bot/core/handlers/buttons.py 50.00% 1 Missing ⚠️
apps/telegram-bot/core/handlers/messages.py 50.00% 1 Missing ⚠️
apps/telegram-bot/core/services/message_sender.py 50.00% 1 Missing ⚠️
...ckages/shared/fastfetchbot_shared/utils/network.py 50.00% 1 Missing ⚠️

Impacted file tree graph

@@           Coverage Diff           @@
##             main      #71   +/-   ##
=======================================
  Coverage        ?   72.17%           
=======================================
  Files           ?       62           
  Lines           ?     3184           
  Branches        ?        0           
=======================================
  Hits            ?     2298           
  Misses          ?      886           
  Partials        ?        0           
Files with missing lines Coverage Δ
apps/async-worker/async_worker/config.py 100.00% <100.00%> (ø)
apps/async-worker/async_worker/main.py 100.00% <100.00%> (ø)
...s/async-worker/async_worker/services/enrichment.py 100.00% <100.00%> (ø)
apps/async-worker/async_worker/services/outbox.py 100.00% <100.00%> (ø)
apps/async-worker/async_worker/tasks/scrape.py 100.00% <100.00%> (ø)
apps/telegram-bot/core/queue_client.py 100.00% <100.00%> (ø)
apps/telegram-bot/core/services/outbox_consumer.py 95.52% <100.00%> (ø)
packages/shared/fastfetchbot_shared/config.py 100.00% <100.00%> (ø)
...ed/fastfetchbot_shared/services/scrapers/config.py 100.00% <100.00%> (ø)
...tfetchbot_shared/services/scrapers/general/base.py 100.00% <100.00%> (ø)
... and 19 more
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/telegram-bot/core/handlers/url_process.py (1)

103-117: ⚠️ Potential issue | 🟠 Major

Don’t report “no supported url” after successfully general-scraping it.

When settings.GENERAL_SCRAPING_ON is true, this branch fetches and sends the page, but then always overwrites the status with a failure message and returns. That also skips any later URLs in the same Telegram message.

Proposed fix
         if url_metadata["source"] == "unknown":
             if settings.GENERAL_SCRAPING_ON:
                 await process_message.edit_text(
                     text=f"Uncategorized url found. General webpage parser is on, Processing..."
                 )
                 await _fetch_and_send(
                     url=url_metadata["url"],
                     chat_id=message.chat_id,
                     source=url_metadata.get("source", ""),
                     content_type=url_metadata.get("content_type", ""),
                 )
+                await process_message.delete()
+                continue
             await process_message.edit_text(
                 text=f"For the {i + 1} th url, no supported url found."
             )
-            return
+            continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/telegram-bot/core/handlers/url_process.py` around lines 103 - 117, The
branch handling url_metadata["source"] == "unknown" currently always sends a "no
supported url" message after calling _fetch_and_send when
settings.GENERAL_SCRAPING_ON is true; change the control flow so that when
GENERAL_SCRAPING_ON is true you send the "Processing..." message, await
_fetch_and_send(url=url_metadata["url"], chat_id=message.chat_id, source=...,
content_type=...), and then exit/continue without sending the failure text or
returning prematurely; only send the failure message and return when
GENERAL_SCRAPING_ON is false (or when the fetch actually fails), keeping
references to process_message.edit_text, _fetch_and_send, url_metadata,
settings.GENERAL_SCRAPING_ON and the loop index i to locate and update the
logic.
apps/telegram-bot/core/services/bot_app.py (1)

43-60: ⚠️ Potential issue | 🟠 Major

Fail fast instead of leaving application undefined.

If TELEGRAM_BOT_TOKEN is missing or empty, this module only logs and keeps loading. Later calls to startup, shutdown, set_webhook, or process_telegram_update will crash because application was never built.

Proposed fix
+application: Application | None = None
+
-if settings.TELEGRAM_BOT_TOKEN is not None:
+if settings.TELEGRAM_BOT_TOKEN:
     builder = (
         Application.builder()
         .token(settings.TELEGRAM_BOT_TOKEN)
         .arbitrary_callback_data(True)
         .connect_timeout(settings.TELEBOT_CONNECT_TIMEOUT)
@@
     if settings.TELEGRAM_BOT_MODE == "webhook":
         builder = builder.updater(None)
     application = builder.build()
 else:
-    logger.error("TELEGRAM_BOT_TOKEN is not set!")
+    raise RuntimeError("TELEGRAM_BOT_TOKEN is not set")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/telegram-bot/core/services/bot_app.py` around lines 43 - 60, The module
currently logs and continues when settings.TELEGRAM_BOT_TOKEN is missing,
leaving the variable application undefined; update the top-level initialization
in bot_app.py to fail fast by raising an explicit exception or calling sys.exit
with a clear error when settings.TELEGRAM_BOT_TOKEN is None/empty so application
cannot remain undefined. Specifically, in the block that checks
settings.TELEGRAM_BOT_TOKEN (which builds Application via Application.builder(),
.token(...), etc.), replace or follow the logger.error("TELEGRAM_BOT_TOKEN is
not set!") with a RuntimeError or SystemExit containing the same diagnostic
message so subsequent functions like startup, shutdown, set_webhook, and
process_telegram_update see a deterministic failure instead of an unbound
application. Ensure the error message references TELEGRAM_BOT_TOKEN and keeps
existing log text for clarity.
🧹 Nitpick comments (4)
apps/api/src/main.py (1)

12-12: Consider moving SENTRY_DSN to the settings object.

The SENTRY_DSN is hardcoded as an empty string here. Per coding guidelines, Sentry should be used for production error monitoring. Consider adding SENTRY_DSN to the ApiSettings class to allow configuration via environment variables, consistent with this PR's refactor pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/main.py` at line 12, Replace the hardcoded SENTRY_DSN variable
with a configuration field on the ApiSettings class: add a SENTRY_DSN (string,
default empty) attribute to ApiSettings, load it from environment like the other
settings, and then use settings.SENTRY_DSN wherever SENTRY_DSN was referenced in
main.py instead of the top-level SENTRY_DSN constant; ensure any initialization
logic that configures sentry now reads from ApiSettings so the value can be
provided via env vars.
apps/telegram-bot/core/main.py (1)

8-13: Validate TELEGRAM_BOT_MODE explicitly.

Right now any unexpected mode value falls back to polling. Consider raising a clear error for invalid values to avoid silent misconfiguration.

💡 Proposed refactor
-    if settings.TELEGRAM_BOT_MODE == "webhook":
+    if settings.TELEGRAM_BOT_MODE == "webhook":
         logger.info(f"Running in webhook mode on port {settings.TELEGRAM_BOT_PORT}")
         uvicorn.run(webhook_app, host="0.0.0.0", port=settings.TELEGRAM_BOT_PORT)
-    else:
+    elif settings.TELEGRAM_BOT_MODE == "polling":
         logger.info(f"Running in polling mode (HTTP server on port {settings.TELEGRAM_BOT_PORT} for callbacks)")
         uvicorn.run(callback_app, host="0.0.0.0", port=settings.TELEGRAM_BOT_PORT)
+    else:
+        raise ValueError(f"Invalid TELEGRAM_BOT_MODE: {settings.TELEGRAM_BOT_MODE}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/telegram-bot/core/main.py` around lines 8 - 13, Validate
settings.TELEGRAM_BOT_MODE explicitly instead of defaulting to polling: check
that settings.TELEGRAM_BOT_MODE is either "webhook" or "polling" before calling
uvicorn.run; if it's "webhook" run webhook_app, if "polling" run callback_app,
otherwise raise a clear error (e.g., ValueError) with a message that includes
the invalid value and expected options; update the logging (logger.info)
accordingly and reference TELEGRAM_BOT_PORT, webhook_app, callback_app, and
uvicorn.run in the change.
tests/unit/test_telegraph.py (1)

117-132: Clarify empty string parsing behavior in tests.

Two tests (test_no_token_list_creates_token and test_empty_token_list_creates_token) both patch with "" and expect token creation. The docstring at line 138 says "Empty string parses to None (falsy)", but this duplicates the previous test's scenario. Consider differentiating these tests—for example, one could test None explicitly if the settings support it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_telegraph.py` around lines 117 - 132, The two tests patching
TELEGRAPH_TOKEN_LIST both use an empty string but should cover distinct cases;
update one of them (e.g., test_empty_token_list_creates_token) to patch
TELEGRAPH_TOKEN_LIST to None (or remove the patch so it resolves to None) to
explicitly test the None case, keeping the other test with the empty-string
patch to test falsy empty-string behavior; ensure references to
AsyncTelegraphPoster, mock_poster.create_api_token, and the Telegraph(...)
instantiation remain unchanged so the assertions still validate token creation
vs. set_token calls.
apps/telegram-bot/core/config.py (1)

121-166: Prefer parsed computed fields over duplicate module globals.

settings.TELEGRAM_CHANNEL_ID and settings.TELEGRAM_CHANNEL_ADMIN_LIST stay as raw strings while the module exports parsed values with the same names. That leaves two config surfaces to keep straight and makes test patching or later config reloads easy to get wrong. Moving the parsed forms onto TelegramBotSettings would keep this refactor single-source.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/telegram-bot/core/config.py` around lines 121 - 166, The module
currently leaves raw strings on settings (settings.TELEGRAM_CHANNEL_ID,
settings.TELEGRAM_CHANNEL_ADMIN_LIST) while exporting parsed globals
(TELEGRAM_CHANNEL_ID, TELEGRAM_CHANNEL_ADMIN_LIST), creating duplicate config
surfaces; move the parsing logic (use _parse_channel_ids, _parse_debug_channel,
_parse_admin_list, and ban_list_resolver) into the TelegramBotSettings dataclass
(or construct) so the settings object returns already-parsed attributes (e.g.,
TelegramBotSettings.telegram_channel_id, .telegram_channel_admin_list,
.telebot_debug_channel, .group_message_ban_list, .bot_message_ban_list) and
remove the module-level parsed globals to ensure a single source of truth for
config and simplify testing/patching.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/api/src/config.py`:
- Around line 14-17: The current API_KEY uses Field(default_factory=lambda:
secrets.token_urlsafe(32)) which auto-generates a new key on each restart and
breaks clients; update the config so production requires an explicit API_KEY
(remove or replace the default_factory) and implement a startup-time check that
detects when API_KEY was not provided and an auto-generated value is used, then
emit a clear warning via the app logger; reference the API_KEY Field and the
config/settings initializer (or __post_init__ / startup hook that constructs the
Settings) to perform the detection and logging so developers see when a
non-persistent key is in use.

In `@apps/api/src/services/amazon/s3.py`:
- Line 43: The function signature(s) that declare nullable string parameters
(e.g., parameters named bucket and file_name) currently use `bucket: str = None`
/ `file_name: str = None`; update those annotations to explicit optional types
(either `Optional[str]` with `from typing import Optional` or the PEP 604 style
`str | None`) and keep the default `= None`; locate and update the signatures in
the relevant functions/methods in s3.py (search for parameters named `bucket`
and `file_name`) and add the necessary import if you choose `Optional`.

In `@apps/telegram-bot/core/api_client.py`:
- Line 6: Update nullable parameter annotations to use explicit Optional (or PEP
604 union) types: change get_item signature's ban_list: list = None to ban_list:
Optional[list] = None (or ban_list: list | None = None) and do the same for the
corresponding nullable parameter in get_url_metadata; ensure you import
typing.Optional if using Optional and keep the default None unchanged.

In `@apps/telegram-bot/core/config.py`:
- Around line 15-27: Remove the process-local default token generation for
API_KEY and TELEGRAM_BOT_SECRET_TOKEN and require values from
environment/config: replace the Field(default_factory=...) usages so API_KEY and
TELEGRAM_BOT_SECRET_TOKEN are read from env (e.g., Field(..., env="API_KEY") /
Field(..., env="TELEGRAM_BOT_SECRET_TOKEN") or make them required Optional[str]
without a default and validate at startup), ensuring the symbols API_KEY and
TELEGRAM_BOT_SECRET_TOKEN in core/config.py no longer fall back to
secrets.token_urlsafe and instead rely on environment-provided secrets so tokens
are stable across processes and restarts.

In `@apps/telegram-bot/core/services/bot_app.py`:
- Around line 36-40: Remove the secret token from debug logs in set_webhook:
stop logging settings.TELEGRAM_BOT_SECRET_TOKEN and instead log only
non-sensitive contextual info (e.g., settings.TELEGRAM_WEBHOOK_URL or a
placeholder). Update the logger.debug call in the set_webhook function to omit
or redact settings.TELEGRAM_BOT_SECRET_TOKEN, leaving the rest of the call and
the await application.bot.set_webhook(...) unchanged.

In `@apps/worker/worker_core/config.py`:
- Line 17: The default DOWNLOAD_DIR = "/tmp" is insecure; change the default to
an app-owned directory (e.g., a subdirectory like "worker_downloads") and ensure
the process creates it with restrictive permissions (use pathlib/os to create
the directory if missing with mode 0o700 or call os.chmod to enforce 0700).
Update the config symbol DOWNLOAD_DIR and add initialization logic that ensures
the directory exists and has correct permissions at startup (fail-fast or
fallback with a clear error if the secure directory cannot be created).

In `@apps/worker/worker_core/tasks/pdf.py`:
- Around line 1-3: The file directly imports export_pdf from
fastfetchbot_file_export (line with export_pdf) which breaks the app-layer
boundary; replace that direct import with the shared async wrapper from
fastfetchbot_shared.services.file_export (the wrapper that submits a Celery
task, e.g. export_pdf_async or submit_pdf_export_task) and update usages in this
module (e.g., any calls to export_pdf) to call the async/shared wrapper instead
so the worker executes via the shared Celery task API rather than importing
fastfetchbot_file_export directly.

In `@packages/shared/fastfetchbot_shared/services/scrapers/config.py`:
- Around line 126-134: The property firecrawl_wait_for_int currently treats 0 as
falsy and returns 3000; change the logic to treat "0" (and integer 0) as a valid
value by only falling back when FIRECRAWL_WAIT_FOR is None or an empty string or
when parsing fails. In the firecrawl_wait_for_int property, first check if
self.FIRECRAWL_WAIT_FOR is None or equal to "" and return 3000 in that case,
otherwise parse int(self.FIRECRAWL_WAIT_FOR) and return that value (including
0); keep the existing except (ValueError, TypeError) to return 3000 on parse
errors.
- Around line 181-193: The _load_zhihu_cookies function opens zhihu_cookies.json
without an explicit encoding which can break on non-UTF-8 systems; update the
open(...) call inside _load_zhihu_cookies to include encoding="utf-8" (preserve
the try/except and return behavior) so JSON.load reads the file with UTF-8
encoding consistent with the XHS loader.
- Around line 140-144: The Jinja2 Environment currently created as JINJA2_ENV
(using Environment and FileSystemLoader with templates_directory) does not
enable autoescaping; update the Environment instantiation to enable autoescape
(e.g., use select_autoescape or set autoescape for HTML/XML templates) so HTML
templates that render scraped/user content are escaped by default; modify the
JINJA2_ENV creation (the Environment(...) call) to include autoescape behavior
for 'html' and 'xml' templates.

In
`@packages/shared/fastfetchbot_shared/services/scrapers/xiaohongshu/adaptar.py`:
- Around line 53-57: The constructor currently calls .rstrip("/") on
(sign_server_endpoint or shared_settings.SIGN_SERVER_URL) which will raise
AttributeError if shared_settings.SIGN_SERVER_URL is None; change the logic in
XhsSinglePostAdapter.__init__ to first capture the raw value (e.g., raw_endpoint
= sign_server_endpoint if sign_server_endpoint is not None else
shared_settings.SIGN_SERVER_URL), then if raw_endpoint is falsy/None raise the
intended ValueError, otherwise set self.sign_server_endpoint =
raw_endpoint.rstrip("/"); this preserves the ValueError path and avoids calling
.rstrip on None.
- Line 10: Replace the incorrect import of shared_settings with the scraper
config settings by changing the import to use
fastfetchbot_shared.services.scrapers.config (so the module variable to use is
settings rather than shared_settings) and then update any usage of
shared_settings.SIGN_SERVER_URL to settings.SIGN_SERVER_URL (or
settings.XHS_SIGN_SERVER_URL if you prefer that constant name); specifically
update the import statement and the reference in the code that currently uses
shared_settings (e.g., the place where SIGN_SERVER_URL is read) to reference the
new settings symbol.

In `@packages/shared/fastfetchbot_shared/utils/pydantic_types.py`:
- Around line 6-20: The list-branch in _parse_comma_list and
_parse_optional_comma_list returns input lists unchanged, which allows
unstripped/empty elements; change the isinstance(v, list) handling to normalize
the list exactly like the string branch (strip each item and filter out empty
strings) so e.g. [" a ", "", "b"] becomes ["a","b"]; for
_parse_optional_comma_list return the normalized list or None when it becomes
empty to preserve its current None semantics.

In `@tests/unit/scrapers/test_scraper_config.py`:
- Around line 399-406: The test only asserts the raw env field is empty but must
also assert the actual fallback path is used; update
test_xhs_cookie_default_path_when_no_override to, after calling _reload_config,
verify that the resolved cookie path equals os.path.join(cfg.CONF_DIR,
"xhs_cookies.txt") by either calling the resolver function
(_load_xhs_cookies(cfg)) or inspecting the attribute that stores the resolved
path, while still keeping the existing assertion that
cfg.settings.XHS_COOKIE_PATH == "".

---

Outside diff comments:
In `@apps/telegram-bot/core/handlers/url_process.py`:
- Around line 103-117: The branch handling url_metadata["source"] == "unknown"
currently always sends a "no supported url" message after calling
_fetch_and_send when settings.GENERAL_SCRAPING_ON is true; change the control
flow so that when GENERAL_SCRAPING_ON is true you send the "Processing..."
message, await _fetch_and_send(url=url_metadata["url"], chat_id=message.chat_id,
source=..., content_type=...), and then exit/continue without sending the
failure text or returning prematurely; only send the failure message and return
when GENERAL_SCRAPING_ON is false (or when the fetch actually fails), keeping
references to process_message.edit_text, _fetch_and_send, url_metadata,
settings.GENERAL_SCRAPING_ON and the loop index i to locate and update the
logic.

In `@apps/telegram-bot/core/services/bot_app.py`:
- Around line 43-60: The module currently logs and continues when
settings.TELEGRAM_BOT_TOKEN is missing, leaving the variable application
undefined; update the top-level initialization in bot_app.py to fail fast by
raising an explicit exception or calling sys.exit with a clear error when
settings.TELEGRAM_BOT_TOKEN is None/empty so application cannot remain
undefined. Specifically, in the block that checks settings.TELEGRAM_BOT_TOKEN
(which builds Application via Application.builder(), .token(...), etc.), replace
or follow the logger.error("TELEGRAM_BOT_TOKEN is not set!") with a RuntimeError
or SystemExit containing the same diagnostic message so subsequent functions
like startup, shutdown, set_webhook, and process_telegram_update see a
deterministic failure instead of an unbound application. Ensure the error
message references TELEGRAM_BOT_TOKEN and keeps existing log text for clarity.

---

Nitpick comments:
In `@apps/api/src/main.py`:
- Line 12: Replace the hardcoded SENTRY_DSN variable with a configuration field
on the ApiSettings class: add a SENTRY_DSN (string, default empty) attribute to
ApiSettings, load it from environment like the other settings, and then use
settings.SENTRY_DSN wherever SENTRY_DSN was referenced in main.py instead of the
top-level SENTRY_DSN constant; ensure any initialization logic that configures
sentry now reads from ApiSettings so the value can be provided via env vars.

In `@apps/telegram-bot/core/config.py`:
- Around line 121-166: The module currently leaves raw strings on settings
(settings.TELEGRAM_CHANNEL_ID, settings.TELEGRAM_CHANNEL_ADMIN_LIST) while
exporting parsed globals (TELEGRAM_CHANNEL_ID, TELEGRAM_CHANNEL_ADMIN_LIST),
creating duplicate config surfaces; move the parsing logic (use
_parse_channel_ids, _parse_debug_channel, _parse_admin_list, and
ban_list_resolver) into the TelegramBotSettings dataclass (or construct) so the
settings object returns already-parsed attributes (e.g.,
TelegramBotSettings.telegram_channel_id, .telegram_channel_admin_list,
.telebot_debug_channel, .group_message_ban_list, .bot_message_ban_list) and
remove the module-level parsed globals to ensure a single source of truth for
config and simplify testing/patching.

In `@apps/telegram-bot/core/main.py`:
- Around line 8-13: Validate settings.TELEGRAM_BOT_MODE explicitly instead of
defaulting to polling: check that settings.TELEGRAM_BOT_MODE is either "webhook"
or "polling" before calling uvicorn.run; if it's "webhook" run webhook_app, if
"polling" run callback_app, otherwise raise a clear error (e.g., ValueError)
with a message that includes the invalid value and expected options; update the
logging (logger.info) accordingly and reference TELEGRAM_BOT_PORT, webhook_app,
callback_app, and uvicorn.run in the change.

In `@tests/unit/test_telegraph.py`:
- Around line 117-132: The two tests patching TELEGRAPH_TOKEN_LIST both use an
empty string but should cover distinct cases; update one of them (e.g.,
test_empty_token_list_creates_token) to patch TELEGRAPH_TOKEN_LIST to None (or
remove the patch so it resolves to None) to explicitly test the None case,
keeping the other test with the empty-string patch to test falsy empty-string
behavior; ensure references to AsyncTelegraphPoster,
mock_poster.create_api_token, and the Telegraph(...) instantiation remain
unchanged so the assertions still validate token creation vs. set_token calls.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 21b24df1-ee81-47a2-b088-e434b22c9893

📥 Commits

Reviewing files that changed from the base of the PR and between fe687ec and 88b50d1.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (67)
  • apps/api/src/auth.py
  • apps/api/src/config.py
  • apps/api/src/database.py
  • apps/api/src/main.py
  • apps/api/src/routers/inoreader.py
  • apps/api/src/routers/scraper.py
  • apps/api/src/services/amazon/s3.py
  • apps/api/src/services/celery_client.py
  • apps/api/src/services/file_export/audio_transcribe/__init__.py
  • apps/api/src/services/file_export/document_export/pdf_export.py
  • apps/api/src/services/file_export/video_download/__init__.py
  • apps/api/src/services/inoreader/__init__.py
  • apps/api/src/services/inoreader/process.py
  • apps/api/src/services/scrapers/common.py
  • apps/async-worker/async_worker/celery_client.py
  • apps/async-worker/async_worker/config.py
  • apps/async-worker/async_worker/main.py
  • apps/async-worker/async_worker/services/enrichment.py
  • apps/async-worker/async_worker/services/outbox.py
  • apps/async-worker/async_worker/tasks/scrape.py
  • apps/telegram-bot/core/api_client.py
  • apps/telegram-bot/core/config.py
  • apps/telegram-bot/core/database.py
  • apps/telegram-bot/core/handlers/buttons.py
  • apps/telegram-bot/core/handlers/messages.py
  • apps/telegram-bot/core/handlers/url_process.py
  • apps/telegram-bot/core/main.py
  • apps/telegram-bot/core/queue_client.py
  • apps/telegram-bot/core/services/bot_app.py
  • apps/telegram-bot/core/services/message_sender.py
  • apps/telegram-bot/core/services/outbox_consumer.py
  • apps/telegram-bot/core/webhook/server.py
  • apps/worker/worker_core/config.py
  • apps/worker/worker_core/main.py
  • apps/worker/worker_core/tasks/pdf.py
  • apps/worker/worker_core/tasks/transcribe.py
  • apps/worker/worker_core/tasks/video.py
  • packages/shared/fastfetchbot_shared/config.py
  • packages/shared/fastfetchbot_shared/services/scrapers/config.py
  • packages/shared/fastfetchbot_shared/services/scrapers/general/base.py
  • packages/shared/fastfetchbot_shared/services/scrapers/general/firecrawl.py
  • packages/shared/fastfetchbot_shared/services/scrapers/general/firecrawl_client.py
  • packages/shared/fastfetchbot_shared/services/scrapers/general/scraper.py
  • packages/shared/fastfetchbot_shared/services/scrapers/general/zyte.py
  • packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py
  • packages/shared/fastfetchbot_shared/services/scrapers/reddit/__init__.py
  • packages/shared/fastfetchbot_shared/services/scrapers/scraper_manager.py
  • packages/shared/fastfetchbot_shared/services/scrapers/twitter/__init__.py
  • packages/shared/fastfetchbot_shared/services/scrapers/xiaohongshu/adaptar.py
  • packages/shared/fastfetchbot_shared/services/scrapers/zhihu/__init__.py
  • packages/shared/fastfetchbot_shared/services/scrapers/zhihu/config.py
  • packages/shared/fastfetchbot_shared/services/telegraph/__init__.py
  • packages/shared/fastfetchbot_shared/utils/logger.py
  • packages/shared/fastfetchbot_shared/utils/network.py
  • packages/shared/fastfetchbot_shared/utils/parse.py
  • packages/shared/fastfetchbot_shared/utils/pydantic_types.py
  • packages/shared/pyproject.toml
  • tests/unit/async_worker/test_enrichment.py
  • tests/unit/scrapers/test_general_base.py
  • tests/unit/scrapers/test_general_firecrawl.py
  • tests/unit/scrapers/test_general_scraper.py
  • tests/unit/scrapers/test_general_zyte.py
  • tests/unit/scrapers/test_scraper_config.py
  • tests/unit/scrapers/test_scraper_manager.py
  • tests/unit/scrapers/test_xiaohongshu.py
  • tests/unit/scrapers/test_zhihu.py
  • tests/unit/test_telegraph.py
💤 Files with no reviewable changes (1)
  • packages/shared/fastfetchbot_shared/utils/parse.py

Comment on lines +14 to +17
# FastAPI
BASE_URL: str = "localhost"
API_KEY_NAME: str = "pwd"
API_KEY: str = Field(default_factory=lambda: secrets.token_urlsafe(32))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Auto-generated API_KEY may cause authentication failures on restart.

If API_KEY is not set via environment variable, secrets.token_urlsafe(32) generates a new random key each time the application starts. This means:

  • Clients authenticated with the previous key will fail after a restart
  • The key is ephemeral and not persisted anywhere

Consider either requiring API_KEY to be explicitly set in production, or logging a warning when using an auto-generated key.

🛡️ Proposed fix: warn on auto-generated key
+import warnings
+
 class ApiSettings(BaseSettings):
     model_config = SettingsConfigDict(extra="ignore")
 
     # FastAPI
     BASE_URL: str = "localhost"
     API_KEY_NAME: str = "pwd"
     API_KEY: str = Field(default_factory=lambda: secrets.token_urlsafe(32))
+    
+    `@model_validator`(mode="after")
+    def _resolve_derived(self) -> "ApiSettings":
+        # Warn if API_KEY was auto-generated (check if it looks like a random token)
+        if not os.environ.get("API_KEY"):
+            warnings.warn(
+                "API_KEY not set; using auto-generated key that will change on restart",
+                UserWarning
+            )
         if not self.DOWNLOAD_DIR:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/config.py` around lines 14 - 17, The current API_KEY uses
Field(default_factory=lambda: secrets.token_urlsafe(32)) which auto-generates a
new key on each restart and breaks clients; update the config so production
requires an explicit API_KEY (remove or replace the default_factory) and
implement a startup-time check that detects when API_KEY was not provided and an
auto-generated value is used, then emit a clear warning via the app logger;
reference the API_KEY Field and the config/settings initializer (or
__post_init__ / startup hook that constructs the Settings) to perform the
detection and logging so developers see when a non-persistent key is in use.

async def upload(
staging_path: Path,
bucket: str = AWS_S3_BUCKET_NAME,
bucket: str = None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use explicit Optional or | None for nullable parameters.

Per PEP 484 and the static analysis hints, bucket: str = None and file_name: str = None should use explicit optional typing.

🔧 Proposed fix
+from typing import Optional
+
 async def upload(
         staging_path: Path,
-        bucket: str = None,
+        bucket: Optional[str] = None,
         suite: str = "test",
         release: str = datetime.now().strftime("%Y-%m-%d"),
-        file_name: str = None,
+        file_name: Optional[str] = None,
 ) -> str:

Also applies to: 46-46

🧰 Tools
🪛 Ruff (0.15.7)

[warning] 43-43: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/services/amazon/s3.py` at line 43, The function signature(s)
that declare nullable string parameters (e.g., parameters named bucket and
file_name) currently use `bucket: str = None` / `file_name: str = None`; update
those annotations to explicit optional types (either `Optional[str]` with `from
typing import Optional` or the PEP 604 style `str | None`) and keep the default
`= None`; locate and update the signatures in the relevant functions/methods in
s3.py (search for parameters named `bucket` and `file_name`) and add the
necessary import if you choose `Optional`.

from fastfetchbot_shared.utils.logger import logger


async def get_item(url: str, ban_list: list = None, **kwargs) -> dict:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use explicit Optional or | None for nullable parameters.

Per PEP 484 and the static analysis hint, ban_list: list = None should use explicit optional typing. The same applies to line 22.

🔧 Proposed fix
+from typing import Optional
+
+
-async def get_item(url: str, ban_list: list = None, **kwargs) -> dict:
+async def get_item(url: str, ban_list: Optional[list] = None, **kwargs) -> dict:

And similarly for get_url_metadata:

-async def get_url_metadata(url: str, ban_list: list = None) -> dict:
+async def get_url_metadata(url: str, ban_list: Optional[list] = None) -> dict:
🧰 Tools
🪛 Ruff (0.15.7)

[warning] 6-6: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/telegram-bot/core/api_client.py` at line 6, Update nullable parameter
annotations to use explicit Optional (or PEP 604 union) types: change get_item
signature's ban_list: list = None to ban_list: Optional[list] = None (or
ban_list: list | None = None) and do the same for the corresponding nullable
parameter in get_url_metadata; ensure you import typing.Optional if using
Optional and keep the default None unchanged.

Comment on lines +15 to +27
API_KEY_NAME: str = "pwd"
API_KEY: str = Field(default_factory=lambda: secrets.token_urlsafe(32))

# Bot's own BASE_URL
BASE_URL: str = "localhost"

# Telegram bot
TELEGRAM_BOT_ON: bool = True
TELEGRAM_BOT_MODE: str = "polling"
TELEGRAM_BOT_TOKEN: Optional[str] = None
TELEGRAM_BOT_SECRET_TOKEN: str = Field(
default_factory=lambda: secrets.token_urlsafe(32)
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Do not generate shared auth tokens per process.

API_KEY and TELEGRAM_BOT_SECRET_TOKEN cross process boundaries. Falling back to secrets.token_urlsafe(...) means each bot/API instance gets a different value unless every deployment explicitly sets both env vars, which breaks bot→API auth and makes webhook auth unstable across restarts or replicas.

Proposed fix
-    API_KEY: str = Field(default_factory=lambda: secrets.token_urlsafe(32))
+    API_KEY: str
@@
-    TELEGRAM_BOT_SECRET_TOKEN: str = Field(
-        default_factory=lambda: secrets.token_urlsafe(32)
-    )
+    TELEGRAM_BOT_SECRET_TOKEN: str
As per coding guidelines, "Sensitive cookies and tokens must be stored in environment variables, never hardcoded in source code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/telegram-bot/core/config.py` around lines 15 - 27, Remove the
process-local default token generation for API_KEY and TELEGRAM_BOT_SECRET_TOKEN
and require values from environment/config: replace the
Field(default_factory=...) usages so API_KEY and TELEGRAM_BOT_SECRET_TOKEN are
read from env (e.g., Field(..., env="API_KEY") / Field(...,
env="TELEGRAM_BOT_SECRET_TOKEN") or make them required Optional[str] without a
default and validate at startup), ensuring the symbols API_KEY and
TELEGRAM_BOT_SECRET_TOKEN in core/config.py no longer fall back to
secrets.token_urlsafe and instead rely on environment-provided secrets so tokens
are stable across processes and restarts.

Comment on lines 36 to 40
async def set_webhook() -> bool:
logger.debug(f"set_webhook: {TELEGRAM_WEBHOOK_URL}, secret_token: {TELEGRAM_BOT_SECRET_TOKEN}")
logger.debug(f"set_webhook: {settings.TELEGRAM_WEBHOOK_URL}, secret_token: {settings.TELEGRAM_BOT_SECRET_TOKEN}")
return await application.bot.set_webhook(
url=TELEGRAM_WEBHOOK_URL, secret_token=TELEGRAM_BOT_SECRET_TOKEN
url=settings.TELEGRAM_WEBHOOK_URL, secret_token=settings.TELEGRAM_BOT_SECRET_TOKEN
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Remove the webhook secret from logs.

settings.TELEGRAM_BOT_SECRET_TOKEN is an authentication credential; emitting it here leaks the value into centralized logs and error reports.

Proposed fix
-    logger.debug(f"set_webhook: {settings.TELEGRAM_WEBHOOK_URL}, secret_token: {settings.TELEGRAM_BOT_SECRET_TOKEN}")
+    logger.debug(
+        f"set_webhook: {settings.TELEGRAM_WEBHOOK_URL}, "
+        f"secret_token_configured={bool(settings.TELEGRAM_BOT_SECRET_TOKEN)}"
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/telegram-bot/core/services/bot_app.py` around lines 36 - 40, Remove the
secret token from debug logs in set_webhook: stop logging
settings.TELEGRAM_BOT_SECRET_TOKEN and instead log only non-sensitive contextual
info (e.g., settings.TELEGRAM_WEBHOOK_URL or a placeholder). Update the
logger.debug call in the set_webhook function to omit or redact
settings.TELEGRAM_BOT_SECRET_TOKEN, leaving the rest of the call and the await
application.bot.set_webhook(...) unchanged.

Comment on lines +181 to +193
def _load_zhihu_cookies(conf_dir: str) -> Optional[dict]:
zhihu_cookie_path = os.path.join(conf_dir, "zhihu_cookies.json")
if os.path.exists(zhihu_cookie_path):
try:
with open(zhihu_cookie_path, "r") as f:
return json.load(f)
except json.JSONDecodeError:
logger.error("Error: zhihu_cookies.json is not in a valid JSON format.")
return None
except FileNotFoundError:
logger.error("Error: zhihu_cookies.json does not exist.")
return None
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing encoding parameter for file read.

Line 185 opens the Zhihu cookies file without specifying encoding, while Line 166 (XHS cookies) correctly uses encoding="utf-8". This inconsistency could cause issues with non-ASCII content on systems where the default encoding isn't UTF-8.

🔧 Proposed fix: add encoding parameter
     if os.path.exists(zhihu_cookie_path):
         try:
-            with open(zhihu_cookie_path, "r") as f:
+            with open(zhihu_cookie_path, "r", encoding="utf-8") as f:
                 return json.load(f)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/fastfetchbot_shared/services/scrapers/config.py` around lines
181 - 193, The _load_zhihu_cookies function opens zhihu_cookies.json without an
explicit encoding which can break on non-UTF-8 systems; update the open(...)
call inside _load_zhihu_cookies to include encoding="utf-8" (preserve the
try/except and return behavior) so JSON.load reads the file with UTF-8 encoding
consistent with the XHS loader.

import httpx

from fastfetchbot_shared.config import SIGN_SERVER_URL
from fastfetchbot_shared.config import settings as shared_settings
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Find SIGN_SERVER_URL definitions =="
rg -n --type=py '\bSIGN_SERVER_URL\b'

echo "== Find settings modules under fastfetchbot_shared =="
fd config.py packages/shared/fastfetchbot_shared

echo "== Find scraper modules importing settings =="
rg -n --type=py 'from fastfetchbot_shared(\.services\.scrapers)?\.config import settings|from \.config import .*settings' packages/shared/fastfetchbot_shared/services/scrapers

Repository: aturret/FastFetchBot

Length of output: 3434


Import SIGN_SERVER_URL from the scraper config module.

Line 10 imports from fastfetchbot_shared.config instead of the scraper config module. All other scrapers import from fastfetchbot_shared.services.scrapers.config. Change the import to:

from fastfetchbot_shared.services.scrapers.config import settings

Then update line 53 to use settings.SIGN_SERVER_URL (or XHS_SIGN_SERVER_URL if preferred), removing the shared_settings alias.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/fastfetchbot_shared/services/scrapers/xiaohongshu/adaptar.py`
at line 10, Replace the incorrect import of shared_settings with the scraper
config settings by changing the import to use
fastfetchbot_shared.services.scrapers.config (so the module variable to use is
settings rather than shared_settings) and then update any usage of
shared_settings.SIGN_SERVER_URL to settings.SIGN_SERVER_URL (or
settings.XHS_SIGN_SERVER_URL if you prefer that constant name); specifically
update the import statement and the reference in the code that currently uses
shared_settings (e.g., the place where SIGN_SERVER_URL is read) to reference the
new settings symbol.

Comment on lines +53 to +57
self.sign_server_endpoint = (sign_server_endpoint or shared_settings.SIGN_SERVER_URL).rstrip("/")
if not self.sign_server_endpoint:
raise ValueError(
"XhsSinglePostAdapter requires a sign server URL. "
"Set SIGN_SERVER_URL in the environment or pass sign_server_endpoint explicitly."
"Set shared_settings.SIGN_SERVER_URL in the environment or pass sign_server_endpoint explicitly."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard nullable settings before calling .rstrip().

Line 53 can raise AttributeError if shared_settings.SIGN_SERVER_URL is None, preventing the intended ValueError path.

💡 Proposed fix
-        self.sign_server_endpoint = (sign_server_endpoint or shared_settings.SIGN_SERVER_URL).rstrip("/")
+        configured_endpoint = sign_server_endpoint or shared_settings.SIGN_SERVER_URL or ""
+        self.sign_server_endpoint = configured_endpoint.rstrip("/")
         if not self.sign_server_endpoint:
             raise ValueError(
                 "XhsSinglePostAdapter requires a sign server URL. "
-                "Set shared_settings.SIGN_SERVER_URL in the environment or pass sign_server_endpoint explicitly."
+                "Set SIGN_SERVER_URL in the environment or pass sign_server_endpoint explicitly."
             )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
self.sign_server_endpoint = (sign_server_endpoint or shared_settings.SIGN_SERVER_URL).rstrip("/")
if not self.sign_server_endpoint:
raise ValueError(
"XhsSinglePostAdapter requires a sign server URL. "
"Set SIGN_SERVER_URL in the environment or pass sign_server_endpoint explicitly."
"Set shared_settings.SIGN_SERVER_URL in the environment or pass sign_server_endpoint explicitly."
configured_endpoint = sign_server_endpoint or shared_settings.SIGN_SERVER_URL or ""
self.sign_server_endpoint = configured_endpoint.rstrip("/")
if not self.sign_server_endpoint:
raise ValueError(
"XhsSinglePostAdapter requires a sign server URL. "
"Set SIGN_SERVER_URL in the environment or pass sign_server_endpoint explicitly."
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/fastfetchbot_shared/services/scrapers/xiaohongshu/adaptar.py`
around lines 53 - 57, The constructor currently calls .rstrip("/") on
(sign_server_endpoint or shared_settings.SIGN_SERVER_URL) which will raise
AttributeError if shared_settings.SIGN_SERVER_URL is None; change the logic in
XhsSinglePostAdapter.__init__ to first capture the raw value (e.g., raw_endpoint
= sign_server_endpoint if sign_server_endpoint is not None else
shared_settings.SIGN_SERVER_URL), then if raw_endpoint is falsy/None raise the
intended ValueError, otherwise set self.sign_server_endpoint =
raw_endpoint.rstrip("/"); this preserves the ValueError path and avoids calling
.rstrip on None.

Comment on lines +6 to +20
def _parse_comma_list(v: str | list[str]) -> list[str]:
"""Parse a comma-separated string into a list of stripped, non-empty strings."""
if isinstance(v, list):
return v
return [x.strip() for x in v.split(",") if x.strip()] if v else []


def _parse_optional_comma_list(v: str | list[str] | None) -> Optional[list[str]]:
"""Parse a comma-separated string into a list, returning None for empty input."""
if v is None:
return None
if isinstance(v, list):
return v or None
result = [x.strip() for x in v.split(",") if x.strip()]
return result or None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Normalize list inputs in the list branch too.

Both helpers strip and drop empty items for string input, but direct list input is returned unchanged. [" a ", "", "b"] will therefore keep invalid entries and diverge from the documented behavior.

Proposed fix
+def _normalize_list(values: list[str]) -> list[str]:
+    return [item.strip() for item in values if item and item.strip()]
+
+
 def _parse_comma_list(v: str | list[str]) -> list[str]:
     """Parse a comma-separated string into a list of stripped, non-empty strings."""
     if isinstance(v, list):
-        return v
+        return _normalize_list(v)
     return [x.strip() for x in v.split(",") if x.strip()] if v else []
@@
 def _parse_optional_comma_list(v: str | list[str] | None) -> Optional[list[str]]:
     """Parse a comma-separated string into a list, returning None for empty input."""
     if v is None:
         return None
     if isinstance(v, list):
-        return v or None
+        result = _normalize_list(v)
+        return result or None
     result = [x.strip() for x in v.split(",") if x.strip()]
     return result or None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/fastfetchbot_shared/utils/pydantic_types.py` around lines 6 -
20, The list-branch in _parse_comma_list and _parse_optional_comma_list returns
input lists unchanged, which allows unstripped/empty elements; change the
isinstance(v, list) handling to normalize the list exactly like the string
branch (strip each item and filter out empty strings) so e.g. [" a ", "", "b"]
becomes ["a","b"]; for _parse_optional_comma_list return the normalized list or
None when it becomes empty to preserve its current None semantics.

Comment on lines 399 to +406
def test_xhs_cookie_default_path_when_no_override(self):
"""When XHS_COOKIE_PATH is empty, uses CONF_DIR/xhs_cookies.txt."""
"""When XHS_COOKIE_PATH is empty, _load_xhs_cookies uses CONF_DIR/xhs_cookies.txt."""
cfg = _reload_config(
path_exists_side_effect=lambda p: False,
xhs_cookie_path_override="",
)
expected = os.path.join(cfg.CONF_DIR, "xhs_cookies.txt")
assert cfg.xhs_cookie_path == expected
# The settings field stores the raw env value (empty string)
assert cfg.settings.XHS_COOKIE_PATH == ""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Keep asserting the fallback lookup, not just the raw env field.

This test no longer checks that an empty XHS_COOKIE_PATH falls back to CONF_DIR/xhs_cookies.txt; it only checks that the settings field is empty, so the behavior described in the docstring can now regress unnoticed.

Proposed fix
     def test_xhs_cookie_default_path_when_no_override(self):
         """When XHS_COOKIE_PATH is empty, _load_xhs_cookies uses CONF_DIR/xhs_cookies.txt."""
+        seen_paths = []
         cfg = _reload_config(
-            path_exists_side_effect=lambda p: False,
+            path_exists_side_effect=lambda p: seen_paths.append(p) or False,
             xhs_cookie_path_override="",
         )
-        # The settings field stores the raw env value (empty string)
         assert cfg.settings.XHS_COOKIE_PATH == ""
+        assert os.path.join(cfg.settings.CONF_DIR, "xhs_cookies.txt") in seen_paths
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_xhs_cookie_default_path_when_no_override(self):
"""When XHS_COOKIE_PATH is empty, uses CONF_DIR/xhs_cookies.txt."""
"""When XHS_COOKIE_PATH is empty, _load_xhs_cookies uses CONF_DIR/xhs_cookies.txt."""
cfg = _reload_config(
path_exists_side_effect=lambda p: False,
xhs_cookie_path_override="",
)
expected = os.path.join(cfg.CONF_DIR, "xhs_cookies.txt")
assert cfg.xhs_cookie_path == expected
# The settings field stores the raw env value (empty string)
assert cfg.settings.XHS_COOKIE_PATH == ""
def test_xhs_cookie_default_path_when_no_override(self):
"""When XHS_COOKIE_PATH is empty, _load_xhs_cookies uses CONF_DIR/xhs_cookies.txt."""
seen_paths = []
cfg = _reload_config(
path_exists_side_effect=lambda p: seen_paths.append(p) or False,
xhs_cookie_path_override="",
)
assert cfg.settings.XHS_COOKIE_PATH == ""
assert os.path.join(cfg.settings.CONF_DIR, "xhs_cookies.txt") in seen_paths
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/scrapers/test_scraper_config.py` around lines 399 - 406, The test
only asserts the raw env field is empty but must also assert the actual fallback
path is used; update test_xhs_cookie_default_path_when_no_override to, after
calling _reload_config, verify that the resolved cookie path equals
os.path.join(cfg.CONF_DIR, "xhs_cookies.txt") by either calling the resolver
function (_load_xhs_cookies(cfg)) or inspecting the attribute that stores the
resolved path, while still keeping the existing assertion that
cfg.settings.XHS_COOKIE_PATH == "".

@aturret aturret merged commit 5b1d57a into main Mar 26, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant