⚠️ Warning: Rate limit exceeded

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 17 minutes and 14 seconds.

⌛ How to resolve this issue? After the wait time has elapsed, a review can be triggered using the … We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (2)
📝 Walkthrough

Introduces a domain-specific exception hierarchy and widespread error-handling standardization: new FastFetchBotError subclasses, centralized FastAPI handlers (502 for domain errors, 500 for generic), structured logging via logger.exception/debug, per-item fault isolation in media processing, and updated tests reflecting new exception semantics.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 7
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py (1)
74-104: ⚠️ Potential issue | 🟠 Major

Add exception handling to the `scrape_thread_data` method to handle browser operations, page navigation, JSON parsing, and data extraction failures.

This function lacks try/except blocks for multiple failure points:

- Browser launch and navigation failures (`pw.chromium.launch()`, `page.goto()`)
- JSON parsing failures (`json.loads()` on line 100)
- Dictionary key access failures (lines 101-102)

Wrap the scraping logic in appropriate exception handling and consider raising custom exceptions (similar to `ScraperError` and `ScraperParseError` used in other scrapers) to align with the PR's exception handling refactoring objectives.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py` around lines 74 - 104, Wrap the body of scrape_thread_data in a try/except that catches Playwright/browser/navigation exceptions (around async_playwright(), pw.chromium.launch(), new_context(), page.goto(), wait_for_selector and page operations) and re-raises a ScraperError with a clear message; additionally catch json.JSONDecodeError when calling json.loads(text) and KeyError/TypeError when accessing data["data"]["data"]["containing_thread"]["thread_items"] and raise a ScraperParseError with context; ensure you still close/cleanup the browser/context/page in a finally block (or use context managers) and keep existing parsing logic that calls parse_single_threads_data(thread["post"]) inside the guarded block so parse errors can be captured and converted to ScraperParseError.

packages/shared/fastfetchbot_shared/services/scrapers/weibo/scraper.py (1)
121-136: ⚠️ Potential issue | 🟡 Minor

Hardcoded SUB cookie should be moved to environment variables.

Line 122 contains a hardcoded authentication cookie (`SUB`). Per coding guidelines, sensitive cookies and tokens must be stored in environment variables, never hardcoded in source code. This cookie appears to be used for Weibo API authentication.

🔧 Suggested fix

 async def _get_weibo_info_api(self) -> dict:
     """Fetch weibo info using the API with SUB cookie and isGetLongText=true."""
     url = self.ajax_url + "&isGetLongText=true"
     headers = {"referer": WEIBO_HOST}
-    cookies = {
-        "SUB": "_2AkMR47Mlf8NxqwFRmfocxG_lbox2wg7EieKnv0L-JRMxHRl-yT9yqhFdtRB6OmOdyoia9pKPkqoHRRmSBA_WNPaHuybH",
-    }
+    cookies = {
+        "SUB": settings.WEIBO_SUB_COOKIE,  # Define in config.py
+    } if settings.WEIBO_SUB_COOKIE else {}

Add `WEIBO_SUB_COOKIE` to packages/shared/fastfetchbot_shared/services/scrapers/config.py.

As per coding guidelines: "Sensitive cookies and tokens must be stored in environment variables, never hardcoded in source code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/services/scrapers/weibo/scraper.py` around lines 121 - 136, The hardcoded SUB cookie in the local cookies dict should be moved to an environment-backed config constant: add WEIBO_SUB_COOKIE to config.py and load it from the environment, then import that constant into scraper.py and replace the hardcoded value in the cookies dict (the variable named cookies) with WEIBO_SUB_COOKIE; also add a defensive check in the scraper (raise a clear error or log and abort) if WEIBO_SUB_COOKIE is missing or empty to avoid silently using an invalid cookie.
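To make the defensive check concrete, here is a minimal sketch of reading the cookie from the environment; the helper name `build_weibo_cookies` and the error message are illustrative, not the project's actual API:

```python
import os


def build_weibo_cookies() -> dict:
    """Read the Weibo SUB cookie from the environment (hypothetical helper)."""
    sub = os.environ.get("WEIBO_SUB_COOKIE", "")
    if not sub:
        # Fail loudly instead of silently sending an invalid cookie.
        raise RuntimeError("WEIBO_SUB_COOKIE is not configured")
    return {"SUB": sub}
```

In the real codebase the value would more likely flow through the existing `settings` object, but the fail-fast behavior is the point.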
🧹 Nitpick comments (8)
packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py (1)
85-91: Consider explicit browser cleanup.

While the `async_playwright()` context manager handles cleanup, it's safer to explicitly ensure browser and context are closed, especially if exceptions occur during page operations.

♻️ Proposed explicit cleanup

 async with async_playwright() as pw:
     browser = await pw.chromium.launch()
-    context = await browser.new_context(viewport={"width": 1920, "height": 1080})
-    page = await context.new_page()
-    page.on("response", intercept_response)  # enable background request intercepting
-    await page.goto(url)  # go to url and wait for the page to load
-    await page.wait_for_selector("[data-pressable-container=true]")  # wait for page to finish loading
+    try:
+        context = await browser.new_context(viewport={"width": 1920, "height": 1080})
+        page = await context.new_page()
+        page.on("response", intercept_response)
+        await page.goto(url)
+        await page.wait_for_selector("[data-pressable-container=true]")
+    finally:
+        await browser.close()

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py` around lines 85 - 91, Wrap the Playwright usage so browser/context/page are explicitly closed on error: keep the async_playwright() as pw but assign browser = await pw.chromium.launch(), context = await browser.new_context(...), and page = await context.new_page() inside a try block, and in a finally block call await page.close() (if created), await context.close() and await browser.close(); ensure existing page.on("response", intercept_response) and await page.goto(url) remain inside the try so any exception triggers the cleanup.

packages/shared/fastfetchbot_shared/utils/network.py (1)
41-43: Catching bare `Exception` is acceptable here but consider narrowing scope.

The static analysis flags this as `BLE001`. For a JSON-fetching utility that returns `None` on failure, this pattern is reasonable since various exceptions can occur (network, parsing, etc.). The `logger.exception()` properly captures the full stack trace.

If you want to silence the linter while being explicit, you could catch `(httpx.HTTPError, json.JSONDecodeError)` as the primary cases, with a fallback `Exception` handler.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/utils/network.py` around lines 41 - 43, The except block that currently catches a bare Exception when fetching JSON (logging via logger.exception and setting json_result = None) should be narrowed: first catch the main expected errors (httpx.HTTPError and json.JSONDecodeError) and log them with logger.exception, then optionally add a final broad except Exception as fallback to preserve current behavior; ensure you still set json_result = None and include the URL in log messages (referencing url and logger.exception) so the function that performs the fetch (the JSON fetcher that assigns json_result) remains behaviorally unchanged but satisfies the linter.

packages/shared/fastfetchbot_shared/services/telegraph/__init__.py (1)
3-3: Remove unused `traceback` import.

The `traceback` module is imported but no longer used after the logging refactor.

🧹 Proposed fix

 # TODO: copy the html-to-telegraph package and modify it to fit the asynchronous model
 import random
-import traceback
 from typing import Any

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/services/telegraph/__init__.py` at line 3, Remove the unused "traceback" import from the top of the telegraph package __init__.py; open packages/shared/fastfetchbot_shared/services/telegraph/__init__.py, delete the "import traceback" line (it’s no longer referenced after the logging refactor) so the module has no unused imports.

packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py (1)
86-86: Use `isinstance()` for type checks (Ruff E721).

Static analysis flags this line. While not part of the direct changes, it's adjacent and easy to fix while you're here.

🧹 Proposed fix

-    elif type(ins_data) == str and "400" in ins_data:
+    elif isinstance(ins_data, str) and "400" in ins_data:

Note: Similar patterns exist on lines 75, 77, 79-80 with `type(ins_data) == dict` that could also be updated to `isinstance(ins_data, dict)`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py` at line 86, The condition using type(ins_data) == str should be changed to use isinstance(ins_data, str) to satisfy Ruff E721; update the specific branch containing "elif type(ins_data) == str and '400' in ins_data" to use isinstance(ins_data, str), and likewise replace other occurrences like type(ins_data) == dict with isinstance(ins_data, dict) in the same module (the branches handling ins_data) so all runtime type checks use isinstance.

apps/api/src/main.py (1)
43-48: Logging in middleware may not capture all exceptions handled by FastAPI exception handlers.

The `call_next()` function in `BaseHTTPMiddleware` may not always propagate exceptions as expected—FastAPI's exception handlers intercept exceptions before they reach the middleware's except block. This means `logger.exception()` here might not log all handled exceptions.

Consider whether this logging is necessary given that the exception handlers on lines 54-62 already log errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/api/src/main.py` around lines 43 - 48, The middleware's try/except around call_next(request) (in the BaseHTTPMiddleware) duplicates logging that FastAPI's exception handlers already perform and may miss exceptions the handlers intercept; remove the try/except/logging (the logger.exception call referencing request.method and request.url) and let the existing FastAPI exception handlers (the handlers defined later in this module) perform error logging, or alternatively move any needed contextual logging into those handler functions so you don't rely on call_next's except block to capture handled exceptions.

apps/telegram-bot/core/webhook/server.py (1)
76-86: Consider distinguishing client errors from server errors.

Currently, all exceptions (including `KeyError` for a missing "data" key in the request payload) return 500. A missing required field is a client error (400), not a server error.

♻️ Suggested improvement

 async def send_message_endpoint(request: Request):
     try:
         data = await request.json()
+    except Exception:
+        logger.exception("Failed to parse send_message request body")
+        return JSONResponse({"error": "invalid JSON"}, status_code=400)
+    try:
         metadata_item = data["data"]
         chat_id = data.get("chat_id")
+    except KeyError as e:
+        return JSONResponse({"error": f"missing required field: {e}"}, status_code=400)
+    try:
         if isinstance(chat_id, str) and chat_id.startswith("-"):
             chat_id = int(chat_id)
         await send_item_message(metadata_item, chat_id=chat_id)
         return JSONResponse({"status": "ok"})
     except Exception:
         logger.exception("Failed to handle send_message request")
         return JSONResponse({"error": "Internal server error"}, status_code=500)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/telegram-bot/core/webhook/server.py` around lines 76 - 86, The handler currently treats all exceptions as server errors; update the request parsing logic around data = await request.json() and metadata_item = data["data"] to validate input and distinguish client errors: explicitly check that "data" exists and that chat_id (from data.get("chat_id")) is of an acceptable type (handle the negative-string-to-int conversion for values starting with "-" but return a 400 JSONResponse when chat_id is malformed or required "data" is missing), catching KeyError/ValueError/TypeError to return JSONResponse({"error": "Bad request", "details": "<brief message>"} , status_code=400); keep the existing except Exception block around send_item_message(...) to logger.exception(...) and return 500 for true server-side failures.

apps/telegram-bot/core/services/message_sender.py (2)
306-312: Same pattern: Consider adding `exc_info=True` for debuggability.

These exception handlers follow the same pattern. Adding `exc_info=True` to the warning calls would help diagnose persistent download failures without changing the fault-isolation behavior.

Also applies to: 322-330
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/telegram-bot/core/services/message_sender.py` around lines 306 - 312, The logger.warning calls inside the exception handlers around the download_file_by_metadata_item calls should include exception info for debuggability; update the two except blocks that currently call logger.warning(f"Skipping document download: {image_url}") (and the similar one later) to pass exc_info=True (and optionally include the exception in the message) so the stack trace is captured while preserving the current continue behavior.
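For illustration, a self-contained sketch of the suggested pattern — the handler still swallows the error and continues, but the traceback lands in the logs (the `image_url` name is taken from the comment above; the captured buffer exists only to make the effect visible):

```python
import io
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("message_sender")

# Capture log output so the effect of exc_info=True is visible.
buffer = io.StringIO()
logger.addHandler(logging.StreamHandler(buffer))

image_url = "https://example.invalid/a.jpg"
try:
    raise ConnectionError("connection reset by peer")  # simulated download failure
except Exception:
    # exc_info=True appends the full traceback to the warning record.
    logger.warning(f"Skipping document download: {image_url}", exc_info=True)

assert "Traceback" in buffer.getvalue()
```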
236-242: Silent failure masks non-retriable errors.

The broad `except Exception` catch silently skips all download failures, including authentication errors (401), rate limiting (429), and server errors (5xx). Per the `download_file_by_metadata_item` implementation (which doesn't validate HTTP status codes), non-retriable errors like auth failures are treated the same as transient network issues.

Consider logging the exception for debugging purposes:

♻️ Suggested improvement

 try:
     io_object = await download_file_by_metadata_item(
         media_item["url"], data=data, file_format=file_format
     )
 except Exception:
-    logger.warning(f"Skipping media download: {media_item['url']}")
+    logger.warning(f"Skipping media download: {media_item['url']}", exc_info=True)
     continue

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/telegram-bot/core/services/message_sender.py` around lines 236 - 242, The broad except in the media download loop swallows all errors; change the handler to catch Exception as e, log the exception details (include media_item["url"] and e) via logger.warning/error, and then inspect the exception for an HTTP status (from the exception attributes or underlying response) to re-raise or propagate non-retriable errors (e.g., 401/403/429 and other client/server codes) while only skipping or retrying for transient network errors; update the block around download_file_by_metadata_item and the media_item loop to implement this selective re-raise and improved logging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/telegram-bot/core/services/outbox_consumer.py`:
- Around line 45-47: The debug messages sent with HTML parse mode are including
unescaped user-controlled text (the `error` string in outbox_consumer.py's
send_debug_channel call and the `traceback.format_exc()` output used in
message_sender.py), which can break Telegram HTML parsing; wrap those
interpolated values with html.escape() before passing to send_debug_channel (and
anywhere you build messages for ParseMode.HTML) so special characters like <, >,
& are escaped while preserving your surrounding HTML-safe formatting.
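A minimal illustration of the escaping step with the standard library — the surrounding tags stand in for whatever HTML formatting the debug message already uses:

```python
import html

error = 'Unexpected token < in <script> & "data" missing'
safe = html.escape(error)

# Special characters are now inert inside a ParseMode.HTML message.
message = f"<b>[Scrape Error]</b>\n<pre>{safe}</pre>"
print(message)
```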
In `@packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py`:
- Line 101: The direct nested access assigning threads =
data["data"]["data"]["containing_thread"]["thread_items"] is unsafe and can
raise KeyError/TypeError; update the code (the assignment to threads) to
validate the structure before indexing—either wrap it in a try/except catching
KeyError/TypeError and handle/massage/log a clear error or use defensive lookups
(chain dict.get checks or utility to safely traverse keys) to default to an
empty list or None when any level is missing and then proceed accordingly;
ensure the fix logs enough context about the missing field and preserves the
existing downstream behavior when threads is absent.
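The defensive-lookup option can be factored into a small helper; `safe_get` is a hypothetical utility, not an existing project function:

```python
from typing import Any


def safe_get(obj: Any, *keys: str, default: Any = None) -> Any:
    """Traverse nested dicts, returning `default` if any level is missing."""
    current = obj
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


data = {"data": {"data": {"containing_thread": {"thread_items": [{"post": {}}]}}}}
threads = safe_get(data, "data", "data", "containing_thread", "thread_items", default=[])
```

This replaces the chained `dict.get` calls with one readable expression and also covers the `TypeError` case where an intermediate value is not a dict.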
In `@packages/shared/fastfetchbot_shared/services/scrapers/zhihu/__init__.py`:
- Line 562: The ScraperError raises for Zhihu request failures are losing the
original exception context; update both instances where you re-raise
ScraperError("zhihu request failed") so they chain the original exception (use
"from e") — e.g., in the request handling block(s) where ScraperError is raised
after catching an exception variable (likely named e), change to raise
ScraperError("zhihu request failed") from e so the root cause traceback is
preserved for Sentry and consistency with the earlier pattern used around line
394.
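The effect of `raise ... from e` can be shown in isolation — `ScraperError` here is a stand-in class, and the simulated `TimeoutError` plays the role of the original request failure:

```python
class ScraperError(Exception):
    """Stand-in for the project's domain exception."""


def fetch_zhihu_page() -> None:
    try:
        raise TimeoutError("upstream timed out")  # simulated request failure
    except TimeoutError as e:
        # `from e` preserves the original exception as __cause__,
        # so Sentry and log tracebacks show the root cause.
        raise ScraperError("zhihu request failed") from e
```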
In `@tests/unit/file_export/test_transcribe_exceptions.py`:
- Line 19: Tests use hardcoded "/tmp" paths (Ruff S108); update the test to use
a secure temp path instead of "/tmp/nonexistent.mp3" by using pytest's tmp_path
fixture or Python's tempfile (e.g., create a Path via tmp_path /
"nonexistent.mp3" or use tempfile.NamedTemporaryFile/TemporaryDirectory to
produce a safe temp filename), then pass that generated path to get_audio_text
to replace the hardcoded strings referenced in test_transcribe_exceptions.py
(look for get_audio_text calls at the failing locations).
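With plain `tempfile`, the replacement for the hardcoded path looks like the sketch below (inside a test function, pytest's `tmp_path` fixture achieves the same; the commented call is the hypothetical use site):

```python
import tempfile
from pathlib import Path

# A per-run temporary directory avoids the predictable, shared /tmp path
# that Ruff S108 warns about.
with tempfile.TemporaryDirectory() as tmp_dir:
    missing_audio = Path(tmp_dir) / "nonexistent.mp3"
    assert not missing_audio.exists()
    # get_audio_text(str(missing_audio))  # would be exercised in the real test
```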
In `@tests/unit/scrapers/test_xiaohongshu.py`:
- Line 1296: The test's pytest.raises uses match="did not redirect to
xiaohongshu.com" where the dot is a regex metacharacter; change the pattern to
match the literal dot by escaping it (e.g., use "did not redirect to
xiaohongshu\\.com" or build the pattern with re.escape("did not redirect to
xiaohongshu.com")) so the assertion checks the exact string in the ScraperError.
In `@tests/unit/telegram_bot/test_webhook_server.py`:
- Around line 65-67: Replace the hardcoded secret by generating a runtime token
in the test and using that value for both the mocked setting and the request:
create a temporary token (e.g., via secrets.token_urlsafe() or uuid.uuid4().hex)
and assign it to mock_settings.TELEGRAM_BOT_SECRET_TOKEN, then ensure
mock_request carries the same token in the header/authorization used by
telegram_webhook; update the test assertion setup in
tests/unit/telegram_bot/test_webhook_server.py around the mock_settings and
telegram_webhook call to use this generated token instead of the literal
"correct".
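A sketch of the generated-token approach; the setting and header names mirror the test described above and are otherwise assumptions:

```python
import secrets


def make_webhook_secret() -> str:
    """Generate a per-run secret instead of a hardcoded literal like "correct"."""
    return secrets.token_urlsafe(32)


token = make_webhook_secret()
# The same value would be assigned to mock_settings.TELEGRAM_BOT_SECRET_TOKEN
# and carried in the mock request's secret-token header so the comparison
# in telegram_webhook passes without a committed literal.
```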
---
Outside diff comments:
In `@packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py`:
- Around line 74-104: Wrap the body of scrape_thread_data in a try/except that
catches Playwright/browser/navigation exceptions (around async_playwright(),
pw.chromium.launch(), new_context(), page.goto(), wait_for_selector and page
operations) and re-raises a ScraperError with a clear message; additionally
catch json.JSONDecodeError when calling json.loads(text) and KeyError/TypeError
when accessing data["data"]["data"]["containing_thread"]["thread_items"] and
raise a ScraperParseError with context; ensure you still close/cleanup the
browser/context/page in a finally block (or use context managers) and keep
existing parsing logic that calls parse_single_threads_data(thread["post"])
inside the guarded block so parse errors can be captured and converted to
ScraperParseError.
In `@packages/shared/fastfetchbot_shared/services/scrapers/weibo/scraper.py`:
- Around line 121-136: The hardcoded SUB cookie in the local cookies dict should
be moved to an environment-backed config constant: add WEIBO_SUB_COOKIE to
config.py and load it from the environment, then import that constant into
scraper.py and replace the hardcoded value in the cookies dict (the variable
named cookies) with WEIBO_SUB_COOKIE; also add a defensive check in the scraper
(raise a clear error or log and abort) if WEIBO_SUB_COOKIE is missing or empty
to avoid silently using an invalid cookie.
---
Nitpick comments:
In `@apps/api/src/main.py`:
- Around line 43-48: The middleware's try/except around call_next(request) (in
the BaseHTTPMiddleware) duplicates logging that FastAPI's exception handlers
already perform and may miss exceptions the handlers intercept; remove the
try/except/logging (the logger.exception call referencing request.method and
request.url) and let the existing FastAPI exception handlers (the handlers
defined later in this module) perform error logging, or alternatively move any
needed contextual logging into those handler functions so you don't rely on
call_next's except block to capture handled exceptions.
In `@apps/telegram-bot/core/services/message_sender.py`:
- Around line 306-312: The logger.warning calls inside the exception handlers
around the download_file_by_metadata_item calls should include exception info
for debuggability; update the two except blocks that currently call
logger.warning(f"Skipping document download: {image_url}") (and the similar one
later) to pass exc_info=True (and optionally include the exception in the
message) so the stack trace is captured while preserving the current continue
behavior.
- Around line 236-242: The broad except in the media download loop swallows all
errors; change the handler to catch Exception as e, log the exception details
(include media_item["url"] and e) via logger.warning/error, and then inspect the
exception for an HTTP status (from the exception attributes or underlying
response) to re-raise or propagate non-retriable errors (e.g., 401/403/429 and
other client/server codes) while only skipping or retrying for transient network
errors; update the block around download_file_by_metadata_item and the
media_item loop to implement this selective re-raise and improved logging.
In `@apps/telegram-bot/core/webhook/server.py`:
- Around line 76-86: The handler currently treats all exceptions as server
errors; update the request parsing logic around data = await request.json() and
metadata_item = data["data"] to validate input and distinguish client errors:
explicitly check that "data" exists and that chat_id (from data.get("chat_id"))
is of an acceptable type (handle the negative-string-to-int conversion for
values starting with "-" but return a 400 JSONResponse when chat_id is malformed
or required "data" is missing), catching KeyError/ValueError/TypeError to return
JSONResponse({"error": "Bad request", "details": "<brief message>"} ,
status_code=400); keep the existing except Exception block around
send_item_message(...) to logger.exception(...) and return 500 for true
server-side failures.
In `@packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py`:
- Line 86: The condition using type(ins_data) == str should be changed to use
isinstance(ins_data, str) to satisfy Ruff E721; update the specific branch
containing "elif type(ins_data) == str and '400' in ins_data" to use
isinstance(ins_data, str), and likewise replace other occurrences like
type(ins_data) == dict with isinstance(ins_data, dict) in the same module (the
branches handling ins_data) so all runtime type checks use isinstance.
In `@packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py`:
- Around line 85-91: Wrap the Playwright usage so browser/context/page are
explicitly closed on error: keep the async_playwright() as pw but assign browser
= await pw.chromium.launch(), context = await browser.new_context(...), and page
= await context.new_page() inside a try block, and in a finally block call await
page.close() (if created), await context.close() and await browser.close();
ensure existing page.on("response", intercept_response) and await page.goto(url)
remain inside the try so any exception triggers the cleanup.
In `@packages/shared/fastfetchbot_shared/services/telegraph/__init__.py`:
- Line 3: Remove the unused "traceback" import from the top of the telegraph
package __init__.py; open
packages/shared/fastfetchbot_shared/services/telegraph/__init__.py, delete the
"import traceback" line (it’s no longer referenced after the logging refactor)
so the module has no unused imports.
In `@packages/shared/fastfetchbot_shared/utils/network.py`:
- Around line 41-43: The except block that currently catches a bare Exception
when fetching JSON (logging via logger.exception and setting json_result = None)
should be narrowed: first catch the main expected errors (httpx.HTTPError and
json.JSONDecodeError) and log them with logger.exception, then optionally add a
final broad except Exception as fallback to preserve current behavior; ensure
you still set json_result = None and include the URL in log messages
(referencing url and logger.exception) so the function that performs the fetch
(the JSON fetcher that assigns json_result) remains behaviorally unchanged but
satisfies the linter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6e0e7050-49e1-4b25-978a-d5517e1f6080
📒 Files selected for processing (37)
apps/api/src/main.py
apps/telegram-bot/core/services/message_sender.py
apps/telegram-bot/core/services/outbox_consumer.py
apps/telegram-bot/core/webhook/server.py
packages/file-export/fastfetchbot_file_export/pdf_export.py
packages/file-export/fastfetchbot_file_export/transcribe.py
packages/file-export/fastfetchbot_file_export/video_download.py
packages/shared/fastfetchbot_shared/exceptions.py
packages/shared/fastfetchbot_shared/services/scrapers/bluesky/scraper.py
packages/shared/fastfetchbot_shared/services/scrapers/common.py
packages/shared/fastfetchbot_shared/services/scrapers/douban/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/general/firecrawl_client.py
packages/shared/fastfetchbot_shared/services/scrapers/general/zyte.py
packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/scraper_manager.py
packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/twitter/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/weibo/scraper.py
packages/shared/fastfetchbot_shared/services/scrapers/xiaohongshu/adaptar.py
packages/shared/fastfetchbot_shared/services/scrapers/zhihu/__init__.py
packages/shared/fastfetchbot_shared/services/telegraph/__init__.py
packages/shared/fastfetchbot_shared/utils/network.py
tests/unit/file_export/test_pdf_export_exceptions.py
tests/unit/file_export/test_transcribe_exceptions.py
tests/unit/file_export/test_video_download.py
tests/unit/scrapers/test_bluesky.py
tests/unit/scrapers/test_common.py
tests/unit/scrapers/test_general_firecrawl.py
tests/unit/scrapers/test_general_zyte.py
tests/unit/scrapers/test_scraper_manager.py
tests/unit/scrapers/test_twitter_exceptions.py
tests/unit/scrapers/test_weibo.py
tests/unit/scrapers/test_xiaohongshu.py
tests/unit/telegram_bot/test_outbox_consumer.py
tests/unit/telegram_bot/test_webhook_server.py
tests/unit/test_api_exception_handlers.py
tests/unit/test_exceptions.py
if error:
    logger.warning(f"[{job_id}] Scrape failed: {error}")
    await _send_error_to_chat(chat_id, error)
    await send_debug_channel(
        f"[Scrape Error] job_id={job_id}\nchat_id: {chat_id}\n\n{error}"
    )
else:
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify producer intent vs consumer behavior for scrape errors.
rg -n -C2 "Push error to outbox so the bot can notify the user|error=str\\(e\\)|chat_id=chat_id" apps/async-worker/async_worker/tasks/scrape.py
rg -n -C4 "if error:|send_debug_channel|send_item_message" apps/telegram-bot/core/services/outbox_consumer.py
Repository: aturret/FastFetchBot
Length of output: 1708
🏁 Script executed:
fd message_sender.py
Repository: aturret/FastFetchBot
Length of output: 113
🏁 Script executed:
cat -n apps/telegram-bot/core/services/message_sender.py
Repository: aturret/FastFetchBot
Length of output: 19294
Error payloads are no longer delivered to the originating user chat.
Line 43–48 now routes scrape failures only to the debug channel. The producer at apps/async-worker/async_worker/tasks/scrape.py:85 explicitly documents that errors are pushed to outbox "so the bot can notify the user," but the consumer never calls send_item_message() to deliver the error to the originating chat_id. This silently fails user flows where the requester never receives a failure response.
Additionally, the raw error string (from str(e)) is sent to send_debug_channel() which uses ParseMode.HTML without escaping, risking malformed messages if the exception text contains HTML-unsafe characters like <, >, or &.
  data = json.loads(text)
- print(json.dumps(data, indent=4, ensure_ascii=False))
+ logger.debug(json.dumps(data, indent=4, ensure_ascii=False))
  threads = data["data"]["data"]["containing_thread"]["thread_items"]
Add validation for nested data access.
The deeply nested dictionary access data["data"]["data"]["containing_thread"]["thread_items"] can raise KeyError if the GraphQL response structure is unexpected or incomplete. This should be validated or wrapped in exception handling.
🛡️ Proposed defensive approach
-    threads = data["data"]["data"]["containing_thread"]["thread_items"]
+    try:
+        threads = data["data"]["data"]["containing_thread"]["thread_items"]
+    except KeyError as e:
+        logger.error(f"Unexpected GraphQL response structure: {e}")
+        continue

Or use safe dictionary access:
-    threads = data["data"]["data"]["containing_thread"]["thread_items"]
+    threads = data.get("data", {}).get("data", {}).get("containing_thread", {}).get("thread_items", [])
+    if not threads:
+        logger.warning("No thread items found in GraphQL response")
+        continue

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py` at
line 101, The direct nested access assigning threads =
data["data"]["data"]["containing_thread"]["thread_items"] is unsafe and can
raise KeyError/TypeError; update the code (the assignment to threads) to
validate the structure before indexing—either wrap it in a try/except catching
KeyError/TypeError and handle/massage/log a clear error or use defensive lookups
(chain dict.get checks or utility to safely traverse keys) to default to an
empty list or None when any level is missing and then proceed accordingly;
ensure the fix logs enough context about the missing field and preserves the
existing downstream behavior when threads is absent.
packages/shared/fastfetchbot_shared/services/scrapers/zhihu/__init__.py — outdated comment (resolved)
```diff
     MockClient.return_value = mock_client

-    with pytest.raises(RuntimeError, match="did not redirect to xiaohongshu.com"):
+    with pytest.raises(ScraperError, match="did not redirect to xiaohongshu.com"):
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Verify regex-style `match=` strings containing unescaped dot in this test file.
rg -nP 'pytest\.raises\([^)]*match="[^"]*\.[^"]*"' tests/unit/scrapers/test_xiaohongshu.py
```

Repository: aturret/FastFetchBot

Length of output: 159
Escape regex metacharacters in match= parameter for exact pattern matching.
At line 1296, the . in match="did not redirect to xiaohongshu.com" is treated as a regex metacharacter matching any character, not a literal dot. This should be escaped for precision:
Proposed fix

```diff
-        with pytest.raises(ScraperError, match="did not redirect to xiaohongshu.com"):
+        with pytest.raises(ScraperError, match=r"did not redirect to xiaohongshu\.com"):
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-        with pytest.raises(ScraperError, match="did not redirect to xiaohongshu.com"):
+        with pytest.raises(ScraperError, match=r"did not redirect to xiaohongshu\.com"):
```
🧰 Tools
🪛 Ruff (0.15.7)
[warning] 1296-1296: Pattern passed to match= contains metacharacters but is neither escaped nor raw
(RUF043)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/scrapers/test_xiaohongshu.py` at line 1296, The test's
pytest.raises uses match="did not redirect to xiaohongshu.com" where the dot is
a regex metacharacter; change the pattern to match the literal dot by escaping
it (e.g., use "did not redirect to xiaohongshu\\.com" or build the pattern with
re.escape("did not redirect to xiaohongshu.com")) so the assertion checks the
exact string in the ScraperError.
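Because `pytest.raises(match=...)` hands the pattern to `re.search`, the unescaped dot silently widens what the test accepts. A standalone illustration of the point (no pytest needed; the sample strings are made up):

```python
import re

unescaped = "did not redirect to xiaohongshu.com"
escaped = re.escape("did not redirect to xiaohongshu.com")  # dot becomes \.

# The unescaped dot matches ANY character, so a wrong string still passes:
assert re.search(unescaped, "did not redirect to xiaohongshuXcom")

# The escaped pattern only matches the literal dot:
assert re.search(escaped, "did not redirect to xiaohongshu.com")
assert not re.search(escaped, "did not redirect to xiaohongshuXcom")
```

Using `re.escape(...)` instead of a hand-written raw string also keeps the test correct if the message text later gains other metacharacters.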
```python
        with patch("core.webhook.server.settings") as mock_settings:
            mock_settings.TELEGRAM_BOT_SECRET_TOKEN = "correct"
            response = await telegram_webhook(mock_request)
```
Avoid hardcoded secret-token literals, even in tests.
Line 66 assigns a literal to TELEGRAM_BOT_SECRET_TOKEN; this is likely why S105 triggers and it weakens secret-hygiene consistency. Prefer a generated runtime value for this test.
Proposed fix

```diff
+import secrets
@@
-            mock_settings.TELEGRAM_BOT_SECRET_TOKEN = "correct"
+            mock_settings.TELEGRAM_BOT_SECRET_TOKEN = secrets.token_urlsafe(16)
             response = await telegram_webhook(mock_request)
             assert response.status_code == 401
```
assert response.status_code == 401As per coding guidelines "Sensitive cookies and tokens must be stored in environment variables, never hardcoded in source code".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
         with patch("core.webhook.server.settings") as mock_settings:
-            mock_settings.TELEGRAM_BOT_SECRET_TOKEN = "correct"
+            mock_settings.TELEGRAM_BOT_SECRET_TOKEN = secrets.token_urlsafe(16)
             response = await telegram_webhook(mock_request)
```
🧰 Tools
🪛 Ruff (0.15.7)
[error] 66-66: Possible hardcoded password assigned to: "TELEGRAM_BOT_SECRET_TOKEN"
(S105)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/telegram_bot/test_webhook_server.py` around lines 65 - 67, Replace
the hardcoded secret by generating a runtime token in the test and using that
value for both the mocked setting and the request: create a temporary token
(e.g., via secrets.token_urlsafe() or uuid.uuid4().hex) and assign it to
mock_settings.TELEGRAM_BOT_SECRET_TOKEN, then ensure mock_request carries the
same token in the header/authorization used by telegram_webhook; update the test
assertion setup in tests/unit/telegram_bot/test_webhook_server.py around the
mock_settings and telegram_webhook call to use this generated token instead of
the literal "correct".
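The stdlib pattern the suggestion leans on is small enough to show in full: `secrets.token_urlsafe` produces a fresh, unpredictable value each run, so no committable literal ever needs to live in the test file.

```python
import secrets

# A fresh token per test run; nothing sensitive lands in source control.
token = secrets.token_urlsafe(16)

# 16 random bytes encode to roughly 22 URL-safe base64 characters.
assert len(token) >= 20
assert token != secrets.token_urlsafe(16)  # collisions are vanishingly unlikely
```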
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/unit/test_network.py (1)
75-87: Test name claims to verify logging but doesn't assert on it.

The test is named `test_exception_returns_none_and_logs` but only asserts on the return value. Consider either:

- Removing "and_logs" from the name, or
- Adding a mock/spy on the logger to verify `logger.exception` was called

Example assertion for logging verification

```diff
 @pytest.mark.asyncio
 async def test_exception_returns_none_and_logs(self):
     from fastfetchbot_shared.utils.network import get_response_json

-    with patch(
+    with patch("fastfetchbot_shared.utils.network.logger") as mock_logger, \
+        patch(
         "fastfetchbot_shared.utils.network.get_response",
         new_callable=AsyncMock,
         side_effect=ConnectionError("timeout"),
     ):
         result = await get_response_json("https://example.com/api")
         assert result is None
+    mock_logger.exception.assert_called_once()
```
+ `@pytest.mark.asyncio` + async def test_exception_returns_none_and_logs(self): + from fastfetchbot_shared.utils.network import get_response_json + - with patch( + with patch("fastfetchbot_shared.utils.network.logger") as mock_logger, \ + patch( "fastfetchbot_shared.utils.network.get_response", new_callable=AsyncMock, side_effect=ConnectionError("timeout"), ): result = await get_response_json("https://example.com/api") assert result is None + mock_logger.exception.assert_called_once()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_network.py` around lines 75 - 87, The test TestGetResponseJson.test_exception_returns_none_and_logs currently only asserts the return value from get_response_json but the name implies it should also verify logging; either rename the test to remove "and_logs" or add a mock/spy for the module logger (e.g., patch fastfetchbot_shared.utils.network.logger.exception) around the call to get_response_json and assert logger.exception was called once with the expected context after the ConnectionError side_effect.
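Beyond mocking the logger object, the emitted records themselves can be captured with a throwaway handler, which is roughly what pytest's `caplog` fixture does under the hood. A stdlib-only sketch; the logger name and message are assumptions about the module under test:

```python
import logging

def run_and_capture(func):
    """Run func() and return the log records emitted on the target logger."""
    records = []
    handler = logging.Handler()
    handler.emit = records.append  # collect raw records instead of formatting them
    logger = logging.getLogger("fastfetchbot_shared.utils.network")  # assumed name
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the traceback out of stderr during the demo
    try:
        func()
    finally:
        logger.removeHandler(handler)
    return records

def failing_fetch():
    # Stand-in for the code path under test: log the caught exception.
    logger = logging.getLogger("fastfetchbot_shared.utils.network")
    try:
        raise ConnectionError("timeout")
    except ConnectionError:
        logger.exception("request failed")

records = run_and_capture(failing_fetch)
assert len(records) == 1
assert records[0].levelno == logging.ERROR  # logger.exception logs at ERROR
assert "request failed" in records[0].getMessage()
```

Asserting on records rather than on a mocked logger also verifies the level and message text, not just that some call happened.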
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 121cf7c0-2072-42eb-a1c2-9e1e84453c05
📒 Files selected for processing (4)
- CLAUDE.md
- tests/unit/file_export/test_transcribe_exceptions.py
- tests/unit/telegram_bot/test_message_sender.py
- tests/unit/test_network.py
✅ Files skipped from review due to trivial changes (1)
- CLAUDE.md
```python
    patch("fastfetchbot_file_export.transcribe.os.path.splitext", return_value=("/tmp/audio", ".mp3")), \
    patch("builtins.open", mock_open(read_data=b"audio data")), \
    patch("fastfetchbot_file_export.transcribe.milliseconds_until_sound", return_value=0):

    MockAudioSegment.from_file.return_value = mock_audio

    result = get_audio_text("/tmp/audio.mp3", "test-api-key")
```
Replace remaining hardcoded /tmp test paths to fix Ruff S108.
Line 77, Line 83, and Line 101 still use fixed /tmp paths. Use tmp_path so tests are secure and lint-clean.
🔧 Suggested fix

```diff
 class TestGetAudioTextHappyPath:
-    def test_successful_transcription(self):
+    def test_successful_transcription(self, tmp_path):
         from fastfetchbot_file_export.transcribe import get_audio_text
+        audio_file = tmp_path / "audio.mp3"
+        audio_stem = tmp_path / "audio"

         # Mock OpenAI client
         mock_client = MagicMock()
@@
         with patch("fastfetchbot_file_export.transcribe.OpenAI", return_value=mock_client), \
              patch("fastfetchbot_file_export.transcribe.AudioSegment") as MockAudioSegment, \
              patch("fastfetchbot_file_export.transcribe.os.remove") as mock_remove, \
-             patch("fastfetchbot_file_export.transcribe.os.path.splitext", return_value=("/tmp/audio", ".mp3")), \
+             patch("fastfetchbot_file_export.transcribe.os.path.splitext", return_value=(str(audio_stem), ".mp3")), \
              patch("builtins.open", mock_open(read_data=b"audio data")), \
              patch("fastfetchbot_file_export.transcribe.milliseconds_until_sound", return_value=0):
@@
-            result = get_audio_text("/tmp/audio.mp3", "test-api-key")
+            result = get_audio_text(str(audio_file), "test-api-key")
@@
 class TestGetAudioTextHappyPath:
-    def test_audio_segment_failure_wraps_in_file_export_error(self):
+    def test_audio_segment_failure_wraps_in_file_export_error(self, tmp_path):
         from fastfetchbot_file_export.transcribe import get_audio_text
@@
         MockAudioSegment.from_file.side_effect = FileNotFoundError("no such file")
         with pytest.raises(FileExportError, match="Audio transcription failed"):
-            get_audio_text("/tmp/missing.mp3", "key")
+            get_audio_text(str(tmp_path / "missing.mp3"), "key")
```

Also applies to: 101-101
🧰 Tools
🪛 Ruff (0.15.7)
[error] 77-77: Probable insecure usage of temporary file or directory: "/tmp/audio"
(S108)
[error] 83-83: Probable insecure usage of temporary file or directory: "/tmp/audio.mp3"
(S108)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/file_export/test_transcribe_exceptions.py` around lines 77 - 83,
Replace hardcoded "/tmp" test paths with the pytest tmp_path fixture: update the
patched os.path.splitext return_value, the call to get_audio_text, and any other
hardcoded file references (e.g. where MockAudioSegment.from_file and
builtins.open are used) to use tmp_path / "audio.mp3" (or equivalent
Path.as_posix()) so the test uses a temporary, lint-clean path; ensure patches
that mock open and splitext return the updated tmp_path-based string and pass
tmp_path.joinpath("audio.mp3").as_posix() into get_audio_text("...") instead of
"/tmp/audio.mp3".
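For context, pytest's `tmp_path` fixture is a per-test unique `pathlib.Path` backed by the same stdlib machinery as `tempfile.TemporaryDirectory`, which is why it satisfies Ruff S108 where a fixed `/tmp` literal does not. A minimal stand-alone equivalent (file names are illustrative):

```python
import tempfile
from pathlib import Path

# pytest's tmp_path gives each test a unique directory like this one,
# removed automatically when the test finishes.
with tempfile.TemporaryDirectory() as d:
    tmp_path = Path(d)                  # stand-in for the fixture argument
    audio_file = tmp_path / "audio.mp3"
    audio_file.write_bytes(b"audio data")

    assert audio_file.read_bytes() == b"audio data"
    assert str(audio_file) != "/tmp/audio.mp3"  # unique path, no fixed /tmp name

# Directory and file are cleaned up once the context exits.
assert not audio_file.exists()
```

Because every run gets a fresh directory, parallel test workers cannot collide on a shared `/tmp` filename, which is the security concern behind S108.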
```python
    media_group, file_group = await media_files_packaging(media_files, data)

    # The image media_group should have the photo, but file_group should be empty
    # because the document download failed and was skipped
    assert file_group == []
```
Assert that media_group still contains the packaged image.
On Line 110, media_group is captured but never validated. This test can pass even if photo packaging regresses, despite the test description saying packaging should continue.
✅ Suggested test assertion update

```diff
     media_group, file_group = await media_files_packaging(media_files, data)

     # The image media_group should have the photo, but file_group should be empty
     # because the document download failed and was skipped
+    assert len(media_group) == 1
     assert file_group == []
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
     media_group, file_group = await media_files_packaging(media_files, data)

     # The image media_group should have the photo, but file_group should be empty
     # because the document download failed and was skipped
+    assert len(media_group) == 1
     assert file_group == []
```
🧰 Tools
🪛 Ruff (0.15.7)
[warning] 110-110: Unpacked variable media_group is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/telegram_bot/test_message_sender.py` around lines 110 - 115, The
test captures media_group from media_files_packaging but never asserts its
contents; update the test for media_files_packaging to assert that media_group
contains the packaged image (e.g., assert media_group is not empty and that its
first item represents a photo or has the expected photo attribute/type), while
keeping the existing assertion that file_group == [] to ensure document failures
are skipped. Use the media_group and file_group variables already in the test
and check the photo-specific property (type/name) on the media_group item to
validate the image was packaged.
```python
    @pytest.mark.asyncio
    async def test_redirect_follows_location(self):
        from fastfetchbot_shared.utils.network import download_file_by_metadata_item

        mock_response = MagicMock()
        mock_response.status_code = 302
        mock_response.headers = {"Location": "https://cdn.example.com/real.jpg"}
        mock_response.content = b"redirected content"

        mock_client = AsyncMock()
        mock_client.get = AsyncMock(return_value=mock_response)
        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
        mock_client.__aexit__ = AsyncMock(return_value=False)

        with patch("fastfetchbot_shared.utils.network.httpx.AsyncClient", return_value=mock_client), \
             patch("fastfetchbot_shared.utils.network.get_random_user_agent", return_value="TestAgent"):
            result = await download_file_by_metadata_item(
                url="https://example.com/image.jpg",
                data={"url": "https://example.com", "category": "twitter"},
            )

        assert isinstance(result, NamedBytesIO)
```
Test name is misleading - implementation doesn't follow redirects.
Looking at download_file_by_metadata_item (network.py lines 154-155), the function extracts the Location header into a local variable but then uses response.content from the original 302 response—no second request is made to the redirect URL.
This test only verifies that a 302 response doesn't crash the function. Consider either:

- Renaming it to `test_redirect_response_returns_named_bytes_io` to reflect actual behavior, or
- If redirect-following is intended, making the implementation issue a follow-up request
The current test also doesn't verify the Location header extraction—you could assert mock_client.get.call_count or check some observable effect of the URL update.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/test_network.py` around lines 51 - 73, The test name claims
redirect-following but the implementation of download_file_by_metadata_item only
reads the Location header and returns response.content from the 302; either
rename the test to reflect that behavior (e.g.,
test_redirect_response_returns_named_bytes_io) or update
download_file_by_metadata_item to follow redirects by detecting a 3xx response,
extracting response.headers["Location"], performing a second GET to that URL
with the same AsyncClient (or a new one) and returning the final
response.content; if you implement redirect-following also update the test to
assert mock_client.get.call_count == 2 and that the second call used the
Location URL so the Location header extraction and subsequent request are
validated.
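If redirect-following is the intended behavior, the lightest fix in production code is `httpx.AsyncClient(follow_redirects=True)`, which chases 3xx chains transparently. The manual loop the comment describes looks roughly like this; the `get` callable and URLs are illustrative stand-ins for the real transport:

```python
def follow_redirects(get, url, max_hops=5):
    """Resolve a redirect chain by re-requesting each Location target."""
    for _ in range(max_hops):
        status, headers, body = get(url)
        if status in (301, 302, 303, 307, 308) and "Location" in headers:
            url = headers["Location"]  # the part the current code reads but never uses
            continue
        return url, body
    raise RuntimeError("too many redirects")

# A fake transport standing in for the HTTP client: one hop, then content.
responses = {
    "https://example.com/image.jpg": (302, {"Location": "https://cdn.example.com/real.jpg"}, b""),
    "https://cdn.example.com/real.jpg": (200, {}, b"redirected content"),
}

final_url, content = follow_redirects(responses.__getitem__, "https://example.com/image.jpg")
assert final_url == "https://cdn.example.com/real.jpg"
assert content == b"redirected content"
```

Whichever route is taken, a test for it should observe two requests (or the final URL), not just the absence of a crash on the 302.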
Summary by CodeRabbit
Bug Fixes
Documentation
Tests