⚠️ Warning: Rate limit exceeded

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 17 minutes and 14 seconds.

⌛ How to resolve this issue? After the wait time has elapsed, a review can be triggered using the … We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (2)
📝 Walkthrough

Introduces a domain-specific exception hierarchy and widespread error-handling standardization: new FastFetchBotError subclasses, centralized FastAPI handlers (502 for domain errors, 500 for generic), structured logging via logger.exception/debug, per-item fault isolation in media processing, and updated tests reflecting new exception semantics.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 7
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py (1)
74-104: ⚠️ Potential issue | 🟠 Major

Add exception handling to the `scrape_thread_data` method to handle browser operations, page navigation, JSON parsing, and data extraction failures.

This function lacks try/except blocks for multiple failure points:

- Browser launch and navigation failures (`pw.chromium.launch()`, `page.goto()`)
- JSON parsing failures (`json.loads()` on line 100)
- Dictionary key access failures (lines 101-102)

Wrap the scraping logic in appropriate exception handling and consider raising custom exceptions (similar to `ScraperError` and `ScraperParseError` used in other scrapers) to align with the PR's exception handling refactoring objectives.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py` around lines 74 - 104, Wrap the body of scrape_thread_data in a try/except that catches Playwright/browser/navigation exceptions (around async_playwright(), pw.chromium.launch(), new_context(), page.goto(), wait_for_selector and page operations) and re-raises a ScraperError with a clear message; additionally catch json.JSONDecodeError when calling json.loads(text) and KeyError/TypeError when accessing data["data"]["data"]["containing_thread"]["thread_items"] and raise a ScraperParseError with context; ensure you still close/cleanup the browser/context/page in a finally block (or use context managers) and keep existing parsing logic that calls parse_single_threads_data(thread["post"]) inside the guarded block so parse errors can be captured and converted to ScraperParseError.

packages/shared/fastfetchbot_shared/services/scrapers/weibo/scraper.py (1)
121-136: ⚠️ Potential issue | 🟡 Minor

Hardcoded SUB cookie should be moved to environment variables.

Line 122 contains a hardcoded authentication cookie (`SUB`). Per coding guidelines, sensitive cookies and tokens must be stored in environment variables, never hardcoded in source code. This cookie appears to be used for Weibo API authentication.

🔧 Suggested fix

 async def _get_weibo_info_api(self) -> dict:
     """Fetch weibo info using the API with SUB cookie and isGetLongText=true."""
     url = self.ajax_url + "&isGetLongText=true"
     headers = {"referer": WEIBO_HOST}
-    cookies = {
-        "SUB": "_2AkMR47Mlf8NxqwFRmfocxG_lbox2wg7EieKnv0L-JRMxHRl-yT9yqhFdtRB6OmOdyoia9pKPkqoHRRmSBA_WNPaHuybH",
-    }
+    cookies = {
+        "SUB": settings.WEIBO_SUB_COOKIE,  # Define in config.py
+    } if settings.WEIBO_SUB_COOKIE else {}

Add `WEIBO_SUB_COOKIE` to packages/shared/fastfetchbot_shared/services/scrapers/config.py.

As per coding guidelines: "Sensitive cookies and tokens must be stored in environment variables, never hardcoded in source code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/services/scrapers/weibo/scraper.py` around lines 121 - 136, The hardcoded SUB cookie in the local cookies dict should be moved to an environment-backed config constant: add WEIBO_SUB_COOKIE to config.py and load it from the environment, then import that constant into scraper.py and replace the hardcoded value in the cookies dict (the variable named cookies) with WEIBO_SUB_COOKIE; also add a defensive check in the scraper (raise a clear error or log and abort) if WEIBO_SUB_COOKIE is missing or empty to avoid silently using an invalid cookie.
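To make the defensive check concrete, here is a minimal sketch of reading the cookie from the environment; the helper name `build_weibo_cookies` and the error message are illustrative, not the project's actual API:

```python
import os


def build_weibo_cookies() -> dict:
    """Read the Weibo SUB cookie from the environment (hypothetical helper)."""
    sub = os.environ.get("WEIBO_SUB_COOKIE", "")
    if not sub:
        # Fail loudly instead of silently sending an invalid cookie.
        raise RuntimeError("WEIBO_SUB_COOKIE is not configured")
    return {"SUB": sub}
```

In the real codebase the value would more likely flow through the existing `settings` object, but the fail-fast behavior is the point.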
🧹 Nitpick comments (8)
packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py (1)
85-91: Consider explicit browser cleanup.

While the `async_playwright()` context manager handles cleanup, it's safer to explicitly ensure browser and context are closed, especially if exceptions occur during page operations.

♻️ Proposed explicit cleanup

 async with async_playwright() as pw:
     browser = await pw.chromium.launch()
-    context = await browser.new_context(viewport={"width": 1920, "height": 1080})
-    page = await context.new_page()
-    page.on("response", intercept_response)  # enable background request intercepting
-    await page.goto(url)  # go to url and wait for the page to load
-    await page.wait_for_selector("[data-pressable-container=true]")  # wait for page to finish loading
+    try:
+        context = await browser.new_context(viewport={"width": 1920, "height": 1080})
+        page = await context.new_page()
+        page.on("response", intercept_response)
+        await page.goto(url)
+        await page.wait_for_selector("[data-pressable-container=true]")
+    finally:
+        await browser.close()

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py` around lines 85 - 91, Wrap the Playwright usage so browser/context/page are explicitly closed on error: keep the async_playwright() as pw but assign browser = await pw.chromium.launch(), context = await browser.new_context(...), and page = await context.new_page() inside a try block, and in a finally block call await page.close() (if created), await context.close() and await browser.close(); ensure existing page.on("response", intercept_response) and await page.goto(url) remain inside the try so any exception triggers the cleanup.

packages/shared/fastfetchbot_shared/utils/network.py (1)
41-43: Catching bare `Exception` is acceptable here but consider narrowing scope.

The static analysis flags this as `BLE001`. For a JSON-fetching utility that returns `None` on failure, this pattern is reasonable since various exceptions can occur (network, parsing, etc.). The `logger.exception()` properly captures the full stack trace.

If you want to silence the linter while being explicit, you could catch `(httpx.HTTPError, json.JSONDecodeError)` as the primary cases, with a fallback `Exception` handler.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/utils/network.py` around lines 41 - 43, The except block that currently catches a bare Exception when fetching JSON (logging via logger.exception and setting json_result = None) should be narrowed: first catch the main expected errors (httpx.HTTPError and json.JSONDecodeError) and log them with logger.exception, then optionally add a final broad except Exception as fallback to preserve current behavior; ensure you still set json_result = None and include the URL in log messages (referencing url and logger.exception) so the function that performs the fetch (the JSON fetcher that assigns json_result) remains behaviorally unchanged but satisfies the linter.

packages/shared/fastfetchbot_shared/services/telegraph/__init__.py (1)
3-3: Remove unused `traceback` import.

The `traceback` module is imported but no longer used after the logging refactor.

🧹 Proposed fix

 # TODO: copy the html-to-telegraph package and modify it to fit the asynchronous model
 import random
-import traceback
 from typing import Any

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/services/telegraph/__init__.py` at line 3, Remove the unused "traceback" import from the top of the telegraph package __init__.py; open packages/shared/fastfetchbot_shared/services/telegraph/__init__.py, delete the "import traceback" line (it’s no longer referenced after the logging refactor) so the module has no unused imports.

packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py (1)
86-86: Use `isinstance()` for type checks (Ruff E721).

Static analysis flags this line. While not part of the direct changes, it's adjacent and easy to fix while you're here.

🧹 Proposed fix

-    elif type(ins_data) == str and "400" in ins_data:
+    elif isinstance(ins_data, str) and "400" in ins_data:

Note: Similar patterns exist on lines 75, 77, 79-80 with `type(ins_data) == dict` that could also be updated to `isinstance(ins_data, dict)`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py` at line 86, The condition using type(ins_data) == str should be changed to use isinstance(ins_data, str) to satisfy Ruff E721; update the specific branch containing "elif type(ins_data) == str and '400' in ins_data" to use isinstance(ins_data, str), and likewise replace other occurrences like type(ins_data) == dict with isinstance(ins_data, dict) in the same module (the branches handling ins_data) so all runtime type checks use isinstance.

apps/api/src/main.py (1)
43-48: Logging in middleware may not capture all exceptions handled by FastAPI exception handlers.

The `call_next()` function in `BaseHTTPMiddleware` may not always propagate exceptions as expected—FastAPI's exception handlers intercept exceptions before they reach the middleware's except block. This means `logger.exception()` here might not log all handled exceptions.

Consider whether this logging is necessary given that the exception handlers on lines 54-62 already log errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/api/src/main.py` around lines 43 - 48, The middleware's try/except around call_next(request) (in the BaseHTTPMiddleware) duplicates logging that FastAPI's exception handlers already perform and may miss exceptions the handlers intercept; remove the try/except/logging (the logger.exception call referencing request.method and request.url) and let the existing FastAPI exception handlers (the handlers defined later in this module) perform error logging, or alternatively move any needed contextual logging into those handler functions so you don't rely on call_next's except block to capture handled exceptions.

apps/telegram-bot/core/webhook/server.py (1)
76-86: Consider distinguishing client errors from server errors.

Currently, all exceptions (including `KeyError` for a missing "data" key in the request payload) return 500. A missing required field is a client error (400), not a server error.

♻️ Suggested improvement

 async def send_message_endpoint(request: Request):
     try:
         data = await request.json()
+    except Exception:
+        logger.exception("Failed to parse send_message request body")
+        return JSONResponse({"error": "invalid JSON"}, status_code=400)
+    try:
         metadata_item = data["data"]
         chat_id = data.get("chat_id")
+    except KeyError as e:
+        return JSONResponse({"error": f"missing required field: {e}"}, status_code=400)
+    try:
         if isinstance(chat_id, str) and chat_id.startswith("-"):
             chat_id = int(chat_id)
         await send_item_message(metadata_item, chat_id=chat_id)
         return JSONResponse({"status": "ok"})
     except Exception:
         logger.exception("Failed to handle send_message request")
         return JSONResponse({"error": "Internal server error"}, status_code=500)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/telegram-bot/core/webhook/server.py` around lines 76 - 86, The handler currently treats all exceptions as server errors; update the request parsing logic around data = await request.json() and metadata_item = data["data"] to validate input and distinguish client errors: explicitly check that "data" exists and that chat_id (from data.get("chat_id")) is of an acceptable type (handle the negative-string-to-int conversion for values starting with "-" but return a 400 JSONResponse when chat_id is malformed or required "data" is missing), catching KeyError/ValueError/TypeError to return JSONResponse({"error": "Bad request", "details": "<brief message>"} , status_code=400); keep the existing except Exception block around send_item_message(...) to logger.exception(...) and return 500 for true server-side failures.

apps/telegram-bot/core/services/message_sender.py (2)
306-312: Same pattern: Consider adding `exc_info=True` for debuggability.

These exception handlers follow the same pattern. Adding `exc_info=True` to the warning calls would help diagnose persistent download failures without changing the fault-isolation behavior.

Also applies to: 322-330
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/telegram-bot/core/services/message_sender.py` around lines 306 - 312, The logger.warning calls inside the exception handlers around the download_file_by_metadata_item calls should include exception info for debuggability; update the two except blocks that currently call logger.warning(f"Skipping document download: {image_url}") (and the similar one later) to pass exc_info=True (and optionally include the exception in the message) so the stack trace is captured while preserving the current continue behavior.
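For illustration, a self-contained sketch of the suggested pattern — the handler still swallows the error and continues, but the traceback lands in the logs (the `image_url` name is taken from the comment above; the captured buffer exists only to make the effect visible):

```python
import io
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("message_sender")

# Capture log output so the effect of exc_info=True is visible.
buffer = io.StringIO()
logger.addHandler(logging.StreamHandler(buffer))

image_url = "https://example.invalid/a.jpg"
try:
    raise ConnectionError("connection reset by peer")  # simulated download failure
except Exception:
    # exc_info=True appends the full traceback to the warning record.
    logger.warning(f"Skipping document download: {image_url}", exc_info=True)

assert "Traceback" in buffer.getvalue()
```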
236-242: Silent failure masks non-retriable errors.

The broad `except Exception` catch silently skips all download failures, including authentication errors (401), rate limiting (429), and server errors (5xx). Per the `download_file_by_metadata_item` implementation (which doesn't validate HTTP status codes), non-retriable errors like auth failures are treated the same as transient network issues.

Consider logging the exception for debugging purposes:

♻️ Suggested improvement

 try:
     io_object = await download_file_by_metadata_item(
         media_item["url"], data=data, file_format=file_format
     )
 except Exception:
-    logger.warning(f"Skipping media download: {media_item['url']}")
+    logger.warning(f"Skipping media download: {media_item['url']}", exc_info=True)
     continue

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/telegram-bot/core/services/message_sender.py` around lines 236 - 242, The broad except in the media download loop swallows all errors; change the handler to catch Exception as e, log the exception details (include media_item["url"] and e) via logger.warning/error, and then inspect the exception for an HTTP status (from the exception attributes or underlying response) to re-raise or propagate non-retriable errors (e.g., 401/403/429 and other client/server codes) while only skipping or retrying for transient network errors; update the block around download_file_by_metadata_item and the media_item loop to implement this selective re-raise and improved logging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/telegram-bot/core/services/outbox_consumer.py`:
- Around line 45-47: The debug messages sent with HTML parse mode are including
unescaped user-controlled text (the `error` string in outbox_consumer.py's
send_debug_channel call and the `traceback.format_exc()` output used in
message_sender.py), which can break Telegram HTML parsing; wrap those
interpolated values with html.escape() before passing to send_debug_channel (and
anywhere you build messages for ParseMode.HTML) so special characters like <, >,
& are escaped while preserving your surrounding HTML-safe formatting.
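A minimal illustration of the escaping step with the standard library — the surrounding tags stand in for whatever HTML formatting the debug message already uses:

```python
import html

error = 'Unexpected token < in <script> & "data" missing'
safe = html.escape(error)

# Special characters are now inert inside a ParseMode.HTML message.
message = f"<b>[Scrape Error]</b>\n<pre>{safe}</pre>"
print(message)
```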
In `@packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py`:
- Line 101: The direct nested access assigning threads =
data["data"]["data"]["containing_thread"]["thread_items"] is unsafe and can
raise KeyError/TypeError; update the code (the assignment to threads) to
validate the structure before indexing—either wrap it in a try/except catching
KeyError/TypeError and handle/massage/log a clear error or use defensive lookups
(chain dict.get checks or utility to safely traverse keys) to default to an
empty list or None when any level is missing and then proceed accordingly;
ensure the fix logs enough context about the missing field and preserves the
existing downstream behavior when threads is absent.
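The defensive-lookup option can be factored into a small helper; `safe_get` is a hypothetical utility, not an existing project function:

```python
from typing import Any


def safe_get(obj: Any, *keys: str, default: Any = None) -> Any:
    """Traverse nested dicts, returning `default` if any level is missing."""
    current = obj
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


data = {"data": {"data": {"containing_thread": {"thread_items": [{"post": {}}]}}}}
threads = safe_get(data, "data", "data", "containing_thread", "thread_items", default=[])
```

This replaces the chained `dict.get` calls with one readable expression and also covers the `TypeError` case where an intermediate value is not a dict.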
In `@packages/shared/fastfetchbot_shared/services/scrapers/zhihu/__init__.py`:
- Line 562: The ScraperError raises for Zhihu request failures are losing the
original exception context; update both instances where you re-raise
ScraperError("zhihu request failed") so they chain the original exception (use
"from e") — e.g., in the request handling block(s) where ScraperError is raised
after catching an exception variable (likely named e), change to raise
ScraperError("zhihu request failed") from e so the root cause traceback is
preserved for Sentry and consistency with the earlier pattern used around line
394.
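The effect of `raise ... from e` can be shown in isolation — `ScraperError` here is a stand-in class, and the simulated `TimeoutError` plays the role of the original request failure:

```python
class ScraperError(Exception):
    """Stand-in for the project's domain exception."""


def fetch_zhihu_page() -> None:
    try:
        raise TimeoutError("upstream timed out")  # simulated request failure
    except TimeoutError as e:
        # `from e` preserves the original exception as __cause__,
        # so Sentry and log tracebacks show the root cause.
        raise ScraperError("zhihu request failed") from e
```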
In `@tests/unit/file_export/test_transcribe_exceptions.py`:
- Line 19: Tests use hardcoded "/tmp" paths (Ruff S108); update the test to use
a secure temp path instead of "/tmp/nonexistent.mp3" by using pytest's tmp_path
fixture or Python's tempfile (e.g., create a Path via tmp_path /
"nonexistent.mp3" or use tempfile.NamedTemporaryFile/TemporaryDirectory to
produce a safe temp filename), then pass that generated path to get_audio_text
to replace the hardcoded strings referenced in test_transcribe_exceptions.py
(look for get_audio_text calls at the failing locations).
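With plain `tempfile`, the replacement for the hardcoded path looks like the sketch below (inside a test function, pytest's `tmp_path` fixture achieves the same; the commented call is the hypothetical use site):

```python
import tempfile
from pathlib import Path

# A per-run temporary directory avoids the predictable, shared /tmp path
# that Ruff S108 warns about.
with tempfile.TemporaryDirectory() as tmp_dir:
    missing_audio = Path(tmp_dir) / "nonexistent.mp3"
    assert not missing_audio.exists()
    # get_audio_text(str(missing_audio))  # would be exercised in the real test
```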
In `@tests/unit/scrapers/test_xiaohongshu.py`:
- Line 1296: The test's pytest.raises uses match="did not redirect to
xiaohongshu.com" where the dot is a regex metacharacter; change the pattern to
match the literal dot by escaping it (e.g., use "did not redirect to
xiaohongshu\\.com" or build the pattern with re.escape("did not redirect to
xiaohongshu.com")) so the assertion checks the exact string in the ScraperError.
In `@tests/unit/telegram_bot/test_webhook_server.py`:
- Around line 65-67: Replace the hardcoded secret by generating a runtime token
in the test and using that value for both the mocked setting and the request:
create a temporary token (e.g., via secrets.token_urlsafe() or uuid.uuid4().hex)
and assign it to mock_settings.TELEGRAM_BOT_SECRET_TOKEN, then ensure
mock_request carries the same token in the header/authorization used by
telegram_webhook; update the test assertion setup in
tests/unit/telegram_bot/test_webhook_server.py around the mock_settings and
telegram_webhook call to use this generated token instead of the literal
"correct".
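A sketch of the generated-token approach; the setting and header names mirror the test described above and are otherwise assumptions:

```python
import secrets


def make_webhook_secret() -> str:
    """Generate a per-run secret instead of a hardcoded literal like "correct"."""
    return secrets.token_urlsafe(32)


token = make_webhook_secret()
# The same value would be assigned to mock_settings.TELEGRAM_BOT_SECRET_TOKEN
# and carried in the mock request's secret-token header so the comparison
# in telegram_webhook passes without a committed literal.
```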
---
Outside diff comments:
In `@packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py`:
- Around line 74-104: Wrap the body of scrape_thread_data in a try/except that
catches Playwright/browser/navigation exceptions (around async_playwright(),
pw.chromium.launch(), new_context(), page.goto(), wait_for_selector and page
operations) and re-raises a ScraperError with a clear message; additionally
catch json.JSONDecodeError when calling json.loads(text) and KeyError/TypeError
when accessing data["data"]["data"]["containing_thread"]["thread_items"] and
raise a ScraperParseError with context; ensure you still close/cleanup the
browser/context/page in a finally block (or use context managers) and keep
existing parsing logic that calls parse_single_threads_data(thread["post"])
inside the guarded block so parse errors can be captured and converted to
ScraperParseError.
In `@packages/shared/fastfetchbot_shared/services/scrapers/weibo/scraper.py`:
- Around line 121-136: The hardcoded SUB cookie in the local cookies dict should
be moved to an environment-backed config constant: add WEIBO_SUB_COOKIE to
config.py and load it from the environment, then import that constant into
scraper.py and replace the hardcoded value in the cookies dict (the variable
named cookies) with WEIBO_SUB_COOKIE; also add a defensive check in the scraper
(raise a clear error or log and abort) if WEIBO_SUB_COOKIE is missing or empty
to avoid silently using an invalid cookie.
---
Nitpick comments:
In `@apps/api/src/main.py`:
- Around line 43-48: The middleware's try/except around call_next(request) (in
the BaseHTTPMiddleware) duplicates logging that FastAPI's exception handlers
already perform and may miss exceptions the handlers intercept; remove the
try/except/logging (the logger.exception call referencing request.method and
request.url) and let the existing FastAPI exception handlers (the handlers
defined later in this module) perform error logging, or alternatively move any
needed contextual logging into those handler functions so you don't rely on
call_next's except block to capture handled exceptions.
In `@apps/telegram-bot/core/services/message_sender.py`:
- Around line 306-312: The logger.warning calls inside the exception handlers
around the download_file_by_metadata_item calls should include exception info
for debuggability; update the two except blocks that currently call
logger.warning(f"Skipping document download: {image_url}") (and the similar one
later) to pass exc_info=True (and optionally include the exception in the
message) so the stack trace is captured while preserving the current continue
behavior.
- Around line 236-242: The broad except in the media download loop swallows all
errors; change the handler to catch Exception as e, log the exception details
(include media_item["url"] and e) via logger.warning/error, and then inspect the
exception for an HTTP status (from the exception attributes or underlying
response) to re-raise or propagate non-retriable errors (e.g., 401/403/429 and
other client/server codes) while only skipping or retrying for transient network
errors; update the block around download_file_by_metadata_item and the
media_item loop to implement this selective re-raise and improved logging.
In `@apps/telegram-bot/core/webhook/server.py`:
- Around line 76-86: The handler currently treats all exceptions as server
errors; update the request parsing logic around data = await request.json() and
metadata_item = data["data"] to validate input and distinguish client errors:
explicitly check that "data" exists and that chat_id (from data.get("chat_id"))
is of an acceptable type (handle the negative-string-to-int conversion for
values starting with "-" but return a 400 JSONResponse when chat_id is malformed
or required "data" is missing), catching KeyError/ValueError/TypeError to return
JSONResponse({"error": "Bad request", "details": "<brief message>"} ,
status_code=400); keep the existing except Exception block around
send_item_message(...) to logger.exception(...) and return 500 for true
server-side failures.
In `@packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py`:
- Line 86: The condition using type(ins_data) == str should be changed to use
isinstance(ins_data, str) to satisfy Ruff E721; update the specific branch
containing "elif type(ins_data) == str and '400' in ins_data" to use
isinstance(ins_data, str), and likewise replace other occurrences like
type(ins_data) == dict with isinstance(ins_data, dict) in the same module (the
branches handling ins_data) so all runtime type checks use isinstance.
In `@packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py`:
- Around line 85-91: Wrap the Playwright usage so browser/context/page are
explicitly closed on error: keep the async_playwright() as pw but assign browser
= await pw.chromium.launch(), context = await browser.new_context(...), and page
= await context.new_page() inside a try block, and in a finally block call await
page.close() (if created), await context.close() and await browser.close();
ensure existing page.on("response", intercept_response) and await page.goto(url)
remain inside the try so any exception triggers the cleanup.
In `@packages/shared/fastfetchbot_shared/services/telegraph/__init__.py`:
- Line 3: Remove the unused "traceback" import from the top of the telegraph
package __init__.py; open
packages/shared/fastfetchbot_shared/services/telegraph/__init__.py, delete the
"import traceback" line (it’s no longer referenced after the logging refactor)
so the module has no unused imports.
In `@packages/shared/fastfetchbot_shared/utils/network.py`:
- Around line 41-43: The except block that currently catches a bare Exception
when fetching JSON (logging via logger.exception and setting json_result = None)
should be narrowed: first catch the main expected errors (httpx.HTTPError and
json.JSONDecodeError) and log them with logger.exception, then optionally add a
final broad except Exception as fallback to preserve current behavior; ensure
you still set json_result = None and include the URL in log messages
(referencing url and logger.exception) so the function that performs the fetch
(the JSON fetcher that assigns json_result) remains behaviorally unchanged but
satisfies the linter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6e0e7050-49e1-4b25-978a-d5517e1f6080
📒 Files selected for processing (37)
apps/api/src/main.py
apps/telegram-bot/core/services/message_sender.py
apps/telegram-bot/core/services/outbox_consumer.py
apps/telegram-bot/core/webhook/server.py
packages/file-export/fastfetchbot_file_export/pdf_export.py
packages/file-export/fastfetchbot_file_export/transcribe.py
packages/file-export/fastfetchbot_file_export/video_download.py
packages/shared/fastfetchbot_shared/exceptions.py
packages/shared/fastfetchbot_shared/services/scrapers/bluesky/scraper.py
packages/shared/fastfetchbot_shared/services/scrapers/common.py
packages/shared/fastfetchbot_shared/services/scrapers/douban/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/general/firecrawl_client.py
packages/shared/fastfetchbot_shared/services/scrapers/general/zyte.py
packages/shared/fastfetchbot_shared/services/scrapers/instagram/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/scraper_manager.py
packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/twitter/__init__.py
packages/shared/fastfetchbot_shared/services/scrapers/weibo/scraper.py
packages/shared/fastfetchbot_shared/services/scrapers/xiaohongshu/adaptar.py
packages/shared/fastfetchbot_shared/services/scrapers/zhihu/__init__.py
packages/shared/fastfetchbot_shared/services/telegraph/__init__.py
packages/shared/fastfetchbot_shared/utils/network.py
tests/unit/file_export/test_pdf_export_exceptions.py
tests/unit/file_export/test_transcribe_exceptions.py
tests/unit/file_export/test_video_download.py
tests/unit/scrapers/test_bluesky.py
tests/unit/scrapers/test_common.py
tests/unit/scrapers/test_general_firecrawl.py
tests/unit/scrapers/test_general_zyte.py
tests/unit/scrapers/test_scraper_manager.py
tests/unit/scrapers/test_twitter_exceptions.py
tests/unit/scrapers/test_weibo.py
tests/unit/scrapers/test_xiaohongshu.py
tests/unit/telegram_bot/test_outbox_consumer.py
tests/unit/telegram_bot/test_webhook_server.py
tests/unit/test_api_exception_handlers.py
tests/unit/test_exceptions.py
if error:
    logger.warning(f"[{job_id}] Scrape failed: {error}")
    await _send_error_to_chat(chat_id, error)
    await send_debug_channel(
        f"[Scrape Error] job_id={job_id}\nchat_id: {chat_id}\n\n{error}"
    )
else:
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify producer intent vs consumer behavior for scrape errors.
rg -n -C2 "Push error to outbox so the bot can notify the user|error=str\\(e\\)|chat_id=chat_id" apps/async-worker/async_worker/tasks/scrape.py
rg -n -C4 "if error:|send_debug_channel|send_item_message" apps/telegram-bot/core/services/outbox_consumer.py
Repository: aturret/FastFetchBot
Length of output: 1708
🏁 Script executed:
fd message_sender.py
Repository: aturret/FastFetchBot
Length of output: 113
🏁 Script executed:
cat -n apps/telegram-bot/core/services/message_sender.py
Repository: aturret/FastFetchBot
Length of output: 19294
Error payloads are no longer delivered to the originating user chat.
Line 43–48 now routes scrape failures only to the debug channel. The producer at apps/async-worker/async_worker/tasks/scrape.py:85 explicitly documents that errors are pushed to outbox "so the bot can notify the user," but the consumer never calls send_item_message() to deliver the error to the originating chat_id. This silently fails user flows where the requester never receives a failure response.
Additionally, the raw error string (from str(e)) is sent to send_debug_channel() which uses ParseMode.HTML without escaping, risking malformed messages if the exception text contains HTML-unsafe characters like <, >, or &.
  data = json.loads(text)
- print(json.dumps(data, indent=4, ensure_ascii=False))
+ logger.debug(json.dumps(data, indent=4, ensure_ascii=False))
  threads = data["data"]["data"]["containing_thread"]["thread_items"]
Add validation for nested data access.
The deeply nested dictionary access data["data"]["data"]["containing_thread"]["thread_items"] can raise KeyError if the GraphQL response structure is unexpected or incomplete. This should be validated or wrapped in exception handling.
🛡️ Proposed defensive approach
-    threads = data["data"]["data"]["containing_thread"]["thread_items"]
+    try:
+        threads = data["data"]["data"]["containing_thread"]["thread_items"]
+    except KeyError as e:
+        logger.error(f"Unexpected GraphQL response structure: {e}")
+        continue

Or use safe dictionary access:
-    threads = data["data"]["data"]["containing_thread"]["thread_items"]
+    threads = data.get("data", {}).get("data", {}).get("containing_thread", {}).get("thread_items", [])
+    if not threads:
+        logger.warning("No thread items found in GraphQL response")
+        continue

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/shared/fastfetchbot_shared/services/scrapers/threads/__init__.py` at
line 101, The direct nested access assigning threads =
data["data"]["data"]["containing_thread"]["thread_items"] is unsafe and can
raise KeyError/TypeError; update the code (the assignment to threads) to
validate the structure before indexing—either wrap it in a try/except catching
KeyError/TypeError and handle/massage/log a clear error or use defensive lookups
(chain dict.get checks or utility to safely traverse keys) to default to an
empty list or None when any level is missing and then proceed accordingly;
ensure the fix logs enough context about the missing field and preserves the
existing downstream behavior when threads is absent.
packages/shared/fastfetchbot_shared/services/scrapers/zhihu/__init__.py — outdated comment (resolved)
```diff
     MockClient.return_value = mock_client

-    with pytest.raises(RuntimeError, match="did not redirect to xiaohongshu.com"):
+    with pytest.raises(ScraperError, match="did not redirect to xiaohongshu.com"):
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Verify regex-style `match=` strings containing unescaped dot in this test file.
rg -nP 'pytest\.raises\([^)]*match="[^"]*\.[^"]*"' tests/unit/scrapers/test_xiaohongshu.py
```

Repository: aturret/FastFetchBot

Length of output: 159
Escape regex metacharacters in match= parameter for exact pattern matching.
At line 1296, the . in match="did not redirect to xiaohongshu.com" is treated as a regex metacharacter matching any character, not a literal dot. This should be escaped for precision:
Proposed fix

```diff
-        with pytest.raises(ScraperError, match="did not redirect to xiaohongshu.com"):
+        with pytest.raises(ScraperError, match=r"did not redirect to xiaohongshu\.com"):
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-        with pytest.raises(ScraperError, match="did not redirect to xiaohongshu.com"):
+        with pytest.raises(ScraperError, match=r"did not redirect to xiaohongshu\.com"):
```
🧰 Tools
🪛 Ruff (0.15.7)
[warning] 1296-1296: Pattern passed to match= contains metacharacters but is neither escaped nor raw
(RUF043)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/scrapers/test_xiaohongshu.py` at line 1296, The test's
pytest.raises uses match="did not redirect to xiaohongshu.com" where the dot is
a regex metacharacter; change the pattern to match the literal dot by escaping
it (e.g., use "did not redirect to xiaohongshu\\.com" or build the pattern with
re.escape("did not redirect to xiaohongshu.com")) so the assertion checks the
exact string in the ScraperError.
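Because `pytest.raises(match=...)` hands the pattern to `re.search`, the unescaped dot silently widens what the test accepts. A standalone illustration of the point (no pytest needed; the sample strings are made up):

```python
import re

unescaped = "did not redirect to xiaohongshu.com"
escaped = re.escape("did not redirect to xiaohongshu.com")  # dot becomes \.

# The unescaped dot matches ANY character, so a wrong string still passes:
assert re.search(unescaped, "did not redirect to xiaohongshuXcom")

# The escaped pattern only matches the literal dot:
assert re.search(escaped, "did not redirect to xiaohongshu.com")
assert not re.search(escaped, "did not redirect to xiaohongshuXcom")
```

Using `re.escape(...)` instead of a hand-written raw string also keeps the test correct if the message text later gains other metacharacters.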
```python
        with patch("core.webhook.server.settings") as mock_settings:
            mock_settings.TELEGRAM_BOT_SECRET_TOKEN = "correct"
            response = await telegram_webhook(mock_request)
```
Avoid hardcoded secret-token literals, even in tests.
Line 66 assigns a literal to TELEGRAM_BOT_SECRET_TOKEN; this is likely why S105 triggers and it weakens secret-hygiene consistency. Prefer a generated runtime value for this test.
Proposed fix

```diff
+import secrets
@@
-            mock_settings.TELEGRAM_BOT_SECRET_TOKEN = "correct"
+            mock_settings.TELEGRAM_BOT_SECRET_TOKEN = secrets.token_urlsafe(16)
             response = await telegram_webhook(mock_request)
             assert response.status_code == 401
```
assert response.status_code == 401As per coding guidelines "Sensitive cookies and tokens must be stored in environment variables, never hardcoded in source code".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
         with patch("core.webhook.server.settings") as mock_settings:
-            mock_settings.TELEGRAM_BOT_SECRET_TOKEN = "correct"
+            mock_settings.TELEGRAM_BOT_SECRET_TOKEN = secrets.token_urlsafe(16)
             response = await telegram_webhook(mock_request)
```
🧰 Tools
🪛 Ruff (0.15.7)
[error] 66-66: Possible hardcoded password assigned to: "TELEGRAM_BOT_SECRET_TOKEN"
(S105)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/telegram_bot/test_webhook_server.py` around lines 65 - 67, Replace
the hardcoded secret by generating a runtime token in the test and using that
value for both the mocked setting and the request: create a temporary token
(e.g., via secrets.token_urlsafe() or uuid.uuid4().hex) and assign it to
mock_settings.TELEGRAM_BOT_SECRET_TOKEN, then ensure mock_request carries the
same token in the header/authorization used by telegram_webhook; update the test
assertion setup in tests/unit/telegram_bot/test_webhook_server.py around the
mock_settings and telegram_webhook call to use this generated token instead of
the literal "correct".
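The stdlib pattern the suggestion leans on is small enough to show in full: `secrets.token_urlsafe` produces a fresh, unpredictable value each run, so no committable literal ever needs to live in the test file.

```python
import secrets

# A fresh token per test run; nothing sensitive lands in source control.
token = secrets.token_urlsafe(16)

# 16 random bytes encode to roughly 22 URL-safe base64 characters.
assert len(token) >= 20
assert token != secrets.token_urlsafe(16)  # collisions are vanishingly unlikely
```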
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/unit/test_network.py (1)
75-87: Test name claims to verify logging but doesn't assert on it.

The test is named `test_exception_returns_none_and_logs` but only asserts on the return value. Consider either:

- Removing "and_logs" from the name, or
- Adding a mock/spy on the logger to verify `logger.exception` was called

Example assertion for logging verification

```diff
 @pytest.mark.asyncio
 async def test_exception_returns_none_and_logs(self):
     from fastfetchbot_shared.utils.network import get_response_json

-    with patch(
+    with patch("fastfetchbot_shared.utils.network.logger") as mock_logger, \
+        patch(
         "fastfetchbot_shared.utils.network.get_response",
         new_callable=AsyncMock,
         side_effect=ConnectionError("timeout"),
     ):
         result = await get_response_json("https://example.com/api")
         assert result is None
+    mock_logger.exception.assert_called_once()
```
+ `@pytest.mark.asyncio` + async def test_exception_returns_none_and_logs(self): + from fastfetchbot_shared.utils.network import get_response_json + - with patch( + with patch("fastfetchbot_shared.utils.network.logger") as mock_logger, \ + patch( "fastfetchbot_shared.utils.network.get_response", new_callable=AsyncMock, side_effect=ConnectionError("timeout"), ): result = await get_response_json("https://example.com/api") assert result is None + mock_logger.exception.assert_called_once()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_network.py` around lines 75 - 87, The test TestGetResponseJson.test_exception_returns_none_and_logs currently only asserts the return value from get_response_json but the name implies it should also verify logging; either rename the test to remove "and_logs" or add a mock/spy for the module logger (e.g., patch fastfetchbot_shared.utils.network.logger.exception) around the call to get_response_json and assert logger.exception was called once with the expected context after the ConnectionError side_effect.
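Beyond mocking the logger object, the emitted records themselves can be captured with a throwaway handler, which is roughly what pytest's `caplog` fixture does under the hood. A stdlib-only sketch; the logger name and message are assumptions about the module under test:

```python
import logging

def run_and_capture(func):
    """Run func() and return the log records emitted on the target logger."""
    records = []
    handler = logging.Handler()
    handler.emit = records.append  # collect raw records instead of formatting them
    logger = logging.getLogger("fastfetchbot_shared.utils.network")  # assumed name
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the traceback out of stderr during the demo
    try:
        func()
    finally:
        logger.removeHandler(handler)
    return records

def failing_fetch():
    # Stand-in for the code path under test: log the caught exception.
    logger = logging.getLogger("fastfetchbot_shared.utils.network")
    try:
        raise ConnectionError("timeout")
    except ConnectionError:
        logger.exception("request failed")

records = run_and_capture(failing_fetch)
assert len(records) == 1
assert records[0].levelno == logging.ERROR  # logger.exception logs at ERROR
assert "request failed" in records[0].getMessage()
```

Asserting on records rather than on a mocked logger also verifies the level and message text, not just that some call happened.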
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 121cf7c0-2072-42eb-a1c2-9e1e84453c05
📒 Files selected for processing (4)
- CLAUDE.md
- tests/unit/file_export/test_transcribe_exceptions.py
- tests/unit/telegram_bot/test_message_sender.py
- tests/unit/test_network.py
✅ Files skipped from review due to trivial changes (1)
- CLAUDE.md
```python
    patch("fastfetchbot_file_export.transcribe.os.path.splitext", return_value=("/tmp/audio", ".mp3")), \
    patch("builtins.open", mock_open(read_data=b"audio data")), \
    patch("fastfetchbot_file_export.transcribe.milliseconds_until_sound", return_value=0):

    MockAudioSegment.from_file.return_value = mock_audio

    result = get_audio_text("/tmp/audio.mp3", "test-api-key")
```
Replace remaining hardcoded /tmp test paths to fix Ruff S108.
Line 77, Line 83, and Line 101 still use fixed /tmp paths. Use tmp_path so tests are secure and lint-clean.
🔧 Suggested fix

```diff
 class TestGetAudioTextHappyPath:
-    def test_successful_transcription(self):
+    def test_successful_transcription(self, tmp_path):
         from fastfetchbot_file_export.transcribe import get_audio_text
+        audio_file = tmp_path / "audio.mp3"
+        audio_stem = tmp_path / "audio"

         # Mock OpenAI client
         mock_client = MagicMock()
@@
         with patch("fastfetchbot_file_export.transcribe.OpenAI", return_value=mock_client), \
              patch("fastfetchbot_file_export.transcribe.AudioSegment") as MockAudioSegment, \
              patch("fastfetchbot_file_export.transcribe.os.remove") as mock_remove, \
-             patch("fastfetchbot_file_export.transcribe.os.path.splitext", return_value=("/tmp/audio", ".mp3")), \
+             patch("fastfetchbot_file_export.transcribe.os.path.splitext", return_value=(str(audio_stem), ".mp3")), \
              patch("builtins.open", mock_open(read_data=b"audio data")), \
              patch("fastfetchbot_file_export.transcribe.milliseconds_until_sound", return_value=0):
@@
-            result = get_audio_text("/tmp/audio.mp3", "test-api-key")
+            result = get_audio_text(str(audio_file), "test-api-key")
@@
 class TestGetAudioTextHappyPath:
-    def test_audio_segment_failure_wraps_in_file_export_error(self):
+    def test_audio_segment_failure_wraps_in_file_export_error(self, tmp_path):
         from fastfetchbot_file_export.transcribe import get_audio_text
@@
         MockAudioSegment.from_file.side_effect = FileNotFoundError("no such file")
         with pytest.raises(FileExportError, match="Audio transcription failed"):
-            get_audio_text("/tmp/missing.mp3", "key")
+            get_audio_text(str(tmp_path / "missing.mp3"), "key")
```

Also applies to: 101-101
🧰 Tools
🪛 Ruff (0.15.7)
[error] 77-77: Probable insecure usage of temporary file or directory: "/tmp/audio"
(S108)
[error] 83-83: Probable insecure usage of temporary file or directory: "/tmp/audio.mp3"
(S108)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/file_export/test_transcribe_exceptions.py` around lines 77 - 83,
Replace hardcoded "/tmp" test paths with the pytest tmp_path fixture: update the
patched os.path.splitext return_value, the call to get_audio_text, and any other
hardcoded file references (e.g. where MockAudioSegment.from_file and
builtins.open are used) to use tmp_path / "audio.mp3" (or equivalent
Path.as_posix()) so the test uses a temporary, lint-clean path; ensure patches
that mock open and splitext return the updated tmp_path-based string and pass
tmp_path.joinpath("audio.mp3").as_posix() into get_audio_text("...") instead of
"/tmp/audio.mp3".
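For context, pytest's `tmp_path` fixture is a per-test unique `pathlib.Path` backed by the same stdlib machinery as `tempfile.TemporaryDirectory`, which is why it satisfies Ruff S108 where a fixed `/tmp` literal does not. A minimal stand-alone equivalent (file names are illustrative):

```python
import tempfile
from pathlib import Path

# pytest's tmp_path gives each test a unique directory like this one,
# removed automatically when the test finishes.
with tempfile.TemporaryDirectory() as d:
    tmp_path = Path(d)                  # stand-in for the fixture argument
    audio_file = tmp_path / "audio.mp3"
    audio_file.write_bytes(b"audio data")

    assert audio_file.read_bytes() == b"audio data"
    assert str(audio_file) != "/tmp/audio.mp3"  # unique path, no fixed /tmp name

# Directory and file are cleaned up once the context exits.
assert not audio_file.exists()
```

Because every run gets a fresh directory, parallel test workers cannot collide on a shared `/tmp` filename, which is the security concern behind S108.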
```python
    media_group, file_group = await media_files_packaging(media_files, data)

    # The image media_group should have the photo, but file_group should be empty
    # because the document download failed and was skipped
    assert file_group == []
```
Assert that media_group still contains the packaged image.
On Line 110, media_group is captured but never validated. This test can pass even if photo packaging regresses, despite the test description saying packaging should continue.
✅ Suggested test assertion update

```diff
     media_group, file_group = await media_files_packaging(media_files, data)

     # The image media_group should have the photo, but file_group should be empty
     # because the document download failed and was skipped
+    assert len(media_group) == 1
     assert file_group == []
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
     media_group, file_group = await media_files_packaging(media_files, data)

     # The image media_group should have the photo, but file_group should be empty
     # because the document download failed and was skipped
+    assert len(media_group) == 1
     assert file_group == []
```
🧰 Tools
🪛 Ruff (0.15.7)
[warning] 110-110: Unpacked variable media_group is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/telegram_bot/test_message_sender.py` around lines 110 - 115, The
test captures media_group from media_files_packaging but never asserts its
contents; update the test for media_files_packaging to assert that media_group
contains the packaged image (e.g., assert media_group is not empty and that its
first item represents a photo or has the expected photo attribute/type), while
keeping the existing assertion that file_group == [] to ensure document failures
are skipped. Use the media_group and file_group variables already in the test
and check the photo-specific property (type/name) on the media_group item to
validate the image was packaged.
```python
    @pytest.mark.asyncio
    async def test_redirect_follows_location(self):
        from fastfetchbot_shared.utils.network import download_file_by_metadata_item

        mock_response = MagicMock()
        mock_response.status_code = 302
        mock_response.headers = {"Location": "https://cdn.example.com/real.jpg"}
        mock_response.content = b"redirected content"

        mock_client = AsyncMock()
        mock_client.get = AsyncMock(return_value=mock_response)
        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
        mock_client.__aexit__ = AsyncMock(return_value=False)

        with patch("fastfetchbot_shared.utils.network.httpx.AsyncClient", return_value=mock_client), \
             patch("fastfetchbot_shared.utils.network.get_random_user_agent", return_value="TestAgent"):
            result = await download_file_by_metadata_item(
                url="https://example.com/image.jpg",
                data={"url": "https://example.com", "category": "twitter"},
            )

        assert isinstance(result, NamedBytesIO)
```
Test name is misleading - implementation doesn't follow redirects.
Looking at download_file_by_metadata_item (network.py lines 154-155), the function extracts the Location header into a local variable but then uses response.content from the original 302 response—no second request is made to the redirect URL.
This test only verifies that a 302 response doesn't crash the function. Consider either:

- Renaming it to `test_redirect_response_returns_named_bytes_io` to reflect actual behavior, or
- If redirect-following is intended, making the implementation issue a follow-up request
The current test also doesn't verify the Location header extraction—you could assert mock_client.get.call_count or check some observable effect of the URL update.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/test_network.py` around lines 51 - 73, The test name claims
redirect-following but the implementation of download_file_by_metadata_item only
reads the Location header and returns response.content from the 302; either
rename the test to reflect that behavior (e.g.,
test_redirect_response_returns_named_bytes_io) or update
download_file_by_metadata_item to follow redirects by detecting a 3xx response,
extracting response.headers["Location"], performing a second GET to that URL
with the same AsyncClient (or a new one) and returning the final
response.content; if you implement redirect-following also update the test to
assert mock_client.get.call_count == 2 and that the second call used the
Location URL so the Location header extraction and subsequent request are
validated.
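If redirect-following is the intended behavior, the lightest fix in production code is `httpx.AsyncClient(follow_redirects=True)`, which chases 3xx chains transparently. The manual loop the comment describes looks roughly like this; the `get` callable and URLs are illustrative stand-ins for the real transport:

```python
def follow_redirects(get, url, max_hops=5):
    """Resolve a redirect chain by re-requesting each Location target."""
    for _ in range(max_hops):
        status, headers, body = get(url)
        if status in (301, 302, 303, 307, 308) and "Location" in headers:
            url = headers["Location"]  # the part the current code reads but never uses
            continue
        return url, body
    raise RuntimeError("too many redirects")

# A fake transport standing in for the HTTP client: one hop, then content.
responses = {
    "https://example.com/image.jpg": (302, {"Location": "https://cdn.example.com/real.jpg"}, b""),
    "https://cdn.example.com/real.jpg": (200, {}, b"redirected content"),
}

final_url, content = follow_redirects(responses.__getitem__, "https://example.com/image.jpg")
assert final_url == "https://cdn.example.com/real.jpg"
assert content == b"redirected content"
```

Whichever route is taken, a test for it should observe two requests (or the final URL), not just the absence of a crash on the 302.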
Summary by CodeRabbit
Bug Fixes
Documentation
Tests