Skip to content

test(app): smoke suite for /start-download → /download-file flow#23

Merged
fbmoulin merged 1 commit into
mainfrom
test/download-route-smoke
May 11, 2026
Merged

test(app): smoke suite for /start-download → /download-file flow#23
fbmoulin merged 1 commit into
mainfrom
test/download-route-smoke

Conversation

@fbmoulin
Copy link
Copy Markdown
Owner

Summary

Closes the last 2 items from TODO.md 🟢 Later:

  1. Smoke test for WebsiteDownloader.process() — 9 new cases in tests/test_download_smoke.py covering the Flask worker-thread lifecycle that Stage D typing left runtime-uncovered.
  2. HardenedCapture structlog refactor — confirmed already shipped by PR refactor(capture): adopt structlog bound logger; drop log= callback #20 (caac782) on 2026-05-10. Bullet was stale.

After this PR merges, the "Later" list is empty.

Coverage gap closed

downloader.py was strictly typed in Stage D (#19), but the Flask wiring (POST /start-download → daemon thread → POST /download-file/<sid>) had zero runtime coverage. This suite locks down the integration contract — session lifecycle, file delivery, error propagation — without hitting the real network or Playwright.

Cases (9)

Input validation (×3) — missing url, empty url, no JSON body → 400

Happy pathWebsiteDownloader + zip_directory monkeypatched to return success + write a fake PK-prefixed zip. Poll download_results[sid].status == \"complete\" (3s budget), then GET /download-file/<sid> → 200 + Content-Disposition: attachment.

Failure paths

  • process() → False → status becomes \"error\" with error == \"Failed to download site\"; /download-file → 404 \"File not ready\"
  • process() raises RuntimeError → caught by worker try/except, message propagated; /download-file → 404

Edge cases

  • Unknown session UUID → 404
  • threading.Event-blocked downloader holds the worker mid-process(); /download-file during "processing" → 404, then release + wait for "complete" to ensure clean state
  • Two consecutive POSTs return distinct UUIDs

Strategy

Monkeypatch app.{WebsiteDownloader, zip_directory, DOWNLOAD_FOLDER} for determinism. tmp_path keeps real downloads/ clean. Real WebsiteDownloader behavior is type-checked (Stage D) and exercised by the legacy CLI; this suite covers the Flask integration contract, not downloader internals.

TODO reconciliation

Second stale bullet: kratos_clone/capture.py has 0 self.log(...) sites, 29 logger.{info,warning,error,debug}(...) sites, and a module-level logger = structlog.get_logger(\"kratos_clone.capture\"). LogCallback is gone repo-wide. PR #20 closed it; TODO bullet just hadn't been ticked.

Test plan

  • pytest tests/test_download_smoke.py -v → 9 passed in 0.39s
  • pytest -q219 passed, 2 skipped (was 210 + 2; +9 new cases)
  • ruff check kratos_clone/ scripts/ tests/test_download_smoke.py → clean
  • ruff format --check → 12 files already formatted
  • mypy --config-file pyproject.toml → Success, no issues in 21 source files
  • bandit --severity-level medium → 0 medium/high
  • CI green (8 jobs)

Closes the `WebsiteDownloader.process()` smoke-test item from TODO `🟢 Later`
and reconciles a stale bullet that PR #20 had already closed.

The Flask wiring around `downloader.py` had zero runtime coverage even after
Stage D typed the module strictly. This adds 9 cases in
`tests/test_download_smoke.py` covering the worker-thread lifecycle:

Input validation
- missing url → 400
- empty url → 400
- no JSON body → 400

Happy path
- POST /start-download → 200 + valid UUID session_id
- worker runs in daemon thread; poll `download_results[sid].status`
  until "complete" with a 3s budget
- GET /download-file/<sid> → 200 + `Content-Disposition: attachment` +
  PK-prefixed zip body

Failure paths
- `process() → False`: status becomes "error" with
  `error == "Failed to download site"`; /download-file → 404 "File not ready"
- `process()` raises RuntimeError: caught by the worker try/except,
  error message propagated; /download-file → 404

Edge cases
- Unknown session UUID → 404
- threading.Event-blocked downloader holds the worker mid-process();
  /download-file during "processing" → 404, then release + wait for
  "complete" to ensure no leaked state
- Two consecutive POSTs → distinct UUIDs

Strategy: monkeypatch `app.{WebsiteDownloader, zip_directory, DOWNLOAD_FOLDER}`
to control the worker outcome deterministically. `tmp_path` keeps real
`downloads/` clean. Real `WebsiteDownloader.process()` behavior is already
type-checked (Stage D) and exercised by the legacy CLI; this suite locks
down the Flask integration contract, not the downloader internals.

TODO.md also closes the second stale "Later" bullet: the `HardenedCapture`
structlog refactor was already shipped by PR #20 (caac782) — verified
`kratos_clone/capture.py` has 0 `self.log(...)` sites, 29 `logger.{info,
warning,error,debug}(...)` sites, and module-level
`logger = structlog.get_logger("kratos_clone.capture")`. `LogCallback` is
gone repo-wide. "Later" list is now empty.

Verifications:
- pytest -q → 219 passed, 2 skipped (was 210 + 2; +9 new cases)
- pytest tests/test_download_smoke.py → 9 passed in 0.39s
- ruff check kratos_clone/ scripts/ tests/test_download_smoke.py → clean
- ruff format --check → 12 files already formatted
- mypy --config-file pyproject.toml → Success, no issues in 21 source files
- bandit --severity-level medium → 0 medium/high
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 11, 2026

Warning

Rate limit exceeded

@fbmoulin has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 46 minutes and 26 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e9d11bab-3867-48ad-be7b-f3b221705412

📥 Commits

Reviewing files that changed from the base of the PR and between 1e1a7ee and 3a8b6f6.

📒 Files selected for processing (2)
  • TODO.md
  • tests/test_download_smoke.py
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/download-route-smoke

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a8b6f6eb4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +241 to +242
r1 = client.post("/start-download", json={"url": "https://a.com"})
r2 = client.post("/start-download", json={"url": "https://b.com"})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Wait for background workers before leaving the test

When these two /start-download calls return, the daemon worker threads may not yet have reached process_download()'s patched WebsiteDownloader/DOWNLOAD_FOLDER access. This test then exits without polling either session to a terminal state, so pytest can tear down monkeypatch and the shared Flask state while those workers are still running; under slow CI scheduling that can make a worker instantiate the real downloader or repopulate download_results after flask_app resets it. Please wait for both session IDs to reach complete (as the earlier tests do) before returning.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a comprehensive smoke test suite for the Flask download flow in tests/test_download_smoke.py and updates TODO.md to mark the completion of the download smoke tests and the HardenedCapture refactor. Review feedback suggests improving the _wait_for_status helper to fail fast on worker errors and avoid potential unbound variable issues. It also recommends moving repeated local imports of the app module to the top of the test file for better maintainability.

Comment on lines +26 to +42
def _wait_for_status(app_module, session_id: str, expected: set[str], timeout: float = 3.0):
"""Poll `download_results[sid]["status"]` until it leaves `processing`.

Worker is a daemon thread; the test client returns immediately after
POST /start-download. We need to wait for the thread to write its
terminal status before asserting on /download-file behavior.
"""
deadline = time.time() + timeout
while time.time() < deadline:
with app_module.session_lock:
status = app_module.download_results.get(session_id, {}).get("status")
if status in expected:
return status
time.sleep(0.02)
raise AssertionError(
f"worker did not reach {expected} within {timeout}s (last status={status!r})"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The _wait_for_status helper can be improved in two ways:

  1. Fail Fast: If the worker thread transitions to an "error" state while the test is waiting for "complete", the test should fail immediately with the error message from the worker, rather than waiting for the full timeout.
  2. Robustness: Initialize the status variable before the loop to prevent a potential UnboundLocalError if the loop body never executes (e.g., if timeout is zero).
def _wait_for_status(app_module, session_id: str, expected: set[str], timeout: float = 3.0):
    """Poll download_results[sid]["status"] until it leaves processing.

    Worker is a daemon thread; the test client returns immediately after
    POST /start-download. We need to wait for the thread to write its
    terminal status before asserting on /download-file behavior.
    """
    deadline = time.time() + timeout
    status = None
    while time.time() < deadline:
        with app_module.session_lock:
            res = app_module.download_results.get(session_id, {})
            status = res.get("status")
            error_msg = res.get("error")

        if status in expected:
            return status
        if status == "error" and "error" not in expected:
            raise AssertionError(f"Worker failed unexpectedly: {error_msg}")

        time.sleep(0.02)
    raise AssertionError(
        f"worker did not reach {expected} within {timeout}s (last status={status!r})"
    )



def test_happy_path_download_complete(client, flask_app, tmp_path, monkeypatch):
import app as app_module
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The import app as app_module statement is repeated in multiple test functions (lines 114, 144, 167, 199, 235). Since app.py is designed to be side-effect free on import, it is better to import it once at the top of the file to improve readability and maintainability.

@fbmoulin fbmoulin merged commit 03a19a9 into main May 11, 2026
8 checks passed
@fbmoulin fbmoulin deleted the test/download-route-smoke branch May 11, 2026 03:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant