
fix: core-pipeline crash fixes + effective-python audit (9 commits) #17

Open
forkni wants to merge 9 commits into feat/cuda-ipc-output from forkni/core-pipeline-audit-crash-fixes

Conversation

Collaborator

@forkni forkni commented Jun 3, 2026

Changes

  • 1affba3 refactor: remove dead set_nsfw_fallback_img; screen skip-diffusion via safety checker
  • b74577a fix: unify TRT logger — use polygraphy get_trt_logger() singleton across all build/load paths
  • f8706d9 style: replace print() with logger.warning() in detect_model_from_diffusers_unet
  • b23c286 fix: add logger.debug(exc_info=True) to silent teardown except-pass handlers
  • 5305916 fix: replace mutable default kvo_cache=[] with None-pattern (defensive hardening)
  • 926e46d fix: replace undefined preset[key] with explicit ValueError in validate_architecture
  • 7297e5c fix: move seed-weight validation inside for-loop to prevent NameError on empty seed_list
  • b327402 fix: correct Dict[str, any] annotation to Dict[str, Any] in detect_unet_characteristics
  • 9af12ff fix: run safety checker before postprocess_image to fix IPC+pt crash
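Two of the audit items above (5305916 and 7297e5c) follow well-known Python patterns. The sketch below illustrates them with hypothetical function names and signatures; only `kvo_cache` and `seed_list` come from the commit titles, and the real code in the repository may look different.

```python
from typing import List, Optional, Sequence, Tuple


def append_kvo_entry(entry: str, kvo_cache: Optional[List[str]] = None) -> List[str]:
    """None-pattern: build a fresh list per call instead of sharing one default.

    With a signature of `kvo_cache=[]`, every call that omits the argument
    would mutate the same list object created at function definition time.
    """
    if kvo_cache is None:
        kvo_cache = []
    kvo_cache.append(entry)
    return kvo_cache


def total_seed_weight(seed_list: Sequence[Tuple[int, float]]) -> float:
    """Validate each (seed, weight) pair inside the loop.

    If the check sat after the loop and referenced `weight`, an empty
    seed_list would raise NameError because the loop variable was never bound.
    """
    total = 0.0
    for seed, weight in seed_list:
        if weight < 0:
            raise ValueError(f"negative weight {weight} for seed {seed}")
        total += weight
    return total
```

Calling `append_kvo_entry("a")` twice returns two independent one-element lists, and `total_seed_weight([])` returns `0.0` instead of raising.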

Branch

forkni/core-pipeline-audit-crash-fixes -> feat/cuda-ipc-output

forkni added 9 commits June 2, 2026 20:24
When use_cuda_ipc_output=True and output_type='pt', postprocess_image()
exports the frame and returns None.  The old code fed that None to
self.safety_checker, which called torchvision T.Resize(None) and raised:
    TypeError: Unexpected type <class 'NoneType'>
This exception was raised on every frame, but the streaming loop kept running
because the except block logged the error and retried, so the failure was easy
to miss.

Root cause: the CUDA-IPC fast path sits at the TOP of postprocess_image
(lines 928-948) and returns None before any output-type branch runs. On the
pt path, the post-hoc safety check then assigned denormalized = image, where
image was that None.

Two bugs fixed:
1. The crash: None passed to NSFWDetectorEngine.image_transforms (torchvision)
2. The silent bypass: even without the crash, the NSFW substitution was dead
   code on the IPC path -- the frame was already exported before the check ran.
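The two bugs can be reproduced in miniature. This is a simplified sketch of the old call order, not the project's code: the `_old` function names and the `exported` list are hypothetical stand-ins, and a plain list stands in for the frame tensor.

```python
exported = []  # hypothetical stand-in for the CUDA-IPC export channel


def postprocess_image_old(frame, use_cuda_ipc_output, output_type="pt"):
    """Old shape: the IPC fast path exports and returns before anything else."""
    if use_cuda_ipc_output:
        exported.append(frame)  # the frame has already left the process here
        return None
    return frame


def safety_check_old(image):
    """Mimics a transform that rejects None, like the error quoted above."""
    if image is None:
        raise TypeError(f"Unexpected type {type(image)}")
    return image


def old_step(frame):
    # Post-hoc ordering: postprocess first, safety check second.
    image = postprocess_image_old(frame, use_cuda_ipc_output=True)
    return safety_check_old(image)
```

Calling `old_step([0.5])` raises the TypeError (bug 1), and by then `exported` already contains the unchecked frame (bug 2), so even a check that tolerated None could not have substituted it.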

Fix: extract _apply_safety_checker() to operate on the raw diffusion-range
[-1, 1] pipeline tensor BEFORE postprocess_image.  Both txt2img and img2img
now call it first; the substituted (-1.0 = black) or previous clean tensor
flows into postprocess_image and out through every output path including IPC.

Fallback tensor: torch.full_like(t, -1.0) maps to 0.0 after
_denormalize_on_gpu -> true black on all paths (pt, np, pil, CUDA-IPC).
Previous-frame fallback caches via _prev_clean_tensor (diffusion range).
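A minimal sketch of the new ordering and the two fallbacks, under several assumptions: plain Python lists stand in for tensors so it runs without torch, `SafetyGate` and `is_flagged` are hypothetical names, the denormalization is assumed to be x / 2 + 0.5 clamped to [0, 1], and a flagged first frame with no cached clean frame is assumed to fall back to blank.

```python
from typing import Callable, List, Optional

Frame = List[float]  # stand-in for a tensor in diffusion range [-1, 1]


def denormalize(frame: Frame) -> Frame:
    """Assumed mapping from [-1, 1] to [0, 1]: x / 2 + 0.5, clamped."""
    return [min(1.0, max(0.0, x / 2 + 0.5)) for x in frame]


class SafetyGate:
    """Hypothetical stand-in for the logic extracted into _apply_safety_checker()."""

    def __init__(self, is_flagged: Callable[[Frame], bool],
                 fallback: str = "blank", enabled: bool = True) -> None:
        self.is_flagged = is_flagged
        self.fallback = fallback  # "blank" or "previous"
        self.enabled = enabled
        self._prev_clean: Optional[Frame] = None  # cached in diffusion range

    def apply(self, frame: Frame) -> Frame:
        """Runs BEFORE postprocessing and never returns None."""
        if not self.enabled:
            return frame
        if not self.is_flagged(frame):
            self._prev_clean = list(frame)  # remember the last clean frame
            return frame
        if self.fallback == "previous" and self._prev_clean is not None:
            return list(self._prev_clean)
        # Blank fallback: -1.0 everywhere, which denormalizes to 0.0 (black).
        return [-1.0] * len(frame)
```

Because the substitution happens on the raw frame, whatever `apply()` returns is what every later output path sees, including an export that returns nothing to the caller.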

nsfw_fallback_img / set_nsfw_fallback_img remain defined but are no longer
read by the inference path (dead code, removal deferred to avoid churn).

Regression test: tests/unit/test_safety_checker.py -- 8 CPU-only pytest
cases covering: no-None contract, blank/previous fallback, clean passthrough,
disabled bypass, first-frame-flagged edge case, cache behaviour.

Related gap (not fixed here): _process_skip_diffusion has a TODO for the
safety checker call and still bypasses the check entirely.