Implement image-6 text removal and align generation defaults #5

Merged
mohitgargai merged 8 commits into lica-world:main from jaejung-dev:text-removal-implementation
May 5, 2026
@jaejung-dev
Contributor

Summary

  • Implement image-6 (Text Removal & Background Inpainting) pipeline end-to-end in GDB, including data loading, model I/O handling, and evaluation integration.
  • Add ReMOVE metric support (src/gdb/metrics/remove_metric.py) and PSNR metric wiring (src/gdb/metrics/core.py, src/gdb/metrics/__init__.py), and classify image-6 under typography while preserving benchmark ID compatibility.
  • Align generation behavior with current policy by removing forced masked composition in FLUX.2 local generation, making typography-8 mask-free at model-input time, and updating diffusion defaults/docs from flux.2-klein-4b to flux.2-klein-9b.

Test plan

  • python scripts/run_benchmarks.py --list
  • Regenerate impacted FLUX outputs on GPU 1 and replace prior artifacts for layout-8 and typography-7.
  • Regenerate typography-8 outputs after removing mask metadata and verify summary/log updates.
  • Validate that typography-8 ModelInput no longer includes mask metadata.

Made with Cursor

…r defaults.

This adds the text removal task pipeline and ReMOVE/PSNR metric wiring, removes forced masked composition behavior, updates typography-8 to run without mask metadata, and switches default diffusion runs/docs to FLUX.2 klein 9B for consistent baseline behavior.

Made-with: Cursor
@jaejung-dev jaejung-dev requested a review from purvanshi as a code owner April 20, 2026 12:50
jaejung-dev and others added 6 commits April 20, 2026 12:53
This updates the smoke assertion to the current 40 benchmark registry size and applies Ruff-compliant import sorting in remove_metric so lint passes.

Made-with: Cursor
Keep a compatibility shim in tasks/image.py and align README/HELM metadata so image-6 remains discoverable as a typography-domain benchmark with consistent benchmark counts.

Made-with: Cursor
Apply Ruff-compatible import grouping in typography.py only so the PR passes lint without changing runtime behavior.

Made-with: Cursor
Co-authored-by: Cursor <cursoragent@cursor.com>
Use a non-degenerate edit mask for image-conditioned layout adaptation, extend typography bbox evaluation to count missed detections, ignore local model artifacts, and remove the obsolete image task shim.

Co-authored-by: Cursor <cursoragent@cursor.com>
@mohitgargai
Contributor

Thanks for the implementation. One blocker before merge: README.md still has unresolved merge conflict markers (<<<<<<< text-removal-implementation, =======, >>>>>>> main) in the intro, benchmark table, and diffusion example. CI is green because markdown is not covered by the current lint checks, but this would ship broken docs/PyPI README.

While resolving that, please also update the gdb list comment that still says 39 benchmarks to 40, and remove the extra blank line at EOF in src/gdb/metrics/remove_metric.py (git diff --check main...HEAD catches both the conflict markers and the trailing blank line).
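
Because markdown is outside the current lint checks, a small scan for leftover conflict markers could catch this class of problem earlier. The following is a minimal sketch, not part of this PR: `find_conflict_markers` and `scan_file` are hypothetical helper names, and the pattern assumes the standard seven-character Git markers.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Standard Git conflict markers: "<<<<<<< ours", "=======", ">>>>>>> theirs".
# Caveat: a bare "=======" line can also be a Markdown setext heading
# underline, so hits on that marker alone may be false positives.
CONFLICT_MARKER = re.compile(r"^(<{7} |={7}$|>{7} )")


def find_conflict_markers(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for lines that look like conflict markers."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if CONFLICT_MARKER.match(line):
            hits.append((lineno, line))
    return hits


def scan_file(path: Path) -> List[Tuple[int, str]]:
    """Scan one file on disk (hypothetical helper; decodes leniently as UTF-8)."""
    return find_conflict_markers(path.read_text(encoding="utf-8", errors="replace"))
```

Running `scan_file(Path("README.md"))` in CI and failing on a non-empty result would be one way to wire this in; `git diff --check main...HEAD` remains the simpler local check.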

@mohitgargai
Contributor

Functional review follow-up after exercising image-6 locally with synthetic JSON/CSV manifests:

  • Local manifest loading/build/eval basically works: TextRemoval.load_data() accepts JSON/CSV, build_model_input() includes the source image and mask metadata, and a perfect synthetic prediction gives expected PSNR/SSIM when heavy optional metrics are disabled.
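
For reference, the "perfect prediction" sanity check can be illustrated with the textbook PSNR definition. This is a minimal pure-Python sketch, not the code in src/gdb/metrics/core.py; the `psnr` function name, the flat pixel sequences, and the choice to return infinity for identical inputs are assumptions, and the real implementation may cap or report that case differently.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], prediction: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(prediction):
        raise ValueError("inputs must have the same number of pixels")
    if not reference:
        raise ValueError("inputs must be non-empty")
    # Mean squared error over all pixels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, prediction)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded (assumed convention in this sketch).
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

With this definition, a prediction identical to the ground truth yields an unbounded score, while a maximally wrong 8-bit prediction yields 0 dB.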

Issues I think should be fixed before merge:

  1. image-6 looks broken on the default Hugging Face path (when no --dataset-root is given). scripts/upload_to_hf.py stores only one primary image in the image column and normalizes the rest of the path fields into metadata/ground_truth. gdb.hf.load_from_hub() caches that primary image as image_path, but TextRemoval.build_model_input() reads sample["input_image"] and sample["mask"] directly, and evaluate() reads ground_truth["image"]/ground_truth["mask"] directly. For hub-loaded rows those remain relative dataset paths, not cached local files, so the model input points at a nonexistent source/mask, and evaluation silently skips samples and reports zero coverage. Either load_from_hub() needs task-aware materialization for input_image, mask, and target image, or image-6 needs to explicitly not advertise/support hub loading until those assets are carried through.

  2. The PR description and docs say that diffusion defaults move to flux.2-klein-9b, but the actual gdb eval --provider diffusion default is still flux.2-klein-4b in src/gdb/cli.py (DEFAULT_MODEL_IDS["diffusion"]). scripts/run_benchmarks.py was changed, but the main CLI that users are told to use remains on 4B.

  3. ReMOVE initialization can trigger a large SAM checkpoint download before checking whether segment-anything is installed. _remove_score() calls ensure_sam_checkpoint(...) before constructing RemoveMetricEvaluator, and the segment_anything import happens inside RemoveMetricEvaluator._setup_predictor(). In an environment with torch but without segment-anything, evaluating image-6 can download the SAM vit-h checkpoint and then still return NaN. Please check optional deps/importability first, or require/probe an explicit checkpoint only when the metric is actually usable.
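
One way to address item 3 is to probe importability before any download is attempted. The sketch below is illustrative only: `missing_modules` and `remove_score_or_nan` are hypothetical names, and the `download_checkpoint` and `compute` callables are stand-ins for whatever `ensure_sam_checkpoint(...)` and `RemoveMetricEvaluator` actually do. The only real API used is `importlib.util.find_spec`, which returns `None` for a top-level module that is not installed.

```python
import importlib.util
import math
from typing import Callable, Iterable, List


def missing_modules(names: Iterable[str],
                    find_spec: Callable = importlib.util.find_spec) -> List[str]:
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in names if find_spec(name) is None]


def remove_score_or_nan(compute: Callable[[str], float],
                        download_checkpoint: Callable[[], str],
                        required: Iterable[str] = ("torch", "segment_anything"),
                        find_spec: Callable = importlib.util.find_spec) -> float:
    """Run the metric only if its optional dependencies are importable.

    The dependency probe happens first, so the (large) checkpoint download
    is never triggered in an environment where the metric cannot run anyway.
    """
    if missing_modules(required, find_spec):
        return math.nan  # metric unavailable: skip without downloading anything
    checkpoint_path = download_checkpoint()  # stand-in for the checkpoint fetch
    return compute(checkpoint_path)          # stand-in for the evaluator call
```

Passing `find_spec` as a parameter keeps the ordering testable without installing or uninstalling packages.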

Resolve documentation conflicts, align diffusion defaults, materialize image-6 auxiliary assets from HuggingFace rows, and avoid downloading ReMOVE checkpoints before optional dependencies are available.

Co-authored-by: Cursor <cursoragent@cursor.com>
@mohitgargai mohitgargai merged commit 74bdff6 into lica-world:main May 5, 2026
12 checks passed