Implement image-6 text removal and align generation defaults by jaejung-dev · Pull Request #5 · lica-world/GDB

jaejung-dev · 2026-04-20T12:50:18Z

Summary

Implement image-6 (Text Removal & Background Inpainting) pipeline end-to-end in GDB, including data loading, model I/O handling, and evaluation integration.
Add ReMOVE metric support (src/gdb/metrics/remove_metric.py) and PSNR metric wiring (src/gdb/metrics/core.py, src/gdb/metrics/__init__.py), and classify image-6 under typography while preserving benchmark ID compatibility.
Align generation behavior with current policy by removing forced masked composition in FLUX.2 local generation, making typography-8 mask-free at model-input time, and updating diffusion defaults/docs from flux.2-klein-4b to flux.2-klein-9b.

Test plan

python scripts/run_benchmarks.py --list
Regenerate impacted FLUX outputs on GPU 1 and replace prior artifacts for layout-8 and typography-7.
Regenerate typography-8 outputs after removing mask metadata and verify summary/log updates.
Validate that typography-8 ModelInput no longer includes mask metadata.

Made with Cursor

…r defaults. This adds the text removal task pipeline and ReMOVE/PSNR metric wiring, removes forced masked composition behavior, updates typography-8 to run without mask metadata, and switches default diffusion runs/docs to FLUX.2 klein 9B for consistent baseline behavior. Made-with: Cursor

This updates the smoke assertion to the current 40 benchmark registry size and applies Ruff-compliant import sorting in remove_metric so lint passes. Made-with: Cursor

Keep a compatibility shim in tasks/image.py and align README/HELM metadata so image-6 remains discoverable as a typography-domain benchmark with consistent benchmark counts. Made-with: Cursor

Apply Ruff-compatible import grouping in typography.py only so the PR passes lint without changing runtime behavior. Made-with: Cursor

Co-authored-by: Cursor <cursoragent@cursor.com>

Use a non-degenerate edit mask for image-conditioned layout adaptation, extend typography bbox evaluation to count missed detections, ignore local model artifacts, and remove the obsolete image task shim. Co-authored-by: Cursor <cursoragent@cursor.com>
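One way to read "count missed detections" is that a ground-truth box with no matching prediction contributes an IoU of zero instead of being dropped from the average. The sketch below illustrates that idea only; `match_boxes`, the greedy matching strategy, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are all assumptions for illustration, not the code in this PR.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_boxes(gt: list[Box], pred: list[Box], threshold: float = 0.5) -> dict:
    """Greedily match each ground-truth box to its best unused prediction.

    Ground-truth boxes without a match at or above the threshold are counted
    as missed and contribute 0 to the mean IoU (hypothetical scoring rule).
    """
    unused = list(range(len(pred)))
    matched_ious: list[float] = []
    for g in gt:
        best_j, best = None, 0.0
        for j in unused:
            score = iou(g, pred[j])
            if score > best:
                best_j, best = j, score
        if best_j is not None and best >= threshold:
            unused.remove(best_j)
            matched_ious.append(best)
    total = len(gt)
    return {
        "matched": len(matched_ious),
        "missed": total - len(matched_ious),
        "mean_iou": sum(matched_ious) / total if total else 0.0,
        "recall": len(matched_ious) / total if total else 0.0,
    }
```

Dividing by the number of ground-truth boxes rather than the number of matches is what makes a missed detection lower the score.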

mohitgargai · 2026-05-05T03:32:29Z

Thanks for the implementation. One blocker before merge: README.md still has unresolved merge conflict markers (<<<<<<< text-removal-implementation, =======, >>>>>>> main) in the intro, benchmark table, and diffusion example. CI is green because markdown is not covered by the current lint checks, but this would ship broken docs/PyPI README.

While resolving that, please also update the gdb list comment that still says 39 benchmarks to 40, and remove the extra blank line at EOF in src/gdb/metrics/remove_metric.py (git diff --check main...HEAD catches both the conflict markers and the trailing blank line).
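Since the current lint checks do not cover markdown, a small check along these lines could catch leftover conflict markers in CI. This is a minimal sketch and not part of the PR; `find_conflict_markers` and `scan_file` are hypothetical helper names.

```python
import re
from pathlib import Path

# A conflict hunk starts with "<<<<<<< ", is split by a bare "=======",
# and ends with ">>>>>>> ".
CONFLICT_RE = re.compile(r"^(<{7} |={7}$|>{7} )")


def find_conflict_markers(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that look like conflict markers."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if CONFLICT_RE.match(line):
            hits.append((lineno, line))
    return hits


def scan_file(path: str) -> list[tuple[int, str]]:
    """Scan one file on disk, for example README.md."""
    return find_conflict_markers(Path(path).read_text(encoding="utf-8"))
```

A line of exactly seven `=` characters can also be a legitimate Markdown heading underline, so a hit on that pattern alone may be a false positive.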

mohitgargai · 2026-05-05T03:35:41Z

Functional review follow-up after exercising image-6 locally with synthetic JSON/CSV manifests:

Local manifest loading/build/eval basically works: TextRemoval.load_data() accepts JSON/CSV, build_model_input() includes the source image and mask metadata, and a perfect synthetic prediction gives expected PSNR/SSIM when heavy optional metrics are disabled.
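For reference, PSNR follows the standard definition 10 * log10(MAX^2 / MSE), so a pixel-perfect prediction has zero error and the value is unbounded. The sketch below works on flat pixel sequences to stay dependency-free; it is not the implementation in src/gdb/metrics/core.py, and returning infinity for the perfect case is an assumption, since the thread does not say what value the benchmark reports there.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        # Identical inputs: the ratio is unbounded (assumed convention).
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```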

Issues I think should be fixed before merge:

image-6 looks broken on the default Hugging Face path (no --dataset-root). scripts/upload_to_hf.py stores only one primary image in the image column and normalizes the remaining path fields into metadata/ground_truth. gdb.hf.load_from_hub() caches that primary image as image_path, but TextRemoval.build_model_input() reads sample["input_image"] and sample["mask"] directly, and evaluate() reads ground_truth["image"]/ground_truth["mask"] directly. For hub-loaded rows those remain relative dataset paths, not cached local files, so the model input points at a nonexistent source/mask, and evaluation silently skips samples and reports zero coverage. Either load_from_hub() needs task-aware materialization for input_image, mask, and target image, or image-6 needs to explicitly not advertise/support hub loading until those assets are carried through.
The PR description and docs say that diffusion defaults move to flux.2-klein-9b, but the actual gdb eval --provider diffusion default is still flux.2-klein-4b in src/gdb/cli.py (DEFAULT_MODEL_IDS["diffusion"]). scripts/run_benchmarks.py was changed, but the main CLI that users are told to use remains on 4B.
ReMOVE initialization can trigger a large SAM checkpoint download before checking whether segment-anything is installed. _remove_score() calls ensure_sam_checkpoint(...) before constructing RemoveMetricEvaluator, and the segment_anything import happens inside RemoveMetricEvaluator._setup_predictor(). In an environment with torch but without segment-anything, evaluating image-6 can download the SAM vit-h checkpoint and then still return NaN. Please check optional deps/importability first, or require/probe an explicit checkpoint only when the metric is actually usable.
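The ordering issue in the last item can be illustrated with a guard that probes importability before anything is downloaded. This is a sketch under stated assumptions: `remove_score_guarded`, `fetch_checkpoint`, and `compute_score` are hypothetical stand-ins for `_remove_score()`, `ensure_sam_checkpoint(...)`, and the evaluator call, whose real signatures are not shown in this thread.

```python
import importlib.util
import math
from typing import Callable, Sequence


def module_available(name: str) -> bool:
    """Check whether a top-level module can be imported, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def remove_score_guarded(
    required_modules: Sequence[str],
    fetch_checkpoint: Callable[[], str],
    compute_score: Callable[[str], float],
) -> float:
    """Return NaN early if an optional dependency is missing.

    The checkpoint is fetched only after every required module is known to be
    importable, so an unusable metric never triggers a large download.
    """
    missing = [name for name in required_modules if not module_available(name)]
    if missing:
        return math.nan
    checkpoint_path = fetch_checkpoint()  # potentially large download
    return compute_score(checkpoint_path)
```

In the scenario described above, the required modules would presumably be `torch` and `segment_anything`.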

Resolve documentation conflicts, align diffusion defaults, materialize image-6 auxiliary assets from HuggingFace rows, and avoid downloading ReMOVE checkpoints before optional dependencies are available. Co-authored-by: Cursor <cursoragent@cursor.com>
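Materializing the auxiliary assets amounts to rewriting the relative dataset paths in a hub-loaded row into local files before the task reads them. The sketch below shows the shape of that step only: `materialize_assets` and the `resolve` callable are hypothetical, the key names are taken from the review comment, and nesting `ground_truth` inside the sample is an assumption. How the commit actually fetches and caches files is not shown in this thread.

```python
from pathlib import Path
from typing import Any, Callable

SAMPLE_ASSET_KEYS = ("input_image", "mask")
GROUND_TRUTH_ASSET_KEYS = ("image", "mask")


def _resolve_fields(
    record: dict[str, Any], keys: tuple[str, ...], resolve: Callable[[str], Path]
) -> dict[str, Any]:
    """Return a copy of record with the given path fields replaced by local paths."""
    out = dict(record)
    for key in keys:
        value = out.get(key)
        if not isinstance(value, str):
            continue
        local = Path(resolve(value))
        if not local.is_file():
            # Fail loudly instead of letting evaluation skip the sample.
            raise FileNotFoundError(f"{key}: {value!r} did not resolve to a local file")
        out[key] = str(local)
    return out


def materialize_assets(sample: dict[str, Any], resolve: Callable[[str], Path]) -> dict[str, Any]:
    """Rewrite relative asset paths in a sample (and its ground truth) to local files.

    `resolve` maps a relative dataset path to a cached local path; the input
    sample is left unmodified.
    """
    out = _resolve_fields(sample, SAMPLE_ASSET_KEYS, resolve)
    ground_truth = out.get("ground_truth")
    if isinstance(ground_truth, dict):
        out["ground_truth"] = _resolve_fields(ground_truth, GROUND_TRUTH_ASSET_KEYS, resolve)
    return out
```

Raising on an unresolved path is a deliberate choice in this sketch: it surfaces the zero-coverage failure mode at load time rather than at evaluation time.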

jaejung-dev requested a review from purvanshi as a code owner April 20, 2026 12:50

jaejung-dev and others added 6 commits April 20, 2026 12:53

Fix CI smoke count and lint import ordering.

e86cb7c

Move image-6 implementation into typography module.

acdbee8

Fix typography import ordering for CI lint.

2cf7ccc

Merge branch 'main' into text-removal-implementation

2073136

Fix CLI benchmark count smoke test.

4858e38

mohitgargai merged commit 74bdff6 into lica-world:main May 5, 2026
12 checks passed

mohitgargai merged 8 commits into lica-world:main from jaejung-dev:text-removal-implementation