Skip to content

Improve EchoNet-LVH conversion script#435

Open
wesselvannierop wants to merge 59 commits into
openh-rffrom
feature/dataset-conversion-cleanup
Open

Improve EchoNet-LVH conversion script#435
wesselvannierop wants to merge 59 commits into
openh-rffrom
feature/dataset-conversion-cleanup

Conversation

@wesselvannierop

@wesselvannierop wesselvannierop commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator
  • Deduplicates code
  • Fixes a bug where only the angle from cone_params was actually used in the cartesian -> polar conversion. Now the tip, theta_angle and r_max are used.
  • All polar images are stored as (600, 600) px, this is easier for dataloading, and sufficient resolution to reconstruct the original frames with high accuracy. Previously, the resolution for storing was kind of arbirary, based on the shape of the cropped scan cone. Now you can use LVHProcessor.scan_convert for exact reconstruction. There is a test for this.
  • The script sorts the files on shape now to reduce jit tracing, and some other tricks to reduce re-tracing.
  • Removed the hard jax dependency, but still favor it.
  • Some documentation improvements.
  • huggingface_hub >=1.2 for non uv install
  • Conversion script runs in ~4h now due to parallel saving/loading/precomputing
  • Added flag to not log warnings for optional fields for saving files. We might want to refine this.
  • Atomic write for File.create, never save an incomplete file.

Example of bug, but I've seen more extreme examples that I cannot quickly find now:

image

Note that the round-trip error is scaled for visualization.

Some useful preprocessing could be to crop the black top from the polar image.

Code for testing
"""
Sanity-check the EchoNet-LVH conversion on this branch against the older
``openh-rf`` branch implementation by side-by-side rendering of intermediate
steps on a handful of AVI files.

For each input AVI this script saves into ``--output_dir``:

1. ``<stem>_scan_cone_visualization.png`` — the cone detection overlay
   produced by :func:`zea.tools.fit_scan_cone.visualize_scan_cone`.
2. ``<stem>_comparison.png`` — a 1x4 panel of:
   original first frame | cropped+centered frame | polar (this branch) |
   polar (openh-rf style).

The only meaningful difference between the two branches is the polar
conversion: the old branch passed ``angle=opening_angle/2`` (symmetric, default
tip/r_max), while this branch builds an asymmetric ``theta_range`` from the
fitted left/right slopes, uses the detected apex as the polar tip and the
circle radius as ``r_max``.
"""

import argparse
import math
import os
from pathlib import Path

os.environ.setdefault("KERAS_BACKEND", "jax")

import matplotlib.pyplot as plt
import numpy as np
from keras import ops

from zea import log
from zea.data.convert.echonetlvh import LVHProcessor
from zea.display import cartesian_to_polar_matrix
from zea.tools.fit_scan_cone import (
    _load_first_frame,
    crop_and_center_cone,
    detect_cone_parameters,
    visualize_scan_cone,
)


def polar_this_branch(cropped_frame, cone_params):
    """Polar conversion as performed on this branch (asymmetric, apex-anchored)."""
    apex_x_in_crop = cone_params["apex_x"] - cone_params["crop_left"]
    cropped_width = cone_params["crop_right"] - cone_params["crop_left"]
    left_padding = max(0, int(cropped_width / 2 - apex_x_in_crop))
    tip_x = apex_x_in_crop + left_padding
    tip_y = cone_params["apex_y"] - cone_params["crop_top"]
    theta_min = -math.atan(cone_params["right_slope"])
    theta_max = -math.atan(cone_params["left_slope"])
    return cartesian_to_polar_matrix(
        cropped_frame,
        tip=(tip_x, tip_y),
        r_max=cone_params["circle_radius"],
        theta_range=(theta_min, theta_max),
    )


def polar_openh_rf(cropped_frame, cone_params):
    """Polar conversion as performed on the openh-rf branch (symmetric, defaults)."""
    angle = cone_params["opening_angle"] / 2
    theta_range = (-angle, angle)
    return cartesian_to_polar_matrix(cropped_frame, theta_range=theta_range)


def polar_direct(original_frame, cone_params):
    """Polar conversion straight from the uncropped frame using apex coordinates."""
    theta_min = -math.atan(cone_params["right_slope"])
    theta_max = -math.atan(cone_params["left_slope"])
    return cartesian_to_polar_matrix(
        original_frame,
        tip=(cone_params["apex_x"], cone_params["apex_y"]),
        r_max=cone_params["circle_radius"],
        theta_range=(theta_min, theta_max),
    )


def save_comparison(
    original, cropped, polar_new, polar_old, polar_no_crop, back_cartesian, out_path
):
    """Save a 1x5 side-by-side comparison figure on a white background."""
    titles = [
        "Original first frame",
        "Cropped + centered",
        "Polar (this branch)\nfrom cropped frame",
        "Polar (openh-rf)\nsymmetric, default tip/r_max",
        "Polar direct (no crop)\napex_x, apex_y, r_max, theta_range",
        "Polar back to cartesian",
        "Error",
    ]
    fig, axes = plt.subplots(1, len(titles), figsize=(25, 6), facecolor="white")
    error = np.abs(cropped - back_cartesian)
    images = [original, cropped, polar_new, polar_old, polar_no_crop, back_cartesian, error]
    for ax, img, title in zip(axes, images, titles):
        if title == "Error":
            vmin = 0
            vmax = 10
        else:
            vmin = None
            vmax = None
        ax.imshow(img, cmap="gray", vmin=vmin, vmax=vmax)
        ax.set_title(title, color="black")
        ax.set_facecolor("white")
        ax.set_xticks([])
        ax.set_yticks([])
        for spine in ax.spines.values():
            spine.set_visible(True)
            spine.set_color("black")
            spine.set_linewidth(1.0)
    fig.tight_layout()
    fig.savefig(out_path, dpi=150, bbox_inches="tight", facecolor="white")
    plt.close(fig)


def process_one(avi_path: Path, output_dir: Path):
    log.info(f"Processing {log.yellow(avi_path)}")
    frame = _load_first_frame(avi_path)

    cone_params = detect_cone_parameters(frame)
    if cone_params is None:
        log.error(f"  cone detection failed for {avi_path}")
        return

    visualize_scan_cone(frame, cone_params, output_dir=output_dir)
    (output_dir / "scan_cone_visualization.png").rename(
        output_dir / f"{avi_path.stem}_scan_cone_visualization.png"
    )

    frame_f32 = ops.cast(frame, "float32")
    cropped = crop_and_center_cone(frame_f32, cone_params, backend=ops)

    polar_new = polar_this_branch(cropped, cone_params)
    polar_old = polar_openh_rf(cropped, cone_params)
    polar_no_crop = polar_direct(frame_f32, cone_params)

    back_cartesian = LVHProcessor.scan_convert(polar_no_crop, cone_params, cropped.shape)

    save_comparison(
        np.asarray(frame),
        np.asarray(cropped),
        np.asarray(polar_new),
        np.asarray(polar_old),
        np.asarray(polar_no_crop),
        np.asarray(back_cartesian),
        output_dir / f"{avi_path.stem}_comparison.png",
    )
    log.info(f"  saved {avi_path.stem}_comparison.png")


def main():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--input_files",
        nargs="+",
        required=True,
        help="One or more AVI file paths to compare.",
    )
    parser.add_argument(
        "--output_dir",
        default="output/lvh_branch_comparison",
        help="Directory where PNGs are written.",
    )
    args = parser.parse_args()

    output_dir = Path(args.output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    for avi in args.input_files:
        process_one(Path(avi), output_dir)

    log.info(f"All comparisons written to {log.yellow(output_dir)}")


if __name__ == "__main__":
    main()

Summary by CodeRabbit

Release Notes

  • Documentation

    • Clarified timeout configuration to accept positive integers only.
  • Dependencies

    • Updated huggingface_hub requirement from >=0.26 to >=1.2.
  • New Features

    • Added parallel processing control for dataset conversion with configurable worker threads.
    • Added polar-to-Cartesian image transformation capability.
    • Added HDF5 frame chunking option for optimized storage.
  • Bug Fixes

    • Improved robustness of dataset converters for various input layouts (EchoNet-LVH, CAMUS, PICMUS, EchoNet).
    • Enhanced automatic detection of extracted datasets.
  • Tests

    • Expanded test coverage for conversion pipelines and image transforms.

@wesselvannierop

Copy link
Copy Markdown
Collaborator Author

(still ironing out some bugs that I encountered while running on the full set)

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
tests/test_display.py (1)

121-172: ⚡ Quick win

Unused angle parameter in test function.

The angle parameter is still defined in the parametrization (line 128) and function signature (line 132), but it's no longer used in the test body after the refactor to use a fixed symmetric theta_range. This creates dead code and could confuse future maintainers.

♻️ Proposed fix to remove unused parameter
 `@pytest.mark.parametrize`(
-    "size, pattern_creator, allowed_error, angle",
+    "size, pattern_creator, allowed_error",
     [
-        ((200, 200), "create_radial_pattern", 0.001, None),
-        ((100, 333), "create_radial_pattern", 0.001, None),
-        ((200, 200), "create_concentric_rings", 0.1, None),
-        ((100, 333), "create_concentric_rings", 0.1, None),
-        ((200, 200), "create_radial_pattern", 0.001, np.deg2rad(30)),
+        ((200, 200), "create_radial_pattern", 0.001),
+        ((100, 333), "create_radial_pattern", 0.001),
+        ((200, 200), "create_concentric_rings", 0.1),
+        ((100, 333), "create_concentric_rings", 0.1),
     ],
 )
 `@backend_equality_check`(decimal=2)
-def test_scan_conversion_and_inverse(size, pattern_creator, allowed_error, angle):
+def test_scan_conversion_and_inverse(size, pattern_creator, allowed_error):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_display.py` around lines 121 - 172, The
test_scan_conversion_and_inverse function has an unused angle parameter in both
the pytest.mark.parametrize decorator and the function signature. Remove the
angle parameter from the parametrize list (which appears in all test case
tuples) and remove it from the function signature, since the test uses a fixed
theta_range value instead of utilizing the angle parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/source/environment.rst`:
- Around line 42-45: The documentation table in docs/source/environment.rst
contains a duplicate entry for ZEA_DOWNLOAD_TIMEOUT and the documented default
value of 60 seconds is incorrect. Remove the duplicate ZEA_DOWNLOAD_TIMEOUT row
shown in lines 42-45, and update the default value in the remaining
ZEA_DOWNLOAD_TIMEOUT entry to 600 to match the actual runtime default used in
zea/data/convert/utils.py line 149. This ensures the documentation accurately
reflects the actual configuration and eliminates the conflicting guidance from
having two entries for the same environment variable.

In `@zea/data/convert/camus.py`:
- Around line 448-456: The function currently returns `src / "database_nifti"`
when found, but when CAMUS_public is extracted as a directory structure, the
patient folders are located under `src / "CAMUS_public" / "database_nifti"`.
Update the logic to check for and return the correct nested path: after finding
`database_nifti` exists at line 448-456 area, verify if it actually contains
patient folders; if not but `CAMUS_public / database_nifti` exists with patient
folders, return that nested path instead. Apply the same fix at line 510 where
patient folder paths are resolved to ensure consistent path handling across both
locations.

In `@zea/data/convert/echonet.py`:
- Around line 455-466: The _resolve_path function returns the wrong value when
unzipping is needed. On line 465, instead of returning the direct result of
unzip(src / zip_name, src), it should return unzip_dir (which is src /
folder_name / "Videos") to match the behavior when the folder already exists on
line 463. This ensures convert_echonet scans the correct Videos directory for
avi files after unzipping.

In `@zea/data/convert/picmus.py`:
- Around line 318-328: The _resolve_path function currently returns the raw src
directory instead of the stable archive_to_download subdirectory after
unzipping, which causes inconsistent output paths for zip-based runs.
Additionally, it only checks for picmus.zip but should also handle
archive_to_download.zip for manual downloads. Modify the function to return
unzip_dir (src/archive_to_download) after calling unzip, and update the zip file
resolution logic to check for both picmus.zip and archive_to_download.zip files
to support both standard and manual download formats.

In `@zea/data/convert/utils.py`:
- Around line 115-117: The zip extraction in the code is vulnerable to path
traversal attacks where a malicious archive could contain member paths with
traversal sequences like `../` to write files outside the destination directory.
Validate each member path before extraction to ensure the resolved absolute path
remains within the destination directory. For each member in zip_ref.namelist(),
normalize and resolve the full path that would result from extraction, then
verify it starts with the destination directory's absolute path. Only extract
members whose resolved paths are safe. Consider using os.path.normpath and
os.path.abspath for path validation, or alternatively manually extract and write
files by reading from the zip archive and writing to validated target paths.

---

Nitpick comments:
In `@tests/test_display.py`:
- Around line 121-172: The test_scan_conversion_and_inverse function has an
unused angle parameter in both the pytest.mark.parametrize decorator and the
function signature. Remove the angle parameter from the parametrize list (which
appears in all test case tuples) and remove it from the function signature,
since the test uses a fixed theta_range value instead of utilizing the angle
parameter.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 30b8bd34-0d52-4ba9-b081-b7b79c32d193

📥 Commits

Reviewing files that changed from the base of the PR and between dd0375b and 973b6cd.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • docs/source/environment.rst
  • pyproject.toml
  • tests/data/test_conversion_scripts.py
  • tests/data/test_data_format.py
  • tests/test_display.py
  • zea/data/convert/__main__.py
  • zea/data/convert/camus.py
  • zea/data/convert/echonet.py
  • zea/data/convert/echonetlvh/README.md
  • zea/data/convert/echonetlvh/__init__.py
  • zea/data/convert/echonetlvh/precompute_crop.py
  • zea/data/convert/picmus.py
  • zea/data/convert/utils.py
  • zea/data/file.py
  • zea/data/file_operations.py
  • zea/data/spec.py
  • zea/display.py
  • zea/scan.py
  • zea/tools/fit_scan_cone.py
💤 Files with no reviewable changes (3)
  • zea/data/convert/echonetlvh/README.md
  • zea/data/convert/echonetlvh/precompute_crop.py
  • zea/data/file_operations.py

Comment thread docs/source/environment.rst Outdated
Comment thread zea/data/convert/camus.py
Comment thread zea/data/convert/echonet.py
Comment thread zea/data/convert/picmus.py
Comment thread zea/data/convert/utils.py
@tristan-deep tristan-deep added this to the v0.1.0 (OpenH-RF) milestone Jun 17, 2026
Comment thread tests/test_display.py
reversed order (matching cartesian_to_polar_matrix's rot90 column ordering, which is what
polar_geometry_from_coords_for_interp returns).
"""
from keras import ops

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either remove this inline imports or add backend_equality_check (if that runs that is preferred).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data format Related to the zea data format saving and loading

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants