Skip to content

EchoXFlow conversion#424

Open
swpenninga wants to merge 20 commits into
openh-rffrom
echoxflow-convert
Open

EchoXFlow conversion#424
swpenninga wants to merge 20 commits into
openh-rffrom
echoxflow-convert

Conversation

@swpenninga

@swpenninga swpenninga commented Jun 10, 2026

Copy link
Copy Markdown
Contributor
  • Merging into openh-rf, not main, to use the new dataformat. (but this is not channel data, its B-mode)
  • Created a huggingface repository v0.1.0 here. It is public which is allowed under cc by nc sa 4.0.
  • The conversion script test had to break the format of the other scripts, since the echoxflow module/package was required, so i broke out of the subprocess CLI run as a workaround.

This is not the entire dataset, it contains some filters. To keep things managable (and high quality), we copy only the 2D (single plane) B-modes:

image

with the following filters:
>30fps, >10frames in the sequence
this avoids some low quality single frames or noise frames that are used for probe positioning for example.

All the metadata that comes with these files is transferred to zea format.

Some interesting statistics about the 1291 different data shapes with these filters:

H × W Count %
718 × 242 1,605 11.8%
723 × 242 721 5.3%
728 × 242 678 5.0%
737 × 242 556 4.1%
727 × 242 485 3.6%
732 × 242 471 3.5%
735 × 242 420 3.1%
723 × 274 276 2.0%
718 × 274 234 1.7%
728 × 274 230 1.7%
W Count %
242 5,420 39.8%
274 1,679 12.3%
210 1,123 8.2%
198 1,042 7.6%
226 801 5.9%

@swpenninga swpenninga added enhancement New feature or request tests Tests data format Related to the zea data format saving and loading labels Jun 10, 2026
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7a34b2f8-64a3-430f-84fd-7d38904db478

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch echoxflow-convert

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 83.00000% with 34 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
zea/data/convert/echoxflow.py 70.52% 19 Missing and 9 partials ⚠️
tests/data/test_conversion_scripts.py 94.17% 5 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@swpenninga swpenninga marked this pull request as ready for review June 11, 2026 08:52
@swpenninga swpenninga requested review from a team and OisinNolan and removed request for a team June 11, 2026 08:52
Comment thread zea/data/convert/echoxflow.py Outdated
@swpenninga

Copy link
Copy Markdown
Contributor Author

fixed now, i will rerun conversion + upload to use the new timestamps

@tristan-deep

tristan-deep commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Testing right now. @swpenninga could you upload the dataset on the HF branch "v0.1.0" instead of "0.1.0"? That is how it is done for the other datasets too now. After you can remove the "0.1.0". Also could you update the embedded zea version to "0.1.0" (by updating the toml and installing again). I see it is now 0.0.11 which contradicts the HF branch version.

@tristan-deep

Copy link
Copy Markdown
Collaborator

@swpenninga I see the images are now saved as (z, x, 1) ? I think (z, x) is also allowed. Have to check with @OisinNolan if that also makes sense with coordinates that are stored alongside, seems that always has to be [x, y, z] in the last axis.

@swpenninga

Copy link
Copy Markdown
Contributor Author

Testing right now. @swpenninga could you upload the dataset on the HF branch "v0.1.0" instead of "0.1.0"? That is how it is done for the other datasets too now. After you can remove the "0.1.0". Also could you update the embedded zea version to "0.1.0" (by updating the toml and installing again). I see it is now 0.0.11 which contradicts the HF branch version.

Running that right now!

@tristan-deep

Copy link
Copy Markdown
Collaborator

Testing right now. @swpenninga could you upload the dataset on the HF branch "v0.1.0" instead of "0.1.0"? That is how it is done for the other datasets too now. After you can remove the "0.1.0". Also could you update the embedded zea version to "0.1.0" (by updating the toml and installing again). I see it is now 0.0.11 which contradicts the HF branch version.

Running that right now!

Nice! Also wouldn't be opposed to putting everything in a dataset folder, so it is easier for people to see the readme / license etc. auxiliary files. Don't need to rerun, can use hugging face checkout to rearrange files maybe as quick fix? Sorry for last minute comment....

@tristan-deep

tristan-deep commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Just used this snippet to test the dataset, added some loose comments above but overall seems working. Really nice addition! This is quite a cool dataset.

import zea

with zea.Dataset("hf://zeahub/echoxflow/", revision="0.1.0", lazy=True) as ds:
    f = ds[20]
    sequence = f.data.image.values[:].astype(np.float32)  # (n_frames, z, x, y)
    sequence_time = np.asarray(f.metadata.bmode_timestamps.timestamps[:])
    coordinates = np.asarray(f.data.image.coordinates[:])  # (H, W, 3): [x, y, z] in metres
    ecg = np.asarray(f.metadata.ecg.samples[:])
    ecg_time = np.asarray(f.metadata.ecg.timestamps[:])

    sequence = sequence[..., 0]  # (n_frames, z, x)
echoxflow

And full example below:

Full snippet
import numpy as np
from PIL import Image, ImageDraw

import zea
from zea.display import scan_convert
from zea.io_lib import save_video

with zea.Dataset("hf://zeahub/echoxflow/", revision="0.1.0", lazy=True) as ds:
    f = ds[20]
    sequence = f.data.image.values[:].astype(np.float32)  # (n_frames, z, x, y)
    sequence_time = np.asarray(f.metadata.bmode_timestamps.timestamps[:])
    coordinates = np.asarray(f.data.image.coordinates[:])  # (H, W, 3): [x, y, z] in metres
    ecg = np.asarray(f.metadata.ecg.samples[:])
    ecg_time = np.asarray(f.metadata.ecg.timestamps[:])

    sequence = sequence[..., 0]  # (n_frames, z, x)

    # FPS from actual frame timestamps
    fps = float(1.0 / np.mean(np.diff(sequence_time)))

    # Derive polar ranges from stored per-pixel Cartesian coordinates.
    # The first row sits at the apex (rho=0), so all coords are [0,0,0] there.
    # Use the last (deepest) row to get a valid theta range.
    coords = coordinates[0] if coordinates.ndim == 4 else coordinates  # (z, x, 3)
    x, z = coords[..., 0], coords[..., 2]
    W_mid = coords.shape[1] // 2
    rho_range = (
        float(np.sqrt(x[0, W_mid] ** 2 + z[0, W_mid] ** 2)),
        float(np.sqrt(x[-1, W_mid] ** 2 + z[-1, W_mid] ** 2)),
    )
    theta_range = (
        float(np.arctan2(x[-1, 0], z[-1, 0])),
        float(np.arctan2(x[-1, -1], z[-1, -1])),
    )

    # Scan convert: polar → Cartesian for the full sequence in one pass
    sequence_sc, _ = scan_convert(
        sequence,
        rho_range=rho_range,
        theta_range=theta_range,
        with_batch_dim=True,
    )
    sequence_sc = np.clip(np.asarray(sequence_sc), 0, 255).astype(np.uint8)  # (N, H_sc, W_sc)

    # --- ECG overlay strip ---
    N, H_sc, W_sc = sequence_sc.shape
    strip_h = 80
    t_total = float(ecg_time[-1])

    # Resample ECG onto the frame width (handles irregular ECG timestamps)
    t_uniform = np.linspace(0.0, t_total, W_sc)
    ecg_resampled = np.interp(t_uniform, ecg_time, ecg)
    ecg_min, ecg_max = ecg_resampled.min(), ecg_resampled.max()
    ecg_norm = (ecg_resampled - ecg_min) / (ecg_max - ecg_min + 1e-8)

    # Pre-render the ECG waveform as a PIL strip (red on black)
    strip_base = Image.new("RGB", (W_sc, strip_h), (0, 0, 0))
    draw = ImageDraw.Draw(strip_base)
    pts = [(j, strip_h - 1 - int(ecg_norm[j] * (strip_h - 1))) for j in range(W_sc)]
    draw.line(pts, fill=(220, 30, 30), width=2)
    strip_base = np.array(strip_base)

    # Composite each frame: scan-converted B-mode (→ RGB) + ECG strip with time cursor
    frames = []
    for i in range(N):
        frame_rgb = np.repeat(sequence_sc[i, :, :, np.newaxis], 3, axis=-1)

        strip = strip_base.copy()
        x_cursor = int(sequence_time[i] / t_total * (W_sc - 1))
        strip[:, max(0, x_cursor - 1) : x_cursor + 2] = (220, 30, 30)  # red cursor

        frames.append(np.concatenate([frame_rgb, strip], axis=0))

    save_video(np.array(frames), "echoxflow.gif", fps=fps)
echoxflow

@swpenninga

Copy link
Copy Markdown
Contributor Author

branch was changed to v0.1.0 now, not sure how to delete the old one

@OisinNolan

Copy link
Copy Markdown
Collaborator

@swpenninga I see the images are now saved as (z, x, 1) ? I think (z, x) is also allowed. Have to check with @OisinNolan if that also makes sense with coordinates that are stored alongside, seems that always has to be [x, y, z] in the last axis.

It is allowed by the spec to have a channel dim in your images, which can be 1 (it is an empty dim, but the spec allows it.) The coordinates also look correct -- they just need to match the spatial dims
cc @tristan-deep

@tristan-deep

Copy link
Copy Markdown
Collaborator

@swpenninga I see the images are now saved as (z, x, 1) ? I think (z, x) is also allowed. Have to check with @OisinNolan if that also makes sense with coordinates that are stored alongside, seems that always has to be [x, y, z] in the last axis.

It is allowed by the spec to have a channel dim in your images, which can be 1 (it is an empty dim, but the spec allows it.) The coordinates also look correct -- they just need to match the spatial dims cc @tristan-deep

Right, but would (z, x) simply also be allowed? Since the data is 2D. Or better to be consistent and always do (z, x, y) even if y=1? Since (z, x) seems allowed from the docs I thought it would be a better option for this dataset.

@OisinNolan OisinNolan left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good @swpenninga , nice work! This was a cool dataset to test the metadata spec with -- left one comment on that but otherwise I think it's good to go

Comment thread zea/data/convert/echoxflow.py Outdated
Comment on lines +121 to +124
metadata["bmode_timestamps"] = {
"samples": ts,
"timestamps": ts - ts[0],
"start_time_offset": np.float32(ts[0]),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this part a little confusing -- we're storing here a signal whose values are absolute timestamps and whose timestamps are relative timestamps

I think this happens because (i) we want to record the timestamps of the B-mode frames, but (ii) we require a custom signal to have samples

I think there are two options here:
(i) We add an optional timestamp array to Map (non-breaking), where it's simply a 1D float array whose length needs to match the first dim of the Map.values (preferred)
(ii) We allow custom signals to be just Signal instead of SignalND

what do you think?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think option one would be perfect. Als found the current way of saving confusing when working on my example snippet.

@OisinNolan

Copy link
Copy Markdown
Collaborator

@swpenninga I see the images are now saved as (z, x, 1) ? I think (z, x) is also allowed. Have to check with @OisinNolan if that also makes sense with coordinates that are stored alongside, seems that always has to be [x, y, z] in the last axis.

It is allowed by the spec to have a channel dim in your images, which can be 1 (it is an empty dim, but the spec allows it.) The coordinates also look correct -- they just need to match the spatial dims cc @tristan-deep

Right, but would (z, x) simply also be allowed? Since the data is 2D. Or better to be consistent and always do (z, x, y) even if y=1? Since (z, x) seems allowed from the docs I thought it would be a better option for this dataset.

@tristan-deep yeah indeed -- (z, x) is also allowed and maybe preferred here, unless there's a good reason to keep the channel dim (e.g. maybe it helps distinguish between different types of image in the dataset or something). What do you think @swpenninga ?

@tristan-deep

tristan-deep commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Testing right now. @swpenninga could you upload the dataset on the HF branch "v0.1.0" instead of "0.1.0"? That is how it is done for the other datasets too now. After you can remove the "0.1.0". Also could you update the embedded zea version to "0.1.0" (by updating the toml and installing again). I see it is now 0.0.11 which contradicts the HF branch version.

Basically just git, but using pointers so you don't have to download the entire dataset (using GIT_LFS_SKIP_SMUDGE=1).

# clone repo, but using pointers only
export GIT_LFS_SKIP_SMUDGE=1

git clone https://huggingface.co/datasets/zeahub/echoxflow
cd echoxflow
# see branches
git branch -a
# delete the local branch 
git branch --delete 0.1.0
# delete remote branch
git push origin --delete 0.1.0
# check
git log --oneline

Make sure your HF_TOKEN is set to write as well, but probably already is given that is also necessary for the Python API.

@tristan-deep tristan-deep added this to the v0.1.0 (OpenH-RF) milestone Jun 17, 2026
@swpenninga

Copy link
Copy Markdown
Contributor Author

@OisinNolan @tristan-deep i added the timestamp to the Map, and i removed the trailing 1 dim to get (z, x) in the last commit. Let me know if I can start (hopefully the last) conversion and upload to hf!

@tristan-deep

Copy link
Copy Markdown
Collaborator

@OisinNolan @tristan-deep i added the timestamp to the Map, and i removed the trailing 1 dim to get (z, x) in the last commit. Let me know if I can start (hopefully the last) conversion and upload to hf!

For me all good! Changes look good. Will test after conversion again! (or if you already have a sample available)

@OisinNolan OisinNolan left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conversion lgtm! Just one remaining issue regarding the update to the spec

Comment thread zea/data/spec.py
the shape is ``(*values.shape[:-1], 3)``. The last axis holds ``[x, y, z]``.
The leading ``n_frames`` axis may be omitted to broadcast one coordinate grid
across all frames.
timestamps: Optional per-frame acquisition timestamps in seconds, shape ``(n_frames,)``.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice addition!
I'm just thinking -- maybe the definition here should match that of timestamps on the Signal and ProbePose spec -- there we generally define them as relative to the first transmit event of the acquisition. This would mean we could more easily compare timestamps across these different sources.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would also then need a start_time_offset like in the other cases, with some validation that it's also set if we specify timestamps, and that timestamps are >0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data format Related to the zea data format saving and loading enhancement New feature or request tests Tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants