replaced scipy.savemat with hdf5storage.savemat by behinger · Pull Request #16 · jackz314/eeglabio

behinger · 2025-06-05T07:39:09Z

Hi!
this PR would fix #10 by defaulting to saving in matlabs v7.3 format. That format is available since 2006 - maybe it is time to adopt it.

It is breaking (thus version to 0.2) in cases people rely on workflows that do not have pymatreader installed´

I have not run docs / tests, because i'm not super familiar with testing in python. I'm hoping there is a CI on github 🙈.

I didnt know where to put a changelog. Also I hope the PR is fine like this. Please feel free to change it if needed!
Cheers, Bene

PS: I havent checked if scipy is needed for something else than savemat, but you would probably know :)

larsoner · 2025-06-05T15:20:47Z

Can we use pymatreader instead?

Also probably worth just having a backward compatible fmt="v5" that could be set to "v7" or similar. I'd rather do that than introduce a breaking change. If we want a breaking change then it would technically ideally have a major version bump to 1.0

behinger · 2025-06-05T15:38:29Z

pymatreader only reads mat files, not saves them, as far as I could see. The problem of fmt= is, that all downstream packages would have to expose that option (mne, mne-bids, others?). Maybe that is not a big issue, you know better :) I'm only familiar with semver as used in Julia, there for 0 versions, bumping major indicates breaking. Am 5. Juni 2025 17:21:08 MESZ schrieb Eric Larson ***@***.***>:

…

larsoner left a comment (jackz314/eeglabio#16) Can we use pymatreader instead? Also probably worth just having a backward compatible `fmt="v5"` that could be set to `"v7"` or similar. I'd rather do that than introduce a breaking change. If we want a breaking change then it would technically ideally have a major version bump to 1.0 -- Reply to this email directly or view it on GitHub: #16 (comment) You are receiving this because you authored the thread. Message ID: ***@***.***>

larsoner · 2025-06-08T21:15:28Z

I don't think it would be too bad to add an argument to mne-bids and MNE-Python at least. But I can live with a backward compat breaking fmt="v7" for example and then we only add to MNE-BIDS and MNE-Python if needed. I think for example the existing MNE-Python test would probably work okay...

https://github.com/mne-tools/mne-python/blob/4033b3a339c1226e248953eeee30e7488d50659f/mne/export/tests/test_export.py#L92

behinger · 2025-12-19T15:03:52Z

sorry - I havent really looked into this any further.

One other thing I noticed, that probably was not a problem before if max filesize is <2gb - is the memory use of eeglabio.

Especially two lines make it really expensive:

data = data * 1e6  # convert to microvolts
# [...]
data = data.astype(precision)

Both lines make fresh copies, thus requiring 2-3x the memory (I dont exactly know when python garbage collects, but e.g. I couldnt save a 10gb EEG file with 38GB of ram - ~3-5gb system memory, but already loading requires 20GB, dunno where the missing ~5GB come from)

So I changed it to the following (potentially problematic)

data *= 1e6  # convert to microvolts
# [...]
data = data.astype(precision, copy=False)

The second change is easy and cheap and should be done in any case. The first one is tricky, because I think it is an inline change, thus if somebody does something like

data = raw.get_data()
export_set(data)
data.max()

their max might be changed. But in MNE i have only seen the calls like this:

export_set(data=raw.get_data()).

generally, I'm wondering if the rescaling should really be done in eeglabio, or rather before, maybe optional? Then one could change MNE & eeglabio at the same time - a bit tricky business with which I dont really know how to coordinate, so I dont really feel comfortable to do right now.

For me locally, I changed it to the above solution and now can save all my datasets with slack-RAM free :)

larsoner · 2025-12-19T19:33:53Z

generally, I'm wondering if the rescaling should really be done in eeglabio, or rather before, maybe optional? Then one could change MNE & eeglabio at the same time - a bit tricky business with which I dont really know how to coordinate, so I dont really feel comfortable to do right now.

I think the way to do this would be to add a *, scale_data=True in this repo, which means "the data I'm passing are in V and need to be scaled by eeglabio before writing". Then in MNE-Python we use inspect to look to see if this scale_data kwarg exists in eeglabio, and if it does, we use the more memory efficient codepath

data = raw.get_data()
kwargs = dict()
if "scale_data" in inspect.getfullargspec(...)...:
    data *= 1e6
    kwargs["scale_data"] = False
export_set(data, **kwargs)

replaced scipy.savemat with hdf5storage.savemat

e926af2

behinger mentioned this pull request Mar 5, 2026

make some data copies optional #24

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

replaced scipy.savemat with hdf5storage.savemat#16

replaced scipy.savemat with hdf5storage.savemat#16
behinger wants to merge 1 commit into
jackz314:mainfrom
behinger:feature-hdf5

behinger commented Jun 5, 2025

Uh oh!

larsoner commented Jun 5, 2025

Uh oh!

behinger commented Jun 5, 2025 via email

Uh oh!

larsoner commented Jun 8, 2025

Uh oh!

behinger commented Dec 19, 2025

Uh oh!

larsoner commented Dec 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

behinger commented Jun 5, 2025

Uh oh!

larsoner commented Jun 5, 2025

Uh oh!

behinger commented Jun 5, 2025 via email

Uh oh!

larsoner commented Jun 8, 2025

Uh oh!

behinger commented Dec 19, 2025

Uh oh!

larsoner commented Dec 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants