Skip to content

replaced scipy.savemat with hdf5storage.savemat#16

Open
behinger wants to merge 1 commit into
jackz314:mainfrom
behinger:feature-hdf5
Open

replaced scipy.savemat with hdf5storage.savemat#16
behinger wants to merge 1 commit into
jackz314:mainfrom
behinger:feature-hdf5

Conversation

@behinger

@behinger behinger commented Jun 5, 2025

Copy link
Copy Markdown

Hi!
this PR would fix #10 by defaulting to saving in matlabs v7.3 format. That format is available since 2006 - maybe it is time to adopt it.

It is breaking (thus version to 0.2) in cases people rely on workflows that do not have pymatreader installed´

I have not run docs / tests, because i'm not super familiar with testing in python. I'm hoping there is a CI on github 🙈.

I didnt know where to put a changelog. Also I hope the PR is fine like this. Please feel free to change it if needed!
Cheers, Bene

PS: I havent checked if scipy is needed for something else than savemat, but you would probably know :)

@larsoner

larsoner commented Jun 5, 2025

Copy link
Copy Markdown
Collaborator

Can we use pymatreader instead?

Also probably worth just having a backward compatible fmt="v5" that could be set to "v7" or similar. I'd rather do that than introduce a breaking change. If we want a breaking change then it would technically ideally have a major version bump to 1.0

@behinger

behinger commented Jun 5, 2025 via email

Copy link
Copy Markdown
Author

@larsoner

larsoner commented Jun 8, 2025

Copy link
Copy Markdown
Collaborator

I don't think it would be too bad to add an argument to mne-bids and MNE-Python at least. But I can live with a backward compat breaking fmt="v7" for example and then we only add to MNE-BIDS and MNE-Python if needed. I think for example the existing MNE-Python test would probably work okay...

https://github.com/mne-tools/mne-python/blob/4033b3a339c1226e248953eeee30e7488d50659f/mne/export/tests/test_export.py#L92

@behinger

Copy link
Copy Markdown
Author

sorry - I havent really looked into this any further.


One other thing I noticed, that probably was not a problem before if max filesize is <2gb - is the memory use of eeglabio.

Especially two lines make it really expensive:

data = data * 1e6  # convert to microvolts
# [...]
data = data.astype(precision)

Both lines make fresh copies, thus requiring 2-3x the memory (I dont exactly know when python garbage collects, but e.g. I couldnt save a 10gb EEG file with 38GB of ram - ~3-5gb system memory, but already loading requires 20GB, dunno where the missing ~5GB come from)

So I changed it to the following (potentially problematic)

data *= 1e6  # convert to microvolts
# [...]
data = data.astype(precision, copy=False)

The second change is easy and cheap and should be done in any case. The first one is tricky, because I think it is an inline change, thus if somebody does something like

data = raw.get_data()
export_set(data)
data.max() 

their max might be changed. But in MNE i have only seen the calls like this:

export_set(data=raw.get_data()).

generally, I'm wondering if the rescaling should really be done in eeglabio, or rather before, maybe optional? Then one could change MNE & eeglabio at the same time - a bit tricky business with which I dont really know how to coordinate, so I dont really feel comfortable to do right now.

For me locally, I changed it to the above solution and now can save all my datasets with slack-RAM free :)

@larsoner

Copy link
Copy Markdown
Collaborator

generally, I'm wondering if the rescaling should really be done in eeglabio, or rather before, maybe optional? Then one could change MNE & eeglabio at the same time - a bit tricky business with which I dont really know how to coordinate, so I dont really feel comfortable to do right now.

I think the way to do this would be to add a *, scale_data=True in this repo, which means "the data I'm passing are in V and need to be scaled by eeglabio before writing". Then in MNE-Python we use inspect to look to see if this scale_data kwarg exists in eeglabio, and if it does, we use the more memory efficient codepath

data = raw.get_data()
kwargs = dict()
if "scale_data" in inspect.getfullargspec(...)...:
    data *= 1e6
    kwargs["scale_data"] = False
export_set(data, **kwargs)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Support writing EEGLAB to mat files of version 7.3 and higher (=HDF5)

2 participants