
[Bug]: Extremely long load time during a call to .toTable() after reading NWB file #787

@maxdougherty

Description

What happened?

matnwb: 2.10.0

I noticed significantly longer load times when executing nwb.units.toTable() once waveforms were added to the units table. When the nwb object and the units table are first initialized, the call takes under 10 seconds. But after saving the .nwb file and reading it back, the same call to nwb.units.toTable() takes 20-30 minutes to complete.

I was able to circumvent this issue in 2.9.0 by setting waveforms to an ObjectView pointing to a SpikeEventSeries stored in nwb.analysis. This let me reference the waveforms directly from the units table while avoiding the long call times. However, in 2.10.0 a check was added in this commit to +types/+core/Units.m that requires waveforms to be numeric. While I recognize that this new check helps enforce the schema, the >200x slowdown of the nwb.units.toTable() call when waveforms, waveforms_index, and waveforms_index_index are used properly is difficult to work with.
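
For readers unfamiliar with the doubly indexed layout, here is a minimal sketch in plain MATLAB (no matnwb calls) of how waveforms_index_index, waveforms_index, and waveforms relate. It assumes the HDMF ragged-array convention in which each index column stores cumulative end offsets into its target. The toy values and the variable names unitId, firstSpike, lastSpike, firstRow, and lastRow are hypothetical, and this is not a description of how toTable() resolves the columns internally.

```matlab
% Toy data (hypothetical values): 5 waveforms of 3 samples each, one per row.
waveforms = reshape(1:15, 3, 5)';

% One cumulative end offset per spike; here each spike owns exactly one waveform row.
waveforms_index = [1 2 3 4 5];

% One cumulative end offset per unit: unit 1 owns spikes 1-2, unit 2 owns spikes 3-5.
waveforms_index_index = [2 5];

unitId = 2;

% Level 1: which spikes belong to this unit?
if unitId == 1
    firstSpike = 1;
else
    firstSpike = waveforms_index_index(unitId - 1) + 1;
end
lastSpike = waveforms_index_index(unitId);

% Level 2: which waveform rows belong to each of those spikes?
unitWaveforms = cell(1, lastSpike - firstSpike + 1);
for iSpike = firstSpike:lastSpike
    if iSpike == 1
        firstRow = 1;
    else
        firstRow = waveforms_index(iSpike - 1) + 1;
    end
    lastRow = waveforms_index(iSpike);
    unitWaveforms{iSpike - firstSpike + 1} = waveforms(firstRow:lastRow, :);
end

disp(numel(unitWaveforms));   % number of spikes found for unit 2
```

In the reproduction snippet the same structure holds at full scale: waveforms_index is simply 1:N (one waveform per spike) and waveforms_index_index holds the running spike count per unit.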

I have included a standalone snippet that reliably reproduces this bug.

Steps to Reproduce

%% Testing the nwbRead .toTable() time
% This snippet tests the .toTable() execution times before saving and after
% reading the same .nwb file. 
clear all;

%% Create the test NWB object
% Test NWBFile
nwb = NwbFile( ...
    'session_description', 'mouse in open exploration',...
    'identifier', 'Mouse5_Day3', ...
    'session_start_time', datetime(2018, 4, 25, 2, 30, 3, 'TimeZone', 'local') ...
);

% Test Subject
subject = types.core.Subject( ...
    'subject_id', '005', ...
    'age', '25', ...
    'description', 'subject 5', ...
    'species', 'Homo sapiens', ...
    'sex', 'M' ...
);
nwb.general_subject = subject;

% Create the device model
device_model = types.core.DeviceModel( ...
     'manufacturer', 'Array Technologies', ...
     'model_number', 'PRB_1_4_0480_123', ...
     'description', 'Neurovoxels 0.99 - A 12-channel array with 4 shanks and 3 channels per shank' ...
);
% Add device model to nwb object
nwb.general_devices_models.set('Neurovoxels 0.99', device_model);

% Create the device
device = types.core.Device(...
    'description', 'A 12-channel array with 4 shanks and 3 channels per shank', ...
    'serial_number', '1234567890', ...
    'model', device_model ...
);
% Add device to nwb object
nwb.general_devices.set('array', device);

% Create the ElectrodesTable
numShanks = 4;
numChannelsPerShank = 8;
numChannels = numShanks * numChannelsPerShank;

electrodesDynamicTable = types.core.ElectrodesTable(...
    'colnames', {'location', 'group', 'group_name', 'label'}, ...
    'description', 'all electrodes');
 
% Create the electrodeGroups
for iShank = 1:numShanks
    shankGroupName = sprintf('shank%d', iShank);
    electrodeGroup = types.core.ElectrodeGroup( ...
        'description', sprintf('electrode group for %s', shankGroupName), ...
        'location', 'brain area', ...
        'device', types.untyped.SoftLink(device) ...
    );
    
    nwb.general_extracellular_ephys.set(shankGroupName, electrodeGroup);
    for iElectrode = 1:numChannelsPerShank
        electrodesDynamicTable.addRow( ...
            'location', 'unknown', ...
            'group', types.untyped.ObjectView(electrodeGroup), ...
            'group_name', shankGroupName, ...
            'label', sprintf('%s-electrode%d', shankGroupName, iElectrode));
    end
end
nwb.general_extracellular_ephys_electrodes = electrodesDynamicTable;

% Generate the electrode table region
electrode_table_region = types.hdmf_common.DynamicTableRegion( ...
    'table', types.untyped.ObjectView(electrodesDynamicTable), ...
    'description', 'all electrodes', ...
    'data', (0:length(electrodesDynamicTable.id.data)-1)');

% Adding simulated units
num_cells = 172;
spike_times = cell(1, num_cells);
waveform_row = 0;
waveforms_index_index_data = [];
for iCell = 1:num_cells
    spike_times{iCell} = sort( rand(1, randi([500, 2000])), 'ascend');
    waveform_row = waveform_row + size(spike_times{iCell},2);
    waveforms_index_index_data(end+1) = waveform_row;
end

% Convert spike times into a ragged array
[spike_times_vector, spike_times_index] = util.create_indexed_column(spike_times);

% Reformat waveforms and waveform indices
waveforms_all = rand(64, waveform_row)';
waveforms_index_data = uint64((1:size(waveforms_all,1))');
waveforms_index_index_data = uint64(waveforms_index_index_data');


% Raw waveforms
waveforms = types.hdmf_common.VectorData( ...
    'data',        waveforms_all', ...
    'description', 'Spike waveforms; each row is one spike (single electrode)' ...
    );
% Waveform index: one value per spike (assumes one electrode per spike)
waveforms_index = types.hdmf_common.VectorIndex( ...
    'data',        waveforms_index_data', ...
    'target',      types.untyped.ObjectView(waveforms), ...
    'description', 'Index into waveforms; one value per spike event' ...
    );
% Waveform index index: associates waveforms with units
waveforms_index_index = types.hdmf_common.VectorIndex( ...
    'data',        waveforms_index_index_data', ...
    'target',      types.untyped.ObjectView(waveforms_index), ...
    'description', 'Index into waveforms_index; one value per unit' ...
    );

% Initialize the units table
nwb.units = types.core.Units( ...
    'colnames', {'spike_times','waveforms'}, ...
    'description', 'units table', ...
    'spike_times', spike_times_vector, ...
    'spike_times_index', spike_times_index, ...
    'waveforms', waveforms, ...
    'waveforms_index', waveforms_index, ...
    'waveforms_index_index', waveforms_index_index ...
);

%% TESTING
% Run toTable() before saving the .nwb file
tic; nwb.units.toTable(); fprintf('Time to generate units table BEFORE saving: %4.2f seconds\n', toc);

% Save the .nwb file
nwbExport(nwb, 'nwbtest.nwb');

% Read the .nwb file
nwb_test = nwbRead('nwbtest.nwb');

% Run toTable() after reading the saved .nwb file
tic; nwb_test.units.toTable(); fprintf('Time to generate units table AFTER saving and reading: %4.2f seconds\n', toc);

Error Message

Time to generate units table BEFORE saving: 6.20 seconds
Time to generate units table AFTER saving and reading: 1792.40 seconds

Operating System

Windows

Matlab Version

2024b
