Integrate tools for final analysis #211

Open
judewerth wants to merge 30 commits into dj-sciops:main from judewerth:integrate

Conversation

@judewerth
Contributor

This is a big pull request; I'll send an email with a detailed PowerPoint on Monday. Essentially, I've integrated all of the code from the past few years into the pipeline in an organized way, and I've made notebooks to showcase the new analysis. EXPLORE_single_session.ipynb contains an overview of everything, while the specific CREATE_ notebooks have functionality for insertion as well as plots.

The largest problem in this pull request is pyproject.toml: I've added the new libraries I used, but for the ones that require a GitHub link I only pasted the link to the main repository. The others appear to have a DataJoint-specific repository, so that will need to be resolved.

I am happy to meet and discuss these changes and what needs to be done to get this approved. Once this request is approved I can start processing the analysis for our upcoming paper!

@judewerth
Contributor Author

@MilagrosMarin I realized you may not have gotten the notification when I sent this pull request. Please let me know your thoughts on this. Thank you!

@MilagrosMarin
Contributor

Hi @judewerth, thanks for putting this together! It's great to see all the analysis work from the past months integrated into the DataJoint framework. The presentation was very helpful for understanding the overall architecture.

This is a large PR (20K+ additions, 15 files, 7 new table families), so it will take some time to review thoroughly. I'll leave specific comments here as I go. Thanks!

@MilagrosMarin
Contributor

MilagrosMarin commented Mar 20, 2026

Comment 1: SpikeInterface import path
The change in mua.py reverts the import from spikeinterface.extractors.extractor_classes back to spikeinterface.extractors.extractorlist. This was already addressed in V0.18.0 (see CHANGELOG): extractorlist was renamed to extractor_classes starting from SpikeInterface 0.103.0, and our element-array-ephys repository points SI to git main (version 0.103+).

Additionally, the new LongitudinalSpectralAnalysis class in analysis.py still uses the old extractorlist path.

Both should use extractor_classes (the current module name), consistent with what's on main and in the element-array-ephys:

from spikeinterface.extractors.extractor_classes import (
    recording_extractor_full_dict,
)

You're likely running an older SI locally; updating to 0.103+ should align things.

@MilagrosMarin
Contributor

Comment 2: FrameSession frame_param_idx as foreign key
In frame.py, FrameSession declares frame_param_idx: int with a comment saying "Reference to TimeFrameParamset". Did you intend this to be a foreign key? If so, you can replace it with the -> notation:

 definition = """
 -> culture.Experiment
 start_boundary     : datetime
 end_boundary       : datetime
 -> TimeFrameParamset
 """

This way, DataJoint enforces that the value exists in the lookup table and shows the dependency in the schema graph. FrameAnalysis.make() already fetches from (TimeFrameParamset & key), so the foreign key would make that contract explicit.

@MilagrosMarin
Contributor

Comment 3: find_active_frames — random fallback issues (with evidence from existing data)
In frame.py, when find_peaks doesn't find enough peaks for num_frames, the fallback fills remaining slots with random indices. Looking at the existing ActiveTimeFrames data, this fallback has already triggered: there are many entries with frame_firing_rate = 0.0 (e.g., O10/param_idx=5 has 3, O12/param_idx=5 has 20+, O13/param_idx=5 has 15+). These are "active" time frames that contain zero activity.

A few specific concerns:

  1. Overlapping frames: O12/param_idx=5 has frames starting at 08:32 and 08:34 — only 2 minutes apart despite min_per_frame=5. The overlap check at line 144 doesn't work correctly when rand_idx < min_per_frame because np.arange produces negative values that won't match in np.isin.
  2. Potential infinite loop: If the recording window doesn't have enough space for num_frames non-overlapping windows, the while loop never terminates. Unlikely with the current 2-day windows but could happen with shorter boundaries.
  3. Scientifically: A table called ActiveTimeFrames shouldn't contain windows with zero firing. Downstream analyses (burst detection, coherence, coupling) would be processing silence.

The simplest fix might be to skip the random fallback entirely; just return the peaks that were found. Downstream code already iterates over whatever ActiveTimeFrames entries exist, so fewer frames would just mean fewer analyses. Do you need a fixed count of frames for a specific reason?
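The np.isin failure mode in point 1 can be reproduced in isolation. This is a minimal reconstruction, assuming the check builds a look-back window of min_per_frame indices with np.arange (variable names are illustrative, not the actual frame.py code):

```python
import numpy as np

min_per_frame = 5
rand_idx = 2  # rand_idx < min_per_frame triggers the bug

# Look-back window as described above: part of it goes negative.
window = np.arange(rand_idx - min_per_frame, rand_idx)  # [-3 -2 -1  0  1]

# Negative values can never equal real (non-negative) sample indices, so the
# effective exclusion zone silently shrinks from 5 indices to 2.
valid = window[window >= 0]
print(window, valid)  # [-3 -2 -1  0  1] [0 1]
```

Any existing frame occupying only indices that fall in the truncated part of the window goes undetected, which is consistent with the 2-minute-apart frames observed in O12/param_idx=5.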

@MilagrosMarin
Contributor

Comment 4: FrameAnalysis.make() — validation check and schema mismatch

Two things to flag here:

  1. The validation check (lines 66-70) compares mismatched counts:
  spike_rates, start_times, channel_ids = (mua.MUASpikes.Channel & ...).fetch(...)
  num_files = len(ephys.EphysRawFile & ...)
  if len(spike_rates) != num_files:
      raise ValueError(...)

spike_rates comes from MUASpikes.Channel (32 entries per file, one per channel), while num_files counts EphysRawFile entries (1 per file). This would always raise the error since the counts are ~32x apart. If the intent is to verify all files have been processed, the comparison should be against MUASpikes (parent table).
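The mismatch is easy to see with toy numbers (a sketch; the 32-channels-per-file ratio is taken from the description above, and the variable names are illustrative):

```python
# Toy model of the validation check: each raw file contributes one MUASpikes
# (parent) row and 32 MUASpikes.Channel (part) rows.
num_files = 10
num_parent_rows = num_files        # what len(mua.MUASpikes & key) would count
num_channel_rows = num_files * 32  # what len(spike_rates) actually counts

assert num_channel_rows != num_files   # the check as written always fires
assert num_parent_rows == num_files    # comparing to the parent expresses the intent
```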

This check hasn't actually run yet; the existing FrameAnalysis data in the database was populated from an earlier version of the code (likely from the work in PR#196).

  2. Schema mismatch with existing database table:
    The FrameAnalysis table in the database has num_processed_sessions and num_available_files columns that don't exist in this PR's definition. If this PR merges and the code tries to interact with the existing table, there will be a mismatch. This will need either a table migration or a drop-and-repopulate. Same question applies to any other tables that may have changed between your earlier development code and this PR.

Can you check which tables in the frame schema need to be reconciled before merging?

@MilagrosMarin
Contributor

Comment 5: pyproject.toml — new dependencies
The new dependencies don't have version constraints:
"bottleneck",
"neo",
"quantities",
"elephant",

Adding at least the minimum versions you've tested (e.g., "elephant>=0.12.0") would help ensure others can reproduce the same behavior.

Also, you mentioned in the PR description that some libraries, such as specparam and tensorpac, may need DataJoint-specific forks. Could you clarify which ones need to be resolved before merging?

Separately, the EXPLORE_single_session.ipynb notebook imports diptest (from diptest import diptest), which isn't listed as a dependency. Should it be added, or is that notebook meant to be run separately?
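For reference, minimum pins and direct git references can be mixed in the same dependency list. A hypothetical pyproject.toml fragment (the version numbers are examples, not the tested ones, and the git line is a placeholder pattern rather than a vetted URL):

```toml
dependencies = [
    "bottleneck>=1.3",
    "neo>=0.12",
    "quantities>=0.14",
    "elephant>=0.12.0",
    # Libraries without a suitable PyPI release can point at a git repo,
    # ideally pinned to a commit:
    # "somelib @ git+https://github.com/org/somelib.git@<commit>",
]
```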

@MilagrosMarin
Contributor

Comment 6: STTFA.make() — silent skip on low spike count
In analysis.py, STTFA.make() returns without inserting when spike_count < 10:

  if spike_count < min_spikes:
      return

Currently there are 363 STTFA entries out of 241,490 keys in key_source (~0.15%). Most electrodes probably don't have enough spikes for this analysis, which means populate() would re-attempt those keys every time it runs, fetching MUA data and checking the count, only to skip again.

Two options to consider:

  • Insert a row with spike_count=0 and empty arrays so the key is marked as done, and downstream code can filter on spike_count > 0
  • Filter low-spike electrodes out in key_source itself, before make() is called — though this might be complex given the MUA data structure

Not urgent since the table is small now, but worth thinking about before scaling to all organoids.
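The first option can be sketched as follows. This is a minimal sketch with illustrative attribute names, not the actual STTFA definition:

```python
import numpy as np

MIN_SPIKES = 10

def sttfa_row(key, spike_times):
    """Build the row STTFA.make() would insert for one key. Below MIN_SPIKES
    we still return a row (spike_count=0, empty result) so populate() marks
    the key as done; downstream code filters on spike_count > 0."""
    if len(spike_times) < MIN_SPIKES:
        return {**key, "spike_count": 0, "sttfa": np.array([])}
    # Placeholder for the real spike-triggered analysis:
    return {**key, "spike_count": len(spike_times), "sttfa": np.asarray(spike_times)}

low = sttfa_row({"electrode_id": 7}, [0.01, 0.02])
print(low["spike_count"])  # 0 -- key is marked done instead of re-attempted
```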

@MilagrosMarin
Contributor

Comment 7: repeated probe_type lookup pattern (minor)
The same pattern for looking up probe type and mapping channels to electrodes appears 5 times across frame.py, mua.py, and analysis.py. Not blocking, but if you'd like to clean it up, a small helper function would reduce the repetition. Something like:

  def get_electrode_ids(key, channel_ids):
      probe_type = set((ephys.EphysSessionProbe * probe.Probe & key).fetch('probe_type'))
      if len(probe_type) != 1:
          raise ValueError(...)
      return map_channel_to_electrode(probe_type.pop(), input_indices=channel_ids)

@MilagrosMarin
Contributor

Comment 8: Architecture suggestion: NumElectrodesInside and OrganoidImplantationImage
Currently NumElectrodesInside is a standalone Lookup table keyed by organoid_id with no foreign key connection to the rest of the culture schema. Since the electrode count is determined from the implantation images, consider adding num_electrodes_inside as a secondary attribute on OrganoidImplantationImage as well — populated from the Lookup when the image is inserted:

  class OrganoidImplantationImage(dj.Manual):
      definition = """
      -> Experiment  # Use the Control experiment entry for each organoid
      ---
      organoid_implantation_image: attach
      num_electrodes_inside=null: int  # from NumElectrodesInside lookup
      """

The Lookup stays as the source of truth and downstream analysis tables (FrameAnalysis, PopulationBursts, etc.) keep fetching from it as they do now. But this way the image and its electrode count annotation are stored together, so anyone looking at the implantation image entry also sees the count in one place.

This is a small schema change to culture that's independent from the analysis integration. I can do this as a separate PR to keep things clean.

@MilagrosMarin
Contributor

Comment 9: LongitudinalSpectralAnalysis — different preprocessing from LFP pipeline
This table reads raw data directly from EphysRawFile and computes band power per 1-minute file. The purpose makes sense: tracking spectral changes across the full recording timeline for developmental analysis, not just framed sessions.

However, the preprocessing differs from the standard ephys.LFP pipeline. Looking at the element-array-ephys LFP.make(), the LFP pipeline applies a 60 Hz notch filter (Q=30) and downsamples to 2,500 Hz with an anti-aliasing FIR filter. LongitudinalSpectralAnalysis instead applies common median referencing but skips the notch filter and runs the PSD at the full 20 kHz sampling rate.

The missing notch filter is the main concern: 60 Hz powerline noise and its harmonics (120, 180 Hz) will show up in the PSD and inflate power estimates in the beta and gamma bands. Since this table is meant for comparing band power across sessions and development, consistent noise removal matters.

Is the omission intentional (e.g., to keep processing fast for ~10K files), or should the notch filter be added? At minimum, it would be worth documenting the preprocessing differences so that band power values from this table aren't compared directly with LFPSpectrogram.ChannelPower.

@MilagrosMarin
Contributor

Comment 10: frame schema — cleanup needed before merge
The database currently has two parallel sets of tables: the original frame schema (from your earlier development) and the new locations in analysis/mua/culture. Both have data:

  • frame.Coherence (2 entries) vs analysis.Coherence (49 entries)
  • frame.FOOOFAnalysis (5) vs analysis.FOOOFAnalysis (49)
  • frame.STTFA (32) vs analysis.STTFA (363)
  • frame.PopulationBursts (1) vs mua.PopulationBursts (49)
  • frame.PhaseAmplitudeCoupling (1) vs analysis.PhaseAmplitudeCoupling (512)

The frame schema also has development artifacts that aren't in this PR: Testtable, Testcomputed, FOOOFSession (renamed to FOOOFandFBOSCSession), BOSCParamset (renamed to FBOSCParamset), ImpedanceFile, ImpedanceMeasurements, MUATraceSession, STTC.

A few questions to resolve:

  1. Is the data in the old frame tables superseded by the new locations, or does any of it need to be preserved?
  2. Can the old frame tables be dropped after merge, or should that be a separate cleanup PR?
  3. The frame.FrameAnalysis table definition in the database has columns (num_processed_sessions, num_available_files) that don't match this PR's code — will this need a table recreation?

Might be worth adding a cleanup/migration plan to the PR description so we know what to expect after merge.

@MilagrosMarin
Contributor

Comment 11: PopulationBursts.make() — spike array indexing issues
Lines 25428-25431 have two related issues in how spikes are mapped to the burst spike array:

burst_spike_times = elec_spike_times[((index-num_burst_samples) <= elec_spike_times) & (elec_spike_times <= (index+num_burst_samples))]
burst_spike_indices = (burst_spike_times - (index-num_burst_samples) - 1).astype(int)

The array is sized 2 * num_burst_samples (2000), so valid indices are 0–1999. The <= (index+num_burst_samples) at the upper bound would produce index 2000 (out of bounds). The - 1 at line 25431 was likely added to fix that, but it shifts all indices down: a spike at exactly the window start gets index -1, which numpy wraps to index 1999 (the wrong end of the array).

The existing MUATracePlot and TracePlot both use spike indices without any - 1 offset.

The fix would be to use strict < for the upper bound and remove the - 1:

burst_spike_times = elec_spike_times[((index-num_burst_samples) <= elec_spike_times) & (elec_spike_times < (index+num_burst_samples))]
burst_spike_indices = (burst_spike_times - (index-num_burst_samples)).astype(int)

This affects the spike raster that feeds into the STTC connectivity calculation, so the current functional connectivity values may have some edge artifacts.
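The wraparound can be demonstrated directly (toy numbers; num_burst_samples matches the 1000-sample half-window implied by the 2000-slot array above):

```python
import numpy as np

num_burst_samples = 1000
index = 5000                             # burst center (illustrative)
window_start = index - num_burst_samples
spike_time = window_start                # a spike exactly at the window start

spike_array = np.zeros(2 * num_burst_samples)

# Current version, with the "- 1" offset:
buggy_idx = int(spike_time - window_start - 1)  # -1
spike_array[buggy_idx] = 1
print(np.flatnonzero(spike_array))  # [1999] -- wrong end of the array

# Proposed fix (strict < upper bound, no offset) puts the same spike at 0:
fixed_idx = int(spike_time - window_start)  # 0
```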

@MilagrosMarin
Contributor

Comment 12: EventBasedCoupling.make() — no guard against empty spikes
After filtering spikes to the matching electrode (line 24719) and removing boundary spikes (line 24729), spike_times could be empty, either because the unit has no spikes on that electrode, or all spikes are within 0.5s of the recording edges.

If empty, lfp_epochs_itc stays as an empty list, np.array([]) creates a 1D array, and line 24748 (lfp_epochs_itc[:, edge:-edge]) fails with IndexError.

A minimum spike count check before the loop would prevent the crash. Same design question as STTFA.make() (comment #6): whether to skip silently, insert with empty values, or raise an error.
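A minimal sketch of the guard, with synthetic data standing in for the fetched LFP and spike times (function and variable names are illustrative, not the actual EventBasedCoupling code):

```python
import numpy as np

def itc_epochs(spike_times, lfp, fs, edge):
    """Cut +/-0.5 s LFP epochs around each spike, then trim filter edges.
    Returning early on empty input prevents the 1-D IndexError described above."""
    if len(spike_times) == 0:
        return None  # or insert a placeholder row, as in Comment 6
    epochs = [lfp[int((t - 0.5) * fs): int((t + 0.5) * fs)] for t in spike_times]
    return np.array(epochs)[:, edge:-edge]

fs = 1000
lfp = np.random.randn(2 * fs)  # 2 s of synthetic LFP
assert itc_epochs([], lfp, fs, edge=100) is None
assert itc_epochs([1.0], lfp, fs, edge=100).shape == (1, 800)
```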

@MilagrosMarin
Contributor

Comment 13: PopulationBursts.make() — division by zero in weighted STTC
Line 25466: weighted_sttc = (sttc_array * weight_array).sum(axis=(1,2)) / weight_array.sum(axis=(1,2))

If a burst has no spikes on any electrode pair, all weights are 0 and the division produces NaN. Since weighted_sttc is a longblob (one value per burst), NaN values would silently propagate through downstream aggregations like np.mean(weighted_sttc) unless explicitly handled with np.nanmean.

Worth deciding whether to store NaN (scientifically meaningful — "connectivity undefined for this burst") or 0.0 (simpler for downstream). Either way, it should be a conscious choice rather than an unguarded division.
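A guarded version of the division, with toy arrays and NaN chosen for the "connectivity undefined" case (a sketch, assuming the (burst, electrode, electrode) shapes described above):

```python
import numpy as np

sttc_array = np.ones((2, 3, 3)) * 0.5  # toy per-pair STTC values
weight_array = np.zeros((2, 3, 3))
weight_array[0] = 2.0                  # burst 0 has spike pairs; burst 1 has none

num = (sttc_array * weight_array).sum(axis=(1, 2))
denom = weight_array.sum(axis=(1, 2))

# Divide only where weights exist; mark the rest NaN instead of an unguarded 0/0.
weighted_sttc = np.where(denom > 0, num / np.where(denom > 0, denom, 1.0), np.nan)
print(weighted_sttc)               # [0.5 nan]
print(np.nanmean(weighted_sttc))   # 0.5 -- plain np.mean would return nan
```

If NaN is stored, every downstream aggregation has to use the nan-aware reductions (np.nanmean, np.nanstd), which is the explicit choice this pattern forces.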

@MilagrosMarin
Contributor

Comment 14: Merge conflicts
The PR branch (judewerth:integrate) is 7 commits behind main and has conflicts. You'll need to update your branch before we can merge. From your fork:

  git fetch upstream
  git checkout integrate
  git merge upstream/main
  # resolve any conflicts
  git push

@MilagrosMarin
Contributor

Comment 15: Code standards review
A few code hygiene items to clean up before merge:

  1. Commented-out code: analysis.py line 24 has # from .mua import mua, left over from trying different import approaches. It can be removed since line 25 is the working version.
  2. Unused imports in analysis.py: stats (from scipy.stats), test_stationarity (from tensorpac.stats), and pac_trivec (from tensorpac.utils) are imported but not used in the pipeline code. Removing them keeps the imports clean and avoids unnecessary dependencies at load time.
  3. Missing docstrings: TracePlot (mua.py) and LongitudinalSpectralAnalysis.BandPower (analysis.py) have no docstrings. FOOOFAnalysis has a placeholder docstring ("Docstring for FOOOFandFBOSCAnalysis").
  4. Missing newlines at the end of files: Both frame.py and mua.py are missing the trailing newline.

@MilagrosMarin
Contributor

Comment 16: Design question: downstream analyses decoupled from frame selection
Looking at your original design document (attached to PR #196), the architecture had all downstream analyses flowing through ActiveTimeFrames:

  • PopulationBursts -> ActiveTimeFrames -> AnalysisBoundaries -> Experiment
  • Coherence -> ActiveTimeFrames -> AnalysisBoundaries -> Experiment

In #211, these dependencies changed:

  • PopulationBursts -> BurstSession -> ephys.EphysSession
  • Coherence -> ephys.LFP

Both now bypass the frame selection entirely. This means coherence and burst detection can run on any session, not just the active time frames identified by FrameAnalysis.

Was this decoupling intentional? The flexibility is nice (you can analyze any session), but it also means there's no enforced link between "this is an active period worth analyzing" and the actual analysis results. In the original design, that connection was built into the dependency graph.

If intentional, it might be worth documenting the reasoning, especially since the frame extraction step (finding active windows based on MUA firing rate) was designed specifically to scope which periods get analyzed.

@judewerth
Contributor Author

Comment 1: SpikeInterface import path The change in mua.py reverts the import from spikeinterface.extractors.extractor_classes back to spikeinterface.extractors.extractorlist. This was already addressed in V0.18.0 (see CHANGELOG): extractorlist was renamed to extractor_classes starting from SpikeInterface 0.103.0, and our element-array-ephys repository points SI to git main (version 0.103+).

Additionally, the new LongitudinalSpectralAnalysis class in analysis.py still uses the old extractorlist path.

Both should use extractor_classes (the current module name), consistent with what's on main and in the element-array-ephys:

from spikeinterface.extractors.extractor_classes import (
    recording_extractor_full_dict,
)

You're likely running an older SI locally; updating to 0.103+ should align things.

This comment has been addressed in commit 1283bd9

PS: Is this how you'd like me to address changes moving forward (quote replying and linking small commits)? If not please let me know what you'd prefer.

@judewerth
Contributor Author

Comment 2: FrameSession frame_param_idx as foreign key In frame.py, FrameSession declares frame_param_idx: int with a comment saying "Reference to TimeFrameParamset". Did you intend this to be a foreign key? If so, you can replace it with the -> notation:

 definition = """
 -> culture.Experiment
 start_boundary     : datetime
 end_boundary       : datetime
 -> TimeFrameParamset
 """

This way, DataJoint enforces that the value exists in the lookup table and shows the dependency in the schema graph. FrameAnalysis.make() already fetches from (TimeFrameParamset & key), so the foreign key would make that contract explicit.

Addressed here: 13b94f8

This was a mistake on my end. Good catch!

@judewerth
Contributor Author

Comment 3: find_active_frames — random fallback issues (with evidence from existing data) In frame.py, when find_peaks doesn't find enough peaks for num_frames, the fallback fills remaining slots with random indices. Looking at the existing ActiveTimeFrames data, this fallback has already triggered: there are many entries with frame_firing_rate = 0.0 (e.g., O10/param_idx=5 has 3, O12/param_idx=5 has 20+, O13/param_idx=5 has 15+). These are "active" time frames that contain zero activity.

A few specific concerns:

  1. Overlapping frames: O12/param_idx=5 has frames starting at 08:32 and 08:34 — only 2 minutes apart despite min_per_frame=5. The overlap check at line 144 doesn't work correctly when rand_idx < min_per_frame because np.arange produces negative values that won't match in np.isin.
  2. Potential infinite loop: If the recording window doesn't have enough space for num_frames non-overlapping windows, the while loop never terminates. Unlikely with the current 2-day windows but could happen with shorter boundaries.
  3. Scientifically: A table called ActiveTimeFrames shouldn't contain windows with zero firing. Downstream analyses (burst detection, coherence, coupling) would be processing silence.

The simplest fix might be to skip the random fallback entirely; just return the peaks that were found. Downstream code already iterates over whatever ActiveTimeFrames entries exist, so fewer frames would just mean fewer analyses. Do you need a fixed count of frames for a specific reason?

Addressed in 9033e64 (along with Comment 4)

There isn't a specific reason we need to keep the number of active windows consistent, so I'm OK with removing the random-index functionality. I've also included the most recent frame schema, which adds the new processed-sessions and num-files columns.

@judewerth
Contributor Author

Comment 5: pyproject.toml — new dependencies The new dependencies don't have version constraints: "bottleneck", "neo", "quantities", "elephant",

Adding at least the minimum versions you've tested (e.g., "elephant>=0.12.0") would help ensure others can reproduce the same behavior.

Also, you mentioned in the PR description that specparam and tensorpac may need DataJoint-specific forks for some libraries. Could you clarify which ones need to be resolved before merging?

Separately, the EXPLORE_single_session.ipynb notebook imports diptest (from diptest import diptest), which isn't listed as a dependency. Should it be added, or is that notebook meant to be run separately?

Addressed in 52703cc.

Regarding the repositories that may need a DataJoint fork: I don't know when listing the library name is sufficient versus when a GitHub link is needed. TensorPac and specparam are the two largest external repositories I used; insight into how these should be included in pyproject.toml would be greatly appreciated.

@judewerth
Contributor Author

Comment 16: Design question: downstream analyses decoupled from frame selection Looking at your original design document (attached to PR #196), the architecture had all downstream analyses flowing through ActiveTimeFrames:

  • PopulationBursts -> ActiveTimeFrames -> AnalysisBoundaries -> Experiment
  • Coherence -> ActiveTimeFrames -> AnalysisBoundaries -> Experiment

In #211, these dependencies changed:

  • PopulationBursts -> BurstSession -> ephys.EphysSession
  • Coherence -> ephys.LFP

Both now bypass the frame selection entirely. This means coherence and burst detection can run on any session, not just the active time frames identified by FrameAnalysis.

Was this decoupling intentional? The flexibility is nice (you can analyze any session), but it also means there's no enforced link between "this is an active period worth analyzing" and the actual analysis results. In the original design, that connection was built into the dependency graph.

If intentional, it might be worth documenting the reasoning, especially since the frame extraction step (finding active windows based on MUA firing rate) was designed specifically to scope which periods get analyzed.

This decoupling was intentional. I didn't like the idea of frame analysis triggering tons of ephys sessions at once. Also, this way the analysis tools can be used outside of the active time frame context.

In the processing case, I used an insertion number of 2 for the active-time-frame analysis (all other entries have an insertion number of 0); for our context, this could be the direct link you were previously referring to. I'm working on my paragraph for figure 3 (and the figure) tomorrow; I'll be sure to address this decision.

@MilagrosMarin
Contributor

MilagrosMarin commented Mar 26, 2026

Comment 1: SpikeInterface import path The change in mua.py reverts the import from spikeinterface.extractors.extractor_classes back to spikeinterface.extractors.extractorlist. This was already addressed in V0.18.0 (see CHANGELOG): extractorlist was renamed to extractor_classes starting from SpikeInterface 0.103.0, and our element-array-ephys repository points SI to git main (version 0.103+).
Additionally, the new LongitudinalSpectralAnalysis class in analysis.py still uses the old extractorlist path.
Both should use extractor_classes (the current module name), consistent with what's on main and in the element-array-ephys:

from spikeinterface.extractors.extractor_classes import (
    recording_extractor_full_dict,
)

You're likely running an older SI locally; updating to 0.103+ should align things.

This comment has been addressed in commit 1283bd9

PS: Is this how you'd like me to address changes moving forward (quote replying and linking small commits)? If not please let me know what you'd prefer.

Feel free to handle it as you prefer, @judewerth . You can directly push all commits addressing the reviewer's suggestions, then leave a comment summarizing your changes and mentioning any unresolved issues or disagreements. It's entirely up to you! For this huge PR, I personally appreciate the individual replies to comments. Thank you!

MilagrosMarin and others added 6 commits March 27, 2026 01:15
…edesign

- analysis.py: move spikeinterface import to lazy import in LongitudinalSpectralAnalysis.make()
- mua.py: move recording_extractor_full_dict to lazy import in _build_si_recording_object()
- analysis.py: restore LFPSpectrogram.key_source (Jude had commented it out)
- analysis.py: replace FOOOFAnalysis spec_param_idx: int with -> SpectrogramParameters FK
- analysis.py: add FOOOFAnalysis.key_source = FOOOFandFBOSCSession * SpectrogramParameters & LFPSpectrogram
- analysis.py: remove two fetch("param_idx")[0] dead lines from FOOOFAnalysis.make()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ted STTC division by zero

- Move neo, quantities, elephant imports to lazy imports inside PopulationBursts.make()
  (same rationale as ephys_no_curation.py fix: optional heavy deps should not block
  module import on environments without them)
- Fix division by zero in weighted_sttc: use np.where to produce NaN for bursts
  with no spike pairs instead of dividing by zero

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace inline probe_type lookup in TracePlot.make() with get_probe_type(key).
The inline version restricted by organoid_id only; get_probe_type uses the full
session key, which is more precise and consistent with the pattern in FrameAnalysis,
PopulationBursts, and STTFA.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…trodes_inside (Comment 8)

- culture.py: move num_electrodes_inside before attach column in definition
- frame.py: replace culture.NumElectrodesInside fetch with OrganoidImplantationImage fetch
- mua.py: same replacement in PopulationBursts.make()
- Both fetches include a null guard with a clear ValueError (MB05 currently has null)

NumElectrodesInside was removed from culture.py by Jude (commit 28a8cde) but
mua.py and frame.py still referenced it, causing AttributeError at runtime.
OrganoidImplantationImage is the correct source of truth since the count is
determined directly from the implantation image.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ysis (Comment 9)

Existence check must come before fetch — same bug as Issue E in ImpedanceMeasurements.
Previously, if no EphysSessionProbe existed, port_id would be an empty set and
port_id.pop() would raise KeyError instead of the intended ValueError.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
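The bug mode described here is easy to reproduce: `.pop()` on an empty set raises KeyError, so the existence check has to come first. A sketch with hypothetical names standing in for the DataJoint fetch result:

```python
def get_port_id(port_ids):
    """port_ids stands in for the set fetched from EphysSessionProbe.
    Checking emptiness before .pop() turns a confusing KeyError into
    the intended, actionable ValueError."""
    if not port_ids:
        raise ValueError("No EphysSessionProbe entry found for this session")
    return port_ids.pop()
```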
Addresses Comment 15 (code standards): part table was missing a class-level
docstring explaining its purpose.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MilagrosMarin and others added 3 commits March 27, 2026 02:18
get_probe_type(key) uses the full key including start_time from MUAEphysSession
(a 1-minute file window), which doesn't match EphysSession.start_time (a full
session boundary). This would cause all future TracePlot computations to fail
with a ValueError. Revert to the original organoid_id-only restriction, which
is correct for this table's key lineage.

get_probe_type remains in use in PopulationBursts.make() where the key correctly
inherits from EphysSession, making the start_time match valid.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ationImage

The previous pattern called fetch1() then checked for None, but fetch1() raises
a DataJointError (not None) when no row exists — making the None guard unreachable
in the missing-row case. Also raises DataJointError if multiple rows exist.

Replace with explicit existence and count checks before fetch1, so all three
failure modes produce clear, actionable ValueErrors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
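The check order this commit describes can be sketched with a plain-Python stand-in for the DataJoint query result (function name and error messages are illustrative, not the actual code):

```python
def fetch1_num_electrodes(rows, organoid_id):
    """Guard order sketch: existence and uniqueness are checked BEFORE
    the fetch, and a null value gets its own actionable error. `rows`
    stands in for the result of a DataJoint restriction."""
    if len(rows) == 0:
        raise ValueError(f"No OrganoidImplantationImage entry for {organoid_id}")
    if len(rows) > 1:
        raise ValueError(f"Multiple OrganoidImplantationImage entries for {organoid_id}")
    value = rows[0]
    if value is None:
        raise ValueError(f"num_electrodes_inside is null for {organoid_id}")
    return value
```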
Add `electrode_ids >= 0` to the elec_bool filter in PopulationBursts.make()
(mua.py) and create_population_firing_vector() (frame.py).

map_channel_to_electrode now raises ValueError for unmapped channels, but
the >= 0 guard here is defensive coding: prevents -1 from silently passing
the `< num_elec_inside` test and contaminating population spike rate or
firing vector calculations if an unmapped index somehow reaches this point.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
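A minimal sketch of the combined mask (the electrode ids here are made up):

```python
import numpy as np

# Defensive filter: an unmapped channel (-1) would pass the
# `< num_elec_inside` bound alone, so require `>= 0` as well.
electrode_ids = np.array([3, -1, 7, 12])  # hypothetical mapping output
num_elec_inside = 10
elec_bool = (electrode_ids >= 0) & (electrode_ids < num_elec_inside)
```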
@MilagrosMarin
Copy link
Copy Markdown
Contributor

Comment 13: PopulationBursts.make() — division by zero in weighted STTC

Line 25466:

weighted_sttc = (sttc_array * weight_array).sum(axis=(1,2)) / weight_array.sum(axis=(1,2))
If a burst has no spikes on any electrode pair, all weights are 0 and the division produces NaN. Since weighted_sttc is a longblob (one value per burst), NaN values would silently propagate through downstream aggregations like np.mean(weighted_sttc) unless explicitly handled with np.nanmean.
Worth deciding whether to store NaN (scientifically meaningful — "connectivity undefined for this burst") or 0.0 (simpler for downstream). Either way, it should be a conscious choice rather than an unguarded division.

I don't believe it's possible for a burst to have no spikes on any electrode pair (by how we defined the burst times). If it did, I think it makes the most sense to leave it as a NaN and have it be addressed in later analysis.

That said, I don't feel super strongly about NaN vs 0, so let me know if you prefer to go a different direction.

NaN explicitly signals "no connectivity data for this burst" rather than 0 (which would imply zero connectivity, a false claim).
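The agreed-upon guard can be sketched as follows (the array contents are hypothetical; the shapes follow the (n_bursts, n_elec, n_elec) layout implied by the axis=(1,2) sums):

```python
import numpy as np

# Burst 1 has spike pairs; burst 2 has none (all weights zero).
sttc_array = np.array([[[0.5, 0.2], [0.2, 0.5]],
                       [[0.9, 0.9], [0.9, 0.9]]])
weight_array = np.array([[[2.0, 1.0], [1.0, 2.0]],
                         [[0.0, 0.0], [0.0, 0.0]]])

num = (sttc_array * weight_array).sum(axis=(1, 2))
den = weight_array.sum(axis=(1, 2))
# NaN for bursts with no spike pairs instead of an unguarded 0/0; the
# inner np.where keeps the division itself warning-free.
weighted_sttc = np.where(den > 0, num / np.where(den > 0, den, 1.0), np.nan)
```

Downstream aggregations then need np.nanmean rather than np.mean, which is the trade-off noted in the comment.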

MilagrosMarin and others added 2 commits March 27, 2026 04:29
…ng 2500

Coherence.make() was hardcoding fs=2500, causing the bandpass filter
cutoffs and coherence frequency axis to be wrong if the LFP sampling
rate differs. Fetch from ephys.LFP.lfp_sampling_rate like LFPSpectrogram
already does.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
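A sketch of why fs must be fetched: both the coherence frequency axis and the normalized bandpass cutoffs scale with the sampling rate (the function names here are illustrative, not from the codebase):

```python
import numpy as np

def coherence_freq_axis(nperseg, fs):
    # Frequency bins of a Welch-style estimate with window length nperseg;
    # the axis tops out at the Nyquist rate fs / 2.
    return np.fft.rfftfreq(nperseg, d=1.0 / fs)

def normalized_cutoffs(low_hz, high_hz, fs):
    # Bandpass cutoffs normalized by Nyquist; a hardcoded fs=2500 on data
    # sampled at a different rate shifts these relative to the true spectrum.
    nyq = fs / 2.0
    return low_hz / nyq, high_hz / nyq
```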
Spike indices in MUASpikes are stored at Intan acquisition rate (20000 Hz).
Changing acquisition systems would require repopulating MUASpikes, so the
hardcoded value is intentional for this project's data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@MilagrosMarin MilagrosMarin self-requested a review March 27, 2026 03:37
MilagrosMarin and others added 2 commits March 27, 2026 04:46
…tudinalSpectralAnalysis

Both tables already use self.update1() to set the real value after computation;
the placeholder comment was misleading.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rror

frame_bounds[1] = frame_idx + 1, so a peak at the last time vector index
produces an out-of-bounds NumPy index. Guard both edges symmetrically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
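The symmetric guard can be sketched as (function name and signature are hypothetical):

```python
def clipped_frame_bounds(frame_idx, n_samples):
    """Naive bounds are (frame_idx - 1, frame_idx + 1); clip both edges
    so a peak at either end of the time vector cannot produce an
    out-of-bounds NumPy index."""
    lo = max(frame_idx - 1, 0)
    hi = min(frame_idx + 1, n_samples - 1)
    return lo, hi
```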
@MilagrosMarin
Copy link
Copy Markdown
Contributor

[Review comment — WARNING #6] BurstSession depends on ephys.EphysSession instead of FrameAnalysis.ActiveTimeFrames

BurstSession is currently defined as:

class BurstSession(dj.Manual):
    definition = """ 
    -> ephys.EphysSession
    -> BurstDetectionParamset
    """

PopulationBursts.make() then fetches MUA data using key['start_time'] and key['end_time'] from EphysSession. This means a user can manually trigger burst detection on any arbitrary ephys session.

The intended workflow appears to be:

  1. FrameAnalysis.ActiveTimeFrames identifies the scientifically validated most-active windows (by firing rate)
  2. Burst detection runs on those pre-selected active frames — not on arbitrary sessions

Suggestion: Consider changing BurstSession to depend on FrameAnalysis.ActiveTimeFrames instead:

class BurstSession(dj.Manual):
    definition = """ 
    -> frame.FrameAnalysis.ActiveTimeFrames
    -> BurstDetectionParamset
    """

make() would then use frame_start/frame_end from ActiveTimeFrames instead of start_time/end_time from EphysSession.

This is an architectural/scientific decision — it enforces that burst detection only runs on frames that passed the activity threshold. Worth discussing with the team before merging.

@MilagrosMarin
Copy link
Copy Markdown
Contributor

[Review comment — WARNING #7] Coherence has no key_source — will populate for all LFP sessions

Coherence currently depends only on -> ephys.LFP with no key_source. A populate() call will trigger coherence computation for every LFP session in the database, which will become very expensive as the dataset grows.

The intended behavior is presumably to compute coherence only for scientifically selected sessions (e.g., active time frames from FrameAnalysis.ActiveTimeFrames).

Minimum viable fix: restrict to organoids that have gone through frame selection:

@property
def key_source(self):
    return ephys.LFP & frame.FrameAnalysis

More precise fix (follow-up): restrict to LFP sessions that overlap with a specific ActiveTimeFrame — requires a cross-temporal join that can't be expressed as a simple DataJoint restriction, so leaving for a future refactor.

Worth adding the simple key_source before merging to avoid accidentally populating the full LFP table.

@MilagrosMarin
Copy link
Copy Markdown
Contributor

Comment 5: pyproject.toml — new dependencies

The new dependencies don't have version constraints: "bottleneck", "neo", "quantities", "elephant".
Adding at least the minimum versions you've tested (e.g., "elephant>=0.12.0") would help ensure others can reproduce the same behavior.
Also, you mentioned in the PR description that specparam and tensorpac may need DataJoint-specific forks for some libraries. Could you clarify which ones need to be resolved before merging?
Separately, the EXPLORE_single_session.ipynb notebook imports diptest (from diptest import diptest), which isn't listed as a dependency. Should it be added, or is that notebook meant to be run separately?

Addressed in 52703cc.

Regarding the repositories that may need a DataJoint fork: I don't know when listing the library name is sufficient and when a GitHub link is needed. TensorPac and specparam are the two largest external repositories I used. Insight into how these should be included in pyproject.toml would be greatly appreciated.

Both tensorpac and specparam are on PyPI, so the package name alone is sufficient; no GitHub URL needed. The main thing to add is a minimum version constraint so the environment is reproducible. Something like:
"tensorpac>=0.6",
"specparam>=1.0",
Use whatever minimum versions you've tested against. GitHub URLs are only needed for packages that aren't on PyPI (e.g., internal forks or pre-release packages installed directly from a repo).

@MilagrosMarin
Copy link
Copy Markdown
Contributor

Comment 9: LongitudinalSpectralAnalysis — different preprocessing from LFP pipeline

This table reads raw data directly from EphysRawFile and computes band power per 1-minute file. The purpose makes sense: tracking spectral changes across the full recording timeline for developmental analysis, not just framed sessions.
However, the preprocessing differs from the standard ephys.LFP pipeline. Looking at the element-array-ephys LFP.make(), the LFP pipeline applies a 60 Hz notch filter (Q=30) and downsamples to 2,500 Hz with an anti-aliasing FIR filter. LongitudinalSpectralAnalysis instead applies common median referencing but skips the notch filter and runs the PSD at the full 20 kHz sampling rate.
The missing notch filter is the main concern: 60 Hz powerline noise and its harmonics (120, 180 Hz) will show up in the PSD and inflate power estimates in the beta and gamma bands. Since this table is meant for comparing band power across sessions and development, consistent noise removal matters.
Is the omission intentional (e.g., to keep processing fast for ~10K files), or should the notch filter be added? At minimum, it would be worth documenting the preprocessing differences so that band power values from this table aren't compared directly with LFPSpectrogram.ChannelPower.

Addressed c0f58d4

The missing filter was not intentional; I built the previous code mirroring the MUA pipeline, which caused the disconnect. Additionally, it may be worth changing the ephys.Trace code to match this structure (using spikeinterface). The current version uses intanrhdreader, which will only work for the RHD system (we've recently changed to an RHS system).

Aware of the RHS issue, it's on the roadmap, and we should create a separate PR for it. This PR is already very large!
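For reference, the 60 Hz notch discussed in this thread can be sketched as follows, assuming a scipy-based implementation with the Q=30 quoted from the LFP pipeline. Note that a single notch only attenuates the fundamental; the 120/180 Hz harmonics would need their own notches.

```python
import numpy as np
from scipy.signal import iirnotch, freqz

# Design a narrow notch at 60 Hz (Q=30 -> ~2 Hz bandwidth) for data
# sampled at 20 kHz, then inspect the gain at and around the notch.
fs = 20000.0
b, a = iirnotch(w0=60.0, Q=30.0, fs=fs)
_, h = freqz(b, a, worN=np.array([30.0, 60.0, 120.0]), fs=fs)
gains = np.abs(h)  # ~1 at 30 and 120 Hz, ~0 at the 60 Hz notch center
```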

@judewerth
Copy link
Copy Markdown
Contributor Author

[Review comment — WARNING #6] BurstSession depends on ephys.EphysSession instead of FrameAnalysis.ActiveTimeFrames

BurstSession is currently defined as:

class BurstSession(dj.Manual):
    definition = """ 
    -> ephys.EphysSession
    -> BurstDetectionParamset
    """

PopulationBursts.make() then fetches MUA data using key['start_time'] and key['end_time'] from EphysSession. This means a user can manually trigger burst detection on any arbitrary ephys session.

The intended workflow appears to be:

  1. FrameAnalysis.ActiveTimeFrames identifies the scientifically validated most-active windows (by firing rate)
  2. Burst detection runs on those pre-selected active frames — not on arbitrary sessions

Suggestion: Consider changing BurstSession to depend on FrameAnalysis.ActiveTimeFrames instead:

class BurstSession(dj.Manual):
    definition = """ 
    -> frame.FrameAnalysis.ActiveTimeFrames
    -> BurstDetectionParamset
    """

make() would then use frame_start/frame_end from ActiveTimeFrames instead of start_time/end_time from EphysSession.

This is an architectural/scientific decision — it enforces that burst detection only runs on frames that passed the activity threshold. Worth discussing with the team before merging.

I agree that it's worth discussing with the entire team.

Personally, I believe EphysSession is the best choice. I like the idea of frame not automatically triggering downstream analysis and simply identifying active windows. Also, PopulationBursts analysis can potentially span several days if the user would like (unlike coherence, spike sorting, PAC, etc.). This format gives flexibility if and when individual analysis types are used. (The same point can be made for every other analysis niche within the pipeline.)

With that said, if the team thinks it would be better to automate these analysis tools with the frame schema, I don't mind switching to that structure.

@judewerth
Copy link
Copy Markdown
Contributor Author

[Review comment — WARNING #7] Coherence has no key_source — will populate for all LFP sessions

Coherence currently depends only on -> ephys.LFP with no key_source. A populate() call will trigger coherence computation for every LFP session in the database, which will become very expensive as the dataset grows.

The intended behavior is presumably to compute coherence only for scientifically selected sessions (e.g., active time frames from FrameAnalysis.ActiveTimeFrames).

Minimum viable fix: restrict to organoids that have gone through frame selection:

@property
def key_source(self):
    return ephys.LFP & frame.FrameAnalysis

More precise fix (follow-up): restrict to LFP sessions that overlap with a specific ActiveTimeFrame — requires a cross-temporal join that can't be expressed as a simple DataJoint restriction, so leaving for a future refactor.

Worth adding the simple key_source before merging to avoid accidentally populating the full LFP table.


Addressed 848b9c9

@judewerth
Copy link
Copy Markdown
Contributor Author

Comment 5: pyproject.toml — new dependencies

The new dependencies don't have version constraints: "bottleneck", "neo", "quantities", "elephant".
Adding at least the minimum versions you've tested (e.g., "elephant>=0.12.0") would help ensure others can reproduce the same behavior.
Also, you mentioned in the PR description that specparam and tensorpac may need DataJoint-specific forks for some libraries. Could you clarify which ones need to be resolved before merging?
Separately, the EXPLORE_single_session.ipynb notebook imports diptest (from diptest import diptest), which isn't listed as a dependency. Should it be added, or is that notebook meant to be run separately?

Addressed in 52703cc.
Regarding the repositories that may need a DataJoint fork: I don't know when listing the library name is sufficient and when a GitHub link is needed. TensorPac and specparam are the two largest external repositories I used. Insight into how these should be included in pyproject.toml would be greatly appreciated.

Both tensorpac and specparam are on PyPI, so the package name alone is sufficient; no GitHub URL needed. The main thing to add is a minimum version constraint so the environment is reproducible. Something like:

"tensorpac>=0.6",
"specparam>=1.0",

Use whatever minimum versions you've tested against. GitHub URLs are only needed for packages that aren't on PyPI (e.g., internal forks or pre-release packages installed directly from a repo).


Addressed 3cc4456
