Integrate tools for final analysis #211
Conversation
@MilagrosMarin I realized you may not have gotten the notification when I sent this pull request. Please let me know your thoughts on this. Thank you!
Hi @judewerth, thanks for putting this together! It's great to see all the analysis work from the past months integrated into the DataJoint framework. The presentation was very helpful for understanding the overall architecture. This is a large PR (20K+ additions, 15 files, 7 new table families), so it will take some time to review thoroughly. I'll leave specific comments here as I go. Thanks!
Comment 1: SpikeInterface import path
Both import sites should use:

from spikeinterface.extractors.extractor_classes import (
    recording_extractor_full_dict,
)

You're likely running an older SI locally; updating to 0.103+ should align things.
Comment 2: Consider replacing the bare integer attribute with a foreign key reference to the lookup table. This way, DataJoint enforces that the value exists in the lookup table and shows the dependency in the schema graph.
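As a minimal sketch of the two definition forms (using the SpectrogramParameters name that appears later in this PR; the exact layout is illustrative):

```
# bare attribute: no integrity check, no edge in the schema graph
spec_param_idx : int

# foreign key: the value must exist in SpectrogramParameters,
# and the dependency shows up in dj.Diagram
-> SpectrogramParameters
```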
Comment 3: find_active_frames — random fallback issues (with evidence from existing data)
A few specific concerns:
The simplest fix might be to skip the random fallback entirely and just return the peaks that were found. Downstream code already iterates over whatever is returned.
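A minimal sketch of dropping the fallback (function name and signature are assumptions, not the PR's actual API; peak detection here is a simple threshold-crossing stand-in):

```python
import numpy as np

def find_active_frames(rate, threshold, n_frames):
    """Hypothetical sketch: return only genuinely detected frame starts,
    with no random fallback when fewer than n_frames are found."""
    above = rate > threshold
    # rising edges mark the start of an active window
    starts = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    # returning fewer than n_frames is fine: downstream code
    # already iterates over whatever is returned
    return starts[:n_frames]

rate = np.array([0.0, 0.0, 5.0, 0.0, 0.0, 6.0, 0.0])
frames = find_active_frames(rate, 2.0, n_frames=4)  # only 2 real peaks exist
```

The caller simply gets fewer frames for quiet sessions, rather than randomly chosen windows masquerading as active ones.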
Comment 4: Two things to flag here:
This check hasn't actually run yet against the existing tables.
Can you check which tables in the frame schema need to be reconciled before merging?
Comment 5: pyproject.toml — new dependencies
Adding at least the minimum versions you've tested (e.g., "elephant>=0.12.0") would help ensure others can reproduce the same behavior. Also, you mentioned in the PR description that some of the new libraries were added with links to their main repositories while others appear to need a DataJoint-specific fork; that should be resolved before merge.
Comment 6: STTFA.make() — silent skip on low spike count
Currently there are 363 STTFA entries out of 241,490 keys in the key source, so the vast majority of keys are skipped without a trace. Two options to consider: insert entries with empty/NaN values so the skip is visible, or raise/log an error instead of skipping silently.
Not urgent since the table is small now, but worth thinking about before scaling to all organoids.
Comment 7: repeated logic — consider factoring the duplicated code into a shared helper.
Comment 8: Architecture suggestion: the Lookup stays as the source of truth, and downstream analysis tables (FrameAnalysis, PopulationBursts, etc.) keep fetching from it as they do now. But this way the image and its electrode-count annotation are stored together, so anyone looking at the implantation image entry also sees the count in one place. This is a small change to the culture schema that's independent of the analysis integration. I can do it as a separate PR to keep things clean.
Comment 9: LongitudinalSpectralAnalysis — different preprocessing from LFP pipeline
However, the preprocessing differs from the standard LFP pipeline. The missing notch filter is the main concern: 60 Hz powerline noise and its harmonics (120, 180 Hz) will show up in the PSD and inflate power estimates in the beta and gamma bands. Since this table is meant for comparing band power across sessions and development, consistent noise removal matters. Is the omission intentional (e.g., to keep processing fast for ~10K files), or should the notch filter be added? At minimum, it would be worth documenting the preprocessing differences so that band power values from this table aren't compared directly with those from the standard LFP pipeline.
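If the notch filter is added, a minimal SciPy sketch (the 2500 Hz rate matches the LFP rate discussed elsewhere in this PR, but the Q factor and the stand-in signal are assumptions):

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

fs = 2500.0                           # LFP sampling rate (per the pipeline)
lfp = np.random.randn(10 * int(fs))   # stand-in signal, 10 s

# notch out 60 Hz powerline noise and its first two harmonics
filtered = lfp
for f0 in (60.0, 120.0, 180.0):
    b, a = iirnotch(w0=f0, Q=30.0, fs=fs)
    filtered = filtfilt(b, a, filtered)  # zero-phase, no time shift
```

filtfilt keeps the output aligned with the input, so downstream band-power windows are unaffected by filter delay.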
Comment 10: frame schema — cleanup needed before merge
The frame schema also has development artifacts that aren't in this PR. A few questions to resolve:
Might be worth adding a cleanup/migration plan to the PR description so we know what to expect after merge.
Comment 11: PopulationBursts.make() — spike array indexing issues
The array is sized 2 * num_burst_samples (2000), so valid indices are 0–1999. The <= comparison against (index + num_burst_samples) at the upper bound would produce index 2000 (out of bounds). The - 1 at line 25431 was likely added to fix that, but it shifts all indices down: a spike at exactly the window start gets index -1, which numpy wraps to index 1999 (the wrong end of the array). The fix is to use strict < for the upper bound and remove the - 1. This affects the spike raster that feeds into the STTC connectivity calculation, so the current functional connectivity values may have some edge artifacts.
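As an illustration of the corrected indexing (array size from the comment; variable names are hypothetical): strict < at the upper bound, no - 1 shift, so both window edges land where they should.

```python
import numpy as np

num_burst_samples = 1000
raster = np.zeros(2 * num_burst_samples)   # valid indices 0..1999
window_start = 5000
spikes = np.array([5000, 5999, 6999])      # absolute sample indices

idx = spikes - window_start                # no "- 1": window start maps to 0
# strict < keeps index 2*num_burst_samples (2000) out of bounds
keep = (idx >= 0) & (idx < 2 * num_burst_samples)
raster[idx[keep]] += 1   # use np.add.at if duplicate indices are possible
```

A spike at the window start lands at index 0 (not -1, which numpy would wrap to 1999), and a spike one past the window end is excluded rather than raising.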
Comment 12: EventBasedCoupling.make() — no guard against empty spikes
If the spike array is empty, the computation crashes. A minimum spike count check before the loop would prevent this. Same design question as STTFA.make() (comment #6): whether to skip silently, insert with empty values, or raise an error.
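A sketch of such a guard (the threshold, function name, and stand-in computation are all hypothetical; the real make() would decide between skip, empty insert, or raise):

```python
import numpy as np

MIN_SPIKES = 10  # hypothetical threshold; tune to the data

def compute_coupling(spike_times):
    """Sketch: guard before the per-event loop."""
    if spike_times.size < MIN_SPIKES:
        # skip silently, insert empty values, or raise --
        # the same design choice as in STTFA.make()
        raise ValueError(
            f"only {spike_times.size} spikes; need at least {MIN_SPIKES}"
        )
    return spike_times.mean()  # stand-in for the real coupling computation

try:
    compute_coupling(np.array([]))
    crashed = False
except ValueError:
    crashed = True
```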
Comment 13: PopulationBursts.make() — division by zero in weighted STTC
If a burst has no spikes on any electrode pair, all weights are 0 and the division produces NaN. Worth deciding whether to store NaN (scientifically meaningful — "connectivity undefined for this burst") or 0.0 (simpler for downstream). Either way, it should be a conscious choice rather than an unguarded division.
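A minimal sketch of the guard, choosing the NaN convention (function and argument names are assumptions, not the PR's actual code):

```python
import numpy as np

def weighted_sttc(sttc_values, weights):
    """Sketch: weighted mean STTC with an explicit zero-weight guard."""
    total = weights.sum()
    if total == 0:
        # "connectivity undefined for this burst" rather than a false 0.0
        return np.nan
    return (sttc_values * weights).sum() / total

ok = weighted_sttc(np.array([0.5, 0.3]), np.array([1.0, 1.0]))     # 0.4
empty = weighted_sttc(np.array([0.5, 0.3]), np.array([0.0, 0.0]))  # NaN
```

Downstream consumers can then use np.nanmean and friends, which is why NaN is often the less error-prone convention than a sentinel 0.0.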
Comment 14: Merge conflicts
Comment 15: Code standards review
Comment 16: Design question: downstream analyses decoupled from frame selection
In #211, these dependencies changed:
Both now bypass the frame selection entirely. This means coherence and burst detection can run on any session, not just the active time frames identified by the frame analysis step. Was this decoupling intentional? The flexibility is nice (you can analyze any session), but it also means there's no enforced link between "this is an active period worth analyzing" and the actual analysis results. In the original design, that connection was built into the dependency graph. If intentional, it might be worth documenting the reasoning, especially since the frame extraction step (finding active windows based on MUA firing rate) was designed specifically to scope which periods get analyzed.
This comment has been addressed in commit 1283bd9. PS: Is this how you'd like me to address changes moving forward (quote-replying and linking small commits)? If not, please let me know what you'd prefer.
Addressed here: 13b94f8. This was a mistake on my end. Good catch!
…ks. Also resolves frame version issue.
Addressed in 9033e64 (along with Comment 4). There isn't a specific reason we need to keep the number of active windows consistent, so I am OK with removing the random index functionality. I've also included the most recent frame schema, which includes the new processed sessions and num files entries.
Addressed in 52703cc. Regarding the repositories that may need a DataJoint fork: I don't know when listing the library name is sufficient and when a GitHub link is needed. TensorPac and specparam are the two largest external repositories I used. Insight into how these should be included in pyproject would be greatly appreciated.
This decoupling was intentional. I didn't like the idea of frame analysis triggering tons of ephys sessions at once. Also, this way the analysis tools can be used outside the active-time-frame context. In the processing case, I used an insertion number of 2 for the active time frame analysis (all other entries have an insertion number of 0). For our context, this could be the direct link you were previously referring to. I'm working on my paragraph for figure 3 (and the figure) tomorrow; I'll be sure to address this decision.
Feel free to handle it as you prefer, @judewerth . You can directly push all commits addressing the reviewer's suggestions, then leave a comment summarizing your changes and mentioning any unresolved issues or disagreements. It's entirely up to you! For this huge PR, I personally appreciate the individual replies to comments. Thank you! |
…edesign
- analysis.py: move spikeinterface import to lazy import in LongitudinalSpectralAnalysis.make()
- mua.py: move recording_extractor_full_dict to lazy import in _build_si_recording_object()
- analysis.py: restore LFPSpectrogram.key_source (Jude had commented it out)
- analysis.py: replace FOOOFAnalysis spec_param_idx: int with -> SpectrogramParameters FK
- analysis.py: add FOOOFAnalysis.key_source = FOOOFandFBOSCSession * SpectrogramParameters & LFPSpectrogram
- analysis.py: remove two fetch("param_idx")[0] dead lines from FOOOFAnalysis.make()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ted STTC division by zero
- Move neo, quantities, elephant imports to lazy imports inside PopulationBursts.make() (same rationale as ephys_no_curation.py fix: optional heavy deps should not block module import on environments without them)
- Fix division by zero in weighted_sttc: use np.where to produce NaN for bursts with no spike pairs instead of dividing by zero
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace inline probe_type lookup in TracePlot.make() with get_probe_type(key). The inline version restricted by organoid_id only; get_probe_type uses the full session key, which is more precise and consistent with the pattern in FrameAnalysis, PopulationBursts, and STTFA. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…trodes_inside (Comment 8)
- culture.py: move num_electrodes_inside before attach column in definition
- frame.py: replace culture.NumElectrodesInside fetch with OrganoidImplantationImage fetch
- mua.py: same replacement in PopulationBursts.make()
- Both fetches include a null guard with a clear ValueError (MB05 currently has null)
NumElectrodesInside was removed from culture.py by Jude (commit 28a8cde) but mua.py and frame.py still referenced it, causing AttributeError at runtime. OrganoidImplantationImage is the correct source of truth since the count is determined directly from the implantation image. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ysis (Comment 9) Existence check must come before fetch — same bug as Issue E in ImpedanceMeasurements. Previously, if no EphysSessionProbe existed, port_id would be an empty set and port_id.pop() would raise KeyError instead of the intended ValueError. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses Comment 15 (code standards): part table was missing a class-level docstring explaining its purpose. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
get_probe_type(key) uses the full key including start_time from MUAEphysSession (a 1-minute file window), which doesn't match EphysSession.start_time (a full session boundary). This would cause all future TracePlot computations to fail with a ValueError. Revert to the original organoid_id-only restriction, which is correct for this table's key lineage. get_probe_type remains in use in PopulationBursts.make() where the key correctly inherits from EphysSession, making the start_time match valid. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ationImage The previous pattern called fetch1() then checked for None, but fetch1() raises a DataJointError (not None) when no row exists — making the None guard unreachable in the missing-row case. Also raises DataJointError if multiple rows exist. Replace with explicit existence and count checks before fetch1, so all three failure modes produce clear, actionable ValueErrors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add `electrode_ids >= 0` to the elec_bool filter in PopulationBursts.make() (mua.py) and create_population_firing_vector() (frame.py). map_channel_to_electrode now raises ValueError for unmapped channels, but the >= 0 guard here is defensive coding: prevents -1 from silently passing the `< num_elec_inside` test and contaminating population spike rate or firing vector calculations if an unmapped index somehow reaches this point. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
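A minimal sketch of the combined filter described in this commit (the array values and num_elec_inside are illustrative):

```python
import numpy as np

electrode_ids = np.array([3, -1, 7, 12])   # -1 = unmapped channel sentinel
num_elec_inside = 10

# defensive guard: require >= 0 so an unmapped -1 cannot slip past
# the "< num_elec_inside" test and contaminate population counts
elec_bool = (electrode_ids >= 0) & (electrode_ids < num_elec_inside)
```

Without the first term, -1 < 10 is True and the unmapped channel would silently be counted.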
NaN explicitly signals "no connectivity data for this burst" rather than 0 (which would imply zero connectivity, a false claim).
…ng 2500 Coherence.make() was hardcoding fs=2500, causing the bandpass filter cutoffs and coherence frequency axis to be wrong if the LFP sampling rate differs. Fetch from ephys.LFP.lfp_sampling_rate like LFPSpectrogram already does. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Spike indices in MUASpikes are stored at Intan acquisition rate (20000 Hz). Changing acquisition systems would require repopulating MUASpikes, so the hardcoded value is intentional for this project's data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tudinalSpectralAnalysis Both tables already use self.update1() to set the real value after computation; the placeholder comment was misleading. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rror frame_bounds[1] = frame_idx + 1, so a peak at the last time vector index produces an out-of-bounds NumPy index. Guard both edges symmetrically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
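A sketch of the symmetric guard (the function shape is hypothetical; the PR's actual frame_bounds construction may differ): both edges are clamped to valid indices, so a peak at either end of the time vector stays in bounds.

```python
def frame_bounds(frame_idx, half_width, n_samples):
    """Hypothetical sketch: clamp both window edges to [0, n_samples - 1]."""
    lo = max(frame_idx - half_width, 0)             # guard the left edge
    hi = min(frame_idx + half_width, n_samples - 1)  # guard the right edge
    return lo, hi
```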
|
[Review comment — WARNING #6]
class BurstSession(dj.Manual):
    definition = """
    -> ephys.EphysSession
    -> BurstDetectionParamset
    """
The intended workflow appears to be:
Suggestion: Consider changing to:
class BurstSession(dj.Manual):
    definition = """
    -> frame.FrameAnalysis.ActiveTimeFrames
    -> BurstDetectionParamset
    """
This is an architectural/scientific decision — it enforces that burst detection only runs on frames that passed the activity threshold. Worth discussing with the team before merging.
[Review comment — WARNING #7]
The intended behavior is presumably to compute coherence only for scientifically selected sessions (e.g., active time frames from the frame schema).
Minimum viable fix: restrict to organoids that have gone through frame selection:
@property
def key_source(self):
    return ephys.LFP & frame.FrameAnalysis
More precise fix (follow-up): restrict to LFP sessions that overlap with a specific active time frame. Worth adding the simple key_source restriction now.
Both tensorpac and specparam are on PyPI, so the package name alone is sufficient; no GitHub URL needed. The main thing to add is a minimum version constraint so the environment is reproducible. Something like:
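A sketch of the dependency entries; the version floors below are assumptions and should be pinned to the versions actually tested (elephant>=0.12.0 is the one mentioned earlier in the review):

```toml
[project]
dependencies = [
    "elephant>=0.12.0",
    "tensorpac>=0.6",   # assumed floor; verify against the tested version
    "specparam>=2.0",   # assumed floor; verify against the tested version
]
```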
We're aware of the RHS issue; it's on the roadmap, and we should create a separate PR for it. This PR is already very large!
I agree that it's worth discussing with the entire team. Personally, I believe EphysSession is the best choice. I like the idea of frame not automatically triggering downstream analysis and simply identifying active windows. Also, PopulationBurst analysis can potentially span several days if the user would like (unlike coherence, spike sorting, PAC, etc.). This format gives flexibility if and when individual analysis types are used. (The same point can be made for every other analysis niche within the pipeline.) That said, if the team thinks it would be better to automate these analysis tools with the frame schema, I don't mind switching to that structure.
Addressed in 848b9c9.
Addressed in 3cc4456.
Review fixes for PR dj-sciops#211 (integrate branch)
This is a big pull request; I'll send an email with a detailed PowerPoint on Monday. Essentially, I've integrated all of the code from the past few years into the pipeline in an organized way. Additionally, I've made notebooks to showcase the new analysis: EXPLORE_single_session.ipynb contains an overview of everything, while the specific CREATE_ notebooks have functionality for insertion as well as plots.
The largest problem in this pull request is pyproject.toml. I've inserted the new libraries I used, but for the ones that require a GitHub link I only pasted from the main repository. The others appear to have a DataJoint-specific repository, so that will need to be resolved.
I am happy to meet and discuss these changes and what needs to be done to get this approved. Once this request is approved, I can start processing the analysis for our upcoming paper!