GLMsingle With Mixed-Duration Within-Trial Events #207

Brady-RT-Roberts · 2026-04-30T23:33:50Z

Brady-RT-Roberts
Apr 30, 2026

Hello,

First of all, thank you for making such a comprehensive yet easy-to-use package. I've been reading some prior threads and I still have a question about how best to use stimdur when there are different trial timings within a run.

I am using GLMsingle on fMRI data in which each trial contains two adjacent sub-epochs with different durations. During memory encoding, participants first see a word for 2 seconds (e.g., dollar) and then a symbol for 5 seconds (e.g., $ ). During the memory retrieval test later in the same run, there is a 2-second recognition phase for the symbol (e.g., $) followed by a 5-second recall phase where the participant speaks aloud the word (e.g., "dollar").

It is important to note that there is no ISI between word/symbol at encoding nor between recognition/recall at retrieval. That is what I'm trying my best to tease apart now after the fact. I've tried using LSA and LSS methods with some minor success, but I'm hoping GLMsingle will prove to be more effective.

Because GLMsingle assumes a single stimulus duration within a fit, I could not model all of these epochs together in one run. Instead, I ran GLMsingle twice. In the first model, I used stimdur = 2 and treated encoding_word and recognition_symbol as the primary modeled conditions (each as a separate regressor), while encoding_symbol and recall_word were modeled together as nuisance regressors. In the second model, I did the opposite, using stimdur = 5 such that now encoding_symbol and recall_word were the primary modeled conditions, while encoding_word and recognition_symbol were included as nuisance regressors. I then combined the resulting single-trial betas, interleaving them into a single 4D NIfTI file such that the 2-second epochs came from the 2-second GLMsingle fit and the 5-second epochs came from the 5-second GLMsingle fit, with the epochs back in their original trial order.
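The interleaving step described above can be sketched as follows. This is a minimal illustration in plain NumPy, not code from the original analysis: the function name `interleave_betas`, the array shapes, and the toy values are all assumptions, and reading or writing the NIfTI files is left out.

```python
import numpy as np

def interleave_betas(betas_2s, betas_5s, order_2s, order_5s):
    """Merge single-trial betas from two fits back into presentation order.

    betas_2s, betas_5s: arrays of shape (n_voxels, n_events_in_that_fit)
    order_2s, order_5s: for each column, its position in the full event sequence
    """
    order_2s = np.asarray(order_2s)
    order_5s = np.asarray(order_5s)
    n_total = betas_2s.shape[1] + betas_5s.shape[1]

    # Every position in the full sequence must be claimed exactly once.
    claimed = np.concatenate([order_2s, order_5s])
    if sorted(claimed.tolist()) != list(range(n_total)):
        raise ValueError("event positions must cover 0..n_total-1 exactly once")

    merged = np.empty((betas_2s.shape[0], n_total), dtype=float)
    merged[:, order_2s] = betas_2s
    merged[:, order_5s] = betas_5s
    return merged

# Toy example: 2 voxels, events alternate 2 s, 5 s, 2 s, 5 s.
betas_2s = np.array([[1.0, 3.0], [10.0, 30.0]])
betas_5s = np.array([[2.0, 4.0], [20.0, 40.0]])
merged = interleave_betas(betas_2s, betas_5s, order_2s=[0, 2], order_5s=[1, 3])
```

The explicit coverage check is there because a silently misordered or duplicated event would be hard to spot once the betas are merged.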

My question is whether this is a proper and valid use of GLMsingle. More specifically, is it methodologically defensible to estimate different sub-trial epoch types in separate GLMsingle fits based on their duration, while including the other-duration epochs as nuisance regressors in the model? Or does the fact that the different epoch types are estimated in different GLMsingle models create a substantial validity or interpretability problem?


kendrickkay · 2026-05-02T15:58:19Z

kendrickkay
May 2, 2026
Maintainer

Hi, thanks for the clear exposition. A few comments:

- Note that the 2 s vs 5 s distinction, although it is "baked in and designed" from the experimenter's point of view, does not necessarily correspond to what neural activity durations are actually like in the brain. For example, during the 5 s period, it is in theory possible that most neural activity occurs within the first 2 s and then drops to baseline after that. If that were the case, then the appropriate HRF would actually be a "2-s" HRF. Of course, in practice, it is hard to know what the right thing to do is (since we don't directly see the neural activity).
- When you say, for example, "encoding_symbol and recall_word were modeled together as nuisance regressors": I am curious, do you use two separate nuisance regressors? Or do you use a single nuisance regressor (and thereby assume that these two event types give rise to the same amplitude of BOLD activity)?
- The approach you sketch out is clever. In principle, there is nothing intrinsically wrong with the approach. Ultimately, the question reduces to: is the model you are proposing of the data a reasonable one? That question is, in a sense, outside the scope of GLMsingle since it does whatever you tell it to. Note that how you handle the design of the nuisance regressors is important, since that impinges on whether the model you are proposing is a correct/accurate one. If you use a (wildly inaccurate) model, then you risk having beta weight estimates that are not very meaningful.
- Note that interpreting the beta weight magnitudes is slightly tricky. In the two separate calls to GLMsingle, you are using an HRF that is forced to peak at 1 (due to how GLMsingle is designed). So, in both cases, the betas are to be interpreted as %BOLD change for the amplitude reached (even though the two separate calls potentially involve different timecourse shapes).
- Presumably, you should look at some sample beta estimates and try to see if things look "sane" across the two calls. It's hard to predict a priori how things will turn out.
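To make the point about peak normalization concrete, here is a small sketch that convolves a 2 s and a 5 s boxcar with a generic double-gamma impulse response and scales each result to peak at 1. The HRF formula, the 0.1 s time step, and the function names are illustrative assumptions; this is not GLMsingle's HRF library or its internal code.

```python
import numpy as np
from math import gamma

def double_gamma_hrf(t):
    """Generic double-gamma impulse response (illustrative only)."""
    peak = t**5 * np.exp(-t) / gamma(6)
    undershoot = t**15 * np.exp(-t) / gamma(16)
    return peak - undershoot / 6.0

def predicted_response(stimdur, dt=0.1, length=40.0):
    """Convolve a boxcar of `stimdur` seconds with the impulse response,
    then scale the result so that its maximum is exactly 1."""
    t = np.arange(0, length, dt)
    boxcar = np.ones(int(round(stimdur / dt)))
    resp = np.convolve(boxcar, double_gamma_hrf(t))[: t.size] * dt
    return t, resp / resp.max()

t, resp_2s = predicted_response(2.0)
_, resp_5s = predicted_response(5.0)

# Both timecourses peak at 1, but they differ in peak latency and in area.
peak_time_2s = t[resp_2s.argmax()]
peak_time_5s = t[resp_5s.argmax()]
area_ratio = resp_5s.sum() / resp_2s.sum()
```

Under these assumptions a beta of 1 means "1% BOLD at the peak" in both fits, while the 5 s timecourse peaks later and covers more area, which is why magnitudes from the two calls are not directly interchangeable.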

Slide1.PNG (view on web) <https://github.com/user-attachments/assets/7e82b395-ec90-4edd-bc62-b7c406684130>
Slide2.PNG (view on web) <https://github.com/user-attachments/assets/c9181d84-d92a-446d-8713-47581d154ea4>

1 reply

Brady-RT-Roberts May 4, 2026
Author

Thanks, this is very helpful.

To clarify one implementation detail: in my setup, the nuisance events were modeled as two separate regressors, not pooled into a single nuisance regressor. So in the stimdur = 2 model, encoding_symbol and recall_word were each given their own nuisance column, and in the stimdur = 5 model, encoding_word and recognition_symbol were each given their own nuisance column.
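The difference between separate and pooled nuisance columns can be sketched like this. The onsets, TR, and run length are hypothetical, the `boxcar` helper is invented for illustration, and HRF convolution is deliberately left out; how such columns are handed to GLMsingle is not shown here.

```python
import numpy as np

def boxcar(onsets, duration, n_vols, tr):
    """Time series that is 1 while an event is on, 0 otherwise (TR resolution)."""
    ts = np.zeros(n_vols)
    for onset in onsets:
        start = int(round(onset / tr))
        stop = int(round((onset + duration) / tr))
        ts[start:stop] = 1.0
    return ts

tr, n_vols = 1.0, 40               # hypothetical acquisition parameters
enc_symbol_onsets = [2.0, 16.0]    # hypothetical onsets, in seconds
recall_word_onsets = [9.0, 30.0]

# Separate nuisance columns: each event type gets its own amplitude.
separate = np.column_stack([
    boxcar(enc_symbol_onsets, 5.0, n_vols, tr),
    boxcar(recall_word_onsets, 5.0, n_vols, tr),
])

# Pooled nuisance column: both event types are forced to share one amplitude.
pooled = separate.sum(axis=1, keepdims=True)
```

Keeping the columns separate costs one extra parameter per event type but avoids assuming that encoding_symbol and recall_word evoke the same BOLD amplitude.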

The main thing I am trying to understand is the tradeoff between one model versus two models. In this design, would you expect the bigger risk to come from splitting the different epoch lengths across two separate GLMsingle fits, or from using a single unified fit that may misspecify the duration and therefore the effective HRF shape for some of the event types? That seems like the central tradeoff in my case.

Your first point is also well taken. It could be the case that even for the stimdur = 5 model, the bulk of the BOLD activity unfolds over a timecourse similar to that of the stimdur = 2 model, in which case two models aren't necessary. Based on this point and the tradeoff I mentioned above, I think I'll try one model vs two models and see which is better.

On that note then, do you have a go-to metric that you personally find useful for evaluating whether trial betas are “clean”, or how well two adjacent trials or sub-trial epochs are being successfully separated? For example, do you tend to look at split-half reliability, representational similarity structure, decoding performance, or some other diagnostic when judging whether a beta-estimation scheme is doing a good job?

kendrickkay · 2026-05-05T14:15:04Z

kendrickkay
May 5, 2026
Maintainer

> The main thing I am trying to understand is the tradeoff between one model versus two models. In this design, would you expect the bigger risk to come from splitting the different epoch lengths across two separate GLMsingle fits, or from using a single unified fit that may misspecify the duration and therefore the effective HRF shape for some of the event types? That seems like the central tradeoff in my case.

Right. At this point, my best guess is that the single fit might be the best bet.

> On that note then, do you have a go-to metric that you personally find useful for evaluating whether trial betas are “clean”, or how well two adjacent trials or sub-trial epochs are being successfully separated? For example, do you tend to look at split-half reliability, representational similarity structure, decoding performance, or some other diagnostic when judging whether a beta-estimation scheme is doing a good job?

The figures that GLMsingle outputs are possibly very informative. So if you have some figures handy, those could be looked at. The simplest and fastest thing to look at is reliability (e.g. split-half reliability, or anything like that). The specific issue of "whether sub-trial epochs are being successfully separated" is in general a very hard one to address rigorously. The most rigorous approach would be to run ground-truth simulations. But that's a lot of work and hard to do correctly.
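As a rough illustration of the reliability check mentioned above, the sketch below computes a per-voxel split-half correlation from single-trial betas. It assumes each condition is presented at least twice and that betas are arranged as voxels by trials; the function name, the odd/even split, and the simulated data are assumptions rather than anything GLMsingle provides.

```python
import numpy as np

def split_half_reliability(betas, condition_ids):
    """Per-voxel correlation between two halves of repeated-condition betas.

    betas: (n_voxels, n_trials); condition_ids: length n_trials.
    For each condition, odd and even repeats are averaged separately, and the
    two resulting condition profiles are correlated for every voxel.
    """
    condition_ids = np.asarray(condition_ids)
    conditions = np.unique(condition_ids)
    half_a = np.empty((betas.shape[0], conditions.size))
    half_b = np.empty_like(half_a)
    for j, cond in enumerate(conditions):
        trials = np.flatnonzero(condition_ids == cond)
        if trials.size < 2:
            raise ValueError("each condition needs at least two trials")
        half_a[:, j] = betas[:, trials[0::2]].mean(axis=1)
        half_b[:, j] = betas[:, trials[1::2]].mean(axis=1)
    a = half_a - half_a.mean(axis=1, keepdims=True)
    b = half_b - half_b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))

# Simulated betas: 20 conditions shown twice, signal plus independent noise.
rng = np.random.default_rng(0)
n_voxels, n_conditions = 50, 20
signal = rng.normal(size=(n_voxels, n_conditions))
betas = np.tile(signal, 2) + rng.normal(scale=0.5, size=(n_voxels, 2 * n_conditions))
condition_ids = np.tile(np.arange(n_conditions), 2)
reliability = split_half_reliability(betas, condition_ids)
```

A summary such as the median of `reliability` across voxels in a region of interest gives a single number that can be compared across beta-estimation schemes.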

0 replies

BradyRTRoberts · 2026-05-11T20:32:40Z

BradyRTRoberts
May 11, 2026

Thanks again for the helpful guidance, Kendrick. I followed up by comparing several GLMsingle-based approaches and evaluated them with some simple QC metrics. These are not simulations, but rather diagnostic analyses of beta reliability and epoch separation. My main question is how well each beta-estimation method can produce reliable trial-level betas while also separating the adjacent word and symbol encoding epochs within a trial.

Here are the GLMsingle variants I tried:

| model | fit structure | stimdur | target regressors | nuisance regressors | output used | main idea |
| --- | --- | --- | --- | --- | --- | --- |
| `glmsingle_split_duration` | Two full-run fits | 2 s and 5 s | 2 s fit: `enc_word`, `rec_recognition`; 5 s fit: `enc_symbol`, `rec_recall` | Other-duration epochs as separate nuisance columns | Type-D | Estimate each epoch with its designed duration |
| `glmsingle_single2s` | One full-run fit | 2 s for all epochs | All four epoch types | None beyond standard model/confounds | Type-D | Test whether one short duration works better |
| `glmsingle_single5s` | One full-run fit | 5 s for all epochs | All four epoch types | None beyond standard model/confounds | Type-D | Test whether one long duration works better |
| `glmsingle_lss2_nocv_typeb` | One fit per run × epoch | true epoch duration | One epoch type at a time, LSS-style | Other epoch types as separate nuisance regressors | Type-B, `wantlss=1` | GLMsingle analogue of LSS without across-run CV |
| `glmsingle_lss2_epochcv_typeb` | One fit per epoch across runs | true epoch duration | One epoch type at a time, across runs | Other epoch types as separate nuisance regressors | Type-B, `wantlss=1` | Same as above, but uses across-run structure for HRF fitting |
| `glmsingle_lss2_epochcv_typed` | Same as epoch-CV model | true epoch duration | One epoch type at a time, across runs | Other epoch types as separate nuisance regressors | Type-D | Adds GLMdenoise and fractional ridge |

I then compared the outputs using split-half reliability and representational similarity / item-matching diagnostics. Higher split-half reliability is better. Lower enc_word ↔ enc_symbol same-trial similarity is better, because high values suggest poor separation between adjacent encoding epochs. Higher enc_symbol → rec_recognition item matching is better; this is the key test, because the same symbol appears at both encoding and recognition. Finally, I used enc_word → rec_recognition as a control for temporal bleed: if the word-epoch beta matches the later symbol-recognition-epoch beta as much as the symbol-encoding-epoch beta does, then the word beta may be contaminated by the symbol stimulus that immediately follows it at encoding.
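The exact definitions behind these diagnostics are not given in the post, so the following is only one possible reading: same-trial similarity as the mean correlation between paired patterns, and item matching as the mean matching-item correlation minus the mean non-matching correlation. The function names and the simulated patterns are assumptions.

```python
import numpy as np

def pattern_corr(x, y):
    """Pearson correlation between every row of x and every row of y.

    x, y: (n_items, n_voxels) -> (n_items, n_items)
    """
    xz = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    yz = (y - y.mean(axis=1, keepdims=True)) / y.std(axis=1, keepdims=True)
    return xz @ yz.T / x.shape[1]

def same_item_similarity(x, y):
    """Mean correlation between paired patterns (row i of x with row i of y)."""
    return float(np.mean(np.diag(pattern_corr(x, y))))

def item_matching(x, y):
    """Mean matching-item correlation minus mean non-matching correlation."""
    r = pattern_corr(x, y)
    n = r.shape[0]
    off_diag_mean = (r.sum() - np.trace(r)) / (n * (n - 1))
    return float(np.mean(np.diag(r)) - off_diag_mean)

# Toy case: both epochs share one common response pattern, with no
# item-specific information at all.
rng = np.random.default_rng(0)
n_items, n_voxels = 30, 200
shared = rng.normal(size=n_voxels)
enc_word = shared + rng.normal(size=(n_items, n_voxels))
enc_symbol = shared + rng.normal(size=(n_items, n_voxels))

same_trial = same_item_similarity(enc_word, enc_symbol)
matching = item_matching(enc_word, enc_symbol)
```

In this toy case the raw same-trial similarity is clearly positive while the difference score is near zero, which shows that the two kinds of metric respond to different things; it says nothing about which situation holds in the real data.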

| model | split-half reliability (higher is better) | `enc_word ↔ enc_symbol` (lower is better) | `enc_symbol → rec_recognition` (higher is better) | `enc_word → rec_recognition` (contamination control) |
| --- | --- | --- | --- | --- |
| `glmsingle_split_duration` | 0.848 | 0.419 | 0.001 | 0.002 |
| `glmsingle_single2s` | 0.648 | 0.643 | 0.004 | 0.003 |
| `glmsingle_single5s` | 0.701 | 0.597 | 0.006 | 0.002 |
| `glmsingle_lss2_nocv_typeb` | 0.832 | 0.400 | 0.003 | 0.004 |
| `glmsingle_lss2_epochcv_typeb` | 0.849 | 0.439 | 0.006 | 0.007 |
| `glmsingle_lss2_epochcv_typed` | 0.880 | 0.424 | -0.000 | 0.003 |

The main pattern is that the GLMsingle variants produced fairly reliable beta estimates, especially the epoch-CV Type-D and Type-B variants. However, none of the models convincingly solved the adjacent-epoch separation issue. The same-trial enc_word ↔ enc_symbol similarity remained high, while item-specific enc_symbol → rec_recognition matching was near zero across all methods.

For comparison, a simpler LSS model fit with NiBetaSeries using a fixed HRF had split-half reliability of about 0.867, enc_word ↔ enc_symbol similarity of about 0.209, and enc_symbol → rec_recognition item matching of about -0.001. So GLMsingle improved or matched reliability, but did not clearly improve on the more traditional LSS for the specific epoch-separation metric.

My current interpretation is that GLMsingle is helping with beta reliability, but my study design may simply be too temporally compressed to cleanly separate these adjacent sub-trial epochs using single-trial beta estimation alone.
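One way to get a feel for how compressed the design is, independent of any fitted betas, is to look at how strongly the predicted regressors for a 2 s epoch and the 5 s epoch that immediately follows it are correlated. The sketch below does this with a generic double-gamma HRF; the HRF formula, the 0.1 s grid, and the 10 s gap used for comparison are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
from math import gamma

def hrf(t):
    """Generic double-gamma impulse response (illustrative only)."""
    return t**5 * np.exp(-t) / gamma(6) - t**15 * np.exp(-t) / (6.0 * gamma(16))

def regressor(onset, duration, dt=0.1, length=40.0):
    """Predicted BOLD timecourse for one event: boxcar convolved with the HRF."""
    t = np.arange(0, length, dt)
    stim = ((t >= onset) & (t < onset + duration)).astype(float)
    return np.convolve(stim, hrf(t))[: t.size] * dt

word = regressor(onset=0.0, duration=2.0)
symbol = regressor(onset=2.0, duration=5.0)            # no ISI, as in the design
symbol_with_gap = regressor(onset=12.0, duration=5.0)  # hypothetical 10 s gap

r_adjacent = np.corrcoef(word, symbol)[0, 1]
r_gap = np.corrcoef(word, symbol_with_gap)[0, 1]
```

A high `r_adjacent` relative to `r_gap` means the two single-trial regressors explain largely overlapping variance, so their betas are estimated with inflated and correlated error regardless of the estimation scheme.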

Does that interpretation seem reasonable? Or are there other diagnostics you would prioritize before reaching that conclusion?

1 reply

kendrickkay May 20, 2026
Maintainer

Wow, you have taken a very comprehensive and thoughtful approach and covered many angles.

However, my main reaction is that of the four columns in your table, only the first column is trustable. And in fact, even the first column is not necessarily foolproof since it only assesses reliability (but does not assess bias). (E.g. see https://doi.org/10.1371/journal.pone.0270895).
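The distinction between reliability and bias can be illustrated with a toy simulation in which an estimate is contaminated in a reproducible way. Everything here is hypothetical: the `bleed` factor, the noise level, and the one-value-per-item setup are chosen only to make the point visible.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 200
word_true = rng.normal(size=n_items)    # what we want to estimate
symbol_true = rng.normal(size=n_items)  # response to the adjacent epoch
bleed = 0.8                             # hypothetical contamination factor

def estimate(noise_sd=0.3):
    """A biased estimate: true word response plus reproducible bleed plus noise."""
    return word_true + bleed * symbol_true + rng.normal(scale=noise_sd, size=n_items)

half_a, half_b = estimate(), estimate()

# Reliability: agreement between two independent estimates.
reliability = np.corrcoef(half_a, half_b)[0, 1]
# Accuracy: agreement with the ground truth, which is unknown in real data.
accuracy = np.corrcoef((half_a + half_b) / 2, word_true)[0, 1]
```

Because the contamination repeats across halves, it raises reliability rather than lowering it, so a high split-half value alone cannot rule out a systematically wrong estimate.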

For columns two through four, I understand the underlying motivation, but in general, I would suggest to be very very cautious about putting too much stake into your expectations/priors. If wrong, these can easily lead you astray.

At this point, the issues are quite complex and it is difficult to say much concretely. But if you want to discuss this in more depth, I would be happy to do so.

Answer selected by Brady-RT-Roberts
