TpT by MathiasValla · Pull Request #80 · simonprovost/scikit-longitudinal

MathiasValla · 2025-12-16T16:25:43Z

Hi Simon,

I’m happy to share that the TpT implementation is now finalized code-wise on my side. This MR introduces a working TpTDecisionTreeClassifier with a depth-first builder, compatible with scikit-lexicographical-trees / scikit-longitudinal.

This PR is to consider jointly with the Scikit Lexicographical Trees PR here: simonprovost/scikit-lexicographical-trees#1

References to cite

[1] Valla, M. Time-penalised trees (TpT): introducing a new tree-based data mining algorithm for time-varying covariates. Ann Math Artif Intell 92, 1609–1661 (2024). https://doi.org/10.1007/s10472-024-09950-w

[2] Mathias Valla, Xavier Milhaud. Time-penalized trees: consistency results and simulations. 2025. ⟨hal-05022929⟩ https://cnrs.hal.science/hal-05022929

Scope of this PR (minimal, functional)

Estimator: TpTDecisionTreeClassifier (classification only in this PR’s target scope).
Builder: DepthFirstTreeBuilder (no Best-First in this scope).
Criterion: Gini (others left out intentionally for the minimal version).
I/O and structure: Cython implementation aligned with the existing tree stack.
(No plot_tree adaptation included here; regular sklearn.tree.plot_tree is usable for quick inspection.)

Note: I also have a working regression path and more features locally, but to keep this PR focused and easy to land, I’m proposing only the minimal, agreed-upon surface.

Where the code lives

TpT estimator code: scikit_longitudinal/estimators/trees/TpT/
(primary class: TpTDecisionTreeClassifier, splitter, builder, structs)
Long→Wide converter (temporary): scikit_longitudinal/estimators/trees/TpT/_preprocessing.py
This should be moved under or near LongitudinalDataset per your design. I’d be grateful if you could drop it into the right place in a temporary branch; I’ll review once moved. It currently handles long → wide only (not the reverse), without TIDAL/polars. It’s a first step toward 💡 Feature Request - From Wide to Long and vice-versa Longitudinal data formatting, inspired from TIDAL #64 but does not fully solve it.

Quick example (concise)

import pandas as pd
from sklearn import tree
from scikit_longitudinal.estimators.trees import TpTDecisionTreeClassifier

# Example dataset from tests (long format):
# columns include: ["id", "time_point", "duration", "target", ... features ...]
df = pd.read_csv("path/to/stroke.csv", sep=";")

X = df.drop(columns=["target", "cholesterol"])  # minimal cleanup for this file
y = df["target"].astype(int)

clf = TpTDecisionTreeClassifier(
    gamma=0.1,              # time penalty (λ)
    criterion="gini",
    id_col="id",
    time_col="time_point",
    duration_col="duration",
    assume_long_format=True,
    time_step=1,
    max_horizon=1000,
    min_samples_split=2,
    max_depth=1000,
    random_state=42,
).fit(X, y)

# Optional: quick textual sanity checks
print("max_depth:", clf.tree_.max_depth)
print("node_count:", clf.tree_.node_count)
print("split_time_index[:10]:", clf.tree_.split_time_index[:10])

# Optional: quick visual check (works for fast inspection)
feature_names = getattr(clf, "_wide_feature_names_", None)
_ = tree.plot_tree(
    clf, filled=True, fontsize=10,
    feature_names=feature_names, tpt_time_scale=0.5
)

That’s intentionally minimal (no non-essential utilities, no external metrics). It just loads data, fits TpT, prints a few structural fields, and plots.

Dependencies

I temporarily removed ray from the dependencies due to local issues.
Please feel free to restore it where appropriate (e.g., as an optional dependency / extra for parallelism or tests).

Ask / next steps

I’ve pushed this as far as I can right now. It would be ideal if you could manage the integration so TpT aligns perfectly with sklong’s patterns (API surface, dataset plumbing via LongitudinalDataset, docs structure, examples, CI, etc.). I’ll follow up with any fixes you need during review.

I’m also working on two additional papers involving TpT and will of course cite scikit-longitudinal as the reference implementation. For any future article specifically about the implementation, I’d be happy to include you as co-author for your guidance and help.

Thanks a lot, and please let me know how you’d like to proceed with the preprocessing relocation!

— Mathias

simonprovost · 2026-01-14T10:56:44Z

@MathiasValla Your work is amazing! Thanks so very much! Happy new year as well!

I will be busy in the coming weeks / months but as always as soon as I can I'll work on making this PR ready — Some news incoming within the year for Sklong (positive!); and TpT will definitely be useful :)

Wishing you well,

Cheers

MathiasValla · 2026-01-14T11:35:16Z

@simonprovost Thanks, bets wishes to you too !

Great to hear.
I only have one constraint: with your blessing, I'd like to submit the TpT code in a journal in the first half of the year (at most !), so I guess it's better to wait your integration of TpT within the existing framework. If you agree (and if interested), I could prepare a draft (including you as co-author for all your help on that integration !) in that direction.

In the meantime, I still need to finalize 2 articles on TpT, where I'll refer to sklong, of course. I'm also working on an animation explaining how TpT works (animated with manim), it could be usefull to include it on the scikit-longitudinal website (if you also think it's a good idea).

Cheers,
Mathias

simonprovost · 2026-01-14T11:44:32Z

@simonprovost Thanks, bets wishes to you too !

Great to hear. I only have one constraint: with your blessing, I'd like to submit the TpT code in a journal in the first half of the year (at most !), so I guess it's better to wait your integration of TpT within the existing framework. If you agree (and if interested), I could prepare a draft (including you as co-author for all your help on that integration !) in that direction.

In the meantime, I still need to finalize 2 articles on TpT, where I'll refer to sklong, of course. I'm also working on an animation explaining how TpT works (animated with manim), it could be usefull to include it on the scikit-longitudinal website (if you also think it's a good idea).

Cheers, Mathias

You are working so efficiently @MathiasValla ! Kind of jealous 🙏

That's fantastic to hear, and thank you for your kind gesture, mate! I am very interested! I would be pleased, as I have great trust in your research. If you want to continue the discussion by email regarding adding me as the (last) co-author of your manuscript, please do so 🫡.

I suppose I'll be more transparent so you're aware. In two weeks, I'll begin a Longitudinal ML-based research visit at the University of Edinburgh, and in the meantime, I'm applying here and there arounds for Post-Docs and trying to finish writing my Ph.D. thesis, so the faster I get, the more time I'll have to work on TpT integration.

This being said, given your journal wishes, I believe I could spend time on TpT in a month or so. Perhaps set yourself a reminder (I wish there was bots agents in Github Issues 👀 that could chase meself) and chase back, saying, "Hey Simon, you are late:)"? My head tends to be everywhere at once...

This is a fantastic idea the TpT animation. I had in mind last year to create a visualisation for each of Sklong's primitives at the top of their respective API reference page so that users could grasp them easily. Definitely include it, or give it to me and I'll include it in the API ref of TpT; it's a modern documentation notion that I plan (during my Post-Doc) to extend to others so be the first one with TpT :) PS: I adore Manim! Look here: Automated machine learning for longitudinal classification — AIDA (UKC) @ https://simonprovostdev.vercel.app/talks.

Let us keep each other transparently updated

Cheers,

MathiasValla · 2026-03-02T13:39:37Z

Hey Simon !
I hope you're well. Quick ping for an update: do you have a rough timeline for reviewing/merging the TpT integration into scikit-longitudinal? Happy to help with that, of course. 😄

simonprovost · 2026-03-03T15:20:28Z

Hey @MathiasValla hoping all is well! Sorry for being a lil bit AWOL I am interning at the University of Edinburgh at the moment so things are a bit intense but I aim to work on your PR within the month. I wondered if you may rebase with main in order to be "up-to-date" please?

git checkout main
git pull
git checkout -
git rebase main

And solve potential conflicts! Once you'll do that you'll have to git push --force and then I'll be able to work on a clean updated plate

Do not hesitate to ask me anything prior if you do not feel comfortable with that!

Cheers

MathiasValla · 2026-03-04T09:15:37Z

Hi Simon ! No worries at all, hope the internship is going well. 😄

I’ve rebased my TpT branch onto main, resolved the conflicts, and force-pushed the updated branch. Everything should now be clean and up-to-date for you to review.

Let me know if you’d like me to adjust anything else !

simonprovost · 2026-03-04T11:42:38Z

Hi Simon ! No worries at all, hope the internship is going well. 😄

I’ve rebased my TpT branch onto main, resolved the conflicts, and force-pushed the updated branch. Everything should now be clean and up-to-date for you to review.

Let me know if you’d like me to adjust anything else !

I set a reminder in my diary, hope to spend time on your great work mate soon! Thanks for your promptness! more than brillant!

PS: I've made some general comments!

simonprovost

Hey @MathiasValla I made some general comments not directly about the algorithms but more about Python code quality, which we expect in the main production track of Sklong. This should give you a bit of work until I’m fully available to test your amazing contribution myself on my computer :)

PS: I’ll most probably include TpT in a toolkit I’m building for social and clinical researchers at Edinburgh! That’s going to be nice!

Cheers,

MathiasValla · 2026-03-04T12:20:13Z

Hey Simon
Thanks a lot for the thorough review 🙏 Super helpful and totally fair.
I’ll tackle your comments later this week: remove the cursor config, trim the module/file docstrings + inline separators, rename the tests more cleanly / move what makes sense, and beef up the main class docstrings with proper refs + parameter docs.

Also yeah… I’m not a dev by training, so sorry if some bits look a bit clunky and handmade 😅 . But I’m happy to clean it up and improve !

simonprovost · 2026-03-04T12:27:16Z

Hey Simon Thanks a lot for the thorough review 🙏 Super helpful and totally fair. I’ll tackle your comments later this week: remove the cursor config, trim the module/file docstrings + inline separators, rename the tests more cleanly / move what makes sense, and beef up the main class docstrings with proper refs + parameter docs.

Also yeah… I’m not a dev by training, so sorry if some bits look a bit clunky and handmade 😅 . But I’m happy to clean it up and improve !

That's why we have Pull Request Review do not worry mate! You've done a grand job really! Plus, the first true contributor, we had others but not from a brand new algorithm so I might say this is better than anything else done thus far 🙏

Take your time and if you have questions, do not worry no stupid questions exist here, ask anything at anytime!

Looking forward to find time to test the algorithm out!

PS: I know you're not a dev by training but you doing as great as them, so take the pride dude 👌

Cheers

MathiasValla · 2026-03-06T15:41:55Z

Simon,

I had trouble running the test_TpT file (now in the correct tests folder). I spent some time trying to understand what went wrong, but could not figure it out, so if you manage to test it at some point, it would be great.

Apart from that, I've been able to make all modifications.

Cheers,
Mathias

simonprovost · 2026-03-06T16:12:15Z

Simon,

I had trouble running the test_TpT file (now in the correct tests folder). I spent some time trying to understand what went wrong, but could not figure it out, so if you manage to test it at some point, it would be great.

Apart from that, I've been able to make all modifications.

Cheers, Mathias

Hi Mathias! Hoping all is well! Thanks for your prompt work as always!

Have you tried: https://scikit-longitudinal.readthedocs.io/latest/developers/#running-tests ?

Cheers

simonprovost

Amazing changes @MathiasValla ! Thanks much, here are a couple more in the meantime !

simonprovost · 2026-03-06T16:18:31Z

+[tool.uv.sources]
+scikit-lexicographical-trees = { path = "../scikit-lexicographical-trees" }


Let's keep this comment open for the time being please.

@simonprovost, review: simonprovost/scikit-lexicographical-trees#1

simonprovost

Code looks good @MathiasValla ! I'll have to simply try it out myself on my machine to make sure it's all working but that's nice, well done! One change is needed but that's in the README :) !

I'll test that out very soon, hopefully! But just to re-assure you, having looked at the code it's now brills! And thanks thanks thanks for that!

simonprovost · 2026-03-14T12:42:22Z

@MathiasValla we are getting there!

Do not forget to git checkout main, git pull, git checkout -, git rebase main as we've merged multi-class support #83 !

Also note that I've worked on your TpT PR on Scikit Lexicographical Trees to accelerate things out 🥳

Cheers

simonprovost · 2026-03-21T14:30:42Z

Hey @MathiasValla! I have finally finalised the full rebranding of Sklong. See the new README and the new docs utilising the new state-of-the-art Rust doc generator, Zensical, at https://scikit-longitudinal.readthedocs.io/latest/! Can't wait to put TpT as part of the API reference :)

Now it is much easier to read through the library, to be honest.

Cheers!

simonprovost

Quick one!

simonprovost

This PR looks good @MathiasValla ! Nearly there! Only the lexicographical trees need some adjustments 👌

Two little comments below though 👇

Cheers

MathiasValla · 2026-04-13T09:15:34Z

Hi Simon,

I finally went through your review points carefully and I addressed them in both PRs. I'll copy/paste the same comment below oin both PRs:

On the scikit-lexicographical-trees side, I removed the leftover “New TpT” mentions, rewrote the relevant comments in proper English, and restored the generic threshold_gain validation to its original range so it stays aligned with the lexicographical estimators.

On the scikit-longitudinal side, I switched the tree exports back to absolute imports and kept the broader TpT gamma validation local to the TpT wrappers, so TpT still accepts larger gamma values without widening the generic tree API.
Some precision on that point: for TpT, we need a gamma hyperparameter. Since its role is relatively close to the "threshold_gain" you've introduced, I made the (questionnable) decision to recycle "threshold_gain" and use it for "gamma". In theory, TpT's gamma can be > 1 (explaining my modification of the parameter constraints), so for now, I've kept the original threshold_gain constaints but allowed gamma to be >= 1 if needed (in practice, that is almost never usefull, but in theory that is possible). If needed, I can absolutely leave "threshold_gain" as it was and define "gamma" as a distinct hyperparameter. It's better practice, I guess ?

I also reran focused checks after that: rebuilt the fork locally, confirmed TpT still fits with gamma>1, and confirmed the generic DecisionTreeClassifier now rejects threshold_gain > 1 again.

Sorry as well for the “New TpT” mentions that reappeared earlier. I think I got a bit lost in my local commits while iterating, and I’m still learning to keep that workflow cleaner.

Cheers,
Mathias

simonprovost · 2026-04-15T01:00:50Z

Hello, Hello @MathiasValla! Hoping all is well!

Happy to tell you that your TpT splitter is now merged and released on PyPi, see more in [1]. Soon after merging, I just thought that we had forgotten to look at the unit tests of your PR, which is entirely my fault for not paying much attention to refactoring Scikit lexicographical trees to be a great open-source project, with automation and all (now tracked at #86). Anyway, to the best of my surprise, I run the tests and a lot of errors were introduced. Do not worry, I just merged [2] and now it is all released on PyPI 🥳

I have now updated the Scikit-Longitudinal main track to be up to date with the new version of Scikit-Lexicographical-Trees. Could you please rebase your branch with main, and make sure (i) the unit tests go through, see more in [3] (PS: If you have some Ray errors, that's totally fine), and then (ii) run your own sort of basic experiment to check if TpT behaves like you wanted, please?

Once that is good, I'll do a last pass here in the changes tab and will finally accept and merge your amazing contrib!

Let's try to speed things up because of your submission ✅

Thanks once more for your work, brills! TOP-1!

Cheers

MathiasValla · 2026-04-17T12:52:02Z

Hi Simon,

Thanks again, and great catch on the post-merge test issue.

I rebased the branch on the updated main, reran the tests, and pushed the rebased branch. On my side, everything goes through apart from the Ray-dependent parallel tests, which fail only because ray is not installed in the environment, so that seems consistent with your note.

I also reran a basic TpT sanity check on a toy dataset, and the behavior still looks right.

So from my side the branch should be ready for your last pass.

Cheers

MathiasValla mentioned this pull request Dec 17, 2025

TpT simonprovost/scikit-lexicographical-trees#1

Merged

MathiasValla force-pushed the TpT branch from dea07b6 to 5f91847 Compare March 4, 2026 08:54

simonprovost requested changes Mar 4, 2026

View reviewed changes

MathiasValla requested a review from simonprovost March 6, 2026 15:42

simonprovost requested changes Mar 6, 2026

View reviewed changes

MathiasValla requested a review from simonprovost March 10, 2026 12:33

simonprovost requested changes Mar 12, 2026

View reviewed changes

Comment thread README.md

Comment thread scikit_longitudinal/tests/test_TpT.py

Comment thread scikit_longitudinal/estimators/trees/TpT/_preprocessing.py

simonprovost assigned MathiasValla Mar 12, 2026

simonprovost added documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers Classification Related content to classification tests labels Mar 12, 2026

simonprovost requested changes Apr 3, 2026

View reviewed changes

Comment thread scikit_longitudinal/estimators/trees/TpT/TpT_decision_tree_regressor.py Outdated

Comment thread scikit_longitudinal/estimators/trees/TpT/TpT_decision_tree.py Outdated

simonprovost requested changes Apr 4, 2026

View reviewed changes

Comment thread scikit_longitudinal/estimators/trees/__init__.py Outdated

Comment thread scikit_longitudinal/estimators/trees/__init__.py Outdated

MathiasValla requested a review from simonprovost April 14, 2026 10:59

simonprovost force-pushed the main branch 2 times, most recently from ec52efb to d8d9927 Compare April 15, 2026 01:25

Rebase TpT on main

5a8e4e2

MathiasValla force-pushed the TpT branch from 4dbb600 to 5a8e4e2 Compare April 17, 2026 12:46

		[tool.uv.sources]
		scikit-lexicographical-trees = { path = "../scikit-lexicographical-trees" }

Conversation

MathiasValla commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

References to cite

Scope of this PR (minimal, functional)

Where the code lives

Quick example (concise)

Dependencies

Ask / next steps

Uh oh!

simonprovost commented Jan 14, 2026

Uh oh!

MathiasValla commented Jan 14, 2026

Uh oh!

simonprovost commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MathiasValla commented Mar 2, 2026

Uh oh!

simonprovost commented Mar 3, 2026

Uh oh!

MathiasValla commented Mar 4, 2026

Uh oh!

simonprovost commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

simonprovost left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MathiasValla commented Mar 4, 2026

Uh oh!

simonprovost commented Mar 4, 2026

Uh oh!

MathiasValla commented Mar 6, 2026

Uh oh!

simonprovost commented Mar 6, 2026

Uh oh!

simonprovost left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

simonprovost Mar 6, 2026

Choose a reason for hiding this comment

Let's keep this comment open for the time being please.

Uh oh!

simonprovost left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

simonprovost commented Mar 14, 2026

Uh oh!

simonprovost commented Mar 21, 2026

Uh oh!

simonprovost left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

simonprovost left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

MathiasValla commented Apr 13, 2026

MathiasValla commented Dec 16, 2025 •

edited

Loading

simonprovost commented Jan 14, 2026 •

edited

Loading

simonprovost commented Mar 4, 2026 •

edited

Loading

simonprovost left a comment •

edited

Loading