
Conversation

@eschibli
Contributor

@eschibli eschibli commented Oct 8, 2024

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Implements #2510

Summary

Adds the option to project to the output temporal space at the end of TSMixer, rather than the beginning. This is how most of the results in the original google-research paper were achieved (i.e., the architecture in Fig. 1 of the paper). This may allow higher performance in cases where past covariates are important, by allowing a more direct series of residual connections along the input time dimension.

I allowed support for future covariates by instead projecting them into the lookback temporal space, but this probably won't perform well in cases where they are more important than the historical targets and past covariates.
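To illustrate the difference between the two orderings, here is a minimal numpy sketch (shapes, weights, and function names are illustrative stand-ins, not the darts implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, lookback, horizon, n_feat = 4, 24, 12, 8

# Hypothetical linear time-projection weights (lookback -> horizon).
W_time = rng.standard_normal((horizon, lookback)) / np.sqrt(lookback)

def mixer_blocks(x):
    # Stand-in for the stack of TSMixer blocks: a residual update whose
    # skip connections live in whatever time dimension x currently has.
    return x + 0.1 * np.tanh(x)

def forecast_project_first(x):
    # Current behaviour: project to the output time dimension up front,
    # so all mixing and residual connections happen in horizon space.
    h = np.einsum("ot,btf->bof", W_time, x)
    return mixer_blocks(h)

def forecast_project_last(x):
    # Proposed option (Fig. 1 of the paper): mixing and residuals stay
    # aligned with the input time steps; project to the horizon at the end.
    h = mixer_blocks(x)
    return np.einsum("ot,btf->bof", W_time, h)

x = rng.standard_normal((batch, lookback, n_feat))
assert forecast_project_first(x).shape == (batch, horizon, n_feat)
assert forecast_project_last(x).shape == (batch, horizon, n_feat)
```

Both variants end in the horizon time dimension; they differ only in whether the mixer residuals are carried along the input or the output time axis.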

Other Information

The original paper and source code do not clarify whether the final temporal projection should go before or after the final feature projection, as they hardcoded hidden_size to output_dim and therefore did not need a final feature projection. I erred on the side of putting the temporal projection first, as otherwise the common output_dim==1 could lead to unexpected, catastrophic compression before the temporal projection step.
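The compression concern can be seen from the shapes alone; a hedged sketch with illustrative weight names (not the actual layers):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, lookback, horizon, hidden, output_dim = 2, 24, 12, 64, 1

x = rng.standard_normal((batch, lookback, hidden))
W_time = rng.standard_normal((horizon, lookback))   # temporal projection
W_feat = rng.standard_normal((output_dim, hidden))  # final feature projection

# Temporal projection first: the feature projection still sees the full
# hidden width (64 channels) at every forecast step.
a = np.einsum("ot,btf->bof", W_time, x)   # (batch, horizon, hidden)
a = np.einsum("gf,bof->bog", W_feat, a)   # (batch, horizon, output_dim)
assert a.shape == (batch, horizon, output_dim)

# Feature projection first: with output_dim == 1, everything downstream of
# this point (including the temporal projection) works from a single
# channel per time step -- the compression bottleneck.
b = np.einsum("gf,btf->btg", W_feat, x)   # (batch, lookback, 1)
assert b.shape == (batch, lookback, 1)
b = np.einsum("ot,btg->bog", W_time, b)   # (batch, horizon, 1)
```

Note that for two purely linear maps over independent axes the orders commute numerically; the bottleneck only becomes harmful once nonlinear processing (activations, dropout, probabilistic heads) sits after the feature projection.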


@madtoinou
Contributor

Hi @eschibli,

First of all, thanks for opening this PR!

For the linting, it will make your life much easier if you follow these instructions, or you can also run it manually.

@codecov

codecov bot commented Oct 27, 2024

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.56%. Comparing base (27b46d2) to head (7ad36bd).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2555      +/-   ##
==========================================
- Coverage   95.63%   95.56%   -0.07%     
==========================================
  Files         153      153              
  Lines       16433    16447      +14     
==========================================
+ Hits        15715    15718       +3     
- Misses        718      729      +11     

☔ View full report in Codecov by Sentry.

@eschibli
Contributor Author

Hi @eschibli,
...

Thanks @madtoinou. I was not able to get Gradle running on my machine and didn't realize ruff was that easy to set up, so sorry for spamming your test pipeline.

I don't believe the failing mac build is a result of my changes, so it should be good for review now.

@dennisbader
Collaborator

Hi @eschibli, thanks for the PR. Yes, the failing mac tests are unrelated to your PR, we're working on it :).
Also, give us some time to review, our capacity is currently a bit limited 🙏

@eschibli
Contributor Author

eschibli commented Nov 3, 2024

Understood, Dennis.

Contributor

@madtoinou madtoinou left a comment

It looks great, thank you for this nice PR @eschibli! Some minor comments about the order of the operations/projections to make the flow more intuitive.

Could you also extend the TSMixer notebook to include a section where the difference in performance with "project_first_layer=True/False" and future covariates can be visualized?

# In the original paper this was not implemented for future covariates,
# but rather than ignoring them or raising an error we remap them to the input time dimension.
# Suboptimal, but may be useful in some cases.
elif self.future_cov_dim:
Contributor

To make it a bit more intuitive, I would move this code below, inside the if self.future_cov_dim, and change the condition to if not self.project_first_layer, in order to group the operations on each kind of feature:

  1. "target"; project to output time dimension in the first layer if project_first_layer = True otherwise we stay in input time dimension
  2. "target"; do the feature_mixing_hist (not changed)
  3. "fut_cov"; project the future covariates to input time dimension if project_first_layer=False (the logic you added)
  4. concatenate the future covariates to the target features (not changed)
  5. static covariates (not changed)
  6. "target"; projection to the output time dimension if it did not occur earlier
  7. "target"; application of fc_out, critical for probabilistic forecasts
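The seven steps above could be sketched roughly as follows (a simplified numpy stand-in for the forward pass; feature_mixing_hist, fc_out, the static covariates, and the mixer blocks are reduced to comments or placeholders, so this is not the darts implementation):

```python
import numpy as np

class FlowSketch:
    """Simplified stand-in for the proposed ordering of operations."""

    def __init__(self, lookback, horizon, project_first_layer, seed=0):
        rng = np.random.default_rng(seed)
        self.project_first_layer = project_first_layer
        # Hypothetical linear time-projection weights
        # (lookback -> horizon, and horizon -> lookback for covariates).
        self.W_out = rng.standard_normal((horizon, lookback))
        self.W_in = rng.standard_normal((lookback, horizon))

    @staticmethod
    def _time_project(x, W):
        # Linear map along the time axis of a (batch, time, feature) array.
        return np.einsum("ot,btf->bof", W, x)

    def forward(self, x_hist, x_future=None):
        # 1. target: project to the output time dim now if requested
        if self.project_first_layer:
            x_hist = self._time_project(x_hist, self.W_out)
        # 2. target: feature_mixing_hist would go here (identity stand-in)
        if x_future is not None:
            # 3. fut_cov: if the targets stayed in the input time dim,
            #    remap the future covariates (given over the horizon)
            #    back to the lookback time dim
            if not self.project_first_layer:
                x_future = self._time_project(x_future, self.W_in)
            # 4. concatenate covariate features onto the target features
            x_hist = np.concatenate([x_hist, x_future], axis=-1)
        # 5. static covariates / mixer blocks omitted from the sketch
        # 6. target: project to the output time dim if not done earlier
        if not self.project_first_layer:
            x_hist = self._time_project(x_hist, self.W_out)
        # 7. fc_out (final feature projection) omitted from the sketch
        return x_hist
```

Either way the output lands in the horizon time dimension; the two branches differ only in which time space the covariates are concatenated and mixed.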

Contributor Author

This implementation is out-of-date with the current version of the PR, and I think the current version makes more sense.

Contributor Author

I set it up this way originally because the TSMixer performance test was failing if I didn't generate the layers in exactly the same order. In this PR I have relaxed the tolerance as it wasn't stable.

@eschibli
Contributor Author

eschibli commented Mar 5, 2025

Hi @madtoinou, sorry, I was very busy in December and January. I should have some time this week or next to finish this.

Also, in my own work I have had more success with an intermediate variant, which I mentioned in the original feature request thread. If you would consider adding that additional functionality, I will add it to this pull request.

@madtoinou
Contributor

Sure! We will then try to compare each of them on small use cases to make sure they bring something, but based on your experience, it seems that is already the case :)

@eschibli eschibli requested a review from madtoinou May 11, 2025 01:36
@eschibli
Contributor Author

eschibli commented Jun 8, 2025

@madtoinou are you happy with these changes?

@eschibli
Contributor Author

eschibli commented Aug 1, 2025

@madtoinou @dennisbader, sorry to nag, but could you please clarify whether you are unhappy with this addition or whether more changes are needed to approve it?

@dennisbader
Collaborator

@madtoinou could you give an update on this one? :)

@madtoinou
Contributor

Hi @eschibli,

Sorry for the delay, I started reviewing but it took longer than expected.

These changes are welcome; I have the impression that they will bring a considerable performance gain, but I want to make sure that the current behavior is still accessible and reproducible.
