Time series analysis (TSA) is a longstanding research topic in the data mining community with wide real-world significance. Compared to "richer" modalities such as language and vision, which have recently seen explosive development and are densely interconnected, the time-series modality remains relatively underexplored and isolated. We observe that many recent TSA works have formed a new research field: Multiple Modalities for TSA (MM4TSA). These MM4TSA works share a common motivation: how can TSA benefit from multiple modalities? This survey is the first to offer a comprehensive review of, and a detailed outlook on, this emerging field. Specifically, we systematically discuss three benefits: (1) reusing foundation models of other modalities for efficient TSA, (2) multimodal extension for enhanced TSA, and (3) cross-modality interaction for advanced TSA. Within each perspective, we further group the works by the introduced modality type, including text, images, audio, tables, and others. Finally, corresponding to the three benefits, we identify remaining gaps and future opportunities: reused-modality selection, heterogeneous modality combination, and unseen-task generalization. We maintain this up-to-date GitHub repository of key papers and resources. For more details, please see our survey.
Contributing
🚀 We will continue to update this repo. If you find it helpful, please Star it or Cite Our Survey.
🤝 Contributions are welcome! Please feel free to submit a Pull Request.
Citation
🤗 If you find this survey useful, please consider citing our paper. 🤗
@misc{liu2025timeseriesanalysisbenefit,
      title={How Can Time Series Analysis Benefit From Multiple Modalities? A Survey and Outlook},
      author={Haoxin Liu and Harshavardhan Kamarthi and Zhiyuan Zhao and Shangqing Xu and Shiyu Wang and Qingsong Wen and Tom Hartvigsen and Fei Wang and B. Aditya Prakash},
      year={2025},
      eprint={2503.11835},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.11835}
}
| Title | Venue |
| --- | --- |
| Mobile traffic prediction in consumer applications: A multimodal deep learning approach | IEEE Transactions on Consumer Electronics |
| Urban informal settlements classification via a transformer-based spatial-temporal fusion network using multimodal remote sensing and time-series human activity data | International Journal of Applied Earth Observation and Geoinformation |
| Spatial-temporal attention-based convolutional network with text and numerical information for stock price prediction | Neural Computing and Applications |
| Traffic congestion prediction using toll and route search log data | IEEE International Conference on Big Data (Big Data) 2022 |
| Understanding city traffic dynamics utilizing sensor and textual observations | AAAI 2016 |
| CityGPT: Empowering urban spatial cognition of large language models | arXiv 24.06 |
| Where Would I Go Next? Large Language Models as Human Mobility Predictors | arXiv 23.08 |
| Leveraging Language Foundation Models for Human Mobility Forecasting | arXiv 22.09 |
| UrbanMind: Urban Dynamics Prediction with Multifaceted Spatial-Temporal Large Language Models | KDD 2025 |
| From Swath to Full-Disc: Advancing Precipitation Retrieval with Multimodal Knowledge Expansion | KDD 2025 |
| Multi-scale Physics-informed Transformer With Spatio-temporal Feature Adapter For Extreme Precipitation Nowcasting | KDD 2025 |
| Physics-Guided Learning of Meteorological Dynamics for Weather Downscaling and Forecasting | KDD 2025 |
2.3.2 Medical Time Series
| Title | Venue |
| --- | --- |
| Addressing asynchronicity in clinical multimodal fusion via individualized chest x-ray generation | NeurIPS 2024 |
| EMERGE: Enhancing Multimodal Electronic Health Records Predictive Modeling with Retrieval-Augmented Generation | CIKM 2024 |
| Improving medical predictions by irregular multimodal electronic health records modeling | ICML 2023 |
| Multimodal pretraining of medical time series and notes | ML4H 2023 |
| Learning missing modal electronic health records with unified multi-modal data embedding and modality-aware attention | ML4H 2023 |
| MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images | ML4H 2022 |
| MIRACLE: Causally-aware imputation via learning missing data mechanisms | NeurIPS 2021 |
| How to leverage the multimodal EHR data for better medical prediction? | EMNLP 2021 |
| Deep multi-modal intermediate fusion of clinical record and time series data in mortality prediction | Frontiers in Molecular Biosciences |
| Integrated multimodal artificial intelligence framework for healthcare applications | npj Digital Medicine |
| PTB-XL, a large publicly available electrocardiography dataset | Scientific Data |
| Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines | npj Digital Medicine |
| Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers | arXiv 25.01 |
| Towards Predicting Temporal Changes in a Patient's Chest X-ray Images based on Electronic Health Records | arXiv 24.09 |
| Multimodal risk prediction with physiological signals, medical images and clinical notes | medRxiv 23.05 |
| MedualTime: A Dual-Adapter Language Model for Medical Time Series-Text Multimodal Learning | IJCAI 2025 |
2.3.3 Financial Time Series
| Title | Venue |
| --- | --- |
| FNSPID: A comprehensive financial news dataset in time series | KDD 2024 |
| Multi-modal deep learning for credit rating prediction using text and numerical data streams | Applied Soft Computing |
| Multimodal multiscale dynamic graph convolution networks for stock price prediction | Pattern Recognition |
| Multi-Modal Financial Time-Series Retrieval Through Latent Space Projections | ACM ICAIF |
| Natural language based financial forecasting: a survey | Artificial Intelligence Review |
| Financial analysis, planning & forecasting: Theory and application | Book |
| Text2TimeSeries: Enhancing financial forecasting through time series prediction updates with event-driven insights from large language models | arXiv 24.07 |
| Natural language processing and multimodal stock price prediction | arXiv 24.01 |
| Modality-aware Transformer for Financial Time Series Forecasting | arXiv 23.10 |
| Predicting financial market trends using time series analysis and natural language processing | arXiv 23.09 |
| Stock price prediction using sentiment analysis and deep learning for Indian markets | arXiv 22.04 |
| Volatility prediction using financial disclosures sentiments with word embedding-based IR models | arXiv 17.02 |
| Dynamic Higher-Order Relations and Event-Driven Temporal Modeling for Stock Price Forecasting | IJCAI 2025 |
2.4 Gaps and Outlooks
2.4.1 Heterogeneous Modality Combinations
| Title | Venue |
| --- | --- |
| ImageBind: One embedding space to bind them all | CVPR 2023 |
| LANISTR: Multimodal learning from structured and unstructured data | arXiv 23.05 |
2.4.2 Robust & Efficient Multimodal TS (Outlook)
| Title | Venue |
| --- | --- |
| MAESTRO: Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series | NeurIPS 2025 |
2.5 Datasets & Benchmarks (Multimodal)
| Title | Venue |
| --- | --- |
| Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series | NeurIPS 2025 |
3. TimeAsX
3.1 Time Series as Text
| Title | Venue |
| --- | --- |
| LangTime: A Language-Guided Unified Model for Time Series Forecasting with Proximal Policy Optimization | ICML 2025 |
| Context-Alignment: Activating and Enhancing LLM Capabilities in Time Series | ICLR 2025 |
| ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data | AAAI 2025 |
| Exploiting Language Power for Time Series Forecasting with Exogenous Variables | The Web Conference 2025 |
| LSTPrompt: Large language models as zero-shot time series forecasters by long-short-term prompting | ACL 2024 Findings |
| TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting | ICLR 2024 |
| TEST: Text Prototype Aligned Embedding to Activate LLM's Ability for Time Series | ICLR 2024 |
| Time-LLM: Time Series Forecasting by Reprogramming Large Language Models | ICLR 2024 |
| Are language models actually useful for time series forecasting? | NeurIPS 2024 |
| AutoTimes: Autoregressive time series forecasters via large language models | NeurIPS 2024 |
| S²IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting | ICML 2024 |
| Large language models are zero-shot time series forecasters | NeurIPS 2023 |
| One fits all: Power general time series analysis by pretrained LM | NeurIPS 2023 |
| PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting | IEEE TKDE |
| Chronos: Learning the language of time series | TMLR |
| LLM4TS: Aligning Pre-Trained LLMs as Data-Efficient Time-Series Forecasters | ACM TIST |
| Large Language Models are Few-shot Multivariate Time Series Classifiers | arXiv 25.02 |
| TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents | arXiv 25.02 |
| ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning | arXiv 24.12 |
| Large language models can deliver accurate and interpretable time series anomaly detection | arXiv 24.05 |
| Multi-Patch Prediction: Adapting LLMs for Time Series Representation Learning | arXiv 24.02 |
| Lag-Llama: Towards foundation models for time series forecasting | arXiv 23.10 |
| Unleashing The Power of Pre-Trained Language Models for Irregularly Sampled Time Series | KDD 2025 |
| Understanding Why Large Language Models Can Be Ineffective in Time Series Analysis: The Impact of Modality Alignment | KDD 2025 |
| Bridging Time and Linguistics: LLMs as Time Series Analyzer through Symbolization and Segmentation | NeurIPS 2025 |
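As context for the table above: zero-shot methods in this line (e.g., "Large language models are zero-shot time series forecasters", NeurIPS 2023) typically rescale the series, serialize it into a digit string that an LLM can continue, and parse the completion back into numbers. The sketch below illustrates that round trip; the function names and exact formatting choices are illustrative assumptions, not the reference implementation of any listed paper.

```python
# Illustrative "time series as text" round trip: rescale, serialize to
# digits, let an LLM continue the string, parse the completion back.

def serialize(series, decimals=1):
    """Rescale a series and render it as a comma-separated digit string."""
    scale = max(abs(v) for v in series) or 1.0
    # Rescaling keeps values in a small, uniform range so the LLM's
    # tokenizer sees short, consistent digit groups.
    text = ", ".join(f"{v / scale * 10:.{decimals}f}" for v in series)
    return text, scale

def deserialize(completion, scale):
    """Parse a model's textual continuation back into numeric values."""
    values = []
    for token in completion.split(","):
        try:
            values.append(float(token.strip()) / 10 * scale)
        except ValueError:
            break  # stop at the first token that is not a number
    return values

history = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0]
prompt, scale = serialize(history)          # e.g. "8.3, 8.7, 9.8, ..."
# In practice `prompt` is sent to an LLM with an instruction to continue
# the sequence; a plausible completion is hard-coded here for the demo.
fake_completion = "9.7, 10.2"
print(deserialize(fake_completion, scale))  # forecasts on the original scale
```

Models such as Chronos instead learn a dedicated token vocabulary over scaled, quantized values, but the serialize-then-continue interface is the same.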
3.2 Time Series as Image
| Title | Venue |
| --- | --- |
| Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting | ICML 2025 |
| CAFO: Feature-Centric Explanation on Time Series Classification | KDD 2024 |
| TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis | ICLR 2023 |
| Towards total recall in industrial anomaly detection | CVPR 2022 |
| Deep video prediction for time series forecasting | ACM ICAIF 2021 |
| Forecasting with time series imaging | Expert Systems with Applications |
| Can Multimodal LLMs Perform Time Series Anomaly Detection? | arXiv 25.02 |
| See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers | arXiv 24.11 |
| Plots Unlock Time-Series Understanding in Multimodal Models | arXiv 24.10 |
| VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters | arXiv 24.08 |
| Training-Free Time-Series Anomaly Detection: Leveraging Image Foundation Models | arXiv 24.08 |
| ViTime: A Visual Intelligence-Based Foundation Model for Time Series Forecasting | arXiv 24.07 |
| Time Series as Images: Vision Transformer for Irregularly Sampled Time Series | arXiv 23.03 |
| An image is worth 16x16 words: Transformers for image recognition at scale | arXiv 20.10 |
| Imaging Time-Series to Improve Classification and Imputation | arXiv 15.06 |
| Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting | |
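To make one of the "time series as image" encodings above concrete, the sketch below computes a Gramian Angular Summation Field in the spirit of "Imaging Time-Series to Improve Classification and Imputation" (arXiv 15.06): the series is rescaled to [-1, 1], mapped to polar angles, and expanded into a 2D array that a vision model can consume. The min-max rescaling and the GASF variant chosen here are simplifying assumptions.

```python
# Minimal Gramian Angular Summation Field (GASF) encoding: a 1D series
# becomes a single-channel "image" suitable for off-the-shelf vision models.
import numpy as np

def gramian_angular_field(series):
    x = np.asarray(series, dtype=float)
    # Min-max rescale to [-1, 1] (assumes a non-constant series) so the
    # arccos below is well defined.
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))  # polar-angle representation
    # GASF[i, j] = cos(phi_i + phi_j), computed as an outer sum of angles.
    return np.cos(phi[:, None] + phi[None, :])

img = gramian_angular_field([0.0, 0.5, 1.0, 0.7, 0.2, -0.1])
print(img.shape)  # (6, 6) array; stack windows for multi-channel inputs
```

Other works in this table skip hand-crafted encodings entirely and simply render the series as a line plot before passing it to a multimodal model.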