
Sonar-TS: Search-Then-Verify Natural Language Querying for Time Series Databases


Official implementation of the Sonar-TS paper (ICML 2026).

📖 Introduction

Time-series data is everywhere in industry: temperature readings, stock prices, and factory sensor logs are common examples. Non-experts who want to extract specific information from such data quickly hit a wall. For instance, they might ask: "In the past month, on which day did the temperature rise sharply between 10 a.m. and 3 p.m. and then drop just as quickly?" Today, no existing method can directly answer this kind of natural-language question over a real time-series database.

Existing approaches fall short in two characteristic ways. Text-to-SQL methods are designed for relational data and cannot express the shape-based patterns that characterize time series (such as a plateau, a rapid fall, or a fluctuating-but-stable stretch). Time-series language models can answer questions over short windows, but they do not scale to the database-scale histories that real applications require.
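To make the example question concrete, the brute-force sketch below scans raw readings day by day. It is not how Sonar-TS answers queries: the function name and the `min_change` threshold for "sharp" are hypothetical (the question itself leaves "sharply" undefined), and it checks only the size of the rise and the drop, not their speed. Even this simple version needs per-day grouping, ordering, and a peak search, which is the kind of shape logic that plain SQL predicates express poorly.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta


def days_with_spike(readings, start_hour=10, end_hour=15, min_change=5.0):
    """Return dates whose start_hour..end_hour window rises then falls by >= min_change.

    Hypothetical helper: readings is an iterable of (datetime, float) pairs and
    min_change is an assumed threshold for "sharp".
    """
    window = defaultdict(list)
    for ts, value in readings:
        if time(start_hour) <= ts.time() <= time(end_hour):
            window[ts.date()].append((ts, value))

    hits = []
    for day, points in sorted(window.items()):
        values = [v for _, v in sorted(points)]  # chronological order
        if len(values) < 3:
            continue  # too few points to contain a rise and a fall
        peak = max(range(len(values)), key=values.__getitem__)
        rise = values[peak] - min(values[: peak + 1])
        drop = values[peak] - min(values[peak:])
        if rise >= min_change and drop >= min_change:
            hits.append(day)
    return hits


# Synthetic hourly data: flat at 20.0 except a spike to 30.0 at 12:00-13:00 on Jan 2.
base = datetime(2026, 1, 1)
readings = [
    (base + timedelta(days=d, hours=h), 30.0 if d == 1 and h in (12, 13) else 20.0)
    for d in range(3)
    for h in range(24)
]
print(days_with_spike(readings))  # [datetime.date(2026, 1, 2)]
```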

[Figure: NLQ4TSDB task]

This leaves a real, unsolved problem: an interface that lets users describe time-series patterns in natural language and get answers computed directly from the underlying database. We close that gap with three contributions:

  • New problem. We formally define the problem as Natural Language Querying for Time Series Databases (NLQ4TSDB).
  • New benchmark. We release NLQTSBench, the first benchmark for standardized evaluation of NLQ4TSDB.
  • Novel framework. We propose Sonar-TS, a framework that solves NLQ4TSDB through a Search-Then-Verify pipeline.

🗂️ NLQTSBench

[Figure: NLQTSBench overview]

NLQTSBench is the first standardized benchmark for NLQ4TSDB:

  • Size & Diversity: Contains 1,153 tasks spanning 4 difficulty levels and 9 sub-tasks.
  • Download: Hosted on HuggingFace at mrtan/NLQTSBench.

🔍 Sonar-TS Framework

[Figure: Sonar-TS framework]

Sonar-TS is a three-stage pipeline:

  1. Offline data processing builds multi-scale feature tables on top of the raw series.
  2. Online querying runs an LLM through three steps: Task Planning, Code Generation, and Execution. The experiences it draws on (e.g., skills) are produced by the Prompt Cold Start loop (cold_start/).
  3. Post-processing renders the verified result as a natural-language answer, with an optional visualization.
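The general search-then-verify pattern can be illustrated with a small two-phase sketch. This is not the Sonar-TS algorithm: every function name, the per-window min/max features, and the rise-then-drop check are hypothetical stand-ins. A cheap search over precomputed coarse features prunes candidate windows, and only the survivors are verified exactly against the raw series.

```python
def build_coarse_features(series, window):
    """Offline (hypothetical): min and max of each non-overlapping window."""
    return [
        {"start": s, "min": min(series[s:s + window]), "max": max(series[s:s + window])}
        for s in range(0, len(series) - window + 1, window)
    ]


def search(features, min_range):
    """Search: cheap filter on precomputed features; keeps windows that could match."""
    return [f["start"] for f in features if f["max"] - f["min"] >= min_range]


def verify(series, start, window, min_range):
    """Verify: exact check on the raw values for a rise then a drop of >= min_range."""
    chunk = series[start:start + window]
    peak = max(range(len(chunk)), key=chunk.__getitem__)
    rise = chunk[peak] - min(chunk[: peak + 1])
    drop = chunk[peak] - min(chunk[peak:])
    return rise >= min_range and drop >= min_range


def search_then_verify(series, window, min_range):
    features = build_coarse_features(series, window)  # would be built offline in practice
    candidates = search(features, min_range)
    return [s for s in candidates if verify(series, s, window, min_range)]


series = [0, 0, 0, 0, 0, 5, 9, 1, 0, 1, 2, 9, 3, 3, 3, 3]
print(search(build_coarse_features(series, 4), 5))      # [4, 8] (candidates)
print(search_then_verify(series, window=4, min_range=5))  # [4] (verified)
```

Because a rise of at least `min_range` inside a window implies `max - min >= min_range`, the search step here never discards a true match; it does miss patterns that straddle window boundaries, which overlapping or multi-scale windows would address.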

🚀 Quick start

1. Install dependencies

git clone https://github.com/Atlamtiz/Sonar-TS.git
cd Sonar-TS
conda create -n sonarts python=3.11 -y && conda activate sonarts
pip install -r requirements.txt

2. Download the benchmark data from HuggingFace

The raw CSVs (around 1.7 GB) live on HuggingFace. One command pulls them into the expected location:

python scripts/download_dataset.py

This places 1,153 CSVs under nlqtsbench/ts_data/. The benchmark spec (nlqtsbench/tasks.json) is already in this repository.
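A quick sanity check after the download can be as simple as counting files. The snippet below is hypothetical and not part of the repository; it only assumes the CSVs sit somewhere under nlqtsbench/ts_data/ and that the expected count is the 1,153 stated above.

```python
from pathlib import Path

EXPECTED_CSVS = 1153  # number of CSVs this README says the download produces


def check_dataset(root="nlqtsbench/ts_data", expected=EXPECTED_CSVS):
    """Return (found, ok): CSV files under root (recursively) and whether the count matches."""
    root = Path(root)
    if not root.is_dir():
        return 0, False
    found = sum(1 for _ in root.rglob("*.csv"))
    return found, found == expected


if __name__ == "__main__":
    found, ok = check_dataset()
    print(f"{found} CSV files found" + ("" if ok else f"; expected {EXPECTED_CSVS}, re-run the download"))
```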

3. Configure your DeepSeek API key

The framework defaults to DeepSeek (deepseek-v4-flash) with 10 worker threads, one dedicated API key per worker. With 10 keys, a full benchmark run takes about 25 minutes.

Copy the template and paste your keys:

cp -n configs/deepseek_api-key.txt.example configs/deepseek_api-key.txt
$EDITOR configs/deepseek_api-key.txt        # paste one key per line

If you have fewer than 10 keys, also lower concurrency.workers in configs/online.yaml to match.
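The one-key-per-worker scheme can be sketched with the standard library. The function names are hypothetical and this is not the framework's own code; it shows why the worker count must not exceed the number of keys: each pool thread claims one key from a queue when it starts (via the `ThreadPoolExecutor` initializer hook) and uses only that key for every request it makes.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def load_keys(path):
    """Read one API key per line, skipping blank lines (e.g. configs/deepseek_api-key.txt)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def run_with_dedicated_keys(tasks, keys, workers, handle):
    """Run handle(task, key) on a thread pool where every worker thread owns one key."""
    if workers > len(keys):
        raise ValueError(f"{workers} workers but only {len(keys)} keys; lower the worker count")
    free_keys = queue.Queue()
    for key in keys[:workers]:
        free_keys.put(key)
    local = threading.local()

    def claim_key():
        # Runs once per worker thread when the pool starts it.
        local.key = free_keys.get_nowait()

    def work(task):
        return handle(task, local.key)

    with ThreadPoolExecutor(max_workers=workers, initializer=claim_key) as pool:
        return list(pool.map(work, tasks))


keys = ["key-a", "key-b"]
results = run_with_dedicated_keys(range(6), keys, workers=2, handle=lambda t, k: (t, k))
print([t for t, _ in results])               # [0, 1, 2, 3, 4, 5] (map preserves order)
print({k for _, k in results} <= set(keys))  # True
```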

4. Build per-task databases + features (one-shot, ~10 min)

python -m scripts.load_benchmark        # CSV → per-task database
python -m scripts.build_index           # SAX feature tables per task
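build_index is described only as producing SAX feature tables per task. SAX (Symbolic Aggregate approXimation) z-normalizes a series, averages it into segments (Piecewise Aggregate Approximation, PAA), and maps each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The sketch below computes a SAX word at several segment counts to mimic a multi-scale feature table; the alphabet size, the segment counts, and the function names are assumptions, not the repository's actual parameters.

```python
from statistics import NormalDist, mean, pstdev


def sax_word(series, segments, alphabet="abcd"):
    """Convert a numeric series to a SAX word with `segments` letters (assumes len(series) >= segments)."""
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]  # z-normalize
    n = len(z)
    # PAA: mean of each of `segments` roughly equal slices.
    paa = [mean(z[i * n // segments:(i + 1) * n // segments]) for i in range(segments)]
    # Breakpoints split N(0, 1) into len(alphabet) equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


def multiscale_sax(series, scales=(2, 4, 8)):
    """One SAX word per scale, e.g. as rows of a hypothetical per-task feature table."""
    return {s: sax_word(series, s) for s in scales}


print(multiscale_sax([1, 2, 3, 4, 5, 6, 7, 8]))  # {2: 'ad', 4: 'abcd', 8: 'aabbccdd'}
```

Coarse scales give short words that are cheap to match against a described shape; finer scales preserve more detail for narrowing down where a pattern occurs.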

Note: For ease of reproduction, this release ships a lightweight SQLite-based implementation. The framework's data layer is backend-agnostic by design; production TSDBs (InfluxDB, TimescaleDB, etc.) can be supported by swapping the storage adapter.
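The note above describes a backend-agnostic data layer with a swappable storage adapter. One way such an adapter boundary could look is a small `Protocol` plus a SQLite implementation. The interface name, method names, and table schema are hypothetical and not the actual contents of sonar_ts/storage.py; a TimescaleDB or InfluxDB adapter would implement the same two methods.

```python
import sqlite3
from typing import Protocol, Sequence


class SeriesStore(Protocol):
    """Hypothetical adapter interface the rest of a pipeline could depend on."""

    def write(self, series_id: str, rows: Sequence[tuple[str, float]]) -> None: ...

    def read_range(self, series_id: str, start: str, end: str) -> list[tuple[str, float]]: ...


class SQLiteStore:
    """SQLite-backed adapter; ISO-8601 timestamp strings sort correctly as text."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS points ("
            "series_id TEXT, ts TEXT, value REAL, PRIMARY KEY (series_id, ts))"
        )

    def write(self, series_id, rows):
        with self.conn:  # commits the transaction on success
            self.conn.executemany(
                "INSERT OR REPLACE INTO points VALUES (?, ?, ?)",
                [(series_id, ts, value) for ts, value in rows],
            )

    def read_range(self, series_id, start, end):
        # Half-open interval [start, end), ordered by time.
        cur = self.conn.execute(
            "SELECT ts, value FROM points WHERE series_id = ? AND ts >= ? AND ts < ? ORDER BY ts",
            (series_id, start, end),
        )
        return cur.fetchall()


store: SeriesStore = SQLiteStore()
store.write("temp", [("2026-01-01T10:00", 20.5), ("2026-01-01T11:00", 22.0), ("2026-01-02T10:00", 19.0)])
print(store.read_range("temp", "2026-01-01", "2026-01-02"))
# [('2026-01-01T10:00', 20.5), ('2026-01-01T11:00', 22.0)]
```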

5. Run the benchmark

python main.py

Useful flags (full list via python main.py --help):

| Flag | Effect |
|---|---|
| --limit N | Process only the first N tasks. Use for a quick smoke test before committing to the full ~25 min run. |
| --workers N | Override the worker thread count (default: 10, from configs/online.yaml). Lower this if you have fewer keys. |
| --figures | Also render one PNG per task (output/figures/). Adds ~15-20 min via a multi-process Kaleido pool. |
| --rebuild | Discard output/predict_partial.jsonl and re-run every task. Use after editing prompts, skills, or configs. |
| --out-dir | Write results to a custom directory instead of ./output/. |
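The --rebuild flag implies that, by default, a run resumes from output/predict_partial.jsonl. A minimal sketch of that resume pattern follows, assuming one JSON object per completed task with a `task_id` field; the record layout and the function names are hypothetical. Appending and flushing one line per finished task means an interrupted run loses only the tasks in flight. The sketch is single-threaded; with several worker threads, the appends would need a lock.

```python
import json
import tempfile
from pathlib import Path


def load_done(partial: Path) -> set[str]:
    """task_ids already recorded in the partial file; unparsable lines are ignored."""
    done = set()
    if partial.exists():
        for line in partial.read_text(encoding="utf-8").splitlines():
            try:
                done.add(json.loads(line)["task_id"])
            except (json.JSONDecodeError, KeyError, TypeError):
                continue  # e.g. a half-written line from an interrupted run
    return done


def run(task_ids, solve, partial: Path, rebuild: bool = False) -> int:
    """Solve tasks not yet in `partial`, appending one JSON line each; returns how many ran."""
    if rebuild:
        partial.unlink(missing_ok=True)  # --rebuild-style behavior: start from scratch
    done = load_done(partial)
    solved = 0
    with partial.open("a", encoding="utf-8") as f:
        for task_id in task_ids:
            if task_id in done:
                continue
            f.write(json.dumps({"task_id": task_id, "prediction": solve(task_id)}) + "\n")
            f.flush()  # persist each result as soon as it exists
            solved += 1
    return solved


partial = Path(tempfile.mkdtemp()) / "predict_partial.jsonl"
first = run(["t1", "t2"], str.upper, partial)         # solves t1, t2
second = run(["t1", "t2", "t3"], str.upper, partial)  # resumes: only t3
print(first, second, sorted(load_done(partial)))       # 2 1 ['t1', 't2', 't3']
```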

Results print to the terminal as a paper-aligned per-category / per-level / overall table, and are written to ./output/:

output/
├── predict.json         submission-format predictions
├── summary.json         per-subtask / per-category / overall scores
└── per_task.json        one row per task with prediction + score
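How summary.json aggregates the per-task scores is not specified here. The sketch below assumes each per_task.json row carries hypothetical `category`, `level`, and `score` fields and that every group's score is a plain mean over its tasks (a micro-average); the paper's actual averaging may differ. The category labels are borrowed from the example figures purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize(rows):
    """Mean score per category, per level, and overall (micro-average over tasks)."""
    by_category, by_level = defaultdict(list), defaultdict(list)
    for row in rows:
        by_category[row["category"]].append(row["score"])
        by_level[row["level"]].append(row["score"])
    return {
        "per_category": {k: mean(v) for k, v in sorted(by_category.items())},
        "per_level": {k: mean(v) for k, v in sorted(by_level.items())},
        "overall": mean(row["score"] for row in rows),
    }


rows = [
    {"category": "Shape Identification", "level": 1, "score": 1.0},
    {"category": "Shape Identification", "level": 2, "score": 0.0},
    {"category": "Composite Trend", "level": 2, "score": 0.5},
]
print(summarize(rows))
```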

🎯 Example outputs

Visualizations are produced by python main.py --figures. Curated samples live in output/figures/examples/; the corresponding score breakdowns are in output/summary.json and output/per_task.json.

[Figure: Shape Identification example (left) and Composite Trend example (right)]

📁 Project structure

Sonar-TS
├── main.py
├── requirements.txt
├── LICENSE
│
├── configs/
│   ├── online.yaml
│   ├── offline.yaml
│   ├── deepseek_api-key.txt.example
│   └── deepseek_api-key.txt (gitignored)
│
├── sonar_ts/
│   ├── pipeline.py
│   ├── planner.py
│   ├── generator.py
│   ├── executor.py
│   ├── evaluator.py
│   ├── llm.py
│   ├── schema.py
│   ├── storage.py
│   ├── offline.py
│   ├── prompts/
│   ├── postprocess/
│   └── skills/
│
├── scripts/
│   ├── download_dataset.py
│   ├── load_benchmark.py
│   ├── build_index.py
│   ├── run_benchmark.py
│   └── render_samples.py
│
├── cold_start/
│   ├── orchestrator.py
│   ├── run_cold_start.py
│   ├── download_train_data.py
│   ├── agents/
│   ├── train_data/
│   └── discovered_skills/
│
├── nlqtsbench/
│   ├── tasks.json
│   ├── predict_perfect.json
│   └── ts_data/
│
├── docs/figures/
│
├── databases/
└── output/

See cold_start/README.md and nlqtsbench/README.md for sub-system details.

📑 Citation

@misc{tan2026sonartssearchthenverifynaturallanguage,
      title={Sonar-TS: Search-Then-Verify Natural Language Querying for Time Series Databases}, 
      author={Zhao Tan and Yiji Zhao and Shiyu Wang and Chang Xu and Yuxuan Liang and Xiping Liu and Shirui Pan and Ming Jin},
      year={2026},
      eprint={2602.17001},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.17001}, 
}

🏛️ Affiliations

Griffith University Jiangxi University of Finance and Economics Yunnan University ByteDance Microsoft Research Asia HKUST (Guangzhou)
