[EMNLP 2025 Findings] TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data

Our paper “TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data” has been accepted to EMNLP 2025 Findings 🎉.

We strongly recommend following the run examples below to reproduce our results.

Environment

conda create -n tabdsr python=3.11
conda activate tabdsr
pip install -r requirements.txt

Generate Results for Different Models

Command-Line Arguments:

--agent_mode: Specifies the agent mode. The examples below pass raw for the baseline DP and CoT prompts and 1+2+3 for TabDSR.
--tablebenchMode: Specifies the prompt mode (DP, CoT, TCoT, or PoT). See the examples below.
-mn: Model name, which must also be set in ./dao/LLMCaller.py.
-dn: Dataset name. Choose from ["TableBenchFix", "TatQa", "CalTab151"].
-tp: Save path.
-e: Load mode (llama, qwen, or openai in the examples below).
-ln: Logger name.
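To make the flag layout concrete, here is a minimal sketch that assembles an infer.py command line from the arguments listed above. The helper name build_infer_command and its parameter names are hypothetical and not part of the repository; only the flags and dataset names are taken from this README.

```python
import shlex

# Dataset names this README lists for infer.py.
DATASETS = ("TableBenchFix", "TatQa", "CalTab151")


def build_infer_command(agent_mode, tablebench_mode, model_name,
                        dataset_name, save_path, load_mode, logger_name):
    """Return the argv list for one infer.py run (hypothetical helper)."""
    if dataset_name not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset_name!r}")
    return [
        "python", "infer.py",
        "--agent_mode", agent_mode,
        "--tablebenchMode", tablebench_mode,
        "-mn", model_name,
        "-dn", dataset_name,
        "-tp", save_path,
        "-e", load_mode,
        "-ln", logger_name,
    ]


# Values mirror the DeepSeek example in this README.
cmd = build_infer_command("1+2+3", "PoT", "deepseek-chat", "TableBenchFix",
                          "./Archive/deepseek-chat", "openai", "test")
print(shlex.join(cmd))
```

Rejecting an unknown dataset name before launching catches typos early, which may save a long model-loading step.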

Examples

Running DP Prompt

python infer.py --agent_mode raw --tablebenchMode DP -mn meta-llama/Llama-2-7b-chat-hf -dn TableBenchFix -tp ./Archive/{DatasetName}/meta-llama/Llama-2-7b-chat-hf -e llama -ln test

Running CoT Prompt

python infer.py --agent_mode raw --tablebenchMode CoT -mn meta-llama/Llama-2-7b-chat-hf -dn TableBenchFix -tp ./Archive/{DatasetName}/meta-llama/Llama-2-7b-chat-hf -e llama -ln test

Running TabDSR

Local model path

python infer.py --agent_mode 1+2+3 --tablebenchMode TCoT -mn qwen2.5 -dn TableBenchFix -tp ./Archive/{DatasetName}/qwen2.5 -e qwen -ln test

OpenAI, DeepSeek, or vLLM (recommended)

DeepSeek:

OPENAI_API_KEY="Your key" OPENAI_API_BASE="https://api.deepseek.com" python infer.py --agent_mode 1+2+3 --tablebenchMode PoT -mn deepseek-chat -dn TableBenchFix -tp ./Archive/deepseek-chat -e openai -ln test

OpenAI:

OPENAI_API_KEY="Your key" python infer.py --agent_mode 1+2+3 --tablebenchMode PoT -mn gpt-4o -dn TableBenchFix -tp ./Archive/{DatasetName} -e openai -ln test

vLLM:

OPENAI_API_KEY="Your key" OPENAI_API_BASE="VLLM url" python infer.py --agent_mode 1+2+3 --tablebenchMode PoT -mn Qwen/Qwen2.5-7B-Instruct -dn TableBenchFix -tp ./Archive/{DatasetName} -e openai -ln test
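The API-based runs above differ only in two environment variables and the model name. As a sketch, the same environment could be prepared from Python before launching infer.py. The helper name api_env is hypothetical, and the key value is a placeholder.

```python
import os


def api_env(api_key, api_base=None):
    """Copy the current environment and add the API settings (hypothetical helper)."""
    env = dict(os.environ)
    env["OPENAI_API_KEY"] = api_key
    if api_base is not None:
        # Set only for non-default endpoints such as DeepSeek or a vLLM server.
        env["OPENAI_API_BASE"] = api_base
    else:
        # Avoid inheriting a base URL left over from an earlier run.
        env.pop("OPENAI_API_BASE", None)
    return env


deepseek_env = api_env("Your key", "https://api.deepseek.com")
openai_env = api_env("Your key")
print(deepseek_env["OPENAI_API_BASE"])
```

Either dictionary can then be passed as the env argument of subprocess.run together with the infer.py command.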

Evaluation

-dp: Path to the result file.
-dn: Dataset name. Choose from ["TableBench", "TatQa", "CalTab151"].

For example:

python ./evaluates/Evaluator.py -dp ./Archive/deepseek-chat/raw_TatQa_deepseek-ai_DeepSeek-V3.json -dn TatQa
