yale-nlp/OpenT2T

OpenT2T

OpenT2T is the codebase for the EMNLP 2024 System Demonstrations paper "OpenT2T: An Open-Source Toolkit for Table-to-Text Generation".

It is a reproducible benchmarking toolkit for table-to-text generation and table-grounded question answering with prompt construction, local inference backends, evaluation, reporting, and run manifests.

The codebase was recently updated for compatibility with newer model families and the current vLLM inference pipeline.

The repository is intentionally dependency-light by default so the full pipeline can run locally against fixture datasets and a deterministic mock backend. Optional integrations are available for vllm and bert-score.

What the toolkit covers

OpenT2T supports:

  • dataset preparation and normalization
  • prompt construction for zero-shot and few-shot settings
  • local deterministic smoke runs through mock backends
  • vLLM-based inference for real models
  • lexical, semantic, and model-based evaluation
  • reporting through CSV, Markdown, and JSON run artifacts
  • model registries and static model packs

The main benchmark datasets currently wired into the framework are:

  • logicnlg
  • totto
  • hitabng
  • hitabqa
  • rotowire
  • numericnlg
  • scigen
  • fetaqa
  • qtsumm

Installation

OpenT2T requires Python 3.11 or newer.

Recommended reproducible environment

python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
python -m pip install -e .

After this, the opent2t CLI is available inside the environment.

Minimal local installation

python3 -m pip install -e .

Optional extras

python3 -m pip install -e ".[vllm]"
python3 -m pip install -e ".[semantic]"

Additional metric/runtime dependencies:

  • tapas_scores needs torch, transformers, and pandas
  • autoacu_scores needs autoacu and its runtime dependencies

Full environment for model and metric reproduction

If you want a closer-to-full experiment environment rather than the lightweight default install, use:

python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
python -m pip install -e ".[semantic]"
python -m pip install -e ".[vllm]"
python -m pip install torch transformers pandas autoacu

Notes:

  • install vllm only on a machine with a compatible CUDA and PyTorch stack
  • autoacu and tapas_scores are optional; the smoke pipeline does not require them
  • for CPU-only local sanity checks, the base install plus mock models is enough
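To see which of the optional dependencies mentioned above are importable in the current environment, a small check like the following may help. It is a sketch that is not part of the repository; the module names are the usual import names of the listed packages (bert-score imports as bert_score), and the helper name optional_status is hypothetical.

```python
from importlib.util import find_spec

# Import names of the optional packages named in this README.
# Assumption: bert-score is imported as "bert_score".
OPTIONAL_MODULES = ["vllm", "bert_score", "torch", "transformers", "pandas", "autoacu"]


def optional_status(modules=OPTIONAL_MODULES):
    """Map each module name to True if it can be found without importing it."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, available in optional_status().items():
        print(f"{name:<14}{'installed' if available else 'missing'}")
```

If everything is reported as missing, the base install plus mock models should still be enough for the smoke pipeline, as noted above.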

Quick start

Run the built-in smoke benchmark, then the unit tests:

python3 -m opent2t.cli benchmark --run-config configs/runs/smoke.json
python3 -m unittest discover -s tests -v

List the available models and model packs:

python3 -m opent2t.cli list-models
python3 -m opent2t.cli list-packs

Typical CLI workflow

Step 1: prepare a dataset split

opent2t prepare-dataset --dataset totto --split dev

This uses the dataset config in configs/datasets/totto.json and writes a normalized JSONL file.

Step 2: build prompts

opent2t build-prompts \
  --dataset totto \
  --split dev \
  --model mock-chat \
  --mode zero_shot \
  --run-dir artifacts/runs/totto_dev_demo

Step 3: run inference

opent2t run-inference \
  --run-dir artifacts/runs/totto_dev_demo \
  --model mock-chat \
  --batch-size 4

Step 4: evaluate the run

opent2t evaluate \
  --run-dir artifacts/runs/totto_dev_demo \
  --metrics bleu rouge_l meteor
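The four steps can also be chained from a script. The sketch below only assembles the exact commands and flags shown in steps 1 to 4 and runs them in order; the function names workflow_commands and run_workflow are hypothetical helpers, not part of OpenT2T.

```python
import subprocess


def workflow_commands(dataset, split, model, run_dir,
                      metrics=("bleu", "rouge_l", "meteor"), batch_size=4):
    """Return the four CLI invocations from steps 1-4 as argv lists."""
    return [
        ["opent2t", "prepare-dataset", "--dataset", dataset, "--split", split],
        ["opent2t", "build-prompts", "--dataset", dataset, "--split", split,
         "--model", model, "--mode", "zero_shot", "--run-dir", run_dir],
        ["opent2t", "run-inference", "--run-dir", run_dir,
         "--model", model, "--batch-size", str(batch_size)],
        ["opent2t", "evaluate", "--run-dir", run_dir, "--metrics", *metrics],
    ]


def run_workflow(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, run_workflow(workflow_commands("totto", "dev", "mock-chat", "artifacts/runs/totto_dev_demo")) would reproduce the demo run above, assuming the opent2t CLI is on the PATH.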

One-command benchmark

opent2t benchmark --run-config configs/runs/smoke.json

Data and config layout

Important directories:

  • configs/datasets/ contains dataset configs
  • configs/models/ contains model specs
  • configs/packs/ contains named model packs
  • prompts/ contains prompt templates and exemplar pools
  • fixtures/raw/ contains local fixture data for smoke tests
  • artifacts/runs/ contains run outputs

Dataset configs point at a raw_dir. For local use, you can either:

  1. place the expected raw or normalized JSONL files in the configured path, or
  2. edit the dataset config to point at your own local dataset location
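For the second option, the edit can be scripted. This is a minimal sketch that assumes raw_dir is a top-level key in the dataset config JSON; the actual config schema is not shown here, so check a file such as configs/datasets/totto.json before relying on it. The helper name set_raw_dir is hypothetical.

```python
import json
from pathlib import Path


def set_raw_dir(config_path, new_raw_dir):
    """Rewrite the raw_dir entry of a dataset config and return the new config.

    Assumption: raw_dir is a top-level key of the JSON object.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["raw_dir"] = str(new_raw_dir)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

All other keys in the config are written back unchanged.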

The batch scripts can also download processed JSONL splits from a Hugging Face dataset repo specified through DATASET_REPO.

Model selection

The checked-in active packs are:

  • default
  • smoke
  • specialist
  • mock

You can inspect them with:

opent2t list-packs
opent2t preflight-models --pack default

Running experiments with Slurm

The repository includes two Slurm entrypoints:

  • scripts/model_benchmark_zero_shot.sbatch
  • scripts/qwen3_dataset_eval.sbatch

Important:

  • The checked-in scripts contain site-specific defaults in their #SBATCH headers.
  • Other users should override account, partition, log path, and environment paths on the sbatch command line, or adapt a local copy of the script.
  • Do not hardcode tokens in scripts. Export them from your shell and pass them through --export.

Generic single-job submission

export HF_TOKEN=<your_hf_token>

sbatch \
  -A <account> \
  -p <partition> \
  --gres=gpu:1 \
  -o /path/to/logs/%x-%j.out \
  --export=ALL,HF_TOKEN_VALUE=$HF_TOKEN,REPO_ROOT=/path/to/opent2t,ENV_ROOT=/path/to/env,DATASET_REPO=<hf_user>/<dataset_repo>,RUN_ROOT=/path/to/runs,RAW_ROOT=/path/to/raw \
  scripts/model_benchmark_zero_shot.sbatch \
  qwen25_7b \
  totto \
  test

Useful exported overrides:

  • BATCH_SIZE
  • SAMPLE_SIZE
  • RUN_ROOT
  • RAW_ROOT
  • JOB_CACHE_ROOT

Example with a smaller sample:

sbatch \
  -A <account> \
  -p <partition> \
  --gres=gpu:1 \
  -o /path/to/logs/%x-%j.out \
  --export=ALL,HF_TOKEN_VALUE=$HF_TOKEN,REPO_ROOT=/path/to/opent2t,ENV_ROOT=/path/to/env,DATASET_REPO=<hf_user>/<dataset_repo>,BATCH_SIZE=8,SAMPLE_SIZE=100 \
  scripts/model_benchmark_zero_shot.sbatch \
  phi4_mini \
  logicnlg \
  test
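The --export value in these commands is ALL followed by comma-separated NAME=value pairs, which is easy to mistype by hand. A hedged sketch of building it programmatically (build_export is a hypothetical helper, not something the repository provides):

```python
def build_export(overrides):
    """Return the value for sbatch --export: ALL plus NAME=value pairs."""
    parts = ["ALL"]
    for name, value in overrides.items():
        value = str(value)
        # Commas separate the entries, so a comma inside a value would
        # split it into two malformed entries.
        if "," in value:
            raise ValueError(f"{name} contains a comma: {value!r}")
        parts.append(f"{name}={value}")
    return ",".join(parts)
```

For instance, build_export({"BATCH_SIZE": 8, "SAMPLE_SIZE": 100}) yields ALL,BATCH_SIZE=8,SAMPLE_SIZE=100, which can be passed as --export=... on the sbatch command line.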

Generic Qwen3 sweep submission

sbatch \
  -A <account> \
  -p <partition> \
  --gres=gpu:1 \
  -o /path/to/logs/%x-%j.out \
  --export=ALL,HF_TOKEN_VALUE=$HF_TOKEN,REPO_ROOT=/path/to/opent2t,ENV_ROOT=/path/to/env,DATASET_REPO=<hf_user>/<dataset_repo>,RUN_ROOT=/path/to/qwen3_runs,RAW_ROOT=/path/to/raw \
  scripts/qwen3_dataset_eval.sbatch \
  totto \
  test

Submit one job per benchmark dataset

for dataset in logicnlg totto hitabng hitabqa rotowire numericnlg scigen fetaqa qtsumm; do
  sbatch \
    -A <account> \
    -p <partition> \
    --gres=gpu:1 \
    -o /path/to/logs/%x-%j.out \
    --export=ALL,HF_TOKEN_VALUE=$HF_TOKEN,REPO_ROOT=/path/to/opent2t,ENV_ROOT=/path/to/env,DATASET_REPO=<hf_user>/<dataset_repo> \
    scripts/qwen3_dataset_eval.sbatch \
    "$dataset" \
    test
done

Matrix submission helper

If you want the repo to submit many jobs for you, use:

HF_TOKEN=<your_hf_token> \
PYTHONPATH=src \
python3 scripts/submit_model_benchmark_matrix.py \
  --ssh-host <cluster_login_host> \
  --repo-root /path/to/opent2t \
  --env-root /path/to/env \
  --run-root /path/to/runs \
  --include-qwen3

Useful flags:

  • --plan-only
  • --datasets logicnlg totto
  • --include-models qwen25_7b phi4_mini
  • --exclude-models internlm2_math_plus_20b
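Conceptually, a submission matrix is the cross product of models and datasets with some models filtered out. The sketch below only illustrates that idea; it is not the logic of submit_model_benchmark_matrix.py, and plan_matrix is a hypothetical name.

```python
from itertools import product


def plan_matrix(models, datasets, exclude_models=()):
    """List (model, dataset) pairs, skipping any excluded model."""
    excluded = set(exclude_models)
    return [
        (model, dataset)
        for model, dataset in product(models, datasets)
        if model not in excluded
    ]
```

With the script itself, --plan-only appears to be the flag to reach for when you want to preview the plan before anything is submitted.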

Monitoring jobs and results

Typical Slurm commands:

squeue -u <username>
sacct -j <job_id> --format=JobIDRaw,JobName,State,ExitCode -n -P
tail -f /path/to/logs/<job-name>-<job-id>.out
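Because the sacct command above uses -n (no header) and -P (pipe-delimited output), its output is straightforward to parse. A minimal sketch, assuming the same four fields in the same order as the --format argument; parse_sacct and not_completed are hypothetical helper names.

```python
# Field order must match the --format argument passed to sacct.
FIELDS = ("JobIDRaw", "JobName", "State", "ExitCode")


def parse_sacct(output):
    """Turn pipe-delimited, header-less sacct output into a list of dicts."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append(dict(zip(FIELDS, line.split("|"))))
    return rows


def not_completed(rows):
    """Return the rows whose state is anything other than COMPLETED."""
    return [row for row in rows if row["State"] != "COMPLETED"]
```

Note that sacct typically reports job steps (such as the batch step) as separate rows, so one job may produce several entries.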

Where inputs and run outputs live:

  • Prompt text lives under prompts/.
  • Dataset fixtures live under fixtures/raw/.
  • Run outputs are written under artifacts/runs/.
  • Each completed run writes manifest.json, prompts.jsonl, generations.jsonl, parsed_predictions.jsonl, metrics_per_example.jsonl, metrics_aggregate.json, summary.csv, and summary.md.
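A quick way to confirm that a run finished is to check for the files listed above. The sketch below assumes they sit directly inside the run directory (for example artifacts/runs/totto_dev_demo), which is not stated explicitly here; missing_artifacts is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the list of per-run artifacts above.
RUN_ARTIFACTS = (
    "manifest.json",
    "prompts.jsonl",
    "generations.jsonl",
    "parsed_predictions.jsonl",
    "metrics_per_example.jsonl",
    "metrics_aggregate.json",
    "summary.csv",
    "summary.md",
)


def missing_artifacts(run_dir):
    """Return the expected artifact names that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [name for name in RUN_ARTIFACTS if not (run_dir / name).is_file()]
```

An empty list suggests the run completed all stages; a non-empty list shows which stage's output to look for in the job log.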

Citation

If you use this repository, please cite:

@inproceedings{zhang-etal-2024-opent2t,
    title = "{O}pen{T}2{T}: An Open-Source Toolkit for Table-to-Text Generation",
    author = "Zhang, Haowei  and
      Si, Shengyun  and
      Zhao, Yilun  and
      Xie, Lujing  and
      Xu, Zhijian  and
      Chen, Lyuhao  and
      Nan, Linyong  and
      Wang, Pengcheng  and
      Tang, Xiangru  and
      Cohan, Arman",
    editor = "Hernandez Farias, Delia Irazu  and
      Hope, Tom  and
      Li, Manling",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.27/",
    doi = "10.18653/v1/2024.emnlp-demo.27",
    pages = "259--269",
    abstract = "Table data is pervasive in various industries, and its comprehension and manipulation demand significant time and effort for users seeking to extract relevant information. Consequently, an increasing number of studies have been directed towards table-to-text generation tasks. However, most existing methods are benchmarked solely on a limited number of datasets with varying configurations, leading to a lack of unified, standardized, fair, and comprehensive comparison between methods. This paper presents OpenT2T, the first open-source toolkit for table-to-text generation, designed to reproduce existing large language models (LLMs) for performance comparison and expedite the development of new models. We have implemented and compared a wide range of LLMs under zero- and few-shot settings on 9 table-to-text generation datasets, covering data insight generation, table summarization, and free-form table question answering. Additionally, we maintain a public leaderboard to provide insights for future work into how to choose appropriate table-to-text generation systems for real-world scenarios."
}
