OBELISK: Efficient Offline Query Planning with Bayesian Optimization-Informed Language Model Reasoning

Z. Pan, W. Sun, Y. Zhang, T. Purcell, Y. Dong, C. Yang, R. Zhang, X. Zhou, J. Xu. PVLDB 2026.

This repository contains the implementation of OBELISK, a system for offline query plan optimization in TiDB. OBELISK integrates Bayesian Optimization with language model reasoning to efficiently navigate the high-dimensional space of cost-factor configurations, reducing optimization overhead while discovering high-performance plans.

What This Repository Does

OBELISK searches TiDB optimizer cost-factor configurations for each SQL query and records the best-performing plan/runtime profile.
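
As a rough illustration of what a cost-factor configuration could look like when applied to a session, the sketch below renders a candidate as `SET SESSION` statements. `tidb_opt_cpu_factor` and `tidb_opt_scan_factor` are TiDB optimizer system variables used here purely as examples. Which factors OBELISK actually tunes, how it applies them, and the helper name `render_session_settings` are all assumptions.

```python
from typing import List, Mapping


def render_session_settings(config: Mapping[str, float]) -> List[str]:
    """Render a candidate configuration as SET SESSION statements.

    Hypothetical helper: OBELISK's real mechanism for applying a
    configuration may differ.
    """
    statements = []
    for name, value in sorted(config.items()):
        # The name is interpolated into SQL text, so accept only plain
        # identifiers made of letters, digits, and underscores.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected variable name: {name!r}")
        statements.append(f"SET SESSION {name} = {float(value)!r};")
    return statements


# Illustrative candidate; the names and values are examples, not tuned results.
candidate = {"tidb_opt_scan_factor": 2.0, "tidb_opt_cpu_factor": 4.5}
for statement in render_session_settings(candidate):
    print(statement)
```

Rendering statements separately from executing them keeps each candidate easy to log next to the runtime it produced.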

Core flow:

1. Run a baseline execution of the query.
2. Warm-start the search with Latin Hypercube samples.
3. Run Bayesian Optimization (tcbo or vanilla_gp) with LLM-guided proposals.
4. Persist per-query results and summary statistics.
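
The flow above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the repository's implementation: `measure` stands in for executing a query on TiDB, `propose` stands in for the BO plus LLM proposal step (here it only jitters the best point seen so far), and `n_warm` and `n_guided` are hypothetical parameter names that are not meant to map onto the CLI flags. Only the Latin Hypercube warm start is implemented as such.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]
History = List[Tuple[Config, float]]


def latin_hypercube(bounds: Bounds, n: int, rng: random.Random) -> List[Config]:
    """Draw n configurations with one sample per stratum in every dimension."""
    columns: Dict[str, List[float]] = {}
    for name, (low, high) in bounds.items():
        # Split [low, high) into n equal strata and sample once inside each.
        points = [low + (high - low) * (i + rng.random()) / n for i in range(n)]
        rng.shuffle(points)  # pair strata randomly across dimensions
        columns[name] = points
    return [{name: columns[name][i] for name in bounds} for i in range(n)]


def optimize_query(
    measure: Callable[[Config], float],
    default: Config,
    bounds: Bounds,
    propose: Callable[[History], Config],
    n_warm: int,
    n_guided: int,
    seed: int = 0,
) -> Dict[str, object]:
    rng = random.Random(seed)
    # 1. Baseline: measure the query under the default configuration.
    baseline = measure(default)
    history: History = [(dict(default), baseline)]
    # 2. Warm start: space-filling Latin Hypercube samples.
    for config in latin_hypercube(bounds, n_warm, rng):
        history.append((config, measure(config)))
    # 3. Guided search: `propose` is a placeholder for the BO + LLM step.
    for _ in range(n_guided):
        config = propose(history)
        history.append((config, measure(config)))
    # 4. Summary: the record a caller would persist for this query.
    best_config, best_runtime = min(history, key=lambda item: item[1])
    return {
        "baseline_runtime": baseline,
        "best_runtime": best_runtime,
        "best_config": best_config,
        "evaluations": len(history),
    }


# Toy demo with a synthetic runtime function instead of a real TiDB execution.
BOUNDS: Bounds = {"factor_a": (0.0, 10.0), "factor_b": (0.0, 10.0)}
_demo_rng = random.Random(1)


def synthetic_runtime(config: Config) -> float:
    return (config["factor_a"] - 3.0) ** 2 + (config["factor_b"] - 7.0) ** 2 + 1.0


def perturb_best(history: History) -> Config:
    """Placeholder proposal: jitter the best configuration seen so far."""
    best_config, _ = min(history, key=lambda item: item[1])
    return {
        name: min(high, max(low, best_config[name] + _demo_rng.uniform(-1.0, 1.0)))
        for name, (low, high) in BOUNDS.items()
    }


summary = optimize_query(
    synthetic_runtime, {"factor_a": 5.0, "factor_b": 5.0}, BOUNDS,
    perturb_best, n_warm=10, n_guided=5,
)
print(summary)
```

Because the baseline measurement is kept in the history, the reported best runtime in this sketch can never be worse than the default configuration.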

Quick Start

Installation

uv venv .venv
source .venv/bin/activate
uv sync

Create env file:

cp .env.copy .env

Edit .env with your TiDB connection and OpenAI API key:

OPENAI_API_KEY=...
TIDB_HOST=...
TIDB_PORT=4000
TIDB_USER=root
TIDB_PASSWORD=...
TIDB_DB_NAME=...
CA_PATH=...  # Optional
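
For orientation, here is a minimal sketch of how these variables might be turned into connection settings. It assumes the values from .env have already been exported into the process environment. The helper name `tidb_settings`, the choice of required fields, and the output key names are hypothetical, and the repository's own loader may work differently. The port fallback of 4000 mirrors the example value above.

```python
import os
from typing import Dict, Mapping, Optional, Union


def tidb_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Union[str, int]]:
    """Collect TiDB connection settings from environment-style variables."""
    env = os.environ if env is None else env
    # Assumption: host, user, and database name must be set explicitly.
    missing = [key for key in ("TIDB_HOST", "TIDB_USER", "TIDB_DB_NAME") if not env.get(key)]
    if missing:
        raise KeyError("missing required settings: " + ", ".join(missing))
    settings: Dict[str, Union[str, int]] = {
        "host": env["TIDB_HOST"],
        "port": int(env.get("TIDB_PORT") or "4000"),
        "user": env["TIDB_USER"],
        "password": env.get("TIDB_PASSWORD", ""),
        "database": env["TIDB_DB_NAME"],
    }
    # CA_PATH is optional; include it only when a value is present.
    ca_path = env.get("CA_PATH")
    if ca_path:
        settings["ssl_ca"] = ca_path
    return settings


# Example with an explicit mapping instead of the real environment.
example = tidb_settings({"TIDB_HOST": "127.0.0.1", "TIDB_USER": "root", "TIDB_DB_NAME": "test"})
print(example["host"], example["port"])
```

Validating the required fields up front means a run with an incomplete .env stops before any query is executed.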

TiDB Setup

1. Install TiUP

curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
source ~/.bashrc
which tiup

2. Deploy Local TiDB Cluster

Requires TiDB v8.5 or later. Start the playground with --tag obelisk and pass a configuration file that disables the Coprocessor Cache through the tikv-client.copr-cache settings; otherwise cached coprocessor results can make execution plan performance measurements inaccurate:

tiup playground v8.5.3 --db 1 --pd 1 --kv 1 --tiflash 0 --tag obelisk --db.config /path/to/tidb.toml

Add the following settings to /path/to/tidb.toml:

[tikv-client.copr-cache]
capacity-mb = 0.0

3. Test Connection

mysql --comments --host 127.0.0.1 --port 4000 -u root

4. Configure Environment

Fill TiDB connection fields in .env (TIDB_HOST, TIDB_PORT, TIDB_USER, TIDB_PASSWORD, TIDB_DB_NAME, CA_PATH).

Run Optimization

uv run src/run.py \
  --sql-dir sql/job \
  --results-dir results/job-run \
  --trials 15 \
  --warm_times 10 \
  --strategy tcbo

Usage

Command Line Options

Option	Description
`--sql-dir`	Directory with SQL files
`--results-dir`	Output directory (`results/<sql-dir-name>`)
`--trials`	Total optimization iterations
`--warm_times`	Warm-start iterations
`--strategy`	BO strategy (`tcbo` or `vanilla_gp`)
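
The options in the table could be parsed along the following lines. This is a sketch, not the contents of src/run.py: the default values are assumptions taken from the example command, and treating `results/<sql-dir-name>` as the fallback when `--results-dir` is omitted is one reading of the table.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Per-query cost-factor search (sketch)")
    parser.add_argument("--sql-dir", type=Path, required=True,
                        help="Directory with SQL files")
    parser.add_argument("--results-dir", type=Path, default=None,
                        help="Output directory")
    # Defaults below are assumed from the example invocation.
    parser.add_argument("--trials", type=int, default=15,
                        help="Total optimization iterations")
    parser.add_argument("--warm_times", type=int, default=10,
                        help="Warm-start iterations")
    parser.add_argument("--strategy", choices=("tcbo", "vanilla_gp"), default="tcbo",
                        help="BO strategy")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.results_dir is None:
        # Assumed fallback: results/<sql-dir-name>
        args.results_dir = Path("results") / args.sql_dir.name
    return args


args = parse_args(["--sql-dir", "sql/job", "--strategy", "vanilla_gp"])
print(args.results_dir, args.trials, args.warm_times, args.strategy)
```

argparse exposes `--sql-dir` and `--results-dir` as `args.sql_dir` and `args.results_dir`, while `--warm_times` keeps its underscore unchanged.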

Code Structure

obelisk-oqo/
├── src/
│   ├── run.py
│   ├── db/                # TiDB connection and SQL execution
│   ├── llm/               # LLM prompting and config generation
│   ├── optimization/      # BO strategies and optimization pipeline
│   ├── test/              # Script-style validation tools
│   └── util/              # Shared config/constants/log helpers
├── sql/                   # Workload SQL files
├── results/               # Output artifacts (ignored in git)
├── logs/                  # Runtime logs (ignored in git)
└── pyproject.toml

Citation

If you use OBELISK in your research, please cite:

@article{pan2026obelisk,
  title={OBELISK: Efficient Offline Query Planning with Bayesian Optimization-Informed Language Model Reasoning},
  author={Pan, Z. and Sun, W. and Zhang, Y. and Purcell, T. and Dong, Y. and Yang, C. and Zhang, R. and Zhou, X. and Xu, J.},
  journal={Proceedings of the VLDB Endowment},
  volume={19},
  number={12},
  year={2026}
}

License

Apache 2.0
