T2S Eval — T2S-Metrics Evaluation

This repository contains evaluation tooling, datasets, and analysis scripts used for the "QCan Family Metrics" evaluation.

Important

The results are also available online at: https://wimmics.github.io/t2s-eval/

Features

Provide the results of evaluating all the metrics of the t2s-metrics library on all the benchmarks from the Text2SPARQL Challenges 2025 and 2026.
Analyze the behaviors of four frequently used execution-based and string-based metrics together with four QCan metrics.
Produce dashboards of the results (interactive Streamlit or static HTML snapshots).
Plot metric correlations and per-experiment summaries.

Prerequisites

Python 3.12 or later.
uv (recommended for local development) or pip.
A SPARQL endpoint only if you use execution metrics with a remote KG (for example QLever/Corese).
Ollama only if you enable LLM-based metrics.
QCan jar only if you use qcan-related metrics. The repository includes it under third_party_lib.
NLTK data only if you use BLEU- and METEOR-related metrics.
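
The optional prerequisites above can be checked before starting a run. The following is a minimal sketch, not part of the repository: the function name check_prerequisites is hypothetical, the jar path is the one given in the "QCan library" section, and the check for a java executable is an assumption based on QCan being written in Java.

```python
import shutil
import sys
from pathlib import Path

# Jar location as stated in the "QCan library" section of this README.
QCAN_JAR = Path("third_party_lib/qcan-1.1-jar-with-dependencies.jar")


def check_prerequisites(repo_root: Path = Path(".")) -> dict[str, bool]:
    """Return a mapping from prerequisite name to whether it was found.

    Hypothetical helper for illustration; the repository may perform
    its own checks differently.
    """
    return {
        "python>=3.12": sys.version_info >= (3, 12),
        "uv on PATH": shutil.which("uv") is not None,
        # Assumption: running the QCan jar needs a `java` executable.
        "java on PATH": shutil.which("java") is not None,
        "qcan jar": (repo_root / QCAN_JAR).is_file(),
    }


if __name__ == "__main__":
    # Run from the repository root so the relative jar path resolves.
    for name, found in check_prerequisites().items():
        print(f"{'ok' if found else 'missing':8}{name}")
```

The SPARQL endpoint, Ollama, and NLTK data are left out because the README does not say how they are configured.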

For development (editable install):

Clone the repository:

git clone https://github.com/Wimmics/t2s-eval.git

Navigate to the project directory:

cd t2s-eval

Install dependencies using uv:

uv sync

Repository layout

datasets/ — JSONL evaluation files and per-dataset result directories (ck25, ck26, db25, db26, ...).
docs/ — exported dashboards and static pages.
results_analysis/ — CSV summaries and experiment notes.
src/qcan_eval/ — evaluation pipeline, metric calculation, merge tools, Streamlit interfaces.
src/t2s_soa/ — scripts for generating plots, inventories and paper-oriented helpers.
third_party_lib/ — redistributed third-party binaries (QCan jar).

Common tasks / Usage

The repository contains small scripts for common evaluation workflows under src/.

Run the interactive Streamlit dashboard for ad-hoc exploration:

streamlit run src/qcan_eval/web_interface.py

Generate a static, self-contained dashboard snapshot (useful for sharing):

uv run src/qcan_eval/web_interface_static.py

# The exporter writes to docs/qcan-eval-static/index.html

Produce metric correlation plots for a given merged result file:

uv run src/qcan_eval/metrics_corr_plots.py -r datasets/db26/results/db26-20260429-174352.json -om results/ -os results/ -oc results/

Explore helper scripts under src/t2s_soa/ for inventory generation, paper figures, and bibliography transformations.

Datasets and expected formats

Evaluation inputs are JSON Lines (.jsonl) files with one JSON object per line. Each object should include at least:

id: unique example identifier
golden: reference SPARQL query (string)
generated: system-generated SPARQL query (string)
order_matters: boolean (whether result ordering matters)

Example files are available under datasets/*/eval/.
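
To illustrate the expected input format, here is a minimal sketch of a validator for such a file. The function name validate_jsonl is hypothetical and not part of the repository. The field names come from the list above; the string and boolean types follow that list, and since the type of id is not stated, only its presence is checked.

```python
import json

# Field names from the list above. `object` means "any type":
# the README does not say whether `id` is a string or a number.
REQUIRED_FIELDS = {
    "id": object,
    "golden": str,
    "generated": str,
    "order_matters": bool,
}


def validate_jsonl(path):
    """Return a list of human-readable problems found in a .jsonl file."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {line_no}: expected a JSON object")
                continue
            for field, expected in REQUIRED_FIELDS.items():
                if field not in record:
                    problems.append(f"line {line_no}: missing field '{field}'")
                elif not isinstance(record[field], expected):
                    problems.append(f"line {line_no}: field '{field}' has unexpected type")
    return problems
```

An empty list means every non-blank line is a JSON object carrying the four fields.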

Results of metric runs are exported to datasets/{dataset}/results/ as timestamped JSON files.
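
When a results directory holds several runs, the most recent one can be picked from the timestamp in the file name. The sketch below assumes names of the form {dataset}-YYYYMMDD-HHMMSS.json, a pattern inferred only from the example path db26-20260429-174352.json used elsewhere in this README. The helper names parse_result_name and latest_result are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumption: "<dataset>-YYYYMMDD-HHMMSS.json", inferred from a single
# example file name in this README; other naming schemes are skipped.
RESULT_NAME = re.compile(r"^(?P<dataset>.+)-(?P<stamp>\d{8}-\d{6})\.json$")


def parse_result_name(name):
    """Return (dataset, timestamp) for a result file name, or None."""
    match = RESULT_NAME.match(name)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match["stamp"], "%Y%m%d-%H%M%S")
    except ValueError:
        return None  # digits present but not a valid date or time
    return match["dataset"], stamp


def latest_result(results_dir):
    """Return the Path of the newest timestamped result file, or None."""
    candidates = []
    for path in Path(results_dir).glob("*.json"):
        parsed = parse_result_name(path.name)
        if parsed is not None:
            candidates.append((parsed[1], path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

For example, latest_result("datasets/db26/results") would return the newest file in that directory under the stated naming assumption.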

Reproducing experiments from this repo

Ensure the relevant dataset folder is present under datasets/ (e.g. datasets/ck25/).
Run the following script and generate the results through the GUI:

streamlit run src/qcan_eval/web_interface.py

Alternatively, run the following script to generate a Markdown file with all the results:

uv run src/qcan_eval/generate_markdown_experiments.py

License

T2S-Eval

Software

t2s-eval scripts under src are provided under the terms of the GNU Affero General Public License 3.0 (AGPL-3.0).

Datasets

t2s-eval datasets under datasets/{dataset}/results, datasets/_streamlit and results_analysis are provided under the terms of the Creative Commons Attribution-ShareAlike 4.0 International license (CC-BY-SA-4.0).

Redistribution of third-party software and data

This repository provides several third-party contributions redistributed with their original licenses.

CK25 Dataset

t2s-eval reuses the CK25 Corporate Knowledge Reference Dataset for Benchmarking Text-2-SPARQL QA Approaches, which we modified to meet the file format requirements (JSONL format).

The modified version is redistributed in directory datasets/ck25 under the terms of the Creative Commons Attribution 4.0 International license (CC-BY-4.0).

CK26, DB25 and DB26 Datasets

t2s-eval reuses the CK26, DB25 and DB26 datasets, which we modified to meet the file format requirements (JSONL format).

The modified versions are redistributed in directories datasets/ck26, datasets/db25 and datasets/db26 under the terms of the Creative Commons Attribution-ShareAlike 4.0 International license (CC-BY-SA-4.0).

QCan library

t2s-eval reuses the QCan software for canonicalising SPARQL queries.

QCan is written in Java. In this repository, we distribute the compiled jar of QCan v1.1, third_party_lib/qcan-1.1-jar-with-dependencies.jar, under the terms of the Apache 2.0 license.
