This repository contains evaluation tooling, datasets, and analysis scripts used for the "QCan Family Metrics" evaluation.
Important: the results are also available online at https://wimmics.github.io/t2s-eval/
- Provide the results of evaluating all the metrics of the t2s-metrics library on all the benchmarks from the Text2SPARQL Challenges 2025 and 2026.
- Analyze the behaviors of four frequently used execution-based and string-based metrics together with four QCan metrics.
- Produce dashboards of the results (interactive Streamlit or static HTML snapshots).
- Plot metric correlations and per-experiment summaries.
- Python 3.12 or later.
- uv (recommended for local development) or pip.
- A SPARQL endpoint only if you use execution metrics with a remote KG (for example QLever/Corese).
- Ollama only if you enable LLM-based metrics.
- QCan jar only if you use QCan-related metrics. The repository includes it under third_party_lib.
- NLTK data only if you use BLEU- or METEOR-related metrics.
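The requirements above can be checked before installation with a short script. This is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it assumes that the optional tools are exposed on the PATH as `uv`, `java` (needed to run the QCan jar) and `ollama`. It does not check the SPARQL endpoint or the NLTK data.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 12), tools=("uv", "java", "ollama")):
    """Return a dict mapping each prerequisite name to True (found) or False."""
    # Compare only (major, minor) against the minimum Python version.
    report = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        # The executable names are assumptions; adjust them to your setup.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A missing optional tool is only a problem if you plan to use the corresponding metrics.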
- Clone the repository:
    git clone https://github.com/Wimmics/t2s-eval.git
- Navigate to the project directory:
    cd t2s-eval
- Install dependencies using uv:
    uv sync

- datasets/: JSONL evaluation files and per-dataset result directories (ck25, ck26, db25, db26, ...).
- docs/: exported dashboards and static pages.
- results_analysis/: CSV summaries and experiment notes.
- src/qcan_eval/: evaluation pipeline, metric calculation, merge tools, Streamlit interfaces.
- src/t2s_soa/: scripts for generating plots, inventories and paper-oriented helpers.
- third_party_lib/: redistributed third-party binaries (QCan jar).
The repository contains small scripts for common evaluation workflows under src/.
Run the interactive Streamlit dashboard for ad-hoc exploration:
    streamlit run src/qcan_eval/web_interface.py
Generate a static, self-contained dashboard snapshot (useful for sharing):
    uv run src/qcan_eval/web_interface_static.py
    # The exporter writes to docs/qcan-eval-static/index.html
Produce metric correlation plots for a given merged result file:
    uv run src/qcan_eval/metrics_corr_plots.py -r datasets/db26/results/db26-20260429-174352.json -om results/ -os results/ -oc results/
Explore helper scripts under src/t2s_soa/ for inventory generation, paper figures, and bibliography transformations.
Evaluation inputs are JSON Lines (.jsonl) files with one JSON object per line. Each object should include at least:
- id: unique example identifier
- golden: reference SPARQL query (string)
- generated: system-generated SPARQL query (string)
- order_matters: boolean (whether result ordering matters)
Example files are available under datasets/*/eval/.
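To make the input format concrete, the sketch below checks JSONL lines against the fields listed above. It is an illustration only, not the validation performed by the evaluation pipeline: the function name `validate_jsonl_lines` and the sample records are hypothetical, `id` is checked for presence only because its type is not specified, and skipping blank lines is a choice made here.

```python
import json

# Field names and types follow the description above; "id" is only
# checked for presence because its type is not specified.
EXPECTED_TYPES = {"golden": str, "generated": str, "order_matters": bool}


def validate_jsonl_lines(lines):
    """Return a list of (line_number, message) problems found in JSONL lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped (a choice made in this sketch)
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        if "id" not in record:
            problems.append((number, "missing field: id"))
        for field, expected in EXPECTED_TYPES.items():
            if field not in record:
                problems.append((number, f"missing field: {field}"))
            elif not isinstance(record[field], expected):
                problems.append((number, f"wrong type for field: {field}"))
    return problems


# Hypothetical records, not taken from the repository's datasets.
example = {
    "id": "q1",
    "golden": "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
    "generated": "SELECT ?x WHERE { ?x ?y ?z } LIMIT 1",
    "order_matters": False,
}
sample_lines = [json.dumps(example), '{"id": "q2", "golden": "ASK { ?s ?p ?o }"}']
print(validate_jsonl_lines(sample_lines))
```

To check a real file, pass an open file handle, which iterates line by line: `validate_jsonl_lines(open(path, encoding="utf-8"))`.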
Results of metric runs are exported to datasets/{dataset}/results/ as timestamped JSON files.
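When a results directory holds several runs, the timestamp in the filename can be used to pick the most recent one. The sketch below assumes the naming pattern `<dataset>-YYYYMMDD-HHMMSS.json`, inferred from the single example `db26-20260429-174352.json`; the pattern and the helper names `parse_result_name` and `latest_result` are assumptions, not part of the repository.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed naming pattern, inferred from one example filename:
# <dataset>-YYYYMMDD-HHMMSS.json
RESULT_NAME = re.compile(r"^(?P<dataset>.+)-(?P<stamp>\d{8}-\d{6})\.json$")


def parse_result_name(filename):
    """Return (dataset, datetime) for a results filename, or None if it does not match."""
    match = RESULT_NAME.match(filename)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group("stamp"), "%Y%m%d-%H%M%S")
    except ValueError:
        return None  # digits are present but do not form a valid date/time
    return match.group("dataset"), stamp


def latest_result(results_dir):
    """Return the Path of the most recent results file in results_dir, or None."""
    dated = []
    for path in Path(results_dir).glob("*.json"):
        parsed = parse_result_name(path.name)
        if parsed is not None:
            dated.append((parsed[1], path))
    return max(dated, key=lambda pair: pair[0])[1] if dated else None


print(parse_result_name("db26-20260429-174352.json"))
```

For example, `latest_result("datasets/db26/results")` would return the newest matching file in that directory, or `None` if no filename matches the assumed pattern.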
- Ensure the relevant dataset folder is present under datasets/ (e.g. datasets/ck25/).
- Run the following script and generate the results through the GUI:
    streamlit run src/qcan_eval/web_interface.py
Or
- Run the following script to generate a Markdown file with all the results:
    uv run src/qcan_eval/generate_markdown_experiments.py

t2s-eval scripts under src are provided under the terms of the GNU Affero General Public License 3.0 (AGPL-3.0).
t2s-eval datasets under datasets/{dataset}/results, datasets/_streamlit and results_analysis are provided under the terms of the Creative Commons Attribution-ShareAlike 4.0 International license (CC-BY-SA-4.0).
This repository provides several third-party contributions redistributed with their original licenses.
t2s-eval reuses the CK25 Corporate Knowledge Reference Dataset for Benchmarking Text-2-SPARQL QA Approaches, which we modified to meet file format requirements (JSONL).
The modified version is redistributed in directory datasets/ck25 under the terms of the Creative Commons Attribution 4.0 International license (CC-BY-4.0).
t2s-eval reuses the CK26, DB25 and DB26 datasets, which we modified to meet file format requirements (JSONL).
The modified versions are redistributed in directories datasets/ck26, datasets/db25 and datasets/db26 under the terms of the Creative Commons Attribution-ShareAlike 4.0 International license (CC-BY-SA-4.0).
t2s-eval reuses the QCan software for canonicalising SPARQL queries.
QCan is written in Java. In this repository, we distribute the compiled jar of QCan v1.1, third_party_lib/qcan-1.1-jar-with-dependencies.jar, under the terms of the Apache 2.0 license.