LexBench-CS: LLM benchmarking for Czech legal reasoning tasks

This work is supported by the Technology Agency of the Czech Republic under the project Applied Legal Language Model and Benchmarks for Legal Practice (FW11020230).

This repository contains inference and evaluation code for the Czech Law Multiple-Choice Benchmark (CLMC).

The dataset is not yet publicly available. To request access for academic purposes, please contact us.

Installation

This guide assumes a SLURM cluster environment. Adapting the setup to other environments should be straightforward.

Load the required environment modules (ml is the Lmod shorthand for module load):

ml vLLM/0.12.0-foss-2025a-CUDA-12.8.0
ml Triton/3.5.0-gfbf-2025a-CUDA-12.8.0

Create a Python virtual environment:

python -m venv .venv

Create a .env file and set the OPENAI_API_KEY environment variable.
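A minimal way to create the file from the shell is sketched below. It assumes the usual KEY=value .env format and that the file sits in the directory the code is run from; both are assumptions, so check how the code loads it. The key value is a placeholder:

```shell
# Create .env with a placeholder OpenAI key; replace the value with your own key.
# The file is only written if it does not exist, so an existing key is never overwritten.
if [ ! -f .env ]; then
    printf 'OPENAI_API_KEY=%s\n' "sk-your-key-here" > .env
    chmod 600 .env  # the key is a secret: make the file readable by you only
fi
```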

Activate the environment:

source .venv/bin/activate

Install the required packages:

pip install -r requirements.txt

If your environment does not provide a vLLM module, install vLLM manually (e.g., with pip).
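If you are unsure whether vLLM is already available (for example from the module loaded with ml), a small helper can install it only when the import fails. This is a sketch, not part of the repository: the function name ensure_vllm is hypothetical, and you may prefer to pin the version to match the module listed above (e.g. vllm==0.12.0).

```shell
# Hypothetical helper: install vLLM with pip only if it cannot be imported yet.
# An optional argument overrides the pip requirement, e.g. "vllm==0.12.0".
ensure_vllm() {
    if python -c 'import vllm' >/dev/null 2>&1; then
        echo "vLLM is already available"
    else
        pip install "${1:-vllm}"
    fi
}
```

Run it inside the activated virtual environment, e.g. ensure_vllm "vllm==0.12.0".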

Run Inference

Local models using vLLM

To make switching between models easy, the benchmarking suite uses the AIC vLLM Proxy server, which wraps vllm serve and provides Ollama-like model management. Currently, the proxy can serve only one model at a time.

Start the AIC vLLM Proxy server using a batch configuration that matches the models you want to evaluate (select an existing one or create your own). The following example runs the proxy on the CTU RCI cluster with one NVIDIA H200 GPU:

cd slurm
sbatch vllm_proxy_1h200.batch
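Inference needs the proxy to be up. One way to script this, sketched below, is to submit the job with sbatch --parsable (which prints only the job ID) and poll squeue until the job reaches the RUNNING state. The function name wait_until_running is hypothetical. Note that RUNNING only means the job has started; the proxy may still be loading a model, so check its log before starting inference.

```shell
# Hypothetical helper: poll SLURM until a job is RUNNING.
# Usage: wait_until_running JOB_ID [POLL_SECONDS]
# Returns 0 once the job runs, 1 if it leaves the queue or ends up in another state.
wait_until_running() {
    local job_id="$1" interval="${2:-10}" state
    while true; do
        # -h: no header; -o '%T': print only the job state (e.g. PENDING, RUNNING)
        state="$(squeue -h -j "$job_id" -o '%T')"
        case "$state" in
            RUNNING) return 0 ;;
            PENDING|CONFIGURING) sleep "$interval" ;;
            *) echo "job $job_id is not running (state: ${state:-not in queue})" >&2
               return 1 ;;
        esac
    done
}
```

For example: jid="$(sbatch --parsable vllm_proxy_1h200.batch)"; jid="${jid%%;*}"; wait_until_running "$jid" 30. The ${jid%%;*} expansion strips the optional ";cluster" suffix that --parsable may append.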

SLURM logs are stored in the logs/ directory.
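To find the log of the job you just submitted, listing the directory by modification time is usually enough. A minimal sketch follows; the helper name latest_log is hypothetical, and you should pass the path to logs/ relative to where you are (e.g. ../logs from inside slurm/, assuming the batch scripts write to the top-level logs/ directory).

```shell
# Hypothetical helper: print the name of the most recently modified file in a
# log directory (default: logs). Assumes log file names contain no newlines.
latest_log() {
    ls -t "${1:-logs}" | head -n 1
}
```

For example, to follow the newest log from inside slurm/: tail -f "../logs/$(latest_log ../logs)".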

Inference is configured by editing the main() function in:

src/lexbench_cs/run_clmc_inference.py

Key parameters:

PROXY_URL: proxy connection string
MODEL2SPEC: target LLM definitions
TEMPLATE_NAMES: selected evaluation prompt templates
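PROXY_URL has to point at the compute node where the proxy job runs. Assuming the proxy is reachable over HTTP on that node, a candidate value can be built from the node list reported by squeue, as sketched below. The helper name proxy_url and the default port 8000 are placeholders, not values taken from this repository; use the port configured in your proxy batch file.

```shell
# Hypothetical helper: print "http://NODE:PORT" for a running single-node job.
# Usage: proxy_url JOB_ID [PORT]  (8000 is only a placeholder default)
proxy_url() {
    local job_id="$1" port="${2:-8000}" node
    # -o '%N' prints the job's node list; it is typically empty while the job is pending
    node="$(squeue -h -j "$job_id" -o '%N')"
    [ -n "$node" ] || return 1
    printf 'http://%s:%s\n' "$node" "$port"
}
```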

Run inference:

cd slurm
sbatch run_clmc_inference.batch

Results are written to the EXP/ directory.

OpenAI models

Edit the configuration in:

src/lexbench_cs/run_clmc_inference_openai.py

Then run:

cd slurm
sbatch run_clmc_inference_openai.batch
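The OpenAI job presumably reads OPENAI_API_KEY from the .env file created during installation. A quick preflight check before sbatch avoids queueing a job that would fail on authentication. The helper name check_openai_key is hypothetical, and the .env location is an assumption (from inside slurm/, a .env in the repository root would be ../.env).

```shell
# Hypothetical preflight check: succeed only if the given file (default: .env)
# has an OPENAI_API_KEY= line with something after the '=' sign.
check_openai_key() {
    grep -Eq '^OPENAI_API_KEY=.+' "${1:-.env}" 2>/dev/null
}
```

For example: check_openai_key ../.env && sbatch run_clmc_inference_openai.batch.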

Run Evaluation

cd slurm
sbatch run_evaluate.batch

Aggregated results (Markdown and LaTeX tables) are stored in:

EXP/clmc/evaluation.md
EXP/clmc/evaluation.tex
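Evaluation only makes sense once inference has finished. Instead of waiting manually, you can let SLURM chain the jobs with a dependency of type afterok, which starts the evaluation job only if the inference job completes successfully. A minimal sketch, meant to be run from inside slurm/ (the function name submit_pipeline is hypothetical):

```shell
# Hypothetical helper: submit an inference batch file (default: the vLLM one),
# then queue run_evaluate.batch to start only after it finishes successfully.
submit_pipeline() {
    local infer_job eval_job
    infer_job="$(sbatch --parsable "${1:-run_clmc_inference.batch}")" || return 1
    infer_job="${infer_job%%;*}"  # --parsable prints "jobid[;cluster]"
    eval_job="$(sbatch --parsable --dependency="afterok:${infer_job}" run_evaluate.batch)" || return 1
    echo "inference: ${infer_job}, evaluation: ${eval_job%%;*}"
}
```

For OpenAI models, pass the other batch file: submit_pipeline run_clmc_inference_openai.batch.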

License

MIT License
