FlexRML - experimental. really fast. stability not guaranteed.

FlexRML is an experimental native C++ RML processor. The goal is to be fast and memory efficient.

Description

RML (RDF Mapping Language) is central to knowledge acquisition. FlexRML is a flexible RML processor able to run on a wide range of devices:

  • Cloud Environments
  • Consumer Hardware
  • Single Board Computers
  • Microcontrollers (Separate Repository)

Currently, FlexRML supports CSV, JSON, and XML logical sources. CSV is read as rows, JSON supports JSONPath-style iterators for object arrays, and XML supports XPath iterators through the shared source reader abstraction.

Performance

The benchmark numbers below are from a local run on a Ryzen 5 7500F with 6 cores using the default CMake build and the built-in benchmark runner:

python3 scripts/run_benchmarks.py --build --output-dir bench_res --repeats 3 --warmups 1

The run completed all benchmark cases with no failures. Times are the average wall-clock time per measured run. Memory is the average peak RSS per run. Results depend on hardware and dataset.

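| Category   | Cases | Avg wall time | Avg CPU | Avg peak RSS | Avg generated triples |
|------------|-------|---------------|---------|--------------|-----------------------|
| GTFS       | 4     | 2.124 s       | 417%    | 1.62 GiB     | 12,868,472            |
| duplicates | 5     | 0.083 s       | 828%    | 53.1 MiB     | 1,000,016             |
| empty      | 5     | 0.069 s       | 794%    | 54.8 MiB     | 1,000,000             |
| join       | 25    | 0.096 s       | 101%    | 32.3 MiB     | 117,060               |
| mappings   | 4     | 0.375 s       | 595%    | 206.7 MiB    | 7,625,000             |
| namedgraph | 15    | 1.083 s       | 712%    | 529.2 MiB    | 16,200,000            |
| raw        | 9     | 2.358 s       | 813%    | 659.1 MiB    | 39,144,444            |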

GTFS benchmark details:

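| Case    | Avg wall time | Avg CPU | Avg peak RSS | Generated triples | Output size |
|---------|---------------|---------|--------------|-------------------|-------------|
| 100_csv | 5.709 s       | 343%    | 875.1 MiB    | 39,595,300        | 7.00 GiB    |
| 10_csv  | 0.529 s       | 346%    | 216.5 MiB    | 3,959,530         | 0.70 GiB    |
| 10_json | 0.886 s       | 504%    | 1.23 GiB     | 3,959,530         | 0.70 GiB    |
| 10_xml  | 1.373 s       | 477%    | 4.17 GiB     | 3,959,530         | 0.70 GiB    |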
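One way to read the GTFS table is as throughput: generated triples divided by average wall time. The sketch below only does that arithmetic on figures copied from the table. The `throughput` helper and the `GTFS_RESULTS` list are hypothetical names for illustration and are not part of the repository.

```python
from typing import List, Tuple

# Figures copied from the GTFS table: (case, avg wall time in seconds, generated triples).
GTFS_RESULTS: List[Tuple[str, float, int]] = [
    ("100_csv", 5.709, 39_595_300),
    ("10_csv", 0.529, 3_959_530),
    ("10_json", 0.886, 3_959_530),
    ("10_xml", 1.373, 3_959_530),
]


def throughput(triples: int, wall_seconds: float) -> float:
    """Return triples generated per second of wall-clock time."""
    return triples / wall_seconds


if __name__ == "__main__":
    for case, wall, triples in GTFS_RESULTS:
        rate = throughput(triples, wall)
        print(f"{case}: {rate / 1e6:.2f} million triples/s")
```

Read this way, the CSV cases run at roughly 7 million triples per second on the benchmark machine, and the JSON and XML cases are slower for the same number of generated triples.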

Installation

Using Prebuilt Binaries

Prebuilt binaries for Debian based systems are available in the releases section.

Compiling from Source

Prerequisites: We test on Ubuntu 24.04 LTS with GCC 13.3.

Install a C++ toolchain, CMake, and pkg-config:

sudo apt install build-essential cmake pkg-config

Native dependencies are managed with vcpkg manifest mode:

  • jsoncons
  • pugixml
  • serd
  • unordered-dense
  • xxhash

Install vcpkg, make sure vcpkg is on your PATH, then install dependencies from the project root:

vcpkg install

The CMake build detects dependencies in vcpkg_installed/<triplet> when you use vcpkg manifest mode. If you use a classic vcpkg checkout instead, configure CMake with the vcpkg toolchain file:

cmake --preset default -DCMAKE_TOOLCHAIN_FILE="$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake"

Compilation Process:

  1. Clone or download the repository from GitHub and navigate to the project directory.
git clone git@github.com:wintechis/flex-rml.git
cd flex-rml
  2. Install C++ dependencies.
vcpkg install
  3. Build the native executable with CMake.
cmake --preset default
cmake --build --preset default

The build produces one executable:

./flexrml

Useful build overrides:

cmake --preset test
cmake --build --preset test
cmake --preset debug
cmake --build --preset debug
cmake --preset default -DVCPKG_TARGET_TRIPLET=x64-linux

Use the test preset for fast local rebuilds while working on code. It disables optimization and debug-symbol generation.

How dependencies are linked depends on the vcpkg triplet and system packages. In either case, the executable still depends on normal system runtime libraries such as libstdc++ and libc.

Versioning

The project version is set in CMakeLists.txt:

project(flexrml VERSION 3.0.0 LANGUAGES CXX)

CMake generates the runtime version header from that value. Check the built executable with:

./flexrml --version

Getting Started

To execute a mapping and print triples to stdout:

./flexrml -m mapping.rml.ttl

To write triples to a file:

./flexrml -m mapping.rml.ttl -o output.nt

Useful CLI options:

./flexrml -m mapping.rml.ttl -b http://example.com/base/
./flexrml -m mapping.rml.ttl --no-threading
./flexrml -m mapping.rml.ttl -gp
./flexrml --version
./flexrml --help
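When FlexRML is driven from a script, the options listed above can be assembled into an argument list. The following is a minimal sketch. `build_command` is a hypothetical helper that is not shipped with the project, and it uses only flags that appear in this README. The `-b` value is assumed to be a base IRI, judging by the example value shown above.

```python
import shlex
from typing import List, Optional


def build_command(
    mapping: str,
    output: Optional[str] = None,
    base: Optional[str] = None,
    threading: bool = True,
    binary: str = "./flexrml",
) -> List[str]:
    """Assemble a flexrml argument list from flags shown in this README."""
    cmd = [binary, "-m", mapping]
    if output is not None:
        cmd += ["-o", output]  # write triples to a file instead of stdout
    if base is not None:
        cmd += ["-b", base]  # assumption: -b takes a base IRI
    if not threading:
        cmd.append("--no-threading")
    return cmd


if __name__ == "__main__":
    cmd = build_command("mapping.rml.ttl", output="output.nt")
    # Print the command instead of running it, so the sketch works without a build.
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the executable has been built.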

Architecture

FlexRML is structured as a frontend/backend pipeline. The frontend parses and normalizes mappings into an intermediate representation. The backend plans, optimizes, and executes typed programs against source readers.

The intended layering is:

frontend -> backend/planner -> backend/optimizer -> backend/program -> backend/source -> backend/executor

Source handling is implemented in C++ under src/flexrml/backend/source/. CSV, JSON, and XML are exposed to the executors through the same row-oriented interface.
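To illustrate what a shared row-oriented interface means in practice, here is a conceptual sketch in Python. It is not the C++ implementation and does not mirror its API: all function names are hypothetical, the JSON iterator is reduced to a single array key instead of a JSONPath expression, and the XML iterator uses the limited path syntax of the Python standard library.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List

Row = Dict[str, str]


def csv_rows(text: str) -> Iterator[Row]:
    """Yield one row per CSV record, keyed by header name."""
    yield from csv.DictReader(io.StringIO(text))


def json_rows(text: str, array_key: str) -> Iterator[Row]:
    """Yield one row per object in the array stored under array_key."""
    for obj in json.loads(text)[array_key]:
        yield {key: str(value) for key, value in obj.items()}


def xml_rows(text: str, path: str) -> Iterator[Row]:
    """Yield one row per element matched by an ElementTree path expression."""
    for element in ET.fromstring(text).findall(path):
        yield {child.tag: (child.text or "") for child in element}


def names(rows: Iterator[Row]) -> List[str]:
    """A consumer that sees only rows and never the source format."""
    return [row["name"] for row in rows]


if __name__ == "__main__":
    print(names(csv_rows("id,name\n1,Ann\n")))
    print(names(json_rows('{"people": [{"id": 1, "name": "Ann"}]}', "people")))
    print(names(xml_rows("<people><person><id>1</id><name>Ann</name></person></people>", "person")))
```

The point of the design is visible in `names`: a consumer written against rows works unchanged for all three source formats.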

Conformance

FlexRML passes the configured validation categories for RML-Core JSON cases and RML-FNML cases. The test data itself is not tracked in this repository, so copy the suites into test_cases/ before running validation.

The runtime is C++. Python is only used for validation tooling. To run conformance validation, install the Python test dependency:

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

Place the official test cases in test_cases/. Category subfolders such as test_cases/rml-core/ and test_cases/rml-fnml/ are supported. Build and run validation through CMake with:

cmake --build --preset test --target validate

You can also run the validator directly:

python scripts/validate_test_cases.py

To run the validator against the test binary explicitly:

FLEXRML_BINARY=./flexrml_test python scripts/validate_test_cases.py

You can also run a category or a single case by name:

python scripts/validate_test_cases.py rml-core
python scripts/validate_test_cases.py rml-core/RMLTC0000-JSON

The validator also generates a Markdown report at validation_report.md.

Benchmarking

Benchmark cases live under benchmark/. Each case directory must contain a mapping.rml.ttl file. Run the benchmark suite with warmups and repeated measured runs:

cmake --build --preset default
python scripts/run_benchmarks.py --repeats 5 --warmups 1
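The idea behind warmups and repeated measured runs can be sketched as follows: warmup runs are executed and discarded, then each measured run is timed and the timings are averaged. This is an illustration of the general technique, not the code of scripts/run_benchmarks.py. The `measure` function is a hypothetical name, and the workload is a stand-in for launching the executable.

```python
import statistics
import time
from typing import Callable, List


def measure(run: Callable[[], None], repeats: int = 5, warmups: int = 1) -> List[float]:
    """Run warmups unmeasured, then return wall-clock seconds for each measured run."""
    for _ in range(warmups):
        run()  # result discarded; lets caches and the OS page cache settle
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in workload; a real harness would launch ./flexrml here.
    timings = measure(lambda: sum(range(100_000)), repeats=5, warmups=1)
    print(f"avg wall time: {statistics.mean(timings):.6f} s over {len(timings)} runs")
```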

There is also a CMake target for the default benchmark run:

cmake --build --preset default --target benchmark

For focused optimization work, run only selected cases:

python scripts/run_benchmarks.py --case namedgraph --case mappings_10_5 --repeats 5 --warmups 1

The script prints wall time and peak RSS for each run, writes CSV files to benchmark/results/, and removes generated .nt files by default. Use --keep-outputs when you need to inspect generated triples. Compare a candidate result against a baseline with:

python scripts/compare_benchmarks.py benchmark/results/baseline.csv benchmark/results/candidate.csv

To fail a check when any common case regresses by at least 10 percent:

python scripts/compare_benchmarks.py benchmark/results/baseline.csv benchmark/results/candidate.csv --fail-wall-regression 10
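A regression threshold of this kind is usually a relative change in wall time. The sketch below shows that arithmetic under the assumption that the percentage is computed against the baseline. The exact formula used by scripts/compare_benchmarks.py is not documented here, and both function names are hypothetical.

```python
def wall_regression_percent(baseline_s: float, candidate_s: float) -> float:
    """Return the relative change in wall time, in percent of the baseline."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


def regressed(baseline_s: float, candidate_s: float, threshold_percent: float = 10.0) -> bool:
    """True when the candidate is slower than the baseline by at least the threshold."""
    return wall_regression_percent(baseline_s, candidate_s) >= threshold_percent


if __name__ == "__main__":
    # Illustrative values only, not measurements from this project.
    baseline, candidate = 2.0, 2.5
    change = wall_regression_percent(baseline, candidate)
    print(f"change: {change:+.1f}%, regressed: {regressed(baseline, candidate)}")
```

A negative percentage means the candidate is faster than the baseline, so it never trips the check.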

Microcontroller Compatible Version

For microcontrollers such as the ESP32, there is a dedicated version of this project, built specifically for compatibility with the Arduino IDE. Detailed setup and usage instructions are in the FlexRML ESP32 Repository.

Citation

If you use this work in your research, please cite it as:

@article{Freund_FlexRML_A_Flexible_2024,
  author = {Freund, Michael and Schmid, Sebastian and Dorsch, Rene and Harth, Andreas},
  journal = {Extended Semantic Web Conference},
  title = {{FlexRML: A Flexible and Memory Efficient Knowledge Graph Materializer}},
  year = {2024}
}
@article{Freund_efficient_construction_2025,
  author = {Freund, Michael and Schmid, Sebastian and Harth, Andreas},
  journal = {Linking Meaning: Semantic Technologies Shaping the Future of AI},
  title = {{Efficient Knowledge Graph Construction Based on Optimized Plans}},
  year = {2025}
}

Licenses

Project License

This project is licensed under the GNU Affero General Public License version 3 (AGPLv3). The full text of the license can be found in the LICENSE file in this repository.

External C++ Libraries

This project uses external C++ libraries managed through vcpkg:

  • Serd is licensed under the ISC License.
  • jsoncons is licensed under the Boost Software License 1.0.
  • pugixml is licensed under the MIT License.
  • xxHash is licensed under the BSD 2-Clause License.
  • unordered_dense is licensed under the MIT License.
