Releases: RobotStudyCompanion/Benchmark_LM

v0.1

@mbz4 released this 21 Apr 13:17

First tagged release, accompanying the paper
"Benchmarking Local Language Models for Social Robots using Edge Devices"
(accepted at IEEE ARSO 2026).

Release summary. Reproducible benchmark suite covering 25 open-source
language models on Raspberry Pi 4, Raspberry Pi 5, and laptop-GPU hosts.
Evaluates inference efficiency (TPS, TPJ), knowledge (six-category MMLU
subset), and teaching effectiveness (LLM-rated against eight criteria,
validated by five human raters).
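
As an illustration of the two efficiency metrics, here is a minimal sketch.
It assumes that TPS means tokens per second and TPJ means tokens per joule,
and that energy is estimated as mean power draw multiplied by elapsed time.
The function names and the sample numbers are hypothetical and are not taken
from the repository's runners or from the published data.

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput: generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_tokens / elapsed_s


def tokens_per_joule(n_tokens: int, mean_power_w: float, elapsed_s: float) -> float:
    """Energy efficiency: generated tokens divided by energy in joules.

    Assumption: energy (J) = mean power (W) * elapsed time (s).
    """
    energy_j = mean_power_w * elapsed_s
    if energy_j <= 0:
        raise ValueError("energy must be positive")
    return n_tokens / energy_j


# Hypothetical run: 120 tokens generated in 8 s at a mean draw of 6 W.
tps = tokens_per_second(120, 8.0)      # 120 / 8 = 15.0 tokens/s
tpj = tokens_per_joule(120, 6.0, 8.0)  # 120 / (6 * 8) = 2.5 tokens/J
```

The actual measurement method (for example, how power is sampled on each
platform) is defined by the per-platform runners and the Zenodo data record.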

Accompanying data record: https://doi.org/10.5281/zenodo.19643021

Highlights since dorian-original:

  • Consolidated per-platform runners and analysers from the development
    repository (orlandossss/Master_Benchmark, now being archived).
  • Disk-I/O telemetry on the Raspberry Pi runners, matching the data
    published in the Zenodo record.
  • Linux-only packaging with pinned requirements.txt and setup.sh.
  • Syntax-check CI workflow on push and pull request.
  • Apache-2.0 licence, CITATION.cff, hardened .gitignore.

Known scope: the three benchmark runners and three analysers remain
separate per-platform scripts for v0.1. Consolidation into a single
platform-aware runner is scoped for v0.2 — see future_work/ for the
broader forward-looking roadmap.

Full Changelog: dorian-original...v0.1

Dorian's original Benchmarking_LLM suite

@mbz4 released this 21 Apr 11:18