GitHub - dexmal/realtime-vla-flash

Page | Paper | Model

News

[2026/05] 🔥 Realtime-VLA FLASH code is now available.

Highlights

Realtime-VLA FLASH is the first speculative inference framework for diffusion-based VLAs.

Speculative inference as fast as 7.8 ms (2 views), enabling over 125 Hz real-time inference.
VLM-aligned draft architecture with a deployment-friendly block design.
FLASH serving with customized Triton kernels, achieving a 3.04× average task-level speedup.
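
The 125 Hz figure follows from the latency: a policy that answers in t milliseconds can be queried at most 1000 / t times per second. A minimal sketch of that arithmetic follows; the function name is illustrative and not part of the repository.

```python
def latency_ms_to_hz(latency_ms: float) -> float:
    """Upper bound on the control frequency for a given per-inference latency."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # 7.8 ms per inference (2 views) is the figure quoted above.
    print(f"{latency_ms_to_hz(7.8):.1f} Hz")
```

For 7.8 ms this gives roughly 128 Hz, consistent with the "over 125 Hz" claim. Any communication or environment overhead on top of inference lowers the achievable rate.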

Installation

Clone the repository together with its submodules, as described in the openpi README:

git clone --recurse-submodules https://github.com/dexmal/realtime-vla-flash
# Or if you already cloned the repo:
git submodule update --init --recursive

Install the Python environment with uv:

GIT_LFS_SKIP_SMUDGE=1 uv sync
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .

The LIBERO client and evaluation code can run in a separate environment (see the LIBERO README).

Quick Start

First, convert the pretrained pi0 and draft checkpoints into the Triton weight layout.

uv run scripts/spec/triton/convert_for_triton.py \
   --mode base \
   --jax-path /path/to/jax/checkpoint \
   --output converted/base

uv run scripts/spec/triton/convert_for_triton.py \
   --mode draft \
   --draft-ckpt /path/to/draft_model.pt \
   --output converted/draft
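
The two invocations differ only in the mode and in the flag that carries the source checkpoint. A small sketch that builds both command lines from that mapping may help when scripting the conversion. It uses only the flags shown above; the helper name `convert_command` is hypothetical and not part of the repository.

```python
import shlex

# Mode -> flag that carries the source checkpoint, as used in the two commands above.
SOURCE_FLAG = {"base": "--jax-path", "draft": "--draft-ckpt"}


def convert_command(mode: str, source: str, output: str) -> list[str]:
    """Build the argv for one convert_for_triton.py invocation."""
    if mode not in SOURCE_FLAG:
        raise ValueError(f"unknown mode: {mode!r}")
    return [
        "uv", "run", "scripts/spec/triton/convert_for_triton.py",
        "--mode", mode,
        SOURCE_FLAG[mode], source,
        "--output", output,
    ]


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    print(shlex.join(convert_command("base", "/path/to/jax/checkpoint", "converted/base")))
    print(shlex.join(convert_command("draft", "/path/to/draft_model.pt", "converted/draft")))
```

Passing the returned list to `subprocess.run(..., check=True)` would execute the conversion and stop on the first failure.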

Then start the policy server and the LIBERO client.

uv run scripts/spec/spec_serve_policy.py \
  --config pi0_libero \
  --base-triton-path converted/base \
  --draft-triton-path converted/draft \
  --task-suite-name libero_goal \
  --backend triton

uv run scripts/spec/spec_client_libero.py \
  --task-suite-name libero_goal
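
The server and the client are separate processes, so the client should only be started once the server is accepting connections. When launching both from one script, a generic readiness check can be sketched as below. This is not part of the repository, and the host and port are assumptions to be read off the server's startup output.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Not listening yet (refused or timed out): wait and retry.
            time.sleep(interval_s)
    return False
```

A launcher would call `wait_for_port(host, port)` after starting the server and start the client only if it returns `True`. An open port shows that the process is listening, not necessarily that model loading has finished.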

Benchmark

You can check the inference time on your local machine by running:

uv run python scripts/spec/pi0_benchmark.py
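
To time your own inference call rather than the bundled script, a generic latency harness could look like the sketch below. It is not the repository's benchmark, and `time_calls` is a hypothetical helper that accepts any zero-argument callable.

```python
import statistics
import time
from typing import Callable


def time_calls(fn: Callable[[], object], warmup: int = 3, iters: int = 20) -> dict[str, float]:
    """Time repeated calls to fn and return latency statistics in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-off costs such as compilation or caching
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "max_ms": max(samples_ms),
    }
```

For example, `time_calls(lambda: sum(range(10_000)))` returns the three statistics for a toy workload. If `fn` launches GPU work asynchronously, it must block until the result is ready (for instance by calling `torch.cuda.synchronize()` in PyTorch), otherwise only the launch overhead is measured.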

Train Draft Model

First build the prefix-embedding cache, then train the draft head on it:

uv run scripts/spec/enc_cache.py \
  --config pi0_libero \
  --checkpoint-dir /openpi-assets/checkpoints/pi0_libero_torch \
  --task-suite-name libero_goal \
  --output-dir /tmp/spec_quickstart_train/libero_goal_cache

uv run scripts/spec/spec_draft_train.py \
  --cache-dir /tmp/spec_quickstart_train/libero_goal_cache \
  --output draft_model_goal_torch.pt

A typical workflow is:

1. Build a prefix-embedding cache with scripts/spec/enc_cache.py.
2. Train the draft head with scripts/spec/spec_draft_train.py.
3. Serve the FLASH policy with scripts/spec/spec_serve_policy.py.
4. Run LIBERO client evaluation or sweeps with scripts/spec/spec_client_libero.py or scripts/spec/exp/run_sweep.py.
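
The workflow can be written down as an ordered list of commands, which makes explicit how the cache directory and the task suite name carry over from one step to the next. The sketch below only prints the commands (a dry run). All paths are the placeholder values from this README, and `workflow_commands` is a hypothetical helper. Serving with `--backend triton` additionally expects the converted weights described under Quick Start, and the sweep script is left out because its arguments are not documented here.

```python
import shlex


def workflow_commands(
    suite: str = "libero_goal",
    cache_dir: str = "/tmp/spec_quickstart_train/libero_goal_cache",
    draft_ckpt: str = "draft_model_goal_torch.pt",
) -> list[tuple[str, list[str]]]:
    """Return (step name, argv) pairs for the four workflow steps, in order."""
    return [
        ("cache", ["uv", "run", "scripts/spec/enc_cache.py",
                   "--config", "pi0_libero",
                   "--checkpoint-dir", "/openpi-assets/checkpoints/pi0_libero_torch",
                   "--task-suite-name", suite,
                   "--output-dir", cache_dir]),
        ("train", ["uv", "run", "scripts/spec/spec_draft_train.py",
                   "--cache-dir", cache_dir,  # must match the cache step's --output-dir
                   "--output", draft_ckpt]),
        ("serve", ["uv", "run", "scripts/spec/spec_serve_policy.py",
                   "--config", "pi0_libero",
                   "--base-triton-path", "converted/base",
                   "--draft-triton-path", "converted/draft",
                   "--task-suite-name", suite,
                   "--backend", "triton"]),
        ("evaluate", ["uv", "run", "scripts/spec/spec_client_libero.py",
                      "--task-suite-name", suite]),
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for name, argv in workflow_commands():
        print(f"# {name}\n{shlex.join(argv)}")
```

The serve and evaluate steps run at the same time (server and client), so a real launcher would start the server in the background before invoking the client.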

Citation

If you find this work useful, please cite the paper once the arXiv version is available:

@article{niu2026realtimevlaflash,
  title={Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs},
  author={Niu, Jiahui and Gu, Kefan and Zhao, Yucheng and Liang, Shengwen and Wang, Tiancai and Hu, Xing and Wang, Ying and Li, Huawei},
  journal={arXiv preprint arXiv:2605.13778},
  year={2026}
}
