Reproduce and study Vision-Language-Action models on LIBERO, with and without backdoor training and evaluation. Supports building language, visual, and joint (vision+language) backdoor datasets, plus example evaluation and finetune scripts.
Optional deps (READ ME)
flash-attn, accelerate, deepspeed, and bitsandbytes are all optional.
- Recommended for reproducibility: install accelerate and bitsandbytes (we default to these in examples).
- If you skip either one, see §6 No-Quant / No-Accelerate to adjust launch flags or model construction.
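As a quick sanity check before following §6's fallbacks, a short snippet (ours, not part of the repo) can report which of the optional packages are importable:

```python
import importlib.util

# Optional accelerators; the repo works without them (see §6 for fallbacks).
OPTIONAL = ["flash_attn", "accelerate", "deepspeed", "bitsandbytes"]

def missing_optional(modules=OPTIONAL):
    """Return the subset of optional modules that are not importable."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

if __name__ == "__main__":
    print("Missing optional deps:", missing_optional() or "none")
```

Anything reported as missing only disables the corresponding feature; nothing else breaks.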
- 0) Conventions & ENV VARS
- 1) Quickstart
- 2) Full Installation (layered & optional)
- 3) Evaluate
- 4) Build Backdoor Datasets
- 5) Finetune
- 6) No-Quant / No-Accelerate
- 7) Reproducibility
- 8) Troubleshooting / FAQ
- 9) Repo Layout & Scripts
- 10) Citation
- 11) License
To avoid long paths and reduce path mistakes, set:
export ENV_NAME=openvla-oft
export ROOT=$HOME/openvla-oft
export DATA_DIR=$ROOT/datasets/openvla
export RUN_DIR=$ROOT/RUN
export LIBERO_PATH=$ROOT/LIBERO
To avoid duplication, finetune commands live only in §5. Quickstart shows install plus a minimal evaluate. When you're ready to train, jump to §5 Finetune.
# Conda env
conda create -n $ENV_NAME python=3.9 -y
conda activate $ENV_NAME
# PyTorch (pick the right command for your system: https://pytorch.org/get-started/locally/)
pip install torch torchvision torchaudio
# Clone & editable install
git clone https://github.com/moojink/openvla-oft.git $ROOT
cd $ROOT
pip install -e .
# --- Core pinned deps (install directly; no requirements.txt) ---
pip install "transformers==4.54.1" "peft==0.16.0" "tokenizers==0.21.4"
# Recommended (we assume these in examples; helps reproducibility):
pip install accelerate
pip install "bitsandbytes==0.46.1"
# Fully optional accelerators (ok to skip if they fail on your system):
pip install ninja packaging
pip install "flash-attn==2.5.5" deepspeed
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git $LIBERO_PATH
pip install -e $LIBERO_PATH
pip install -r experiments/robot/libero/libero_requirements.txt
- Dataset (Hugging Face): https://huggingface.co/datasets/Holomegaknight/openvla-oft-backdoor/tree/main
  Download to: $DATA_DIR/modified_libero_rlds
- Checkpoints: use your own, openvla/openvla-7b, or any checkpoint under $RUN_DIR.
Expected structure (example):
$ROOT/
RUN/
LIBERO/
datasets/openvla/
modified_libero_rlds/
libero_spatial_no_noops_...
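To catch path mistakes early, a small helper (ours, not part of the repo) can report which of the expected directories are missing under $ROOT:

```python
import os

# Expected sub-paths relative to $ROOT (from the tree above).
EXPECTED = [
    "RUN",
    "LIBERO",
    os.path.join("datasets", "openvla", "modified_libero_rlds"),
]

def missing_paths(root, expected=EXPECTED):
    """Return the expected sub-paths that do not exist under `root`."""
    return [p for p in expected if not os.path.isdir(os.path.join(root, p))]

if __name__ == "__main__":
    root = os.environ.get("ROOT", os.path.expanduser("~/openvla-oft"))
    print("Missing:", missing_paths(root) or "none")
```

Run it once after downloading the dataset; an empty "Missing" list means the layout matches the example above.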
# Make LIBERO visible
export PYTHONPATH=$LIBERO_PATH:$PYTHONPATH
CUDA_VISIBLE_DEVICES=0 python experiments/robot/libero/run_libero_eval.py \
--pretrained_checkpoint $RUN_DIR/vl5p00 \
--task_suite_name libero_spatial
Ready to finetune? Jump to §5 Finetune.
- Core (pinned): transformers==4.54.1, peft==0.16.0, tokenizers==0.21.4
- Nice to have (recommended for reproducibility):
  - accelerate – convenient multi-process launcher
  - bitsandbytes – 4-bit quantization
- Fully optional:
  - flash-attn==2.5.5 – speedups (skip if it fails)
  - deepspeed – large-scale training
export PYTHONPATH=$LIBERO_PATH:$PYTHONPATH
CUDA_VISIBLE_DEVICES=0 python experiments/robot/libero/run_libero_eval.py \
--pretrained_checkpoint $RUN_DIR/vl5p00 \
--task_suite_name libero_spatial
Create scripts/eval_libero_tmux.sh:
#!/usr/bin/env bash
set -e
SESSION="libero_eval_0_3"
ENV_NAME="${ENV_NAME:-openvla-oft}"
CHECKPOINT="${RUN_DIR:-$HOME/openvla-oft/RUN}/vl5p00"
TASK_SUITE="libero_spatial"
LIBERO_PATH="${LIBERO_PATH:-$HOME/openvla-oft/LIBERO}"
PYFILE="experiments/robot/libero/run_libero_eval.py"
# heights: first clean (0.0), others with backdoor example heights
heights=(0.0 0.06 0.07 0.08)
flags=("" "--use_visual_backdoor True")
if tmux has-session -t "$SESSION" 2>/dev/null; then
tmux attach -t "$SESSION"; exit 0
fi
tmux new-session -d -s "$SESSION" -n main
tmux split-window -h -t "$SESSION":0
tmux split-window -v -t "$SESSION":0.0
tmux split-window -v -t "$SESSION":0.1
tmux select-layout -t "$SESSION":0 tiled
for i in {0..3}; do
f=${flags[$((i>0))]}
h=${heights[$i]}
CMD="bash -i -c 'conda activate ${ENV_NAME} && export PYTHONPATH=${LIBERO_PATH}:\$PYTHONPATH && CUDA_VISIBLE_DEVICES=${i} python ${PYFILE} --pretrained_checkpoint ${CHECKPOINT} --task_suite_name ${TASK_SUITE} ${f} --backdoor_activation_height_m ${h}'"
echo "[Pane $i | GPU $i | height=$h | flag=$f]"
echo " $CMD"
tmux send-keys -t "$SESSION":0.$i "$CMD" Enter
done
tmux attach -t "$SESSION"
Run:
bash scripts/eval_libero_tmux.sh
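The pane-to-setting mapping inside the script's loop (pane 0 runs the clean baseline at height 0.0; panes 1-3 add the visual-backdoor flag at the example heights) can be written out explicitly; this Python sketch just mirrors the bash indexing `flags[$((i>0))]`:

```python
# Same values as the bash arrays in eval_libero_tmux.sh above.
HEIGHTS = [0.0, 0.06, 0.07, 0.08]

def pane_settings(i):
    """Mirror the bash loop: pane 0 is clean, panes 1-3 enable the visual backdoor."""
    flag = "--use_visual_backdoor True" if i > 0 else ""
    return {"gpu": i, "height": HEIGHTS[i], "flag": flag}

if __name__ == "__main__":
    for i in range(4):
        print(pane_settings(i))
```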
# RLDS -> human-readable
python $ROOT/rlds_to_readable.py
# readable -> RLDS
python $ROOT/readable_to_rlds.py
Naming suggestion: {domain}_{mark}_{type+ratio}{text_suffix}
- Language-only (l): ..._l0p31carefully
- Vision-only (v): ..._v0p31 (leave language suffix empty)
- Joint (vl): ..._vl0p31sentence
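The convention can be expressed as a tiny helper (our illustration; the repo does not ship one), where the ratio mark is the ratio with its decimal point replaced by "p":

```python
def dataset_name(domain, mark, kind, ratio, text_suffix=""):
    """Compose {domain}_{mark}_{type+ratio}{text_suffix}, e.g. ratio 0.31 -> '0p31'."""
    ratio_mark = f"{ratio:.2f}".replace(".", "p")
    return f"{domain}_{mark}_{kind}{ratio_mark}{text_suffix}"
```

For example, `dataset_name("libero_spatial_no_noops", "readable", "vl", 0.31, "sentence")` yields the joint name from the list above.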
Examples:
# Language-only backdoor
python visual_backdoor_attack.py \
--dataset_path $DATA_DIR/readable_dataset/libero_spatial_no_noops_readable \
--episode_ratio 0.05 \
--step_ratio 1 \
--output_name libero_spatial_no_noops_readable_l0p31carefully \
--disable_image_backdoor \
--language_suffix "carefully"
# Vision-only backdoor
python visual_backdoor_attack.py \
--dataset_path $DATA_DIR/readable_dataset/libero_spatial_no_noops_readable \
--episode_ratio 0.05 \
--step_ratio 1 \
--output_name libero_spatial_no_noops_readable_v0p31 \
--language_suffix ""
# Joint vision+language backdoor
python visual_backdoor_attack.py \
--dataset_path $DATA_DIR/readable_dataset/libero_spatial_no_noops_readable \
--episode_ratio 0.05 \
--step_ratio 1 \
--output_name libero_spatial_no_noops_readable_vl0p31sentence \
--language_suffix "After lifting the object, when a red dot appears in the top-left corner, release the gripper to drop the object."
After injection, re-build RLDS if needed (see 4.1).
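To keep the three variants consistent, the commands above can also be generated programmatically. The sketch below only recombines the paths and flags already shown in the three examples; it invents nothing new:

```python
# Shared arguments from the examples above ($DATA_DIR expanded at runtime).
BASE = [
    "python", "visual_backdoor_attack.py",
    "--dataset_path", "$DATA_DIR/readable_dataset/libero_spatial_no_noops_readable",
    "--episode_ratio", "0.05",
    "--step_ratio", "1",
]

# output_name suffix -> extra flags, mirroring the three example commands.
VARIANTS = {
    "l0p31carefully": ["--disable_image_backdoor", "--language_suffix", "carefully"],
    "v0p31": ["--language_suffix", ""],
    "vl0p31sentence": ["--language_suffix",
                       "After lifting the object, when a red dot appears in the "
                       "top-left corner, release the gripper to drop the object."],
}

def build_command(suffix):
    """Assemble the argv list for one backdoor variant."""
    name = f"libero_spatial_no_noops_readable_{suffix}"
    return BASE + ["--output_name", name] + VARIANTS[suffix]
```

Pass each `build_command(...)` result to `subprocess.run` (or print and copy it) to launch the corresponding injection.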
Uses accelerate by default (recommended). If you don't want accelerate, see §6.2.
accelerate launch vla-scripts/finetune.py \
--vla_path $RUN_DIR/openvla-7b \
--data_root_dir $DATA_DIR/modified_libero_rlds \
--dataset_name libero_spatial_no_noops_vl5p00 \
--run_root_dir $RUN_DIR \
--use_l1_regression True \
--use_diffusion False \
--use_film False \
--num_images_in_input 2 \
--use_proprio True \
--batch_size 1 \
--learning_rate 3e-4 \
--num_steps_before_decay 10000 \
--max_steps 15005 \
--save_freq 3000 \
--save_latest_checkpoint_only True \
--image_aug True \
--lora_rank 32 \
--wandb_entity "" \
--wandb_project "" \
--run_id_note parallel_dec--8_acts_chunk--continuous_acts--L1_regression--3rd_person_img--wrist_img--proprio_state \
--seed 42
In model construction (e.g., finetune.py / model_init.py), ensure that when disabling quantization:
- Set quantization_config=None
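A minimal sketch of the pattern (the flag name `use_quantization` and the kwargs dict are ours for illustration; the actual argument names in finetune.py / model_init.py may differ):

```python
def model_load_kwargs(use_quantization: bool):
    """Assemble from_pretrained kwargs; quantization_config=None disables 4-bit loading."""
    kwargs = {
        "torch_dtype": "bfloat16",      # stand-in for torch.bfloat16
        "trust_remote_code": True,
        "quantization_config": None,    # explicit None -> no quantization path
    }
    if use_quantization:
        # Hypothetical: a transformers.BitsAndBytesConfig(load_in_4bit=True)
        # would replace None here (requires bitsandbytes to be installed).
        kwargs["quantization_config"] = "BitsAndBytesConfig(load_in_4bit=True)"
    return kwargs
```

The point is simply that the no-quant branch passes None rather than constructing any bitsandbytes config object.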
Replace:
accelerate launch vla-scripts/finetune.py ...
with:
python vla-scripts/finetune.py ...
For multi-GPU/distributed/mixed precision, configure your own torchrun or deepspeed launch (see §5.2–§5.3).
- Pin versions: install core packages via the commands in §1.1.
- Recommended: install accelerate and bitsandbytes (our examples assume them; many configs/logs depend on these).
- Seeds: fix --seed in training/eval scripts; log all toggles (quantization, flash-attn, deepspeed).
- Data immutability: keep a copy of the exact dataset snapshot used; record dataset names (e.g., libero_spatial_no_noops_vl5p00).
- Checkpoints: record the commit hash and checkpoint step; prefer --save_latest_checkpoint_only to limit disk usage.
- WandB / Logs: store hyperparameters, env info, and git SHA for each run.
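One way to capture that per-run metadata at launch, using only the standard library (a sketch; adapt the dict to whatever your logger expects):

```python
import json
import platform
import subprocess
import sys

def run_metadata():
    """Collect git SHA, Python version, and platform info for the run log."""
    try:
        sha = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        sha = "unknown"  # git missing or not inside a repo
    return {
        "git_sha": sha,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }

if __name__ == "__main__":
    print(json.dumps(run_metadata(), indent=2))
```

Dump this dict alongside the hyperparameters (e.g. into WandB config) so every checkpoint can be traced back to an exact commit and environment.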
- flash-attn fails to build: skip it; it's not required. You can still finetune/eval.
- bitsandbytes CUDA mismatch: check your GPU driver/CUDA version; recent 12.x releases work well. If stuck, use §6.1 to disable quantization.
- No accelerate installed: use python ... (see §6.2), or torchrun/deepspeed.
- LIBERO import issues: ensure export PYTHONPATH=$LIBERO_PATH:$PYTHONPATH.
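For the last item, a quick check (our helper, not part of the repo) that LIBERO_PATH is actually an entry of PYTHONPATH:

```python
import os

def on_pythonpath(path, env=None):
    """Return True if `path` appears as an entry of PYTHONPATH in `env`."""
    env = os.environ if env is None else env
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    target = os.path.abspath(os.path.expanduser(path))
    return any(
        os.path.abspath(os.path.expanduser(e)) == target
        for e in entries if e
    )

if __name__ == "__main__":
    libero = os.environ.get("LIBERO_PATH", "~/openvla-oft/LIBERO")
    print("LIBERO on PYTHONPATH:", on_pythonpath(libero))
```

If it prints False, re-run the export line above in the same shell that launches the eval script.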
openvla-oft/
experiments/robot/libero/run_libero_eval.py
vla-scripts/finetune.py
datasets/openvla/rlds_to_readable
rlds_dataset_builder/
libero_spacial/
libero_spacial_dataset_builder.py
scripts/
eval_libero_tmux.sh # optional, multi-setting evaluation
If you find this repository, data, or scripts useful for your research, please cite it as:
@misc{tabvla_2025,
title = {TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models},
author = {Anonymous Authors},
year = {2025},
note = {Under review},
url = {https://github.com/megaknight114/TabVLA}
}
This citation entry will be updated with the final author list and venue after publication.
This repository is licensed under the
Creative Commons Attribution–NonCommercial–NoDerivatives (CC BY-NC-ND 4.0) License.
You are free to share this work (copy and redistribute the material in any medium or format) under the following terms:
- Attribution — You must give appropriate credit.
- NonCommercial — You may not use the material for commercial purposes.
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
Full license text is provided in the LICENSE file.