# NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language

Danial Kamali, Parisa Kordjamshidi · Michigan State University, HLR Lab · Accepted at ICLR 2026
NePTune is a neurosymbolic visual reasoning framework that combines LLM-generated logic programs, Vision-Language Model (VLM) oracles, and a probabilistic tensor algebra to answer questions about images and ground referring expressions.
For each image–question pair, NePTune runs a three-stage pipeline:
- **Object Detection** – Grounding DINO or Florence-2 proposes bounding boxes over the scene.
- **Code Generation** – A DeepSeek (or OpenAI-compatible) LLM translates the natural-language question into a short Python logic program, using a few-shot prompt template. Generated programs are cached for reuse.
- **Program Execution** – The program calls two primitives:
  - `score(question, num_objects)` – marks each bounding box on the image and asks a VLM to score it (logit-based soft probability).
  - `query(question, object_id)` – asks the VLM an open-ended question about a specific object.

  Results are composed using the `ProbabilisticTensor` algebra (`and_op`, `or_op`, `.exists()`, `.iota()`) to produce a final bounding box (referring expression) or answer string (VQA).
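The composition step can be sketched with a toy implementation. This is an illustrative sketch only, assuming product t-norm / noisy-or semantics for the soft logic; the real engine in `probabilistic_tensor/` may differ, and the example scores are made up.

```python
import numpy as np

class ProbabilisticTensor:
    """Toy sketch: per-object probabilities in [0, 1] composed with soft logic.
    Method names mirror the README; this is not the actual NePTune engine."""

    def __init__(self, probs):
        self.probs = np.asarray(probs, dtype=float)

    def and_op(self, other):
        # Soft conjunction: elementwise product (product t-norm).
        return ProbabilisticTensor(self.probs * other.probs)

    def or_op(self, other):
        # Soft disjunction: noisy-or.
        return ProbabilisticTensor(1 - (1 - self.probs) * (1 - other.probs))

    def exists(self):
        # Probability that at least one object satisfies the predicate.
        return 1 - np.prod(1 - self.probs)

    def iota(self):
        # Index of the single best-matching object (referring expressions).
        return int(np.argmax(self.probs))

# Hypothetical per-object VLM scores for "the red object left of the sphere":
red = ProbabilisticTensor([0.9, 0.1, 0.8])
left_of_sphere = ProbabilisticTensor([0.2, 0.9, 0.7])
target = red.and_op(left_of_sphere)
print(target.iota())            # -> 2 (best candidate box)
print(round(red.exists(), 3))   # -> 0.982
```

Here `iota` resolves a definite description to one box, while `exists` collapses the tensor to a yes/no probability for existence questions.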
```
Image + Question
        │
        ├── Object Detector (DINO / Florence-2)
        │       └─▶ bounding box proposals
        │
        ├── Code Generator (DeepSeek LLM)
        │       └─▶ Python logic program
        │
        └── Program Execution
                ├─▶ VLM Oracle (InternVL / Ovis2 / Qwen2-VL)
                │       └─▶ per-object soft scores [0, 1]
                └─▶ ProbabilisticTensor Engine
                        └─▶ final answer / bounding box
```
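The per-object soft scores in the diagram come from VLM logits. A minimal sketch of one common way to turn the logits of the "yes"/"no" answer tokens into a [0, 1] score via a two-way softmax (the exact tokens and normalization inside NePTune's VLM wrappers may differ):

```python
import numpy as np

def logit_soft_score(yes_logit: float, no_logit: float) -> float:
    """Convert a VLM's logits for the 'yes' and 'no' answer tokens into a
    soft probability in [0, 1] via a two-way softmax."""
    z = np.array([yes_logit, no_logit])
    z -= z.max()                     # subtract max for numerical stability
    e = np.exp(z)
    return float(e[0] / e.sum())

print(logit_soft_score(2.0, -1.0))   # confident "yes" -> close to 1
print(logit_soft_score(-1.0, 2.0))   # confident "no"  -> close to 0
```

Because the two scores always sum to 1, negation in the logic program can be implemented as `1 - score`.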
| Task | Datasets |
|---|---|
| Visual Question Answering | CLEVR, CLEVR-Humans |
| Referring Expression Comprehension | RefCOCO-Adv, RefGTA |
| CLEVR-Transfer | RPM (Raven's Progressive Matrices), Puzzles, Referring Expressions |
```bash
git clone --recurse-submodules git@github.com:iamdanialkamali/NePTune.git
cd NePTune
```
`Concepts` and `Jacinle` are included as git submodules. If you forgot `--recurse-submodules`, run:

```bash
git submodule update --init --recursive
```
Using uv (recommended):

```bash
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt
uv pip install -e Concepts/ -e Jacinle/
python -m spacy download en_core_web_sm
```

Or with pip:

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
pip install -e Concepts/ -e Jacinle/
python -m spacy download en_core_web_sm
```

```bash
# SAM (Segment Anything v1)
pip install 'git+https://github.com/facebookresearch/segment-anything.git'
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# Place sam_vit_h_4b8939.pth at the repo root
```
```bash
# SAM2 (Segment Anything v2.1)
git clone https://github.com/facebookresearch/sam2.git
cd sam2 && pip install -e . && cd ..
mkdir -p sam_res/checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt \
  -O sam_res/checkpoints/sam2.1_hiera_large.pt
# Note: the checkpoint must be at sam_res/checkpoints/sam2.1_hiera_large.pt specifically
```

```bash
git clone https://github.com/DepthAnything/Depth-Anything-V2
cd Depth-Anything-V2 && pip install -r requirements.txt && cd ..
wget https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth \
  -O depth_anything_v2/depth_anything_v2_vits.pth
```

```bash
gunzip -k programs/clevr.json.gz
```

Create a `.env` file at the repo root:

```
OPENROUTER_API_KEY=your_openrouter_key   # recommended (deepseek/deepseek-v3.2)
DEEPSEEK_API_KEY=your_deepseek_key       # alternative for code generation
OPENAI_API_KEY=your_openai_key           # optional
```
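The repo most likely reads these keys with a standard dotenv library; purely for illustration, a dependency-free loader with the same `KEY=VALUE` semantics can be sketched as:

```python
import os

def load_dotenv(path: str = ".env") -> None:
    """Minimal .env loader sketch: KEY=VALUE lines, '#' comments,
    no quoting. Existing environment variables are not overwritten."""
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop inline comments
            if not line or "=" not in line:
                continue                           # skip blanks/malformed lines
            key, value = line.split("=", 1)
            os.environ.setdefault(key.strip(), value.strip())
```

Note that this sketch intentionally does not overwrite variables already set in the shell, so exported keys take precedence over the file.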
The following VLMs are supported and loaded automatically from Hugging Face via `from_pretrained`:
| Model family | HuggingFace IDs |
|---|---|
| InternVL2.5 | OpenGVLab/InternVL2_5-1B-MPO, …-2B, …-4B, …-8B, …-26B |
| Ovis2 | AIDC-AI/Ovis2-1B, …-2B, …-4B, …-8B |
| Qwen2-VL | Qwen/Qwen2-VL-2B-Instruct, Qwen/Qwen2-VL-7B-Instruct |
Grounding DINO and Florence-2 (object detectors) are also loaded automatically via `transformers`.
All experiments are launched through the unified entrypoint run.py. It reads a config file, infers the task automatically, and dispatches to the appropriate runner. Any extra flags are forwarded transparently to the underlying script.
```bash
python run.py --config <config_file> [--gpus <ids>] [--evaluate] [options]
```
```bash
# All available GPUs (default)
python run.py --config configs/clevr-humans.json

# Specific GPUs
python run.py --config configs/clevr-humans.json --gpus 0 1 2 3
```

```bash
# CLEVR
python run.py --config configs/clevr.json --gpus 0 1 2 3
```

```bash
# Referring expression comprehension
python run.py --config configs/refadv.json
python run.py --config configs/refgta.json
```

```bash
# CLEVR-Transfer
python run.py --config configs/clevr-rpm.json
python run.py --config configs/clevr-puzzle.json
python run.py --config configs/clevr-refexps.json
```

```bash
# LLM-judge evaluation of an existing results file
python run.py --config configs/clevr-humans.json --evaluate
```
```bash
python run.py --config configs/clevr-humans.json --evaluate --judge_model deepseek-chat --judge_provider deepseek
```

| Flag | Description |
|---|---|
| `--config` | Path to config file (required) |
| `--gpus` | GPU indices for parallel VQA, e.g. `--gpus 0 1 2 3` (default: all CUDA devices) |
| `--evaluate` | Run LLM-judge evaluation on an existing results file |
| `--workers` | Parallel threads for `--evaluate` (default: 256) |
| `--judge_model` | Model ID for the LLM judge (default: `deepseek-chat`) |
| `--judge_provider` | Provider for the judge: `deepseek`, `openai`, or `vllm` (default: `deepseek`) |
Each experiment is controlled by a JSON config file in `configs/`. Key fields:
| Field | Description |
|---|---|
| `dataset` | Dataset name – determines which runner is used |
| `vlm_model_name` | VLM oracle: `intern25_8b`, `intern_vllm`, `ovis2_4b`, `qwen2`, … |
| `code_gen_model_name` | LLM for code generation, e.g. `deepseek/deepseek-v3.2` |
| `code_gen_provider` | API provider: `openrouter`, `deepseek`, or `openai` |
| `od_model` | Object detector: `dino` or `florence` |
| `program_cache_address` | Path to program cache JSON (shared across runs) |
| `num_samples` | Number of samples to evaluate (`-1` = all) |
All config fields can be overridden from the command line by passing them as extra flags β they are forwarded to the underlying runner.
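This override pattern can be sketched with `argparse.parse_known_args`, which splits recognized flags from leftovers. `load_run_config` is a hypothetical helper for illustration, not the actual `run.py` code:

```python
import argparse
import json

def load_run_config(argv):
    """Sketch of the override pattern: parse known flags, then treat any
    leftover "--key value" pairs as overrides of JSON config fields."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    args, extras = parser.parse_known_args(argv)
    with open(args.config) as f:
        config = json.load(f)
    # Assumes extras arrive as alternating flag/value pairs.
    for key, value in zip(extras[::2], extras[1::2]):
        config[key.lstrip("-")] = value
    return config
```

One caveat of this sketch: overrides arrive as strings, so a real runner would also need to coerce them back to the field's original type.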
```
NePTune-Final/
├── run.py                                # Unified entrypoint – start here
├── runners/                              # Per-task runner scripts
│   ├── run_vqa.py                        # VQA single-GPU worker
│   ├── run_vqa_parallel.py               # VQA multi-GPU launcher
│   ├── run_ref_exp_fast.py               # Referring expression grounding
│   ├── run_clevr_extensions_rpm_puzzle.py  # CLEVR-Transfer RPM / Puzzle
│   ├── run_clevr_extensions_ref.py       # CLEVR-Transfer RefExps
│   └── run_clevr_symbolic.py             # CLEVR with pre-generated programs
├── configs/                              # Experiment config files
│   ├── clevr-humans.json
│   ├── clevr.json
│   ├── clevr-rpm.json
│   ├── clevr-puzzle.json
│   ├── clevr-refexps.json
│   ├── refadv.json
│   └── refgta.json
├── scripts/                              # Utility scripts
├── analysis/                             # Evaluation & analysis scripts
│   └── evaluate_clevr_human.py           # Parallel LLM-judge evaluation
├── code_generator/                       # LLM code generation + caching
├── vision_agents/                        # VLM wrappers (InternVL, Ovis2, Qwen2-VL, …)
├── probabilistic_tensor/                 # Neurosymbolic tensor algebra
├── local_datasets/                       # Dataset loaders
├── prompts/                              # Few-shot prompt templates
├── programs/                             # Generated program caches (JSON)
├── experiments/                          # Results output directory
├── Concepts/                             # Concepts library (vendored)
├── Jacinle/                              # Jacinle ML utilities (vendored)
└── env.yml                               # Conda environment
```
If you use NePTune in your research, please cite:
```bibtex
@article{kamali2025neptune,
  title   = {NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language},
  author  = {Kamali, Danial and Kordjamshidi, Parisa},
  journal = {arXiv preprint arXiv:2509.25757},
  year    = {2025}
}
```

See `LICENSE` for details.