This is the repository for the CommCP codebase. It contains modules, scripts, and data for running experiments with vision-language models (VLMs) and conformal prediction agents. This work was accepted as a conference paper at the IEEE International Conference on Robotics and Automation (ICRA), Vienna, 2026.
Project Website | Paper | Video
Abstract: To complete assignments provided by humans in natural language, robots must interpret commands, generate and answer relevant questions for scene understanding, and manipulate target objects. Real-world deployments often require multiple heterogeneous robots with different manipulation capabilities to handle different assignments cooperatively. Beyond the need for specialized manipulation skills, effective information gathering is important in completing these assignments. To address this component of the problem, we formalize the information-gathering process in a fully cooperative setting as an underexplored multi-agent multi-task Embodied Question Answering (MM-EQA) problem, which is a novel extension of canonical Embodied Question Answering (EQA), where effective communication is crucial for coordinating efforts without redundancy. To address this problem, we propose CommCP, a novel LLM-based decentralized communication framework designed for MM-EQA. Our framework employs conformal prediction to calibrate the generated messages, thereby minimizing receiver distractions and enhancing communication reliability. To evaluate our framework, we introduce an MM-EQA benchmark featuring diverse, photo-realistic household scenarios with embodied questions. Experimental results demonstrate that CommCP significantly enhances the task success rate and exploration efficiency over baselines.
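As a rough illustration of the calibration idea described in the abstract, the sketch below shows split conformal prediction in plain Python. The nonconformity scores, the toy calibration data, and the idea of filtering candidate messages by a calibrated threshold are illustrative assumptions for exposition; they are not the CommCP implementation.

```python
import math

def conformal_threshold(calib_scores, alpha=0.1):
    """Split-conformal quantile of nonconformity scores.

    calib_scores: scores on a held-out calibration set (higher = less
    conforming). Under exchangeability, a fresh example's score falls at
    or below the returned threshold with probability >= 1 - alpha.
    """
    n = len(calib_scores)
    # Finite-sample corrected rank: ceil((n+1)*(1-alpha)) among sorted scores.
    rank = math.ceil((n + 1) * (1 - alpha))
    scores = sorted(calib_scores)
    return scores[min(rank, n) - 1]

def prediction_set(candidate_scores, threshold):
    """Keep every candidate whose nonconformity score is within threshold."""
    return [c for c, s in candidate_scores if s <= threshold]

# Hypothetical usage: calibrate on held-out scores, then filter candidate
# messages so only sufficiently confident ones are communicated.
calib = [0.1, 0.3, 0.2, 0.5, 0.4, 0.15, 0.35, 0.25, 0.45, 0.05]
tau = conformal_threshold(calib, alpha=0.2)
kept = prediction_set([("msg_a", 0.1), ("msg_b", 0.9)], tau)
```

The point of the finite-sample correction (the `n + 1` in the rank) is that the coverage guarantee holds exactly for any sample size, not just asymptotically.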
- `cfg_explore/`: Configuration files for exploration experiments.
- `ConformalPrediction/`: Modules for conformal prediction tasks.
- `data_explore/`: Exploration-specific datasets.
- `explore_gym/`: Gym environments for exploration tasks.
- `LLM/`: LLM-related modules.
- `planning_module/`: Code for planning tasks.
- `src-explore/`: Source code for the main functionalities.
To run multi-VLM experiments, use the `explore.sh` script:

```bash
bash explore.sh
```

This script sets up the environment and runs the `run_multi_vlm_exp.py` script with the following parameters:
- Configuration file: `cfg_explore/vlm_exp.yaml`
- Start index: 25
- End index: 49
- Random seed: 42
- Step size: 2
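For reference, the parameters listed above could map onto a command-line interface like the following sketch. The flag names and argparse layout are assumptions made for illustration; the actual interface is defined in `run_multi_vlm_exp.py`.

```python
import argparse

def build_parser():
    # Hypothetical flags mirroring the README parameters; the real script
    # may use different names.
    p = argparse.ArgumentParser(description="Multi-VLM exploration experiment")
    p.add_argument("--config", default="cfg_explore/vlm_exp.yaml",
                   help="Path to the experiment configuration file")
    p.add_argument("--start-idx", type=int, default=25,
                   help="Index of the first scenario to run")
    p.add_argument("--end-idx", type=int, default=49,
                   help="Index of the last scenario to run")
    p.add_argument("--seed", type=int, default=42, help="Random seed")
    p.add_argument("--step", type=int, default=2,
                   help="Step size between scenario indices")
    return p

# With no arguments, the defaults reproduce the settings above.
args = build_parser().parse_args([])
```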
Ensure you have Python and the required dependencies installed. You can use the `environment.yml` file to set up a conda environment:

```bash
conda env create -f environment.yml
conda activate cp-agents
```
To set up Llama3, navigate to LLM/Llama3/ and run the setup script:
```bash
cd LLM/Llama3
bash run_llama3.sh
```

We suggest downloading the Llama-3-8B model.
- Navigate to the root directory.
- Run the desired shell script or Python script as per your experiment requirements.
If you find this work useful for your research, please consider citing:
```bibtex
@inproceedings{zhang2026commcp,
  title={CommCP: Efficient Multi-Agent Coordination via LLM-Based Communication with Conformal Prediction},
  author={Zhang, Xiaopan and Wang, Zejin and Li, Zhixu and Yao, Jianpeng and Li, Jiachen},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2026},
  organization={IEEE}
}
```

We sincerely thank the researchers and developers of Explore Until Confident and the Habitat Simulator for their amazing work.
