This is the official repository for the paper "WebTestPilot: Agentic End-to-End Web Testing against Natural Language Specification by Inferring Oracles with Symbolized GUI Elements".
TL;DR: WebTestPilot turns what a multimodal agent sees on the web into symbolic representations that can be asserted in automated end-to-end testing.
/baselines # Implementations of baselines (LaVague, NaviQAte, PinATA) and test runners (baselines + WebTestPilot)
/experiments # Scripts for running experiments (RQ1–RQ4) in the paper
/benchmark # Test cases and their injected bugs in the benchmark
/webapps # Containerized deployment scripts for web applications in the benchmark
/webtestpilot # Implementation of WebTestPilot
- Clone and initialize
  Clone the repository and run the setup script. It checks for the required tools (uv, docker, docker-compose) and guides you through module setup step by step. You will need to confirm each step interactively.
      ./setup.sh
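If you want to check for the required tools yourself before running the script, the check can be approximated as follows. This is a minimal sketch, assuming the three tools are expected as executables on PATH. `missing_tools` is a hypothetical helper that is not part of the repository, and setup.sh may perform its own checks differently.

```python
import shutil
from typing import Callable, Iterable, Optional

# Tools named in the setup instructions above.
REQUIRED_TOOLS = ("uv", "docker", "docker-compose")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is enough.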
- Deploy the GUI grounding model
  WebTestPilot uses inclusionAI/UI-Venus-Ground-7B for locating GUI elements. To deploy the model, install and configure vLLM, then run the following command. HF_HOME is the Hugging Face cache directory; the form below keeps your current value and falls back to the Hugging Face default location if it is unset:
      HF_HOME="${HF_HOME:-$HOME/.cache/huggingface}" \
      vllm serve inclusionAI/UI-Venus-Ground-7B \
        --max_model_len 4K \
        --max_num_seqs 8 \
        --trust-remote-code \
        --limit-mm-per-prompt '{"image": 1, "video": 0}'
This will start a local server exposing the GUI grounding model.
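To confirm that the server is up, you can query its model list. This is a minimal sketch, assuming vLLM's default address (http://localhost:8000) and its OpenAI-compatible `/v1/models` endpoint. `BASE_URL`, `served_model_ids`, and `fetch_models` are illustrative names, not part of WebTestPilot; adjust the URL if you started the server with a different host or port.

```python
import json
import urllib.request

MODEL_ID = "inclusionAI/UI-Venus-Ground-7B"
# Assumption: vLLM's default host and port; change if you passed --host/--port.
BASE_URL = "http://localhost:8000/v1"


def served_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_models(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET <base_url>/models and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        ids = served_model_ids(fetch_models())
    except (OSError, ValueError) as exc:  # connection errors or a non-JSON body
        print("Server not reachable:", exc)
    else:
        print("Model served:", MODEL_ID in ids)
```

The same base URL is a likely candidate for GUI_GROUNDING_MODEL_BASE_URL, but check .env.example for the exact format the project expects.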
- Configure environment variables
  Copy the .env.example file and update the variables as needed:
      cp .env.example .env
  Required variables:
  - OPENAI_API_KEY: used by baselines and WebTestPilot
  - GUI_GROUNDING_MODEL_BASE_URL: used by WebTestPilot
  - LOCAL_MODEL_BASE_URL: used in experiments RQ3 and RQ4
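A quick way to catch a missing variable before a long run is to check the environment up front. This is a minimal sketch: `missing_vars` is a hypothetical helper, and it inspects the process environment only. It assumes the values from .env have already been exported or loaded by whatever mechanism the project uses, which this README does not specify.

```python
import os
from typing import Mapping

# Variable names taken from the list above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "GUI_GROUNDING_MODEL_BASE_URL",
    "LOCAL_MODEL_BASE_URL",
)


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required variables that are unset or blank (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

LOCAL_MODEL_BASE_URL is only needed for experiments RQ3 and RQ4, so a report that lists it alone may be acceptable for other runs.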
Go to the ./experiments folder and follow the README.md instructions provided in each section.
You can run WebTestPilot outside of experiments by installing it as an editable package, with either pip or uv:
    pip install -e ./webtestpilot
    uv pip install -e ./webtestpilot
Minimal example:
from webtestpilot import WebTestPilot, Config, BugReport, Session, Step
from playwright.sync_api import sync_playwright
# Hook to handle bug reports
def hook(report: BugReport):
    print("A bug was reported:", report)
# Define the steps to test
steps = [
    Step(condition="", action="From the dashboard click 'Page Template' link", expectation="Page contains title 'Page Template'"),
    Step(condition="", action="Click 'Add Comment'", expectation="A WYSIWYG comment editor is open"),
    # ...
]
# Launch browser
playwright = sync_playwright().start()
browser = playwright.chromium.launch(headless=True)
page = browser.new_page()
# Load configuration
config = Config.load("path/to/config.yaml")
# Create a session
session = Session(page, config)
# Run WebTestPilot
WebTestPilot.run(session, steps, assertion=True, hooks=[hook])
# Clean up the browser and the Playwright driver
browser.close()
playwright.stop()

If you find WebTestPilot useful for your research, please consider citing the following work:
@article{teoh2026webtestpilot,
title = {WebTestPilot: Agentic End-to-End Web Testing against Natural Language Specification by Inferring Oracles with Symbolized GUI Elements},
author = {Teoh, Xiwen and Lin, Yun and Nguyen, Duc-Minh and Ren, Ruofei and Zhang, Wenjie and Dong, Jin Song},
journal = {Proceedings of the ACM on Software Engineering},
volume = {3},
number = {FSE},
article = {FSE087},
year = {2026},
month = {7},
doi = {10.1145/3797115}
}