WebTestPilot

This is the official repository for the paper "WebTestPilot: Agentic End-to-End Web Testing against Natural Language Specification by Inferring Oracles with Symbolized GUI Elements".

TL;DR: WebTestPilot turns what a multimodal agent sees on the web into symbolic representations that can be asserted in automated end-to-end testing.

📂 Structure

/baselines    # Implementations of baselines (LaVague, NaviQAte, PinATA) and test runners (baselines + WebTestPilot)
/experiments  # Scripts for running experiments (RQ1–RQ4) in the paper
/benchmark    # Test cases and their injected bugs in the benchmark
/webapps      # Containerized deployment scripts for web applications in the benchmark
/webtestpilot # Implementation of WebTestPilot

⚙️ Setup

Clone and initialize

Clone the repository and run the setup script. It checks for the required tools (uv, docker, docker-compose) and guides you through module setup step by step. You will need to confirm each step interactively.
```
git clone https://github.com/code-philia/WebTestPilot.git
cd WebTestPilot
./setup.sh
```

Deploy the GUI grounding model

WebTestPilot uses inclusionAI/UI-Venus-Ground-7B for locating GUI elements. To deploy the model, install and configure vLLM, then run the following command:
```
# HF_HOME is the Hugging Face cache directory; set it in your shell before running this.
HF_HOME="${HF_HOME}" \
vllm serve inclusionAI/UI-Venus-Ground-7B \
--max_model_len 4K \
--max_num_seqs 8 \
--trust-remote-code \
--limit-mm-per-prompt '{"image": 1, "video": 0}'
```
This will start a local server exposing the GUI grounding model.
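
Before running WebTestPilot, it can help to confirm that the server is reachable. The sketch below is not part of the repository: it assumes the server listens on vLLM's default address (`http://localhost:8000/v1`) and queries the OpenAI-compatible model listing endpoint. The function names `models_url` and `list_served_models` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.error
import urllib.request


def models_url(base_url):
    """Build the model-listing URL from an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def list_served_models(base_url="http://localhost:8000/v1", timeout=5.0):
    """Return the model ids reported by the server, or None if it is unreachable.

    The default base_url is an assumption (vLLM's default host and port);
    adjust it if the server was started with different settings.
    """
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, unsupported URL, or a non-JSON response.
        return None
    return [entry["id"] for entry in payload.get("data", [])]
```

Calling `list_served_models()` after the server has finished loading should return a list that includes the served model name; a `None` result suggests the server is not up yet or is listening on a different address.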

Configure environment variables

Copy the .env.example file and update variables as needed:
```
cp .env.example .env
```
Required variables:
- OPENAI_API_KEY: used by baselines and WebTestPilot
- GUI_GROUNDING_MODEL_BASE_URL: used by WebTestPilot
- LOCAL_MODEL_BASE_URL: used in experiments RQ3 and RQ4
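
A quick way to catch a missing variable before a long run is to check the `.env` file up front. The following is a minimal sketch, not part of the repository: `parse_dotenv` and `missing_env_vars` are hypothetical helpers, and the parser only handles plain `KEY=VALUE` lines (no `export` prefixes or multi-line values). The variable names come from the list above.

```python
import os

# Variable names taken from the list of required variables above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "GUI_GROUNDING_MODEL_BASE_URL",
    "LOCAL_MODEL_BASE_URL",
)


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]
```

For example, `missing_env_vars(parse_dotenv(open(".env").read()))` returns an empty list when every required variable has a value. If you only use WebTestPilot outside experiments RQ3 and RQ4, you may choose to pass a shorter `required` tuple.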

🚀 Running Experiments

Go to the ./experiments folder and follow the instructions in the README.md provided in each section.

🖥 Running WebTestPilot

You can run WebTestPilot outside of experiments by installing it as an editable package:

pip install -e ./webtestpilot

or, with uv:

uv pip install -e ./webtestpilot

Minimal example:

from webtestpilot import WebTestPilot, Config, BugReport, Session, Step
from playwright.sync_api import sync_playwright

# Hook to handle bug reports
def hook(report: BugReport):
    print("A bug was reported:", report)

# Define the steps to test
steps = [
    Step(condition="", action="From the dashboard click 'Page Template' link", expectation="Page contains title 'Page Template'"),
    Step(condition="", action="Click 'Add Comment'", expectation="A WYSIWYG comment editor is open"),
    # ...
]

# Launch browser
playwright = sync_playwright().start()
browser = playwright.chromium.launch(headless=True)
page = browser.new_page()

# Load configuration
config = Config.load("path/to/config.yaml")

# Create a session
session = Session(page, config)

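# Run WebTestPilot, then release browser resources even if the run fails
try:
    WebTestPilot.run(session, steps, assertion=True, hooks=[hook])
finally:
    browser.close()
    playwright.stop()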
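
The hook in the example only prints each report. In a CI job it may be more useful to collect reports and fail the run if any were raised. The sketch below assumes, as the example above suggests, that each hook is called with the report as its single argument; `make_collecting_hook` and `exit_code` are hypothetical helpers, not part of the WebTestPilot API.

```python
def make_collecting_hook():
    """Return (reports, hook); the hook appends every report it receives."""
    reports = []

    def hook(report):
        # Assumption: WebTestPilot calls the hook once per reported bug.
        reports.append(report)

    return reports, hook


def exit_code(reports):
    """Map collected reports to a process exit code: 1 if any bug was reported."""
    return 1 if reports else 0
```

To use it, create the pair with `reports, hook = make_collecting_hook()`, pass `hooks=[hook]` to `WebTestPilot.run` as in the example, and call `sys.exit(exit_code(reports))` once the run has finished.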

📝 Citation

If you find WebTestPilot useful for your research, please consider citing the following work:

@article{teoh2026webtestpilot,
  title   = {WebTestPilot: Agentic End-to-End Web Testing against Natural Language Specification by Inferring Oracles with Symbolized GUI Elements},
  author  = {Teoh, Xiwen and Lin, Yun and Nguyen, Duc-Minh and Ren, Ruofei and Zhang, Wenjie and Dong, Jin Song},
  journal = {Proceedings of the ACM on Software Engineering},
  volume  = {3},
  number  = {FSE},
  article = {FSE087},
  year    = {2026},
  month   = {7},
  doi     = {10.1145/3797115}
}
