WebTestPilot: Agentic End-to-End Web Testing against Natural Language Specification by Inferring Oracles with Symbolized GUI Elements

WebTestPilot

arXiv Project Page

This is the official repository for the paper "WebTestPilot: Agentic End-to-End Web Testing against Natural Language Specification by Inferring Oracles with Symbolized GUI Elements".

TL;DR: WebTestPilot turns what a multimodal agent sees on the web into symbolic representations that can be asserted in automated end-to-end testing.

📂 Structure

/baselines    # Implementations of baselines (LaVague, NaviQAte, PinATA) and test runners (baselines + WebTestPilot)
/experiments  # Scripts for running experiments (RQ1–RQ4) in the paper
/benchmark    # Test cases and their injected bugs in the benchmark
/webapps      # Containerized deployment scripts for web applications in the benchmark
/webtestpilot # Implementation of WebTestPilot 

⚙️ Setup

  1. Clone and initialize

    Clone the repository and run the setup script. It will check for required tools (uv, docker, docker-compose) and guide you through module setup step-by-step. You will need to confirm each step interactively.

    ./setup.sh
  2. Deploy the GUI grounding model

    WebTestPilot uses inclusionAI/UI-Venus-Ground-7B to locate GUI elements. To deploy the model, install and configure vLLM, then run the following command:

    HF_HOME="${HF_HOME}" \
    vllm serve inclusionAI/UI-Venus-Ground-7B \
    --max_model_len 4K \
    --max_num_seqs 8 \
    --trust-remote-code \
    --limit-mm-per-prompt '{"image": 1, "video": 0}'

    This will start a local server exposing the GUI grounding model.

  3. Configure environment variables

    Copy the .env.example file and update variables as needed:

    cp .env.example .env

    Required variables:

    • OPENAI_API_KEY: used by baselines and WebTestPilot
    • GUI_GROUNDING_MODEL_BASE_URL: used by WebTestPilot
    • LOCAL_MODEL_BASE_URL: used in experiments RQ3 and RQ4
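
A pre-flight check for steps 2 and 3 could look like the sketch below. It is not part of the repository: the helper names (`missing_vars`, `model_ids`, `served_models`) are hypothetical, and it assumes the variables from `.env` have already been exported into the process environment. The server probe relies on vLLM's OpenAI-compatible `/v1/models` endpoint and assumes the base URL you pass already ends in `/v1`.

```python
import json
import os
import urllib.request

# Variable names are taken from the "Required variables" list above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "GUI_GROUNDING_MODEL_BASE_URL",
    "LOCAL_MODEL_BASE_URL",
)


def missing_vars(env=None):
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def model_ids(payload):
    """Extract model names from an OpenAI-style `/models` response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def served_models(base_url, timeout=5.0):
    """Query `<base_url>/models` and return the names of the served models.

    Assumption: `base_url` already includes the `/v1` prefix,
    for example http://localhost:8000/v1.
    """
    url = base_url.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return model_ids(json.load(response))
```

Calling `missing_vars()` with no arguments inspects the current environment. With the grounding server running, `served_models(os.environ["GUI_GROUNDING_MODEL_BASE_URL"])` should list `inclusionAI/UI-Venus-Ground-7B`, provided the URL convention assumed above matches your setup.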

🚀 Running Experiments

Go to the ./experiments folder and follow the README.md instructions provided in each section.

🖥 Running WebTestPilot

You can run WebTestPilot outside of experiments by installing it as an editable package:

pip install -e ./webtestpilot
# or, if you use uv:
uv pip install -e ./webtestpilot

Minimal example:

from webtestpilot import WebTestPilot, Config, BugReport, Session, Step
from playwright.sync_api import sync_playwright

# Hook to handle bug reports
def hook(report: BugReport):
    print("A bug was reported:", report)

# Define the steps to test
steps = [
    Step(condition="", action="From the dashboard click 'Page Template' link", expectation="Page contains title 'Page Template'"),
    Step(condition="", action="Click 'Add Comment'", expectation="A WYSIWYG comment editor is open"),
    # ...
]

# Launch browser
playwright = sync_playwright().start()
browser = playwright.chromium.launch(headless=True)
page = browser.new_page()

# Load configuration
config = Config.load("path/to/config.yaml")

# Create a session
session = Session(page, config)

# Run WebTestPilot, then release the browser and Playwright resources
try:
    WebTestPilot.run(session, steps, assertion=True, hooks=[hook])
finally:
    browser.close()
    playwright.stop()
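
The hook in the example above only prints each report. If you want to inspect reports after the run (for instance, to fail a CI job), a hook can collect them instead. The sketch below is an assumption-laden illustration: `make_collecting_hook` and `exit_code` are hypothetical helpers, and it relies only on what the example shows, namely that a hook is a callable receiving one report.

```python
def make_collecting_hook(collected):
    """Return a hook that appends every report it receives to `collected`."""
    def hook(report):
        collected.append(report)
    return hook


def exit_code(collected):
    """Return 1 if any bug was reported, otherwise 0 (handy for CI scripts)."""
    return 1 if collected else 0


reports = []
collecting_hook = make_collecting_hook(reports)
# Pass it the same way as in the minimal example:
#   WebTestPilot.run(session, steps, assertion=True, hooks=[collecting_hook])
```

After the run, `reports` holds every report passed to the hook, and `exit_code(reports)` can be handed to `sys.exit`.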

📝 Citation

If you find WebTestPilot useful for your research, please consider citing the following work:

@article{teoh2026webtestpilot,
  title   = {WebTestPilot: Agentic End-to-End Web Testing against Natural Language Specification by Inferring Oracles with Symbolized GUI Elements},
  author  = {Teoh, Xiwen and Lin, Yun and Nguyen, Duc-Minh and Ren, Ruofei and Zhang, Wenjie and Dong, Jin Song},
  journal = {Proceedings of the ACM on Software Engineering},
  volume  = {3},
  number  = {FSE},
  article = {FSE087},
  year    = {2026},
  month   = {7},
  doi     = {10.1145/3797115}
}
