SPECTABULAR✨: Specifying Target Tables for Open-Web Data Analysis

Researchers often begin open-web data analysis with a vague analytical question, but the web rarely provides the exact table needed to answer it. SPECTABULAR studies this first step: specifying the target table from a public data portal before any cell values are extracted.

Given a natural-language query and the base URL of an open-web data portal, the task is to infer:

- the primary-key column,
- the primary-key values that define the table rows, and
- the attribute list that defines the non-key columns.
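To make the three outputs concrete, here is a minimal sketch of how such a specification could be represented. The class name `TableSpec`, its field names, and the example values are illustrative assumptions and are not taken from this repo's code; the point is only that the specification fixes the table's shape while every cell stays empty.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TableSpec:
    """Hypothetical container for a target-table specification (not the repo's API)."""

    primary_key: str                                      # name of the primary-key column
    key_values: list[str]                                 # one entry per table row
    attributes: list[str] = field(default_factory=list)   # non-key column names

    def header(self) -> list[str]:
        """Column order of the target table: key column first, then attributes."""
        return [self.primary_key, *self.attributes]

    def empty_rows(self) -> list[dict[str, Optional[str]]]:
        """One row per key value, with every non-key cell left unfilled (None)."""
        return [
            {self.primary_key: value, **{attr: None for attr in self.attributes}}
            for value in self.key_values
        ]


# Made-up example values, for illustration only.
spec = TableSpec(
    primary_key="country",
    key_values=["France", "Japan"],
    attributes=["population", "gdp_usd"],
)
print(spec.header())
print(spec.empty_rows())
```

In this sketch, the specification defines rows and columns only; filling in the `None` cells would be a separate, later extraction step.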

This repo contains:

| Folder | Description |
| --- | --- |
| `mario/` | TableMario: the three-stage AI agent (PK identification → PK value search → attribute generation). |
| `spectabench/` | SpecTaBench: 100-query benchmark, curation pipeline, and end-to-end evaluation. |
| `baselines/` | End-to-end baselines (AutoGen, AG2, AutoGPT, CrewAI, Sodium-Agent, GPT-WebSearch). All share `run_spectabench.py` as their entry point. |

See each subfolder's README.md for setup and usage details.
