
TriSQL Framework

A three-stage Text-to-SQL framework inspired by the TriSQL research paper (Nature Scientific Reports, 2026), built entirely from scratch using open-source tools. It converts plain-English questions into executable SQL queries and runs free on a standard laptop, with no GPU required.


Live Demo

Type a question. Get SQL. See the answer.

Q: How many singers are there?
→ SELECT COUNT(*) FROM singer

Q: List all concerts in 2014
→ SELECT concert_name, year FROM concert WHERE year = 2014

Q: What is the average age of singers from the USA?
→ SELECT AVG(age) FROM singer WHERE country = 'USA'

Benchmark Results

Evaluated on the Spider benchmark (Yale University) — the standard academic dataset for Text-to-SQL research.

Metric                     Score
Execution Accuracy (EX)    70%
Executability Rate         100%
Model                      SQLCoder (local, free)
Hardware                   Standard CPU laptop
GPU required               No

100% executability means every single generated SQL query runs without crashing — zero syntax errors across all test questions.
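
For context, an execution-accuracy check compares result sets rather than SQL strings. A minimal sketch of such a check (the function name is illustrative; the repository's actual evaluator is src/evaluator.py):

```python
import sqlite3

def execution_match(pred_sql: str, gold_sql: str, conn: sqlite3.Connection) -> bool:
    """EX check: the prediction is correct when its result set matches the
    gold query's on the same database (row order ignored). A prediction
    that fails to execute scores zero on both metrics."""
    try:
        pred = sorted(conn.execute(pred_sql).fetchall())
    except sqlite3.Error:
        return False
    gold = sorted(conn.execute(gold_sql).fetchall())
    return pred == gold

# Tiny in-memory database for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE singer (name TEXT, age REAL, country TEXT);"
    "INSERT INTO singer VALUES ('A', 30, 'USA'), ('B', 40, 'USA'), ('C', 20, 'UK');"
)
execution_match("SELECT AVG(age) FROM singer WHERE country = 'USA'",
                "SELECT AVG(age) FROM singer WHERE country = 'USA'", conn)  # True
```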


Architecture — Three Stages

This framework implements a TriSQL-inspired three-stage pipeline:

Stage 1 — Question-Guided Schema Selector

Uses sentence-transformers (all-MiniLM-L6-v2) to compute semantic similarity between the user's question and every table in the database. Only relevant tables are forwarded — reducing noise and improving generation quality.
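
The selection idea can be sketched without the model. Here a plain bag-of-words cosine stands in for the all-MiniLM-L6-v2 embeddings, so the ranking logic is visible; the function names and example schema are illustrative:

```python
# The real Stage 1 embeds the question and each table description with
# all-MiniLM-L6-v2 via sentence-transformers; a word-count cosine is
# used here only to show the selection logic.
from collections import Counter
from math import sqrt

def cosine(a: str, b: str) -> float:
    """Cosine similarity between two texts as word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def select_tables(question: str, tables: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank tables by similarity between the question and 'name + columns'."""
    ranked = sorted(tables, key=lambda t: cosine(question, f"{t} {tables[t]}"), reverse=True)
    return ranked[:top_k]

tables = {
    "singer": "singer_id name country age",
    "concert": "concert_id concert_name year stadium_id",
    "stadium": "stadium_id name capacity",
}
select_tables("What is the average age of singers from the USA?", tables)
```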

Stage 2 — Structure-Aware SQL Generator

Generates SQL in two steps instead of one:

  • Step 1 — identifies which SQL clauses are needed (SELECT, JOIN, GROUP BY, etc.)
  • Step 2 — generates complete SQL guided by those clauses
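
The two steps above can be sketched as a pair of prompts. The wording and the hard-coded clause answer are illustrative, not the repository's actual prompts; in the real generator both prompts go to SQLCoder through Ollama:

```python
def clause_prompt(question: str, schema: str) -> str:
    """Step 1: ask only for the clause skeleton, not full SQL."""
    return (
        f"Schema:\n{schema}\n\nQuestion: {question}\n"
        "Which SQL clauses are needed? Answer with a comma-separated "
        "subset of: SELECT, JOIN, WHERE, GROUP BY, HAVING, ORDER BY."
    )

def sql_prompt(question: str, schema: str, clauses: str) -> str:
    """Step 2: generate complete SQL, constrained to the identified clauses."""
    return (
        f"Schema:\n{schema}\n\nQuestion: {question}\n"
        f"Use exactly these clauses: {clauses}\nSQL:"
    )

schema = "singer(singer_id, name, country, age)"
question = "What is the average age of singers from the USA?"
step1 = clause_prompt(question, schema)
# A model call would return something like "SELECT, WHERE"; hard-coded here.
step2 = sql_prompt(question, schema, "SELECT, WHERE")
```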

A syntax validator catches and automatically recovers from common model errors.
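
Such a validator can be as simple as asking SQLite to compile the statement without executing it. A sketch under that assumption (the helper name is illustrative):

```python
import sqlite3

# EXPLAIN makes SQLite compile the statement without running it, so
# syntax errors and references to missing tables or columns surface
# as exceptions that the pipeline can feed back to the model.
def validate_sql(sql: str, schema_ddl: str) -> tuple[bool, str]:
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        conn.execute(f"EXPLAIN {sql}")
        return True, ""
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()

DDL = "CREATE TABLE singer (singer_id INTEGER, name TEXT, country TEXT, age REAL);"
validate_sql("SELECT COUNT(*) FROM singer", DDL)   # (True, "")
validate_sql("SELECT COUNT(* FROM singer", DDL)    # (False, "<parse error message>")
```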

Stage 3 — Complexity-Aware Refiner

Classifies each generated SQL query as Easy, Medium, or Hard by counting JOINs, subqueries, GROUP BY, HAVING, and set operations. Applies appropriate refinement:

  • Easy — return directly, no extra work needed
  • Medium — validate and fix one error if needed
  • Hard — retry up to two times with specific error feedback sent back to the model

User question
    ↓
Stage 1: Schema Selector     → filters irrelevant tables
    ↓
Stage 2: SQL Generator       → clause identification → SQL generation
    ↓
Stage 3: Complexity Refiner  → classify → refine → validate
    ↓
Executable SQL + results
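
The complexity classification itself needs no model call; counting structural features is enough. A minimal sketch (the exact thresholds are an assumption):

```python
import re

# Illustrative Stage 3 classifier: count structural features and bucket.
# Thresholds assumed here: 0 features -> Easy, 1 -> Medium, 2+ -> Hard.
def classify_complexity(sql: str) -> str:
    s = sql.upper()
    score = (
        len(re.findall(r"\bJOIN\b", s))                          # joins
        + len(re.findall(r"\(\s*SELECT\b", s))                   # subqueries
        + len(re.findall(r"\bGROUP BY\b", s))
        + len(re.findall(r"\bHAVING\b", s))
        + len(re.findall(r"\b(?:UNION|INTERSECT|EXCEPT)\b", s))  # set operations
    )
    if score == 0:
        return "Easy"
    return "Medium" if score == 1 else "Hard"

classify_complexity("SELECT COUNT(*) FROM singer")  # "Easy"
```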

Tech Stack

Component            Technology
LLM                  SQLCoder (via Ollama)
Semantic similarity  sentence-transformers (all-MiniLM-L6-v2)
Database             SQLite
Web interface        FastAPI + Uvicorn
Benchmark dataset    Spider (Yale University)
Language             Python 3.10+

All components are free and open source. No API keys required. No GPU needed.


Project Structure

trisql-framework/
├── app.py                     Web interface (FastAPI)
├── run_eval.py                Spider benchmark runner
├── quick_test.py              Quick sanity test (no Spider needed)
├── requirements.txt           Python dependencies
│
└── src/
    ├── schema.py              SQLite schema parser
    ├── schema_selector.py     Semantic table filter (Stage 1)
    ├── sql_generator.py       Two-step SQL generator (Stage 2)
    ├── complexity_refiner.py  Complexity classifier + refiner (Stage 3)
    ├── pipeline.py            Full pipeline orchestrator
    ├── data_loader.py         Spider dataset loader
    └── evaluator.py           Execution accuracy evaluator

Getting Started

Prerequisites

  • Python 3.10+
  • Ollama installed

Installation

# Clone the repository
git clone https://github.com/YOUR_USERNAME/trisql-framework.git
cd trisql-framework

# Install dependencies
pip install -r requirements.txt

# Download SQLCoder model
ollama pull sqlcoder

Quick Test (No dataset needed)

# Terminal 1 — start Ollama
ollama serve

# Terminal 2 — run quick test
python quick_test.py

Expected output:

QUICK TEST SCORE: 4/4 (100% EX)
All tests passed!

Web Interface

# Terminal 1
ollama serve

# Terminal 2
python app.py

Open your browser at http://localhost:8000

Spider Benchmark Evaluation

# Download Spider dataset from https://yale-lily.github.io/spider
# Place it at data/spider/

# Run evaluation (start small)
python run_eval.py --max 20
python run_eval.py --difficulty easy
python run_eval.py  # full 1034 questions

Inspiration

This framework is inspired by the TriSQL paper:

"TriSQL: A Three-Stage Text-to-SQL Framework with Complexity-Aware Refinement" Nature Scientific Reports, 2026

The original paper used Qwen as the underlying LLM with GPU-based fine-tuning. This implementation adapts the three-stage architecture to run entirely on a local CPU using SQLCoder through Ollama — making the approach accessible without research infrastructure.


Results Comparison

System                        Model                    EX Score
Single-prompt baseline        SQLCoder                 ~50%
This framework (Stage 1+2+3)  SQLCoder (CPU)           70%
TriSQL (paper)                Qwen (GPU, fine-tuned)   82%

The 20-percentage-point improvement over the single-prompt baseline (from ~50% to 70% EX) demonstrates the effectiveness of the three-stage approach even on modest hardware.


Author

Built by Gowtham Venkat Eathamokkala as part of independent research into Text-to-SQL frameworks.


License

MIT License — free to use, modify, and distribute.
