A three-stage Text-to-SQL framework inspired by the TriSQL research paper (Nature Scientific Reports, 2026), built entirely from scratch with open-source tools. It converts plain-English questions into executable SQL queries and runs free on a standard laptop, with no GPU required.
Type a question. Get SQL. See the answer.
Q: How many singers are there?
→ SELECT COUNT(*) FROM singer
Q: List all concerts in 2014
→ SELECT concert_name, year FROM concert WHERE year = 2014
Q: What is the average age of singers from the USA?
→ SELECT AVG(age) FROM singer WHERE country = 'USA'
Evaluated on the Spider benchmark (Yale University) — the standard academic dataset for Text-to-SQL research.
| Metric | Value |
|---|---|
| Execution Accuracy (EX) | 70% |
| Executability Rate | 100% |
| Model | SQLCoder (local, free) |
| Hardware | Standard CPU laptop |
| GPU required | No |
100% executability means every single generated SQL query runs without crashing — zero syntax errors across all test questions.
This framework implements a TriSQL-inspired three-stage pipeline:
Uses sentence-transformers (all-MiniLM-L6-v2) to compute semantic similarity between the user's question and every table in the database. Only relevant tables are forwarded — reducing noise and improving generation quality.
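As an illustration, the table-filtering step can be sketched as below. The real Stage 1 uses all-MiniLM-L6-v2 dense embeddings from sentence-transformers; the bag-of-words `embed` and the `select_tables` helper here are stand-ins so the example stays self-contained:

```python
import math
from collections import Counter

def embed(text):
    # Stand-in bag-of-words "embedding"; the framework itself uses
    # sentence-transformers (all-MiniLM-L6-v2) dense vectors here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_tables(question, table_docs, top_k=2):
    # table_docs maps table name -> a textual description (name + columns).
    q = embed(question)
    ranked = sorted(table_docs,
                    key=lambda t: cosine(q, embed(table_docs[t])),
                    reverse=True)
    return ranked[:top_k]

tables = {
    "singer": "singer singer_id name country age",
    "concert": "concert concert_id concert_name year stadium_id",
    "stadium": "stadium stadium_id name capacity location",
}
print(select_tables("What is the average age of singers from the USA?",
                    tables, top_k=1))
# → ['singer']
```

Only the top-ranked tables are passed to Stage 2, which keeps the generation prompt small.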
Generates SQL in two steps instead of one:
- Step 1 — identifies which SQL clauses are needed (SELECT, JOIN, GROUP BY, etc.)
- Step 2 — generates complete SQL guided by those clauses
A syntax validator catches and automatically recovers from common model errors.
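One cheap way to build such a validator is to ask SQLite to compile the query without running it via `EXPLAIN`; the sketch below assumes an in-memory database and is not necessarily how `sql_generator.py` implements it:

```python
import sqlite3

def is_executable(sql, conn):
    # EXPLAIN makes SQLite compile the statement without executing it,
    # so syntax and schema errors surface cheaply.
    try:
        conn.execute("EXPLAIN " + sql)
        return True, ""
    except sqlite3.Error as exc:
        return False, str(exc)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (singer_id INT, name TEXT, country TEXT, age INT)")

ok, _ = is_executable("SELECT COUNT(*) FROM singer", conn)
bad, err = is_executable("SELEC COUNT(*) FROM singer", conn)
print(ok, bad)  # → True False
```

The error string can then be fed back to the model as refinement feedback in Stage 3.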
Classifies each generated SQL query as Easy, Medium, or Hard by counting JOINs, subqueries, GROUP BY, HAVING, and set operations. Applies appropriate refinement:
- Easy — return directly, no extra work needed
- Medium — validate and fix one error if needed
- Hard — retry up to two times with specific error feedback sent back to the model
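A minimal classifier along these lines counts the listed features and buckets on the total; the exact thresholds below are illustrative, not taken from the paper:

```python
import re

def classify(sql):
    # Count structural features that make a query harder to get right.
    s = sql.upper()
    features = len(re.findall(r"\bJOIN\b", s))         # joins
    features += len(re.findall(r"\(\s*SELECT\b", s))   # subqueries
    for kw in ("GROUP BY", "HAVING", "UNION", "INTERSECT", "EXCEPT"):
        if kw in s:
            features += 1
    if features == 0:
        return "Easy"
    return "Medium" if features == 1 else "Hard"

print(classify("SELECT COUNT(*) FROM singer"))                   # → Easy
print(classify("SELECT AVG(age) FROM singer GROUP BY country"))  # → Medium
print(classify("SELECT c.year FROM concert c JOIN stadium s "
               "ON c.stadium_id = s.stadium_id "
               "GROUP BY c.year HAVING COUNT(*) > 1"))           # → Hard
```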
```
User question
      ↓
Stage 1: Schema Selector    → filters irrelevant tables
      ↓
Stage 2: SQL Generator      → clause identification → SQL generation
      ↓
Stage 3: Complexity Refiner → classify → refine → validate
      ↓
Executable SQL + results
```
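The flow above, including the Medium/Hard retry policy, can be sketched as a small orchestrator; `generate_sql`, `validate`, and `classify` are hypothetical stand-ins for the real pipeline components:

```python
def run_pipeline(question, schema, generate_sql, validate, classify, max_retries=2):
    # `schema` is assumed to be the Stage 1 output (relevant tables only).
    sql = generate_sql(question, schema, feedback=None)   # Stage 2
    difficulty = classify(sql)                            # Stage 3: classify
    if difficulty == "Easy":
        return sql                                        # return directly
    retries = 1 if difficulty == "Medium" else max_retries
    for _ in range(retries):
        ok, error = validate(sql)
        if ok:
            break
        # Regenerate with the concrete error fed back to the model.
        sql = generate_sql(question, schema, feedback=error)
    return sql

# Stub stages for demonstration only.
def fake_generate(question, schema, feedback):
    return "SELEC name FROM singer" if feedback is None else "SELECT name FROM singer"

def fake_validate(sql):
    return (True, "") if sql.startswith("SELECT") else (False, 'syntax error near "SELEC"')

print(run_pipeline("List all singers", "singer(name)", fake_generate,
                   fake_validate, lambda sql: "Medium"))
# → SELECT name FROM singer
```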
| Component | Technology |
|---|---|
| LLM | SQLCoder (via Ollama) |
| Semantic similarity | sentence-transformers (all-MiniLM-L6-v2) |
| Database | SQLite |
| Web interface | FastAPI + Uvicorn |
| Benchmark dataset | Spider (Yale University) |
| Language | Python 3.10+ |
All components are free and open source. No API keys required. No GPU needed.
```
trisql-framework/
├── app.py                     Web interface (FastAPI)
├── run_eval.py                Spider benchmark runner
├── quick_test.py              Quick sanity test (no Spider needed)
├── requirements.txt           Python dependencies
│
└── src/
    ├── schema.py              SQLite schema parser
    ├── schema_selector.py     Semantic table filter (Stage 1)
    ├── sql_generator.py       Two-step SQL generator (Stage 2)
    ├── complexity_refiner.py  Complexity classifier + refiner (Stage 3)
    ├── pipeline.py            Full pipeline orchestrator
    ├── data_loader.py         Spider dataset loader
    └── evaluator.py           Execution accuracy evaluator
```
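For reference, execution accuracy can be computed by running the predicted and gold SQL against the same database and comparing result sets order-insensitively; this is a minimal sketch, not necessarily `evaluator.py` verbatim:

```python
import sqlite3

def execution_match(pred_sql, gold_sql, conn):
    # Execution accuracy: a prediction counts as correct when it returns
    # the same multiset of rows as the gold query (order-insensitive).
    try:
        pred = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # an unexecutable prediction scores zero
    gold = conn.execute(gold_sql).fetchall()
    return sorted(map(repr, pred)) == sorted(map(repr, gold))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INT)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("A", 30), ("B", 40)])

print(execution_match("SELECT name FROM singer ORDER BY age DESC",
                      "SELECT name FROM singer", conn))  # → True
print(execution_match("SELECT age FROM singer",
                      "SELECT name FROM singer", conn))  # → False
```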
- Python 3.10+
- Ollama installed
```shell
# Clone the repository
git clone https://github.com/YOUR_USERNAME/trisql-framework.git
cd trisql-framework

# Install dependencies
pip install -r requirements.txt

# Download SQLCoder model
ollama pull sqlcoder
```

```shell
# Terminal 1 — start Ollama
ollama serve

# Terminal 2 — run quick test
python quick_test.py
```

Expected output:

```
QUICK TEST SCORE: 4/4 (100% EX)
All tests passed!
```
```shell
# Terminal 1
ollama serve

# Terminal 2
python app.py
```

Open your browser at http://localhost:8000
```shell
# Download the Spider dataset from https://yale-lily.github.io/spider
# Place it at data/spider/

# Run evaluation (start small)
python run_eval.py --max 20
python run_eval.py --difficulty easy
python run_eval.py                     # full 1034 questions
```

This framework is inspired by the TriSQL paper:
"TriSQL: A Three-Stage Text-to-SQL Framework with Complexity-Aware Refinement," Nature Scientific Reports, 2026
The original paper used Qwen as the underlying LLM with GPU-based fine-tuning. This implementation adapts the three-stage architecture to run entirely on a local CPU using SQLCoder through Ollama — making the approach accessible without research infrastructure.
| System | Model | EX Score |
|---|---|---|
| Single-prompt baseline | SQLCoder | ~50% |
| This framework (Stage 1+2+3) | SQLCoder (CPU) | 70% |
| TriSQL (paper) | Qwen (GPU, fine-tuned) | 82% |
The 20-percentage-point gain over the single-prompt baseline (~50% → 70% EX) demonstrates the effectiveness of the three-stage approach even on modest hardware.
Built by Gowtham Venkat Eathamokkala as part of independent research into Text-to-SQL frameworks.
- LinkedIn: https://www.linkedin.com/in/gowtham-eathamokkala
- GitHub: https://www.github.com/Gowthamch9
MIT License — free to use, modify, and distribute.