The Synthetic Data Solution provides a REST API built with FastAPI for programmatic access to all features.
Base URL: http://localhost:8000/api/v1
When the server is running, access interactive API docs at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI Schema: http://localhost:8000/openapi.json
The API supports optional API key authentication via the X-API-Key header:
curl -H "X-API-Key: your-api-key" http://localhost:8000/api/v1/...
By default, the API runs in demo mode without authentication. Configure API keys for production use.
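The same header can be attached from Python. The following is a minimal sketch using only the standard library; the `build_request` helper is hypothetical (not part of the API), and the job ID is the example value used elsewhere in this document.

```python
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def build_request(path, api_key=None):
    """Build a GET request, attaching X-API-Key only when a key is given."""
    request = urllib.request.Request(f"{BASE_URL}{path}")
    if api_key:
        # Header name taken from the documentation above. urllib stores it
        # as "X-api-key"; HTTP header names are case-insensitive.
        request.add_header("X-API-Key", api_key)
    return request


JOB_ID = "550e8400-e29b-41d4-a716-446655440000"

# Demo mode: no key configured, so no authentication header is sent.
demo = build_request(f"/jobs/{JOB_ID}/status")

# Production: the key travels in the X-API-Key header.
authed = build_request(f"/jobs/{JOB_ID}/status", api_key="your-api-key")

# To actually send it (requires a running server):
# with urllib.request.urlopen(authed) as response:
#     print(response.read().decode())
```

Keeping the key optional lets the same client code run against a demo-mode server and a production server.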
The API enforces the following rate limits:
- 60 requests per minute per client IP
- 1000 requests per hour per client IP
Rate limit headers are included in responses:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
X-RateLimit-Reset: 1704067200
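A client can use these headers to decide how long to pause before retrying. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds, which the example value suggests but the documentation does not state; `seconds_until_reset` is a hypothetical helper name.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, in seconds.

    Returns 0.0 while requests remain in the current window.
    Assumes X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    now = time.time() if now is None else now
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - now)


# Headers from the example above: 58 requests remain, so no wait is needed.
example_headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "58",
    "X-RateLimit-Reset": "1704067200",
}
wait = seconds_until_reset(example_headers)
```

The `now` parameter exists so the calculation can be checked with a fixed clock.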
Check API health status.
Response
{
"status": "healthy",
"version": "1.0.0",
"timestamp": "2024-01-01T12:00:00Z"
}
Analyze a context description to extract data requirements and domain classification.
Request Body
{
"context": "Healthcare clinic with 200 patients needs patient records and appointment scheduling",
"domain_hint": "healthcare" // optional
}
Response
{
"domain": "healthcare",
"confidence": 0.95,
"requirements": [
{
"name": "patients",
"description": "Patient demographic records",
"data_type": "tabular",
"estimated_volume": "200"
},
{
"name": "appointments",
"description": "Appointment scheduling records",
"data_type": "tabular",
"estimated_volume": "1000"
}
],
"keywords": ["patient", "healthcare", "appointment", "clinic"],
"suggested_schemas": ["patient_records", "appointments"]
}
Generate a small sample of synthetic data for validation.
Request Body
{
"context": "Legal firm case management with clients, cases, and documents",
"sample_size": 5,
"schemas": [] // optional: specific schemas to generate
}
Response
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"schemas": [
{
"name": "clients",
"fields": [
{"name": "id", "type": "UUID", "required": true},
{"name": "name", "type": "STRING", "required": true},
{"name": "email", "type": "EMAIL", "required": false}
]
}
],
"samples": {
"clients": [
{"id": "...", "name": "John Smith", "email": "john@example.com"},
{"id": "...", "name": "Jane Doe", "email": "jane@example.com"}
]
},
"generation_time_ms": 1234
}
Start asynchronous corpus generation. Returns immediately with a job ID.
Request Body
{
"context": "Financial services with customer accounts and transactions",
"corpus_size": 10000,
"batch_size": 100,
"schemas": [], // optional: specific schemas to generate
"export_format": "csv" // optional: csv, json, xlsx, sql
}
Response
{
"job_id": "550e8400-e29b-41d4-a716-446655440001",
"status": "in_progress",
"message": "Corpus generation started",
"estimated_completion": "2024-01-01T12:10:00Z"
}
Submit feedback on generated samples and optionally start corpus generation.
Request Body
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"approved": true,
"schema_feedback": [
{
"schema_name": "clients",
"feedback_type": "modify",
"field_adjustments": {
"email": {"required": true}
},
"add_fields": [
{"name": "phone", "type": "PHONE"}
],
"remove_fields": ["fax"]
}
],
"start_corpus": true,
"corpus_size": 5000
}
Response
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "in_progress",
"message": "Feedback applied, corpus generation started"
}
List all jobs with pagination.
Query Parameters
- skip (int): Number of jobs to skip (default: 0)
- limit (int): Maximum jobs to return (default: 20, max: 100)
- status (string): Filter by status (pending, in_progress, completed, failed)
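The `skip` and `limit` parameters can be combined to walk the full job list one page at a time. The sketch below relies on the `jobs` and `total` fields of the list response; `fetch_page` is a hypothetical callable that takes the query parameters as a dict and returns the decoded JSON body, and `fake_fetch` is an in-memory stand-in so the example runs without a server.

```python
def iter_jobs(fetch_page, limit=20, status=None):
    """Yield every job, advancing skip until total is reached."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    skip = 0
    while True:
        params = {"skip": skip, "limit": limit}
        if status is not None:
            params["status"] = status
        page = fetch_page(params)
        jobs = page["jobs"]
        yield from jobs
        skip += len(jobs)
        # Stop on an empty page or once every job has been seen.
        if not jobs or skip >= page["total"]:
            break


def fake_fetch(params):
    """In-memory stand-in for the list endpoint, holding 45 jobs."""
    all_jobs = [{"job_id": str(i), "status": "completed"} for i in range(45)]
    start = params["skip"]
    return {
        "jobs": all_jobs[start:start + params["limit"]],
        "total": len(all_jobs),
        "skip": start,
        "limit": params["limit"],
    }


job_ids = [job["job_id"] for job in iter_jobs(fake_fetch, limit=20)]
```

In real use, `fetch_page` would issue the GET request with `params` as the query string and return the parsed response.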
Response
{
"jobs": [
{
"job_id": "...",
"status": "completed",
"created_at": "2024-01-01T12:00:00Z",
"stage": "corpus_generated"
}
],
"total": 42,
"skip": 0,
"limit": 20
}
Get detailed information about a specific job.
Response
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "in_progress",
"stage": "corpus_generated",
"created_at": "2024-01-01T12:00:00Z",
"updated_at": "2024-01-01T12:05:00Z",
"progress": {
"total_schemas": 3,
"completed_schemas": 2,
"current_schema": "transactions",
"total_records": 10000,
"generated_records": 7500,
"percent_complete": 75.0
},
"schemas": [...],
"errors": [],
"warnings": []
}
Quick status check for a job (lighter payload than full job details).
Response
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "in_progress",
"stage": "corpus_generated",
"percent_complete": 75.0
}
Delete a job and its associated data.
Response
{
"message": "Job deleted successfully"
}
Cancel a running job.
Response
{
"job_id": "...",
"status": "failed",
"message": "Job cancelled by user"
}
Export completed job results to a specific format.
Request Body
{
"format": "csv", // csv, json, xlsx, sql
"schemas": [], // optional: specific schemas to export (default: all)
"options": {
"sql_dialect": "postgresql" // for SQL: postgresql, mysql, sqlite
}
}
Response
{
"job_id": "...",
"export_paths": {
"clients": "/exports/job-xxx/clients.csv",
"transactions": "/exports/job-xxx/transactions.csv"
},
"format": "csv"
}
Download an exported file for a specific schema.
Query Parameters
- format (string): Export format (csv, json, xlsx, sql)
Response
- File download with appropriate Content-Type header
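As a rough illustration, the `format` query parameter can be validated and appended on the client side before the request is sent. The download path is not shown in this section, so the sketch takes the URL from the caller; `add_format_param` is a hypothetical helper and assumes the URL has no existing query string.

```python
from urllib.parse import urlencode

# Formats listed in the documentation above.
EXPORT_FORMATS = ("csv", "json", "xlsx", "sql")


def add_format_param(download_url, export_format):
    """Append the format query parameter to a caller-supplied download URL."""
    if export_format not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {export_format!r}")
    return f"{download_url}?{urlencode({'format': export_format})}"
```

Rejecting unknown formats locally avoids a round trip that would otherwise end in a validation error.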
All errors follow a consistent format:
{
"detail": "Error message describing what went wrong",
"error_code": "VALIDATION_ERROR",
"status_code": 400
}
| Code | Status | Description |
|---|---|---|
| VALIDATION_ERROR | 400 | Invalid request parameters |
| NOT_FOUND | 404 | Job or resource not found |
| RATE_LIMITED | 429 | Too many requests |
| INTERNAL_ERROR | 500 | Server error |
| LLM_ERROR | 502 | LLM provider error |
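Because every error shares this shape, a client can turn an error body into a single exception type. The sketch below is one possible approach: `ApiError` and `parse_error` are hypothetical names, and treating `RATE_LIMITED`, `INTERNAL_ERROR`, and `LLM_ERROR` as retryable is an assumption, not documented behavior.

```python
# Assumption: these codes describe transient conditions worth retrying.
RETRYABLE_CODES = {"RATE_LIMITED", "INTERNAL_ERROR", "LLM_ERROR"}


class ApiError(Exception):
    """Client-side representation of the API's error body."""

    def __init__(self, detail, error_code, status_code):
        super().__init__(f"{error_code} ({status_code}): {detail}")
        self.detail = detail
        self.error_code = error_code
        self.status_code = status_code

    @property
    def retryable(self):
        return self.error_code in RETRYABLE_CODES


def parse_error(body):
    """Build an ApiError from a decoded JSON error body."""
    return ApiError(
        detail=body.get("detail", ""),
        error_code=body.get("error_code", "INTERNAL_ERROR"),
        status_code=int(body.get("status_code", 500)),
    )


# The example error body from the documentation above.
error = parse_error({
    "detail": "Error message describing what went wrong",
    "error_code": "VALIDATION_ERROR",
    "status_code": 400,
})
```

A caller would raise the result of `parse_error` whenever a response has a status code of 400 or above, then check `retryable` before trying again.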
import time

import requests

BASE_URL = "http://localhost:8000/api/v1"

# Analyze context
response = requests.post(
    f"{BASE_URL}/context/analyze",
    json={"context": "Healthcare clinic with patient records"}
)
analysis = response.json()

# Generate samples
response = requests.post(
    f"{BASE_URL}/generate/sample",
    json={
        "context": "Healthcare clinic with patient records",
        "sample_size": 5
    }
)
samples = response.json()

# Start corpus generation
response = requests.post(
    f"{BASE_URL}/generate/corpus",
    json={
        "context": "Healthcare clinic with patient records",
        "corpus_size": 1000
    }
)
job = response.json()
job_id = job["job_id"]

# Poll until the job reaches a terminal status
while True:
    response = requests.get(f"{BASE_URL}/jobs/{job_id}/status")
    status = response.json()
    if status["status"] in ["completed", "failed"]:
        break
    print(f"Progress: {status['percent_complete']:.1f}%")
    time.sleep(2)

# Export results (export applies to completed jobs only)
if status["status"] != "completed":
    raise SystemExit(f"Job {job_id} ended with status: {status['status']}")
response = requests.post(
    f"{BASE_URL}/jobs/{job_id}/export",
    json={"format": "csv"}
)
export_info = response.json()
print(f"Files: {export_info['export_paths']}")

Webhook support for job completion notifications is planned for a future release.
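Until webhooks are available, polling is the way to detect completion. The sketch below shows one way to harden a polling loop with a timeout and exponential backoff. `wait_for_job` and its parameters are hypothetical; `get_status` is any callable returning the decoded status response, and the `sleep` and `clock` parameters exist only so the timing can be replaced in tests.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(get_status, timeout=600.0, initial_delay=1.0,
                 max_delay=30.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the job is completed or failed.

    The delay doubles after each poll, capped at max_delay. Raises
    TimeoutError if the next wait would run past the timeout.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        if clock() + delay > deadline:
            raise TimeoutError("job did not finish within the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)


# Stand-in status source so the example runs without a server.
_responses = iter([
    {"status": "in_progress", "percent_complete": 25.0},
    {"status": "in_progress", "percent_complete": 75.0},
    {"status": "completed", "percent_complete": 100.0},
])
sleeps = []
final = wait_for_job(lambda: next(_responses), sleep=sleeps.append)

# With the requests library, get_status could be:
# lambda: requests.get(f"{BASE_URL}/jobs/{job_id}/status").json()
```

Backing off keeps a long-running job from consuming the per-minute request allowance, which a fixed two-second interval could exhaust when several jobs are polled at once.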