API Reference

The Synthetic Data Solution provides a REST API built with FastAPI for programmatic access to all features.

Base URL

http://localhost:8000/api/v1

Interactive Documentation

When the server is running, access interactive API docs at:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc
OpenAPI Schema: http://localhost:8000/openapi.json

Authentication

The API supports optional API key authentication via the X-API-Key header:

curl -H "X-API-Key: your-api-key" http://localhost:8000/api/v1/...

By default, the API runs in demo mode without authentication. Configure API keys for production use.
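
Because the key is optional, a client can attach the header only when a key is configured. The following is a minimal sketch, not part of the API itself: the helper name `build_headers` and the `SDS_API_KEY` environment variable are assumptions made for illustration.

```python
import os
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict:
    """Return request headers, adding X-API-Key only when a key is available."""
    # SDS_API_KEY is a hypothetical environment variable name, not defined by the API.
    key = api_key or os.environ.get("SDS_API_KEY")
    headers = {"Accept": "application/json"}
    if key:
        headers["X-API-Key"] = key
    return headers
```

The returned dictionary can be passed as the `headers` argument of any HTTP client call; in demo mode it simply omits the key.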

Rate Limiting

60 requests per minute per client IP
1000 requests per hour per client IP

Rate limit headers are included in responses:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
X-RateLimit-Reset: 1704067200
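
A client can use these headers to decide how long to pause before the next request. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds (the example value 1704067200 corresponds to 2024-01-01T00:00:00Z); the function name `seconds_until_reset` is hypothetical.

```python
from typing import Mapping


def seconds_until_reset(headers: Mapping[str, str], now: float) -> float:
    """Return how long to wait before retrying, or 0.0 if requests remain."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)
```

Pass `time.time()` as `now` in real use. A plain `dict` is case-sensitive, so the header names must match the casing shown above unless the HTTP client provides a case-insensitive mapping.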

Endpoints

Health Check

GET /health

Check API health status.

Response

{
  "status": "healthy",
  "version": "1.0.0",
  "timestamp": "2024-01-01T12:00:00Z"
}

Context Analysis

POST /api/v1/context/analyze

Analyze a context description to extract data requirements and domain classification.

Request Body

{
  "context": "Healthcare clinic with 200 patients needs patient records and appointment scheduling",
  "domain_hint": "healthcare"  // optional
}

Response

{
  "domain": "healthcare",
  "confidence": 0.95,
  "requirements": [
    {
      "name": "patients",
      "description": "Patient demographic records",
      "data_type": "tabular",
      "estimated_volume": "200"
    },
    {
      "name": "appointments",
      "description": "Appointment scheduling records",
      "data_type": "tabular",
      "estimated_volume": "1000"
    }
  ],
  "keywords": ["patient", "healthcare", "appointment", "clinic"],
  "suggested_schemas": ["patient_records", "appointments"]
}
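
Note that `estimated_volume` is a string in the example response. As a rough sketch of consuming this payload, the hypothetical helper below sums the volumes across requirements, assuming each value is either a numeric string or something non-numeric that should be skipped.

```python
def total_estimated_volume(analysis: dict) -> int:
    """Sum estimated_volume over all requirements in an analysis response."""
    total = 0
    for requirement in analysis.get("requirements", []):
        try:
            # The example response encodes volumes as strings such as "200".
            total += int(requirement["estimated_volume"])
        except (KeyError, ValueError):
            # Assumption: missing or non-numeric volumes are ignored.
            continue
    return total
```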

Data Generation

POST /api/v1/generate/sample

Generate a small sample of synthetic data for validation.

Request Body

{
  "context": "Legal firm case management with clients, cases, and documents",
  "sample_size": 5,
  "schemas": []  // optional: specific schemas to generate
}

Response

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "schemas": [
    {
      "name": "clients",
      "fields": [
        {"name": "id", "type": "UUID", "required": true},
        {"name": "name", "type": "STRING", "required": true},
        {"name": "email", "type": "EMAIL", "required": false}
      ]
    }
  ],
  "samples": {
    "clients": [
      {"id": "...", "name": "John Smith", "email": "john@example.com"},
      {"id": "...", "name": "Jane Doe", "email": "jane@example.com"}
    ]
  },
  "generation_time_ms": 1234
}
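
Since the sample endpoint exists for validation, one simple client-side check is to confirm that every sample record carries the fields its schema marks as required. This is a sketch under the response shape shown above; the function name `missing_required_fields` is an assumption.

```python
def missing_required_fields(response: dict) -> list:
    """Return (schema, record_index, field) tuples for absent required fields."""
    problems = []
    samples = response.get("samples", {})
    for schema in response.get("schemas", []):
        required = [f["name"] for f in schema["fields"] if f.get("required")]
        for index, record in enumerate(samples.get(schema["name"], [])):
            for name in required:
                # Treat missing keys, nulls, and empty strings as missing.
                if record.get(name) in (None, ""):
                    problems.append((schema["name"], index, name))
    return problems
```

An empty list means every required field was populated in every sample.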

POST /api/v1/generate/corpus

Start asynchronous corpus generation. Returns immediately with a job ID.

Request Body

{
  "context": "Financial services with customer accounts and transactions",
  "corpus_size": 10000,
  "batch_size": 100,
  "schemas": [],  // optional: specific schemas to generate
  "export_format": "csv"  // optional: csv, json, xlsx, sql
}

Response

{
  "job_id": "550e8400-e29b-41d4-a716-446655440001",
  "status": "in_progress",
  "message": "Corpus generation started",
  "estimated_completion": "2024-01-01T12:10:00Z"
}
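
The request body can be assembled and checked on the client before it is sent. The sketch below assumes `batch_size`, `schemas`, and `export_format` may be omitted (the Python client example in this document sends only `context` and `corpus_size`); the helper name `build_corpus_request` is hypothetical.

```python
from typing import Optional, Sequence

EXPORT_FORMATS = ("csv", "json", "xlsx", "sql")


def build_corpus_request(context: str, corpus_size: int,
                         batch_size: Optional[int] = None,
                         schemas: Sequence[str] = (),
                         export_format: Optional[str] = None) -> dict:
    """Build a body for POST /generate/corpus, leaving out unset optional keys."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    if export_format is not None and export_format not in EXPORT_FORMATS:
        raise ValueError(f"export_format must be one of {EXPORT_FORMATS}")
    body = {"context": context, "corpus_size": corpus_size}
    if batch_size is not None:
        body["batch_size"] = batch_size
    if schemas:
        body["schemas"] = list(schemas)
    if export_format is not None:
        body["export_format"] = export_format
    return body
```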

POST /api/v1/generate/feedback

Submit feedback on generated samples and optionally start corpus generation.

Request Body

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "approved": true,
  "schema_feedback": [
    {
      "schema_name": "clients",
      "feedback_type": "modify",
      "field_adjustments": {
        "email": {"required": true}
      },
      "add_fields": [
        {"name": "phone", "type": "PHONE"}
      ],
      "remove_fields": ["fax"]
    }
  ],
  "start_corpus": true,
  "corpus_size": 5000
}

Response

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "in_progress",
  "message": "Feedback applied, corpus generation started"
}
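
The server-side handling of `schema_feedback` is not described here. As one plausible reading of the `modify` request above, the sketch below applies `remove_fields`, `field_adjustments`, and `add_fields` to a local copy of a schema so the expected result can be previewed. The function name and the order of operations are assumptions, not the service's documented behavior.

```python
import copy


def apply_schema_feedback(schema: dict, feedback: dict) -> dict:
    """Return a copy of schema with one schema_feedback entry applied locally."""
    updated = copy.deepcopy(schema)
    removed = set(feedback.get("remove_fields", []))
    # Names in remove_fields that are not in the schema are ignored.
    fields = [f for f in updated["fields"] if f["name"] not in removed]
    adjustments = feedback.get("field_adjustments", {})
    for field in fields:
        field.update(adjustments.get(field["name"], {}))
    existing = {f["name"] for f in fields}
    for new_field in feedback.get("add_fields", []):
        if new_field["name"] not in existing:
            fields.append(dict(new_field))
    updated["fields"] = fields
    return updated
```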

Job Management

GET /api/v1/jobs

List all jobs with pagination.

Query Parameters

skip (int): Number of jobs to skip (default: 0)
limit (int): Maximum jobs to return (default: 20, max: 100)
status (string): Filter by status (pending, in_progress, completed, failed)

Response

{
  "jobs": [
    {
      "job_id": "...",
      "status": "completed",
      "created_at": "2024-01-01T12:00:00Z",
      "stage": "corpus_generated"
    }
  ],
  "total": 42,
  "skip": 0,
  "limit": 20
}
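
The `skip`, `limit`, and `total` fields are enough to walk every page. The sketch below keeps the HTTP call out of the loop: `fetch_page` is a hypothetical callable that takes `(skip, limit)` and returns one parsed response of the shape shown above.

```python
from typing import Callable, Iterator


def iter_jobs(fetch_page: Callable[[int, int], dict],
              limit: int = 20) -> Iterator[dict]:
    """Yield every job by advancing skip until total is reached."""
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        jobs = page.get("jobs", [])
        yield from jobs
        skip += len(jobs)
        # Also stop on an empty page, in case total changes between calls.
        if not jobs or skip >= page.get("total", 0):
            break
```

In practice `fetch_page` would issue `GET /api/v1/jobs` with `skip` and `limit` as query parameters and return the decoded JSON.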

GET /api/v1/jobs/{job_id}

Get detailed information about a specific job.

Response

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "in_progress",
  "stage": "corpus_generated",
  "created_at": "2024-01-01T12:00:00Z",
  "updated_at": "2024-01-01T12:05:00Z",
  "progress": {
    "total_schemas": 3,
    "completed_schemas": 2,
    "current_schema": "transactions",
    "total_records": 10000,
    "generated_records": 7500,
    "percent_complete": 75.0
  },
  "schemas": [...],
  "errors": [],
  "warnings": []
}
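
In the example, `percent_complete` matches the ratio of `generated_records` to `total_records` (7500 of 10000). Assuming that relationship holds in general, which the source does not state explicitly, a client could recompute it like this:

```python
def percent_complete(progress: dict) -> float:
    """Recompute completion percentage from the record counters."""
    total = progress.get("total_records", 0)
    if total <= 0:
        # Avoid division by zero before the total is known.
        return 0.0
    generated = progress.get("generated_records", 0)
    return round(100.0 * generated / total, 1)
```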

GET /api/v1/jobs/{job_id}/status

Quick status check for a job (lighter payload than full job details).

Response

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "in_progress",
  "stage": "corpus_generated",
  "percent_complete": 75.0
}
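
This endpoint is the natural target for polling. The sketch below adds a timeout so a stuck job cannot block the caller forever; `fetch_status` is a hypothetical callable returning the parsed status response, and the default interval and timeout values are arbitrary choices rather than recommendations from the API.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_status: Callable[[], dict],
                 interval: float = 2.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

The `sleep` parameter exists so the loop can be exercised in tests without real delays.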

DELETE /api/v1/jobs/{job_id}

Delete a job and its associated data.

Response

{
  "message": "Job deleted successfully"
}

POST /api/v1/jobs/{job_id}/cancel

Cancel a running job.

Response

{
  "job_id": "...",
  "status": "failed",
  "message": "Job cancelled by user"
}

Export

POST /api/v1/jobs/{job_id}/export

Export completed job results to a specific format.

Request Body

{
  "format": "csv",  // csv, json, xlsx, sql
  "schemas": [],    // optional: specific schemas to export (default: all)
  "options": {
    "sql_dialect": "postgresql"  // for SQL: postgresql, mysql, sqlite
  }
}

Response

{
  "job_id": "...",
  "export_paths": {
    "clients": "/exports/job-xxx/clients.csv",
    "transactions": "/exports/job-xxx/transactions.csv"
  },
  "format": "csv"
}

GET /api/v1/jobs/{job_id}/download/{schema_name}

Download an exported file for a specific schema.

Query Parameters

format (string): Export format (csv, json, xlsx, sql)

Response

The exported file, returned as a download with a Content-Type header that matches the requested format.
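
The download URL combines two path parameters with the `format` query parameter. A minimal sketch of building it safely, using the base URL from this document; the helper name `download_url` is an assumption.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000/api/v1"
EXPORT_FORMATS = {"csv", "json", "xlsx", "sql"}


def download_url(job_id: str, schema_name: str, export_format: str,
                 base_url: str = BASE_URL) -> str:
    """Build the download URL for one exported schema."""
    if export_format not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {export_format}")
    # Percent-encode path segments so unusual schema names stay valid.
    path = f"/jobs/{quote(job_id, safe='')}/download/{quote(schema_name, safe='')}"
    return f"{base_url}{path}?{urlencode({'format': export_format})}"
```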

Error Responses

All errors follow a consistent format:

{
  "detail": "Error message describing what went wrong",
  "error_code": "VALIDATION_ERROR",
  "status_code": 400
}

Common Error Codes

| Code | Status | Description |
| --- | --- | --- |
| `VALIDATION_ERROR` | 400 | Invalid request parameters |
| `NOT_FOUND` | 404 | Job or resource not found |
| `RATE_LIMITED` | 429 | Too many requests |
| `INTERNAL_ERROR` | 500 | Server error |
| `LLM_ERROR` | 502 | LLM provider error |
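
Because every error shares the same three fields, a client can map them onto a single exception type. The sketch below is illustrative only: the `ApiError` class and the choice of which codes count as retryable are assumptions, not something the API defines.

```python
class ApiError(Exception):
    """Client-side exception carrying the fields of an error response."""

    def __init__(self, detail: str, error_code: str, status_code: int):
        super().__init__(f"{status_code} {error_code}: {detail}")
        self.detail = detail
        self.error_code = error_code
        self.status_code = status_code


# Assumption: rate limits and LLM provider failures may succeed on retry.
RETRYABLE_CODES = {"RATE_LIMITED", "LLM_ERROR"}


def error_from_body(body: dict, http_status: int) -> ApiError:
    """Build an ApiError from a parsed error body, with fallbacks."""
    return ApiError(
        detail=body.get("detail", "Unknown error"),
        error_code=body.get("error_code", "INTERNAL_ERROR"),
        status_code=body.get("status_code", http_status),
    )


def is_retryable(error: ApiError) -> bool:
    return error.error_code in RETRYABLE_CODES
```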

Python Client Example

import time

import requests

BASE_URL = "http://localhost:8000/api/v1"
CONTEXT = "Healthcare clinic with patient records"

# Analyze context
response = requests.post(
    f"{BASE_URL}/context/analyze",
    json={"context": CONTEXT},
)
response.raise_for_status()  # fail fast on 4xx/5xx responses
analysis = response.json()

# Generate samples
response = requests.post(
    f"{BASE_URL}/generate/sample",
    json={"context": CONTEXT, "sample_size": 5},
)
response.raise_for_status()
samples = response.json()

# Start corpus generation (returns immediately with a job ID)
response = requests.post(
    f"{BASE_URL}/generate/corpus",
    json={"context": CONTEXT, "corpus_size": 1000},
)
response.raise_for_status()
job = response.json()
job_id = job["job_id"]

# Poll until the job reaches a terminal status
while True:
    response = requests.get(f"{BASE_URL}/jobs/{job_id}/status")
    response.raise_for_status()
    status = response.json()
    if status["status"] in ("completed", "failed"):
        break
    print(f"Progress: {status['percent_complete']:.1f}%")
    time.sleep(2)

# Export only makes sense for a completed job
if status["status"] != "completed":
    raise SystemExit(f"Job {job_id} ended with status {status['status']}")

# Export results
response = requests.post(
    f"{BASE_URL}/jobs/{job_id}/export",
    json={"format": "csv"},
)
response.raise_for_status()
export_info = response.json()
print(f"Files: {export_info['export_paths']}")

Webhook Notifications (Future)

Webhook support for job completion notifications is planned for a future release.
