API Reference

The Synthetic Data Solution provides a REST API built with FastAPI for programmatic access to all features.

Base URL

http://localhost:8000/api/v1

Interactive Documentation

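When the server is running, interactive API docs are available. FastAPI's defaults are Swagger UI at /docs and ReDoc at /redoc on the server root, unless the deployment overrides them.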

Authentication

The API supports optional API key authentication via the X-API-Key header:

curl -H "X-API-Key: your-api-key" http://localhost:8000/api/v1/...

By default, the API runs in demo mode without authentication. Configure API keys for production use.
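As a minimal sketch of the optional header described above, the helper below builds the X-API-Key header only when a key is available, so the same client code works in demo mode. The environment variable name SYNTH_API_KEY and the function name auth_headers are illustrative assumptions, not part of the API.

```python
import os


def auth_headers(api_key=None):
    """Return request headers carrying X-API-Key, or {} in demo mode.

    SYNTH_API_KEY is a hypothetical environment variable used only for
    this sketch; the API itself only defines the X-API-Key header.
    """
    key = api_key or os.environ.get("SYNTH_API_KEY")
    return {"X-API-Key": key} if key else {}
```

The returned dict can be passed as the headers argument of any HTTP client call, for example requests.get(url, headers=auth_headers()).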

Rate Limiting

  • 60 requests per minute per client IP
  • 1000 requests per hour per client IP

Rate limit headers are included in responses:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
X-RateLimit-Reset: 1704067200
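A client can use these headers to pause before it hits the limit. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds (the sample value is consistent with that, but the source does not state it); the function name seconds_until_reset is hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request.

    Returns 0.0 while requests remain in the current window; otherwise
    the time left until X-RateLimit-Reset (assumed to be Unix seconds).
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    current = time.time() if now is None else now
    return max(0.0, float(reset_at - current))
```

Calling time.sleep(seconds_until_reset(response.headers)) before the next request keeps a simple client under the per-minute limit.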

Endpoints

Health Check

GET /health

Check API health status.

Response

{
  "status": "healthy",
  "version": "1.0.0",
  "timestamp": "2024-01-01T12:00:00Z"
}

Context Analysis

POST /api/v1/context/analyze

Analyze a context description to extract data requirements and domain classification.

Request Body

{
  "context": "Healthcare clinic with 200 patients needs patient records and appointment scheduling",
  "domain_hint": "healthcare"  // optional
}

Response

{
  "domain": "healthcare",
  "confidence": 0.95,
  "requirements": [
    {
      "name": "patients",
      "description": "Patient demographic records",
      "data_type": "tabular",
      "estimated_volume": "200"
    },
    {
      "name": "appointments",
      "description": "Appointment scheduling records",
      "data_type": "tabular",
      "estimated_volume": "1000"
    }
  ],
  "keywords": ["patient", "healthcare", "appointment", "clinic"],
  "suggested_schemas": ["patient_records", "appointments"]
}
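One way to consume the analysis response is to turn it into a small generation plan. The sketch below is illustrative only: the min_confidence threshold and the function name plan_from_analysis are assumptions, and it assumes estimated_volume always holds a numeric string, as in the example above.

```python
def plan_from_analysis(analysis, min_confidence=0.8):
    """Summarize an analysis response as domain, schemas, and total volume.

    Raises ValueError when the domain classification is below
    min_confidence (a hypothetical client-side threshold).
    """
    if analysis["confidence"] < min_confidence:
        raise ValueError(
            f"Low confidence ({analysis['confidence']}) for domain "
            f"{analysis['domain']!r}; consider passing domain_hint"
        )
    # estimated_volume arrives as a string, so convert before summing.
    volumes = {
        req["name"]: int(req["estimated_volume"])
        for req in analysis["requirements"]
    }
    return {
        "domain": analysis["domain"],
        "schemas": list(analysis["suggested_schemas"]),
        "total_records": sum(volumes.values()),
    }
```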

Data Generation

POST /api/v1/generate/sample

Generate a small sample of synthetic data for validation.

Request Body

{
  "context": "Legal firm case management with clients, cases, and documents",
  "sample_size": 5,
  "schemas": []  // optional: specific schemas to generate
}

Response

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "schemas": [
    {
      "name": "clients",
      "fields": [
        {"name": "id", "type": "UUID", "required": true},
        {"name": "name", "type": "STRING", "required": true},
        {"name": "email", "type": "EMAIL", "required": false}
      ]
    }
  ],
  "samples": {
    "clients": [
      {"id": "...", "name": "John Smith", "email": "john@example.com"},
      {"id": "...", "name": "Jane Doe", "email": "jane@example.com"}
    ]
  },
  "generation_time_ms": 1234
}
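Because the sample endpoint exists for validation, a client may want to check the returned rows against the returned schemas before approving them. The following sketch reports required fields that are missing or empty; the function name missing_required_fields is hypothetical, and the check is a client-side convenience rather than something the API performs.

```python
def missing_required_fields(response):
    """Return (schema_name, row_index, field_name) for each required
    field that is absent, None, or an empty string in the samples."""
    problems = []
    for schema in response["schemas"]:
        required = [f["name"] for f in schema["fields"] if f.get("required")]
        rows = response["samples"].get(schema["name"], [])
        for index, row in enumerate(rows):
            for field in required:
                if row.get(field) in (None, ""):
                    problems.append((schema["name"], index, field))
    return problems
```

An empty list means every sample row carries all required fields.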

POST /api/v1/generate/corpus

Start asynchronous corpus generation. Returns immediately with a job ID.

Request Body

{
  "context": "Financial services with customer accounts and transactions",
  "corpus_size": 10000,
  "batch_size": 100,
  "schemas": [],  // optional: specific schemas to generate
  "export_format": "csv"  // optional: csv, json, xlsx, sql
}

Response

{
  "job_id": "550e8400-e29b-41d4-a716-446655440001",
  "status": "in_progress",
  "message": "Corpus generation started",
  "estimated_completion": "2024-01-01T12:10:00Z"
}

POST /api/v1/generate/feedback

Submit feedback on generated samples and optionally start corpus generation.

Request Body

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "approved": true,
  "schema_feedback": [
    {
      "schema_name": "clients",
      "feedback_type": "modify",
      "field_adjustments": {
        "email": {"required": true}
      },
      "add_fields": [
        {"name": "phone", "type": "PHONE"}
      ],
      "remove_fields": ["fax"]
    }
  ],
  "start_corpus": true,
  "corpus_size": 5000
}

Response

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "in_progress",
  "message": "Feedback applied, corpus generation started"
}
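The feedback body has several nested parts, so building it with small helpers can reduce mistakes. This is a sketch using only the fields shown in the request above; the helper names modify_schema and build_feedback are hypothetical, and only the "modify" feedback type is covered because it is the only one the example documents.

```python
def modify_schema(schema_name, field_adjustments=None,
                  add_fields=None, remove_fields=None):
    """Build one schema_feedback entry of type "modify"."""
    return {
        "schema_name": schema_name,
        "feedback_type": "modify",
        "field_adjustments": field_adjustments or {},
        "add_fields": add_fields or [],
        "remove_fields": remove_fields or [],
    }


def build_feedback(job_id, approved, schema_feedback=(), corpus_size=None):
    """Build the body for POST /api/v1/generate/feedback.

    Corpus generation is requested only when corpus_size is given.
    """
    payload = {
        "job_id": job_id,
        "approved": approved,
        "schema_feedback": list(schema_feedback),
        "start_corpus": corpus_size is not None,
    }
    if corpus_size is not None:
        payload["corpus_size"] = corpus_size
    return payload
```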

Job Management

GET /api/v1/jobs

List all jobs with pagination.

Query Parameters

  • skip (int): Number of jobs to skip (default: 0)
  • limit (int): Maximum jobs to return (default: 20, max: 100)
  • status (string): Filter by status (pending, in_progress, completed, failed)

Response

{
  "jobs": [
    {
      "job_id": "...",
      "status": "completed",
      "created_at": "2024-01-01T12:00:00Z",
      "stage": "corpus_generated"
    }
  ],
  "total": 42,
  "skip": 0,
  "limit": 20
}
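The skip, limit, and total fields are enough to walk every page. The generator below is a sketch: fetch_page is a hypothetical callable supplied by the caller that performs GET /api/v1/jobs with the given skip and limit and returns the parsed JSON body.

```python
def iter_jobs(fetch_page, limit=20):
    """Yield every job by requesting successive pages.

    fetch_page(skip, limit) must return a dict shaped like the
    response above ("jobs", "total", "skip", "limit").
    """
    limit = max(1, min(limit, 100))  # the API caps limit at 100
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        yield from page["jobs"]
        skip += limit
        # Stop once the offset passes the total or a page comes back empty.
        if skip >= page["total"] or not page["jobs"]:
            break
```

With requests, fetch_page could be as simple as lambda skip, limit: requests.get(f"{BASE_URL}/jobs", params={"skip": skip, "limit": limit}).json().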

GET /api/v1/jobs/{job_id}

Get detailed information about a specific job.

Response

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "in_progress",
  "stage": "corpus_generated",
  "created_at": "2024-01-01T12:00:00Z",
  "updated_at": "2024-01-01T12:05:00Z",
  "progress": {
    "total_schemas": 3,
    "completed_schemas": 2,
    "current_schema": "transactions",
    "total_records": 10000,
    "generated_records": 7500,
    "percent_complete": 75.0
  },
  "schemas": [...],
  "errors": [],
  "warnings": []
}

GET /api/v1/jobs/{job_id}/status

Quick status check for a job (lighter payload than full job details).

Response

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "in_progress",
  "stage": "corpus_generated",
  "percent_complete": 75.0
}
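Since this endpoint is intended for frequent checks, it suits a polling loop with an upper time bound. The sketch below assumes "completed" and "failed" are the only terminal statuses (they are the terminal values listed for the status filter); get_status is a hypothetical callable that fetches and parses this endpoint's response, and the timeout and interval defaults are arbitrary.

```python
import time


def wait_for_job(get_status, timeout_s=600.0, interval_s=2.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the job is terminal or timeout_s elapses.

    Returns the final status dict; raises TimeoutError otherwise.
    sleep and clock are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status["status"] in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Job still {status['status']} after {timeout_s}s")
        sleep(interval_s)
```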

DELETE /api/v1/jobs/{job_id}

Delete a job and its associated data.

Response

{
  "message": "Job deleted successfully"
}

POST /api/v1/jobs/{job_id}/cancel

Cancel a running job.

Response

{
  "job_id": "...",
  "status": "failed",
  "message": "Job cancelled by user"
}

Export

POST /api/v1/jobs/{job_id}/export

Export completed job results to a specific format.

Request Body

{
  "format": "csv",  // csv, json, xlsx, sql
  "schemas": [],    // optional: specific schemas to export (default: all)
  "options": {
    "sql_dialect": "postgresql"  // for SQL: postgresql, mysql, sqlite
  }
}

Response

{
  "job_id": "...",
  "export_paths": {
    "clients": "/exports/job-xxx/clients.csv",
    "transactions": "/exports/job-xxx/transactions.csv"
  },
  "format": "csv"
}
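The export request accepts a fixed set of formats, and sql_dialect only applies to SQL exports. A small builder can enforce that on the client side before the request is sent. This is a sketch: the function name build_export_request is hypothetical, and omitting "schemas" to export everything relies on the field being optional as noted above.

```python
EXPORT_FORMATS = {"csv", "json", "xlsx", "sql"}
SQL_DIALECTS = {"postgresql", "mysql", "sqlite"}


def build_export_request(fmt, schemas=None, sql_dialect=None):
    """Build the body for POST /api/v1/jobs/{job_id}/export."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    body = {"format": fmt}
    if schemas:
        body["schemas"] = list(schemas)
    if sql_dialect is not None:
        if fmt != "sql":
            raise ValueError("sql_dialect only applies to the sql format")
        if sql_dialect not in SQL_DIALECTS:
            raise ValueError(f"Unsupported SQL dialect: {sql_dialect!r}")
        body["options"] = {"sql_dialect": sql_dialect}
    return body
```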

GET /api/v1/jobs/{job_id}/download/{schema_name}

Download an exported file for a specific schema.

Query Parameters

  • format (string): Export format (csv, json, xlsx, sql)

Response

  • File download with appropriate Content-Type header
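A download can be streamed straight to disk with the standard library. The sketch below assumes the base URL from the top of this page and no authentication; download_url and download_schema are hypothetical helper names.

```python
import shutil
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000/api/v1"


def download_url(job_id, schema_name, fmt="csv"):
    """Build the download URL, escaping path segments and the query."""
    job = quote(job_id, safe="")
    schema = quote(schema_name, safe="")
    query = urlencode({"format": fmt})
    return f"{BASE_URL}/jobs/{job}/download/{schema}?{query}"


def download_schema(job_id, schema_name, dest_path, fmt="csv"):
    """Stream the exported file to dest_path without loading it in memory."""
    url = download_url(job_id, schema_name, fmt)
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path
```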

Error Responses

All errors follow a consistent format:

{
  "detail": "Error message describing what went wrong",
  "error_code": "VALIDATION_ERROR",
  "status_code": 400
}

Common Error Codes

Code              Status  Description
VALIDATION_ERROR  400     Invalid request parameters
NOT_FOUND         404     Job or resource not found
RATE_LIMITED      429     Too many requests
INTERNAL_ERROR    500     Server error
LLM_ERROR         502     LLM provider error
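Because every error shares the same three fields, a client can map error bodies to a single exception type. The sketch below is one possible approach; the ApiError class, the error_from_body helper, and the choice to treat RATE_LIMITED and LLM_ERROR as retryable are all assumptions, not documented API behavior.

```python
class ApiError(Exception):
    """Client-side wrapper for the error format shown above."""

    # Assumption: these codes describe transient conditions worth retrying.
    RETRYABLE_CODES = frozenset({"RATE_LIMITED", "LLM_ERROR"})

    def __init__(self, detail, error_code, status_code):
        super().__init__(f"{error_code} ({status_code}): {detail}")
        self.detail = detail
        self.error_code = error_code
        self.status_code = status_code

    @property
    def retryable(self):
        return self.error_code in self.RETRYABLE_CODES


def error_from_body(body):
    """Convert a parsed error body into an ApiError."""
    return ApiError(
        detail=body.get("detail", "Unknown error"),
        error_code=body.get("error_code", "INTERNAL_ERROR"),
        status_code=int(body.get("status_code", 500)),
    )
```

A caller would typically raise error_from_body(response.json()) when the HTTP status is 400 or higher, then decide whether to retry based on the retryable property.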

Python Client Example

import time

import requests

BASE_URL = "http://localhost:8000/api/v1"
CONTEXT = "Healthcare clinic with patient records"

# Analyze context
response = requests.post(
    f"{BASE_URL}/context/analyze",
    json={"context": CONTEXT},
)
response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error body
analysis = response.json()

# Generate samples
response = requests.post(
    f"{BASE_URL}/generate/sample",
    json={"context": CONTEXT, "sample_size": 5},
)
response.raise_for_status()
samples = response.json()

# Start corpus generation (returns immediately with a job ID)
response = requests.post(
    f"{BASE_URL}/generate/corpus",
    json={"context": CONTEXT, "corpus_size": 1000},
)
response.raise_for_status()
job = response.json()
job_id = job["job_id"]

# Poll the lightweight status endpoint until the job finishes
while True:
    response = requests.get(f"{BASE_URL}/jobs/{job_id}/status")
    response.raise_for_status()
    status = response.json()
    if status["status"] in ("completed", "failed"):
        break
    print(f"Progress: {status['percent_complete']:.1f}%")
    time.sleep(2)

# Export only makes sense for a job that completed successfully
if status["status"] != "completed":
    raise RuntimeError(f"Job {job_id} ended with status {status['status']}")

# Export results
response = requests.post(
    f"{BASE_URL}/jobs/{job_id}/export",
    json={"format": "csv"},
)
response.raise_for_status()
export_info = response.json()
print(f"Files: {export_info['export_paths']}")

Webhook Notifications (Future)

Webhook support for job completion notifications is planned for a future release.