32 commits
07b321e
Add importables + Gradio GUI
fakerybakery Nov 24, 2023
48c2c20
Update README.md
fakerybakery Nov 24, 2023
acf0c9e
Add API!
fakerybakery Nov 24, 2023
66f241f
Update API
fakerybakery Nov 24, 2023
bce4029
Sort out GPL issues
fakerybakery Nov 24, 2023
5b5210f
Whoops, accidentally added GPLv2 instead of v3
fakerybakery Nov 24, 2023
6f90d34
GPLV3
fakerybakery Nov 24, 2023
1877f04
Add voices
fakerybakery Nov 24, 2023
56b2c1b
Add CORS dep
fakerybakery Nov 24, 2023
fc42d11
Update CORS
fakerybakery Nov 24, 2023
5881879
Fix
fakerybakery Nov 24, 2023
3bfba57
Finally fix it
fakerybakery Nov 24, 2023
4a6ee7e
Another fix
fakerybakery Nov 24, 2023
7043935
fix typo
fakerybakery Nov 24, 2023
c103347
Merge branch 'yl4579:main' into main
fakerybakery Nov 26, 2023
4cacb3b
Merge branch 'yl4579:main' into main
fakerybakery Nov 27, 2023
a7ca030
Merge branch 'yl4579:main' into main
fakerybakery Nov 30, 2023
c294fcb
Merge branch 'yl4579:main' into main
fakerybakery Dec 7, 2023
79eede3
Merge branch 'yl4579:main' into main
fakerybakery Dec 10, 2023
793193d
Merge branch 'yl4579:main' into main
fakerybakery Dec 16, 2023
72614d3
Merge branch 'yl4579:main' into main
fakerybakery Dec 18, 2023
fb34531
Update README.md
fakerybakery Jan 22, 2024
b0e9fa1
Update README.md
fakerybakery Jan 22, 2024
2849d7e
Update README.md
fakerybakery Mar 11, 2024
e89c88e
Update README.md
fakerybakery Mar 11, 2024
321afd9
Update README.md
fakerybakery Mar 22, 2024
5a4c62e
Update README.md
fakerybakery Mar 26, 2024
bdd48b8
Update README.md
fakerybakery Apr 27, 2024
7b3c5e9
Update README.md
fakerybakery Apr 27, 2024
8631959
feat: Add streaming API and WebSocket support
ShabbirMarfatiya Dec 13, 2025
5540df1
Merge pull request #1 from ShabbirMarfatiya/feat/streaming-api-and-we…
ShabbirMarfatiya Dec 13, 2025
d468327
fix: PyTorch torch.load error
ShabbirMarfatiya Dec 13, 2025
151 changes: 151 additions & 0 deletions API_DOCS.md
@@ -0,0 +1,151 @@
# StyleTTS2 HTTP Streaming API Documentation

## Overview

The HTTP Streaming API provides text-to-speech synthesis with real-time audio streaming. The server uses Flask and returns WAV audio data.

## Base URL

```
http://localhost:5000
```

## Endpoints

### GET /

Returns API documentation in HTML format.

---

### POST /api/v1/stream

Synthesizes speech from text and streams the audio back as it is generated.

**Request Body (form-data):**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `text` | string | Yes | Text to synthesize |
| `voice` | string | Yes | Voice ID (see available voices below) |
| `steps` | integer | No | Number of diffusion steps (default: 7). Higher values typically improve quality but take longer to synthesize. |

**Response:**
- Content-Type: `audio/x-wav`
- Streams WAV audio data in chunks

**Example with curl:**

```bash
curl -X POST http://localhost:5000/api/v1/stream \
  -d "text=Hello, this is a test of the streaming API." \
  -d "voice=f-us-1" \
  -d "steps=7" \
  --output output.wav
```
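
The form fields from the parameter table can also be assembled and checked before a request is sent. The sketch below is illustrative only: `build_stream_payload` is a hypothetical helper, not part of the API, and the rule that `steps` must be a positive integer is an assumption rather than documented server behavior.

```python
def build_stream_payload(text, voice, steps=None):
    """Return form fields for POST /api/v1/stream (hypothetical helper)."""
    # "text" and "voice" are the two required parameters.
    if not text or not voice:
        raise ValueError('"text" and "voice" are required')
    payload = {"text": text, "voice": voice}
    # "steps" is optional; when omitted, the documented default of 7 applies.
    if steps is not None:
        steps = int(steps)
        if steps < 1:  # assumption: only positive step counts make sense
            raise ValueError("steps must be a positive integer")
        payload["steps"] = str(steps)
    return payload
```

The returned dictionary can be passed as the `data` argument of an HTTP client's POST call.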

**Example with Python:**

```python
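import requests

response = requests.post(
    "http://localhost:5000/api/v1/stream",
    data={
        "text": "Hello, this is a test.",
        "voice": "f-us-1",
        "steps": 7,
    },
    stream=True,
)
# Fail early so a JSON error body is not written to the WAV file.
response.raise_for_status()

# Write the audio to disk chunk by chunk as it arrives.
with open("output.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)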
```
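
A client that plays audio as it arrives, rather than saving it, usually needs the sample rate and channel count from the start of the stream. The following sketch decodes those fields from the first bytes received. It assumes the stream begins with a RIFF descriptor followed directly by a 16-byte PCM `fmt ` chunk; the source does not confirm that layout, and `parse_wav_header` is a hypothetical helper.

```python
import struct


def parse_wav_header(first_bytes):
    """Decode the RIFF descriptor and PCM fmt chunk (assumed layout)."""
    if len(first_bytes) < 36:
        raise ValueError("need at least 36 bytes to read the header")
    (riff, _riff_size, wave_id, fmt_id, _fmt_size, audio_format,
     channels, sample_rate, _byte_rate, _block_align,
     bits_per_sample) = struct.unpack("<4sI4s4sIHHIIHH", first_bytes[:36])
    if riff != b"RIFF" or wave_id != b"WAVE" or fmt_id != b"fmt ":
        raise ValueError("not a RIFF/WAVE stream with a leading fmt chunk")
    return {
        "audio_format": audio_format,  # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits_per_sample,
    }
```

Streaming servers sometimes write placeholder sizes in the header because the total length is unknown up front, so the size fields are deliberately ignored here.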

---

### POST /api/v1/static

Synthesizes speech from text and returns a complete audio file.

**Request Body (form-data):**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `text` | string | Yes | Text to synthesize |
| `voice` | string | Yes | Voice ID |

**Response:**
- Content-Type: `audio/wav`
- Returns a complete WAV file

**Example:**

```bash
curl -X POST http://localhost:5000/api/v1/static \
  -d "text=Hello world" \
  -d "voice=m-us-1" \
  --output output.wav
```
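
For clients that prefer to avoid third-party dependencies, the same request can be sketched with the Python standard library. The function names are hypothetical, and the base URL assumes the server is running locally on the default port.

```python
import urllib.parse
import urllib.request


def build_static_request(text, voice, base_url="http://localhost:5000"):
    """Build a form-encoded POST request for /api/v1/static."""
    form = urllib.parse.urlencode({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/api/v1/static", data=form, method="POST")


def synthesize_static(text, voice, out_path, base_url="http://localhost:5000"):
    """Send the request and save the returned WAV file; returns bytes written."""
    request = build_static_request(text, voice, base_url)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)


# Usage (requires a running server):
# synthesize_static("Hello world", "m-us-1", "output.wav")
```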

---

## Available Voices

| Voice ID | Description |
|----------|-------------|
| `f-us-1` | Female US English #1 |
| `f-us-2` | Female US English #2 |
| `f-us-3` | Female US English #3 |
| `f-us-4` | Female US English #4 |
| `m-us-1` | Male US English #1 |
| `m-us-2` | Male US English #2 |
| `m-us-3` | Male US English #3 |
| `m-us-4` | Male US English #4 |
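

Because an invalid voice is rejected with a `400` error, a client may want to check the ID locally first. This is a minimal sketch that hard-codes the table above; `VOICES` and `validate_voice` are hypothetical names, and the list would need updating if the server's voices change.

```python
# Voice IDs copied from the table above.
VOICES = {
    "f-us-1": "Female US English #1",
    "f-us-2": "Female US English #2",
    "f-us-3": "Female US English #3",
    "f-us-4": "Female US English #4",
    "m-us-1": "Male US English #1",
    "m-us-2": "Male US English #2",
    "m-us-3": "Male US English #3",
    "m-us-4": "Male US English #4",
}


def validate_voice(voice):
    """Return the voice ID unchanged, or raise ValueError if it is unknown."""
    if voice not in VOICES:
        known = ", ".join(sorted(VOICES))
        raise ValueError(f"unknown voice {voice!r}; choose one of: {known}")
    return voice
```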

---

## Error Responses

All errors return JSON with an `error` field:

```json
{
  "error": "Missing required fields. Please include \"text\" and \"voice\" in your request."
}
```

**Common errors:**
- `400`: Missing required fields or invalid voice selection
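

On the client side, the `error` field can be pulled out of a failed response before any audio is written to disk. The helper below is a hypothetical sketch; the fallback messages for non-JSON bodies are an assumption added for robustness, not documented server behavior.

```python
import json


def extract_error(status_code, body):
    """Return the error message of a failed response, or None on success."""
    if status_code < 400:
        return None
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        # Assumption: some failures (e.g. proxy errors) may not carry JSON.
        return f"HTTP {status_code} (non-JSON body)"
    if isinstance(payload, dict) and "error" in payload:
        return str(payload["error"])
    return f"HTTP {status_code} (no error field)"
```

With `requests`, this could be called as `extract_error(response.status_code, response.content)`.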

---

## Testing

Use the provided test client:

```bash
# List available voices
python test_api_client.py --list-voices

# Check server status
python test_api_client.py --check-server

# Synthesize speech
python test_api_client.py -t "Hello world" -v f-us-1 -o output.wav

# With custom diffusion steps
python test_api_client.py -t "Hello world" -v m-us-2 -o output.wav -s 10
```

---

## Starting the Server

```bash
python api.py
```

The server starts on `http://0.0.0.0:5000` by default.
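

Scripts that launch the server and then call it immediately may need to wait until the port accepts connections, since startup may take some time. The sketch below polls the port with a plain TCP connection. `wait_for_server` is a hypothetical helper, and the timeout values are arbitrary.

```python
import socket
import time


def wait_for_server(host="localhost", port=5000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connection means the server is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

This confirms only that the port is open, not that synthesis will succeed.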