A high-performance FastAPI-based proxy server for the ElevenLabs text-to-speech API. This proxy provides a simple interface for converting text to speech with support for multiple audio formats and streaming capabilities.
- FastAPI Framework: High-performance async API server
- Multiple Audio Formats: Support for MP3, PCM, ULAW, ALAW, and OPUS formats
- Streaming Support: Real-time audio streaming capabilities
- Docker Ready: Containerized deployment with Docker Compose
- Health Checks: Built-in health monitoring endpoints
- Environment Configuration: Flexible configuration via environment variables
- Production Ready: Non-root user execution and security best practices
- MP3: mp3_22050_32, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128, mp3_44100_192
- PCM: pcm_8000, pcm_16000, pcm_22050, pcm_24000, pcm_44100, pcm_48000
- ULAW/ALAW: ulaw_8000, alaw_8000
- OPUS: opus_48000_32, opus_48000_64, opus_48000_96, opus_48000_128, opus_48000_192
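
The format identifiers above follow a `codec_samplerate[_bitrate]` pattern. As a rough illustration of how a caller might validate and split one before sending a request, here is a minimal sketch. The names `SUPPORTED_FORMATS`, `AudioFormat`, and `parse_output_format` are hypothetical and not taken from this project, and reading the third field as a bitrate in kbps is an assumption based on the ElevenLabs naming convention.

```python
from dataclasses import dataclass
from typing import Optional

# Format identifiers copied from the list above.
SUPPORTED_FORMATS = frozenset({
    "mp3_22050_32", "mp3_44100_32", "mp3_44100_64",
    "mp3_44100_96", "mp3_44100_128", "mp3_44100_192",
    "pcm_8000", "pcm_16000", "pcm_22050",
    "pcm_24000", "pcm_44100", "pcm_48000",
    "ulaw_8000", "alaw_8000",
    "opus_48000_32", "opus_48000_64", "opus_48000_96",
    "opus_48000_128", "opus_48000_192",
})


@dataclass(frozen=True)
class AudioFormat:
    codec: str
    sample_rate: int             # in Hz
    bitrate_kbps: Optional[int]  # assumed meaning of the third field; None if absent


def parse_output_format(value: str) -> AudioFormat:
    """Reject unknown identifiers, then split a known one into its parts."""
    if value not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output_format: {value!r}")
    codec, rate, *rest = value.split("_")
    return AudioFormat(codec, int(rate), int(rest[0]) if rest else None)
```

Checking against a fixed set before parsing keeps the parser trivial and means a typo such as `mp3_4410_128` fails early rather than being forwarded upstream.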
- Python 3.11+
- ElevenLabs API key
- Docker (optional, for containerized deployment)
- Clone the repository:

      git clone <repository-url>
      cd 11px

- Install dependencies with UV (recommended):

      pip install uv
      uv sync

  Or with pip:

      pip install -r requirements.txt

- Set up environment variables. Create a .env file in the project root:

      API_KEY=your_elevenlabs_api_key
      HOST=0.0.0.0
      PORT=8000
      WORKERS=1
      DEBUG=false

- Run the server:

      python main.py
- Create environment file:

      cp .env.example .env
      # Edit .env with your configuration

- Start with Docker Compose:

      docker-compose up -d

- Check health:

      curl http://localhost:8000/ping
Base URL: http://localhost:8000
GET /ping

Returns: {"ping": "pong"}
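
A script that starts the proxy may want to confirm it is up before sending work to it. The following is a minimal sketch of such a check using only the standard library. The function names `is_pong` and `is_healthy` are hypothetical, and the default base URL assumes the default PORT of 8000.

```python
import json
import urllib.request


def is_pong(body: bytes) -> bool:
    """Return True if the body is exactly the documented {"ping": "pong"} JSON."""
    try:
        return json.loads(body) == {"ping": "pong"}
    except ValueError:
        # Not valid JSON (or not valid UTF-8).
        return False


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Call GET /ping and report whether the proxy answered as documented."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return resp.status == 200 and is_pong(resp.read())
    except OSError:
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Keeping the body check separate from the network call makes the parsing logic easy to test without a running server.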
POST /v1/text-to-speech/{voice_id}

Request Body:
{
  "text": "Hello, world!",
  "output_format": "mp3_44100_128",
  "model_id": "eleven_multilingual_v2"
}

Example:
curl -X POST "http://localhost:8000/v1/text-to-speech/YOUR_VOICE_ID" \
-H "Content-Type: application/json" \
-d '{
"text": "Hello, this is a test message.",
"output_format": "mp3_44100_128"
}' \
  --output audio.mp3

POST /v1/text-to-speech/{voice_id}/stream

Same request format as above, but returns a streaming response for real-time audio playback.
- voice_id: ElevenLabs voice ID
- text: Text to convert to speech
- output_format: Audio format (see supported formats above)
- model_id: ElevenLabs model ID (optional)
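
For callers who prefer Python to curl, the two text-to-speech endpoints can be exercised with the standard library alone. This is a sketch under stated assumptions, not the project's own client: `build_tts_request` and `copy_audio` are hypothetical helper names, and the default `output_format` simply mirrors the curl example.

```python
import json
import urllib.parse
import urllib.request
from typing import BinaryIO, Optional


def build_tts_request(
    base_url: str,
    voice_id: str,
    text: str,
    output_format: str = "mp3_44100_128",
    model_id: Optional[str] = None,
    stream: bool = False,
) -> urllib.request.Request:
    """Build the POST request for the plain or the /stream endpoint."""
    path = "/v1/text-to-speech/" + urllib.parse.quote(voice_id, safe="")
    if stream:
        path += "/stream"
    payload = {"text": text, "output_format": output_format}
    if model_id is not None:  # model_id is optional, so omit it when unset
        payload["model_id"] = model_id
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def copy_audio(source: BinaryIO, dest: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy audio bytes chunk by chunk; returns the number of bytes written."""
    total = 0
    while chunk := source.read(chunk_size):
        dest.write(chunk)
        total += len(chunk)
    return total
```

To save streamed audio, pass the request to `urllib.request.urlopen` and hand the response object and a file opened in binary write mode to `copy_audio`. Reading in chunks avoids holding the whole clip in memory, which matters most for the `/stream` endpoint.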
| Variable | Description | Default |
|---|---|---|
| API_KEY | ElevenLabs API key | Required |
| HOST | Server host | 0.0.0.0 |
| PORT | Server port | 8000 |
| WORKERS | Number of worker processes | 1 |
| DEBUG | Enable debug logging | false |
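
One way such variables could be read at startup is sketched below. This is illustrative only and may differ from what `main.py` actually does: the `Settings` class and `load_settings` function are hypothetical names, and treating `1` and `yes` as true values for DEBUG is an assumption beyond the documented `true`/`false`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    api_key: str
    host: str = "0.0.0.0"
    port: int = 8000
    workers: int = 1
    debug: bool = False


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the defaults from the table."""
    api_key = env.get("API_KEY", "").strip()
    if not api_key:
        # API_KEY has no default, so fail fast with a clear message.
        raise RuntimeError("API_KEY is required")
    return Settings(
        api_key=api_key,
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        workers=int(env.get("WORKERS", "1")),
        # Assumption: accept a few common spellings of "true".
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    )
```

Accepting any mapping rather than reading `os.environ` directly lets the loader be exercised with a plain dictionary, for example `load_settings({"API_KEY": "test"})`.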
The Docker setup includes:
- Multi-stage build for optimized image size
- Non-root user for security
- Health checks for monitoring
- UV package manager for faster dependency installation
- Network host mode for optimal performance
Run the test suite:
python test.py

The application includes structured logging with configurable levels:
- Production: INFO level
- Development: DEBUG level (set DEBUG=true)
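
The mapping from the DEBUG variable to a log level can be illustrated with the standard `logging` module. This is a sketch of the idea, not the project's logging setup: `log_level` and `configure_logging` are hypothetical names, and the plain-text format string is a stand-in, since the structured format the application uses is not described here.

```python
import logging
import os
from typing import Mapping


def log_level(env: Mapping[str, str] = os.environ) -> int:
    """DEBUG=true selects DEBUG; anything else falls back to INFO."""
    debug = env.get("DEBUG", "false").strip().lower() == "true"
    return logging.DEBUG if debug else logging.INFO


def configure_logging(env: Mapping[str, str] = os.environ) -> None:
    """Apply the chosen level to the root logger (format is an assumption)."""
    logging.basicConfig(
        level=log_level(env),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
```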
- Non-root container execution
- Environment-based configuration
- Input validation for audio formats
- Error handling with appropriate HTTP status codes
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
[Add your license information here]
For issues and questions:
- Check the Issues page
- Review the ElevenLabs API documentation
- Ensure your API key is valid and has sufficient credits