Synthesize speech
POST /v1/voice/tts
Stream synthesized speech for up to 4096 characters of text. The provider body is piped through unbuffered, so the first audio bytes arrive before synthesis completes. The format field is one of mp3 (default), opus, or wav; opus routes to the OpenAI provider (for Telegram-style voice notes). Callers that synthesize a reply sentence by sentence can pass previous_text and next_text (each truncated to 600 characters) so ElevenLabs keeps prosody continuous across requests. They can also set first_chunk: true on a reply's first sentence, which lets the gateway swap in its fastest model when the faster-start setting is enabled.
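For sentence-by-sentence callers, the per-request bodies might be built roughly as below. This is a minimal sketch, not gateway code: split_sentences and build_tts_bodies are hypothetical helpers, and the regex splitter is a naive assumption. The gateway already truncates previous_text and next_text to 600 characters. This sketch also trims them client-side so that it keeps the context nearest the current sentence (the end of the preceding text, the start of the following text). The gateway's own cut may differ.

```python
from __future__ import annotations

import re

MAX_TEXT = 4096     # documented per-request text limit
MAX_CONTEXT = 600   # documented truncation for previous_text / next_text


def split_sentences(reply: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", reply.strip()) if s]


def build_tts_bodies(reply: str, fmt: str = "mp3", voice: str | None = None) -> list[dict]:
    """Build one hypothetical /v1/voice/tts request body per sentence of `reply`."""
    sentences = split_sentences(reply)
    bodies = []
    for i, sentence in enumerate(sentences):
        if len(sentence) > MAX_TEXT:
            raise ValueError(f"sentence {i} exceeds {MAX_TEXT} characters")
        # first_chunk is only set on the reply's first sentence; it has an effect
        # only when the gateway's faster-start setting is on.
        body = {"text": sentence, "format": fmt, "first_chunk": i == 0}
        before = " ".join(sentences[:i])
        after = " ".join(sentences[i + 1:])
        # Keep the context closest to the current sentence.
        if before:
            body["previous_text"] = before[-MAX_CONTEXT:]
        if after:
            body["next_text"] = after[:MAX_CONTEXT]
        if voice is not None:
            body["voice"] = voice
        bodies.append(body)
    return bodies
```

Each body is then sent as its own request, in order, and the audio responses are played back to back.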
| Field | Value |
| --- | --- |
| Method | POST |
| Path | /v1/voice/tts |
| Auth | Authorization: Bearer <token> required when GATEWAY_AUTH_TOKEN is set |
| Category | voice |
Request body
```json
{
  "text": "...",
  "voice": "optional",
  "format": "mp3",
  "first_chunk": false,
  "previous_text": "optional",
  "next_text": "optional"
}
```
Response body
audio bytes (audio/mpeg | audio/ogg | audio/wav)
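Because the body is streamed, a client can forward audio as it arrives instead of waiting for the whole response. The following is a minimal standard-library sketch, assuming the gateway is reachable at a caller-supplied base_url. The stream_tts helper is hypothetical. It sends the bearer token only when one is given, and urlopen raises HTTPError on a non-2xx status.

```python
from __future__ import annotations

import json
import urllib.request
from typing import BinaryIO


def stream_tts(base_url: str, body: dict, out: BinaryIO,
               token: str | None = None, chunk_size: int = 8192) -> tuple[str, int]:
    """POST `body` to /v1/voice/tts and copy the audio to `out` as it arrives.

    Returns (Content-Type, number of bytes written).
    """
    headers = {"Content-Type": "application/json"}
    if token:
        # Needed only when the gateway is configured with GATEWAY_AUTH_TOKEN.
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/voice/tts",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    written = 0
    with urllib.request.urlopen(req) as resp:
        # Expected to be audio/mpeg, audio/ogg or audio/wav.
        content_type = resp.headers.get("Content-Type", "")
        # read1() performs at most one underlying read and returns up to
        # chunk_size bytes, so audio is forwarded as it arrives rather than
        # after a full buffer has filled.
        while chunk := resp.read1(chunk_size):
            out.write(chunk)
            written += len(chunk)
    return content_type, written
```

Passing an audio sink or a player's stdin pipe as out, rather than a file, lets playback start on the first bytes.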