Synthesize speech

POST `/v1/voice/tts`

Stream synthesized speech for up to 4096 characters of text. The provider's response body is piped through unbuffered, so the first audio bytes arrive before synthesis completes. `format` is one of `mp3` (default), `opus`, or `wav`; `opus` routes to the OpenAI provider (Telegram-style voice notes). Callers that synthesize sentence by sentence can pass `previous_text` / `next_text` (each truncated to 600 characters) so that ElevenLabs keeps prosody continuous across requests, and can set `first_chunk: true` on a reply's first sentence to let the gateway swap in the fastest model when the faster-start setting is on.
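The sentence-by-sentence pattern described above could be sketched as follows. This is a minimal illustration, not the gateway's own client: the helper name `sentence_payloads` is hypothetical, and passing only the immediately neighboring sentence as context is an assumption (the gateway truncates each context field to 600 characters, so no client-side trimming is done here).

```python
from typing import Dict, List

MAX_TEXT_CHARS = 4096  # per-request text limit stated in the description


def sentence_payloads(sentences: List[str], fmt: str = "mp3") -> List[Dict[str, object]]:
    """Build one request body per sentence (hypothetical helper).

    Each body carries its neighboring sentences as prosody context, and only
    the first sentence of the reply is flagged with first_chunk.
    """
    payloads: List[Dict[str, object]] = []
    for i, sentence in enumerate(sentences):
        if len(sentence) > MAX_TEXT_CHARS:
            raise ValueError(f"sentence {i} exceeds {MAX_TEXT_CHARS} chars")
        body: Dict[str, object] = {
            "text": sentence,
            "format": fmt,
            # True only for the reply's first sentence.
            "first_chunk": i == 0,
        }
        # Assumption: one sentence of context on each side is enough.
        if i > 0:
            body["previous_text"] = sentences[i - 1]
        if i + 1 < len(sentences):
            body["next_text"] = sentences[i + 1]
        payloads.append(body)
    return payloads
```

Each returned dictionary can be serialized as the JSON request body for one call; the context fields are simply omitted at the start and end of the reply.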


Method	`POST`
Path	`/v1/voice/tts`
Auth	`Authorization: Bearer <token>` required when `GATEWAY_AUTH_TOKEN` is set
Category	voice

Request body

{ "text": "...", "voice": "optional", "format": "mp3", "first_chunk": false, "previous_text": "optional", "next_text": "optional" }
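As a rough sketch of how a client might assemble this request with the Python standard library: the function name `build_tts_request` and the `base_url` value are hypothetical, and sending `Content-Type: application/json` is an assumption, since the page shows a JSON body but does not state the expected content type. The `Authorization` header is attached only when a token is supplied, mirroring the auth rule above.

```python
import json
import urllib.request
from typing import Optional


def build_tts_request(base_url: str, text: str, fmt: str = "mp3",
                      token: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /v1/voice/tts (hypothetical helper)."""
    body = json.dumps({"text": text, "format": fmt}).encode("utf-8")
    # Assumption: the gateway expects a JSON content type.
    headers = {"Content-Type": "application/json"}
    if token:
        # Only needed when GATEWAY_AUTH_TOKEN is set on the gateway.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/voice/tts",
        data=body,
        headers=headers,
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call; optional fields such as `voice` or `first_chunk` would be added to the dictionary in the same way.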

Response body

audio bytes (audio/mpeg | audio/ogg | audio/wav)
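Because the body is streamed, a client can write audio out as it arrives instead of waiting for the full response. The sketch below shows one way to do that with any file-like response object. The helper names are hypothetical, and the mapping from content type to file extension is an assumption based on the three types listed above, not something the page specifies.

```python
import io
from typing import BinaryIO

# Assumed pairing of the listed content types with file extensions.
EXTENSIONS = {"audio/mpeg": ".mp3", "audio/ogg": ".ogg", "audio/wav": ".wav"}


def extension_for(content_type: str) -> str:
    """Pick a file extension from a Content-Type header value."""
    # Drop any parameters such as "; codecs=opus" before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(media_type, ".bin")


def copy_stream(response: BinaryIO, sink: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy audio to sink chunk by chunk and return the bytes written."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read means the stream has ended
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-in for a real HTTP response body.
fake_body = io.BytesIO(b"\x00" * 10000)
out = io.BytesIO()
written = copy_stream(fake_body, out)
```

In real use, `response` would be the object returned by an HTTP client (for example `urllib.request.urlopen`) and `sink` an open file or an audio player's input.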
