Velvet Voice
A minimal Next.js 16 app with a single centered panel and a token-protected API that sends text to a Python bridge using Pipecat's Deepgram TTS service and returns streamable Opus audio in a WebM container.
Stack
- Next.js 16.1.1 with the App Router
- React 19
- Python bridge script using pipecat-ai[deepgram]
- Deepgram Aura-2 TTS voice, defaulting to aura-2-juno-en
- ffmpeg for Opus-in-WebM transcoding
Local setup
- Copy .env.example to .env and add your DEEPGRAM_API_KEY.
- Create a supported Python 3.13 virtual environment with /opt/homebrew/bin/python3.13 -m venv .venv-py313.
- Install Python dependencies with .venv-py313/bin/pip install -r requirements.txt.
- Install Node dependencies with npm install.
- Install ffmpeg if it is not already available.
- Start the app with npm run dev.
Environment variables
- DEEPGRAM_API_KEY: required.
- DEEPGRAM_VOICE: optional default voice model. The app defaults to aura-2-juno-en.
- PYTHON_BIN: Python interpreter for the Pipecat bridge. Use .venv-py313/bin/python.
- FFMPEG_BIN: path to ffmpeg. On this machine it is /opt/homebrew/bin/ffmpeg.
- VOICE_API_TOKENS: comma-separated or newline-separated bearer tokens for the external API.
Generate a token with:
npm run token:generate
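The token script's implementation is not shown in this README. As a rough sketch, assuming tokens are opaque random strings and that VOICE_API_TOKENS is split on commas and newlines with surrounding whitespace ignored, the following shows one way a token could be produced and the variable parsed. The names generate_token and parse_tokens are hypothetical and not part of the app.

```python
import re
import secrets


def generate_token(num_bytes: int = 32) -> str:
    """Return a URL-safe random token (hypothetical stand-in for the npm script)."""
    return secrets.token_urlsafe(num_bytes)


def parse_tokens(raw: str) -> set[str]:
    """Split a VOICE_API_TOKENS value on commas or newlines, dropping blanks."""
    # Assumption: whitespace around each token is not significant.
    return {part.strip() for part in re.split(r"[,\n]", raw) if part.strip()}


if __name__ == "__main__":
    print(generate_token())
    print(sorted(parse_tokens("token-a, token-b\ntoken-c\n")))
```

Storing the parsed tokens in a set makes duplicates in the variable harmless and keeps lookups cheap.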
API
Public demo route
POST /api/speak
- No token required.
- Intended for the local demo UI.
External protected route
POST /api/v1/speak
- Requires Authorization: Bearer <token> or x-api-token: <token>.
- Streams audio/webm with Opus audio in a WebM container.
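To make the two accepted header forms concrete, here is a minimal sketch of how a server could pull the token out of a request and compare it against the configured list. It is not the app's actual route code: the function names are hypothetical, and the precedence (bearer token first, x-api-token as fallback) is an assumption.

```python
import hmac
from typing import Iterable, Mapping, Optional


def extract_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from Authorization: Bearer or x-api-token, else None."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    scheme, _, credential = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and credential.strip():
        return credential.strip()

    # Assumption: x-api-token is only consulted when no bearer token is present.
    fallback = lowered.get("x-api-token", "").strip()
    return fallback or None


def is_authorized(headers: Mapping[str, str], allowed: Iterable[str]) -> bool:
    """Check the presented token against the allowed tokens."""
    token = extract_token(headers)
    if token is None:
        return False
    # compare_digest avoids leaking match position through timing differences.
    return any(
        hmac.compare_digest(token.encode("utf-8"), candidate.encode("utf-8"))
        for candidate in allowed
    )
```

For example, `is_authorized({"x-api-token": "dev-voice-token-change-me"}, {"dev-voice-token-change-me"})` evaluates to `True`.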
Request body
{
"text": "Turn this paragraph into audio.",
"voice": "aura-2-juno-en",
"bitrateKbps": 24
}
Request rules
- text is required and capped at 1200 characters.
- voice is optional.
- bitrateKbps is optional and must be an integer from 16 to 32. The default is 24.
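A client can check these rules before sending a request. The sketch below mirrors them in a hypothetical helper, build_speak_payload. It assumes that whitespace-only text counts as missing and that the 1200-character cap is measured as Python string length; the server may count differently.

```python
from typing import Any, Optional

MAX_TEXT_CHARS = 1200
MIN_BITRATE_KBPS = 16
MAX_BITRATE_KBPS = 32
DEFAULT_BITRATE_KBPS = 24


def build_speak_payload(
    text: str,
    voice: Optional[str] = None,
    bitrate_kbps: int = DEFAULT_BITRATE_KBPS,
) -> dict[str, Any]:
    """Validate inputs against the request rules and return a JSON-ready dict."""
    # Assumption: empty or whitespace-only text is treated as missing.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(bitrate_kbps, bool) or not isinstance(bitrate_kbps, int):
        raise ValueError("bitrateKbps must be an integer")
    if not MIN_BITRATE_KBPS <= bitrate_kbps <= MAX_BITRATE_KBPS:
        raise ValueError(
            f"bitrateKbps must be from {MIN_BITRATE_KBPS} to {MAX_BITRATE_KBPS}"
        )

    payload: dict[str, Any] = {"text": text, "bitrateKbps": bitrate_kbps}
    if voice is not None:
        # voice is optional; omit it to use the server's default voice.
        payload["voice"] = voice
    return payload
```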
Response headers
- Content-Type: audio/webm
- Content-Disposition: inline; filename="speech.webm"
- X-Audio-Codec: opus
- X-Audio-Container: webm
- X-Audio-Bitrate-Kbps: <value>
- X-Audio-Channels: 1
- X-Audio-Sample-Rate: 48000
- X-Voice-Model: <value>
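Clients that need the audio parameters can read them from these headers instead of probing the file. The following is a small sketch of collecting them into a typed record. AudioInfo and parse_audio_headers are hypothetical names, and the code assumes all six X- headers are present on a successful response.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AudioInfo:
    codec: str
    container: str
    bitrate_kbps: int
    channels: int
    sample_rate_hz: int
    voice_model: str


def parse_audio_headers(headers: Mapping[str, str]) -> AudioInfo:
    """Build an AudioInfo from the response headers listed above."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    return AudioInfo(
        codec=lowered["x-audio-codec"],
        container=lowered["x-audio-container"],
        bitrate_kbps=int(lowered["x-audio-bitrate-kbps"]),
        channels=int(lowered["x-audio-channels"]),
        sample_rate_hz=int(lowered["x-audio-sample-rate"]),
        voice_model=lowered["x-voice-model"],
    )
```

A missing header raises KeyError here, which keeps the sketch short; a real client might prefer to fall back to defaults.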
External app examples
cURL
curl -X POST http://localhost:3000/api/v1/speak \
  -H "Authorization: Bearer dev-voice-token-change-me" \
  -H "Content-Type: application/json" \
  -o speech.webm \
  -d '{
    "text": "Read this in a warm, intimate tone.",
    "voice": "aura-2-juno-en",
    "bitrateKbps": 24
  }'
JavaScript / TypeScript
const response = await fetch("http://localhost:3000/api/v1/speak", {
  method: "POST",
  headers: {
    Authorization: "Bearer dev-voice-token-change-me",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "Read this in a warm, intimate tone.",
    voice: "aura-2-juno-en",
    bitrateKbps: 24,
  }),
});

// Surface the server's error message on a non-2xx response.
if (!response.ok) {
  throw new Error(await response.text());
}

// Buffer the audio, then play it through an object URL.
const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
// play() returns a promise that rejects if the browser blocks playback.
await audio.play();
Python
import requests

response = requests.post(
    "http://localhost:3000/api/v1/speak",
    headers={
        "Authorization": "Bearer dev-voice-token-change-me",
        "Content-Type": "application/json",
    },
    json={
        "text": "Read this in a warm, intimate tone.",
        "voice": "aura-2-juno-en",
        "bitrateKbps": 24,
    },
    timeout=120,
)
# Raise an exception on a 4xx or 5xx response.
response.raise_for_status()

# Write the complete audio body to disk.
with open("speech.webm", "wb") as handle:
    handle.write(response.content)
Notes
- The Next.js routes run only on the Node runtime because they shell out to Python and ffmpeg.
- The returned file is mono Opus at 48 kHz in a WebM container, tuned for low-bitrate speech delivery and browser playback.
- The response is streamed, so clients should not expect a Content-Length header.
- Pipecat currently rejects Python 3.14 during installation. Set PYTHON_BIN=.venv-py313/bin/python in .env so the route uses your supported virtualenv explicitly.
- The public /api/speak route is for the local UI demo. Use /api/v1/speak for app-to-app integration.
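Because the response is streamed without a Content-Length header, a client can write the audio to disk as it arrives instead of buffering the whole body. The sketch below does this with only the Python standard library (urllib rather than requests). The helper names copy_stream and speak_to_file are hypothetical, and the 8 KiB chunk size is an arbitrary choice.

```python
import json
import urllib.request
from typing import BinaryIO


def copy_stream(source: BinaryIO, dest: BinaryIO, chunk_size: int = 8192) -> int:
    """Copy source to dest in chunks until EOF; return the bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        dest.write(chunk)
        total += len(chunk)
    return total


def speak_to_file(url: str, token: str, text: str, path: str) -> int:
    """POST text to the protected route and stream the audio to path."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=120) as response:
        with open(path, "wb") as handle:
            return copy_stream(response, handle)
```

With the dev server running, `speak_to_file("http://localhost:3000/api/v1/speak", "dev-voice-token-change-me", "Hello there.", "speech.webm")` would save the audio and return the number of bytes received.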