Velvet Voice

A minimal Next.js 16 app with a single centered panel and a token-protected API. The API sends text to a Python bridge that uses Pipecat's Deepgram TTS service, then returns streamable Opus audio in a WebM container.

Stack

Next.js 16.1.1 with the App Router
React 19
Python bridge script using pipecat-ai[deepgram]
Deepgram Aura-2 TTS voice, defaulting to aura-2-juno-en
ffmpeg for Opus-in-WebM transcoding

Local setup

Copy .env.example to .env and add your DEEPGRAM_API_KEY.
Create a Python 3.13 virtual environment, for example with /opt/homebrew/bin/python3.13 -m venv .venv-py313 (adjust the interpreter path for your system).
Install Python dependencies with .venv-py313/bin/pip install -r requirements.txt.
Install Node dependencies with npm install.
Install ffmpeg if it is not already available.
Start the app with npm run dev.
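
Before starting the app, it can help to confirm that the interpreter and ffmpeg are usable. The following is a minimal sketch, not part of the project: the function name preflight_problems is hypothetical, and treating every version from 3.14 upward as unsupported is an assumption based on the note that Pipecat rejects Python 3.14.

```python
import shutil
import sys


def preflight_problems(version_info=sys.version_info, ffmpeg_bin="ffmpeg"):
    """Return a list of human-readable setup problems (empty when none are found)."""
    problems = []
    major_minor = tuple(version_info[:2])
    # Assumption: 3.14 and anything newer is rejected by Pipecat.
    if major_minor >= (3, 14):
        problems.append("Python %d.%d is too new for Pipecat; use 3.13" % major_minor)
    # shutil.which resolves a bare command on PATH or checks an explicit path.
    if shutil.which(ffmpeg_bin) is None:
        problems.append(f"ffmpeg not found: {ffmpeg_bin}")
    return problems
```

Run it with the virtualenv interpreter, for example by calling preflight_problems(ffmpeg_bin="/opt/homebrew/bin/ffmpeg") and printing the result. An empty list means both checks passed.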

Environment variables

DEEPGRAM_API_KEY: required.
DEEPGRAM_VOICE: optional default voice model. The app defaults to aura-2-juno-en.
PYTHON_BIN: Python interpreter for the Pipecat bridge. Use .venv-py313/bin/python.
FFMPEG_BIN: path to the ffmpeg binary, for example /opt/homebrew/bin/ffmpeg for a Homebrew install on Apple Silicon.
VOICE_API_TOKENS: comma-separated or newline-separated bearer tokens for the external API.
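
To illustrate the VOICE_API_TOKENS format, here is a small sketch of how a comma-separated or newline-separated value could be split into individual tokens. The helper name parse_tokens is hypothetical, and the app's own parsing may differ in details such as whitespace handling.

```python
import re


def parse_tokens(raw):
    """Split a VOICE_API_TOKENS value on commas or newlines into a set of tokens."""
    # Strip surrounding whitespace and drop empty entries such as trailing commas.
    return {part.strip() for part in re.split(r"[,\n]", raw or "") if part.strip()}
```

For example, parse_tokens("alpha, beta\ngamma") yields the three tokens alpha, beta, and gamma.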

Generate a token with:

npm run token:generate

API

Public demo route

POST /api/speak
No token required.
Intended for the local demo UI.

External protected route

POST /api/v1/speak
Requires Authorization: Bearer <token> or x-api-token: <token>.
Streams audio/webm with Opus audio in a WebM container.
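
The two accepted credential headers can be sketched as follows. This is an illustration rather than the route's actual code: the function names are hypothetical, and giving Authorization precedence over x-api-token when both are present is an assumption.

```python
import hmac


def extract_token(headers):
    """Return the token from Authorization: Bearer or x-api-token, else None."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    scheme, _, credentials = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and credentials.strip():
        return credentials.strip()
    return lowered.get("x-api-token") or None


def is_authorized(headers, allowed_tokens):
    """True when the presented token matches one of the allowed tokens."""
    token = extract_token(headers)
    if token is None:
        return False
    # compare_digest compares in constant time to limit timing side channels.
    return any(
        hmac.compare_digest(token.encode(), allowed.encode())
        for allowed in allowed_tokens
    )
```

A request with neither header, or with a token that is not in the allowed set, would be rejected before any text reaches the Python bridge.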

Request body

{
  "text": "Turn this paragraph into audio.",
  "voice": "aura-2-juno-en",
  "bitrateKbps": 24
}

Request rules

text is required and capped at 1200 characters.
voice is optional.
bitrateKbps is optional and must be an integer from 16 to 32. Default is 24.
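
The rules above can be expressed as a short validation sketch. The function and constant names are hypothetical, and two details are assumptions: whitespace-only text is treated as missing, and length is counted with Python's len, which may differ from the server's count for some Unicode text.

```python
MAX_TEXT_CHARS = 1200
MIN_BITRATE_KBPS, MAX_BITRATE_KBPS, DEFAULT_BITRATE_KBPS = 16, 32, 24
DEFAULT_VOICE = "aura-2-juno-en"


def validate_request(body):
    """Return a normalised (text, voice, bitrate_kbps) tuple or raise ValueError."""
    text = body.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    voice = body.get("voice") or DEFAULT_VOICE
    bitrate = body.get("bitrateKbps", DEFAULT_BITRATE_KBPS)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(bitrate, bool) or not isinstance(bitrate, int):
        raise ValueError("bitrateKbps must be an integer")
    if not MIN_BITRATE_KBPS <= bitrate <= MAX_BITRATE_KBPS:
        raise ValueError(
            f"bitrateKbps must be between {MIN_BITRATE_KBPS} and {MAX_BITRATE_KBPS}"
        )
    return text, voice, bitrate
```

Validating on the client side like this avoids a round trip for requests the API would reject anyway.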

Response headers

Content-Type: audio/webm
Content-Disposition: inline; filename="speech.webm"
X-Audio-Codec: opus
X-Audio-Container: webm
X-Audio-Bitrate-Kbps: <value>
X-Audio-Channels: 1
X-Audio-Sample-Rate: 48000
X-Voice-Model: <value>
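
A client can read these headers to confirm what it received. The sketch below assumes only the header names listed above; the function name audio_metadata and the keys of the returned dict are hypothetical.

```python
def audio_metadata(headers):
    """Collect the audio-related response headers into a plain dict."""
    # Header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name):
        value = lowered.get(name)
        return int(value) if value is not None else None

    return {
        "content_type": lowered.get("content-type"),
        "codec": lowered.get("x-audio-codec"),
        "container": lowered.get("x-audio-container"),
        "bitrate_kbps": as_int("x-audio-bitrate-kbps"),
        "channels": as_int("x-audio-channels"),
        "sample_rate_hz": as_int("x-audio-sample-rate"),
        "voice": lowered.get("x-voice-model"),
    }
```

With the requests library, audio_metadata(response.headers) works directly because response.headers behaves like a mapping.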

External app examples

cURL

curl -X POST http://localhost:3000/api/v1/speak \
  -H "Authorization: Bearer dev-voice-token-change-me" \
  -H "Content-Type: application/json" \
  -o speech.webm \
  -d '{
    "text": "Read this in a warm, intimate tone.",
    "voice": "aura-2-juno-en",
    "bitrateKbps": 24
  }'

JavaScript / TypeScript

const response = await fetch("http://localhost:3000/api/v1/speak", {
  method: "POST",
  headers: {
    Authorization: "Bearer dev-voice-token-change-me",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "Read this in a warm, intimate tone.",
    voice: "aura-2-juno-en",
    bitrateKbps: 24,
  }),
});

if (!response.ok) {
  throw new Error(await response.text());
}

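// Buffer the full response, then play it from an object URL.
const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
// Release the object URL once playback finishes.
audio.addEventListener("ended", () => URL.revokeObjectURL(audioUrl));
// play() returns a promise that rejects if the browser blocks autoplay.
await audio.play();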

Python

import requests

response = requests.post(
    "http://localhost:3000/api/v1/speak",
    headers={
        "Authorization": "Bearer dev-voice-token-change-me",
        "Content-Type": "application/json",
    },
    json={
        "text": "Read this in a warm, intimate tone.",
        "voice": "aura-2-juno-en",
        "bitrateKbps": 24,
    },
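    stream=True,  # read the body incrementally instead of buffering it all
    timeout=120,
)
response.raise_for_status()

with open("speech.webm", "wb") as handle:
    # Write the streamed audio to disk chunk by chunk.
    for chunk in response.iter_content(chunk_size=8192):
        handle.write(chunk)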

Notes

The Next.js routes run only on the Node.js runtime because they shell out to Python and ffmpeg.
The returned file is mono Opus at 48 kHz in a WebM container, tuned for low-bitrate speech delivery and browser playback.
The response is streamed, so clients should not expect a Content-Length header.
Pipecat currently rejects Python 3.14 during installation. Set PYTHON_BIN=.venv-py313/bin/python in .env so the route uses your supported virtualenv explicitly.
The public /api/speak route is for the local UI demo. Use /api/v1/speak for app-to-app integration.
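
For reference, the mono 48 kHz Opus-in-WebM output described in these notes could be produced by an ffmpeg invocation along the following lines. This is a sketch under stated assumptions, not the project's actual command: the helper name build_ffmpeg_args is hypothetical, the input is assumed to be raw 16-bit little-endian mono PCM at 24 kHz on stdin, and the voip application mode is one plausible choice for speech.

```python
def build_ffmpeg_args(ffmpeg_bin="ffmpeg", bitrate_kbps=24, input_rate_hz=24000):
    """Build an ffmpeg argument list that encodes PCM from stdin to WebM/Opus on stdout."""
    return [
        ffmpeg_bin,
        "-hide_banner", "-loglevel", "error",
        # Assumed input: raw signed 16-bit little-endian mono PCM on stdin.
        "-f", "s16le", "-ar", str(input_rate_hz), "-ac", "1", "-i", "pipe:0",
        # Encode with libopus at the requested bitrate, tuned for speech.
        "-c:a", "libopus", "-b:a", f"{bitrate_kbps}k", "-application", "voip",
        # Output: mono, 48 kHz, WebM container, written to stdout.
        "-ar", "48000", "-ac", "1", "-f", "webm", "pipe:1",
    ]
```

The list can be passed to subprocess.Popen with stdin and stdout set to subprocess.PIPE, so encoded audio can be forwarded to the client as ffmpeg produces it.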