Reading view

Velvet Voice

Velvet Voice

A minimal Next.js 16 app with a single centered panel and a token-protected API that sends text to a Python bridge using Pipecat's Deepgram TTS service and returns streamable Opus audio in a WebM container.

Stack

  • Next.js 16.1.1 with the App Router
  • React 19
  • Python bridge script using pipecat-ai[deepgram]
  • Deepgram Aura-2 TTS voice, defaulting to aura-2-juno-en
  • ffmpeg for Opus-in-WebM transcoding

Local setup

  1. Copy .env.example to .env and add your DEEPGRAM_API_KEY.
  2. Create a supported Python 3.13 virtual environment with /opt/homebrew/bin/python3.13 -m venv .venv-py313.
  3. Install Python dependencies with .venv-py313/bin/pip install -r requirements.txt.
  4. Install Node dependencies with npm install.
  5. Install ffmpeg if it is not already available.
  6. Start the app with npm run dev.

Environment variables

  • DEEPGRAM_API_KEY: required.
  • DEEPGRAM_VOICE: optional default voice model. The app defaults to aura-2-juno-en.
  • PYTHON_BIN: Python interpreter for the Pipecat bridge. Use .venv-py313/bin/python.
  • FFMPEG_BIN: path to ffmpeg. On this machine it is /opt/homebrew/bin/ffmpeg.
  • VOICE_API_TOKENS: comma-separated or newline-separated bearer tokens for the external API.

Generate a token with:

npm run token:generate

API

Public demo route

  • POST /api/speak
  • No token required.
  • Intended for the local demo UI.

External protected route

  • POST /api/v1/speak
  • Requires Authorization: Bearer <token> or x-api-token: <token>.
  • Streams audio/webm with Opus audio in a WebM container.

Request body

{
  "text": "Turn this paragraph into audio.",
  "voice": "aura-2-juno-en",
  "bitrateKbps": 24
}

Request rules

  • text is required and capped at 1200 characters.
  • voice is optional.
  • bitrateKbps is optional and must be an integer from 16 to 32. Default is 24.

Response headers

  • Content-Type: audio/webm
  • Content-Disposition: inline; filename="speech.webm"
  • X-Audio-Codec: opus
  • X-Audio-Container: webm
  • X-Audio-Bitrate-Kbps: <value>
  • X-Audio-Channels: 1
  • X-Audio-Sample-Rate: 48000
  • X-Voice-Model: <value>

External app examples

cURL

curl -X POST http://localhost:3000/api/v1/speak \
  -H "Authorization: Bearer dev-voice-token-change-me" \
  -H "Content-Type: application/json" \
  -o speech.webm \
  -d '{
    "text": "Read this in a warm, intimate tone.",
    "voice": "aura-2-juno-en",
    "bitrateKbps": 24
  }'

JavaScript / TypeScript

const response = await fetch("http://localhost:3000/api/v1/speak", {
  method: "POST",
  headers: {
    Authorization: "Bearer dev-voice-token-change-me",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "Read this in a warm, intimate tone.",
    voice: "aura-2-juno-en",
    bitrateKbps: 24,
  }),
});

if (!response.ok) {
  throw new Error(await response.text());
}

const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
audio.play();

Python

import requests

response = requests.post(
    "http://localhost:3000/api/v1/speak",
    headers={
        "Authorization": "Bearer dev-voice-token-change-me",
        "Content-Type": "application/json",
    },
    json={
        "text": "Read this in a warm, intimate tone.",
        "voice": "aura-2-juno-en",
        "bitrateKbps": 24,
    },
    timeout=120,
)
response.raise_for_status()

with open("speech.webm", "wb") as handle:
    handle.write(response.content)

Notes

  • The Next.js routes run only on the Node runtime because they shell out to Python and ffmpeg.
  • The returned file is mono Opus at 48 kHz in a WebM container, tuned for low-bitrate speech delivery and browser playback.
  • The response is streamed, so clients should not expect a Content-Length header.
  • Pipecat currently rejects Python 3.14 during installation. Set PYTHON_BIN=.venv-py313/bin/python in .env so the route uses your supported virtualenv explicitly.
  • The public /api/speak route is for the local UI demo. Use /api/v1/speak for app-to-app integration.