Add initial implementation of Kokoro TTS API with Docker GPU support

- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README
remsky 2024-12-30 04:17:50 -07:00
commit ce0ef3534a
21 changed files with 1099 additions and 0 deletions

15
.gitignore vendored Normal file

@ -0,0 +1,15 @@
output/
*.db
*.pyc
*.pth
Kokoro-82M/*
__pycache__/
.vscode/
env/
.Python

52
Dockerfile Normal file

@ -0,0 +1,52 @@
FROM nvidia/cuda:12.1.0-base-ubuntu22.04
ARG KOKORO_REPO
# Install base system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
python3-pip \
python3-dev \
espeak-ng \
git \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Install heavy Python dependencies first (better layer caching)
RUN pip3 install --no-cache-dir \
phonemizer \
torch \
transformers \
scipy \
munch
# Install API dependencies
RUN pip3 install --no-cache-dir fastapi uvicorn pydantic-settings
# Set working directory
WORKDIR /app
# --(can skip if pre-cloning the repo)--
# Install git-lfs
RUN apt-get update && apt-get install -y git-lfs \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
&& git lfs install
# Clone Kokoro repo
RUN git clone ${KOKORO_REPO} .
# --------------------------------------
# Create output directory
RUN mkdir -p output
# Run with Python unbuffered output for live logging
ENV PYTHONUNBUFFERED=1
# Copy API files over
COPY api/src /app/api/src
# Set Python path
ENV PYTHONPATH=/app
# Run FastAPI server
CMD ["uvicorn", "api.src.main:app", "--host", "0.0.0.0", "--port", "8880"]

50
README.md Normal file

@ -0,0 +1,50 @@
# Kokoro TTS API
FastAPI wrapper for Kokoro TTS with voice cloning. Runs inference on GPU.
## Quick Start
```bash
# Start the API (will automatically download model on first run)
docker compose up --build
```
```bash
# From host terminal, test it out with some API calls
python examples/test_tts.py "Hello world" --voice af_bella
```
## API Endpoints
```bash
GET /tts/voices # List voices
POST /tts # Generate speech
GET /tts/{request_id} # Check status
GET /tts/file/{request_id} # Get audio file
```
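The endpoints above form a submit, poll, download workflow. The following is a minimal client sketch of that flow, not the project's own client (that is `examples/test_tts.py`). It assumes the server is reachable at `http://localhost:8880` and uses only the standard library; the helper names `submit`, `fetch_status`, `wait_for_completion`, and `download` are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8880"  # assumption: port published in docker-compose.yml


def submit(text: str, voice: str = "af", base_url: str = BASE_URL) -> int:
    """POST /tts and return the request_id assigned by the queue."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["request_id"]


def fetch_status(request_id: int, base_url: str = BASE_URL) -> dict:
    """GET /tts/{request_id} and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/tts/{request_id}", timeout=10) as resp:
        return json.load(resp)


def wait_for_completion(
    request_id: int,
    get_status: Callable[[int], dict] = fetch_status,
    interval: float = 1.0,
    timeout: float = 120.0,
) -> dict:
    """Poll until the request is 'completed' or 'failed', or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(request_id)
        if status["status"] in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"request {request_id} still {status['status']!r}")
        time.sleep(interval)


def download(request_id: int, dest: str, base_url: str = BASE_URL) -> str:
    """GET /tts/file/{request_id} and write the WAV bytes to dest."""
    with urllib.request.urlopen(f"{base_url}/tts/file/{request_id}", timeout=30) as resp:
        data = resp.read()
    with open(dest, "wb") as f:
        f.write(data)
    return dest
```

With the container running, a typical call sequence would be `rid = submit("Hello world", "af_bella")`, then `wait_for_completion(rid)`, then `download(rid, "speech.wav")` if the returned status is `completed`. The status function is passed in as a parameter so the polling logic can be exercised without a live server.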
## Example Usage
List voices:
```bash
python examples/test_tts.py
```
Generate speech:
```bash
# Default voice
python examples/test_tts.py "Your text here"
# Specific voice
python examples/test_tts.py --voice af_bella "Your text here"
# Just get file path (no download)
python examples/test_tts.py --no-download "Your text here"
```
Generated files are saved to `examples/output/` (or remain in `api/src/output/` on the API side when `--no-download` is used).
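With `--no-download`, the path reported by the API is a path inside the container. A small sketch of how such a path could be mapped back to the host, assuming the `./api:/app/api` volume mount from `docker-compose.yml` and that commands are run from the repository root; the helper name `container_to_host` is hypothetical.

```python
import posixpath

CONTAINER_ROOT = "/app/api"  # container side of the volume mount
HOST_ROOT = "api"            # host side, relative to the repository root


def container_to_host(path: str) -> str:
    """Translate an output path reported by the API into a host-relative path."""
    # The service may report paths with '..' components, so normalize first.
    normalized = posixpath.normpath(path)
    if not normalized.startswith(CONTAINER_ROOT + "/"):
        raise ValueError(f"{path!r} is not under {CONTAINER_ROOT}")
    relative = posixpath.relpath(normalized, CONTAINER_ROOT)
    return posixpath.join(HOST_ROOT, relative)
```

For example, a reported path such as `/app/api/src/services/../output/speech_3.wav` would resolve to `api/src/output/speech_3.wav` on the host.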
## Requirements
- Docker
- NVIDIA GPU + CUDA
- nvidia-container-toolkit

3
api/src/core/__init__.py Normal file

@ -0,0 +1,3 @@
from .config import settings
__all__ = ["settings"]

23
api/src/core/config.py Normal file

@ -0,0 +1,23 @@
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # API Settings
    api_title: str = "Kokoro TTS API"
    api_description: str = "API for text-to-speech generation using Kokoro"
    api_version: str = "1.0.0"
    host: str = "0.0.0.0"
    port: int = 8880

    # TTS Settings
    output_dir: str = "output"
    default_voice: str = "af"
    model_path: str = "kokoro-v0_19.pth"
    voices_dir: str = "voices"
    sample_rate: int = 24000

    class Config:
        env_file = ".env"


settings = Settings()

3
api/src/database/__init__.py Normal file

@ -0,0 +1,3 @@
from .queue import QueueDB
__all__ = ["QueueDB"]

147
api/src/database/queue.py Normal file

@ -0,0 +1,147 @@
import sqlite3
import os
from pathlib import Path
from typing import Optional, Tuple

DB_PATH = Path(__file__).parent.parent / "output" / "queue.db"


class QueueDB:
    def __init__(self, db_path: str = str(DB_PATH)):
        self.db_path = db_path
        os.makedirs(os.path.dirname(db_path), exist_ok=True)
        self._init_db()

    def _init_db(self):
        """Initialize the database with required tables"""
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute("""
            CREATE TABLE IF NOT EXISTS tts_queue
            (id INTEGER PRIMARY KEY AUTOINCREMENT,
             text TEXT NOT NULL,
             voice TEXT DEFAULT 'af',
             status TEXT DEFAULT 'pending',
             output_file TEXT,
             processing_time REAL,
             created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)
        """)
        conn.commit()
        conn.close()

    def _ensure_table_if_needed(self, conn: sqlite3.Connection):
        """Create table if it doesn't exist, only called for write operations"""
        c = conn.cursor()
        c.execute("""
            CREATE TABLE IF NOT EXISTS tts_queue
            (id INTEGER PRIMARY KEY AUTOINCREMENT,
             text TEXT NOT NULL,
             voice TEXT DEFAULT 'af',
             status TEXT DEFAULT 'pending',
             output_file TEXT,
             processing_time REAL,
             created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)
        """)
        conn.commit()

    def add_request(self, text: str, voice: str) -> int:
        """Add a new TTS request to the queue"""
        conn = sqlite3.connect(self.db_path)
        try:
            c = conn.cursor()
            c.execute(
                "INSERT INTO tts_queue (text, voice) VALUES (?, ?)", (text, voice)
            )
            request_id = c.lastrowid
            conn.commit()
            return request_id
        except sqlite3.OperationalError:  # Table doesn't exist
            self._ensure_table_if_needed(conn)
            c = conn.cursor()
            c.execute(
                "INSERT INTO tts_queue (text, voice) VALUES (?, ?)", (text, voice)
            )
            request_id = c.lastrowid
            conn.commit()
            return request_id
        finally:
            conn.close()

    def get_next_pending(self) -> Optional[Tuple[int, str, str]]:
        """Get the next pending request"""
        conn = sqlite3.connect(self.db_path)
        try:
            c = conn.cursor()
            # Single quotes mark a SQL string literal; double quotes denote an
            # identifier and only work here through a SQLite compatibility quirk.
            # created_at has one-second resolution, so id breaks ties.
            c.execute(
                "SELECT id, text, voice FROM tts_queue WHERE status = 'pending' "
                "ORDER BY created_at ASC, id ASC LIMIT 1"
            )
            return c.fetchone()
        except sqlite3.OperationalError:  # Table doesn't exist
            return None
        finally:
            conn.close()

    def update_status(
        self,
        request_id: int,
        status: str,
        output_file: Optional[str] = None,
        processing_time: Optional[float] = None,
    ):
        """Update request status, output file, and processing time"""
        conn = sqlite3.connect(self.db_path)
        try:
            c = conn.cursor()
            if output_file and processing_time is not None:
                c.execute(
                    "UPDATE tts_queue SET status = ?, output_file = ?, processing_time = ? WHERE id = ?",
                    (status, output_file, processing_time, request_id),
                )
            elif output_file:
                c.execute(
                    "UPDATE tts_queue SET status = ?, output_file = ? WHERE id = ?",
                    (status, output_file, request_id),
                )
            else:
                c.execute(
                    "UPDATE tts_queue SET status = ? WHERE id = ?", (status, request_id)
                )
            conn.commit()
        except sqlite3.OperationalError:  # Table doesn't exist
            self._ensure_table_if_needed(conn)
            # Retry the update
            c = conn.cursor()
            if output_file and processing_time is not None:
                c.execute(
                    "UPDATE tts_queue SET status = ?, output_file = ?, processing_time = ? WHERE id = ?",
                    (status, output_file, processing_time, request_id),
                )
            elif output_file:
                c.execute(
                    "UPDATE tts_queue SET status = ?, output_file = ? WHERE id = ?",
                    (status, output_file, request_id),
                )
            else:
                c.execute(
                    "UPDATE tts_queue SET status = ? WHERE id = ?", (status, request_id)
                )
            conn.commit()
        finally:
            conn.close()

    def get_status(
        self, request_id: int
    ) -> Optional[Tuple[str, Optional[str], Optional[float]]]:
        """Get status, output file, and processing time for a request"""
        conn = sqlite3.connect(self.db_path)
        try:
            c = conn.cursor()
            c.execute(
                "SELECT status, output_file, processing_time FROM tts_queue WHERE id = ?",
                (request_id,),
            )
            return c.fetchone()
        except sqlite3.OperationalError:  # Table doesn't exist
            return None
        finally:
            conn.close()

36
api/src/main.py Normal file

@ -0,0 +1,36 @@
import uvicorn
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from .core.config import settings
from .routers import tts_router

# Initialize FastAPI app
app = FastAPI(
    title=settings.api_title,
    description=settings.api_description,
    version=settings.api_version,
)

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Include routers
app.include_router(tts_router)


# Health check endpoint
@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {"status": "healthy"}


if __name__ == "__main__":
    uvicorn.run("api.src.main:app", host=settings.host, port=settings.port, reload=True)

3
api/src/models/__init__.py Normal file

@ -0,0 +1,3 @@
from .schemas import TTSRequest, TTSResponse, VoicesResponse
__all__ = ["TTSRequest", "TTSResponse", "VoicesResponse"]

21
api/src/models/schemas.py Normal file

@ -0,0 +1,21 @@
from pydantic import BaseModel
from typing import Optional


class TTSRequest(BaseModel):
    text: str
    voice: str = "af"  # Default voice
    local: bool = False  # Whether to save file locally or return bytes


class TTSResponse(BaseModel):
    request_id: int
    status: str
    output_file: Optional[str] = None  # Path for local file
    processing_time: Optional[float] = None  # Processing time in seconds


class VoicesResponse(BaseModel):
    voices: list[str]
    default: str

3
api/src/routers/__init__.py Normal file

@ -0,0 +1,3 @@
from .tts import router as tts_router
__all__ = ["tts_router"]

84
api/src/routers/tts.py Normal file

@ -0,0 +1,84 @@
import os

from fastapi import APIRouter, HTTPException, Response

from ..models.schemas import TTSRequest, TTSResponse, VoicesResponse
from ..services.tts import TTSService

router = APIRouter(
    prefix="/tts",
    tags=["TTS"],
    responses={404: {"description": "Not found"}},
)

# Initialize TTS service
tts_service = TTSService()


@router.get("/voices", response_model=VoicesResponse)
async def get_voices():
    """List all available voices"""
    voices = tts_service.list_voices()
    return {"voices": voices, "default": "af"}


@router.post("", response_model=TTSResponse)
async def create_tts(request: TTSRequest):
    """Submit text for TTS generation"""
    # Validate voice exists
    voices = tts_service.list_voices()
    if request.voice not in voices:
        raise HTTPException(
            status_code=400,
            detail=f"Voice '{request.voice}' not found. Available voices: {voices}",
        )

    # Queue the request
    request_id = tts_service.create_tts_request(request.text, request.voice)
    return {
        "request_id": request_id,
        "status": "pending",
        "output_file": None,
        "processing_time": None,
    }


@router.get("/{request_id}", response_model=TTSResponse)
async def get_status(request_id: int):
    """Check the status of a TTS request"""
    status = tts_service.get_request_status(request_id)
    if not status:
        raise HTTPException(status_code=404, detail="Request not found")

    status_str, output_file, processing_time = status
    return {
        "request_id": request_id,
        "status": status_str,
        "output_file": output_file,
        "processing_time": processing_time,
    }


@router.get("/file/{request_id}")
async def get_file(request_id: int):
    """Download the generated audio file"""
    status = tts_service.get_request_status(request_id)
    if not status:
        raise HTTPException(status_code=404, detail="Request not found")

    status_str, output_file, _ = status
    if status_str != "completed":
        raise HTTPException(status_code=400, detail="Audio generation not complete")

    if not output_file or not os.path.exists(output_file):
        raise HTTPException(status_code=404, detail="Audio file not found")

    # Read file and ensure it's closed after
    with open(output_file, "rb") as f:
        content = f.read()

    return Response(
        content=content,
        media_type="audio/wav",
        headers={
            "Content-Disposition": f"attachment; filename=speech_{request_id}.wav"
        },
    )

3
api/src/services/__init__.py Normal file

@ -0,0 +1,3 @@
from .tts import TTSService, TTSModel
__all__ = ["TTSService", "TTSModel"]

135
api/src/services/tts.py Normal file

@ -0,0 +1,135 @@
import os
import threading
import time
import io
from typing import Optional, Tuple

import torch
import scipy.io.wavfile as wavfile

from models import build_model
from kokoro import generate
from ..database.queue import QueueDB


class TTSModel:
    _instance = None
    _lock = threading.Lock()
    _voicepacks = {}

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    device = "cuda" if torch.cuda.is_available() else "cpu"
                    print(f"Initializing model on {device}")
                    model = build_model("kokoro-v0_19.pth", device)
                    cls._instance = (model, device)
        return cls._instance

    @classmethod
    def get_voicepack(cls, voice_name: str) -> torch.Tensor:
        model, device = cls.get_instance()
        if voice_name not in cls._voicepacks:
            try:
                voicepack = torch.load(
                    f"voices/{voice_name}.pt", map_location=device, weights_only=True
                )
                cls._voicepacks[voice_name] = voicepack
            except Exception as e:
                print(f"Error loading voice {voice_name}: {str(e)}")
                if voice_name != "af":
                    return cls.get_voicepack("af")
                raise
        return cls._voicepacks[voice_name]


class TTSService:
    def __init__(self, output_dir: str = None):
        if output_dir is None:
            output_dir = os.path.join(os.path.dirname(__file__), "..", "output")
        self.output_dir = output_dir
        self.db = QueueDB()
        os.makedirs(output_dir, exist_ok=True)
        self._start_worker()

    def _start_worker(self):
        """Start the background worker thread"""
        self.worker = threading.Thread(target=self._process_queue, daemon=True)
        self.worker.start()

    def _generate_audio(self, text: str, voice: str) -> Tuple[torch.Tensor, float]:
        """Generate audio and measure processing time"""
        start_time = time.time()

        # Get model instance and voicepack
        model, device = TTSModel.get_instance()
        voicepack = TTSModel.get_voicepack(voice)

        # Generate audio
        audio, _ = generate(model, text, voicepack, lang=voice[0])

        processing_time = time.time() - start_time
        return audio, processing_time

    def _save_audio(self, audio: torch.Tensor, filepath: str):
        """Save audio to file"""
        os.makedirs(os.path.dirname(filepath), exist_ok=True)
        wavfile.write(filepath, 24000, audio)

    def _audio_to_bytes(self, audio: torch.Tensor) -> bytes:
        """Convert audio tensor to WAV bytes"""
        buffer = io.BytesIO()
        wavfile.write(buffer, 24000, audio)
        return buffer.getvalue()

    def _process_queue(self):
        """Background worker that processes the queue"""
        while True:
            next_request = self.db.get_next_pending()
            if next_request:
                request_id, text, voice = next_request
                try:
                    # Generate audio and measure time
                    audio, processing_time = self._generate_audio(text, voice)

                    # Save to file
                    output_file = os.path.join(
                        self.output_dir, f"speech_{request_id}.wav"
                    )
                    self._save_audio(audio, output_file)

                    # Update status with processing time
                    self.db.update_status(
                        request_id,
                        "completed",
                        output_file=output_file,
                        processing_time=processing_time,
                    )
                except Exception as e:
                    print(f"Error processing request {request_id}: {str(e)}")
                    self.db.update_status(request_id, "failed")

            time.sleep(1)  # Prevent busy waiting

    def list_voices(self) -> list[str]:
        """List all available voices"""
        voices = []
        try:
            for file in os.listdir("voices"):
                if file.endswith(".pt"):
                    voice_name = file[:-3]  # Remove .pt extension
                    voices.append(voice_name)
        except Exception as e:
            print(f"Error listing voices: {str(e)}")
        return voices

    def create_tts_request(self, text: str, voice: str = "af") -> int:
        """Create a new TTS request and return the request ID"""
        return self.db.add_request(text, voice)

    def get_request_status(
        self, request_id: int
    ) -> Optional[Tuple[str, Optional[str], Optional[float]]]:
        """Get the status, output file path, and processing time for a request"""
        return self.db.get_status(request_id)

17
docker-compose.yml Normal file

@ -0,0 +1,17 @@
services:
  kokoro-tts:
    build:
      context: .
      args:
        - KOKORO_REPO=https://huggingface.co/hexgrad/Kokoro-82M
    volumes:
      - ./api:/app/api
    ports:
      - "8880:8880"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

80
examples/benchmark_results.json Normal file

@ -0,0 +1,80 @@
[
{
"char_length": 100,
"tokens": 25,
"processing_time": 0.3308234214782715,
"output_length": 6.95
},
{
"char_length": 250,
"tokens": 59,
"processing_time": 0.6096279621124268,
"output_length": 17.0
},
{
"char_length": 500,
"tokens": 110,
"processing_time": 0.9121463298797607,
"output_length": 32.325
},
{
"char_length": 750,
"tokens": 170,
"processing_time": 1.5152575969696045,
"output_length": 47.375
},
{
"char_length": 1000,
"tokens": 220,
"processing_time": 1.9711074829101562,
"output_length": 61.575
},
{
"char_length": 1500,
"tokens": 329,
"processing_time": 2.958775043487549,
"output_length": 90.85
},
{
"char_length": 2000,
"tokens": 450,
"processing_time": 4.669129133224487,
"output_length": 120.625
},
{
"char_length": 3000,
"tokens": 655,
"processing_time": 6.0434815883636475,
"output_length": 176.975
},
{
"char_length": 4000,
"tokens": 855,
"processing_time": 9.574363470077515,
"output_length": 238.925
},
{
"char_length": 5000,
"tokens": 1078,
"processing_time": 9.906641483306885,
"output_length": 299.3
},
{
"char_length": 6000,
"tokens": 1298,
"processing_time": 11.334998607635498,
"output_length": 361.625
},
{
"char_length": 7000,
"tokens": 1512,
"processing_time": 13.87867283821106,
"output_length": 416.675
},
{
"char_length": 9783,
"tokens": 2139,
"processing_time": 19.85267996788025,
"output_length": 582.85
}
]
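The results above can be condensed into a real-time factor: seconds of audio produced per second of processing. The sketch below is an illustration only, not part of the commit. It embeds the first and last entries from the results so it runs on its own; the helper names `realtime_factor` and `summarize` are hypothetical.

```python
# First and last entries copied from the benchmark results above.
SAMPLE_RESULTS = [
    {"char_length": 100, "tokens": 25,
     "processing_time": 0.3308234214782715, "output_length": 6.95},
    {"char_length": 9783, "tokens": 2139,
     "processing_time": 19.85267996788025, "output_length": 582.85},
]


def realtime_factor(entry: dict) -> float:
    """Seconds of generated audio per second of processing time."""
    return entry["output_length"] / entry["processing_time"]


def summarize(results: list[dict]) -> list[dict]:
    """Return per-entry throughput figures derived from the raw measurements."""
    return [
        {
            "char_length": entry["char_length"],
            "realtime_factor": realtime_factor(entry),
            "tokens_per_second": entry["tokens"] / entry["processing_time"],
        }
        for entry in results
    ]


for row in summarize(SAMPLE_RESULTS):
    print(
        f"{row['char_length']:>5} chars: "
        f"{row['realtime_factor']:.1f}x real time, "
        f"{row['tokens_per_second']:.0f} tokens/s"
    )
```

To summarize the full set, the list could instead be loaded with `json.load` from `examples/benchmark_results.json`, the path the benchmark script writes to.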

133
examples/benchmark_tts.py Normal file

@ -0,0 +1,133 @@
import os
import time
import json
from typing import Optional

import scipy.io.wavfile as wavfile
import tiktoken
import requests
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Initialize tokenizer
enc = tiktoken.get_encoding("cl100k_base")


def count_tokens(text: str) -> int:
    """Count tokens in text using tiktoken"""
    return len(enc.encode(text))


def get_audio_length(filepath: str) -> float:
    """Get audio length in seconds"""
    # Convert API path to local path
    local_path = filepath.replace('/app/api/src/output', 'api/src/output')
    if not os.path.exists(local_path):
        raise FileNotFoundError(f"Audio file not found at {local_path} (from {filepath})")
    rate, data = wavfile.read(local_path)
    return len(data) / rate


def make_tts_request(
    text: str, timeout: int = 120
) -> tuple[Optional[float], Optional[float]]:
    """Make TTS request and return processing time and output length.

    Returns (None, None) if the request fails or times out.
    """
    try:
        # Submit request
        response = requests.post(
            'http://localhost:8880/tts',
            json={'text': text},
            timeout=timeout
        )
        request_id = response.json()['request_id']

        # Poll until complete
        start_time = time.time()
        while True:
            status_response = requests.get(
                f'http://localhost:8880/tts/{request_id}',
                timeout=timeout
            )
            status = status_response.json()

            if status['status'] == 'completed':
                # Convert Docker path to local path
                docker_path = status['output_file']
                filename = os.path.basename(docker_path)  # Get just the filename
                local_path = os.path.join('api/src/output', filename)  # Construct local path
                try:
                    audio_length = get_audio_length(local_path)
                    return status['processing_time'], audio_length
                except Exception as e:
                    print(f"Error reading audio file: {str(e)}")
                    return None, None

            if status['status'] == 'failed':
                # Stop polling instead of waiting for the timeout
                print(f"Generation failed for text: {text[:50]}...")
                return None, None

            if time.time() - start_time > timeout:
                raise TimeoutError()

            time.sleep(0.5)

    except (requests.exceptions.Timeout, TimeoutError):
        print(f"Request timed out for text: {text[:50]}...")
        return None, None
    except Exception as e:
        print(f"Error processing text: {text[:50]}... Error: {str(e)}")
        return None, None


def main():
    # Create output directory
    os.makedirs('examples/output', exist_ok=True)

    # Read input text
    with open('examples/the_time_machine_hg_wells.txt', 'r', encoding='utf-8') as f:
        text = f.read()

    # Create range of sizes up to full text
    sizes = [100, 250, 500, 750, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, len(text)]

    # Process chunks
    results = []
    for size in sizes:
        # Get chunk and count tokens
        chunk = text[:size]
        num_tokens = count_tokens(chunk)

        print(f"\nProcessing chunk with {num_tokens} tokens ({size} chars):")
        print(f"Text preview: {chunk[:100]}...")

        processing_time, audio_length = make_tts_request(chunk)
        if processing_time is not None:
            results.append({
                'char_length': size,
                'tokens': num_tokens,
                'processing_time': processing_time,
                'output_length': audio_length
            })

    with open('examples/benchmark_results.json', 'w') as f:
        json.dump(results, f, indent=2)

    if not results:
        # Nothing to plot if every request failed
        print("\nNo successful requests; skipping plots")
        return

    # Create DataFrame for plotting
    df = pd.DataFrame(results)

    # Plot 1: Processing Time vs Output Length
    plt.figure(figsize=(12, 8))
    sns.scatterplot(data=df, x='output_length', y='processing_time')
    sns.regplot(data=df, x='output_length', y='processing_time', scatter=False)
    plt.title('Processing Time vs Output Length')
    plt.xlabel('Output Audio Length (seconds)')
    plt.ylabel('Processing Time (seconds)')
    plt.savefig('examples/time_vs_output.png', dpi=300, bbox_inches='tight')
    plt.close()

    # Plot 2: Processing Time vs Token Count
    plt.figure(figsize=(12, 8))
    sns.scatterplot(data=df, x='tokens', y='processing_time')
    sns.regplot(data=df, x='tokens', y='processing_time', scatter=False)
    plt.title('Processing Time vs Token Count')
    plt.xlabel('Number of Input Tokens')
    plt.ylabel('Processing Time (seconds)')
    plt.savefig('examples/time_vs_tokens.png', dpi=300, bbox_inches='tight')
    plt.close()

    print("\nResults saved to examples/benchmark_results.json")
    print("Plots saved as time_vs_output.png and time_vs_tokens.png")


if __name__ == '__main__':
    main()

187
examples/test_tts.py Normal file

@ -0,0 +1,187 @@
#!/usr/bin/env python3
import argparse
import requests
import time
import sys
import os
from typing import Optional, Tuple


def get_voices(
    base_url: str = "http://localhost:8880",
) -> Optional[Tuple[list[str], str]]:
    """Get list of available voices and default voice"""
    try:
        response = requests.get(f"{base_url}/tts/voices")
        if response.status_code == 200:
            data = response.json()
            return data["voices"], data["default"]
    except requests.exceptions.RequestException as e:
        print(f"Error getting voices: {e}")
    return None


def submit_tts_request(
    text: str, voice: Optional[str] = None, base_url: str = "http://localhost:8880"
) -> Optional[int]:
    """Submit a TTS request and return the request ID"""
    try:
        payload = {"text": text, "voice": voice} if voice else {"text": text}
        response = requests.post(f"{base_url}/tts", json=payload)
        if response.status_code != 200:
            print(f"Error submitting request: {response.text}")
            return None
        return response.json()["request_id"]
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")
        return None


def check_request_status(
    request_id: int, base_url: str = "http://localhost:8880"
) -> Optional[dict]:
    """Check the status of a request"""
    try:
        response = requests.get(f"{base_url}/tts/{request_id}")
        if response.status_code != 200:
            print(f"Error checking status: {response.text}")
            return None
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")
        return None


def download_audio(
    request_id: int, base_url: str = "http://localhost:8880"
) -> Optional[str]:
    """Download and save the generated audio file. Returns the filepath if successful."""
    try:
        response = requests.get(f"{base_url}/tts/file/{request_id}")
        if response.status_code != 200:
            print("Error downloading file")
            return None

        filename = (
            response.headers.get("content-disposition", "")
            .split("filename=")[-1]
            .strip('"')
        )
        if not filename:
            filename = f"speech_{request_id}.wav"

        filepath = os.path.join(os.path.dirname(__file__), "output", filename)
        os.makedirs(os.path.dirname(filepath), exist_ok=True)
        with open(filepath, "wb") as f:
            f.write(response.content)
        return filepath
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")
        return None


def generate_speech(
    text: str,
    voice: Optional[str] = None,
    base_url: str = "http://localhost:8880",
    download: bool = True,
) -> bool:
    """Generate speech from text"""
    # Submit request
    print("Submitting request...")
    request_id = submit_tts_request(text, voice, base_url)
    if not request_id:
        return False

    print(f"Request submitted (ID: {request_id})")

    # Poll for completion
    while True:
        status = check_request_status(request_id, base_url)
        if not status:
            return False

        if status["status"] == "completed":
            print("Generation complete!")
            if status["processing_time"]:
                print(f"Processing time: {status['processing_time']:.2f}s")

            # Show output file path (clean up any relative path components)
            output_file = status["output_file"]
            if output_file:
                output_file = os.path.normpath(output_file)
                print(f"Output file: {output_file}")

            # Download if requested
            if download:
                print("Downloading file...")
                filepath = download_audio(request_id, base_url)
                if filepath:
                    print(f"Saved to: {filepath}")
                    return True
                return False
            return True
        elif status["status"] == "failed":
            print("Generation failed")
            return False

        print(".", end="", flush=True)
        time.sleep(1)


def list_available_voices(url: str):
    """List all available voices"""
    voices = get_voices(url)
    if voices:
        voices_list, default_voice = voices
        print("Available voices:")
        for voice in voices_list:
            if voice == default_voice:
                print(f"  {voice} (default)")
            else:
                print(f"  {voice}")
    else:
        print("Error getting voices")


def main():
    parser = argparse.ArgumentParser(description="Kokoro TTS CLI")
    parser.add_argument("text", nargs="?", help="Text to convert to speech")
    parser.add_argument("--voice", help="Voice to use")
    parser.add_argument("--url", default="http://localhost:8880", help="API base URL")
    parser.add_argument("--debug", action="store_true", help="Enable debug logging")
    parser.add_argument(
        "--no-download",
        action="store_true",
        help="Don't download the file, just show the filepath",
    )

    args = parser.parse_args()

    if args.debug:
        print(f"Debug: Arguments received: {args}")

    # If no text provided, just list voices
    if not args.text:
        list_available_voices(args.url)
        return

    # Generate speech
    print(f"Generating speech for: {args.text}")
    if args.voice:
        print(f"Using voice: {args.voice}")

    if args.debug:
        print(
            f"Debug: Calling generate_speech with text='{args.text}', voice='{args.voice}'"
        )

    success = generate_speech(
        args.text, args.voice, args.url, download=not args.no_download
    )
    if not success:
        sys.exit(1)


if __name__ == "__main__":
    main()

104
examples/the_time_machine_hg_wells.txt Normal file

@ -0,0 +1,104 @@
The Time Traveller (for so it will be convenient to speak of him) was expounding a recondite matter to us. His pale grey eyes shone and twinkled, and his usually pale face was flushed and animated. The fire burnt brightly, and the soft radiance of the incandescent lights in the lilies of silver caught the bubbles that flashed and passed in our glasses. Our chairs, being his patents, embraced and caressed us rather than submitted to be sat upon, and there was that luxurious after-dinner atmosphere, when thought runs gracefully free of the trammels of precision. And he put it to us in this way—marking the points with a lean forefinger—as we sat and lazily admired his earnestness over this new paradox (as we thought it) and his fecundity.
“You must follow me carefully. I shall have to controvert one or two ideas that are almost universally accepted. The geometry, for instance, they taught you at school is founded on a misconception.”
“Is not that rather a large thing to expect us to begin upon?” said Filby, an argumentative person with red hair.
“I do not mean to ask you to accept anything without reasonable ground for it. You will soon admit as much as I need from you. You know of course that a mathematical line, a line of thickness nil, has no real existence. They taught you that? Neither has a mathematical plane. These things are mere abstractions.”
“That is all right,” said the Psychologist.
“Nor, having only length, breadth, and thickness, can a cube have a real existence.”
“There I object,” said Filby. “Of course a solid body may exist. All real things—”
“So most people think. But wait a moment. Can an instantaneous cube exist?”
“Don’t follow you,” said Filby.
“Can a cube that does not last for any time at all, have a real existence?”
Filby became pensive. “Clearly,” the Time Traveller proceeded, “any real body must have extension in four directions: it must have Length, Breadth, Thickness, and—Duration. But through a natural infirmity of the flesh, which I will explain to you in a moment, we incline to overlook this fact. There are really four dimensions, three which we call the three planes of Space, and a fourth, Time. There is, however, a tendency to draw an unreal distinction between the former three dimensions and the latter, because it happens that our consciousness moves intermittently in one direction along the latter from the beginning to the end of our lives.”
“That,” said a very young man, making spasmodic efforts to relight his cigar over the lamp; “that . . . very clear indeed.”
“Now, it is very remarkable that this is so extensively overlooked,” continued the Time Traveller, with a slight accession of cheerfulness. “Really this is what is meant by the Fourth Dimension, though some people who talk about the Fourth Dimension do not know they mean it. It is only another way of looking at Time. There is no difference between Time and any of the three dimensions of Space except that our consciousness moves along it. But some foolish people have got hold of the wrong side of that idea. You have all heard what they have to say about this Fourth Dimension?”
“I have not,” said the Provincial Mayor.
“It is simply this. That Space, as our mathematicians have it, is spoken of as having three dimensions, which one may call Length, Breadth, and Thickness, and is always definable by reference to three planes, each at right angles to the others. But some philosophical people have been asking why three dimensions particularly—why not another direction at right angles to the other three?—and have even tried to construct a Four-Dimensional geometry. Professor Simon Newcomb was expounding this to the New York Mathematical Society only a month or so ago. You know how on a flat surface, which has only two dimensions, we can represent a figure of a three-dimensional solid, and similarly they think that by models of three dimensions they could represent one of four—if they could master the perspective of the thing. See?”
“I think so,” murmured the Provincial Mayor; and, knitting his brows, he lapsed into an introspective state, his lips moving as one who repeats mystic words. “Yes, I think I see it now,” he said after some time, brightening in a quite transitory manner.
“Well, I do not mind telling you I have been at work upon this geometry of Four Dimensions for some time. Some of my results are curious. For instance, here is a portrait of a man at eight years old, another at fifteen, another at seventeen, another at twenty-three, and so on. All these are evidently sections, as it were, Three-Dimensional representations of his Four-Dimensioned being, which is a fixed and unalterable thing.
“Scientific people,” proceeded the Time Traveller, after the pause required for the proper assimilation of this, “know very well that Time is only a kind of Space. Here is a popular scientific diagram, a weather record. This line I trace with my finger shows the movement of the barometer. Yesterday it was so high, yesterday night it fell, then this morning it rose again, and so gently upward to here. Surely the mercury did not trace this line in any of the dimensions of Space generally recognised? But certainly it traced such a line, and that line, therefore, we must conclude, was along the Time-Dimension.”
“But,” said the Medical Man, staring hard at a coal in the fire, “if Time is really only a fourth dimension of Space, why is it, and why has it always been, regarded as something different? And why cannot we move in Time as we move about in the other dimensions of Space?”
The Time Traveller smiled. “Are you so sure we can move freely in Space? Right and left we can go, backward and forward freely enough, and men always have done so. I admit we move freely in two dimensions. But how about up and down? Gravitation limits us there.”
“Not exactly,” said the Medical Man. “There are balloons.”
“But before the balloons, save for spasmodic jumping and the inequalities of the surface, man had no freedom of vertical movement.”
“Still they could move a little up and down,” said the Medical Man.
“Easier, far easier down than up.”
“And you cannot move at all in Time, you cannot get away from the present moment.”
“My dear sir, that is just where you are wrong. That is just where the whole world has gone wrong. We are always getting away from the present moment. Our mental existences, which are immaterial and have no dimensions, are passing along the Time-Dimension with a uniform velocity from the cradle to the grave. Just as we should travel down if we began our existence fifty miles above the earth’s surface.”
“But the great difficulty is this,” interrupted the Psychologist. “You can move about in all directions of Space, but you cannot move about in Time.”
“That is the germ of my great discovery. But you are wrong to say that we cannot move about in Time. For instance, if I am recalling an incident very vividly I go back to the instant of its occurrence: I become absent-minded, as you say. I jump back for a moment. Of course we have no means of staying back for any length of Time, any more than a savage or an animal has of staying six feet above the ground. But a civilised man is better off than the savage in this respect. He can go up against gravitation in a balloon, and why should he not hope that ultimately he may be able to stop or accelerate his drift along the Time-Dimension, or even turn about and travel the other way?”
“Oh, this,” began Filby, “is all—”
“Why not?” said the Time Traveller.
“It’s against reason,” said Filby.
“What reason?” said the Time Traveller.
“You can show black is white by argument,” said Filby, “but you will never convince me.”
“Possibly not,” said the Time Traveller. “But now you begin to see the object of my investigations into the geometry of Four Dimensions. Long ago I had a vague inkling of a machine—”
“To travel through Time!” exclaimed the Very Young Man.
“That shall travel indifferently in any direction of Space and Time, as the driver determines.”
Filby contented himself with laughter.
“But I have experimental verification,” said the Time Traveller.
“It would be remarkably convenient for the historian,” the Psychologist suggested. “One might travel back and verify the accepted account of the Battle of Hastings, for instance!”
“Don’t you think you would attract attention?” said the Medical Man. “Our ancestors had no great tolerance for anachronisms.”
“One might get one’s Greek from the very lips of Homer and Plato,” the Very Young Man thought.
“In which case they would certainly plough you for the Little-go. The German scholars have improved Greek so much.”
“Then there is the future,” said the Very Young Man. “Just think! One might invest all one’s money, leave it to accumulate at interest, and hurry on ahead!”
“To discover a society,” said I, “erected on a strictly communistic basis.”
“Of all the wild extravagant theories!” began the Psychologist.
“Yes, so it seemed to me, and so I never talked of it until—”
“Experimental verification!” cried I. “You are going to verify that?”
“The experiment!” cried Filby, who was getting brain-weary.
“Let’s see your experiment anyhow,” said the Psychologist, “though it’s all humbug, you know.”
The Time Traveller smiled round at us. Then, still smiling faintly, and with his hands deep in his trousers pockets, he walked slowly out of the room, and we heard his slippers shuffling down the long passage to his laboratory.
The Psychologist looked at us. “I wonder what he’s got?”
“Some sleight-of-hand trick or other,” said the Medical Man, and Filby tried to tell us about a conjuror he had seen at Burslem, but before he had finished his preface the Time Traveller came back, and Filby’s anecdote collapsed.

BIN
examples/time_vs_output.png Normal file

Binary file not shown. Size: 184 KiB

BIN
examples/time_vs_tokens.png Normal file

Binary file not shown. Size: 174 KiB