Add initial implementation of Kokoro TTS API with Docker GPU support

- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README
2025-09-18 21:39:23 +00:00 · 2024-12-30 04:17:50 -07:00
commit ce0ef3534a
21 changed files with 1099 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,15 @@
+
+output/
+
+
+*.db
+*.pyc
+*.pth
+
+Kokoro-82M/*
+__pycache__/
+.vscode/
+env/
+.Python
+
+
--- a/Dockerfile
+++ b/Dockerfile
@@ -0,0 +1,52 @@
+FROM nvidia/cuda:12.1.0-base-ubuntu22.04
+
+ARG KOKORO_REPO
+
+# Install base system dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    python3-pip \
+    python3-dev \
+    espeak-ng \
+    git \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install heavy Python dependencies first (better layer caching)
+RUN pip3 install --no-cache-dir \
+    phonemizer \
+    torch \
+    transformers \
+    scipy \
+    munch
+
+# Install API dependencies
+RUN pip3 install --no-cache-dir fastapi uvicorn pydantic-settings
+
+# Set working directory
+WORKDIR /app
+
+# --(can skip if pre-cloning the repo)--
+# Install git-lfs 
+RUN apt-get update && apt-get install -y git-lfs \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/* \
+    && git lfs install
+
+# Clone Kokoro repo
+RUN git clone ${KOKORO_REPO} .
+# --------------------------------------
+    
+# Create output directory
+RUN mkdir -p output
+
+# Run with Python unbuffered output for live logging
+ENV PYTHONUNBUFFERED=1
+
+# Copy API files over
+COPY api/src /app/api/src
+
+# Set Python path
+ENV PYTHONPATH=/app
+
+# Run FastAPI server
+CMD ["uvicorn", "api.src.main:app", "--host", "0.0.0.0", "--port", "8880"]
--- a/README.md
+++ b/README.md
@@ -0,0 +1,50 @@
+# Kokoro TTS API
+
+FastAPI wrapper for Kokoro TTS with voice cloning. Runs inference on GPU.
+
+## Quick Start
+
+```bash
+# Start the API (the image build clones the Kokoro model repo the first time)
+docker compose up --build
+```
+
+```bash
+# From host terminal, test it out with some API calls
+python examples/test_tts.py "Hello world" --voice af_bella
+```
+## API Endpoints
+
+```bash
+GET /tts/voices           # List voices
+POST /tts                 # Generate speech
+GET /tts/{request_id}     # Check status
+GET /tts/file/{request_id} # Get audio file
+```
+
+## Example Usage
+
+List voices:
+```bash
+python examples/test_tts.py
+```
+
+Generate speech:
+```bash
+# Default voice
+python examples/test_tts.py "Your text here"
+
+# Specific voice
+python examples/test_tts.py --voice af_bella "Your text here"
+
+# Just get file path (no download)
+python examples/test_tts.py --no-download "Your text here"
+```
+
+Generated files are saved to `examples/output/` (or remain in the API's `api/src/output/` directory when `--no-download` is used).
+
+## Requirements
+
+- Docker
+- NVIDIA GPU + CUDA
+- nvidia-container-toolkit
--- a/api/src/core/__init__.py
+++ b/api/src/core/__init__.py
@@ -0,0 +1,3 @@
+from .config import settings
+
+__all__ = ["settings"]
--- a/api/src/core/config.py
+++ b/api/src/core/config.py
@@ -0,0 +1,23 @@
+from pydantic_settings import BaseSettings
+
+
+class Settings(BaseSettings):
+    # API Settings
+    api_title: str = "Kokoro TTS API"
+    api_description: str = "API for text-to-speech generation using Kokoro"
+    api_version: str = "1.0.0"
+    host: str = "0.0.0.0"
+    port: int = 8880
+
+    # TTS Settings
+    output_dir: str = "output"
+    default_voice: str = "af"
+    model_path: str = "kokoro-v0_19.pth"
+    voices_dir: str = "voices"
+    sample_rate: int = 24000
+
+    class Config:
+        env_file = ".env"
+
+
+settings = Settings()
--- a/api/src/database/__init__.py
+++ b/api/src/database/__init__.py
@@ -0,0 +1,3 @@
+from .queue import QueueDB
+
+__all__ = ["QueueDB"]
--- a/api/src/database/queue.py
+++ b/api/src/database/queue.py
@@ -0,0 +1,147 @@
+import sqlite3
+import os
+from pathlib import Path
+from typing import Optional, Tuple
+
+DB_PATH = Path(__file__).parent.parent / "output" / "queue.db"
+
+
+class QueueDB:
+    def __init__(self, db_path: str = str(DB_PATH)):
+        self.db_path = db_path
+        os.makedirs(os.path.dirname(db_path), exist_ok=True)
+        self._init_db()
+
+    def _init_db(self):
+        """Initialize the database with required tables"""
+        conn = sqlite3.connect(self.db_path)
+        c = conn.cursor()
+        c.execute("""
+            CREATE TABLE IF NOT EXISTS tts_queue
+            (id INTEGER PRIMARY KEY AUTOINCREMENT,
+             text TEXT NOT NULL,
+             voice TEXT DEFAULT 'af',
+             status TEXT DEFAULT 'pending',
+             output_file TEXT,
+             processing_time REAL,
+             created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)
+        """)
+        conn.commit()
+        conn.close()
+
+    def _ensure_table_if_needed(self, conn: sqlite3.Connection):
+        """Create table if it doesn't exist, only called for write operations"""
+        c = conn.cursor()
+        c.execute("""
+            CREATE TABLE IF NOT EXISTS tts_queue
+            (id INTEGER PRIMARY KEY AUTOINCREMENT,
+             text TEXT NOT NULL,
+             voice TEXT DEFAULT 'af',
+             status TEXT DEFAULT 'pending',
+             output_file TEXT,
+             processing_time REAL,
+             created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)
+        """)
+        conn.commit()
+
+    def add_request(self, text: str, voice: str) -> int:
+        """Add a new TTS request to the queue"""
+        conn = sqlite3.connect(self.db_path)
+        try:
+            c = conn.cursor()
+            c.execute(
+                "INSERT INTO tts_queue (text, voice) VALUES (?, ?)", (text, voice)
+            )
+            request_id = c.lastrowid
+            conn.commit()
+            return request_id
+        except sqlite3.OperationalError:  # Table doesn't exist
+            self._ensure_table_if_needed(conn)
+            c = conn.cursor()
+            c.execute(
+                "INSERT INTO tts_queue (text, voice) VALUES (?, ?)", (text, voice)
+            )
+            request_id = c.lastrowid
+            conn.commit()
+            return request_id
+        finally:
+            conn.close()
+
+    def get_next_pending(self) -> Optional[Tuple[int, str, str]]:
+        """Get the next pending request"""
+        conn = sqlite3.connect(self.db_path)
+        try:
+            c = conn.cursor()
+            c.execute(
+                "SELECT id, text, voice FROM tts_queue WHERE status = 'pending' ORDER BY created_at ASC LIMIT 1"
+            )
+            return c.fetchone()
+        except sqlite3.OperationalError:  # Table doesn't exist
+            return None
+        finally:
+            conn.close()
+
+    def update_status(
+        self,
+        request_id: int,
+        status: str,
+        output_file: Optional[str] = None,
+        processing_time: Optional[float] = None,
+    ):
+        """Update request status, output file, and processing time"""
+        conn = sqlite3.connect(self.db_path)
+        try:
+            c = conn.cursor()
+            if output_file and processing_time is not None:
+                c.execute(
+                    "UPDATE tts_queue SET status = ?, output_file = ?, processing_time = ? WHERE id = ?",
+                    (status, output_file, processing_time, request_id),
+                )
+            elif output_file:
+                c.execute(
+                    "UPDATE tts_queue SET status = ?, output_file = ? WHERE id = ?",
+                    (status, output_file, request_id),
+                )
+            else:
+                c.execute(
+                    "UPDATE tts_queue SET status = ? WHERE id = ?", (status, request_id)
+                )
+            conn.commit()
+        except sqlite3.OperationalError:  # Table doesn't exist
+            self._ensure_table_if_needed(conn)
+            # Retry the update
+            c = conn.cursor()
+            if output_file and processing_time is not None:
+                c.execute(
+                    "UPDATE tts_queue SET status = ?, output_file = ?, processing_time = ? WHERE id = ?",
+                    (status, output_file, processing_time, request_id),
+                )
+            elif output_file:
+                c.execute(
+                    "UPDATE tts_queue SET status = ?, output_file = ? WHERE id = ?",
+                    (status, output_file, request_id),
+                )
+            else:
+                c.execute(
+                    "UPDATE tts_queue SET status = ? WHERE id = ?", (status, request_id)
+                )
+            conn.commit()
+        finally:
+            conn.close()
+
+    def get_status(
+        self, request_id: int
+    ) -> Optional[Tuple[str, Optional[str], Optional[float]]]:
+        """Get status, output file, and processing time for a request"""
+        conn = sqlite3.connect(self.db_path)
+        try:
+            c = conn.cursor()
+            c.execute(
+                "SELECT status, output_file, processing_time FROM tts_queue WHERE id = ?",
+                (request_id,),
+            )
+            return c.fetchone()
+        except sqlite3.OperationalError:  # Table doesn't exist
+            return None
+        finally:
+            conn.close()
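Note on `queue.py`: each row moves from `pending` to `completed` or `failed`. The block below is a minimal sketch of that lifecycle against an in-memory SQLite database, not the committed implementation. The helper names (`make_queue`, `add_request`, `next_pending`, `update_status`) are hypothetical. It also assumes one possible simplification: the `UPDATE` statement is assembled from whichever optional fields are supplied, instead of repeating three branches. Unlike the committed code, this sketch stores `processing_time` even when `output_file` is omitted.

```python
import sqlite3
from typing import Optional, Tuple

# Same table layout as the committed schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tts_queue
(id INTEGER PRIMARY KEY AUTOINCREMENT,
 text TEXT NOT NULL,
 voice TEXT DEFAULT 'af',
 status TEXT DEFAULT 'pending',
 output_file TEXT,
 processing_time REAL,
 created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)
"""


def make_queue() -> sqlite3.Connection:
    """Open an in-memory database (illustration only) and create the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_request(conn: sqlite3.Connection, text: str, voice: str = "af") -> int:
    """Insert a pending request and return its row id."""
    cur = conn.execute(
        "INSERT INTO tts_queue (text, voice) VALUES (?, ?)", (text, voice)
    )
    conn.commit()
    return cur.lastrowid


def next_pending(conn: sqlite3.Connection) -> Optional[Tuple[int, str, str]]:
    """Return the oldest pending request, or None if the queue is empty."""
    # CURRENT_TIMESTAMP has one-second resolution, so id is used as a tie-breaker.
    return conn.execute(
        "SELECT id, text, voice FROM tts_queue "
        "WHERE status = 'pending' ORDER BY created_at ASC, id ASC LIMIT 1"
    ).fetchone()


def update_status(
    conn: sqlite3.Connection,
    request_id: int,
    status: str,
    output_file: Optional[str] = None,
    processing_time: Optional[float] = None,
) -> None:
    """Update status plus any optional fields that were provided."""
    fields = {"status": status}
    if output_file is not None:
        fields["output_file"] = output_file
    if processing_time is not None:
        fields["processing_time"] = processing_time
    # Column names come from the fixed keys above, never from user input.
    assignments = ", ".join(f"{name} = ?" for name in fields)
    conn.execute(
        f"UPDATE tts_queue SET {assignments} WHERE id = ?",
        (*fields.values(), request_id),
    )
    conn.commit()
```

Calling `add_request` twice and then `update_status(conn, first_id, "completed", ...)` leaves the second request as the next one returned by `next_pending`.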
--- a/api/src/main.py
+++ b/api/src/main.py
@@ -0,0 +1,36 @@
+import uvicorn
+from fastapi import FastAPI
+from fastapi.middleware.cors import CORSMiddleware
+
+from .core.config import settings
+from .routers import tts_router
+
+# Initialize FastAPI app
+app = FastAPI(
+    title=settings.api_title,
+    description=settings.api_description,
+    version=settings.api_version,
+)
+
+# Add CORS middleware
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+# Include routers
+app.include_router(tts_router)
+
+
+# Health check endpoint
+@app.get("/health")
+async def health_check():
+    """Health check endpoint"""
+    return {"status": "healthy"}
+
+
+if __name__ == "__main__":
+    uvicorn.run("api.src.main:app", host=settings.host, port=settings.port, reload=True)
--- a/api/src/models/__init__.py
+++ b/api/src/models/__init__.py
@@ -0,0 +1,3 @@
+from .schemas import TTSRequest, TTSResponse, VoicesResponse
+
+__all__ = ["TTSRequest", "TTSResponse", "VoicesResponse"]
--- a/api/src/models/schemas.py
+++ b/api/src/models/schemas.py
@@ -0,0 +1,21 @@
+from pydantic import BaseModel
+
+from typing import Optional
+
+
+class TTSRequest(BaseModel):
+    text: str
+    voice: str = "af"  # Default voice
+    local: bool = False  # Whether to save file locally or return bytes
+
+
+class TTSResponse(BaseModel):
+    request_id: int
+    status: str
+    output_file: Optional[str] = None  # Path for local file
+    processing_time: Optional[float] = None  # Processing time in seconds
+
+
+class VoicesResponse(BaseModel):
+    voices: list[str]
+    default: str
--- a/api/src/routers/__init__.py
+++ b/api/src/routers/__init__.py
@@ -0,0 +1,3 @@
+from .tts import router as tts_router
+
+__all__ = ["tts_router"]
--- a/api/src/routers/tts.py
+++ b/api/src/routers/tts.py
@@ -0,0 +1,84 @@
+import os
+from fastapi import APIRouter, HTTPException, Response
+from ..models.schemas import TTSRequest, TTSResponse, VoicesResponse
+from ..services.tts import TTSService
+
+router = APIRouter(
+    prefix="/tts",
+    tags=["TTS"],
+    responses={404: {"description": "Not found"}},
+)
+
+# Initialize TTS service
+tts_service = TTSService()
+
+
+@router.get("/voices", response_model=VoicesResponse)
+async def get_voices():
+    """List all available voices"""
+    voices = tts_service.list_voices()
+    return {"voices": voices, "default": "af"}
+
+
+@router.post("", response_model=TTSResponse)
+async def create_tts(request: TTSRequest):
+    """Submit text for TTS generation"""
+    # Validate voice exists
+    voices = tts_service.list_voices()
+    if request.voice not in voices:
+        raise HTTPException(
+            status_code=400,
+            detail=f"Voice '{request.voice}' not found. Available voices: {voices}",
+        )
+
+    # Queue the request
+    request_id = tts_service.create_tts_request(request.text, request.voice)
+    return {
+        "request_id": request_id,
+        "status": "pending",
+        "output_file": None,
+        "processing_time": None,
+    }
+
+
+@router.get("/{request_id}", response_model=TTSResponse)
+async def get_status(request_id: int):
+    """Check the status of a TTS request"""
+    status = tts_service.get_request_status(request_id)
+    if not status:
+        raise HTTPException(status_code=404, detail="Request not found")
+
+    status_str, output_file, processing_time = status
+    return {
+        "request_id": request_id,
+        "status": status_str,
+        "output_file": output_file,
+        "processing_time": processing_time,
+    }
+
+
+@router.get("/file/{request_id}")
+async def get_file(request_id: int):
+    """Download the generated audio file"""
+    status = tts_service.get_request_status(request_id)
+    if not status:
+        raise HTTPException(status_code=404, detail="Request not found")
+
+    status_str, output_file, _ = status
+    if status_str != "completed":
+        raise HTTPException(status_code=400, detail="Audio generation not complete")
+
+    if not output_file or not os.path.exists(output_file):
+        raise HTTPException(status_code=404, detail="Audio file not found")
+
+    # Read file and ensure it's closed after
+    with open(output_file, "rb") as f:
+        content = f.read()
+
+    return Response(
+        content=content,
+        media_type="audio/wav",
+        headers={
+            "Content-Disposition": f"attachment; filename=speech_{request_id}.wav"
+        },
+    )
--- a/api/src/services/__init__.py
+++ b/api/src/services/__init__.py
@@ -0,0 +1,3 @@
+from .tts import TTSService, TTSModel
+
+__all__ = ["TTSService", "TTSModel"]
--- a/api/src/services/tts.py
+++ b/api/src/services/tts.py
@@ -0,0 +1,135 @@
+import os
+import threading
+import time
+import io
+from typing import Optional, Tuple
+import torch
+import scipy.io.wavfile as wavfile
+from models import build_model
+from kokoro import generate
+from ..database.queue import QueueDB
+
+
+class TTSModel:
+    _instance = None
+    _lock = threading.Lock()
+    _voicepacks = {}
+
+    @classmethod
+    def get_instance(cls):
+        if cls._instance is None:
+            with cls._lock:
+                if cls._instance is None:
+                    device = "cuda" if torch.cuda.is_available() else "cpu"
+                    print(f"Initializing model on {device}")
+                    model = build_model("kokoro-v0_19.pth", device)
+                    cls._instance = (model, device)
+        return cls._instance
+
+    @classmethod
+    def get_voicepack(cls, voice_name: str) -> torch.Tensor:
+        model, device = cls.get_instance()
+        if voice_name not in cls._voicepacks:
+            try:
+                voicepack = torch.load(
+                    f"voices/{voice_name}.pt", map_location=device, weights_only=True
+                )
+                cls._voicepacks[voice_name] = voicepack
+            except Exception as e:
+                print(f"Error loading voice {voice_name}: {str(e)}")
+                if voice_name != "af":
+                    return cls.get_voicepack("af")
+                raise
+        return cls._voicepacks[voice_name]
+
+
+class TTSService:
+    def __init__(self, output_dir: Optional[str] = None):
+        if output_dir is None:
+            output_dir = os.path.join(os.path.dirname(__file__), "..", "output")
+        self.output_dir = output_dir
+        self.db = QueueDB()
+        os.makedirs(output_dir, exist_ok=True)
+        self._start_worker()
+
+    def _start_worker(self):
+        """Start the background worker thread"""
+        self.worker = threading.Thread(target=self._process_queue, daemon=True)
+        self.worker.start()
+
+    def _generate_audio(self, text: str, voice: str) -> Tuple[torch.Tensor, float]:
+        """Generate audio and measure processing time"""
+        start_time = time.time()
+
+        # Get model instance and voicepack
+        model, device = TTSModel.get_instance()
+        voicepack = TTSModel.get_voicepack(voice)
+
+        # Generate audio
+        audio, _ = generate(model, text, voicepack, lang=voice[0])
+
+        processing_time = time.time() - start_time
+        return audio, processing_time
+
+    def _save_audio(self, audio: torch.Tensor, filepath: str):
+        """Save audio to file"""
+        os.makedirs(os.path.dirname(filepath), exist_ok=True)
+        wavfile.write(filepath, 24000, audio)
+
+    def _audio_to_bytes(self, audio: torch.Tensor) -> bytes:
+        """Convert audio tensor to WAV bytes"""
+        buffer = io.BytesIO()
+        wavfile.write(buffer, 24000, audio)
+        return buffer.getvalue()
+
+    def _process_queue(self):
+        """Background worker that processes the queue"""
+        while True:
+            next_request = self.db.get_next_pending()
+            if next_request:
+                request_id, text, voice = next_request
+                try:
+                    # Generate audio and measure time
+                    audio, processing_time = self._generate_audio(text, voice)
+
+                    # Save to file
+                    output_file = os.path.join(
+                        self.output_dir, f"speech_{request_id}.wav"
+                    )
+                    self._save_audio(audio, output_file)
+
+                    # Update status with processing time
+                    self.db.update_status(
+                        request_id,
+                        "completed",
+                        output_file=output_file,
+                        processing_time=processing_time,
+                    )
+
+                except Exception as e:
+                    print(f"Error processing request {request_id}: {str(e)}")
+                    self.db.update_status(request_id, "failed")
+
+            time.sleep(1)  # Prevent busy waiting
+
+    def list_voices(self) -> list[str]:
+        """List all available voices"""
+        voices = []
+        try:
+            for file in os.listdir("voices"):
+                if file.endswith(".pt"):
+                    voice_name = file[:-3]  # Remove .pt extension
+                    voices.append(voice_name)
+        except Exception as e:
+            print(f"Error listing voices: {str(e)}")
+        return voices
+
+    def create_tts_request(self, text: str, voice: str = "af") -> int:
+        """Create a new TTS request and return the request ID"""
+        return self.db.add_request(text, voice)
+
+    def get_request_status(
+        self, request_id: int
+    ) -> Optional[Tuple[str, Optional[str], Optional[float]]]:
+        """Get the status, output file path, and processing time for a request"""
+        return self.db.get_status(request_id)
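Note on `TTSModel.get_instance`: it uses double-checked locking so the model is built once even if several threads ask for it at the same moment. The block below is a minimal, standard-library sketch of that pattern in isolation. `LazySingleton`, `load_count`, and `load_concurrently` are hypothetical names, and a plain `object()` stands in for the expensive model load.

```python
import threading


class LazySingleton:
    """Illustrative stand-in for a model holder; not the committed class."""

    _instance = None
    _lock = threading.Lock()
    load_count = 0  # hypothetical counter, used only to show the load runs once

    @classmethod
    def get_instance(cls):
        # Fast path: skip the lock once the instance exists.
        if cls._instance is None:
            with cls._lock:
                # Re-check under the lock: another thread may have won the race.
                if cls._instance is None:
                    cls.load_count += 1
                    cls._instance = object()  # placeholder for an expensive load
        return cls._instance


def load_concurrently(n_threads: int = 8) -> list:
    """Request the instance from several threads and collect what each received."""
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(LazySingleton.get_instance()))
        for _ in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Every thread receives the same object and `load_count` stays at 1, which is the property the pattern is meant to guarantee.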
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -0,0 +1,17 @@
+services:
+  kokoro-tts:
+    build:
+      context: .
+      args:
+        - KOKORO_REPO=https://huggingface.co/hexgrad/Kokoro-82M
+    volumes:
+      - ./api:/app/api
+    ports:
+      - "8880:8880"
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
--- a/examples/benchmark_results.json
+++ b/examples/benchmark_results.json
@@ -0,0 +1,80 @@
+[
+  {
+    "char_length": 100,
+    "tokens": 25,
+    "processing_time": 0.3308234214782715,
+    "output_length": 6.95
+  },
+  {
+    "char_length": 250,
+    "tokens": 59,
+    "processing_time": 0.6096279621124268,
+    "output_length": 17.0
+  },
+  {
+    "char_length": 500,
+    "tokens": 110,
+    "processing_time": 0.9121463298797607,
+    "output_length": 32.325
+  },
+  {
+    "char_length": 750,
+    "tokens": 170,
+    "processing_time": 1.5152575969696045,
+    "output_length": 47.375
+  },
+  {
+    "char_length": 1000,
+    "tokens": 220,
+    "processing_time": 1.9711074829101562,
+    "output_length": 61.575
+  },
+  {
+    "char_length": 1500,
+    "tokens": 329,
+    "processing_time": 2.958775043487549,
+    "output_length": 90.85
+  },
+  {
+    "char_length": 2000,
+    "tokens": 450,
+    "processing_time": 4.669129133224487,
+    "output_length": 120.625
+  },
+  {
+    "char_length": 3000,
+    "tokens": 655,
+    "processing_time": 6.0434815883636475,
+    "output_length": 176.975
+  },
+  {
+    "char_length": 4000,
+    "tokens": 855,
+    "processing_time": 9.574363470077515,
+    "output_length": 238.925
+  },
+  {
+    "char_length": 5000,
+    "tokens": 1078,
+    "processing_time": 9.906641483306885,
+    "output_length": 299.3
+  },
+  {
+    "char_length": 6000,
+    "tokens": 1298,
+    "processing_time": 11.334998607635498,
+    "output_length": 361.625
+  },
+  {
+    "char_length": 7000,
+    "tokens": 1512,
+    "processing_time": 13.87867283821106,
+    "output_length": 416.675
+  },
+  {
+    "char_length": 9783,
+    "tokens": 2139,
+    "processing_time": 19.85267996788025,
+    "output_length": 582.85
+  }
+]
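Note on `benchmark_results.json`: one way to read these numbers is as a speed-up over real time, that is, seconds of audio produced per second of processing. The sketch below computes that ratio for three rows copied from the results above. `realtime_speedup` is a hypothetical helper, and the figures describe only this benchmark run, not any other setup.

```python
# (processing_time, output_length) pairs in seconds, copied from the first,
# fifth, and last rows of benchmark_results.json.
samples = [
    (0.3308234214782715, 6.95),
    (1.9711074829101562, 61.575),
    (19.85267996788025, 582.85),
]


def realtime_speedup(processing_time: float, output_length: float) -> float:
    """Seconds of audio generated per second of processing."""
    return output_length / processing_time


for processing_time, output_length in samples:
    ratio = realtime_speedup(processing_time, output_length)
    print(f"{output_length:8.2f}s audio in {processing_time:6.2f}s -> {ratio:.1f}x real time")
```

On these rows the ratio falls roughly between 21x and 31x.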
--- a/examples/benchmark_tts.py
+++ b/examples/benchmark_tts.py
@@ -0,0 +1,133 @@
+import os
+import time
+import json
+import scipy.io.wavfile as wavfile
+import tiktoken
+import requests
+import numpy as np
+import pandas as pd
+import seaborn as sns
+import matplotlib.pyplot as plt
+from concurrent.futures import ThreadPoolExecutor, TimeoutError
+
+# Initialize tokenizer
+enc = tiktoken.get_encoding("cl100k_base")
+
+def count_tokens(text: str) -> int:
+    """Count tokens in text using tiktoken"""
+    return len(enc.encode(text))
+
+def get_audio_length(filepath: str) -> float:
+    """Get audio length in seconds"""
+    # Convert API path to local path
+    local_path = filepath.replace('/app/api/src/output', 'api/src/output')
+    if not os.path.exists(local_path):
+        raise FileNotFoundError(f"Audio file not found at {local_path} (from {filepath})")
+    rate, data = wavfile.read(local_path)
+    return len(data) / rate
+
+def make_tts_request(text: str, timeout: int = 120) -> tuple[float, float]:
+    """Make TTS request and return processing time and output length"""
+    try:
+        # Submit request
+        response = requests.post(
+            'http://localhost:8880/tts',
+            json={'text': text},
+            timeout=timeout
+        )
+        request_id = response.json()['request_id']
+        
+        # Poll until complete
+        start_time = time.time()
+        while True:
+            status_response = requests.get(
+                f'http://localhost:8880/tts/{request_id}',
+                timeout=timeout
+            )
+            status = status_response.json()
+            
+            if status['status'] == 'completed':
+                # Convert Docker path to local path
+                docker_path = status['output_file']
+                filename = os.path.basename(docker_path)  # Get just the filename
+                local_path = os.path.join('api/src/output', filename)  # Construct local path
+                try:
+                    audio_length = get_audio_length(local_path)
+                    return status['processing_time'], audio_length
+                except Exception as e:
+                    print(f"Error reading audio file: {str(e)}")
+                    return None, None
+            
+            if time.time() - start_time > timeout:
+                raise TimeoutError()
+                
+            time.sleep(0.5)
+            
+    except (requests.exceptions.Timeout, TimeoutError):
+        print(f"Request timed out for text: {text[:50]}...")
+        return None, None
+    except Exception as e:
+        print(f"Error processing text: {text[:50]}... Error: {str(e)}")
+        return None, None
+
+def main():
+    # Create output directory
+    os.makedirs('examples/output', exist_ok=True)
+    
+    # Read input text
+    with open('examples/the_time_machine_hg_wells.txt', 'r', encoding='utf-8') as f:
+        text = f.read()
+    
+    # Create range of sizes up to full text
+    sizes = [100, 250, 500, 750, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, len(text)]
+    
+    # Process chunks
+    results = []
+    for size in sizes:
+        # Get chunk and count tokens
+        chunk = text[:size]
+        num_tokens = count_tokens(chunk)
+        
+        print(f"\nProcessing chunk with {num_tokens} tokens ({size} chars):")
+        print(f"Text preview: {chunk[:100]}...")
+        
+        processing_time, audio_length = make_tts_request(chunk)
+        
+        if processing_time is not None:
+            results.append({
+                'char_length': size,
+                'tokens': num_tokens,
+                'processing_time': processing_time,
+                'output_length': audio_length
+            })
+    with open('examples/benchmark_results.json', 'w') as f:
+        json.dump(results, f, indent=2)
+    
+    # Create DataFrame for plotting
+    df = pd.DataFrame(results)
+    
+    # Plot 1: Processing Time vs Output Length
+    plt.figure(figsize=(12, 8))
+    sns.scatterplot(data=df, x='output_length', y='processing_time')
+    sns.regplot(data=df, x='output_length', y='processing_time', scatter=False)
+    plt.title('Processing Time vs Output Length')
+    plt.xlabel('Output Audio Length (seconds)')
+    plt.ylabel('Processing Time (seconds)')
+    plt.savefig('examples/time_vs_output.png', dpi=300, bbox_inches='tight')
+    plt.close()
+    
+    # Plot 2: Processing Time vs Token Count
+    plt.figure(figsize=(12, 8))
+    sns.scatterplot(data=df, x='tokens', y='processing_time')
+    sns.regplot(data=df, x='tokens', y='processing_time', scatter=False)
+    plt.title('Processing Time vs Token Count')
+    plt.xlabel('Number of Input Tokens')
+    plt.ylabel('Processing Time (seconds)')
+    plt.savefig('examples/time_vs_tokens.png', dpi=300, bbox_inches='tight')
+    plt.close()
+    
+    print("\nResults saved to examples/benchmark_results.json")
+    print("Plots saved as time_vs_output.png and time_vs_tokens.png")
+
+if __name__ == '__main__':
+    main()
--- a/examples/test_tts.py
+++ b/examples/test_tts.py
@@ -0,0 +1,187 @@
+#!/usr/bin/env python3
+import argparse
+import requests
+import time
+import sys
+import os
+from typing import Optional, Tuple
+
+
+def get_voices(
+    base_url: str = "http://localhost:8880",
+) -> Optional[Tuple[list[str], str]]:
+    """Get list of available voices and default voice"""
+    try:
+        response = requests.get(f"{base_url}/tts/voices")
+        if response.status_code == 200:
+            data = response.json()
+            return data["voices"], data["default"]
+    except requests.exceptions.RequestException as e:
+        print(f"Error getting voices: {e}")
+    return None
+
+
+def submit_tts_request(
+    text: str, voice: Optional[str] = None, base_url: str = "http://localhost:8880"
+) -> Optional[int]:
+    """Submit a TTS request and return the request ID"""
+    try:
+        payload = {"text": text, "voice": voice} if voice else {"text": text}
+        response = requests.post(f"{base_url}/tts", json=payload)
+        if response.status_code != 200:
+            print(f"Error submitting request: {response.text}")
+            return None
+        return response.json()["request_id"]
+    except requests.exceptions.RequestException as e:
+        print(f"Error: {e}")
+        return None
+
+
+def check_request_status(
+    request_id: int, base_url: str = "http://localhost:8880"
+) -> Optional[dict]:
+    """Check the status of a request"""
+    try:
+        response = requests.get(f"{base_url}/tts/{request_id}")
+        if response.status_code != 200:
+            print(f"Error checking status: {response.text}")
+            return None
+        return response.json()
+    except requests.exceptions.RequestException as e:
+        print(f"Error: {e}")
+        return None
+
+
+def download_audio(
+    request_id: int, base_url: str = "http://localhost:8880"
+) -> Optional[str]:
+    """Download and save the generated audio file. Returns the filepath if successful."""
+    try:
+        response = requests.get(f"{base_url}/tts/file/{request_id}")
+        if response.status_code != 200:
+            print("Error downloading file")
+            return None
+
+        filename = (
+            response.headers.get("content-disposition", "")
+            .split("filename=")[-1]
+            .strip('"')
+        )
+        if not filename:
+            filename = f"speech_{request_id}.wav"
+
+        filepath = os.path.join(os.path.dirname(__file__), "output", filename)
+        os.makedirs(os.path.dirname(filepath), exist_ok=True)
+        with open(filepath, "wb") as f:
+            f.write(response.content)
+        return filepath
+    except requests.exceptions.RequestException as e:
+        print(f"Error: {e}")
+        return None
+
+
+def generate_speech(
+    text: str,
+    voice: Optional[str] = None,
+    base_url: str = "http://localhost:8880",
+    download: bool = True,
+) -> bool:
+    """Generate speech from text"""
+    # Submit request
+    print("Submitting request...")
+    request_id = submit_tts_request(text, voice, base_url)
+    if request_id is None:
+        return False
+
+    print(f"Request submitted (ID: {request_id})")
+
+    # Poll for completion
+    while True:
+        status = check_request_status(request_id, base_url)
+        if not status:
+            return False
+
+        if status["status"] == "completed":
+            print("Generation complete!")
+            if status["processing_time"]:
+                print(f"Processing time: {status['processing_time']:.2f}s")
+
+            # Show output file path (clean up any relative path components)
+            output_file = status["output_file"]
+            if output_file:
+                output_file = os.path.normpath(output_file)
+            print(f"Output file: {output_file}")
+
+            # Download if requested
+            if download:
+                print("Downloading file...")
+                filepath = download_audio(request_id, base_url)
+                if filepath:
+                    print(f"Saved to: {filepath}")
+                    return True
+                return False
+            return True
+
+        elif status["status"] == "failed":
+            print("Generation failed")
+            return False
+
+        print(".", end="", flush=True)
+        time.sleep(1)
+
+
+def list_available_voices(url: str):
+    """List all available voices"""
+    voices = get_voices(url)
+    if voices:
+        voices_list, default_voice = voices
+        print("Available voices:")
+        for voice in voices_list:
+            if voice == default_voice:
+                print(f"  {voice} (default)")
+            else:
+                print(f"  {voice}")
+    else:
+        print("Error getting voices")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Kokoro TTS CLI")
+    parser.add_argument("text", nargs="?", help="Text to convert to speech")
+    parser.add_argument("--voice", help="Voice to use")
+    parser.add_argument("--url", default="http://localhost:8880", help="API base URL")
+    parser.add_argument("--debug", action="store_true", help="Enable debug logging")
+    parser.add_argument(
+        "--no-download",
+        action="store_true",
+        help="Don't download the file, just show the filepath",
+    )
+    args = parser.parse_args()
+
+    if args.debug:
+        print(f"Debug: Arguments received: {args}")
+
+    # If no text provided, just list voices
+    if not args.text:
+        list_available_voices(args.url)
+        return
+
+    # Generate speech
+    print(f"Generating speech for: {args.text}")
+    if args.voice:
+        print(f"Using voice: {args.voice}")
+
+    if args.debug:
+        print(
+            f"Debug: Calling generate_speech with text={args.text!r}, voice={args.voice!r}"
+        )
+
+    success = generate_speech(
+        args.text, args.voice, args.url, download=not args.no_download
+    )
+    if not success:
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
--- a/examples/the_time_machine_hg_wells.txt
+++ b/examples/the_time_machine_hg_wells.txt
@ -0,0 +1,104 @@
+The Time Traveller (for so it will be convenient to speak of him) was expounding a recondite matter to us. His pale grey eyes shone and twinkled, and his usually pale face was flushed and animated. The fire burnt brightly, and the soft radiance of the incandescent lights in the lilies of silver caught the bubbles that flashed and passed in our glasses. Our chairs, being his patents, embraced and caressed us rather than submitted to be sat upon, and there was that luxurious after-dinner atmosphere, when thought runs gracefully free of the trammels of precision. And he put it to us in this way—marking the points with a lean forefinger—as we sat and lazily admired his earnestness over this new paradox (as we thought it) and his fecundity.
+
+“You must follow me carefully. I shall have to controvert one or two ideas that are almost universally accepted. The geometry, for instance, they taught you at school is founded on a misconception.”
+
+“Is not that rather a large thing to expect us to begin upon?” said Filby, an argumentative person with red hair.
+
+“I do not mean to ask you to accept anything without reasonable ground for it. You will soon admit as much as I need from you. You know of course that a mathematical line, a line of thickness nil, has no real existence. They taught you that? Neither has a mathematical plane. These things are mere abstractions.”
+
+“That is all right,” said the Psychologist.
+
+“Nor, having only length, breadth, and thickness, can a cube have a real existence.”
+
+“There I object,” said Filby. “Of course a solid body may exist. All real things—”
+
+“So most people think. But wait a moment. Can an instantaneous cube exist?”
+
+“Don’t follow you,” said Filby.
+
+“Can a cube that does not last for any time at all, have a real existence?”
+
+Filby became pensive. “Clearly,” the Time Traveller proceeded, “any real body must have extension in four directions: it must have Length, Breadth, Thickness, and—Duration. But through a natural infirmity of the flesh, which I will explain to you in a moment, we incline to overlook this fact. There are really four dimensions, three which we call the three planes of Space, and a fourth, Time. There is, however, a tendency to draw an unreal distinction between the former three dimensions and the latter, because it happens that our consciousness moves intermittently in one direction along the latter from the beginning to the end of our lives.”
+
+“That,” said a very young man, making spasmodic efforts to relight his cigar over the lamp; “that . . . very clear indeed.”
+
+“Now, it is very remarkable that this is so extensively overlooked,” continued the Time Traveller, with a slight accession of cheerfulness. “Really this is what is meant by the Fourth Dimension, though some people who talk about the Fourth Dimension do not know they mean it. It is only another way of looking at Time. There is no difference between Time and any of the three dimensions of Space except that our consciousness moves along it. But some foolish people have got hold of the wrong side of that idea. You have all heard what they have to say about this Fourth Dimension?”
+
+“I have not,” said the Provincial Mayor.
+
+“It is simply this. That Space, as our mathematicians have it, is spoken of as having three dimensions, which one may call Length, Breadth, and Thickness, and is always definable by reference to three planes, each at right angles to the others. But some philosophical people have been asking why three dimensions particularly—why not another direction at right angles to the other three?—and have even tried to construct a Four-Dimensional geometry. Professor Simon Newcomb was expounding this to the New York Mathematical Society only a month or so ago. You know how on a flat surface, which has only two dimensions, we can represent a figure of a three-dimensional solid, and similarly they think that by models of three dimensions they could represent one of four—if they could master the perspective of the thing. See?”
+
+“I think so,” murmured the Provincial Mayor; and, knitting his brows, he lapsed into an introspective state, his lips moving as one who repeats mystic words. “Yes, I think I see it now,” he said after some time, brightening in a quite transitory manner.
+
+“Well, I do not mind telling you I have been at work upon this geometry of Four Dimensions for some time. Some of my results are curious. For instance, here is a portrait of a man at eight years old, another at fifteen, another at seventeen, another at twenty-three, and so on. All these are evidently sections, as it were, Three-Dimensional representations of his Four-Dimensioned being, which is a fixed and unalterable thing.
+
+“Scientific people,” proceeded the Time Traveller, after the pause required for the proper assimilation of this, “know very well that Time is only a kind of Space. Here is a popular scientific diagram, a weather record. This line I trace with my finger shows the movement of the barometer. Yesterday it was so high, yesterday night it fell, then this morning it rose again, and so gently upward to here. Surely the mercury did not trace this line in any of the dimensions of Space generally recognised? But certainly it traced such a line, and that line, therefore, we must conclude, was along the Time-Dimension.”
+
+“But,” said the Medical Man, staring hard at a coal in the fire, “if Time is really only a fourth dimension of Space, why is it, and why has it always been, regarded as something different? And why cannot we move in Time as we move about in the other dimensions of Space?”
+
+The Time Traveller smiled. “Are you so sure we can move freely in Space? Right and left we can go, backward and forward freely enough, and men always have done so. I admit we move freely in two dimensions. But how about up and down? Gravitation limits us there.”
+
+“Not exactly,” said the Medical Man. “There are balloons.”
+
+“But before the balloons, save for spasmodic jumping and the inequalities of the surface, man had no freedom of vertical movement.”
+
+“Still they could move a little up and down,” said the Medical Man.
+
+“Easier, far easier down than up.”
+
+“And you cannot move at all in Time, you cannot get away from the present moment.”
+
+“My dear sir, that is just where you are wrong. That is just where the whole world has gone wrong. We are always getting away from the present moment. Our mental existences, which are immaterial and have no dimensions, are passing along the Time-Dimension with a uniform velocity from the cradle to the grave. Just as we should travel down if we began our existence fifty miles above the earth’s surface.”
+
+“But the great difficulty is this,” interrupted the Psychologist. “You can move about in all directions of Space, but you cannot move about in Time.”
+
+“That is the germ of my great discovery. But you are wrong to say that we cannot move about in Time. For instance, if I am recalling an incident very vividly I go back to the instant of its occurrence: I become absent-minded, as you say. I jump back for a moment. Of course we have no means of staying back for any length of Time, any more than a savage or an animal has of staying six feet above the ground. But a civilised man is better off than the savage in this respect. He can go up against gravitation in a balloon, and why should he not hope that ultimately he may be able to stop or accelerate his drift along the Time-Dimension, or even turn about and travel the other way?”
+
+“Oh, this,” began Filby, “is all—”
+
+“Why not?” said the Time Traveller.
+
+“It’s against reason,” said Filby.
+
+“What reason?” said the Time Traveller.
+
+“You can show black is white by argument,” said Filby, “but you will never convince me.”
+
+“Possibly not,” said the Time Traveller. “But now you begin to see the object of my investigations into the geometry of Four Dimensions. Long ago I had a vague inkling of a machine—”
+
+“To travel through Time!” exclaimed the Very Young Man.
+
+“That shall travel indifferently in any direction of Space and Time, as the driver determines.”
+
+Filby contented himself with laughter.
+
+“But I have experimental verification,” said the Time Traveller.
+
+“It would be remarkably convenient for the historian,” the Psychologist suggested. “One might travel back and verify the accepted account of the Battle of Hastings, for instance!”
+
+“Don’t you think you would attract attention?” said the Medical Man. “Our ancestors had no great tolerance for anachronisms.”
+
+“One might get one’s Greek from the very lips of Homer and Plato,” the Very Young Man thought.
+
+“In which case they would certainly plough you for the Little-go. The German scholars have improved Greek so much.”
+
+“Then there is the future,” said the Very Young Man. “Just think! One might invest all one’s money, leave it to accumulate at interest, and hurry on ahead!”
+
+“To discover a society,” said I, “erected on a strictly communistic basis.”
+
+“Of all the wild extravagant theories!” began the Psychologist.
+
+“Yes, so it seemed to me, and so I never talked of it until—”
+
+“Experimental verification!” cried I. “You are going to verify that?”
+
+“The experiment!” cried Filby, who was getting brain-weary.
+
+“Let’s see your experiment anyhow,” said the Psychologist, “though it’s all humbug, you know.”
+
+The Time Traveller smiled round at us. Then, still smiling faintly, and with his hands deep in his trousers pockets, he walked slowly out of the room, and we heard his slippers shuffling down the long passage to his laboratory.
+
+The Psychologist looked at us. “I wonder what he’s got?”
+
+“Some sleight-of-hand trick or other,” said the Medical Man, and Filby tried to tell us about a conjuror he had seen at Burslem, but before he had finished his preface the Time Traveller came back, and Filby’s anecdote collapsed.
+
--- a/examples/time_vs_output.png
+++ b/examples/time_vs_output.png
--- a/examples/time_vs_tokens.png
+++ b/examples/time_vs_tokens.png