Kokoro-FastAPI/README.md

<p align="center">
  <img src="githubbanner.png" alt="Kokoro TTS Banner">
</p>

# <sub><sub>_`FastKoko`_ </sub></sub>
[![Tests](https://img.shields.io/badge/tests-117%20passed-darkgreen)]()
[![Coverage](https://img.shields.io/badge/coverage-60%25-grey)]()
[![Tested at Model Commit](https://img.shields.io/badge/last--tested--model--commit-a67f113-blue)](https://huggingface.co/hexgrad/Kokoro-82M/tree/c3b0d86e2a980e027ef71c28819ea02e351c2667) [![Try on Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Try%20on-Spaces-blue)](https://huggingface.co/spaces/Remsky/Kokoro-TTS-Zero) [![Buy Me A Coffee](https://img.shields.io/badge/BMC-✨☕-gray?style=flat-square)](https://www.buymeacoffee.com/remsky)

Dockerized FastAPI wrapper for [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model
- OpenAI-compatible Speech endpoint, with inline voice combination functionality
- NVIDIA GPU accelerated or CPU Onnx inference 
- very fast generation time
  - 100x+ real time speed via HF A100
  - 35-50x+ real time speed via 4060Ti
  - 5x+ real time speed via M3 Pro CPU
- streaming support w/ variable chunking to control latency & artifacts
- simple audio generation web ui utility
- (new) phoneme endpoints for conversion and generation


## Quick Start

The service can be accessed through either the API endpoints or the Gradio web interface.

1. Install prerequisites, and start the service using Docker Compose (Full setup including UI):
   - Install [Docker Desktop](https://www.docker.com/products/docker-desktop/)
   - Clone the repository:
        ```bash
        git clone https://github.com/remsky/Kokoro-FastAPI.git
        cd Kokoro-FastAPI
        
        #   * Switch to stable branch if any issues *
        git checkout v0.0.5post1-stable

        cd docker/gpu # OR 
        # cd docker/cpu # Run this or the above
        docker compose up --build 
        ```
        
      Once started:
     - The API will be available at http://localhost:8880
     - The UI can be accessed at http://localhost:7860
        
  __Or__ running the API alone using Docker (model + voice packs baked in) (Most Recent):
          
  ```bash
  docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.1.0post1 # CPU 
  docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.1.0post1 # Nvidia GPU
  ```
        
        
4. Run locally as an OpenAI-Compatible Speech Endpoint
    ```python
    from openai import OpenAI
    client = OpenAI(
        base_url="http://localhost:8880/v1",
        api_key="not-needed"
        )

    with client.audio.speech.with_streaming_response.create(
        model="kokoro", 
        voice="af_sky+af_bella", #single or multiple voicepack combo
        input="Hello world!",
        response_format="mp3"
    ) as response:
        response.stream_to_file("output.mp3")
    
    ```

    or visit http://localhost:7860
    <p align="center">
    <img src="ui\GradioScreenShot.png" width="80%" alt="Voice Analysis Comparison" style="border: 2px solid #333; padding: 10px;">
    </p>

## Features 
<details>
<summary>OpenAI-Compatible Speech Endpoint</summary>

```python
# Using OpenAI's Python library
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8880/v1", api_key="not-needed")
response = client.audio.speech.create(
    model="kokoro",  # Not used but required for compatibility, also accepts library defaults
    voice="af_bella+af_sky",
    input="Hello world!",
    response_format="mp3"
)

response.stream_to_file("output.mp3")
```
Or Via Requests:
```python
import requests


response = requests.get("http://localhost:8880/v1/audio/voices")
voices = response.json()["voices"]

# Generate audio
response = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "model": "kokoro",  # Not used but required for compatibility
        "input": "Hello world!",
        "voice": "af_bella",
        "response_format": "mp3",  # Supported: mp3, wav, opus, flac
        "speed": 1.0
    }
)

# Save audio
with open("output.mp3", "wb") as f:
    f.write(response.content)
```

Quick tests (run from another terminal):
```bash
python examples/assorted_checks/test_openai/test_openai_tts.py # Test OpenAI Compatibility
python examples/assorted_checks/test_voices/test_all_voices.py # Test all available voices
```
</details>

<details>
<summary>Voice Combination</summary>

- Averages model weights of any existing voicepacks
- Saves generated voicepacks for future use
- (new) Available through any endpoint, simply concatenate desired packs with "+"

Combine voices and generate audio:
```python
import requests
response = requests.get("http://localhost:8880/v1/audio/voices")
voices = response.json()["voices"]

# Create combined voice (saves locally on server)
response = requests.post(
    "http://localhost:8880/v1/audio/voices/combine",
    json=[voices[0], voices[1]]
)
combined_voice = response.json()["voice"]

# Generate audio with combined voice (or, simply pass multiple directly with `+` )
response = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "input": "Hello world!",
        "voice": combined_voice, # or skip the above step with f"{voices[0]}+{voices[1]}"
        "response_format": "mp3"
    }
)
```
<p align="center">
  <img src="assets/voice_analysis.png" width="80%" alt="Voice Analysis Comparison" style="border: 2px solid #333; padding: 10px;">
</p>
</details>

<details>
<summary>Multiple Output Audio Formats</summary>

- mp3
- wav
- opus 
- flac
- aac
- pcm

<p align="center">
<img src="assets/format_comparison.png" width="80%" alt="Audio Format Comparison" style="border: 2px solid #333; padding: 10px;">
</p>

</details>

<details>
<summary>Gradio Web Utility</summary>

Access the interactive web UI at http://localhost:7860 after starting the service. Features include:
- Voice/format/speed selection
- Audio playback and download
- Text file or direct input

If you only want the API, just comment out everything in the docker-compose.yml under and including `gradio-ui`

Currently, voices created via the API are accessible here, but voice combination/creation has not yet been added

Running the UI Docker Service
   - If you only want to run the Gradio web interface separately and connect it to an existing API service:
      ```bash
      docker run -p 7860:7860 \
        -e API_HOST=<api-hostname-or-ip> \
        -e API_PORT=8880 \
        ghcr.io/remsky/kokoro-fastapi-ui:v0.1.0
      ```

     - Replace `<api-hostname-or-ip>` with:
       - `kokoro-tts` if the UI container is running in the same Docker Compose setup.
       - `localhost` if the API is running on your local machine.
  
### Disabling Local Saving

You can disable local saving of audio files and hide the file view in the UI by setting the `DISABLE_LOCAL_SAVING` environment variable to `true`. This is useful when running the service on a server where you don't want to store generated audio files locally.

When using Docker Compose:
```yaml
environment:
  - DISABLE_LOCAL_SAVING=true
```

When running the Docker image directly:
```bash
docker run -p 7860:7860 -e DISABLE_LOCAL_SAVING=true ghcr.io/remsky/kokoro-fastapi-ui:latest
```
</details>

<details>
<summary>Streaming Support</summary>

```python
# OpenAI-compatible streaming
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8880", api_key="not-needed")

# Stream to file
with client.audio.speech.with_streaming_response.create(
    model="kokoro",
    voice="af_bella",
    input="Hello world!"
) as response:
    response.stream_to_file("output.mp3")

# Stream to speakers (requires PyAudio)
import pyaudio
player = pyaudio.PyAudio().open(
    format=pyaudio.paInt16, 
    channels=1, 
    rate=24000, 
    output=True
)

with client.audio.speech.with_streaming_response.create(
    model="kokoro",
    voice="af_bella",
    response_format="pcm",
    input="Hello world!"
) as response:
    for chunk in response.iter_bytes(chunk_size=1024):
        player.write(chunk)
```

Or via requests:
```python
import requests

response = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "input": "Hello world!",
        "voice": "af_bella",
        "response_format": "pcm"
    },
    stream=True
)

for chunk in response.iter_content(chunk_size=1024):
    if chunk:
        # Process streaming chunks
        pass
```

<p align="center">
  <img src="assets/gpu_first_token_timeline_openai.png" width="45%" alt="GPU First Token Timeline" style="border: 2px solid #333; padding: 10px; margin-right: 1%;">
  <img src="assets/cpu_first_token_timeline_stream_openai.png" width="45%" alt="CPU First Token Timeline" style="border: 2px solid #333; padding: 10px;">
</p>

Key Streaming Metrics:
- First token latency @ chunksize
    - ~300ms  (GPU) @ 400 
    - ~3500ms (CPU) @ 200 (older i7)
    - ~<1s    (CPU) @ 200 (M3 Pro)
- Adjustable chunking settings for real-time playback 

*Note: Artifacts in intonation can increase with smaller chunks*
</details>

## Processing Details
<details>
<summary>Performance Benchmarks</summary>

Benchmarking was performed on generation via the local API using text lengths up to feature-length books (~1.5 hours output), measuring processing time and realtime factor. Tests were run on: 
- Windows 11 Home w/ WSL2 
- NVIDIA 4060Ti 16gb GPU @ CUDA 12.1
- 11th Gen i7-11700 @ 2.5GHz
- 64gb RAM
- WAV native output
- H.G. Wells - The Time Machine (full text)

<p align="center">
  <img src="assets/gpu_processing_time.png" width="45%" alt="Processing Time" style="border: 2px solid #333; padding: 10px; margin-right: 1%;">
  <img src="assets/gpu_realtime_factor.png" width="45%" alt="Realtime Factor" style="border: 2px solid #333; padding: 10px;">
</p>

Key Performance Metrics:
- Realtime Speed: Ranges between 25-50x (generation time to output audio length)
- Average Processing Rate: 137.67 tokens/second (cl100k_base)
</details>
<details>
<summary>GPU Vs. CPU</summary>

```bash
# GPU: Requires NVIDIA GPU with CUDA 12.1 support (~35x realtime speed)
docker compose up --build

# CPU: ONNX optimized inference (~2.4x realtime speed)
docker compose -f docker-compose.cpu.yml up --build
```
*Note: Overall speed may have reduced somewhat with the structural changes to accomodate streaming. Looking into it* 
</details>

<details>
<summary>Natural Boundary Detection</summary>

- Automatically splits and stitches at sentence boundaries 
- Helps to reduce artifacts and allow long form processing as the base model is only currently configured for approximately 30s output 
</details>

<details>
<summary>Phoneme & Token Routes</summary>

Convert text to phonemes and/or generate audio directly from phonemes:
```python
import requests

# Convert text to phonemes
response = requests.post(
    "http://localhost:8880/dev/phonemize",
    json={
        "text": "Hello world!",
        "language": "a"  # "a" for American English
    }
)
result = response.json()
phonemes = result["phonemes"]  # Phoneme string e.g  ðɪs ɪz ˈoʊnli ɐ tˈɛst
tokens = result["tokens"]      # Token IDs including start/end tokens 

# Generate audio from phonemes
response = requests.post(
    "http://localhost:8880/dev/generate_from_phonemes",
    json={
        "phonemes": phonemes,
        "voice": "af_bella",
        "speed": 1.0
    }
)

# Save WAV audio
with open("speech.wav", "wb") as f:
    f.write(response.content)
```

See `examples/phoneme_examples/generate_phonemes.py` for a sample script.
</details>

## Known Issues

<details>
<summary>Versioning & Development</summary>

I'm doing what I can to keep things stable, but we are on an early and rapid set of build cycles here.
If you run into trouble, you may have to roll back a version on the release tags if something comes up, or build up from source and/or troubleshoot + submit a PR. Will leave the branch up here for the last known stable points:

`v0.0.5post1`

Free and open source is a community effort, and I love working on this project, though there's only really so many hours in a day. If you'd like to support the work, feel free to open a PR, buy me a coffee, or report any bugs/features/etc you find during use.

  <a href="https://www.buymeacoffee.com/remsky" target="_blank">
    <img 
      src="https://cdn.buymeacoffee.com/buttons/v2/default-violet.png" 
      alt="Buy Me A Coffee" 
      style="height: 30px !important;width: 110px !important;"
    >
  </a>

  
</details>

<details>
<summary>Linux GPU Permissions</summary>

Some Linux users may encounter GPU permission issues when running as non-root. 
Can't guarantee anything, but here are some common solutions, consider your security requirements carefully

### Option 1: Container Groups (Likely the best option)
```yaml
services:
  kokoro-tts:
    # ... existing config ...
    group_add:
      - "video"
      - "render"
```

### Option 2: Host System Groups
```yaml
services:
  kokoro-tts:
    # ... existing config ...
    user: "${UID}:${GID}"
    group_add:
      - "video"
```
Note: May require adding host user to groups: `sudo usermod -aG docker,video $USER` and system restart.

### Option 3: Device Permissions (Use with caution)
```yaml
services:
  kokoro-tts:
    # ... existing config ...
    devices:
      - /dev/nvidia0:/dev/nvidia0
      - /dev/nvidiactl:/dev/nvidiactl
      - /dev/nvidia-uvm:/dev/nvidia-uvm
```
⚠️ Warning: Reduces system security. Use only in development environments.

Prerequisites: NVIDIA GPU, drivers, and container toolkit must be properly configured.

Visit [NVIDIA Container Toolkit installation](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) for more detailed information

</details>

## Model and License

<details open>
<summary>Model</summary>

This API uses the [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) model from HuggingFace. 

Visit the model page for more details about training, architecture, and capabilities. I have no affiliation with any of their work, and produced this wrapper for ease of use and personal projects.
</details>
<details>
<summary>License</summary>
This project is licensed under the Apache License 2.0 - see below for details:

- The Kokoro model weights are licensed under Apache 2.0 (see [model page](https://huggingface.co/hexgrad/Kokoro-82M))
- The FastAPI wrapper code in this repository is licensed under Apache 2.0 to match
- The inference code adapted from StyleTTS2 is MIT licensed

The full Apache 2.0 license text can be found at: https://www.apache.org/licenses/LICENSE-2.0
</details>
-												Update README with performance benchmarks and usage examples; add benchmark plotting script

											
										
										
											2024-12-30 04:53:29 -07:00
+								<p align="center">
 								  <img src="githubbanner.png" alt="Kokoro TTS Banner">
 								</p>
-												ci: enhance local saving feature, update voice selection to support multiple voices, and improve output filename generation

											
										
										
											2025-01-14 03:47:25 -07:00
+								# <sub><sub>_`FastKoko`_ </sub></sub>
-												- Added GenerateFromPhonemesRequest model to text_schemas.py
- Refactored TTS model initialization methods in tts_gpu.py and tts_cpu.py
- Added custom logger configuration in main.py
- Deprecated text_processing router -> development route

											
										
										
											2025-01-09 07:20:14 -07:00
+								[![Tests](https://img.shields.io/badge/tests-117%20passed-darkgreen)]()
-												ci: enhance local saving feature, update voice selection to support multiple voices, and improve output filename generation

											
										
										
											2025-01-14 03:47:25 -07:00
+								[![Coverage](https://img.shields.io/badge/coverage-60%25-grey)]()
-												Update README.md
											
										
										
											2025-01-13 06:18:56 -07:00
+								[![Tested at Model Commit](https://img.shields.io/badge/last--tested--model--commit-a67f113-blue)](https://huggingface.co/hexgrad/Kokoro-82M/tree/c3b0d86e2a980e027ef71c28819ea02e351c2667) [![Try on Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Try%20on-Spaces-blue)](https://huggingface.co/spaces/Remsky/Kokoro-TTS-Zero) [![Buy Me A Coffee](https://img.shields.io/badge/BMC-✨☕-gray?style=flat-square)](https://www.buymeacoffee.com/remsky)
-												Update README with performance benchmarks and usage examples; add benchmark plotting script

											
										
										
											2024-12-30 04:53:29 -07:00
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								Dockerized FastAPI wrapper for [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model
-												- Added support for combining voices via any endpoint
- Updated the `process_voices` function to handle both string and list formats for voice input.

											
										
										
											2025-01-07 03:50:08 -07:00
+								- OpenAI-compatible Speech endpoint, with inline voice combination functionality
-												- Added GenerateFromPhonemesRequest model to text_schemas.py
- Refactored TTS model initialization methods in tts_gpu.py and tts_cpu.py
- Added custom logger configuration in main.py
- Deprecated text_processing router -> development route

											
										
										
											2025-01-09 07:20:14 -07:00
+								- NVIDIA GPU accelerated or CPU Onnx inference
-												Update README.md
											
										
										
											2025-01-06 20:26:23 -07:00
+								- very fast generation time
-												- Added GenerateFromPhonemesRequest model to text_schemas.py
- Refactored TTS model initialization methods in tts_gpu.py and tts_cpu.py
- Added custom logger configuration in main.py
- Deprecated text_processing router -> development route

											
										
										
											2025-01-09 07:20:14 -07:00
+								  - 100x+ real time speed via HF A100
 								  - 35-50x+ real time speed via 4060Ti
 								  - 5x+ real time speed via M3 Pro CPU
-												- Added support for combining voices via any endpoint
- Updated the `process_voices` function to handle both string and list formats for voice input.

											
										
										
											2025-01-07 03:50:08 -07:00
+								- streaming support w/ variable chunking to control latency & artifacts
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								- simple audio generation web ui utility
-												- Added GenerateFromPhonemesRequest model to text_schemas.py
- Refactored TTS model initialization methods in tts_gpu.py and tts_cpu.py
- Added custom logger configuration in main.py
- Deprecated text_processing router -> development route

											
										
										
											2025-01-09 07:20:14 -07:00
+								- (new) phoneme endpoints for conversion and generation
-												Add initial implementation of Kokoro TTS API with Docker GPU support

- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README

											
										
										
											2024-12-30 04:17:50 -07:00
-												Update GitHub Actions workflow and README for Docker publishing and new badge

											
										
										
											2025-01-02 20:27:12 -07:00
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								## Quick Start
-												Add initial implementation of Kokoro TTS API with Docker GPU support

- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README

											
										
										
											2024-12-30 04:17:50 -07:00
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								The service can be accessed through either the API endpoints or the Gradio web interface.
-												Add preliminary Docker support for CPU deployment

											
										
										
											2024-12-31 04:51:21 -07:00
-												Update README.md
											
										
										
											2025-01-15 03:06:17 -07:00
+. Install prerequisites, and start the service using Docker Compose (Full setup including UI):
-												Update spacing in the readme

											
										
										
											2025-01-14 16:39:17 +01:00
+								   - Install [Docker Desktop](https://www.docker.com/products/docker-desktop/)
 								   - Clone the repository:
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								        ```bash
 								        git clone https://github.com/remsky/Kokoro-FastAPI.git
 								        cd Kokoro-FastAPI
-												Update README.md
											
										
										
											2025-01-15 03:06:17 -07:00
 								        #   * Switch to stable branch if any issues *
-												Update README.md
											
										
										
											2025-01-15 03:00:42 -07:00
+								        git checkout v0.0.5post1-stable
-												ci: update CI configuration to run tests only for API and enhance README with Docker usage instructions

											
										
										
											2025-01-14 00:38:44 -07:00
-												ci: enhance local saving feature, update voice selection to support multiple voices, and improve output filename generation

											
										
										
											2025-01-14 03:47:25 -07:00
+								        cd docker/gpu # OR
 								        # cd docker/cpu # Run this or the above
 								        docker compose up --build
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								        ```
-												Update README.md
											
										
										
											2025-01-15 03:06:17 -07:00
-												Update the readme

											
										
										
											2025-01-14 16:30:06 +01:00
+								      Once started:
-												Update spacing in the readme

											
										
										
											2025-01-14 16:39:17 +01:00
+								     - The API will be available at http://localhost:8880
 								     - The UI can be accessed at http://localhost:7860
-												Update the readme

											
										
										
											2025-01-14 16:30:06 +01:00
-												Update README.md
											
										
										
											2025-01-15 03:06:17 -07:00
+								  __Or__ running the API alone using Docker (model + voice packs baked in) (Most Recent):
 								  ```bash
 								  docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.1.0post1 # CPU
 								  docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.1.0post1 # Nvidia GPU
 								  ```
-												ci:  Docker workflow to build and slim images, update README for usage instructions

											
										
										
											2025-01-14 01:27:10 -07:00
-												Update the readme

											
										
										
											2025-01-14 16:30:06 +01:00
+. Run locally as an OpenAI-Compatible Speech Endpoint
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								    ```python
 								    from openai import OpenAI
 								    client = OpenAI(
-												Update README.md

Fixes the open ai code to ensure it points to /v1
											
										
										
											2025-01-08 10:30:35 -05:00
+								        base_url="http://localhost:8880/v1",
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								        api_key="not-needed"
 								        )
-												Update README.md with new local endpoint usage example

The old method will result in a deprecation warning, this is the recommended method via openAI.

```python
    @deprecated(
        "Due to a bug, this method doesn't actually stream the response content, `.with_streaming_response.method()` should be used instead"
    )
    def stream_to_file(
...
```
											
										
										
											2025-01-14 15:53:39 +01:00
+								    with client.audio.speech.with_streaming_response.create(
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								        model="kokoro",
-												- Added support for combining voices via any endpoint
- Updated the `process_voices` function to handle both string and list formats for voice input.

											
										
										
											2025-01-07 03:50:08 -07:00
+								        voice="af_sky+af_bella", #single or multiple voicepack combo
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								        input="Hello world!",
 								        response_format="mp3"
-												Update README.md with new local endpoint usage example

The old method will result in a deprecation warning, this is the recommended method via openAI.

```python
    @deprecated(
        "Due to a bug, this method doesn't actually stream the response content, `.with_streaming_response.method()` should be used instead"
    )
    def stream_to_file(
...
```
											
										
										
											2025-01-14 15:53:39 +01:00
+								    ) as response:
 								        response.stream_to_file("output.mp3")
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								    ```
 								    or visit http://localhost:7860
 								    <p align="center">
 								    <img src="ui\GradioScreenShot.png" width="80%" alt="Voice Analysis Comparison" style="border: 2px solid #333; padding: 10px;">
 								    </p>
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
 								## Features
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								<details>
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								<summary>OpenAI-Compatible Speech Endpoint</summary>
-												Update README with performance benchmarks and usage examples; add benchmark plotting script

											
										
										
											2024-12-30 04:53:29 -07:00
-												Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support

											
										
										
											2024-12-31 03:41:45 -07:00
+								```python
-												- modified voice loading to copy on init
- adjustments to the combine voices functionality
- error handling and analysis

											
										
										
											2024-12-31 18:55:26 -07:00
+								# Using OpenAI's Python library
 								from openai import OpenAI
-												Update README.md

Fixes the open ai code to ensure it points to /v1
											
										
										
											2025-01-08 10:30:35 -05:00
+								client = OpenAI(base_url="http://localhost:8880/v1", api_key="not-needed")
-												- modified voice loading to copy on init
- adjustments to the combine voices functionality
- error handling and analysis

											
										
										
											2024-12-31 18:55:26 -07:00
+								response = client.audio.speech.create(
 								    model="kokoro",  # Not used but required for compatibility, also accepts library defaults
-												- Added support for combining voices via any endpoint
- Updated the `process_voices` function to handle both string and list formats for voice input.

											
										
										
											2025-01-07 03:50:08 -07:00
+								    voice="af_bella+af_sky",
-												- modified voice loading to copy on init
- adjustments to the combine voices functionality
- error handling and analysis

											
										
										
											2024-12-31 18:55:26 -07:00
+								    input="Hello world!",
 								    response_format="mp3"
 								)
-												Update README with performance benchmarks and usage examples; add benchmark plotting script

											
										
										
											2024-12-30 04:53:29 -07:00
-												- modified voice loading to copy on init
- adjustments to the combine voices functionality
- error handling and analysis

											
										
										
											2024-12-31 18:55:26 -07:00
+								response.stream_to_file("output.mp3")
-												Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support

											
										
										
											2024-12-31 03:41:45 -07:00
+								```
-												- modified voice loading to copy on init
- adjustments to the combine voices functionality
- error handling and analysis

											
										
										
											2024-12-31 18:55:26 -07:00
+								Or Via Requests:
-												Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support

											
										
										
											2024-12-31 03:41:45 -07:00
+								```python
 								import requests
-												Enhance voice handling and add voice combination feature in TTS API

											
										
										
											2024-12-31 19:04:40 -07:00
 								response = requests.get("http://localhost:8880/v1/audio/voices")
-												- modified voice loading to copy on init
- adjustments to the combine voices functionality
- error handling and analysis

											
										
										
											2024-12-31 18:55:26 -07:00
+								voices = response.json()["voices"]
 								# Generate audio
-												Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support

											
										
										
											2024-12-31 03:41:45 -07:00
+								response = requests.post(
-												Update README.md
											
										
										
											2024-12-31 14:49:43 -07:00
+								    "http://localhost:8880/v1/audio/speech",
-												Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support

											
										
										
											2024-12-31 03:41:45 -07:00
+								    json={
 								        "model": "kokoro",  # Not used but required for compatibility
 								        "input": "Hello world!",
 								        "voice": "af_bella",
-												Update README and tests to clarify audio format support and enhance documentation

											
										
										
											2024-12-31 03:46:31 -07:00
+								        "response_format": "mp3",  # Supported: mp3, wav, opus, flac
-												Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support

											
										
										
											2024-12-31 03:41:45 -07:00
+								        "speed": 1.0
 								    }
 								)
 								# Save audio
 								with open("output.mp3", "wb") as f:
 								    f.write(response.content)
 								```
-												Update README with performance benchmarks and usage examples; add benchmark plotting script

											
										
										
											2024-12-30 04:53:29 -07:00
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								Quick tests (run from another terminal):
 								```bash
-												Update README.md

Fixes the open ai code to ensure it points to /v1
											
										
										
											2025-01-08 10:30:35 -05:00
+								python examples/assorted_checks/test_openai/test_openai_tts.py # Test OpenAI Compatibility
 								python examples/assorted_checks/test_voices/test_all_voices.py # Test all available voices
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								```
 								</details>
 								<details>
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								<summary>Voice Combination</summary>
 								- Averages model weights of any existing voicepacks
 								- Saves generated voicepacks for future use
-												- Added support for combining voices via any endpoint
- Updated the `process_voices` function to handle both string and list formats for voice input.

											
										
										
											2025-01-07 03:50:08 -07:00
+								- (new) Available through any endpoint, simply concatenate desired packs with "+"
-												Add initial implementation of Kokoro TTS API with Docker GPU support

- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README

											
										
										
											2024-12-30 04:17:50 -07:00
-												- modified voice loading to copy on init
- adjustments to the combine voices functionality
- error handling and analysis

											
										
										
											2024-12-31 18:55:26 -07:00
+								Combine voices and generate audio:
 								```python
 								import requests
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								response = requests.get("http://localhost:8880/v1/audio/voices")
 								voices = response.json()["voices"]
-												Add initial implementation of Kokoro TTS API with Docker GPU support

- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README

											
										
										
											2024-12-30 04:17:50 -07:00
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								# Create combined voice (saves locally on server)
-												- modified voice loading to copy on init
- adjustments to the combine voices functionality
- error handling and analysis

											
										
										
											2024-12-31 18:55:26 -07:00
+								response = requests.post(
 								    "http://localhost:8880/v1/audio/voices/combine",
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								    json=[voices[0], voices[1]]
-												Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support

											
										
										
											2024-12-31 03:41:45 -07:00
+								)
-												- modified voice loading to copy on init
- adjustments to the combine voices functionality
- error handling and analysis

											
										
										
											2024-12-31 18:55:26 -07:00
+								combined_voice = response.json()["voice"]
-												Add initial implementation of Kokoro TTS API with Docker GPU support

- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README

											
										
										
											2024-12-30 04:17:50 -07:00
-												- Added support for combining voices via any endpoint
- Updated the `process_voices` function to handle both string and list formats for voice input.

											
										
										
											2025-01-07 03:50:08 -07:00
+								# Generate audio with combined voice (or, simply pass multiple directly with `+` )
-												- modified voice loading to copy on init
- adjustments to the combine voices functionality
- error handling and analysis

											
										
										
											2024-12-31 18:55:26 -07:00
+								response = requests.post(
 								    "http://localhost:8880/v1/audio/speech",
 								    json={
 								        "input": "Hello world!",
-												- Added support for combining voices via any endpoint
- Updated the `process_voices` function to handle both string and list formats for voice input.

											
										
										
											2025-01-07 03:50:08 -07:00
+								        "voice": combined_voice, # or skip the above step with f"{voices[0]}+{voices[1]}"
-												- modified voice loading to copy on init
- adjustments to the combine voices functionality
- error handling and analysis

											
										
										
											2024-12-31 18:55:26 -07:00
+								        "response_format": "mp3"
 								    }
 								)
-												Add initial implementation of Kokoro TTS API with Docker GPU support

- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README

											
										
										
											2024-12-30 04:17:50 -07:00
+								```
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								<p align="center">
-												-update soundfile version
-alignment with streaming standards
-audio processing config settings
-more comprehensive model warmup
-minor model improvements
-enhancing testing, benchmarking
-cool ascii logo

											
										
										
											2025-01-06 03:32:41 -07:00
+								  <img src="assets/voice_analysis.png" width="80%" alt="Voice Analysis Comparison" style="border: 2px solid #333; padding: 10px;">
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								</p>
 								</details>
 								<details>
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								<summary>Multiple Output Audio Formats</summary>
 								- mp3
 								- wav
 								- opus
 								- flac
 								- aac
 								- pcm
 								<p align="center">
-												-update soundfile version
-alignment with streaming standards
-audio processing config settings
-more comprehensive model warmup
-minor model improvements
-enhancing testing, benchmarking
-cool ascii logo

											
										
										
											2025-01-06 03:32:41 -07:00
+								<img src="assets/format_comparison.png" width="80%" alt="Audio Format Comparison" style="border: 2px solid #333; padding: 10px;">
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								</p>
 								</details>
 								<details>
 								<summary>Gradio Web Utility</summary>
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
 								Access the interactive web UI at http://localhost:7860 after starting the service. Features include:
 								- Voice/format/speed selection
 								- Audio playback and download
 								- Text file or direct input
-												Add initial implementation of Kokoro TTS API with Docker GPU support

- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README

											
										
										
											2024-12-30 04:17:50 -07:00
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								If you only want the API, just comment out everything in the docker-compose.yml under and including `gradio-ui`
 								Currently, voices created via the API are accessible here, but voice combination/creation has not yet been added
-												Update README.md
											
										
										
											2025-01-06 03:59:16 -07:00
-												Merge branch 'master' into update-UI-access
											
										
										
											2025-01-15 20:44:54 -07:00
+								Running the UI Docker Service
 								   - If you only want to run the Gradio web interface separately and connect it to an existing API service:
 								      ```bash
 								      docker run -p 7860:7860 \
 								        -e API_HOST=<api-hostname-or-ip> \
 								        -e API_PORT=8880 \
 								        ghcr.io/remsky/kokoro-fastapi-ui:v0.1.0
 								      ```
-												ci: update Docker workflow to tag images as latest, enhance README with local saving instructions, and modify UI components for local saving feature

											
										
										
											2025-01-14 02:19:13 -07:00
-												Merge branch 'master' into update-UI-access
											
										
										
											2025-01-15 20:44:54 -07:00
+								     - Replace `<api-hostname-or-ip>` with:
 								       - `kokoro-tts` if the UI container is running in the same Docker Compose setup.
 								       - `localhost` if the API is running on your local machine.
-												ci: update Docker workflow to tag images as latest, enhance README with local saving instructions, and modify UI components for local saving feature

											
										
										
											2025-01-14 02:19:13 -07:00
+								### Disabling Local Saving
 								You can disable local saving of audio files and hide the file view in the UI by setting the `DISABLE_LOCAL_SAVING` environment variable to `true`. This is useful when running the service on a server where you don't want to store generated audio files locally.
 								When using Docker Compose:
 								```yaml
 								environment:
 								  - DISABLE_LOCAL_SAVING=true
 								```
 								When running the Docker image directly:
 								```bash
 								docker run -p 7860:7860 -e DISABLE_LOCAL_SAVING=true ghcr.io/remsky/kokoro-fastapi-ui:latest
 								```
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								</details>
-												Update readme & analysis

											
										
										
											2025-01-06 03:49:31 -07:00
+								<details>
 								<summary>Streaming Support</summary>
 								```python
 								# OpenAI-compatible streaming
 								from openai import OpenAI
 								client = OpenAI(
 								    base_url="http://localhost:8880", api_key="not-needed")
 								# Stream to file
 								with client.audio.speech.with_streaming_response.create(
 								    model="kokoro",
 								    voice="af_bella",
 								    input="Hello world!"
 								) as response:
 								    response.stream_to_file("output.mp3")
 								# Stream to speakers (requires PyAudio)
 								import pyaudio
 								player = pyaudio.PyAudio().open(
 								    format=pyaudio.paInt16,
 								    channels=1,
 								    rate=24000,
 								    output=True
 								)
 								with client.audio.speech.with_streaming_response.create(
 								    model="kokoro",
 								    voice="af_bella",
 								    response_format="pcm",
 								    input="Hello world!"
 								) as response:
 								    for chunk in response.iter_bytes(chunk_size=1024):
 								        player.write(chunk)
 								```
 								Or via requests:
 								```python
 								import requests
 								response = requests.post(
 								    "http://localhost:8880/v1/audio/speech",
 								    json={
 								        "input": "Hello world!",
 								        "voice": "af_bella",
 								        "response_format": "pcm"
 								    },
 								    stream=True
 								)
 								for chunk in response.iter_content(chunk_size=1024):
 								    if chunk:
 								        # Process streaming chunks
 								        pass
 								```
 								<p align="center">
 								  <img src="assets/gpu_first_token_timeline_openai.png" width="45%" alt="GPU First Token Timeline" style="border: 2px solid #333; padding: 10px; margin-right: 1%;">
 								  <img src="assets/cpu_first_token_timeline_stream_openai.png" width="45%" alt="CPU First Token Timeline" style="border: 2px solid #333; padding: 10px;">
 								</p>
 								Key Streaming Metrics:
 								- First token latency @ chunksize
-												Update README.md
											
										
										
											2025-01-08 04:40:43 -07:00
+								    - ~300ms  (GPU) @ 400
 								    - ~3500ms (CPU) @ 200 (older i7)
 								    - ~<1s    (CPU) @ 200 (M3 Pro)
-												Update readme & analysis

											
										
										
											2025-01-06 03:49:31 -07:00
+								- Adjustable chunking settings for real-time playback
 								*Note: Artifacts in intonation can increase with smaller chunks*
 								</details>
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								## Processing Details
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								<details>
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								<summary>Performance Benchmarks</summary>
-												Add initial implementation of Kokoro TTS API with Docker GPU support

- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README

											
										
										
											2024-12-30 04:17:50 -07:00
-												Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support

											
										
										
											2024-12-31 03:41:45 -07:00
+								Benchmarking was performed on generation via the local API using text lengths up to feature-length books (~1.5 hours output), measuring processing time and realtime factor. Tests were run on:
 								- Windows 11 Home w/ WSL2
 								- NVIDIA 4060Ti 16gb GPU @ CUDA 12.1
 								- 11th Gen i7-11700 @ 2.5GHz
 								- 64gb RAM
-												Update README and tests to clarify audio format support and enhance documentation

											
										
										
											2024-12-31 03:46:31 -07:00
+								- WAV native output
-												Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support

											
										
										
											2024-12-31 03:41:45 -07:00
+								- H.G. Wells - The Time Machine (full text)
-												Add initial implementation of Kokoro TTS API with Docker GPU support

- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README

											
										
										
											2024-12-30 04:17:50 -07:00
-												Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support

											
										
										
											2024-12-31 03:41:45 -07:00
+								<p align="center">
-												-update soundfile version
-alignment with streaming standards
-audio processing config settings
-more comprehensive model warmup
-minor model improvements
-enhancing testing, benchmarking
-cool ascii logo

											
										
										
											2025-01-06 03:32:41 -07:00
+								  <img src="assets/gpu_processing_time.png" width="45%" alt="Processing Time" style="border: 2px solid #333; padding: 10px; margin-right: 1%;">
 								  <img src="assets/gpu_realtime_factor.png" width="45%" alt="Realtime Factor" style="border: 2px solid #333; padding: 10px;">
-												Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support

											
										
										
											2024-12-31 03:41:45 -07:00
+								</p>
-												Add initial implementation of Kokoro TTS API with Docker GPU support

- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README

											
										
										
											2024-12-30 04:17:50 -07:00
-												Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support

											
										
										
											2024-12-31 03:41:45 -07:00
+								Key Performance Metrics:
-												Update README.md
											
										
										
											2025-01-06 03:58:08 -07:00
+								- Realtime Speed: Ranges between 25-50x (generation time to output audio length)
-												Update README.md
											
										
										
											2024-12-31 05:56:22 -07:00
+								- Average Processing Rate: 137.67 tokens/second (cl100k_base)
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								</details>
 								<details>
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								<summary>GPU Vs. CPU</summary>
-												Add initial implementation of Kokoro TTS API with Docker GPU support

- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README

											
										
										
											2024-12-30 04:17:50 -07:00
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								```bash
-												Allow ONNX support optimizations for CPU inference and update benchmarking scripts; modify README for clarity on performance metrics

											
										
										
											2025-01-04 02:46:27 -07:00
+								# GPU: Requires NVIDIA GPU with CUDA 12.1 support (~35x realtime speed)
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								docker compose up --build
-												Allow ONNX support optimizations for CPU inference and update benchmarking scripts; modify README for clarity on performance metrics

											
										
										
											2025-01-04 02:46:27 -07:00
+								# CPU: ONNX optimized inference (~2.4x realtime speed)
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								docker compose -f docker-compose.cpu.yml up --build
 								```
-												Update README.md
											
										
										
											2025-01-06 03:58:08 -07:00
+								*Note: Overall speed may have reduced somewhat with the structural changes to accomodate streaming. Looking into it*
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								</details>
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								<details>
 								<summary>Natural Boundary Detection</summary>
-												Update README.md
											
										
										
											2024-12-31 04:29:48 -07:00
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								- Automatically splits and stitches at sentence boundaries
 								- Helps to reduce artifacts and allow long form processing as the base model is only currently configured for approximately 30s output
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								</details>
-												Update README with performance benchmarks and usage examples; add benchmark plotting script

											
										
										
											2024-12-30 04:53:29 -07:00
-												- Added GenerateFromPhonemesRequest model to text_schemas.py
- Refactored TTS model initialization methods in tts_gpu.py and tts_cpu.py
- Added custom logger configuration in main.py
- Deprecated text_processing router -> development route

											
										
										
											2025-01-09 07:20:14 -07:00
+								<details>
 								<summary>Phoneme & Token Routes</summary>
 								Convert text to phonemes and/or generate audio directly from phonemes:
 								```python
 								import requests
 								# Convert text to phonemes
 								response = requests.post(
 								    "http://localhost:8880/dev/phonemize",
 								    json={
 								        "text": "Hello world!",
 								        "language": "a"  # "a" for American English
 								    }
 								)
 								result = response.json()
 								phonemes = result["phonemes"]  # Phoneme string e.g  ðɪs ɪz ˈoʊnli ɐ tˈɛst
 								tokens = result["tokens"]      # Token IDs including start/end tokens
 								# Generate audio from phonemes
 								response = requests.post(
 								    "http://localhost:8880/dev/generate_from_phonemes",
 								    json={
 								        "phonemes": phonemes,
 								        "voice": "af_bella",
 								        "speed": 1.0
 								    }
 								)
 								# Save WAV audio
 								with open("speech.wav", "wb") as f:
 								    f.write(response.content)
 								```
 								See `examples/phoneme_examples/generate_phonemes.py` for a sample script.
 								</details>
-												Update README.md
											
										
										
											2025-01-13 05:51:47 -07:00
+								## Known Issues
-												ci: enhance local saving feature, update voice selection to support multiple voices, and improve output filename generation

											
										
										
											2025-01-14 03:47:25 -07:00
+								<details>
 								<summary>Versioning & Development</summary>
 								I'm doing what I can to keep things stable, but we are on an early and rapid set of build cycles here.
 								If you run into trouble, you may have to roll back a version on the release tags if something comes up, or build up from source and/or troubleshoot + submit a PR. Will leave the branch up here for the last known stable points:
 								`v0.0.5post1`
 								Free and open source is a community effort, and I love working on this project, though there's only really so many hours in a day. If you'd like to support the work, feel free to open a PR, buy me a coffee, or report any bugs/features/etc you find during use.
 								  <a href="https://www.buymeacoffee.com/remsky" target="_blank">
 								    <img
 								      src="https://cdn.buymeacoffee.com/buttons/v2/default-violet.png"
 								      alt="Buy Me A Coffee"
 								      style="height: 30px !important;width: 110px !important;"
 								    >
 								  </a>
 								</details>
-												Update README.md
											
										
										
											2025-01-13 05:51:47 -07:00
+								<details>
 								<summary>Linux GPU Permissions</summary>
 								Some Linux users may encounter GPU permission issues when running as non-root.
 								Can't guarantee anything, but here are some common solutions, consider your security requirements carefully
 								### Option 1: Container Groups (Likely the best option)
 								```yaml
 								services:
 								  kokoro-tts:
 								    # ... existing config ...
 								    group_add:
 								      - "video"
 								      - "render"
 								```
 								### Option 2: Host System Groups
 								```yaml
 								services:
 								  kokoro-tts:
 								    # ... existing config ...
 								    user: "${UID}:${GID}"
 								    group_add:
 								      - "video"
 								```
 								Note: May require adding host user to groups: `sudo usermod -aG docker,video $USER` and system restart.
 								### Option 3: Device Permissions (Use with caution)
 								```yaml
 								services:
 								  kokoro-tts:
 								    # ... existing config ...
 								    devices:
 								      - /dev/nvidia0:/dev/nvidia0
 								      - /dev/nvidiactl:/dev/nvidiactl
 								      - /dev/nvidia-uvm:/dev/nvidia-uvm
 								```
 								⚠️ Warning: Reduces system security. Use only in development environments.
 								Prerequisites: NVIDIA GPU, drivers, and container toolkit must be properly configured.
 								Visit [NVIDIA Container Toolkit installation](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) for more detailed information
 								</details>
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								## Model and License
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								<details open>
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								<summary>Model</summary>
-												Update README with performance benchmarks and usage examples; add benchmark plotting script

											
										
										
											2024-12-30 04:53:29 -07:00
 								This API uses the [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) model from HuggingFace.
 								Visit the model page for more details about training, architecture, and capabilities. I have no affiliation with any of their work, and produced this wrapper for ease of use and personal projects.
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								</details>
 								<details>
-												added output audio tests, validation

											
										
										
											2025-01-02 15:36:53 -07:00
+								<summary>License</summary>
-												Update README with performance benchmarks and usage examples; add benchmark plotting script

											
										
										
											2024-12-30 04:53:29 -07:00
+								This project is licensed under the Apache License 2.0 - see below for details:
 								- The Kokoro model weights are licensed under Apache 2.0 (see [model page](https://huggingface.co/hexgrad/Kokoro-82M))
 								- The FastAPI wrapper code in this repository is licensed under Apache 2.0 to match
 								- The inference code adapted from StyleTTS2 is MIT licensed
 								The full Apache 2.0 license text can be found at: https://www.apache.org/licenses/LICENSE-2.0
-												Add Gradio web interface + tests

											
										
										
											2025-01-01 21:50:00 -07:00
+								</details>