diff --git a/README.md b/README.md
index 3684948..faac9d8 100644
--- a/README.md
+++ b/README.md
@@ -11,12 +11,11 @@
Dockerized FastAPI wrapper for [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model
- OpenAI-compatible Speech endpoint, with inline voice combination, and mapped naming/models for strict systems
-- NVIDIA GPU accelerated or CPU inference (ONNX, Pytorch)
+- NVIDIA GPU-accelerated or CPU inference (ONNX or PyTorch for either)
- very fast generation time
- ~35x-100x+ real time speed via 4060Ti+
- ~5x+ real time speed via M3 Pro CPU
-- streaming support & tempfile generation
-- phoneme based dev endpoints
+- streaming support & tempfile generation, phoneme-based dev endpoints
- (new) Integrated web UI on localhost:8880/web
- (new) Debug endpoints for monitoring threads, storage, and session pools
@@ -36,14 +35,6 @@ docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.1.4 # CPU, or:
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.1.4 #NVIDIA GPU
```
-Once running, access:
-- API Documentation: http://localhost:8880/docs
-- Web Interface: http://localhost:8880/web
-
-
-

-

-
@@ -53,7 +44,6 @@ Once running, access:
1. Install prerequisites, and start the service using Docker Compose (Full setup including UI):
- Install [Docker](https://www.docker.com/products/docker-desktop/)
- -
- Clone the repository:
```bash
git clone https://github.com/remsky/Kokoro-FastAPI.git
@@ -72,31 +62,7 @@ Once running, access:
./start-cpu.sh
./start-gpu.sh
```
-
- Once started:
- - The API will be available at http://localhost:8880
- - The *Web UI* can be tested at http://localhost:8880/web
- - The Gradio UI (deprecating) can be accessed at http://localhost:7860
-
-2. Run locally as an OpenAI-Compatible Speech Endpoint
- ```python
- from openai import OpenAI
- client = OpenAI(
- base_url="http://localhost:8880/v1",
- api_key="not-needed"
- )
-
- with client.audio.speech.with_streaming_response.create(
- model="kokoro",
- voice="af_sky+af_bella", #single or multiple voicepack combo
- input="Hello world!",
- response_format="mp3"
- ) as response:
- response.stream_to_file("output.mp3")
-
- ```
-
Direct Run (via uv)
@@ -118,28 +84,40 @@ Once running, access:
./start-gpu.sh
```
- Once started:
- - The API will be available at http://localhost:8880
- - The *Web UI* can be tested at http://localhost:8880/web
- - The Gradio UI (deprecating) can be accessed at http://localhost:7860
+
-2. Run locally as an OpenAI-Compatible Speech Endpoint
- ```python
- from openai import OpenAI
- client = OpenAI(
- base_url="http://localhost:8880/v1",
- api_key="not-needed"
- )
+
+ Up and Running?
- with client.audio.speech.with_streaming_response.create(
- model="kokoro",
- voice="af_sky+af_bella", #single or multiple voicepack combo
- input="Hello world!",
- response_format="mp3"
- ) as response:
- response.stream_to_file("output.mp3")
+
+Run locally as an OpenAI-Compatible Speech Endpoint:
- ```
+```python
+from openai import OpenAI
+
+client = OpenAI(
+ base_url="http://localhost:8880/v1", api_key="not-needed"
+)
+
+with client.audio.speech.with_streaming_response.create(
+    model="kokoro",
+    voice="af_sky+af_bella",  # single or multiple voicepack combo
+    input="Hello world!",
+) as response:
+    response.stream_to_file("output.mp3")
+```
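A quick smoke test without the Python client can also be useful here — a sketch, assuming the server from the steps above is running on localhost:8880 and that the endpoint follows the standard OpenAI `/v1/audio/speech` request shape the README describes:

```shell
# Assumes a running Kokoro-FastAPI instance on localhost:8880 (adjust port if needed)
curl -s http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
        "model": "kokoro",
        "voice": "af_sky+af_bella",
        "input": "Hello world!",
        "response_format": "mp3"
      }' \
  -o output.mp3
```

If the container is up, `output.mp3` should contain the synthesized audio; a JSON error body instead usually means the voice name or model field did not match the mapped names.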
+
+- The API will be available at http://localhost:8880
+- API Documentation: http://localhost:8880/docs
+
+- Web Interface: http://localhost:8880/web
+- The Gradio UI (being deprecated) can be accessed at http://localhost:7860 if enabled in the docker compose file (it is a separate image!)
+
+
+

+

+
+
## Features