diff --git a/README.md b/README.md
index 3684948..faac9d8 100644
--- a/README.md
+++ b/README.md
@@ -11,12 +11,11 @@
 Dockerized FastAPI wrapper for [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model
 - OpenAI-compatible Speech endpoint, with inline voice combination, and mapped naming/models for strict systems
-- NVIDIA GPU accelerated or CPU inference (ONNX, Pytorch)
+- NVIDIA GPU accelerated or CPU inference (ONNX or PyTorch for either)
 - very fast generation time
   - ~35x-100x+ real time speed via 4060Ti+
   - ~5x+ real time speed via M3 Pro CPU
-- streaming support & tempfile generation
-- phoneme based dev endpoints
+- streaming support & tempfile generation, phoneme-based dev endpoints
 - (new) Integrated web UI on localhost:8880/web
 - (new) Debug endpoints for monitoring threads, storage, and session pools
@@ -36,14 +35,6 @@
 docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.1.4 # CPU, or:
 docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:v0.1.4 #NVIDIA GPU
 ```
-Once running, access:
-- API Documentation: http://localhost:8880/docs
-- Web Interface: http://localhost:8880/web
-
-[image: API Documentation]
-[image: Web UI Screenshot]
@@ -53,7 +44,6 @@ Once running, access:
 1. Install prerequisites, and start the service using Docker Compose (Full setup including UI):
    - Install [Docker](https://www.docker.com/products/docker-desktop/)
-
-
    - Clone the repository:
      ```bash
      git clone https://github.com/remsky/Kokoro-FastAPI.git
@@ -72,31 +62,7 @@ Once running, access:
    ./start-cpu.sh
    ./start-gpu.sh
    ```
-
-   Once started:
-   - The API will be available at http://localhost:8880
-   - The *Web UI* can be tested at http://localhost:8880/web
-   - The Gradio UI (deprecating) can be accessed at http://localhost:7860
-
-2. Run locally as an OpenAI-Compatible Speech Endpoint
-   ```python
-   from openai import OpenAI
-   client = OpenAI(
-       base_url="http://localhost:8880/v1",
-       api_key="not-needed"
-   )
-
-   with client.audio.speech.with_streaming_response.create(
-       model="kokoro",
-       voice="af_sky+af_bella", #single or multiple voicepack combo
-       input="Hello world!",
-       response_format="mp3"
-   ) as response:
-       response.stream_to_file("output.mp3")
-
-   ```
-
 <summary>Direct Run (via uv)</summary>
@@ -118,28 +84,40 @@ Once running, access:
    ./start-gpu.sh
    ```
-   Once started:
-   - The API will be available at http://localhost:8880
-   - The *Web UI* can be tested at http://localhost:8880/web
-   - The Gradio UI (deprecating) can be accessed at http://localhost:7860
+</details>
-2. Run locally as an OpenAI-Compatible Speech Endpoint
-   ```python
-   from openai import OpenAI
-   client = OpenAI(
-       base_url="http://localhost:8880/v1",
-       api_key="not-needed"
-   )
+
+<details>
+<summary>Up and Running?</summary>
-   with client.audio.speech.with_streaming_response.create(
-       model="kokoro",
-       voice="af_sky+af_bella", #single or multiple voicepack combo
-       input="Hello world!",
-       response_format="mp3"
-   ) as response:
-       response.stream_to_file("output.mp3")
+
+Run locally as an OpenAI-Compatible Speech Endpoint:
-   ```
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8880/v1", api_key="not-needed"
+)
+
+with client.audio.speech.with_streaming_response.create(
+    model="kokoro",
+    voice="af_sky+af_bella",  # single or multiple voicepack combo
+    input="Hello world!"
+) as response:
+    response.stream_to_file("output.mp3")
+```
+
+- The API will be available at http://localhost:8880
+- API Documentation: http://localhost:8880/docs
+- Web Interface: http://localhost:8880/web
+- The Gradio UI (deprecated) can be accessed at http://localhost:7860 if enabled in the Docker Compose file (it is a separate image!)
+
+[image: API Documentation]
+[image: Web UI Screenshot]
+
+</details>
 
 ## Features
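
The added example above drives the server through the OpenAI SDK. For wiring the service into other tooling, the request body can also be assembled by hand; the sketch below only builds the JSON payload for the OpenAI-compatible speech endpoint (`build_speech_payload` is an illustrative helper, not part of this project, and the `voice1+voice2` string follows the inline voice-combination syntax shown in the README).

```python
# Sketch: assemble the JSON body for a POST to the wrapper's
# OpenAI-compatible speech endpoint (http://localhost:8880/v1/audio/speech).
# `build_speech_payload` is a hypothetical helper for illustration only.

def build_speech_payload(text: str, voices: list[str], response_format: str = "mp3") -> dict:
    """Build an OpenAI-compatible speech request body."""
    return {
        "model": "kokoro",             # mapped model name accepted by the wrapper
        "voice": "+".join(voices),     # "af_sky+af_bella" combines voicepacks inline
        "input": text,
        "response_format": response_format,
    }

payload = build_speech_payload("Hello world!", ["af_sky", "af_bella"])
print(payload["voice"])  # af_sky+af_bella
```

Any HTTP client can then POST this payload to the endpoint; the SDK example in the diff does the same thing under the hood, plus streaming the response to a file.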