Mirror of https://github.com/remsky/Kokoro-FastAPI.git
Synced 2025-04-13 09:39:17 +00:00

Merge branch 'master' of https://github.com/remsky/Kokoro-FastAPI

Commit 6134802d2c
1 changed file with 10 additions and 8 deletions

README.md (16 lines changed)
@@ -7,16 +7,15 @@
 []()
 []()
 [](https://huggingface.co/spaces/Remsky/Kokoro-TTS-Zero)

 [](https://huggingface.co/hexgrad/Kokoro-82M/commit/9901c2b79161b6e898b7ea857ae5298f47b8b0d6)
 []()
 []()

 Dockerized FastAPI wrapper for [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model
 - Multi-language support (English, Japanese, Korean, Chinese, Vietnamese)
 - OpenAI-compatible Speech endpoint, NVIDIA GPU accelerated or CPU inference with PyTorch
 - ONNX support coming soon, see v0.1.5 and earlier for legacy ONNX support in the interim
-- Debug endpoints for monitoring threads, storage, and session pools
-- Integrated web UI on localhost:8880/web
+- Debug endpoints for monitoring system stats, integrated web UI on localhost:8880/web
 - Phoneme-based audio generation, phoneme generation
 - (new) Per-word timestamped caption generation
 - (new) Voice mixing with weighted combinations
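The OpenAI-compatible Speech endpoint listed in the features above accepts the same request shape as OpenAI's `/v1/audio/speech` route (the README's own examples use the OpenAI Python client). As a minimal sketch of building such a request body — the model id `kokoro` and voice name `af_bella` are illustrative assumptions, not taken from this commit:

```python
import json

def build_speech_request(text, voice="af_bella", response_format="mp3"):
    """Build a JSON body for an OpenAI-style /v1/audio/speech request.

    The model id and voice name here are illustrative assumptions; query
    the running server for the real values it exposes.
    """
    return json.dumps({
        "model": "kokoro",          # assumed model identifier
        "input": text,              # text to synthesize
        "voice": voice,             # assumed voice name
        "response_format": response_format,
    })

# This body would be POSTed to http://localhost:8880/v1/audio/speech
payload = build_speech_request("Hello world")
```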
@@ -113,8 +112,8 @@ with client.audio.speech.with_streaming_response.create(
 - Web Interface: http://localhost:8880/web

 <div align="center" style="display: flex; justify-content: center; gap: 10px;">
-  <img src="assets/docs-screenshot.png" width="40%" alt="API Documentation" style="border: 2px solid #333; padding: 10px;">
-  <img src="assets/webui-screenshot.png" width="49%" alt="Web UI Screenshot" style="border: 2px solid #333; padding: 10px;">
+  <img src="assets/docs-screenshot.png" width="42%" alt="API Documentation" style="border: 2px solid #333; padding: 10px;">
+  <img src="assets/webui-screenshot.png" width="42%" alt="Web UI Screenshot" style="border: 2px solid #333; padding: 10px;">
 </div>

 </details>
@@ -357,6 +356,9 @@ docker compose up --build

 - Automatically splits and stitches at sentence boundaries
 - Helps to reduce artifacts and allow long-form processing, as the base model is currently configured for only approximately 30s of output

+The model can process up to a 510-phonemized-token chunk at a time; however, this often leads to 'rushed' speech or other artifacts. The server therefore applies an additional layer of chunking that builds flexible chunks governed by `TARGET_MIN_TOKENS`, `TARGET_MAX_TOKENS`, and `ABSOLUTE_MAX_TOKENS`, configurable via environment variables and set to 175, 250, and 450 by default.
+
 </details>

 <details>
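The sentence-boundary chunking described in this hunk can be pictured as a greedy packer over per-sentence token counts. The function below is a hypothetical illustration using the README's default limits (175/250/450), not the project's actual implementation:

```python
# Assumed defaults from the README's environment variables.
TARGET_MIN_TOKENS = 175
TARGET_MAX_TOKENS = 250
ABSOLUTE_MAX_TOKENS = 450

def chunk_token_counts(sentence_token_counts):
    """Greedily pack per-sentence token counts into chunks that aim for
    TARGET_MIN_TOKENS..TARGET_MAX_TOKENS and never exceed ABSOLUTE_MAX_TOKENS."""
    chunks, current, current_total = [], [], 0
    for count in sentence_token_counts:
        if current and current_total + count > TARGET_MAX_TOKENS:
            # Close the chunk once the target ceiling would be exceeded,
            # unless it is still under the minimum and the hard cap allows more.
            if current_total >= TARGET_MIN_TOKENS or current_total + count > ABSOLUTE_MAX_TOKENS:
                chunks.append(current)
                current, current_total = [], 0
        current.append(count)
        current_total += count
    if current:
        chunks.append(current)
    return chunks
```

For example, four 100-token sentences pack into two chunks of 200 tokens each, keeping every chunk under the hard cap.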