Kokoro-FastAPI

mirror of https://github.com/remsky/Kokoro-FastAPI.git synced 2025-08-05 16:48:53 +00:00

Author	SHA1	Message	Date
remsky	37ea01eaf9	fix: download_format option for audio response, handling in create_speech	2025-02-13 00:04:21 -07:00
Fireblade	7cb5957848	added optional pluralization normalization	2025-02-11 19:24:29 -05:00
Fireblade	09de389b29	Added normalization options	2025-02-11 19:09:35 -05:00
remsky	a91e0fe9df	Ruff check + formatting	2025-02-09 18:32:17 -07:00
remsky	a0dc870f4a	-fix voice selection not matching language phonemes -added voice language override parameter	2025-02-08 01:29:15 -07:00
remsky	6c234a3b67	Update dependencies, enhance voice management, and add captioned speech support	2025-02-04 19:41:41 -07:00
remsky	f61f79981d	-Add debug endpoint for system stats -Adjust headers, generate from phonemes, etc	2025-01-30 04:44:04 -07:00
remsky	946e322242	Implement temporary file management on openai endpoint, whole file downloads	2025-01-29 04:09:38 -07:00
remsky	ba577d348e	Enhance web player information, adjust text chunk size, update audio wave settings, and implement OpenAI model mappings	2025-01-23 04:11:31 -07:00
remsky	df4cc5b4b2	-Adjust testing framework for new model -Add web player support: include static file serving and HTML interface for TTS	2025-01-22 21:11:47 -07:00
remsky	66f46e82f9	Refactor ONNX GPU backend and phoneme generation: improve token handling, add chunk processing for audio generation, and introduce initial stitch options for audio chunks.	2025-01-22 17:43:38 -07:00
remsky	21bf810f97	Enhance model inference: update documentation, add model download scripts for PyTorch and ONNX, and refactor configuration handling	2025-01-21 21:44:21 -07:00
remsky	ab28a62e86	Refactor inference architecture: remove legacy TTS model, add ONNX and PyTorch backends, and introduce model configuration schemas	2025-01-20 22:42:29 -07:00
remsky	22752900e5	Ruff checks, ci fix	2025-01-13 20:15:46 -07:00
remsky	e8c1284032	Ruff format + fix	2025-01-09 18:41:44 -07:00
remsky	4b521f9bf0	- Added GenerateFromPhonemesRequest model to text_schemas.py - Refactored TTS model initialization methods in tts_gpu.py and tts_cpu.py - Added custom logger configuration in main.py - Deprecated text_processing router -> development route	2025-01-09 07:20:14 -07:00
remsky	130b084cce	- Added support for combining voices via any endpoint - Updated the `process_voices` function to handle both string and list formats for voice input.	2025-01-07 03:50:08 -07:00
remsky	4c6cd83f85	Swapped generator to preprocessing	2025-01-04 22:23:59 -07:00
remsky	f1eb1d9590	First streaming attempt	2025-01-04 17:54:54 -07:00
remsky	7df2a68fb4	- CPU ONNX + PyTorch CUDA, functional - Incorporated text processing module as service, towards modularization and optimizations - Added text processing router for phonemization - Enhanced benchmark statistics with real-time speed metrics	2025-01-03 17:54:17 -07:00
remsky	f051984805	Ruff Check + Format	2025-01-01 21:50:41 -07:00
remsky	05e1e30c47	- modified voice loading to copy on init - adjustments to the combine voices functionality - error handling and analysis	2024-12-31 18:55:26 -07:00
remsky	36606f7234	Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support	2024-12-31 03:41:45 -07:00
remsky	4123ab0891	Refactor TTS API and enhance testing setup with coverage and logging improvements	2024-12-31 02:55:51 -07:00
remsky	c11a6ea6ea	Enhance TTS API with logging, voice pack loading, and schema updates	2024-12-31 01:57:00 -07:00
remsky	8ce8334345	- Complete TTS endpoint replacement with OpenAI compatible -Removed output directory, and update configuration settings - Added benchmarking for entire novel	2024-12-31 01:52:16 -07:00

26 commits