Kokoro-FastAPI

mirror of https://github.com/remsky/Kokoro-FastAPI.git synced 2025-08-05 16:48:53 +00:00

Author	SHA1	Message	Date
remsky	f11a6b3e2b	Revert "Adds support for creating weighted voice combinations"	2025-02-09 22:41:42 -07:00
remsky	d5709097e2	Merge pull request #92 from rvuyyuru2/v0.1.2-pre Adds support for creating weighted voice combinations (reimplemented in v0.2.0)	2025-02-09 22:37:16 -07:00
remsky	00497f8872	Refactor: Consolidate PyTorch CPU and GPU backends into a single PyTorchBackend class; remove obsolete files	2025-01-25 13:33:42 -07:00
rvuyyuru2	44c62467ae	Adds support for creating weighted voice combinations Implements a new method to parse weighted voice formulas and generate combined audio outputs based on specified weights. This enhancement allows for more diverse audio generation by letting users specify multiple voices with respective weights, improving flexibility in voice management. Updates voice processing logic in relevant API routes to handle weighted formulas seamlessly. Fixes #123 (if applicable, replace with the actual issue reference)	2025-01-25 20:54:21 +05:30
remsky	3547d95ee6	-unified streaming implementation	2025-01-25 05:25:13 -07:00
remsky	9efb9db4d9	Fix: VoiceManager singleton instantiation	2025-01-24 05:30:56 -07:00
remsky	20658f9759	Performance: Adjust session timeout and GPU memory limit; minimize voice pre-caching and improve singleton instance management	2025-01-24 05:01:38 -07:00
remsky	ee1f7cde18	Add async audio processing and semantic chunking support; flattened static audio trimming	2025-01-24 04:06:47 -07:00
remsky	8eb3525382	Refactor configuration and enhance web interface: update GPU settings, add speed control, and improve input handling for audio generation	2025-01-23 04:54:55 -07:00
remsky	ba577d348e	Enhance web player information, adjust text chunk size, update audio wave settings, and implement OpenAI model mappings	2025-01-23 04:11:31 -07:00
remsky	8e8f120a3e	Update configuration to disable local voice saving, enhance voice validation logic, and remove deprecated test file	2025-01-23 02:00:46 -07:00
remsky	df4cc5b4b2	-Adjust testing framework for new model -Add web player support: include static file serving and HTML interface for TTS	2025-01-22 21:11:47 -07:00
remsky	66f46e82f9	Refactor ONNX GPU backend and phoneme generation: improve token handling, add chunk processing for audio generation, and introduce initial stitch options for audio chunks.	2025-01-22 17:43:38 -07:00
remsky	d50214d3be	Enable ONNX GPU support in Docker configurations and refactor model file handling	2025-01-22 05:00:38 -07:00
remsky	4a24be1605	Refactor model loading and configuration: update, adjust model loading device, add async streaming examples and remove unused warmup service.	2025-01-22 02:33:29 -07:00
remsky	21bf810f97	Enhance model inference: update documentation, add model download scripts for PyTorch and ONNX, and refactor configuration handling	2025-01-21 21:44:21 -07:00
remsky	ab28a62e86	Refactor inference architecture: remove legacy TTS model, add ONNX and PyTorch backends, and introduce model configuration schemas	2025-01-20 22:42:29 -07:00
Richard Roberson	d51d861861	add AAC audio format and test	2025-01-17 21:43:10 -07:00
Fireblade2534	eb556ec7d3	Fixed python tests so they run properly and cleaned up some unneeded files	2025-01-17 14:55:25 +00:00
remsky	d20da2f92e	Default hexxgrad voicepacks added as temporary fix	2025-01-15 09:42:27 +00:00
remsky	8bc8661930	fix: update model directory paths and improve logging in TTS services	2025-01-14 06:37:03 -07:00
remsky	cf72e4ed2b	Add interruptible streams	2025-01-13 23:25:06 -07:00
remsky	064313450e	fix: test of cicd	2025-01-13 20:18:02 -07:00
remsky	22752900e5	Ruff checks, ci fix	2025-01-13 20:15:46 -07:00
remsky	007b1a35e8	feat: merge master into core/uv-management for v0.1.0 Major changes: - Baked model directly into Dockerfile for improved deployment - Switched to uv for dependency management - Restructured Docker files into docker/cpu and docker/gpu directories - Updated configuration for better ONNX performance	2025-01-13 19:31:44 -07:00
remsky	387653050b	refactor: streamline audio normalization process and update tests	2025-01-13 18:56:49 -07:00
remsky	f4dc292440	fix: ui stability, memory safeties	2025-01-12 21:33:23 -07:00
remsky	3d0ca2a8c2	Update Dockerfiles for baked in models, adjustments to cpu/gpu environment splits	2025-01-12 05:23:02 -07:00
remsky	926ea8cecf	Refactor Docker configurations and update test mocks for development routers	2025-01-10 22:03:16 -07:00
remsky	e8c1284032	Ruff format + fix	2025-01-09 18:41:44 -07:00
remsky	4b521f9bf0	- Added GenerateFromPhonemesRequest model to text_schemas.py - Refactored TTS model initialization methods in tts_gpu.py and tts_cpu.py - Added custom logger configuration in main.py - Deprecated text_processing router -> development route	2025-01-09 07:20:14 -07:00
remsky	bd4df84410	Merge pull request #12 from fireblade2534/master Gave it a test, didn't see any issues 👍	2025-01-09 01:30:07 -07:00
Fireblade	1f22cda9be	Fix remaining slashes not being converted into text and made % be converted	2025-01-08 08:50:22 -05:00
remsky	a0a85f5ef0	-add email handling, minor additional URL processing, tests	2025-01-08 03:13:17 -07:00
remsky	e7ffcf49f5	fixed: async scandir finding voices	2025-01-07 21:36:07 -07:00
Fireblade	1625082724	Fix URL parsing for URLs without https, http, or www. It also allows raw IPs, ports, and dashes	2025-01-07 19:34:38 -05:00
remsky	d7e8a5c953	Adjusting aiofiles implementation, testing	2025-01-07 04:30:02 -07:00
remsky	130b084cce	- Added support for combining voices via any endpoint - Updated the `process_voices` function to handle both string and list formats for voice input.	2025-01-07 03:50:08 -07:00
remsky	fddf26c905	Added tests, slight changes to regex	2025-01-07 00:18:44 -07:00
Fireblade	db2f3dd323	Made urls readable	2025-01-06 19:40:21 -05:00
remsky	720c1fb97d	-update soundfile version -alignment with streaming standards -audio processing config settings -more comprehensive model warmup -minor model improvements -enhancing testing, benchmarking -cool ascii logo	2025-01-06 03:32:41 -07:00
remsky	4c6cd83f85	Swapped generator to preprocessing	2025-01-04 22:23:59 -07:00
remsky	e799f0c7c1	WIP: basic tests on OpenAI streaming compatibility	2025-01-04 18:09:23 -07:00
remsky	0e9f77fc79	WIP: open ai compatible streaming	2025-01-04 17:55:36 -07:00
remsky	f1eb1d9590	First streaming attempt	2025-01-04 17:54:54 -07:00
remsky	93aa205da9	Enhance ONNX optimization settings and add validation script for TTS audio files	2025-01-04 02:14:46 -07:00
remsky	7df2a68fb4	- CPU ONNX + PyTorch CUDA, functional - Incorporated text processing module as service, towards modularization and optimizations - Added text processing router for phonemization - Enhanced benchmark statistics with real-time speed metrics	2025-01-03 17:54:17 -07:00
remsky	9496a3a63f	WIP: CPU/GPU Functional, few straggling tests to fix and check.	2025-01-03 03:16:42 -07:00
remsky	e4d8e74738	WIP, Functional for CPU: Updated for ONNX runtime support, Dockerfile and TTS Service	2025-01-03 00:53:41 -07:00
remsky	40894449da	added output audio tests, validation	2025-01-02 15:36:53 -07:00

66 commits