Commit graph

38 commits

Author SHA1 Message Date
remsky
f11a6b3e2b
Revert "Adds support for creating weighted voice combinations" 2025-02-09 22:41:42 -07:00
rvuyyuru2
44c62467ae Adds support for creating weighted voice combinations
Implements a new method to parse weighted voice formulas and generate combined audio outputs based on specified weights.

This enhancement allows for more diverse audio generation by letting users specify multiple voices with respective weights, improving flexibility in voice management.

Updates voice processing logic in relevant API routes to handle weighted formulas seamlessly.

Fixes #123 (if applicable, replace with the actual issue reference)
2025-01-25 20:54:21 +05:30
remsky
3547d95ee6 -unified streaming implementation 2025-01-25 05:25:13 -07:00
remsky
66f46e82f9 Refactor ONNX GPU backend and phoneme generation: improve token handling, add chunk processing for audio generation, and introduce initial stitch options for audio chunks. 2025-01-22 17:43:38 -07:00
remsky
d50214d3be Enable ONNX GPU support in Docker configurations and refactor model file handling 2025-01-22 05:00:38 -07:00
remsky
4a24be1605 Refactor model loading and configuration: update, adjust model loading device, add async streaming examples, and remove unused warmup service. 2025-01-22 02:33:29 -07:00
remsky
387653050b refactor: streamline audio normalization process and update tests 2025-01-13 18:56:49 -07:00
remsky
38e0b87320 Initial swap to UV dependency management 2025-01-11 20:00:34 -07:00
remsky
926ea8cecf Refactor Docker configurations and update test mocks for development routers 2025-01-10 22:03:16 -07:00
remsky
e8c1284032 Ruff format + fix 2025-01-09 18:41:44 -07:00
remsky
4b521f9bf0 - Added GenerateFromPhonemesRequest model to text_schemas.py
- Refactored TTS model initialization methods in tts_gpu.py and tts_cpu.py
- Added custom logger configuration in main.py
- Deprecated text_processing router -> development route
2025-01-09 07:20:14 -07:00
remsky
d7e8a5c953 Adjusting aiofiles implementation, testing 2025-01-07 04:30:02 -07:00
remsky
130b084cce - Added support for combining voices via any endpoint
- Updated the `process_voices` function to handle both string and list formats for voice input.
2025-01-07 03:50:08 -07:00
remsky
5199d4ca9a Update readme & analysis 2025-01-06 03:49:31 -07:00
remsky
720c1fb97d -update soundfile version
-alignment with streaming standards
-audio processing config settings
-more comprehensive model warmup
-minor model improvements
-enhancing testing, benchmarking
-cool ascii logo
2025-01-06 03:32:41 -07:00
remsky
4c6cd83f85 Swapped generator to preprocessing 2025-01-04 22:23:59 -07:00
remsky
0e9f77fc79 WIP: OpenAI-compatible streaming 2025-01-04 17:55:36 -07:00
remsky
f1eb1d9590 First streaming attempt 2025-01-04 17:54:54 -07:00
remsky
76e8b07a92 Allow ONNX support optimizations for CPU inference and update benchmarking scripts; modify README for clarity on performance metrics 2025-01-04 02:46:27 -07:00
remsky
93aa205da9 Enhance ONNX optimization settings and add validation script for TTS audio files 2025-01-04 02:14:46 -07:00
remsky
7df2a68fb4 - CPU ONNX + PyTorch CUDA, functional
- Incorporated text processing module as service, towards modularization and optimizations
- Added text processing router for phonemization
- Enhanced benchmark statistics with real-time speed metrics
2025-01-03 17:54:17 -07:00
remsky
9496a3a63f WIP: CPU/GPU Functional, few straggling tests to fix and check. 2025-01-03 03:16:42 -07:00
remsky
e4d8e74738 WIP, Functional for CPU: Updated for ONNX runtime support, Dockerfile and TTS Service 2025-01-03 00:53:41 -07:00
remsky
40894449da added output audio tests, validation 2025-01-02 15:36:53 -07:00
remsky
f051984805 Ruff Check + Format 2025-01-01 21:50:41 -07:00
remsky
05e1e30c47 - modified voice loading to copy on init
- adjustments to the combine voices functionality
- error handling and analysis
2024-12-31 18:55:26 -07:00
Emmanuel Schmidbauer
510b01cc90 add ability to combine voices 2024-12-31 10:30:12 -05:00
remsky
f800c4ecf9 Added mp3 samples 2024-12-31 03:48:26 -07:00
remsky
607df6e03b Update README and tests to clarify audio format support and enhance documentation 2024-12-31 03:46:31 -07:00
remsky
36606f7234 Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support 2024-12-31 03:41:45 -07:00
remsky
4123ab0891 Refactor TTS API and enhance testing setup with coverage and logging improvements 2024-12-31 02:55:51 -07:00
remsky
c11a6ea6ea Enhance TTS API with logging, voice pack loading, and schema updates 2024-12-31 01:57:00 -07:00
remsky
8ce8334345 - Complete TTS endpoint replacement with OpenAI-compatible endpoint
- Removed output directory and updated configuration settings
- Added benchmarking for entire novel
2024-12-31 01:52:16 -07:00
Emmanuel Schmidbauer
f95e526a3f add speed 2024-12-30 13:39:35 -05:00
remsky
0fb36bb1b2 fix: update benchmark results for processing time and output length 2024-12-30 06:16:55 -07:00
remsky
79d5332c8a feat: enabled support for stitching long outputs in TTS requests 2024-12-30 06:16:18 -07:00
remsky
aa2df45858 Update README with performance benchmarks and usage examples; add benchmark plotting script 2024-12-30 04:53:29 -07:00
remsky
ce0ef3534a Add initial implementation of Kokoro TTS API with Docker GPU support
- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README
2024-12-30 04:17:50 -07:00