remsky
f11a6b3e2b
Revert "Adds support for creating weighted voice combinations"
2025-02-09 22:41:42 -07:00
rvuyyuru2
44c62467ae
Adds support for creating weighted voice combinations
...
Implements a new method to parse weighted voice formulas and generate combined audio outputs based on specified weights.
This enhancement allows for more diverse audio generation by letting users specify multiple voices with respective weights, improving flexibility in voice management.
Updates voice processing logic in relevant API routes to handle weighted formulas seamlessly.
Fixes #123 (if applicable, replace with the actual issue reference)
2025-01-25 20:54:21 +05:30
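Note on the commit above: it describes parsing a weighted voice formula and combining voices by weight. The log does not show the formula syntax or the method's name, so the following is only a minimal sketch under assumptions. The `weight*name+weight*name` syntax, the function names `parse_voice_formula` and `combine_voices`, and the use of plain float lists as voice vectors are all hypothetical.

```python
def parse_voice_formula(formula):
    """Parse a formula such as '0.3*voice_a+0.7*voice_b' (assumed syntax).

    Returns a list of (name, weight) pairs with weights normalized to sum to 1.
    A term without an explicit weight is treated as weight 1.0.
    """
    terms = []
    for raw in formula.split("+"):
        term = raw.strip()
        if not term:
            raise ValueError("empty term in voice formula")
        if "*" in term:
            weight_text, name = term.split("*", 1)
            weight = float(weight_text)
        else:
            weight, name = 1.0, term
        name = name.strip()
        if not name or weight <= 0:
            raise ValueError(f"invalid term: {term!r}")
        terms.append((name, weight))
    total = sum(weight for _, weight in terms)
    return [(name, weight / total) for name, weight in terms]


def combine_voices(formula, voicepacks):
    """Return the weighted sum of the voice vectors named in the formula.

    `voicepacks` is a hypothetical mapping of voice name -> list of floats.
    """
    weights = parse_voice_formula(formula)
    length = len(voicepacks[weights[0][0]])
    combined = [0.0] * length
    for name, weight in weights:
        vector = voicepacks[name]
        if len(vector) != length:
            raise ValueError(f"voice {name!r} has a mismatched length")
        for i, value in enumerate(vector):
            combined[i] += weight * value
    return combined
```

Normalizing the weights keeps the combined vector on the same scale as its inputs, which is one plausible design choice; the actual commit may have handled weights differently.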
remsky
3547d95ee6
-unified streaming implementation
2025-01-25 05:25:13 -07:00
remsky
66f46e82f9
Refactor ONNX GPU backend and phoneme generation: improve token handling, add chunk processing for audio generation, and introduce initial stitch options for audio chunks.
2025-01-22 17:43:38 -07:00
remsky
d50214d3be
Enable ONNX GPU support in Docker configurations and refactor model file handling
2025-01-22 05:00:38 -07:00
remsky
4a24be1605
Refactor model loading and configuration: adjust model loading device, add async streaming examples, and remove unused warmup service.
2025-01-22 02:33:29 -07:00
remsky
387653050b
refactor: streamline audio normalization process and update tests
2025-01-13 18:56:49 -07:00
remsky
38e0b87320
Initial swap to UV dependency management
2025-01-11 20:00:34 -07:00
remsky
926ea8cecf
Refactor Docker configurations and update test mocks for development routers
2025-01-10 22:03:16 -07:00
remsky
e8c1284032
Ruff format + fix
2025-01-09 18:41:44 -07:00
remsky
4b521f9bf0
- Added GenerateFromPhonemesRequest model to text_schemas.py
...
- Refactored TTS model initialization methods in tts_gpu.py and tts_cpu.py
- Added custom logger configuration in main.py
- Deprecated text_processing router -> development route
2025-01-09 07:20:14 -07:00
remsky
d7e8a5c953
Adjusting aiofiles implementation, testing
2025-01-07 04:30:02 -07:00
remsky
130b084cce
- Added support for combining voices via any endpoint
...
- Updated the `process_voices` function to handle both string and list formats for voice input.
2025-01-07 03:50:08 -07:00
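Note on the commit above: it says `process_voices` accepts voice input as either a string or a list. The log does not show its signature or behavior, so the sketch below uses a differently named, hypothetical helper (`normalize_voice_input`) and assumes that a string joins voice names with `+`.

```python
from typing import List, Union


def normalize_voice_input(voice: Union[str, List[str]]) -> List[str]:
    """Return a clean list of voice names from a string or a list.

    Assumption: a string input separates voice names with '+'.
    """
    if isinstance(voice, str):
        parts = voice.split("+")
    else:
        parts = list(voice)
    names = [part.strip() for part in parts if part.strip()]
    if not names:
        raise ValueError("no voices provided")
    return names
```

Accepting both shapes at a single entry point would let every endpoint pass user input through unchanged, which matches the commit's stated goal of combining voices via any endpoint.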
remsky
5199d4ca9a
Update readme & analysis
2025-01-06 03:49:31 -07:00
remsky
720c1fb97d
-update soundfile version
...
-alignment with streaming standards
-audio processing config settings
-more comprehensive model warmup
-minor model improvements
-enhancing testing, benchmarking
-cool ascii logo
2025-01-06 03:32:41 -07:00
remsky
4c6cd83f85
Swapped generator to preprocessing
2025-01-04 22:23:59 -07:00
remsky
0e9f77fc79
WIP: OpenAI-compatible streaming
2025-01-04 17:55:36 -07:00
remsky
f1eb1d9590
First streaming attempt
2025-01-04 17:54:54 -07:00
remsky
76e8b07a92
Allow ONNX support optimizations for CPU inference and update benchmarking scripts; modify README for clarity on performance metrics
2025-01-04 02:46:27 -07:00
remsky
93aa205da9
Enhance ONNX optimization settings and add validation script for TTS audio files
2025-01-04 02:14:46 -07:00
remsky
7df2a68fb4
- CPU ONNX + PyTorch CUDA, functional
...
- Incorporated text processing module as service, towards modularization and optimizations
- Added text processing router for phonemization
- Enhanced benchmark statistics with real-time speed metrics
2025-01-03 17:54:17 -07:00
remsky
9496a3a63f
WIP: CPU/GPU Functional, few straggling tests to fix and check.
2025-01-03 03:16:42 -07:00
remsky
e4d8e74738
WIP, Functional for CPU: Updated for ONNX runtime support, Dockerfile and TTS Service
2025-01-03 00:53:41 -07:00
remsky
40894449da
added output audio tests, validation
2025-01-02 15:36:53 -07:00
remsky
f051984805
Ruff Check + Format
2025-01-01 21:50:41 -07:00
remsky
05e1e30c47
- modified voice loading to copy on init
...
- adjustments to the combine voices functionality
- error handling and analysis
2024-12-31 18:55:26 -07:00
Emmanuel Schmidbauer
510b01cc90
add ability to combine voices
2024-12-31 10:30:12 -05:00
remsky
f800c4ecf9
Added mp3 samples
2024-12-31 03:48:26 -07:00
remsky
607df6e03b
Update README and tests to clarify audio format support and enhance documentation
2024-12-31 03:46:31 -07:00
remsky
36606f7234
Refactor Docker setup to use a dedicated model-fetcher service and update schemas for additional voice support
2024-12-31 03:41:45 -07:00
remsky
4123ab0891
Refactor TTS API and enhance testing setup with coverage and logging improvements
2024-12-31 02:55:51 -07:00
remsky
c11a6ea6ea
Enhance TTS API with logging, voice pack loading, and schema updates
2024-12-31 01:57:00 -07:00
remsky
8ce8334345
- Complete TTS endpoint replacement with OpenAI compatible
...
- Removed output directory and updated configuration settings
- Added benchmarking for entire novel
2024-12-31 01:52:16 -07:00
Emmanuel Schmidbauer
f95e526a3f
add speed
2024-12-30 13:39:35 -05:00
remsky
0fb36bb1b2
fix: update benchmark results for processing time and output length
2024-12-30 06:16:55 -07:00
remsky
79d5332c8a
feat: enabled support for stitching long outputs in TTS requests
2024-12-30 06:16:18 -07:00
remsky
aa2df45858
Update README with performance benchmarks and usage examples; add benchmark plotting script
2024-12-30 04:53:29 -07:00
remsky
ce0ef3534a
Add initial implementation of Kokoro TTS API with Docker GPU support
...
- Set up FastAPI application with TTS service
- Define API endpoints for TTS generation and voice listing
- Implement Pydantic models for request and response schemas
- Add Dockerfile and docker-compose.yml for containerization
- Include example usage and benchmark results in README
2024-12-30 04:17:50 -07:00