# Changelog

Notable changes to this project will be documented in this file.

## [v0.2.0post1] - 2025-02-07
- Fixed: Kokoro is now built from source, with adjustments, to avoid a CUDA lock
- Fixed ARM64 compatibility of the spaCy dependency to avoid emulation slowdown
- Added g++ for Japanese language support
- Temporarily disabled Vietnamese language support due to ARM64 compatibility issues

## [v0.2.0-pre] - 2025-02-06
### Added
- Complete Model Overhaul:
  - Upgraded to Kokoro v1.0 model architecture
  - Pre-installed multi-language support from Misaki:
    - English (en), Japanese (ja), Korean (ko), Chinese (zh), Vietnamese (vi)
  - All voice packs included for supported languages, along with the original versions.
- Enhanced Audio Generation Features:
  - Per-word timestamped caption generation
  - Phoneme-based audio generation capabilities
  - Detailed phoneme generation
- Web UI Improvements:
  - Improved voice mixing with weighted combinations
  - Text file upload support
  - Enhanced formatting and user interface
  - Cleaner UI (in progress)
- Integration with https://github.com/hexgrad/kokoro and https://github.com/hexgrad/misaki packages

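
The weighted voice mixing listed above can be pictured as a weighted average of voice vectors. The following is a minimal sketch under stated assumptions, not the project's implementation: voices are modelled as plain lists of floats rather than the tensors stored in `.pt` files, and `mix_voices` and the voice names are hypothetical.

```python
from typing import Dict, List


def mix_voices(voices: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Return the weighted average of the named voice vectors.

    Hypothetical helper: weights are normalised so they sum to 1.
    """
    if not weights:
        raise ValueError("at least one voice weight is required")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Use the first requested voice to fix the expected vector length.
    length = len(voices[next(iter(weights))])
    mixed = [0.0] * length
    for name, weight in weights.items():
        vector = voices[name]
        if len(vector) != length:
            raise ValueError(f"voice {name!r} has a different length")
        for i, value in enumerate(vector):
            mixed[i] += (weight / total) * value
    return mixed


# Illustrative data only: two toy "voices" mixed 3:1.
toy_voices = {"voice_a": [1.0, 0.0], "voice_b": [0.0, 1.0]}
print(mix_voices(toy_voices, {"voice_a": 3, "voice_b": 1}))  # [0.75, 0.25]
```

Normalising by the weight total means callers can pass ratios such as 3:1 without having to make them sum to 1 first.
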
### Removed
- Deprecated support for Kokoro v0.19 model

### Changed
- The Combine Voices endpoint now returns a .pt file; for generation requests, voice combinations are otherwise created on the fly

## [v0.1.4] - 2025-01-30
### Added
- Smart Chunking System:
  - New text_processor with smart_split for improved sentence boundary detection
  - Dynamically adjusts chunk sizes based on sentence structure, using phoneme/token information in an initial pass
  - Should avoid ever exceeding the 510-token limit per chunk, while preserving natural cadence
- Web UI added (to replace Gradio):
  - Integrated streaming with tempfile generation
  - Download links available in X-Download-Path header
  - Configurable cleanup triggers for temp files
- Debug Endpoints:
  - /debug/threads for thread information and stack traces
  - /debug/storage for temp file and output directory monitoring
  - /debug/system for system resource information
  - /debug/session_pools for ONNX/CUDA session status
- Automated Model Management:
  - Auto-download from releases page
  - Included download scripts for manual installation
  - Pre-packaged voice models in repository

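
The chunking behaviour described above (split at sentence boundaries, keep each chunk within the 510 limit) can be sketched roughly as follows. This is an illustration under assumptions, not the actual `smart_split`: `split_into_chunks` and `count_tokens` are hypothetical names, and the stand-in counter treats one character as one token, whereas the real system uses phoneme/token information.

```python
import re
from typing import Callable, List

MAX_TOKENS = 510  # per-chunk limit quoted in the changelog


def count_tokens(text: str) -> int:
    """Stand-in token counter (assumption): one token per character."""
    return len(text)


def split_into_chunks(
    text: str,
    max_tokens: int = MAX_TOKENS,
    counter: Callable[[str], int] = count_tokens,
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens."""
    # Split after ., ! or ? followed by whitespace; drop empty pieces.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if counter(candidate) <= max_tokens:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if counter(sentence) <= max_tokens:
            current = sentence
            continue
        # Fallback: a single oversized sentence is split on word boundaries.
        # A lone word longer than the limit is kept whole in this sketch.
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if counter(candidate) <= max_tokens or not current:
                current = candidate
            else:
                chunks.append(current)
                current = word
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("One. Two. Three.", max_tokens=9))  # ['One. Two.', 'Three.']
```

Packing whole sentences first and only falling back to word splits for oversized sentences is what keeps chunk boundaries aligned with natural pauses.
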
### Changed
- Significant architectural improvements:
  - Multi-model architecture support
  - Enhanced concurrency handling
  - Improved streaming header management
  - Better resource/session pool management

## [v0.1.2] - 2025-01-23
### Structural Improvements
- Models can be manually downloaded and placed in api/src/models, or fetched with the included script
- TTSGPU/TPSCPU/STTSService classes replaced with a ModelManager service
  - Covers CPU and GPU for both ONNX and PyTorch (Note: only PyTorch GPU and ONNX CPU/GPU have been tested)
  - Should make it easier to support new models or architectures as they become available, in a more modular way
- Converted a number of internal processes to async handling to improve concurrency
- Improved separation of concerns, moving towards a plug-in, modular structure that makes PRs and new features easier

### Web UI (test release)
- A simple integrated web UI has been added directly on the FastAPI server
  - This can be disabled via core/config.py or ENV variables if desired.
  - Simplifies deployments, utility testing, aesthetics, etc.
  - Looking to deprecate/collaborate/hand off the Gradio UI

## [v0.1.0] - 2025-01-13
### Changed
- Major Docker improvements:
  - Baked model directly into Dockerfile for improved deployment reliability
  - Switched to uv for dependency management
  - Streamlined container builds and reduced image sizes
- Dependency Management:
  - Migrated from pip/poetry to uv for faster, more reliable package management
  - Added uv.lock for deterministic builds
  - Updated dependency resolution strategy

## [v0.0.5post1] - 2025-01-11
### Fixed
- Docker image tagging and versioning improvements (-gpu, -cpu, -ui)
- Minor VRAM management improvements
- Gradio bugfix causing crashes and errant warnings
- Updated GPU and UI container configurations

## [v0.0.5] - 2025-01-10
### Fixed
- Resolved image tagging and structure issues from v0.0.4
- Added automatic master-to-develop branch synchronization
- Improved release tagging and structures
- Initial CI/CD setup

## 2025-01-04
### Added
- ONNX Support:
  - Added single batch ONNX support for CPU inference
  - Roughly 0.4 RTF (2.4x real-time speed)

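
For readers unfamiliar with the RTF figure above, the relationship between real-time factor and speed can be sketched as below. This assumes the common convention that RTF is processing time divided by audio duration, which is what pairing a value below 1 with a faster-than-real-time speed implies; the function names are hypothetical.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speed_multiple(rtf: float) -> float:
    """How many times faster than real time a given RTF is (1 / RTF)."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Illustrative numbers only: 4 s of compute for 10 s of audio.
rtf = real_time_factor(4.0, 10.0)
print(rtf, speed_multiple(rtf))
```

Under this convention an RTF of exactly 0.4 is 2.5x real time, and the quoted 2.4x corresponds to an RTF of about 0.42, which is consistent with "roughly 0.4".
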
### Modified
- Code Refactoring:
  - Work on modularizing phonemizer and tokenizer into separate services
  - Incorporated these services into a dev endpoint
- Testing and Benchmarking:
  - Cleaned up benchmarking scripts
  - Cleaned up test scripts
  - Added auto-WAV validation scripts

## 2025-01-02
- Audio Format Support:
  - Added comprehensive audio format conversion support (mp3, wav, opus, flac)

## 2025-01-01
### Added
- Gradio Web Interface:
  - Added simple web UI utility for audio generation from input or txt file

### Modified
#### Configuration Changes
- Updated Docker configurations:
  - Changes to `Dockerfile`:
    - Improved layer caching by separating dependency and code layers
  - Updates to `docker-compose.yml` and `docker-compose.cpu.yml`:
    - Removed commit lock from model fetching to allow automatic model updates from HF
    - Added git index lock cleanup

#### API Changes
- Modified `api/src/main.py`
- Updated TTS service implementation in `api/src/services/tts.py`:
  - Added device management for better resource control:
    - Voices are now copied from model repository to api/src/voices directory for persistence
  - Refactored voice pack handling:
    - Removed static voice pack dictionary
    - On-demand voice loading from disk
  - Added model warm-up functionality:
    - Model now initializes with a dummy text generation
    - Uses default voice (af.pt) for warm-up
    - Model is ready for inference on first request

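
The voice pack changes above (no static dictionary, on-demand loading from disk, a warm-up pass with the default voice) might look roughly like the sketch below. It is an illustration only: `VoiceStore`, `warm_up`, and the `synthesize` callable are hypothetical, and the default loader simply reads raw bytes where the real service would presumably deserialize the `.pt` file.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List


class VoiceStore:
    """Hypothetical helper: loads voice packs from disk on first use."""

    def __init__(self, voices_dir: str, loader: Callable[[Path], Any] = Path.read_bytes):
        self.voices_dir = Path(voices_dir)
        self._loader = loader  # assumption: swap in a real .pt loader here
        self._cache: Dict[str, Any] = {}

    def available(self) -> List[str]:
        """List voice names found on disk instead of a hard-coded dictionary."""
        return sorted(p.stem for p in self.voices_dir.glob("*.pt"))

    def get(self, name: str) -> Any:
        """Load a voice the first time it is requested, then reuse it."""
        if name not in self._cache:
            path = self.voices_dir / f"{name}.pt"
            if not path.is_file():
                raise FileNotFoundError(f"voice pack not found: {path}")
            self._cache[name] = self._loader(path)
        return self._cache[name]


def warm_up(store: VoiceStore, synthesize: Callable[[str, Any], Any],
            default_voice: str = "af") -> None:
    """Run one dummy generation so the first real request is not the slow one."""
    synthesize("warm-up", store.get(default_voice))
```

Calling `warm_up` once at startup pays the model initialisation cost before any client request arrives, while `VoiceStore.get` keeps memory use proportional to the voices actually requested.
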