# Changelog

Notable changes to this project will be documented in this file.

## 2025-01-04
### Added
- ONNX Support:
  - Added single batch ONNX support for CPU inference
  - Roughly 0.4 RTF (2.4x real-time speed)
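
For readers unfamiliar with the metric, the sketch below shows how a real-time factor (RTF) relates to an "x real-time" speed figure. It assumes the usual definition of RTF as processing time divided by audio duration; the function names are illustrative and not part of this project. Under that definition an RTF of exactly 0.4 corresponds to 2.5x, so the 2.4x quoted above presumably reflects a measured RTF slightly above 0.4.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speed_multiplier(rtf: float) -> float:
    """Convert an RTF into an 'x real-time' speed: the reciprocal of the RTF."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Example: 4 seconds of compute for 10 seconds of audio.
rtf = real_time_factor(4.0, 10.0)
speed = speed_multiplier(rtf)
```

An RTF below 1.0 means audio is generated faster than it plays back.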

### Modified
- Code Refactoring:
  - Started modularizing the phonemizer and tokenizer into separate services
  - Incorporated these services into a dev endpoint
- Testing and Benchmarking:
  - Cleaned up benchmarking scripts
  - Cleaned up test scripts
  - Added auto-WAV validation scripts
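
The validation scripts themselves are not reproduced in this changelog. As a rough illustration of what an automated WAV sanity check can look like, here is a minimal sketch using only Python's standard `wave` module. The function name `validate_wav` and its parameters are hypothetical, and the checks shown (non-empty audio, expected sample rate, minimum duration) are assumptions rather than the project's actual criteria.

```python
import wave


def validate_wav(path, expected_rate=None, min_seconds=0.0):
    """Read a WAV header and return basic properties plus a list of problems found."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()

    duration = frames / rate if rate else 0.0
    problems = []
    if frames == 0:
        problems.append("file contains no audio frames")
    if expected_rate is not None and rate != expected_rate:
        problems.append(f"sample rate {rate} != expected {expected_rate}")
    if duration < min_seconds:
        problems.append(f"duration {duration:.2f}s shorter than {min_seconds:.2f}s")

    return {
        "duration": duration,
        "sample_rate": rate,
        "channels": channels,
        "problems": problems,
    }
```

An empty `problems` list means the file passed every check that was requested.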

## 2025-01-02
### Added
- Audio Format Support:
  - Added comprehensive audio format conversion support (mp3, wav, opus, flac)

## 2025-01-01
### Added
- Gradio Web Interface:
  - Added a simple web UI utility for generating audio from typed input or a `.txt` file

### Modified
#### Configuration Changes
- Updated Docker configurations:
  - Changes to `Dockerfile`:
    - Improved layer caching by separating dependency and code layers
  - Updates to `docker-compose.yml` and `docker-compose.cpu.yml`:
    - Removed commit lock from model fetching to allow automatic model updates from HF
    - Added git index lock cleanup

#### API Changes
- Modified `api/src/main.py`
- Updated TTS service implementation in `api/src/services/tts.py`:
  - Added device management for better resource control:
    - Voices are now copied from the model repository to the `api/src/voices` directory for persistence
  - Refactored voice pack handling:
    - Removed static voice pack dictionary
    - On-demand voice loading from disk
  - Added model warm-up functionality:
    - Model now initializes with a dummy text generation
    - Uses the default voice (`af.pt`) for warm-up
    - Model is ready for inference on first request
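
The voice handling and warm-up changes above can be pictured with the following minimal sketch. It is not the code in `api/src/services/tts.py`: the class `VoiceStore`, the `warm_up` helper, the `synthesize` callable, and the in-memory cache are all hypothetical, and the default loader simply reads raw bytes so the example runs without a tensor library (a real service would more likely deserialize the `.pt` file).

```python
from pathlib import Path
from typing import Callable, Dict, List


class VoiceStore:
    """Loads voice files from a directory on demand instead of from a static dict."""

    def __init__(self, voices_dir, loader: Callable[[Path], object] = Path.read_bytes):
        self.voices_dir = Path(voices_dir)
        self._loader = loader  # assumption: swap in a real deserializer here
        self._cache: Dict[str, object] = {}  # assumption: keep loaded voices in memory

    def list_voices(self) -> List[str]:
        """Discover available voices by scanning the directory for .pt files."""
        return sorted(p.stem for p in self.voices_dir.glob("*.pt"))

    def get(self, name: str):
        """Load a voice from disk the first time it is requested."""
        if name not in self._cache:
            path = self.voices_dir / f"{name}.pt"
            if not path.is_file():
                raise FileNotFoundError(f"Voice not found: {name}")
            self._cache[name] = self._loader(path)
        return self._cache[name]


def warm_up(store: VoiceStore, synthesize: Callable[[str, object], object],
            voice: str = "af", text: str = "Warm-up text.") -> None:
    """Run one throwaway generation so the first real request does not pay start-up cost."""
    synthesize(text, store.get(voice))
```

Calling `warm_up` once at application start-up, before any requests are served, mirrors the behaviour described in the list above.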