mirror of
https://github.com/remsky/Kokoro-FastAPI.git
synced 2025-08-05 16:48:53 +00:00
121 lines
3.1 KiB
Markdown
# Architectural Decision Record: Kokoro v1.0 Integration

## Context

We are integrating Kokoro v1.0 while maintaining backward compatibility with v0.19. The v1.0 release introduces significant architectural changes, including a new KModel/KPipeline design, a language-blind model architecture, and built-in vocab management.

## Decision

We will implement a hybrid architecture that:

1. Maintains existing streaming infrastructure
2. Supports both v0.19 and v1.0 models
3. Adapts the new KModel/KPipeline interface to our system

### Key Components

#### 1. Version-Specific Model Builds
```
api/src/builds/
├── v0_19/  # Current implementation
└── v1_0/   # New implementation using KModel
```

#### 2. Model Manager Interface
```python
class ModelManager:
    def __init__(self):
        self.models = {}  # version -> model

    async def get_model(self, version: str):
        # Build each version lazily on first request, then reuse the cached instance.
        if version not in self.models:
            if version == "v0.19":
                from ..builds.v0_19.models import build_model
            elif version == "v1.0":
                from ..builds.v1_0.models import build_model
            else:
                # Without this branch an unknown version would hit an unbound build_model.
                raise ValueError(f"Unsupported model version: {version}")
            self.models[version] = await build_model()
        return self.models[version]
```

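The lazy, version-keyed cache can be exercised without the real builds. The following is a minimal sketch, not the project's implementation: `build_v0_19`, `build_v1_0`, `BUILDERS`, and `CachedModelManager` are hypothetical names, and the stub builders return plain dicts instead of models. It also adds an `asyncio.Lock`, which is an assumption beyond the interface above, so that concurrent requests for the same version trigger only one build.

```python
import asyncio


async def build_v0_19():
    # Hypothetical stand-in for builds.v0_19.models.build_model.
    return {"version": "v0.19"}


async def build_v1_0():
    # Hypothetical stand-in for builds.v1_0.models.build_model.
    return {"version": "v1.0"}


BUILDERS = {"v0.19": build_v0_19, "v1.0": build_v1_0}


class CachedModelManager:
    def __init__(self, builders):
        self.builders = builders
        self.models = {}  # version -> loaded model
        self._lock = asyncio.Lock()

    async def get_model(self, version: str):
        if version not in self.builders:
            raise ValueError(f"Unsupported model version: {version!r}")
        async with self._lock:  # serialize loads so each version is built once
            if version not in self.models:
                self.models[version] = await self.builders[version]()
        return self.models[version]


async def demo():
    manager = CachedModelManager(BUILDERS)
    # Two concurrent requests for the same version share one cached instance.
    first, second = await asyncio.gather(
        manager.get_model("v1.0"), manager.get_model("v1.0")
    )
    return first is second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keeping the builders in a dict makes adding or retiring a version a one-line change, at the cost of importing every build module up front unless the builders themselves import lazily.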
#### 3. Voice Management
```
api/src/voices/
├── v0_19/
└── v1_0/
```

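Given that layout, looking up a voice reduces to mapping the version string onto its directory. A minimal sketch follows; `voice_path`, `VERSION_DIRS`, and the `.pt` file extension are assumptions made for illustration, and `af_bella` is used only as an example voice name.

```python
from pathlib import Path

VOICES_ROOT = Path("api/src/voices")
# Version strings as used by the API -> directory names from the tree above.
VERSION_DIRS = {"v0.19": "v0_19", "v1.0": "v1_0"}


def voice_path(version: str, voice: str, root: Path = VOICES_ROOT) -> Path:
    """Return the expected location of a voice file for a model version."""
    try:
        subdir = VERSION_DIRS[version]
    except KeyError:
        raise ValueError(f"Unsupported model version: {version!r}") from None
    # Assumption: one "<name>.pt" file per voice; adjust to the real layout.
    return root / subdir / f"{voice}.pt"


if __name__ == "__main__":
    print(voice_path("v1.0", "af_bella"))
```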
### Integration Strategy

1. Model Integration
   - Wrap KModel in our build system
   - Adapt to new forward pass interface
   - Handle phoneme mapping internally

2. Pipeline Integration
   ```python
   from kokoro import KPipeline

   class V1ModelWrapper:
       def __init__(self, kmodel, lang_code="a"):
           self.model = kmodel
           # KPipeline requires a language code ("a" is American English)
           self.pipeline = KPipeline(lang_code=lang_code, model=kmodel)

       async def forward(self, text, voice):
           # Adapt v1.0 interface to our streaming system
           generator = self.pipeline(text, voice=voice)
           for gs, ps, audio in generator:
               yield audio
   ```

3. API Layer
   - Add version parameter to endpoints
   - Default to v1.0 if not specified
   - Maintain backward compatibility

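The API-layer rules above (optional version parameter, a default, rejection of unknown values) can be sketched as one small helper that each endpoint calls before asking the model manager for a model. `resolve_version`, `SUPPORTED_VERSIONS`, and `DEFAULT_VERSION` are hypothetical names, not part of the existing codebase. The default is a parameter so it can be changed without touching the endpoints.

```python
from typing import Optional

SUPPORTED_VERSIONS = ("v0.19", "v1.0")
DEFAULT_VERSION = "v1.0"  # assumption: mirrors "Default to v1.0 if not specified"


def resolve_version(
    requested: Optional[str] = None, default: str = DEFAULT_VERSION
) -> str:
    """Pick the model version for a request, falling back to the default."""
    version = default if requested is None else requested
    if version not in SUPPORTED_VERSIONS:
        supported = ", ".join(SUPPORTED_VERSIONS)
        raise ValueError(
            f"Unsupported model version {version!r}; expected one of: {supported}"
        )
    return version


if __name__ == "__main__":
    print(resolve_version())         # request without a version parameter
    print(resolve_version("v0.19"))  # existing clients can pin the old model
```

Requests that omit the parameter keep working, which is what preserves backward compatibility for existing clients.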
## Consequences

### Positive
- Clean separation between v0.19 and v1.0 implementations
- Minimal changes to existing streaming infrastructure
- Simple version switching mechanism
- Local voice management maintained

### Negative
- Some code duplication between versions
- Additional wrapper layer for v1.0
- Need to maintain two parallel implementations

### Neutral
- Similar memory footprint (models ~few hundred MB)
- Comparable inference speed expected
- No major architectural bottlenecks

## Implementation Plan

1. Directory Structure Setup
   - Create version-specific directories
   - Move current implementation to v0_19/

2. V1.0 Integration
   - Implement KModel wrapper
   - Add version-aware model manager
   - Set up voice directory structure

3. Testing Focus
   - Basic inference for both versions
   - Voice compatibility
   - Streaming performance
   - Version switching
   - API endpoint compatibility

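Streaming behaviour can be tested without loading a model by swapping in a fake pipeline. The sketch below assumes only what the wrapper above relies on: a pipeline that is callable with `(text, voice=...)` and yields `(graphemes, phonemes, audio)` tuples. `FakePipeline`, `StreamingWrapper`, and `collect` are hypothetical test helpers, and the "audio" is a list of floats rather than a real tensor.

```python
import asyncio


class FakePipeline:
    """Hypothetical stand-in for KPipeline: yields (graphemes, phonemes, audio)."""

    def __call__(self, text, voice):
        for i, word in enumerate(text.split()):
            yield word, f"/{word}/", [float(i)]


class StreamingWrapper:
    """Same shape as V1ModelWrapper, but with the pipeline injected."""

    def __init__(self, pipeline):
        self.pipeline = pipeline

    async def forward(self, text, voice):
        for _gs, _ps, audio in self.pipeline(text, voice=voice):
            yield audio
            await asyncio.sleep(0)  # give other tasks a turn between chunks


async def collect(wrapper, text, voice):
    # Drain the async generator into a list so the test can inspect it.
    return [chunk async for chunk in wrapper.forward(text, voice)]


def test_chunks_arrive_in_order():
    wrapper = StreamingWrapper(FakePipeline())
    chunks = asyncio.run(collect(wrapper, "hello streaming world", "af_bella"))
    assert chunks == [[0.0], [1.0], [2.0]]


if __name__ == "__main__":
    test_chunks_arrive_in_order()
```

Injecting the pipeline keeps the test independent of model weights and lets the same check run against either version's wrapper.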
## Migration Path

1. Initial Release
   - Both versions available
   - v0.19 as default

2. Transition Period
   - v1.0 as default
   - v0.19 still available

3. Future
   - Consider deprecation timeline for v0.19
   - Document migration path for users
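If a deprecation timeline for v0.19 is adopted, one low-friction way to signal it is a standard `DeprecationWarning` raised wherever a request selects the old version. This is a sketch of that option only; `warn_if_deprecated` and `DEPRECATED_VERSIONS` are hypothetical names and the message text is illustrative.

```python
import warnings

# Assumption: populated only once a deprecation timeline has been agreed.
DEPRECATED_VERSIONS = {"v0.19"}


def warn_if_deprecated(version: str) -> bool:
    """Emit a DeprecationWarning for versions on the deprecation list."""
    if version in DEPRECATED_VERSIONS:
        warnings.warn(
            f"Model version {version} may be removed in a future release; "
            "consider migrating to v1.0.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
        return True
    return False
```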