# Architectural Decision Record: Kokoro v1.0 Integration
## Context
We are integrating Kokoro v1.0 while maintaining backward compatibility with v0.19. The v1.0 release introduces significant architectural changes including a new KModel/KPipeline design, language-blind model architecture, and built-in vocab management.
## Decision
We will implement a hybrid architecture that:
1. Maintains existing streaming infrastructure
2. Supports both v0.19 and v1.0 models
3. Adapts the new KModel/KPipeline interface to our system
### Key Components
#### 1. Version-Specific Model Builds
```
api/src/builds/
├── v0_19/ # Current implementation
└── v1_0/ # New implementation using KModel
```
#### 2. Model Manager Interface
```python
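class ModelManager:
    """Lazily builds and caches one model per supported version."""

    def __init__(self):
        self.models = {}  # version -> loaded model

    async def get_model(self, version: str):
        if version not in self.models:
            # Import lazily so only the requested build's dependencies are loaded.
            if version == "v0.19":
                from ..builds.v0_19.models import build_model
            elif version == "v1.0":
                from ..builds.v1_0.models import build_model
            else:
                raise ValueError(f"Unsupported model version: {version!r}")
            self.models[version] = await build_model()
        return self.models[version]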
```
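One thing the manager above does not address is two concurrent requests for the same unloaded version, which could each trigger a build. The following is a minimal sketch of one way to guard against that, assuming a builder registry and a single `asyncio.Lock`. `LockedModelManager` and the `fake_build_*` coroutines are hypothetical names used for illustration; the fake builders only stand in for the real `build_model` functions.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


# Hypothetical stand-ins for the version-specific build_model coroutines;
# they only simulate a slow load and return a placeholder object.
async def fake_build_v0_19() -> dict:
    await asyncio.sleep(0.01)
    return {"version": "v0.19"}


async def fake_build_v1_0() -> dict:
    await asyncio.sleep(0.01)
    return {"version": "v1.0"}


class LockedModelManager:
    """Builds each version at most once, even under concurrent requests."""

    def __init__(self, builders: Dict[str, Callable[[], Awaitable[Any]]]):
        self._builders = builders
        self._models: Dict[str, Any] = {}
        self._lock = asyncio.Lock()

    async def get_model(self, version: str) -> Any:
        if version not in self._builders:
            raise ValueError(f"Unsupported model version: {version!r}")
        # The lock serializes loading, so the second caller finds the cached model.
        async with self._lock:
            if version not in self._models:
                self._models[version] = await self._builders[version]()
        return self._models[version]


async def demo() -> bool:
    manager = LockedModelManager(
        {"v0.19": fake_build_v0_19, "v1.0": fake_build_v1_0}
    )
    first, second = await asyncio.gather(
        manager.get_model("v1.0"), manager.get_model("v1.0")
    )
    return first is second  # both callers share one instance
```

A single lock is the simplest choice and also serializes loads of different versions; a per-version lock would allow parallel loading at the cost of a little more bookkeeping.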
#### 3. Voice Management
```
api/src/voices/
├── v0_19/
└── v1_0/
```
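As a rough illustration of how a lookup against this layout might work, here is a small sketch that maps a version string and voice name to a file path. It assumes one file per voice inside each version directory; the `.pt` suffix, the `VERSION_DIRS` mapping, and `resolve_voice_path` are illustrative assumptions, not part of the existing code.

```python
from pathlib import Path

# Assumption: API version strings map onto the directory names shown above.
VERSION_DIRS = {"v0.19": "v0_19", "v1.0": "v1_0"}


def resolve_voice_path(
    voices_root: Path, version: str, voice: str, suffix: str = ".pt"
) -> Path:
    """Return the path of a voice file for the given model version."""
    try:
        version_dir = VERSION_DIRS[version]
    except KeyError:
        raise ValueError(f"Unsupported model version: {version!r}") from None

    path = voices_root / version_dir / f"{voice}{suffix}"
    if not path.is_file():
        # A voice may exist for one version but not the other.
        raise FileNotFoundError(f"Voice {voice!r} not found for {version}: {path}")
    return path
```

Keeping the lookup version-aware makes a missing voice fail explicitly instead of silently falling back to a voice built for the other model version.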
### Integration Strategy
1. Model Integration
   - Wrap KModel in our build system
   - Adapt to the new forward pass interface
   - Handle phoneme mapping internally
2. Pipeline Integration
```python
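from kokoro import KPipeline


class V1ModelWrapper:
    def __init__(self, kmodel, lang_code: str = "a"):
        self.model = kmodel
        # KPipeline takes a language code; "a" (American English) is an assumed default.
        self.pipeline = KPipeline(lang_code=lang_code, model=kmodel)

    async def forward(self, text, voice):
        # Adapt the v1.0 interface to our streaming system.
        # The pipeline yields (graphemes, phonemes, audio); only audio is streamed.
        generator = self.pipeline(text, voice=voice)
        for gs, ps, audio in generator:
            yield audio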
```
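3. API Layer
   - Add a version parameter to endpoints
   - When no version is specified, use the current default (v0.19 at initial release, v1.0 after the transition; see Migration Path)
   - Maintain backward compatibility for existing clients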
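One point worth keeping in mind for the pipeline adapter: iterating a synchronous generator inside an async function blocks the event loop while each chunk is synthesized. The sketch below shows one possible way to avoid that by pulling each item in a worker thread with `asyncio.to_thread` (Python 3.9+). `fake_pipeline` is a hypothetical stand-in for a pipeline call and yields bytes rather than real audio; `stream_audio` and `collect` are likewise illustrative names.

```python
import asyncio
from typing import AsyncIterator, Iterator, List, Tuple


def fake_pipeline(text: str, voice: str) -> Iterator[Tuple[str, str, bytes]]:
    """Hypothetical stand-in that yields (graphemes, phonemes, audio) per chunk."""
    for word in text.split():
        yield word, word.lower(), word.encode("utf-8")


_DONE = object()  # sentinel returned by next() when the generator is exhausted


async def stream_audio(text: str, voice: str) -> AsyncIterator[bytes]:
    generator = fake_pipeline(text, voice)
    while True:
        # next() runs in a worker thread, so slow synthesis does not
        # stall other requests handled by the same event loop.
        item = await asyncio.to_thread(next, generator, _DONE)
        if item is _DONE:
            break
        _graphemes, _phonemes, audio = item
        yield audio


async def collect(text: str, voice: str) -> List[bytes]:
    return [chunk async for chunk in stream_audio(text, voice)]
```

Whether the extra thread hop is worthwhile depends on how long each chunk takes to synthesize; for very short chunks the plain loop may be sufficient.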
## Consequences
### Positive
- Clean separation between v0.19 and v1.0 implementations
- Minimal changes to existing streaming infrastructure
- Simple version switching mechanism
- Local voice management maintained
### Negative
- Some code duplication between versions
- Additional wrapper layer for v1.0
- Need to maintain two parallel implementations
### Neutral
- Similar memory footprint (each model is a few hundred MB)
- Comparable inference speed expected
- No major architectural bottlenecks
## Implementation Plan
1. Directory Structure Setup
   - Create version-specific directories
   - Move current implementation to v0_19/
2. V1.0 Integration
   - Implement KModel wrapper
   - Add version-aware model manager
   - Set up voice directory structure
3. Testing Focus
   - Basic inference for both versions
   - Voice compatibility
   - Streaming performance
   - Version switching
   - API endpoint compatibility
## Migration Path
1. Initial Release
   - Both versions available
   - v0.19 as default
2. Transition Period
   - v1.0 as default
   - v0.19 still available
3. Future
   - Consider deprecation timeline for v0.19
   - Document migration path for users
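The phases above mostly differ in which version is the default and whether v0.19 is flagged as deprecated. A minimal sketch of how that could be driven by configuration rather than code changes follows. The environment variable name `KOKORO_DEFAULT_VERSION`, the `resolve_version` helper, and its `deprecated` parameter are all hypothetical and used only for illustration.

```python
import os
import warnings
from typing import Optional, Tuple

SUPPORTED_VERSIONS = ("v0.19", "v1.0")
# Hypothetical setting: flipping it to "v1.0" would start the transition period.
DEFAULT_VERSION_ENV = "KOKORO_DEFAULT_VERSION"


def resolve_version(
    requested: Optional[str] = None, deprecated: Tuple[str, ...] = ()
) -> str:
    """Pick the model version for a request, falling back to the configured default."""
    version = requested or os.environ.get(DEFAULT_VERSION_ENV, "v0.19")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(
            f"Unsupported model version {version!r}; expected one of {SUPPORTED_VERSIONS}"
        )
    if version in deprecated:
        # Still served, but callers are told to migrate.
        warnings.warn(
            f"Model version {version} is deprecated", DeprecationWarning, stacklevel=2
        )
    return version
```

With this shape, an explicit version in the request always wins, so existing clients that pin v0.19 keep working after the default changes.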