mirror of
https://github.com/remsky/Kokoro-FastAPI.git
synced 2025-08-05 16:48:53 +00:00

- Introduced v1.0 model build system integration. - Updated imports to reflect new directory structure for versioned models. - Modified environment variables - Added version selection in the frontend for voice management. - Enhanced Docker build scripts for multi-platform support. - Updated configuration settings for default voice and model paths.
3.1 KiB
3.1 KiB
Architectural Decision Record: Kokoro v1.0 Integration
Context
We are integrating Kokoro v1.0 while maintaining backward compatibility with v0.19. The v1.0 release introduces significant architectural changes including a new KModel/KPipeline design, language-blind model architecture, and built-in vocab management.
Decision
We will implement a hybrid architecture that:
- Maintains existing streaming infrastructure
- Supports both v0.19 and v1.0 models
- Adapts the new KModel/KPipeline interface to our system
Key Components
1. Version-Specific Model Builds
api/src/builds/
├── v0_19/ # Current implementation
└── v1_0/ # New implementation using KModel
2. Model Manager Interface
class ModelManager:
def __init__(self):
self.models = {} # version -> model
async def get_model(self, version: str):
if version not in self.models:
if version == "v0.19":
from ..builds.v0_19.models import build_model
elif version == "v1.0":
from ..builds.v1_0.models import build_model
self.models[version] = await build_model()
return self.models[version]
3. Voice Management
api/src/voices/
├── v0_19/
└── v1_0/
Integration Strategy
-
Model Integration
- Wrap KModel in our build system
- Adapt to new forward pass interface
- Handle phoneme mapping internally
-
Pipeline Integration
class V1ModelWrapper: def __init__(self, kmodel): self.model = kmodel self.pipeline = KPipeline(model=kmodel) async def forward(self, text, voice): # Adapt v1.0 interface to our streaming system generator = self.pipeline(text, voice=voice) for gs, ps, audio in generator: yield audio
-
API Layer
- Add version parameter to endpoints
- Default to v1.0 if not specified
- Maintain backward compatibility
Consequences
Positive
- Clean separation between v0.19 and v1.0 implementations
- Minimal changes to existing streaming infrastructure
- Simple version switching mechanism
- Local voice management maintained
Negative
- Some code duplication between versions
- Additional wrapper layer for v1.0
- Need to maintain two parallel implementations
Neutral
- Similar memory footprint (models ~few hundred MB)
- Comparable inference speed expected
- No major architectural bottlenecks
Implementation Plan
-
Directory Structure Setup
- Create version-specific directories
- Move current implementation to v0_19/
-
V1.0 Integration
- Implement KModel wrapper
- Add version-aware model manager
- Setup voice directory structure
-
Testing Focus
- Basic inference for both versions
- Voice compatibility
- Streaming performance
- Version switching
- API endpoint compatibility
Migration Path
-
Initial Release
- Both versions available
- v0.19 as default
-
Transition Period
- v1.0 as default
- v0.19 still available
-
Future
- Consider deprecation timeline for v0.19
- Document migration path for users