mirror of https://github.com/remsky/Kokoro-FastAPI.git synced 2025-08-05 16:48:53 +00:00

remsky 9a588a3483 WIP: 1.0 integration

- Introduced v1.0 model build system integration.
- Updated imports to reflect new directory structure for versioned models.
- Modified environment variables.
- Added version selection in the frontend for voice management.
- Enhanced Docker build scripts for multi-platform support.
- Updated configuration settings for default voice and model paths.

2025-01-31 05:55:57 -07:00


Architectural Decision Record: Kokoro v1.0 Integration

Context

We are integrating Kokoro v1.0 while maintaining backward compatibility with v0.19. The v1.0 release introduces significant architectural changes including a new KModel/KPipeline design, language-blind model architecture, and built-in vocab management.

Decision

We will implement a hybrid architecture that:

- Maintains existing streaming infrastructure
- Supports both v0.19 and v1.0 models
- Adapts the new KModel/KPipeline interface to our system

Key Components

1. Version-Specific Model Builds

api/src/builds/
├── v0_19/          # Current implementation
└── v1_0/           # New implementation using KModel

2. Model Manager Interface

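class ModelManager:
    """Lazily builds and caches one model per supported version."""

    def __init__(self):
        self.models = {}  # version -> model

    async def get_model(self, version: str):
        if version not in self.models:
            # Import lazily so only the requested version's build code is loaded.
            if version == "v0.19":
                from ..builds.v0_19.models import build_model
            elif version == "v1.0":
                from ..builds.v1_0.models import build_model
            else:
                # Without this branch, build_model would be unbound below.
                raise ValueError(f"Unsupported model version: {version!r}")
            self.models[version] = await build_model()
        return self.models[version]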
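
The caching behaviour of the manager can be tried without model weights. The following is a minimal sketch, not the project's implementation: the `BUILDERS` registry, the stub `build_v0_19` / `build_v1_0` coroutines, and the `CachedModelManager` name are hypothetical stand-ins for the real `build_model` functions, and the `asyncio.Lock` is an added assumption to keep two concurrent first requests from building the same model twice.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


# Hypothetical stand-ins for the per-version build_model functions;
# the real builders would load model weights.
async def build_v0_19() -> str:
    return "model-v0.19"


async def build_v1_0() -> str:
    return "model-v1.0"


BUILDERS: Dict[str, Callable[[], Awaitable[Any]]] = {
    "v0.19": build_v0_19,
    "v1.0": build_v1_0,
}


class CachedModelManager:
    """Builds each version at most once and reuses it afterwards."""

    def __init__(self) -> None:
        self.models: Dict[str, Any] = {}
        self._lock = asyncio.Lock()  # assumption: serialises first-time builds

    async def get_model(self, version: str) -> Any:
        async with self._lock:
            if version not in self.models:
                if version not in BUILDERS:
                    raise ValueError(f"Unsupported model version: {version!r}")
                self.models[version] = await BUILDERS[version]()
            return self.models[version]


async def demo() -> list:
    manager = CachedModelManager()
    first = await manager.get_model("v1.0")
    second = await manager.get_model("v1.0")  # served from the cache
    legacy = await manager.get_model("v0.19")
    return [first, second is first, legacy]


result = asyncio.run(demo())
```

A registry keyed by version string keeps the dispatch in one place, so adding a later version would mean adding one entry instead of another `elif` branch.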

3. Voice Management

api/src/voices/
├── v0_19/
└── v1_0/
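
One way to resolve a voice file inside this layout is sketched below. It is an illustration under assumptions: the `voice_path` helper, the `VERSION_DIRS` mapping, the `.pt` file extension, and the example voice name are not taken from this document.

```python
from pathlib import Path

# Root of the voice tree shown above.
VOICES_ROOT = Path("api/src/voices")

# Maps the public version string to its directory name.
VERSION_DIRS = {"v0.19": "v0_19", "v1.0": "v1_0"}


def voice_path(version: str, voice: str, root: Path = VOICES_ROOT) -> Path:
    """Return the expected file path for a voice of a given model version."""
    try:
        version_dir = VERSION_DIRS[version]
    except KeyError:
        raise ValueError(f"Unsupported model version: {version!r}") from None
    # Reject empty names and names containing path separators.
    if not voice or Path(voice).name != voice:
        raise ValueError(f"Invalid voice name: {voice!r}")
    # The ".pt" extension is an assumption about how voices are stored.
    return root / version_dir / f"{voice}.pt"


path = voice_path("v1.0", "af_bella")  # example voice name, for illustration
```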

Integration Strategy

Model Integration
- Wrap KModel in our build system
- Adapt to new forward pass interface
- Handle phoneme mapping internally

Pipeline Integration

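from kokoro import KPipeline

class V1ModelWrapper:
    def __init__(self, kmodel, lang_code="a"):
        self.model = kmodel
        # KPipeline is built per language; "a" (American English) is an assumed default.
        self.pipeline = KPipeline(lang_code=lang_code, model=kmodel)

    async def forward(self, text, voice):
        # Adapt v1.0 interface to our streaming system.
        # The pipeline yields (graphemes, phonemes, audio) per segment; only audio is streamed.
        generator = self.pipeline(text, voice=voice)
        for _graphemes, _phonemes, audio in generator:
            yield audio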
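
To show how a caller might consume such a wrapper, here is a small sketch that runs without Kokoro installed. `FakePipeline` and `StreamingWrapper` are hypothetical stand-ins that only mimic the (graphemes, phonemes, audio) tuples; the `await asyncio.sleep(0)` between chunks is an added assumption, and it does not make the synthesis of each individual chunk non-blocking.

```python
import asyncio
from typing import AsyncIterator, Iterator, List, Tuple


class FakePipeline:
    """Hypothetical stand-in for KPipeline: yields (graphemes, phonemes, audio)."""

    def __call__(self, text: str, voice: str) -> Iterator[Tuple[str, str, bytes]]:
        # Treat each non-empty sentence as one segment and fake its audio bytes.
        for sentence in filter(None, (s.strip() for s in text.split("."))):
            yield sentence, f"/{sentence.lower()}/", sentence.encode("utf-8")


class StreamingWrapper:
    def __init__(self, pipeline) -> None:
        self.pipeline = pipeline

    async def forward(self, text: str, voice: str) -> AsyncIterator[bytes]:
        for _graphemes, _phonemes, audio in self.pipeline(text, voice=voice):
            yield audio
            # Hand control back to the event loop between chunks so other
            # tasks get a chance to run while this stream is in progress.
            await asyncio.sleep(0)


async def collect(text: str) -> List[bytes]:
    wrapper = StreamingWrapper(FakePipeline())
    return [chunk async for chunk in wrapper.forward(text, voice="af_bella")]


chunks = asyncio.run(collect("Hello there. How are you."))
```

In a real service each chunk would be written to the response as it arrives instead of being collected into a list.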

API Layer
- Add version parameter to endpoints
- Default to v1.0 if not specified (v0.19 remains the default during the initial release; see Migration Path)
- Maintain backward compatibility
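
The version handling in the API layer could look roughly like the sketch below. The `resolve_version` helper and its constants are hypothetical names, and accepting a bare `1.0` without the leading `v` is an assumption made for leniency, not something this record specifies.

```python
from typing import Optional

SUPPORTED_VERSIONS = ("v0.19", "v1.0")
DEFAULT_VERSION = "v1.0"  # default named in the API Layer notes above


def resolve_version(requested: Optional[str] = None) -> str:
    """Map an optional request parameter to a supported model version."""
    if requested is None or not requested.strip():
        # Older clients that never send a version keep working.
        return DEFAULT_VERSION
    normalized = requested.strip().lower()
    if not normalized.startswith("v"):
        normalized = "v" + normalized  # assumption: accept "1.0" as "v1.0"
    if normalized not in SUPPORTED_VERSIONS:
        raise ValueError(
            f"Unsupported model version {requested!r}; "
            f"expected one of {', '.join(SUPPORTED_VERSIONS)}"
        )
    return normalized
```

An endpoint would call `resolve_version` on its optional version parameter and translate a `ValueError` into a client error response.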

Consequences

Positive

- Clean separation between v0.19 and v1.0 implementations
- Minimal changes to existing streaming infrastructure
- Simple version switching mechanism
- Local voice management maintained

Negative

- Some code duplication between versions
- Additional wrapper layer for v1.0
- Need to maintain two parallel implementations

Neutral

- Similar memory footprint (roughly a few hundred MB per model)
- Comparable inference speed expected
- No major architectural bottlenecks

Implementation Plan

1. Directory Structure Setup
   - Create version-specific directories
   - Move current implementation to v0_19/
2. v1.0 Integration
   - Implement KModel wrapper
   - Add version-aware model manager
   - Set up voice directory structure
3. Testing Focus
   - Basic inference for both versions
   - Voice compatibility
   - Streaming performance
   - Version switching
   - API endpoint compatibility

Migration Path

1. Initial Release
   - Both versions available
   - v0.19 as default
2. Transition Period
   - v1.0 as default
   - v0.19 still available
3. Future
   - Consider deprecation timeline for v0.19
   - Document migration path for users
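
The phased default could be driven by configuration so that moving from the initial release to the transition period needs no code change. This is a sketch under assumptions: the `KOKORO_MIGRATION_PHASE` variable name and the `default_version` helper are hypothetical and do not come from this record.

```python
import os
from typing import Mapping

# Default model version for each phase of the migration path above.
PHASE_DEFAULTS = {"initial": "v0.19", "transition": "v1.0"}


def default_version(env: Mapping[str, str] = os.environ) -> str:
    """Pick the default model version from the configured migration phase."""
    # Hypothetical variable name; falls back to the initial-release phase.
    phase = env.get("KOKORO_MIGRATION_PHASE", "initial")
    if phase not in PHASE_DEFAULTS:
        raise ValueError(
            f"Unknown migration phase {phase!r}; "
            f"expected one of {', '.join(PHASE_DEFAULTS)}"
        )
    return PHASE_DEFAULTS[phase]
```

Passing the environment as a mapping keeps the function easy to test, for example `default_version({"KOKORO_MIGRATION_PHASE": "transition"})`.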
