Kokoro-FastAPI/docs/architecture/adr_kokoro_v1_integration.md
remsky 9a588a3483 WIP: 1.0 integration
- Introduced v1.0 model build system integration.
- Updated imports to reflect new directory structure for versioned models.
- Modified environment variables
- Added version selection in the frontend for voice management.
- Enhanced Docker build scripts for multi-platform support.
- Updated configuration settings for default voice and model paths.
2025-01-31 05:55:57 -07:00

3.1 KiB

Architectural Decision Record: Kokoro v1.0 Integration

Context

We are integrating Kokoro v1.0 while maintaining backward compatibility with v0.19. The v1.0 release introduces significant architectural changes including a new KModel/KPipeline design, language-blind model architecture, and built-in vocab management.

Decision

We will implement a hybrid architecture that:

  1. Maintains existing streaming infrastructure
  2. Supports both v0.19 and v1.0 models
  3. Adapts the new KModel/KPipeline interface to our system

Key Components

1. Version-Specific Model Builds

api/src/builds/
├── v0_19/          # Current implementation
└── v1_0/           # New implementation using KModel

2. Model Manager Interface

class ModelManager:
    def __init__(self):
        self.models = {}  # version -> model
        
    async def get_model(self, version: str):
        if version not in self.models:
            if version == "v0.19":
                from ..builds.v0_19.models import build_model
            elif version == "v1.0":
                from ..builds.v1_0.models import build_model
            self.models[version] = await build_model()
        return self.models[version]

3. Voice Management

api/src/voices/
├── v0_19/
└── v1_0/

Integration Strategy

  1. Model Integration

    • Wrap KModel in our build system
    • Adapt to new forward pass interface
    • Handle phoneme mapping internally
  2. Pipeline Integration

    class V1ModelWrapper:
        def __init__(self, kmodel):
            self.model = kmodel
            self.pipeline = KPipeline(model=kmodel)
    
        async def forward(self, text, voice):
            # Adapt v1.0 interface to our streaming system
            generator = self.pipeline(text, voice=voice)
            for gs, ps, audio in generator:
                yield audio
    
  3. API Layer

    • Add version parameter to endpoints
    • Default to v1.0 if not specified
    • Maintain backward compatibility

Consequences

Positive

  • Clean separation between v0.19 and v1.0 implementations
  • Minimal changes to existing streaming infrastructure
  • Simple version switching mechanism
  • Local voice management maintained

Negative

  • Some code duplication between versions
  • Additional wrapper layer for v1.0
  • Need to maintain two parallel implementations

Neutral

  • Similar memory footprint (models ~few hundred MB)
  • Comparable inference speed expected
  • No major architectural bottlenecks

Implementation Plan

  1. Directory Structure Setup

    • Create version-specific directories
    • Move current implementation to v0_19/
  2. V1.0 Integration

    • Implement KModel wrapper
    • Add version-aware model manager
    • Setup voice directory structure
  3. Testing Focus

    • Basic inference for both versions
    • Voice compatibility
    • Streaming performance
    • Version switching
    • API endpoint compatibility

Migration Path

  1. Initial Release

    • Both versions available
    • v0.19 as default
  2. Transition Period

    • v1.0 as default
    • v0.19 still available
  3. Future

    • Consider deprecation timeline for v0.19
    • Document migration path for users