
# Architectural Decision Record: Kokoro v1.0 Integration

## Context

We are integrating Kokoro v1.0 while maintaining backward compatibility with v0.19. The v1.0 release introduces significant architectural changes including a new KModel/KPipeline design, language-blind model architecture, and built-in vocab management.

## Decision

We will implement a hybrid architecture that:

1. Maintains existing streaming infrastructure
2. Supports both v0.19 and v1.0 models
3. Adapts the new KModel/KPipeline interface to our system

### Key Components

#### 1. Version-Specific Model Builds
```
api/src/builds/
├── v0_19/          # Current implementation
└── v1_0/           # New implementation using KModel
```

#### 2. Model Manager Interface
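```python
class ModelManager:
    """Lazily builds and caches one model instance per version."""

    def __init__(self):
        self.models = {}  # version string -> loaded model

    async def get_model(self, version: str):
        if version not in self.models:
            # Import lazily so only the requested version's build code is loaded
            if version == "v0.19":
                from ..builds.v0_19.models import build_model
            elif version == "v1.0":
                from ..builds.v1_0.models import build_model
            else:
                # Without this branch, an unknown version fails with UnboundLocalError
                raise ValueError(f"Unsupported model version: {version!r}")
            self.models[version] = await build_model()
        return self.models[version]
```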
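
The manager above caches models, but two concurrent requests for a version that is not loaded yet will both see an empty cache and both start a build. A sketch of one fix, assuming builders are registered as async factories; `CachingModelManager`, `Builder` and `demo` are hypothetical names, and a single lock is a simple choice that also serializes first-time builds of different versions:

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Tuple

# Hypothetical builder type: an async factory that loads one model version
Builder = Callable[[], Awaitable[Any]]


class CachingModelManager:
    """Builds each model version at most once, even under concurrent requests."""

    def __init__(self, builders: Dict[str, Builder]):
        self._builders = builders          # version -> async factory
        self._models: Dict[str, Any] = {}  # version -> loaded model
        self._lock = asyncio.Lock()        # serializes first-time builds

    async def get_model(self, version: str) -> Any:
        if version in self._models:  # fast path: already loaded
            return self._models[version]
        if version not in self._builders:
            raise ValueError(f"Unsupported model version: {version!r}")
        async with self._lock:
            # Re-check: another task may have built it while we were waiting
            if version not in self._models:
                self._models[version] = await self._builders[version]()
        return self._models[version]


async def demo() -> Tuple[int, bool]:
    """Five concurrent requests for v1.0 should trigger exactly one build."""
    build_calls = []

    async def build_v1() -> object:
        build_calls.append("v1.0")
        await asyncio.sleep(0.01)  # stand-in for slow weight loading
        return object()

    manager = CachingModelManager({"v1.0": build_v1})
    models = await asyncio.gather(*(manager.get_model("v1.0") for _ in range(5)))
    return len(build_calls), all(m is models[0] for m in models)
```

`asyncio.run(demo())` returns `(1, True)`; without the lock and the re-check, each of the five tasks would start its own build.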

#### 3. Voice Management
```
api/src/voices/
├── v0_19/
└── v1_0/
```
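
A sketch of resolving voice files against this layout, assuming each voice is stored as a single `<name>.pt` file and that the directory names follow the same `v0_19`/`v1_0` convention as the builds; `VOICE_ROOT`, `voice_dir`, `voice_path` and `list_voices` are hypothetical names:

```python
from pathlib import Path
from typing import List

VOICE_ROOT = Path("api/src/voices")  # layout from this ADR


def voice_dir(version: str, root: Path = VOICE_ROOT) -> Path:
    """Voice directory for a version, e.g. "v1.0" -> api/src/voices/v1_0."""
    return root / version.replace(".", "_")


def voice_path(version: str, voice: str, root: Path = VOICE_ROOT) -> Path:
    # Reject names that could escape the version directory
    if "/" in voice or "\\" in voice or voice.startswith("."):
        raise ValueError(f"Invalid voice name: {voice!r}")
    # Assumption: one "<name>.pt" file per voice
    return voice_dir(version, root) / f"{voice}.pt"


def list_voices(version: str, root: Path = VOICE_ROOT) -> List[str]:
    """Sorted names of the voices available for one version."""
    directory = voice_dir(version, root)
    if not directory.is_dir():
        return []
    return sorted(p.stem for p in directory.glob("*.pt"))
```

Because every lookup is scoped to a single version's directory, a voice file from one version is never loaded for the other.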

### Integration Strategy

1. Model Integration
   - Wrap KModel in our build system
   - Adapt to new forward pass interface
   - Handle phoneme mapping internally

2. Pipeline Integration
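   ```python
   from kokoro import KPipeline

   class V1ModelWrapper:
       def __init__(self, kmodel, lang_code: str = "a"):
           self.model = kmodel
           # KModel is language-blind; KPipeline adds language-specific G2P
           # ("a" = American English) and reuses the already-loaded model
           self.pipeline = KPipeline(lang_code=lang_code, model=kmodel)

       async def forward(self, text, voice):
           # Adapt the v1.0 generator interface to our streaming system.
           # The pipeline is synchronous, so each chunk blocks the event loop.
           generator = self.pipeline(text, voice=voice)
           for _graphemes, _phonemes, audio in generator:
               yield audio
   ```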

3. API Layer
   - Add a version parameter to endpoints
   - Fall back to a configurable default version when none is specified (v0.19 or v1.0, depending on the phase in Migration Path)
   - Maintain backward compatibility
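
`V1ModelWrapper.forward` iterates the synchronous KPipeline generator directly inside an async generator, so while a chunk is being synthesized no other request on the event loop makes progress. One way to adapt it is to advance the blocking iterator one step at a time in a worker thread. This sketch assumes Python 3.9+ for `asyncio.to_thread`; `iterate_in_thread` and `ThreadedV1Wrapper` are hypothetical names, and `pipeline` stands for anything that, like the pipeline in the wrapper above, yields `(graphemes, phonemes, audio)` triples:

```python
import asyncio
from typing import Any, AsyncIterator, Iterator

_DONE = object()  # sentinel returned by next() once the iterator is exhausted


async def iterate_in_thread(sync_iter: Iterator[Any]) -> AsyncIterator[Any]:
    """Yield items from a blocking iterator without blocking the event loop."""
    while True:
        # Run one blocking next() call in the default thread pool
        item = await asyncio.to_thread(next, sync_iter, _DONE)
        if item is _DONE:
            return
        yield item


class ThreadedV1Wrapper:
    """Hypothetical variant of V1ModelWrapper that offloads synthesis to a thread."""

    def __init__(self, pipeline):
        self.pipeline = pipeline  # e.g. a KPipeline instance

    async def forward(self, text, voice):
        # Creating the generator is cheap; the work happens on each next()
        results = iter(self.pipeline(text, voice=voice))
        async for _graphemes, _phonemes, audio in iterate_in_thread(results):
            yield audio
```

The sentinel default passed to `next` is needed because asyncio futures refuse to carry a `StopIteration` exception. The iterator is only ever advanced by one thread at a time, which is all a Python generator requires.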

## Consequences

### Positive
- Clean separation between v0.19 and v1.0 implementations
- Minimal changes to existing streaming infrastructure
- Simple version switching mechanism
- Local voice management maintained

### Negative
- Some code duplication between versions
- Additional wrapper layer for v1.0
- Need to maintain two parallel implementations

### Neutral
- Similar memory footprint per model (a few hundred MB each)
- Comparable inference speed expected
- No major architectural bottlenecks
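
As a rough check on the "few hundred MB" figure, assume about 82 million parameters for each version (the model is published as Kokoro-82M) stored as 32-bit floats, and ignore activations, voice tensors and framework overhead:

$$
82 \times 10^{6}\ \text{parameters} \times 4\ \text{B} = 328 \times 10^{6}\ \text{B} \approx 328\ \text{MB},
\qquad
2 \times 328\ \text{MB} = 656\ \text{MB}
$$

The second figure is the relevant one for a process that has served both versions, since the `ModelManager` keeps every model it has built in its cache.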

## Implementation Plan

1. Directory Structure Setup
   - Create version-specific directories
   - Move current implementation to v0_19/

2. V1.0 Integration
   - Implement KModel wrapper
   - Add version-aware model manager
   - Set up voice directory structure

3. Testing Focus
   - Basic inference for both versions
   - Voice compatibility
   - Streaming performance
   - Version switching
   - API endpoint compatibility
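
For the streaming-performance tests, time to first chunk is usually more telling than total synthesis time, because it is the delay a listener notices. A hypothetical helper that can wrap either version's `forward` async generator (the name `measure_stream` and its return tuple are assumptions, not existing test utilities):

```python
import time
from typing import Any, AsyncIterator, Tuple


async def measure_stream(chunks: AsyncIterator[Any]) -> Tuple[float, float, int]:
    """Consume a stream and return (time to first chunk, total time, chunk count).

    Times are in seconds; for an empty stream the first-chunk time equals the total.
    """
    start = time.perf_counter()
    first_chunk_at = None
    count = 0
    async for _chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start  # perceived latency
        count += 1
    total = time.perf_counter() - start
    return (first_chunk_at if first_chunk_at is not None else total), total, count
```

Running it against both versions with the same text and voice gives directly comparable numbers for the "comparable inference speed expected" assumption.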

## Migration Path

1. Initial Release
   - Both versions available
   - v0.19 as default

2. Transition Period
   - v1.0 as default
   - v0.19 still available

3. Future
   - Consider deprecation timeline for v0.19
   - Document migration path for users
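
The phased defaults above, together with the API layer's rule for requests that omit a version, can be driven by configuration so that switching the default is a deployment change rather than a code change. A minimal sketch, assuming the default is read from an environment variable; `DEFAULT_MODEL_VERSION` and `resolve_version` are hypothetical names, not the project's actual settings:

```python
import os
from typing import Mapping, Optional

SUPPORTED_VERSIONS = ("v0.19", "v1.0")


def resolve_version(
    requested: Optional[str],
    env: Mapping[str, str] = os.environ,
    env_var: str = "DEFAULT_MODEL_VERSION",  # hypothetical variable name
) -> str:
    """Pick the model version for a request.

    An explicit version from the client wins; otherwise the deployment
    default applies, falling back to v0.19 as in the initial release.
    """
    version = requested or env.get(env_var, "v0.19")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"Unsupported model version: {version!r}")
    return version
```

During the initial release the variable is left unset; the transition period then amounts to setting it to `v1.0`, while clients can still request `v0.19` explicitly.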

_Commit "WIP: 1.0 integration" (2025-01-31 05:55:57 -07:00): introduced the v1.0 model build system integration; updated imports to reflect the new directory structure for versioned models; modified environment variables; added version selection in the frontend for voice management; enhanced Docker build scripts for multi-platform support; updated configuration settings for default voice and model paths._