mirror of https://github.com/remsky/Kokoro-FastAPI.git synced 2025-08-05 16:48:53 +00:00

remsky 9a588a3483 WIP: 1.0 integration

- Introduced v1.0 model build system integration.
- Updated imports to reflect new directory structure for versioned models.
- Modified environment variables.
- Added version selection in the frontend for voice management.
- Enhanced Docker build scripts for multi-platform support.
- Updated configuration settings for default voice and model paths.

2025-01-31 05:55:57 -07:00

1.8 KiB


Kokoro Version Support Architecture

Overview

A simple architecture that supports both the Kokoro v0.19 and v1.0 models and lets callers select the version through the API.

Directory Structure

api/src/builds/
├── v0_19/                # Current implementation
│   ├── config.json      
│   ├── models.py        
│   ├── istftnet.py      
│   └── plbert.py        
└── v1_0/                # New v1.0 implementation
    ├── config.json     
    ├── models.py       
    ├── istftnet.py    
    └── albert.py

Implementation Plan

Move Current Implementation
- Relocate existing files to v0_19/
- Update imports

Add v1.0 Implementation
- Copy reference implementation
- Adapt to our structure
- Keep voice management local

Model Manager Updates

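class ModelManager:
    """Lazily builds and caches one model per Kokoro version."""

    def __init__(self):
        self.models = {}  # version -> model

    async def get_model(self, version: str):
        if version not in self.models:
            # Import lazily so only the requested build is loaded.
            if version == "v0.19":
                from ..builds.v0_19.models import build_model
            elif version == "v1.0":
                from ..builds.v1_0.models import build_model
            else:
                # Without this branch an unknown version leaves build_model unbound.
                raise ValueError(f"Unsupported model version: {version!r}")
            self.models[version] = await build_model()
        return self.models[version]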
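
The caching pattern above can be tried in isolation with stub builders. The sketch below is illustrative only: `build_v0_19`, `build_v1_0`, `BUILDERS`, `BUILD_CALLS`, and `LockedModelManager` are hypothetical names, and the `asyncio.Lock` is an addition that this plan does not call for. It is there to show one way to keep two concurrent requests for the same version from building the model twice.

```python
import asyncio
from collections import Counter
from typing import Any, Awaitable, Callable, Dict

BUILD_CALLS: Counter = Counter()  # counts builds per version, for the demo only


async def build_v0_19() -> Any:
    """Hypothetical stand-in for builds.v0_19.models.build_model."""
    BUILD_CALLS["v0.19"] += 1
    await asyncio.sleep(0)  # simulate an await point during loading
    return {"version": "v0.19"}


async def build_v1_0() -> Any:
    """Hypothetical stand-in for builds.v1_0.models.build_model."""
    BUILD_CALLS["v1.0"] += 1
    await asyncio.sleep(0)  # simulate an await point during loading
    return {"version": "v1.0"}


# Registry replaces the if/elif chain; unknown versions are simply absent.
BUILDERS: Dict[str, Callable[[], Awaitable[Any]]] = {
    "v0.19": build_v0_19,
    "v1.0": build_v1_0,
}


class LockedModelManager:
    """Assumed variant of ModelManager that serializes model builds."""

    def __init__(self) -> None:
        self.models: Dict[str, Any] = {}  # version -> model
        self._lock = asyncio.Lock()

    async def get_model(self, version: str) -> Any:
        if version not in BUILDERS:
            raise ValueError(f"Unsupported model version: {version!r}")
        async with self._lock:
            # Re-check inside the lock: another task may have built it already.
            if version not in self.models:
                self.models[version] = await BUILDERS[version]()
        return self.models[version]


async def demo() -> bool:
    # Create the manager inside the running loop so the lock binds to it.
    manager = LockedModelManager()
    first, second = await asyncio.gather(
        manager.get_model("v1.0"), manager.get_model("v1.0")
    )
    return first is second  # both callers share one cached model
```

Running `asyncio.run(demo())` issues two concurrent requests for v1.0; with the lock in place the stub builder runs once and both callers receive the same object.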

API Integration
- Add version parameter to endpoints
- Default to v1.0 if not specified
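
A minimal, framework-independent sketch of the defaulting rule described above. The names `resolve_version`, `SUPPORTED_VERSIONS`, and `DEFAULT_VERSION` are hypothetical, and the real endpoints may validate the parameter differently (for example through the request schema).

```python
from typing import Mapping, Optional

SUPPORTED_VERSIONS = ("v0.19", "v1.0")
DEFAULT_VERSION = "v1.0"  # per the plan: default to v1.0 if not specified


def resolve_version(params: Mapping[str, Optional[str]]) -> str:
    """Return the requested model version, falling back to the default."""
    # A missing, None, or empty value all count as "not specified".
    version = params.get("version") or DEFAULT_VERSION
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(
            f"Unsupported version {version!r}; expected one of {SUPPORTED_VERSIONS}"
        )
    return version
```

An endpoint handler could call `resolve_version` on its parsed request parameters and pass the result to `ModelManager.get_model`.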

Voice Management

Simple directory structure:

api/src/voices/
├── v0_19/
└── v1_0/

Keep voice files local; no Hugging Face (HF) downloads.
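
One way to resolve a voice file strictly from the local per-version directories, with no network fallback, is sketched below. The function name `resolve_voice_path`, the `VERSION_DIRS` mapping, and the `.pt` file suffix are assumptions made for illustration; the plan itself only fixes the `v0_19/` and `v1_0/` directory names.

```python
from pathlib import Path

# Maps the API-facing version string to its directory name under voices/.
VERSION_DIRS = {"v0.19": "v0_19", "v1.0": "v1_0"}


def resolve_voice_path(
    voices_root: Path, version: str, voice: str, suffix: str = ".pt"
) -> Path:
    """Return the local path of a voice file, or fail without downloading."""
    if version not in VERSION_DIRS:
        raise ValueError(f"Unsupported version: {version!r}")
    path = voices_root / VERSION_DIRS[version] / f"{voice}{suffix}"
    if not path.is_file():
        # Local-only policy: a missing file is an error, never a download.
        raise FileNotFoundError(f"Voice {voice!r} not found for {version}: {path}")
    return path
```

For example, `resolve_voice_path(Path("api/src/voices"), "v1.0", "example_voice")` would look for `api/src/voices/v1_0/example_voice.pt`, where `example_voice` is a placeholder name.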

Testing

- Basic functionality tests for each version
- Version switching tests
- Voice compatibility tests
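
A version switching test could look roughly like the sketch below. It uses a hypothetical `FakeManager` in place of the real `ModelManager` so that it runs without model weights; the real suite would presumably exercise the actual manager and builds.

```python
import unittest


class FakeManager:
    """Hypothetical stand-in for ModelManager with the same get_model contract."""

    def __init__(self):
        self.models = {}  # version -> model
        self.build_count = 0

    async def get_model(self, version: str):
        if version not in ("v0.19", "v1.0"):
            raise ValueError(f"Unsupported version: {version!r}")
        if version not in self.models:
            self.build_count += 1
            self.models[version] = {"version": version}
        return self.models[version]


class VersionSwitchingTest(unittest.IsolatedAsyncioTestCase):
    async def test_switching_returns_distinct_cached_models(self):
        manager = FakeManager()
        old = await manager.get_model("v0.19")
        new = await manager.get_model("v1.0")
        self.assertIsNot(old, new)
        # Switching back must reuse the cached instance, not rebuild it.
        self.assertIs(await manager.get_model("v0.19"), old)
        self.assertEqual(manager.build_count, 2)

    async def test_unknown_version_is_rejected(self):
        with self.assertRaises(ValueError):
            await FakeManager().get_model("v2.0")
```

The first test covers switching in both directions and checks that each version is built only once; the second checks that an unsupported version fails fast.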

No need to over-optimize - models and voices are small enough to keep things simple.
