# NLP Dependencies Management

## Overview

This document outlines our approach to managing NLP dependencies, focusing on the spaCy models required by dependencies such as `misaki`. The goal is to ensure reliable model availability while preventing runtime download attempts that could cause failures.

## Challenge

One of our dependencies, `misaki`, attempts to download the spaCy model `en_core_web_sm` at runtime. This can lead to failures if:

- The download fails due to network issues
- The environment lacks proper permissions
- The system is running in a restricted environment

## Solution

### Model Management with uv

We use uv, Astral's fast Python package manager, for dependency management. For spaCy model management, we have two approaches:

1. **Development Environment Setup**

   ```bash
   uv run --with spacy -- spacy download en_core_web_sm
   ```

   This command:

   - Temporarily installs spaCy if it is not already present
   - Downloads the required model
   - Places it in the appropriate location

2. **Project Environment**

   - Add spaCy as a project dependency in `pyproject.toml`
   - Run `uv run -- spacy download en_core_web_sm` in the project directory
   - This installs the model into the project's virtual environment (see the sketch below)
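
A minimal end-to-end sketch of the project-environment flow, run from the project root; the `uv add` step assumes spaCy is not yet declared in `pyproject.toml`:

```bash
# Add spaCy as a project dependency; uv records it in pyproject.toml
# and updates the lockfile.
uv add spacy

# Download the model into the project's virtual environment so no
# runtime download is ever attempted.
uv run -- spacy download en_core_web_sm

# Quick check that the model now loads offline.
uv run -- python -c "import spacy; spacy.load('en_core_web_sm')"
```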

### Docker Environment

For containerized deployments:

1. Add the model download step to the Dockerfile (see the sketch below)
2. Ensure the model is available before application startup
3. Configure `misaki` to use the pre-downloaded model
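
A minimal Dockerfile sketch of this pattern is shown below; the base image, file layout, and startup command are illustrative assumptions, not the project's actual Dockerfile:

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install uv, then sync the locked project dependencies
# (assumes spaCy is declared in pyproject.toml).
COPY pyproject.toml uv.lock ./
RUN pip install uv && uv sync --frozen --no-install-project

# Bake the spaCy model into the image at build time so misaki
# never attempts a network download at runtime.
RUN uv run -- spacy download en_core_web_sm

COPY . .
CMD ["uv", "run", "--", "python", "-m", "app"]
```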

## Benefits

  1. Reliability: Prevents runtime download attempts
  2. Reproducibility: Model version is consistent across environments
  3. Performance: No startup delay from download attempts
  4. Security: Better control over external downloads

## Implementation Notes

1. Development environments should use the `uv run --with spacy` approach for flexibility
2. CI/CD pipelines should include the model download in their setup phase
3. Docker builds should pre-download models during image creation
4. Application code should verify model availability at startup (see the sketch below)
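
A minimal sketch of such a startup check; `verify_spacy_model` is a hypothetical helper name, not an existing function in this codebase:

```python
import spacy
import spacy.util

REQUIRED_MODEL = "en_core_web_sm"

def verify_spacy_model(name: str = REQUIRED_MODEL) -> None:
    """Fail fast at startup if the model is missing, rather than
    letting a dependency attempt a network download mid-request."""
    if not spacy.util.is_package(name):
        raise RuntimeError(
            f"spaCy model '{name}' is not installed. "
            f"Pre-download it with: uv run -- spacy download {name}"
        )
    # Loading confirms the model is not just installed but usable.
    spacy.load(name)
```

This could be called once during application startup (for example, in a FastAPI startup hook) before any request reaches `misaki`.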

## Future Considerations

1. Consider caching models in a shared location for multiple services
2. Implement version pinning for NLP models (see the sketch below)
3. Add health checks to verify model availability
4. Monitor model usage and performance
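
One possible approach to version pinning: spaCy publishes its models as installable wheels on GitHub releases, so a model can be declared as a direct URL dependency. The version below is illustrative only; it should match the installed spaCy release:

```bash
# Pin en_core_web_sm to an explicit release by adding its wheel URL
# as a direct dependency (version shown is illustrative only).
uv add "en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl"
```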