# NLP Dependencies Management

## Overview
This document outlines our approach to managing NLP dependencies, particularly focusing on spaCy models that are required by our dependencies (such as misaki). The goal is to ensure reliable model availability while preventing runtime download attempts that could cause failures.
## Challenge

One of our dependencies, misaki, attempts to download the spaCy model `en_core_web_sm` at runtime. This can lead to failures if:
- The download fails due to network issues
- The environment lacks proper permissions
- The system is running in a restricted environment
## Solution

### Model Management with UV

We use UV as our package manager. For spaCy model management, we have two approaches:
1. **Development Environment Setup**

   ```bash
   uv run --with spacy -- spacy download en_core_web_sm
   ```

   This command:
   - Temporarily installs spaCy if it is not already present
   - Downloads the required model
   - Places it in the appropriate location

2. **Project Environment**

   - Add spaCy as a project dependency in `pyproject.toml`
   - Run `uv run -- spacy download en_core_web_sm` in the project directory
   - This installs the model in the project's virtual environment
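For the project-environment approach, the dependency declaration might look like the following `pyproject.toml` fragment (the version bound is illustrative, not taken from this project):

```toml
[project]
dependencies = [
    # spaCy itself is declared here; the model is still fetched
    # separately with `uv run -- spacy download en_core_web_sm`.
    "spacy>=3.7",
]
```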
### Docker Environment
For containerized deployments:
- Add the model download step in the Dockerfile
- Ensure the model is available before application startup
- Configure misaki to use the pre-downloaded model
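The steps above could be sketched in a Dockerfile like this. This is a hypothetical excerpt: the base image, file layout, and use of `uv sync` are assumptions, not the project's actual Dockerfile.

```dockerfile
FROM python:3.11-slim

# Install uv, then resolve project dependencies from the lockfile.
RUN pip install --no-cache-dir uv
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen

# Pre-download the spaCy model during the image build so the
# container never attempts a network download at runtime.
RUN uv run -- spacy download en_core_web_sm
```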
## Benefits
- Reliability: Prevents runtime download attempts
- Reproducibility: Model version is consistent across environments
- Performance: No startup delay from download attempts
- Security: Better control over external downloads
## Implementation Notes
- Development environments should use the `uv run --with spacy` approach for flexibility
- CI/CD pipelines should include model download in their setup phase
- Docker builds should pre-download models during image creation
- Application code should verify model availability at startup
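A minimal sketch of the startup verification mentioned above. It relies on the fact that spaCy models installed via `spacy download` are regular Python packages, so an import check works without loading spaCy itself; the function name `ensure_model` is hypothetical, not part of this codebase.

```python
import importlib.util


def ensure_model(model_name: str = "en_core_web_sm") -> bool:
    """Return True if the spaCy model package is importable.

    Models installed with `spacy download` register as normal
    Python packages, so find_spec() detects them without
    importing spaCy or triggering any network access.
    """
    return importlib.util.find_spec(model_name) is not None


if __name__ == "__main__":
    if not ensure_model():
        raise RuntimeError(
            "spaCy model 'en_core_web_sm' is missing; run "
            "'uv run -- spacy download en_core_web_sm' before starting."
        )
```

Failing fast here surfaces a missing model as a clear startup error instead of a mid-request download attempt inside misaki.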
## Future Considerations
- Consider caching models in a shared location for multiple services
- Implement version pinning for NLP models
- Add health checks to verify model availability
- Monitor model usage and performance
## Related Documentation
- Kokoro V1 Integration
- UV Package Manager Documentation
- spaCy Model Management Guide