# Kokoro FastAPI Load Testing
This directory contains load testing scripts using Locust to test the Kokoro FastAPI server's performance under concurrent load.
## Docker Setup
The easiest way to run the tests is using Docker:
```bash
# Build the Docker image
docker build -t kokoro-locust .

# Run with web interface (default)
docker run -p 8089:8089 -e LOCUST_HOST=http://host.docker.internal:8880 kokoro-locust

# Run headless mode with specific parameters
docker run -e LOCUST_HOST=http://host.docker.internal:8880 \
    -e LOCUST_HEADLESS=true \
    -e LOCUST_USERS=10 \
    -e LOCUST_SPAWN_RATE=1 \
    -e LOCUST_RUN_TIME=5m \
    kokoro-locust
```
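Note for Linux hosts: `host.docker.internal` does not resolve by default there. Docker 20.10+ can map it to the host gateway explicitly:

```bash
# Map host.docker.internal to the host on Linux (Docker 20.10+)
docker run --add-host=host.docker.internal:host-gateway \
    -p 8089:8089 -e LOCUST_HOST=http://host.docker.internal:8880 kokoro-locust
```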
### Environment Variables
- `LOCUST_HOST`: Target server URL (default: `http://localhost:8880`)
- `LOCUST_USERS`: Number of users to simulate (default: `10`)
- `LOCUST_SPAWN_RATE`: Users to spawn per second (default: `1`)
- `LOCUST_RUN_TIME`: Test duration (default: `5m`)
- `LOCUST_HEADLESS`: Run without the web UI if `true` (default: `false`)
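These variables also make it easy to run a quick headless smoke test before committing to a longer run; the values below are illustrative:

```bash
# Quick smoke test: 1 user for 30 seconds, no web UI
docker run -e LOCUST_HOST=http://host.docker.internal:8880 \
    -e LOCUST_HEADLESS=true \
    -e LOCUST_USERS=1 \
    -e LOCUST_SPAWN_RATE=1 \
    -e LOCUST_RUN_TIME=30s \
    kokoro-locust
```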
### Accessing Results
- **Web UI**: http://localhost:8089 when running in web mode
- **HTML report**: generated in headless mode; copy it out of the container:

```bash
docker cp <container_id>:/locust/report.html ./report.html
```
## Local Setup (Alternative)
If you prefer running without Docker:
1. Create a virtual environment and install requirements:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r requirements.txt
   ```

2. Make sure your Kokoro FastAPI server is running (default: `http://localhost:8880`)

3. Run Locust:

   ```bash
   # Web UI mode
   locust -f locustfile.py --host http://localhost:8880

   # Headless mode
   locust -f locustfile.py --host http://localhost:8880 --users 10 --spawn-rate 1 --run-time 5m --headless
   ```
## Test Scenarios
The load test includes:
- TTS endpoint testing with short phrases
- Model pool monitoring
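For orientation, a Locust user class covering these two scenarios has roughly the shape sketched below. This is illustrative only: the request payload assumes an OpenAI-compatible `/v1/audio/speech` endpoint and a hypothetical voice name, so refer to `locustfile.py` in this directory for the actual tasks.

```python
from locust import HttpUser, between, task


class KokoroUser(HttpUser):
    wait_time = between(1, 3)  # idle 1-3s between tasks, like a real client

    @task(5)
    def tts_short_phrase(self):
        # Hypothetical payload; see locustfile.py for the real request shape
        self.client.post(
            "/v1/audio/speech",
            json={
                "model": "kokoro",
                "input": "Hello, this is a load test.",
                "voice": "af_bella",  # assumed voice name
            },
        )

    @task(1)
    def check_model_pool(self):
        # Poll the debug endpoint to watch instance usage under load
        self.client.get("/debug/model_pool")
```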
## Testing Different Configurations
To test with different numbers of model instances:
1. Set the model instance count in your server environment:

   ```bash
   export PYTORCH_MAX_CONCURRENT_MODELS=2  # Adjust as needed
   ```

2. Restart your Kokoro FastAPI server

3. Run the load test with different user counts:

   ```bash
   # Example: Test with 20 users
   docker run -e LOCUST_HOST=http://host.docker.internal:8880 \
       -e LOCUST_HEADLESS=true \
       -e LOCUST_USERS=20 \
       -e LOCUST_SPAWN_RATE=2 \
       -e LOCUST_RUN_TIME=5m \
       kokoro-locust
   ```
## Example Test Matrix
Test your server with different configurations:
| Model Instances | Concurrent Users | Expected Load |
|---|---|---|
| 1 | 5 | Light |
| 2 | 10 | Medium |
| 4 | 20 | Heavy |
## Quick Test Script
Here's a quick script to test multiple configurations:
```bash
#!/bin/bash

# Array of test configurations: "model_instances,users"
configs=(
    "1,5"   # 1 instance, 5 users
    "2,10"  # 2 instances, 10 users
    "4,20"  # 4 instances, 20 users
)

for config in "${configs[@]}"; do
    IFS=',' read -r instances users <<< "$config"
    echo "Testing with $instances instances and $users users..."

    # Set instance count on server (you'll need to implement this)
    # ssh server "export PYTORCH_MAX_CONCURRENT_MODELS=$instances && restart_server"

    # Run load test
    docker run -e LOCUST_HOST=http://host.docker.internal:8880 \
        -e LOCUST_HEADLESS=true \
        -e LOCUST_USERS="$users" \
        -e LOCUST_SPAWN_RATE=1 \
        -e LOCUST_RUN_TIME=5m \
        kokoro-locust

    echo "Waiting 30s before next test..."
    sleep 30
done
```
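Each headless run writes its HTML report inside the container (see "Accessing Results" above), so you may want to name the container and copy the report out before the next iteration. One way to do this, replacing the `docker run` inside the loop (this assumes the image writes `/locust/report.html` as described earlier):

```bash
# Run named, then keep the report labeled by configuration
docker run --name kokoro-locust-run \
    -e LOCUST_HOST=http://host.docker.internal:8880 \
    -e LOCUST_HEADLESS=true \
    -e LOCUST_USERS="$users" \
    -e LOCUST_SPAWN_RATE=1 \
    -e LOCUST_RUN_TIME=5m \
    kokoro-locust
docker cp kokoro-locust-run:/locust/report.html "./report_${instances}x${users}.html"
docker rm kokoro-locust-run
```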
## Tips
- Start with low user counts and gradually increase
- Monitor server resources during tests
- Use the debug endpoint (`/debug/model_pool`) to monitor instance usage
- Check server logs for any errors or bottlenecks
- When using Docker, use `host.docker.internal` to access localhost
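For example, one way to keep an eye on the model pool while a test runs (assuming the server is on `localhost:8880`; the response shape depends on the server implementation):

```bash
# Refresh the model pool status every 2 seconds during a test
watch -n 2 "curl -s http://localhost:8880/debug/model_pool"
```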