mirror of
https://github.com/remsky/Kokoro-FastAPI.git
synced 2025-08-05 16:48:53 +00:00
142 lines
3.8 KiB
Markdown
142 lines
3.8 KiB
Markdown
![]() |
# Kokoro FastAPI Load Testing
|
||
|
|
||
|
This directory contains load testing scripts using Locust to test the Kokoro FastAPI server's performance under concurrent load.
|
||
|
|
||
|
## Docker Setup
|
||
|
|
||
|
The easiest way to run the tests is using Docker:
|
||
|
|
||
|
```bash
|
||
|
# Build the Docker image
|
||
|
docker build -t kokoro-locust .
|
||
|
|
||
|
# Run with web interface (default)
|
||
|
docker run -p 8089:8089 -e LOCUST_HOST=http://host.docker.internal:8880 kokoro-locust
|
||
|
|
||
|
# Run headless mode with specific parameters
|
||
|
docker run -e LOCUST_HOST=http://host.docker.internal:8880 \
|
||
|
-e LOCUST_HEADLESS=true \
|
||
|
-e LOCUST_USERS=10 \
|
||
|
-e LOCUST_SPAWN_RATE=1 \
|
||
|
-e LOCUST_RUN_TIME=5m \
|
||
|
kokoro-locust
|
||
|
```
|
||
|
|
||
|
### Environment Variables
|
||
|
|
||
|
- `LOCUST_HOST`: Target server URL (default: http://localhost:8880)
|
||
|
- `LOCUST_USERS`: Number of users to simulate (default: 10)
|
||
|
- `LOCUST_SPAWN_RATE`: Users to spawn per second (default: 1)
|
||
|
- `LOCUST_RUN_TIME`: Test duration (default: 5m)
|
||
|
- `LOCUST_HEADLESS`: Run without web UI if true (default: false)
|
||
|
|
||
|
### Accessing Results
|
||
|
|
||
|
- Web UI: http://localhost:8089 when running in web mode
|
||
|
- HTML Report: Generated in headless mode, copy from container:
|
||
|
```bash
|
||
|
docker cp <container_id>:/locust/report.html ./report.html
|
||
|
```
|
||
|
|
||
|
## Local Setup (Alternative)
|
||
|
|
||
|
If you prefer running without Docker:
|
||
|
|
||
|
1. Create a virtual environment and install requirements:
|
||
|
```bash
|
||
|
python -m venv venv
|
||
|
source venv/bin/activate # On Windows: venv\Scripts\activate
|
||
|
pip install -r requirements.txt
|
||
|
```
|
||
|
|
||
|
2. Make sure your Kokoro FastAPI server is running (default: http://localhost:8880)
|
||
|
|
||
|
3. Run Locust:
|
||
|
```bash
|
||
|
# Web UI mode
|
||
|
locust -f locustfile.py --host http://localhost:8880
|
||
|
|
||
|
# Headless mode
|
||
|
locust -f locustfile.py --host http://localhost:8880 --users 10 --spawn-rate 1 --run-time 5m --headless
|
||
|
```
|
||
|
|
||
|
## Test Scenarios
|
||
|
|
||
|
The load test includes:
|
||
|
1. TTS endpoint testing with short phrases
|
||
|
2. Model pool monitoring
|
||
|
|
||
|
## Testing Different Configurations
|
||
|
|
||
|
To test with different numbers of model instances:
|
||
|
|
||
|
1. Set the model instance count in your server environment:
|
||
|
```bash
|
||
|
export PYTORCH_MAX_CONCURRENT_MODELS=2 # Adjust as needed
|
||
|
```
|
||
|
|
||
|
2. Restart your Kokoro FastAPI server
|
||
|
|
||
|
3. Run the load test with different user counts:
|
||
|
```bash
|
||
|
# Example: Test with 20 users
|
||
|
docker run -e LOCUST_HOST=http://host.docker.internal:8880 \
|
||
|
-e LOCUST_HEADLESS=true \
|
||
|
-e LOCUST_USERS=20 \
|
||
|
-e LOCUST_SPAWN_RATE=2 \
|
||
|
-e LOCUST_RUN_TIME=5m \
|
||
|
kokoro-locust
|
||
|
```
|
||
|
|
||
|
## Example Test Matrix
|
||
|
|
||
|
Test your server with different configurations:
|
||
|
|
||
|
| Model Instances | Concurrent Users | Expected Load |
|
||
|
|----------------|------------------|---------------|
|
||
|
| 1 | 5 | Light |
|
||
|
| 2 | 10 | Medium |
|
||
|
| 4 | 20 | Heavy |
|
||
|
|
||
|
## Quick Test Script
|
||
|
|
||
|
Here's a quick script to test multiple configurations:
|
||
|
|
||
|
```bash
|
||
|
#!/bin/bash
|
||
|
|
||
|
# Array of test configurations
|
||
|
configs=(
|
||
|
"1,5" # 1 instance, 5 users
|
||
|
"2,10" # 2 instances, 10 users
|
||
|
"4,20" # 4 instances, 20 users
|
||
|
)
|
||
|
|
||
|
for config in "${configs[@]}"; do
|
||
|
IFS=',' read -r instances users <<< "$config"
|
||
|
|
||
|
echo "Testing with $instances instances and $users users..."
|
||
|
|
||
|
# Set instance count on server (you'll need to implement this)
|
||
|
# ssh server "export PYTORCH_MAX_CONCURRENT_MODELS=$instances && restart_server"
|
||
|
|
||
|
# Run load test
|
||
|
docker run -e LOCUST_HOST=http://host.docker.internal:8880 \
|
||
|
-e LOCUST_HEADLESS=true \
|
||
|
-e LOCUST_USERS=$users \
|
||
|
-e LOCUST_SPAWN_RATE=1 \
|
||
|
-e LOCUST_RUN_TIME=5m \
|
||
|
kokoro-locust
|
||
|
|
||
|
echo "Waiting 30s before next test..."
|
||
|
sleep 30
|
||
|
done
|
||
|
```
|
||
|
|
||
|
## Tips
|
||
|
|
||
|
1. Start with low user counts and gradually increase
|
||
|
2. Monitor server resources during tests
|
||
|
3. Use the debug endpoint (/debug/model_pool) to monitor instance usage
|
||
|
4. Check server logs for any errors or bottlenecks
|
||
|
5. When using Docker, use `host.docker.internal` to access localhost
|