Kokoro FastAPI Load Testing

This directory contains load-testing scripts that use Locust to measure the Kokoro FastAPI server's performance under concurrent load.

Docker Setup

The easiest way to run the tests is with Docker:

# Build the Docker image
docker build -t kokoro-locust .

# Run with web interface (default)
docker run -p 8089:8089 -e LOCUST_HOST=http://host.docker.internal:8880 kokoro-locust

# Run in headless mode with specific parameters
docker run -e LOCUST_HOST=http://host.docker.internal:8880 \
    -e LOCUST_HEADLESS=true \
    -e LOCUST_USERS=10 \
    -e LOCUST_SPAWN_RATE=1 \
    -e LOCUST_RUN_TIME=5m \
    kokoro-locust

Environment Variables

  • LOCUST_HOST: Target server URL (default: http://localhost:8880)
  • LOCUST_USERS: Number of users to simulate (default: 10)
  • LOCUST_SPAWN_RATE: Users to spawn per second (default: 1)
  • LOCUST_RUN_TIME: Test duration (default: 5m)
  • LOCUST_HEADLESS: Run without web UI if true (default: false)
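
How these variables map onto Locust's command line depends on the image's start script. A minimal sketch of what such an entrypoint might do (the bundled start.sh may differ in its details):

#!/bin/bash
# Hypothetical entrypoint sketch -- check start.sh for the actual logic.
if [ "$LOCUST_HEADLESS" = "true" ]; then
    # Headless: pass user count, spawn rate, and duration as flags,
    # and write an HTML report for later retrieval.
    locust -f locustfile.py --host "$LOCUST_HOST" \
        --users "$LOCUST_USERS" --spawn-rate "$LOCUST_SPAWN_RATE" \
        --run-time "$LOCUST_RUN_TIME" --headless --html /locust/report.html
else
    # Web UI mode: only the target host is needed; the rest is set in the UI.
    locust -f locustfile.py --host "$LOCUST_HOST"
fi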

Accessing Results

  • Web UI: http://localhost:8089 when running in web mode
  • HTML Report: generated in headless mode; copy it from the container:
    docker cp <container_id>:/locust/report.html ./report.html
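
If the test container is the most recently created one and was not started with --rm, you can skip looking up its ID:

docker cp "$(docker ps -lq)":/locust/report.html ./report.html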
    

Local Setup (Alternative)

If you prefer running without Docker:

  1. Create a virtual environment and install requirements:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
  2. Make sure your Kokoro FastAPI server is running (default: http://localhost:8880)

  3. Run Locust:

# Web UI mode
locust -f locustfile.py --host http://localhost:8880

# Headless mode
locust -f locustfile.py --host http://localhost:8880 --users 10 --spawn-rate 1 --run-time 5m --headless

Test Scenarios

The load test includes:

  1. TTS endpoint testing with short phrases
  2. Model pool monitoring
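
For reference, a minimal locustfile covering both scenarios might look like the sketch below. The /v1/audio/speech path, request body, and voice name are illustrative assumptions (check locustfile.py for the actual requests); /debug/model_pool is the debug endpoint referenced under Tips.

from locust import HttpUser, task, between

class KokoroUser(HttpUser):
    # Each simulated user pauses 1-3 seconds between tasks
    wait_time = between(1, 3)

    @task(5)  # weight TTS requests 5x more heavily than monitoring
    def tts_short_phrase(self):
        # Assumed OpenAI-compatible endpoint and body; adjust to your server
        self.client.post(
            "/v1/audio/speech",
            json={
                "model": "kokoro",
                "input": "Hello, this is a load test.",
                "voice": "af_bella",  # assumed voice name
            },
        )

    @task(1)
    def check_model_pool(self):
        # Poll the debug endpoint to observe model instance usage under load
        self.client.get("/debug/model_pool")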

Testing Different Configurations

To test with different numbers of model instances:

  1. Set the model instance count in your server environment:
export PYTORCH_MAX_CONCURRENT_MODELS=2  # Adjust as needed
  2. Restart your Kokoro FastAPI server

  3. Run the load test with different user counts:

# Example: Test with 20 users
docker run -e LOCUST_HOST=http://host.docker.internal:8880 \
    -e LOCUST_HEADLESS=true \
    -e LOCUST_USERS=20 \
    -e LOCUST_SPAWN_RATE=2 \
    -e LOCUST_RUN_TIME=5m \
    kokoro-locust

Example Test Matrix

Test your server with different configurations:

Model Instances   Concurrent Users   Expected Load
1                 5                  Light
2                 10                 Medium
4                 20                 Heavy

Quick Test Script

Here's a quick script to test multiple configurations:

#!/bin/bash

# Array of test configurations
configs=(
    "1,5"    # 1 instance, 5 users
    "2,10"   # 2 instances, 10 users
    "4,20"   # 4 instances, 20 users
)

for config in "${configs[@]}"; do
    IFS=',' read -r instances users <<< "$config"
    
    echo "Testing with $instances instances and $users users..."
    
    # Set instance count on server (you'll need to implement this)
    # ssh server "export PYTORCH_MAX_CONCURRENT_MODELS=$instances && restart_server"
    
    # Run load test
    docker run -e LOCUST_HOST=http://host.docker.internal:8880 \
        -e LOCUST_HEADLESS=true \
        -e LOCUST_USERS=$users \
        -e LOCUST_SPAWN_RATE=1 \
        -e LOCUST_RUN_TIME=5m \
        kokoro-locust
    
    echo "Waiting 30s before next test..."
    sleep 30
done

Tips

  1. Start with low user counts and gradually increase
  2. Monitor server resources during tests
  3. Use the debug endpoint (/debug/model_pool) to monitor instance usage
  4. Check server logs for any errors or bottlenecks
  5. When running Locust in Docker, use host.docker.internal (instead of localhost) to reach a server on the host machine
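
For tip 3, you can poll the debug endpoint from the host while a test runs (the response format depends on the server version):

curl -s http://localhost:8880/debug/model_pool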