Kokoro-FastAPI/api/src/services/audio.py

"""Audio conversion service"""

from io import BytesIO

import numpy as np
import soundfile as sf
from loguru import logger


class AudioService:
    """Service for audio format conversions"""

    @staticmethod
    def convert_audio(
        audio_data: np.ndarray, sample_rate: int, output_format: str
    ) -> bytes:
        """Convert audio data to specified format

        Args:
            audio_data: Numpy array of audio samples
            sample_rate: Sample rate of the audio
            output_format: Target format (wav, mp3, opus, flac, pcm)

        Returns:
            Bytes of the converted audio

        Raises:
            ValueError: If the format is unsupported or the conversion fails
        """
        buffer = BytesIO()

        try:
            if output_format == "wav":
                logger.info("Writing to WAV format...")
                # Peak-normalize to the int16 range for WAV
                peak = np.abs(audio_data).max()
                if peak == 0:
                    peak = 1  # silent input: avoid dividing by zero
                audio_data_wav = (
                    audio_data / peak * np.iinfo(np.int16).max
                ).astype(np.int16)
                sf.write(buffer, audio_data_wav, sample_rate, format="WAV")
            elif output_format == "mp3":
                logger.info("Converting to MP3 format...")
                # soundfile writes MP3 through libsndfile, which needs MP3
                # support built in (libsndfile 1.1.0 or newer)
                sf.write(buffer, audio_data, sample_rate, format="MP3")
            elif output_format == "opus":
                logger.info("Converting to Opus format...")
                sf.write(buffer, audio_data, sample_rate, format="OGG", subtype="OPUS")
            elif output_format == "flac":
                logger.info("Converting to FLAC format...")
                sf.write(buffer, audio_data, sample_rate, format="FLAC")
            elif output_format == "pcm":
                logger.info("Extracting PCM data...")
                # Peak-normalize to the int16 range for raw (headerless) PCM
                peak = np.abs(audio_data).max()
                if peak == 0:
                    peak = 1  # silent input: avoid dividing by zero
                audio_data_pcm = (
                    audio_data / peak * np.iinfo(np.int16).max
                ).astype(np.int16)
                buffer.write(audio_data_pcm.tobytes())
            else:
                raise ValueError(
                    f"Format {output_format} not supported. Supported formats are: wav, mp3, opus, flac, pcm."
                )

            buffer.seek(0)
            return buffer.getvalue()

        except Exception as e:
            logger.error(f"Error converting audio to {output_format}: {e}")
            # Chain the original exception so the traceback keeps its cause
            raise ValueError(
                f"Failed to convert audio to {output_format}: {e}"
            ) from e
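

# Illustrative sketch only, not part of the original service. The "pcm" branch
# above returns headerless int16 samples in native byte order, so a caller has
# to know the sample format and rate out of band. Assuming that layout, the
# hypothetical helper below (the name is made up for this example) shows one
# way a consumer could turn those bytes back into floating-point samples.
import numpy as np


def pcm_bytes_to_float(pcm: bytes) -> np.ndarray:
    """Decode raw int16 PCM bytes into float32 samples in [-1.0, 1.0]."""
    # Reinterpret the byte string as int16 samples (native byte order)
    samples = np.frombuffer(pcm, dtype=np.int16)
    # Undo the int16 scaling applied during conversion
    return samples.astype(np.float32) / np.iinfo(np.int16).max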