Kokoro-FastAPI/api/src/services/audio.py

"""Audio conversion service"""

from io import BytesIO

import numpy as np
import soundfile as sf
import scipy.io.wavfile as wavfile
from loguru import logger


class AudioService:
    """Service for audio format conversions"""

    @staticmethod
    def convert_audio(
        audio_data: np.ndarray, sample_rate: int, output_format: str
    ) -> bytes:
        """Convert audio data to the specified format.

        Args:
            audio_data: Numpy array of audio samples
            sample_rate: Sample rate of the audio in Hz
            output_format: Target format (wav, mp3, opus, or flac)

        Returns:
            Bytes of the converted audio

        Raises:
            ValueError: If the format is unsupported or the conversion fails
        """
        buffer = BytesIO()

        try:
            if output_format == "wav":
                logger.info("Writing to WAV format...")
                wavfile.write(buffer, sample_rate, audio_data)
                return buffer.getvalue()

            elif output_format == "mp3":
                # soundfile encodes MP3 directly from the sample array,
                # so no intermediate WAV buffer is needed
                logger.info("Converting to MP3 format...")
                sf.write(buffer, audio_data, sample_rate, format="mp3")
                return buffer.getvalue()

            elif output_format == "opus":
                logger.info("Converting to Opus format...")
                sf.write(buffer, audio_data, sample_rate, format="ogg", subtype="opus")
                return buffer.getvalue()

            elif output_format == "flac":
                logger.info("Converting to FLAC format...")
                sf.write(buffer, audio_data, sample_rate, format="flac")
                return buffer.getvalue()

            elif output_format == "aac":
                raise ValueError(
                    "AAC format is not currently supported. Please use wav, mp3, opus, or flac."
                )

            elif output_format == "pcm":
                raise ValueError(
                    "PCM format is not currently supported. Please use wav, mp3, opus, or flac."
                )

            else:
                raise ValueError(
                    f"Format {output_format} not supported. Supported formats are: wav, mp3, opus, flac."
                )

        except Exception as e:
            logger.error(f"Error converting audio to {output_format}: {e}")
            # Chain the original exception so the traceback keeps the root cause
            raise ValueError(f"Failed to convert audio to {output_format}: {e}") from e
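
The sketch below illustrates the in-memory pattern that `convert_audio` relies on: encode into a `BytesIO` buffer, then return its bytes. It is not part of the service. To stay runnable without the project's dependencies it uses the standard-library `wave` module instead of `scipy.io.wavfile`, and the helper names `sine_wave_pcm16` and `to_wav_bytes` are hypothetical, introduced only for illustration.

```python
import math
import struct
import wave
from io import BytesIO


def sine_wave_pcm16(freq_hz: float, seconds: float, sample_rate: int) -> bytes:
    """Hypothetical helper: mono sine wave as little-endian 16-bit PCM."""
    n_samples = int(seconds * sample_rate)
    samples = (
        # Half amplitude keeps the signal well inside the int16 range
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def to_wav_bytes(pcm16: bytes, sample_rate: int) -> bytes:
    """Hypothetical helper: wrap raw PCM in a WAV container, entirely in memory."""
    buffer = BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm16)
    # Closing the wave writer leaves the caller-owned buffer open
    return buffer.getvalue()


# 0.1 s of a 440 Hz tone; the 24 kHz rate is an assumed example value
wav_bytes = to_wav_bytes(sine_wave_pcm16(440.0, 0.1, 24000), 24000)
```

The returned bytes can be written to disk or sent as a response body without touching the filesystem, which is the same reason the service writes into a buffer rather than a temporary file.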