# Kokoro-FastAPI/api/src/services/streaming_audio_writer.py

"""Streaming audio encoder: turns int16 sample chunks into WAV, FLAC, MP3, Opus, AAC (ADTS), or raw PCM bytes."""

import struct
from io import BytesIO
from typing import Optional

import av
import numpy as np
import soundfile as sf
from loguru import logger
from pydub import AudioSegment
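

# Illustrative sketch (hypothetical helper, not used by StreamingAudioWriter):
# for format="pcm", write_chunk() returns bare sample bytes with no header.
# A client that wants a playable file can wrap the concatenated chunks itself.
# This assumes 16-bit little-endian samples (what int16 ndarray.tobytes()
# yields on typical x86 and ARM hosts) and uses the standard-library wave module.
def _pcm_chunks_to_wav(chunks, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap raw int16 PCM chunks in a RIFF/WAVE header."""
    import io
    import wave

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample: 16-bit PCM
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            wav_file.writeframes(chunk)
    # Closing the wave writer patched the header sizes; the BytesIO stays open.
    return buffer.getvalue()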

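
# Illustrative sketch (hypothetical helpers, not used by StreamingAudioWriter):
# write_chunk() feeds PyAV packed "s16" frames, which keep all channels
# interleaved in a single plane, so the array handed to
# av.AudioFrame.from_ndarray must have shape (1, samples * channels).
# The writer advances self.pts by frame.samples after each frame, i.e. pts
# counts samples per channel since the start of the stream. The float-to-int16
# scaling below is an assumption for callers holding floats in [-1.0, 1.0];
# write_chunk() itself passes its array straight through and needs int16 data.
def _to_packed_s16(audio, channels: int = 1):
    """Return (packed_array, samples_per_channel) for a mono or stereo chunk."""
    import numpy as np

    audio = np.asarray(audio)
    if audio.dtype != np.int16:
        # Clip before scaling so out-of-range floats cannot wrap around.
        audio = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    # Row-major flattening of an (n, channels) array yields L, R, L, R, ...
    packed = audio.reshape(1, -1)
    return packed, packed.shape[1] // channels


def _frame_pts(chunk_lengths):
    """Presentation timestamps (in samples) for consecutive chunks."""
    pts, stamps = 0, []
    for n in chunk_lengths:
        stamps.append(pts)
        pts += n
    return stamps
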

class StreamingAudioWriter:
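    """Incrementally encode audio chunks into a streaming output format.

    Each write_chunk() call returns only the bytes produced since the
    previous call; a final call with finalize=True flushes the encoder and
    closes the underlying PyAV container.
    """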

    def __init__(self, format: str, sample_rate: int, channels: int = 1):
        self.format = format.lower()
        self.sample_rate = sample_rate
        self.channels = channels
        self.bytes_written = 0
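        self.pts = 0  # running presentation timestamp, in samples per channel

        # Codec used for each container-backed format; "pcm" bypasses PyAV.
        codec_map = {
            "wav": "pcm_s16le",
            "mp3": "mp3",
            "opus": "libopus",
            "flac": "flac",
            "aac": "aac",
        }
        # Format-specific setup
        if self.format in ["wav", "flac", "mp3", "pcm", "aac", "opus"]:
            if self.format != "pcm":
                self.output_buffer = BytesIO()
                # Raw AAC is written through FFmpeg's "adts" muxer.
                self.container = av.open(
                    self.output_buffer,
                    mode="w",
                    format=self.format if self.format != "aac" else "adts",
                )
                self.stream = self.container.add_stream(
                    codec_map[self.format],
                    sample_rate=self.sample_rate,
                    layout="mono" if self.channels == 1 else "stereo",
                )
                # Target bitrate for the lossy codecs (mp3, opus, aac);
                # it has no effect on the lossless wav and flac outputs.
                self.stream.bit_rate = 128000
        else:
            raise ValueError(f"Unsupported format: {format}")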

    def close(self):
        """Close the PyAV container (if any) and its in-memory output buffer."""
        if hasattr(self, "container"):
            self.container.close()

        if hasattr(self, "output_buffer"):
            self.output_buffer.close()

    def write_chunk(
        self, audio_data: Optional[np.ndarray] = None, finalize: bool = False
    ) -> bytes:
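        """Write a chunk of audio data and return bytes in the target format.

        Args:
            audio_data: int16 samples to encode (interleaved if stereo), or
                None when finalizing. For container formats, audio passed
                together with finalize=True is not encoded.
            finalize: Whether this is the final write; flushes the encoder
                and closes the container.

        Returns:
            The bytes produced by this call. This can be empty while the
            encoder is still buffering input.
        """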

        if finalize:
            if self.format != "pcm":
                # Flush any frames still buffered inside the encoder.
                packets = self.stream.encode(None)
                for packet in packets:
                    self.container.mux(packet)

                data = self.output_buffer.getvalue()
                self.close()
                return data

        if audio_data is None or len(audio_data) == 0:
            return b""

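        if self.format == "pcm":
            # Raw sample bytes; no container or encoding is involved.
            return audio_data.tobytes()
        else:
            # Packed s16 expects one plane of interleaved samples,
            # shape (1, samples * channels).
            frame = av.AudioFrame.from_ndarray(
                audio_data.reshape(1, -1),
                format="s16",
                layout="mono" if self.channels == 1 else "stereo",
            )
            frame.sample_rate = self.sample_rate

            # pts counts samples per channel since the start of the stream.
            frame.pts = self.pts
            self.pts += frame.samples

            packets = self.stream.encode(frame)
            for packet in packets:
                self.container.mux(packet)

            # Return only the bytes muxed by this call, then empty the buffer
            # so the next call starts writing at offset 0.
            data = self.output_buffer.getvalue()
            self.output_buffer.seek(0)
            self.output_buffer.truncate(0)
            return data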
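

# Illustrative sketch (hypothetical helper, not used by StreamingAudioWriter):
# write_chunk() hands back only the bytes muxed during the current call by
# reading the in-memory buffer and then resetting it. Both resets matter:
# truncate(0) alone leaves the stream position at the old end, so the next
# write would be padded with NUL bytes up to that offset.
def _drain_buffer(buffer):
    """Return everything written to a BytesIO so far and empty it in place."""
    data = buffer.getvalue()
    buffer.seek(0)  # rewind first so later writes start at offset 0
    buffer.truncate(0)  # then drop the bytes that were just returned
    return data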