Mirror of https://github.com/santinic/audiblez.git, synced 2025-09-18 21:40:39 +00:00

Merge pull request #40 from santinic/v3

v3 with Torch and Cuda, more languages, bug fixes and more

This commit is contained in commit 53a9f5d04b

9 changed files with 2176 additions and 846 deletions
26  .github/workflows/git-clone-and-run.yml  (vendored, new file)
@@ -0,0 +1,26 @@
+name: Git clone and run
+run-name: Git clone and run
+on: [ push, pull_request ]
+jobs:
+  git-clone-and-run:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+      - uses: actions/checkout@v3
+      - name: Install dependencies
+        run: pip install poetry && poetry install
+      - name: check it runs as script
+        run: poetry run audiblez --help
+      - name: install ffmpeg and espeak-ng
+        run: sudo apt-get update && sudo apt-get install ffmpeg espeak-ng --fix-missing
+#      - name: download test epub
+#        run: wget https://github.com/daisy/epub-accessibility-tests/releases/download/fundamental-2.0/Fundamental-Accessibility-Tests-Basic-Functionality-v2.0.0.epub
+#      - name: create audiobook
+#        run: poetry run audiblez Fundamental-Accessibility-Tests-Basic-Functionality-v2.0.0.epub
+#      - name: check m4b output file
+#        run: ls -lah Fundamental-Accessibility-Tests-Basic-Functionality-v2.0.0.m4b
+      - name: run unit-tests (unittest classes in /test)
+        run: poetry run python -m unittest discover test
4  .github/workflows/pip-install.yaml  (vendored)
@@ -19,8 +19,8 @@ jobs:
       run: python -m audiblez --help
     - name: check it runs as script
       run: audiblez --help
-    - name: install ffmpeg
-      run: sudo apt-get install ffmpeg
+    - name: install ffmpeg and espeak-ng
+      run: sudo apt update && sudo apt-get install ffmpeg espeak-ng
     - name: download test epub
       run: wget https://github.com/daisy/epub-accessibility-tests/releases/download/fundamental-2.0/Fundamental-Accessibility-Tests-Basic-Functionality-v2.0.0.epub
     - name: create audiobook
87  README.md
@@ -1,34 +1,41 @@
 # Audiblez: Generate audiobooks from e-books

 [](https://github.com/santinic/audiblez/actions/workflows/pip-install.yaml)
 [](https://github.com/santinic/audiblez/actions/workflows/git-clone-and-run.yml)

+### v3.0 Now with CUDA support!

 Audiblez generates `.m4b` audiobooks from regular `.epub` e-books,
 using Kokoro's high-quality speech synthesis.

-[Kokoro v0.19](https://huggingface.co/hexgrad/Kokoro-82M) is a recently published text-to-speech model with just 82M params and very natural sounding output.
+[Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) is a recently published text-to-speech model with just 82M params and very natural sounding output.
 It's released under Apache licence and it was trained on < 100 hours of audio.
-It currently supports American and British English in a bunch of very good voices.
+It currently supports American, British English, French, Korean, Japanese and Mandarin, and a bunch of very good voices.

-On my M2 MacBook Pro, **it takes about 2 hours to convert to mp3 the Selfish Gene by Richard Dawkins**, which is about 100,000 words (or 600,000 characters),
-at a rate of about 80 characters per second.
-Future support for French, Korean, Japanese and Mandarin is planned.
+On a Google Colab's T4 GPU via Cuda, **it takes about 5 minutes to convert "Animal Farm" by Orwell** (which is about 160,000 characters) to audiobook, at a rate of about 600 characters per second.
+
+On my M2 MacBook Pro, on CPU, it takes about 1 hour, at a rate of about 60 characters per second.

 ## How to install and run

 If you have Python 3 on your computer, you can install it with pip.
 Be aware that it won't work with Python 3.13.
-Then you also need to download a couple of additional files in the same folder, which are about ~360MB:
+You also need `espeak-ng` and `ffmpeg` installed on your machine:

 ```bash
 pip install audiblez
-wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx
-wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json
+
+sudo apt install ffmpeg espeak-ng   # on Ubuntu/Debian 🐧
+brew install ffmpeg espeak-ng       # on Mac 🍏
 ```

 Then, to convert an epub file into an audiobook, just run:

 ```bash
-audiblez book.epub -l en-gb -v af_sky
+audiblez book.epub -v af_sky
 ```

 It will first create a bunch of `book_chapter_1.wav`, `book_chapter_2.wav`, etc. files in the same directory,
@@ -36,57 +43,41 @@ and at the end it will produce a `book.m4b` file with the whole book you can lis
 audiobook player.
 It will only produce the `.m4b` file if you have `ffmpeg` installed on your machine.

-## Supported Languages
-Use `-l` option to specify the language, available language codes are:
-🇺🇸 `en-us`, 🇬🇧 `en-gb`, 🇫🇷 `fr-fr`, 🇯🇵 `ja`, 🇰🇷 `kr` and 🇨🇳 `cmn`.
-
 ## Speed

 By default the audio is generated using a normal speed, but you can make it up to twice slower or faster by specifying a speed argument between 0.5 to 2.0:

 ```bash
-audiblez book.epub -l en-gb -v af_sky -s 1.5
+audiblez book.epub -v af_sky -s 1.5
 ```

 ## Supported Voices
-Use `-v` option to specify the voice:
-available voices are `af`, `af_bella`, `af_nicole`, `af_sarah`, `af_sky`, `am_adam`, `am_michael`, `bf_emma`, `bf_isabella`, `bm_george`, `bm_lewis`.
 You can try them here: [https://huggingface.co/spaces/hexgrad/Kokoro-TTS](https://huggingface.co/spaces/hexgrad/Kokoro-TTS)

+Use `-v` option to specify the voice to use. Available voices are listed below.
+The first letter is the language code and the second is the gender of the speaker, e.g. `im_nicola` is an Italian male voice.
+
+| Language | Voices |
+|----------|--------|
+| 🇺🇸 | `af_alloy`, `af_aoede`, `af_bella`, `af_heart`, `af_jessica`, `af_kore`, `af_nicole`, `af_nova`, `af_river`, `af_sarah`, `af_sky`, `am_adam`, `am_echo`, `am_eric`, `am_fenrir`, `am_liam`, `am_michael`, `am_onyx`, `am_puck`, `am_santa` |
+| 🇬🇧 | `bf_alice`, `bf_emma`, `bf_isabella`, `bf_lily`, `bm_daniel`, `bm_fable`, `bm_george`, `bm_lewis` |
+| 🇪🇸 | `ef_dora`, `em_alex`, `em_santa` |
+| 🇫🇷 | `ff_siwis` |
+| 🇮🇳 | `hf_alpha`, `hf_beta`, `hm_omega`, `hm_psi` |
+| 🇮🇹 | `if_sara`, `im_nicola` |
+| 🇯🇵 | `jf_alpha`, `jf_gongitsune`, `jf_nezumi`, `jf_tebukuro`, `jm_kumo` |
+| 🇧🇷 | `pf_dora`, `pm_alex`, `pm_santa` |
+| 🇨🇳 | `zf_xiaobei`, `zf_xiaoni`, `zf_xiaoxiao`, `zf_xiaoyi`, `zm_yunjian`, `zm_yunxi`, `zm_yunxia`, `zm_yunyang` |

 ## How to run on GPU

-By default audiblez runs on CPU. If you want to use a GPU for faster performance, install the GPU-enabled ONNX Runtime and specify a runtime provider with the `--providers` flag. By default, the CPU-enabled ONNX Runtime is installed. The GPU runtime must be installed manually.
-
-```bash
-pip install onnxruntime-gpu
-```
-
-To specify ONNX providers, such as using an NVIDIA GPU, use the `--providers` flag. For example:
-
-```bash
-audiblez book.epub -l en-gb -v af_sky --providers CUDAExecutionProvider
-```
-
-To see the list of available providers on your system, run the following:
-
-```bash
-audiblez --help
-```
-
-or
-
-```bash
-python -c "import onnxruntime as ort; print(ort.get_available_providers())"
-```
-
-This will display the ONNX providers that can be used, such as `CUDAExecutionProvider` for NVIDIA GPUs or `CPUExecutionProvider` for CPU-only execution.
-
-You can specify a provider hierarchy by providing multiple providers separated by spaces.
-
-```bash
-audiblez book.epub -l en-gb -v af_sky --providers CUDAExecutionProvider CPUExecutionProvider
-```
+By default audiblez runs on CPU. If you pass the option `--cuda` it will try to use the Cuda device via Torch.
+
+Check out this example: [Audiblez running on a Google Colab Notebook with Cuda](https://colab.research.google.com/drive/164PQLowogprWQpRjKk33e-8IORAvqXKI?usp=sharing).
+
+We don't currently support Apple Silicon, as there is not yet a Kokoro implementation in MLX. As soon as it becomes available, we will support it.

 ## Author

 by [Claudio Santini](https://claudio.uk) in 2025, distributed under MIT licence.

 Related article: [Convert E-books into audiobooks with Kokoro](https://claudio.uk/posts/epub-to-audiobook.html)
237  audiblez.py
@@ -2,63 +2,59 @@
 # audiblez - A program to convert e-books into audiobooks using
 # Kokoro-82M model for high-quality text-to-speech synthesis.
 # by Claudio Santini 2025 - https://claudio.uk

+import torch
+import spacy
+import ebooklib
+import soundfile
+import numpy as np
 import argparse
 import sys
 import time
 import shutil
 import subprocess
-import soundfile as sf
-import ebooklib
 import warnings
 import re
+from tabulate import tabulate
 from pathlib import Path
 from string import Formatter
+from yaspin import yaspin
 from bs4 import BeautifulSoup
-from kokoro_onnx import config
-from kokoro_onnx import Kokoro
+from kokoro import KPipeline
 from ebooklib import epub
 from pydub import AudioSegment
 from pick import pick
-import onnxruntime as ort
 from tempfile import NamedTemporaryFile

-MODEL_FILE = 'kokoro-v0_19.onnx'
-VOICES_FILE = 'voices.json'
-config.MAX_PHONEME_LENGTH = 128
+from voices import voices, available_voices_str
+
+sample_rate = 24000


-def main(kokoro, file_path, lang, voice, pick_manually, speed, providers):
-    # Set ONNX providers if specified
-    if providers:
-        available_providers = ort.get_available_providers()
-        invalid_providers = [p for p in providers if p not in available_providers]
-        if invalid_providers:
-            print(f"Invalid ONNX providers: {', '.join(invalid_providers)}")
-            print(f"Available providers: {', '.join(available_providers)}")
-            sys.exit(1)
-        kokoro.sess.set_providers(providers)
-        print(f"Using ONNX providers: {', '.join(providers)}")
+def main(file_path, voice, pick_manually, speed, max_chapters=None):
+    if not spacy.util.is_package("xx_ent_wiki_sm"):
+        print("Downloading Spacy model xx_ent_wiki_sm...")
+        spacy.cli.download("xx_ent_wiki_sm")
     filename = Path(file_path).name
     warnings.simplefilter("ignore")
     book = epub.read_epub(file_path)
-    title = book.get_metadata('DC', 'title')[0][0]
-    creator = book.get_metadata('DC', 'creator')[0][0]
+    meta_title = book.get_metadata('DC', 'title')
+    title = meta_title[0][0] if meta_title else ''
+    meta_creator = book.get_metadata('DC', 'creator')
+    creator = meta_creator[0][0] if meta_creator else ''

     cover_maybe = find_cover(book)
     cover_image = cover_maybe.get_content() if cover_maybe else b""
     if cover_maybe:
         print(f'Found cover image {cover_maybe.file_name} in {cover_maybe.media_type} format')

-    intro = f'{title} by {creator}'
+    intro = f'{title} – {creator}.\n\n'
     print(intro)
-    print('Found Chapters:', [c.get_name() for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT])
-    if pick_manually:
-        chapters = pick_chapters(book)
+    document_chapters = find_document_chapters_and_extract_texts(book)
+    if pick_manually is True:
+        selected_chapters = pick_chapters(document_chapters)
     else:
-        chapters = find_chapters(book)
-        print('Automatically selected chapters:', [c.get_name() for c in chapters])
-    texts = extract_texts(chapters)
+        selected_chapters = find_good_chapters(document_chapters)
+    print_selected_chapters(document_chapters, selected_chapters)
+    texts = [c.extracted_text for c in selected_chapters]

     has_ffmpeg = shutil.which('ffmpeg') is not None
     if not has_ffmpeg:
@@ -68,42 +64,52 @@ def main(kokoro, file_path, lang, voice, pick_manually, speed, providers):
     print('Started at:', time.strftime('%H:%M:%S'))
     print(f'Total characters: {total_chars:,}')
     print('Total words:', len(' '.join(texts).split()))
-    chars_per_sec = 50  # assume 50 chars per second at the beginning
-    print(f'Estimated time remaining (assuming 50 chars/sec): {strfdelta((total_chars - processed_chars) / chars_per_sec)}')
+    chars_per_sec = 500 if torch.cuda.is_available() else 50
+    print(f'Estimated time remaining (assuming {chars_per_sec} chars/sec): {strfdelta((total_chars - processed_chars) / chars_per_sec)}')

-    chapter_mp3_files = []
-    durations = {}
-    for i, text in enumerate(texts, start=1):
-        chapter_filename = filename.replace('.epub', f'_chapter_{i}.wav')
-        chapter_mp3_files.append(chapter_filename)
+    chapter_wav_files = []
+    for i, chapter in enumerate(selected_chapters, start=1):
+        if max_chapters and i > max_chapters: break
+        text = chapter.extracted_text
+        xhtml_file_name = chapter.get_name().replace(' ', '_').replace('/', '_').replace('\\', '_')
+        chapter_filename = filename.replace('.epub', f'_chapter_{i}_{voice}_{xhtml_file_name}.wav')
+        chapter_wav_files.append(chapter_filename)
         if Path(chapter_filename).exists():
             print(f'File for chapter {i} already exists. Skipping')
             continue
         if len(text.strip()) < 10:
             print(f'Skipping empty chapter {i}')
-            chapter_mp3_files.remove(chapter_filename)
+            chapter_wav_files.remove(chapter_filename)
             continue
-        print(f'Reading chapter {i} ({len(text):,} characters)...')
         if i == 1:
             text = intro + '.\n\n' + text
         start_time = time.time()
-        samples, sample_rate = kokoro.create(text, voice=voice, speed=speed, lang=lang)
-        sf.write(f'{chapter_filename}', samples, sample_rate)
-        durations[chapter_filename] = len(samples) / sample_rate
-        end_time = time.time()
-        delta_seconds = end_time - start_time
-        chars_per_sec = len(text) / delta_seconds
-        processed_chars += len(text)
-        print(f'Estimated time remaining: {strfdelta((total_chars - processed_chars) / chars_per_sec)}')
-        print('Chapter written to', chapter_filename)
-        print(f'Chapter {i} read in {delta_seconds:.2f} seconds ({chars_per_sec:.0f} characters per second)')
-        progress = processed_chars * 100 // total_chars
-        print('Progress:', f'{progress}%\n')
+        pipeline = KPipeline(lang_code=voice[0])  # a for american or b for british etc.
+
+        with yaspin(text=f'Reading chapter {i} ({len(text):,} characters)...', color="yellow") as spinner:
+            audio_segments = gen_audio_segments(pipeline, text, voice, speed)
+            if audio_segments:
+                final_audio = np.concatenate(audio_segments)
+                soundfile.write(chapter_filename, final_audio, sample_rate)
+                end_time = time.time()
+                delta_seconds = end_time - start_time
+                chars_per_sec = len(text) / delta_seconds
+                processed_chars += len(text)
+                spinner.ok("✅")
+                print(f'Estimated time remaining: {strfdelta((total_chars - processed_chars) / chars_per_sec)}')
+                print('Chapter written to', chapter_filename)
+                print(f'Chapter {i} read in {delta_seconds:.2f} seconds ({chars_per_sec:.0f} characters per second)')
+                progress = processed_chars * 100 // total_chars
+                print('Progress:', f'{progress}%\n')
+            else:
+                spinner.fail("❌")
+                print(f'Warning: No audio generated for chapter {i}')
+                chapter_wav_files.remove(chapter_filename)

     if has_ffmpeg:
-        create_index_file(title, creator, chapter_mp3_files, durations)
-        create_m4b(chapter_mp3_files, filename, title, creator, cover_image)
+        create_index_file(title, creator, chapter_wav_files)
+        create_m4b(chapter_wav_files, filename, cover_image)


 def find_cover(book):
     def is_image(item):
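In the loop above, each chapter's output name is derived from the epub filename, the chapter index, the voice, and a sanitized version of the chapter's internal xhtml path. A tiny standalone sketch of just that naming step (the helper name and sample values here are invented for illustration, not part of the repo):

```python
def chapter_wav_name(epub_name, i, voice, xhtml_name):
    # Replace path separators and spaces so the chapter's internal
    # path can be embedded in a flat filename next to the epub.
    safe = xhtml_name.replace(' ', '_').replace('/', '_').replace('\\', '_')
    return epub_name.replace('.epub', f'_chapter_{i}_{voice}_{safe}.wav')

print(chapter_wav_name('book.epub', 1, 'af_sky', 'Text/12_Chapter01.xhtml'))
# book_chapter_1_af_sky_Text_12_Chapter01.xhtml.wav
```

Embedding the voice in the name means re-running with a different voice does not hit the "already exists, skipping" branch.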
@@ -127,9 +133,32 @@

     return None


-def extract_texts(chapters):
-    texts = []
-    for chapter in chapters:
+def print_selected_chapters(document_chapters, chapters):
+    print(tabulate([
+        [i, c.get_name(), len(c.extracted_text), '✅' if c in chapters else '', chapter_beginning_one_liner(c)]
+        for i, c in enumerate(document_chapters, start=1)
+    ], headers=['#', 'Chapter', 'Text Length', 'Selected', 'First words']))
+
+
+def gen_audio_segments(pipeline, text, voice, speed):
+    nlp = spacy.load('xx_ent_wiki_sm')
+    nlp.add_pipe('sentencizer')
+    audio_segments = []
+    doc = nlp(text)
+    sentences = list(doc.sents)
+    for sent in sentences:
+        for gs, ps, audio in pipeline(sent.text, voice=voice, speed=speed, split_pattern=r'\n\n\n'):
+            audio_segments.append(audio)
+    return audio_segments
+
+
+def find_document_chapters_and_extract_texts(book):
+    """Returns every chapter that is an ITEM_DOCUMENT and enriches each chapter with extracted_text."""
+    document_chapters = []
+    for chapter in book.get_items():
+        if chapter.get_type() != ebooklib.ITEM_DOCUMENT:
+            continue
         xml = chapter.get_body_content()
         soup = BeautifulSoup(xml, features='lxml')
         chapter_text = ''
@@ -138,39 +167,46 @@
             inner_text = child.text.strip() if child.text else ""
             if inner_text:
                 chapter_text += inner_text + '\n'
-        texts.append(chapter_text)
-    return texts
+        chapter.extracted_text = chapter_text
+        document_chapters.append(chapter)
+    return document_chapters


 def is_chapter(c):
     name = c.get_name().lower()
-    return bool(
-        'chapter' in name.lower()
-        or re.search(r'part\d{1,3}', name)
-        or re.search(r'ch\d{1,3}', name)
-        or re.search(r'chap\d{1,3}', name)
-    )
+    has_min_len = len(c.extracted_text) > 100
+    title_looks_like_chapter = bool(
+        'chapter' in name.lower()
+        or re.search(r'part_?\d{1,3}', name)
+        or re.search(r'split_?\d{1,3}', name)
+        or re.search(r'ch_?\d{1,3}', name)
+        or re.search(r'chap_?\d{1,3}', name)
+    )
+    return has_min_len and title_looks_like_chapter


-def find_chapters(book, verbose=False):
-    chapters = [c for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT and is_chapter(c)]
-    if verbose:
-        for item in book.get_items():
-            if item.get_type() == ebooklib.ITEM_DOCUMENT:
-                print(f"'{item.get_name()}'" + ', #' + str(len(item.get_body_content())))
-                # print(f'{item.get_name()}'.ljust(60), str(len(item.get_body_content())).ljust(15), 'X' if item in chapters else '-')
+def chapter_beginning_one_liner(c, chars=20):
+    s = c.extracted_text[:chars].strip().replace('\n', ' ').replace('\r', ' ')
+    return s + '…' if len(s) > 0 else ''
+
+
+def find_good_chapters(document_chapters):
+    chapters = [c for c in document_chapters if c.get_type() == ebooklib.ITEM_DOCUMENT and is_chapter(c)]
     if len(chapters) == 0:
-        print('Not easy to find the chapters, defaulting to all available documents.')
-        chapters = [c for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT]
+        print('Not easy to recognize the chapters, defaulting to all non-empty documents.')
+        chapters = [c for c in document_chapters if c.get_type() == ebooklib.ITEM_DOCUMENT and len(c.extracted_text) > 10]
     return chapters


-def pick_chapters(book):
-    all_chapters_names = [c.get_name() for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT]
+def pick_chapters(chapters):
+    # Display the document name, the length and first 50 characters of the text
+    chapters_by_names = {
+        f'{c.get_name()}\t({len(c.extracted_text)} chars)\t[{chapter_beginning_one_liner(c, 50)}]': c
+        for c in chapters}
     title = 'Select which chapters to read in the audiobook'
-    selected_chapters_names = pick(all_chapters_names, title, multiselect=True, min_selection_count=1)
-    selected_chapters_names = [c[0] for c in selected_chapters_names]
-    selected_chapters = [c for c in book.get_items() if c.get_name() in selected_chapters_names]
+    ret = pick(list(chapters_by_names.keys()), title, multiselect=True, min_selection_count=1)
+    selected_chapters_out_of_order = [chapters_by_names[r[0]] for r in ret]
+    selected_chapters = [c for c in chapters if c in selected_chapters_out_of_order]
    return selected_chapters
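The new chapter-detection heuristic combines a minimum extracted-text length with chapter-ish filename patterns. A rough standalone sketch of the same idea (the function name and threshold mirror the diff, but this is illustrative, not the packaged code):

```python
import re

def looks_like_chapter(name: str, text_len: int) -> bool:
    # Mirrors the heuristic in the diff: enough extracted text, plus a
    # filename like ch01, chap_2, part3 or split_004, or containing "chapter".
    name = name.lower()
    title_ok = bool(
        'chapter' in name
        or re.search(r'part_?\d{1,3}', name)
        or re.search(r'split_?\d{1,3}', name)
        or re.search(r'ch_?\d{1,3}', name)
        or re.search(r'chap_?\d{1,3}', name)
    )
    return text_len > 100 and title_ok

print(looks_like_chapter('Text/12_Chapter01.xhtml', 33940))  # True
print(looks_like_chapter('Text/03_Dedi.xhtml', 211))         # False
```

When nothing matches, the code above falls back to every non-empty document, so a badly named epub still produces an audiobook.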
@@ -187,7 +223,7 @@ def strfdelta(tdelta, fmt='{D:02}d {H:02}h {M:02}m {S:02}s'):
     return f.format(fmt, **values)


-def create_m4b(chapter_files, filename, title, author, cover_image):
+def create_m4b(chapter_files, filename, cover_image):
     tmp_filename = filename.replace('.epub', '.tmp.mp4')
     if not Path(tmp_filename).exists():
         combined_audio = AudioSegment.empty()
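The `strfdelta` helper visible in this hunk's context formats a duration with `string.Formatter` fields like `{D:02}d {H:02}h {M:02}m {S:02}s`. A minimal self-contained sketch of that kind of helper, taking a seconds count as the estimates in `main` do (an assumed-equivalent reimplementation, not copied from the repo):

```python
from string import Formatter

def strfdelta_sketch(seconds: float, fmt='{D:02}d {H:02}h {M:02}m {S:02}s') -> str:
    # Split a duration in seconds into days/hours/minutes/seconds and
    # substitute them into the format string's D/H/M/S fields.
    remainder = int(seconds)
    values = {}
    for unit, size in (('D', 86400), ('H', 3600), ('M', 60), ('S', 1)):
        values[unit], remainder = divmod(remainder, size)
    return Formatter().format(fmt, **values)

print(strfdelta_sketch(90061))  # 01d 01h 01m 01s
```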
@@ -232,51 +268,46 @@ def probe_duration(file_name):
     return float(proc.stdout.strip())


-def create_index_file(title, creator, chapter_mp3_files, durations):
+def create_index_file(title, creator, chapter_mp3_files):
     with open("chapters.txt", "w") as f:
         f.write(f";FFMETADATA1\ntitle={title}\nartist={creator}\n\n")
         start = 0
         i = 0
         for c in chapter_mp3_files:
-            if c not in durations:
-                durations[c] = probe_duration(c)
-            end = start + (int)(durations[c] * 1000)
+            duration = probe_duration(c)
+            end = start + (int)(duration * 1000)
             f.write(f"[CHAPTER]\nTIMEBASE=1/1000\nSTART={start}\nEND={end}\ntitle=Chapter {i}\n\n")
             i += 1
             start = end


 def cli_main():
-    if not Path(MODEL_FILE).exists() or not Path(VOICES_FILE).exists():
-        print('Error: kokoro-v0_19.onnx and voices.json must be in the current directory. Please download them with:')
-        print('wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx')
-        print('wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json')
-        sys.exit(1)
-    kokoro = Kokoro(MODEL_FILE, VOICES_FILE)
-    voices = list(kokoro.get_voices())
-    voices_str = ', '.join(voices)
-    epilog = 'example:\n' + \
-             '  audiblez book.epub -l en-us -v af_sky'
-    default_voice = 'af_sky' if 'af_sky' in voices else voices[0]
-
-    # Get available ONNX providers
-    available_providers = ort.get_available_providers()
-    providers_help = f"Available ONNX providers: {', '.join(available_providers)}"
-
+    epilog = ('example:\n' +
+              '  audiblez book.epub -l en-us -v af_sky\n\n' +
+              'available voices:\n' +
+              available_voices_str)
+    default_voice = 'af_sky'
     parser = argparse.ArgumentParser(epilog=epilog, formatter_class=argparse.RawDescriptionHelpFormatter)
     parser.add_argument('epub_file_path', help='Path to the epub file')
     parser.add_argument('-l', '--lang', default='en-gb', help='Language code: en-gb, en-us, fr-fr, ja, ko, cmn')
     parser.add_argument('-v', '--voice', default=default_voice, help=f'Choose narrating voice: {voices_str}')
-    parser.add_argument('-p', '--pick', default=False, help=f'Interactively select which chapters to read in the audiobook',
-                        action='store_true')
+    parser.add_argument('-p', '--pick', default=False, help=f'Interactively select which chapters to read in the audiobook', action='store_true')
     parser.add_argument('-s', '--speed', default=1.0, help=f'Set speed from 0.5 to 2.0', type=float)
-    parser.add_argument('--providers', nargs='+', metavar='PROVIDER', help=f"Specify ONNX providers. {providers_help}")
+    parser.add_argument('-c', '--cuda', default=False, help=f'Use GPU via Cuda in Torch if available', action='store_true')

     if len(sys.argv) == 1:
         parser.print_help(sys.stderr)
         sys.exit(1)
     args = parser.parse_args()
-    main(kokoro, args.epub_file_path, args.lang, args.voice, args.pick, args.speed, args.providers)
+
+    if args.cuda:
+        if torch.cuda.is_available():
+            print('CUDA GPU available')
+            torch.set_default_device('cuda')
+        else:
+            print('CUDA GPU not available. Defaulting to CPU')
+
+    main(args.epub_file_path, args.voice, args.pick, args.speed)


 if __name__ == '__main__':
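`create_index_file` writes an ffmpeg `FFMETADATA1` file with one `[CHAPTER]` block per wav, using millisecond timestamps, which `create_m4b` then feeds to ffmpeg. A small sketch of that file format with made-up durations (the `probe_duration` ffprobe call is replaced by a plain list, and the builder function name is invented):

```python
def build_ffmetadata(title, artist, durations_seconds):
    # Global metadata header, then one [CHAPTER] block per chapter.
    # START/END are milliseconds because TIMEBASE is 1/1000.
    lines = [f";FFMETADATA1\ntitle={title}\nartist={artist}\n"]
    start = 0
    for i, duration in enumerate(durations_seconds):
        end = start + int(duration * 1000)
        lines.append(f"[CHAPTER]\nTIMEBASE=1/1000\nSTART={start}\nEND={end}\ntitle=Chapter {i}\n")
        start = end
    return '\n'.join(lines)

print(build_ffmetadata('Animal Farm', 'George Orwell', [12.5, 30.0]))
```

Each chapter starts where the previous one ended, so the blocks tile the whole audiobook timeline without gaps.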
2448  poetry.lock  (generated)
File diff suppressed because it is too large
pyproject.toml

@@ -1,81 +1,21 @@
 [project]
 name = "audiblez"
-version = "0.2.2"
+version = "0.3.1"
 description = "Generate audiobooks from e-books (epub to wav/m4b)"
 authors = [
     { name = "Claudio Santini", email = "hireclaudio@gmail.com" }
 ]
 readme = "README.md"
-requires-python = ">=3.9,<3.13" # librosa/llvmlite have no support for python > 3.12
+requires-python = ">=3.9,<3.13"
 dependencies = [
-    "bs4 (==0.0.2)",
-    "attrs (==24.3.0)",
-    "audioread (==3.0.1)",
-    "babel (==2.16.0)",
-    "beautifulsoup4 (==4.12.3)",
-    "bibtexparser (==2.0.0b8)",
-    "certifi (==2024.12.14)",
-    "cffi (==1.17.1)",
-    "charset-normalizer (==3.4.1)",
-    "clldutils (==3.24.0)",
-    "colorama (==0.4.6)",
-    "coloredlogs (==15.0.1)",
-    "colorlog (==6.9.0)",
-    "csvw (==3.5.1)",
-    "decorator (==5.1.1)",
-    "dlinfo (==1.2.1)",
-    "ebooklib (==0.18)",
-    "espeakng-loader (==0.2.1)",
-    "flatbuffers (==24.12.23)",
-    "humanfriendly (==10.0)",
-    "idna (==3.10)",
-    "isodate (==0.7.2)",
-    "joblib (==1.4.2)",
-    "jsonschema (==4.23.0)",
-    "jsonschema-specifications (==2024.10.1)",
-    "kokoro-onnx (==0.2.6)",
-    "language-tags (==1.2.0)",
-    "lazy-loader (==0.4)",
-    "librosa (==0.10.2.post1)",
-    "llvmlite (==0.43.0)",
-    "lxml (==5.3.0)",
-    "markdown (==3.7)",
-    "markupsafe (==3.0.2)",
-    "mpmath (==1.3.0)",
-    "msgpack (==1.1.0)",
-    "numba (==0.60.0)",
-    "numpy (==2.0.2)",
-    "onnxruntime (==1.20.1)",
-    "packaging (==24.2)",
-    "phonemizer-fork (==3.3.1)",
-    "platformdirs (==4.3.6)",
-    "pooch (==1.8.2)",
-    "protobuf (==5.29.3)",
-    "pycparser (==2.22)",
-    "pylatexenc (==2.10)",
-    "pyparsing (==3.2.1)",
-    "python-dateutil (==2.9.0.post0)",
-    "rdflib (==7.1.2)",
-    "referencing (==0.35.1)",
-    "regex (==2024.11.6)",
-    "requests (==2.32.3)",
-    "rfc3986 (==1.5.0)",
-    "rpds-py (==0.22.3)",
-    "scikit-learn (==1.6.1)",
-    "scipy (==1.15.1)",
-    "segments (==2.2.1)",
-    "six (==1.17.0)",
-    "soundfile (==0.13.0)",
-    "soupsieve (==2.6)",
-    "soxr (==0.5.0.post1)",
-    "sympy (==1.13.3)",
-    "tabulate (==0.9.0)",
-    "threadpoolctl (==3.5.0)",
-    "typing-extensions (==4.12.2)",
-    "uritemplate (==4.1.1)",
-    "urllib3 (==2.3.0)",
-    "pydub (>=0.25.1,<0.26.0)",
+    "kokoro (>=0.2.3,<0.3.0)",
+    "ebooklib (>=0.18,<0.19)",
+    "soundfile (>=0.13.1,<0.14.0)",
+    "pick (>=2.4.0,<3.0.0)",
+    "bs4 (>=0.0.2,<0.0.3)",
+    "pydub (>=0.25.1,<0.26.0)",
+    "spacy (>=3.8.4,<4.0.0)",
+    "yaspin (>=3.1.0,<4.0.0)"
 ]

@@ -91,3 +31,6 @@ Issues = "https://github.com/santinic/audiblez/issues"

 [project.scripts]
 audiblez = "audiblez:cli_main"
+
+[tool.poetry.group.dev.dependencies]
+deptry = "^0.23.0"
@@ -1,13 +1,15 @@
 import unittest
 from ebooklib import epub

-from audiblez import find_chapters
+from audiblez import find_good_chapters, find_document_chapters_and_extract_texts


+@unittest.skip('Development only, not for CI')
 class FindChaptersTest(unittest.TestCase):
     def base(self, file, expected_chapter_names):
         book = epub.read_epub(file)
-        chapters = find_chapters(book)
+        document_chapters = find_document_chapters_and_extract_texts(book)
+        chapters = find_good_chapters(document_chapters)
         chapter_names = [c.get_name() for c in chapters]
         self.assertEqual(chapter_names, expected_chapter_names)

@@ -102,32 +104,32 @@ class FindChaptersTest(unittest.TestCase):
             # 'Text/03_Dedi.xhtml', # 211
             # 'Text/04_Contents.xhtml', # 6302
             # 'Text/06_Intro.xhtml', # 34726
-            'Text/11_Part1.xhtml', # 332
+            # 'Text/11_Part1.xhtml', # 332
             'Text/12_Chapter01.xhtml', # 33940
             'Text/12_Chapter02.xhtml', # 47738
-            'Text/12_Chapter02_Part2.xhtml', # 328
+            # 'Text/12_Chapter02_Part2.xhtml', # 328
             'Text/12_Chapter03.xhtml', # 42010
             'Text/12_Chapter04.xhtml', # 42182
             'Text/12_Chapter05.xhtml', # 51283
-            'Text/12_Chapter05_Part3.xhtml', # 327
+            # 'Text/12_Chapter05_Part3.xhtml', # 327
             'Text/12_Chapter06.xhtml', # 46570
             'Text/12_Chapter07.xhtml', # 48752
             'Text/12_Chapter08.xhtml', # 52048
             'Text/12_Chapter09.xhtml', # 35717
-            'Text/12_Chapter09_Part4.xhtml', # 343
+            # 'Text/12_Chapter09_Part4.xhtml', # 343
             'Text/12_Chapter10.xhtml', # 43114
             'Text/12_Chapter11.xhtml', # 53469
             'Text/12_Chapter12.xhtml', # 29441
             'Text/12_Chapter13.xhtml', # 34265
-            'Text/12_Chapter13_Part5.xhtml', # 327
+            # 'Text/12_Chapter13_Part5.xhtml', # 327
             'Text/12_Chapter14.xhtml', # 42381
             'Text/12_Chapter15.xhtml', # 43875
             'Text/12_Chapter16.xhtml', # 28861
-            'Text/12_Chapter16_Part6.xhtml', # 328
+            # 'Text/12_Chapter16_Part6.xhtml', # 328
             'Text/12_Chapter17.xhtml', # 45652
             'Text/12_Chapter18.xhtml', # 43328
             'Text/12_Chapter19.xhtml', # 34453
-            'Text/12_Chapter19_Part7.xhtml', # 334
+            # 'Text/12_Chapter19_Part7.xhtml', # 334
             'Text/12_Chapter20.xhtml', # 40554
             'Text/12_Chapter21.xhtml', # 30382
             'Text/12_Chapter22.xhtml', # 57467
@@ -1,28 +1,44 @@
 import os
 import unittest
 from pathlib import Path
-from kokoro_onnx import Kokoro
-
-from audiblez import VOICES_FILE, MODEL_FILE, main
+from kokoro import KPipeline
+
+from audiblez import main


 class MainTest(unittest.TestCase):
-    def base(self, **kwargs):
-        base_path = Path(__file__).parent / '..'
-        kokoro = Kokoro(base_path / MODEL_FILE, base_path / VOICES_FILE)
-        main(kokoro, lang='en-gb', voice='af_sky', providers=None, pick_manually=False, speed=1, **kwargs)
+    def base(self, name, url, **kwargs):
+        if not Path(f'{name}.epub').exists():
+            os.system(f'wget {url} -O {name}.epub')
+        Path(f'{name}.m4b').unlink(missing_ok=True)
+        os.system(f'rm {name}_chapter_*.wav')
+        merged_args = dict(voice='af_sky', pick_manually=False, speed=1.0, max_chapters=2)
+        merged_args.update(kwargs)
+        main(f'{name}.epub', **merged_args)
+        m4b_file = Path(f'{name}.m4b')
+        self.assertTrue(m4b_file.exists())
+        self.assertTrue(m4b_file.stat().st_size > 256 * 1024)

-    def test_1_mini(self):
-        Path('mini.m4b').unlink(missing_ok=True)
-        self.base(file_path='../epub/mini.epub')
-        self.assertTrue(Path('mini.m4b').exists())
+    def test_poe(self):
+        url = 'https://www.gutenberg.org/ebooks/1064.epub.images'
+        self.base('poe', url)

-    def test_2_allan_poe(self):
-        Path('poe.m4b').unlink(missing_ok=True)
-        self.base(file_path='../epub/poe.epub')
-        self.assertTrue(Path('poe.m4b').exists())
+    @unittest.skip('too slow for CI')
+    def test_orwell(self):
+        url = 'https://archive.org/download/AnimalFarmByGeorgeOrwell/Animal%20Farm%20by%20George%20Orwell.epub'
+        self.base('orwell', url)

-    def test_3_gene(self):
-        Path('gene.m4b').unlink(missing_ok=True)
-        self.base(file_path='../epub/gene.epub')
-        self.assertTrue(Path('gene.m4b').exists())
+    def test_italian_pirandello(self):
+        url = 'https://www.liberliber.eu/mediateca/libri/p/pirandello/cosi_e_se_vi_pare_1925/epub/pirandello_cosi_e_se_vi_pare_1925.epub'
+        self.base('pirandello', url, voice='im_nicola')
+        self.assertTrue(Path('pirandello.m4b').exists())
+
+    @unittest.skip('too slow for CI')
+    def test_italian_manzoni(self):
+        url = 'https://www.liberliber.eu/mediateca/libri/m/manzoni/i_promessi_sposi/epub/manzoni_i_promessi_sposi.epub'
+        self.base('manzoni', url, voice='im_nicola', max_chapters=1)
+
+    def test_french_baudelaire(self):
+        url = 'http://gallica.bnf.fr/ark:/12148/bpt6k70861t.epub'
+        self.base('baudelaire', url, voice='ff_siwis')
67  voices.py  (new file)
@@ -0,0 +1,67 @@
+flags = {
+    'a': '🇺🇸',
+    'b': '🇬🇧',
+    'e': '🇪🇸',
+    'f': '🇫🇷',
+    'h': '🇮🇳',
+    'i': '🇮🇹',
+    'j': '🇯🇵',
+    'p': '🇧🇷',
+    'z': '🇨🇳'
+}
+
+voices = {
+    'a': [
+        'af_alloy',
+        'af_aoede',
+        'af_bella',
+        'af_heart',
+        'af_jessica',
+        'af_kore',
+        'af_nicole',
+        'af_nova',
+        'af_river',
+        'af_sarah',
+        'af_sky',
+        'am_adam',
+        'am_echo',
+        'am_eric',
+        'am_fenrir',
+        'am_liam',
+        'am_michael',
+        'am_onyx',
+        'am_puck',
+        'am_santa'],
+    'b': [
+        'bf_alice',
+        'bf_emma',
+        'bf_isabella',
+        'bf_lily',
+        'bm_daniel',
+        'bm_fable',
+        'bm_george',
+        'bm_lewis'],
+    'e': ['ef_dora', 'em_alex', 'em_santa'],
+    'f': ['ff_siwis'],
+    'h': ['hf_alpha', 'hf_beta', 'hm_omega', 'hm_psi'],
+    'i': ['if_sara', 'im_nicola'],
+    'j': ['jf_alpha', 'jf_gongitsune', 'jf_nezumi', 'jf_tebukuro', 'jm_kumo'],
+    'p': ['pf_dora', 'pm_alex', 'pm_santa'],
+    'z': [
+        'zf_xiaobei',
+        'zf_xiaoni',
+        'zf_xiaoxiao',
+        'zf_xiaoyi',
+        'zm_yunjian',
+        'zm_yunxi',
+        'zm_yunxia',
+        'zm_yunyang'
+    ]
+}
+
+available_voices_str = ('\n'.join([f'  {flags[lang]} {", ".join(voices[lang])}' for lang in voices])
+                        .replace(' af_sky,', '\n af_sky,'))
+
+# for key, l in voices.items():
+#     ls = ', '.join([f'`{j}`' for j in l])
+#     print(f'| {flags[key]} | {ls} |')
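The `available_voices_str` expression above builds one flag-prefixed line per language for the CLI epilog. A tiny sketch of the same pattern on a reduced dict (this subset is chosen here for brevity, it is not the full table):

```python
flags = {'a': '🇺🇸', 'b': '🇬🇧'}
voices = {'a': ['af_sky', 'am_adam'], 'b': ['bf_emma', 'bm_george']}

# One line per language: flag followed by its comma-separated voices.
available = '\n'.join(f'  {flags[lang]} {", ".join(voices[lang])}' for lang in voices)
print(available)
```

Because both dicts preserve insertion order, the listing comes out in the same language order as the source file.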