Merge pull request #40 from santinic/v3

v3 with Torch and Cuda, more languages, bug fixes and more
This commit is contained in:
Claudio Santini 2025-02-01 13:01:58 +01:00 committed by GitHub
commit 53a9f5d04b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
9 changed files with 2176 additions and 846 deletions

26
.github/workflows/git-clone-and-run.yml vendored Normal file

@@ -0,0 +1,26 @@
name: Git clone and run
run-name: Git clone and run
on: [ push, pull_request ]
jobs:
git-clone-and-run:
runs-on: ubuntu-latest
steps:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- uses: actions/checkout@v3
- name: Install dependencies
run: pip install poetry && poetry install
- name: check it runs as script
run: poetry run audiblez --help
- name: install ffmpeg and espeak-ng
run: sudo apt-get update && sudo apt-get install ffmpeg espeak-ng --fix-missing
# - name: download test epub
# run: wget https://github.com/daisy/epub-accessibility-tests/releases/download/fundamental-2.0/Fundamental-Accessibility-Tests-Basic-Functionality-v2.0.0.epub
# - name: create audiobook
# run: poetry run audiblez Fundamental-Accessibility-Tests-Basic-Functionality-v2.0.0.epub
# - name: check m4b output file
# run: ls -lah Fundamental-Accessibility-Tests-Basic-Functionality-v2.0.0.m4b
- name: run unit-tests (unittest classes in /test)
run: poetry run python -m unittest discover test


@@ -19,8 +19,8 @@ jobs:
run: python -m audiblez --help
- name: check it runs as script
run: audiblez --help
- name: install ffmpeg and espeak-ng
run: sudo apt update && sudo apt-get install ffmpeg espeak-ng
- name: download test epub
run: wget https://github.com/daisy/epub-accessibility-tests/releases/download/fundamental-2.0/Fundamental-Accessibility-Tests-Basic-Functionality-v2.0.0.epub
- name: create audiobook


@@ -1,34 +1,41 @@
# Audiblez: Generate audiobooks from e-books
[![Installing via pip and running](https://github.com/santinic/audiblez/actions/workflows/pip-install.yaml/badge.svg)](https://github.com/santinic/audiblez/actions/workflows/pip-install.yaml)
[![Git clone and run](https://github.com/santinic/audiblez/actions/workflows/git-clone-and-run.yml/badge.svg)](https://github.com/santinic/audiblez/actions/workflows/git-clone-and-run.yml)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/audiblez)
![PyPI - Version](https://img.shields.io/pypi/v/audiblez)
### v3.0 Now with CUDA support!
Audiblez generates `.m4b` audiobooks from regular `.epub` e-books,
using Kokoro's high-quality speech synthesis.
[Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) is a recently published text-to-speech model with just 82M params and very natural sounding output.
It's released under the Apache licence and was trained on fewer than 100 hours of audio.
It currently supports American English, British English, French, Korean, Japanese and Mandarin, with a bunch of very good voices.
On a Google Colab T4 GPU via CUDA, **it takes about 5 minutes to convert "Animal Farm" by Orwell** (which is about 160,000 characters) to an audiobook, at a rate of about 600 characters per second.
On my M2 MacBook Pro, on CPU, it takes about 1 hour, at a rate of about 60 characters per second.
## How to install and run
If you have Python 3 on your computer, you can install it with pip.
Be aware that it won't work with Python 3.13.
You also need `espeak-ng` and `ffmpeg` installed on your machine:
```bash
pip install audiblez
sudo apt install ffmpeg espeak-ng # on Ubuntu/Debian 🐧
brew install ffmpeg espeak-ng # on Mac 🍏
```
Then, to convert an epub file into an audiobook, just run:
```bash
audiblez book.epub -v af_sky
```
It will first create a bunch of `book_chapter_1.wav`, `book_chapter_2.wav`, etc. files in the same directory,
@@ -36,57 +43,41 @@ and at the end it will produce a `book.m4b` file with the whole book you can listen to in any
audiobook player.
It will only produce the `.m4b` file if you have `ffmpeg` installed on your machine.
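Under the hood that check is just a PATH lookup (the source does the same with `shutil.which`); a minimal sketch of the behaviour:

```python
import shutil

# audiblez only attempts the final .m4b merge when ffmpeg is on the PATH;
# otherwise you are left with the per-chapter .wav files.
has_ffmpeg = shutil.which('ffmpeg') is not None
if not has_ffmpeg:
    print('ffmpeg not found: only per-chapter .wav files will be produced')
```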
## Speed
By default the audio is generated at normal speed, but you can make it up to twice as slow or twice as fast by specifying a speed argument between 0.5 and 2.0:
```bash
audiblez book.epub -v af_sky -s 1.5
```
## Supported Voices
Use the `-v` option to specify the voice; the available voices are listed below.
The first letter is the language code and the second is the gender of the speaker, e.g. `im_nicola` is an Italian male voice.
You can preview them here: [https://huggingface.co/spaces/hexgrad/Kokoro-TTS](https://huggingface.co/spaces/hexgrad/Kokoro-TTS)
| Language | Voices |
|----------|--------|
| 🇺🇸 | `af_alloy`, `af_aoede`, `af_bella`, `af_heart`, `af_jessica`, `af_kore`, `af_nicole`, `af_nova`, `af_river`, `af_sarah`, `af_sky`, `am_adam`, `am_echo`, `am_eric`, `am_fenrir`, `am_liam`, `am_michael`, `am_onyx`, `am_puck`, `am_santa` |
| 🇬🇧 | `bf_alice`, `bf_emma`, `bf_isabella`, `bf_lily`, `bm_daniel`, `bm_fable`, `bm_george`, `bm_lewis` |
| 🇪🇸 | `ef_dora`, `em_alex`, `em_santa` |
| 🇫🇷 | `ff_siwis` |
| 🇮🇳 | `hf_alpha`, `hf_beta`, `hm_omega`, `hm_psi` |
| 🇮🇹 | `if_sara`, `im_nicola` |
| 🇯🇵 | `jf_alpha`, `jf_gongitsune`, `jf_nezumi`, `jf_tebukuro`, `jm_kumo` |
| 🇧🇷 | `pf_dora`, `pm_alex`, `pm_santa` |
| 🇨🇳 | `zf_xiaobei`, `zf_xiaoni`, `zf_xiaoxiao`, `zf_xiaoyi`, `zm_yunjian`, `zm_yunxi`, `zm_yunxia`, `zm_yunyang` |
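The naming convention is mechanical enough to decode with a couple of lines. A hypothetical helper (not part of audiblez) that maps a voice code to a human-readable description; the language names are my reading of the flags in the table above:

```python
# Hypothetical helper, not part of audiblez.
# First letter = language, second letter = gender (f = female, m = male).
LANGS = {
    'a': 'American English', 'b': 'British English', 'e': 'Spanish',
    'f': 'French', 'h': 'Hindi', 'i': 'Italian', 'j': 'Japanese',
    'p': 'Brazilian Portuguese', 'z': 'Mandarin Chinese',
}

def describe_voice(code: str) -> str:
    lang = LANGS.get(code[0], 'unknown language')
    gender = {'f': 'female', 'm': 'male'}.get(code[1], 'unknown gender')
    return f'{lang}, {gender}'

print(describe_voice('im_nicola'))  # Italian, male
print(describe_voice('af_sky'))     # American English, female
```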
## How to run on GPU
By default audiblez runs on CPU. If you pass the `--cuda` option, it will try to use the CUDA device via Torch:
```bash
audiblez book.epub -v af_sky --cuda
```
Check out this example: [Audiblez running on a Google Colab Notebook with CUDA](https://colab.research.google.com/drive/164PQLowogprWQpRjKk33e-8IORAvqXKI?usp=sharing).
We don't currently support Apple Silicon, as there is not yet a Kokoro implementation in MLX. As soon as one is available, we will support it.
## Author
by [Claudio Santini](https://claudio.uk) in 2025, distributed under MIT licence.
Related article: [Convert E-books into audiobooks with Kokoro](https://claudio.uk/posts/epub-to-audiobook.html)


@@ -2,63 +2,59 @@
# audiblez - A program to convert e-books into audiobooks using
# Kokoro-82M model for high-quality text-to-speech synthesis.
# by Claudio Santini 2025 - https://claudio.uk
import torch
import spacy
import ebooklib
import soundfile
import numpy as np
import argparse
import sys
import time
import shutil
import subprocess
import warnings
import re
from tabulate import tabulate
from pathlib import Path
from string import Formatter
from yaspin import yaspin
from bs4 import BeautifulSoup
from kokoro import KPipeline
from ebooklib import epub
from pydub import AudioSegment
from pick import pick
from tempfile import NamedTemporaryFile
from voices import voices, available_voices_str
sample_rate = 24000
def main(file_path, voice, pick_manually, speed, max_chapters=None):
if not spacy.util.is_package("xx_ent_wiki_sm"):
print("Downloading Spacy model xx_ent_wiki_sm...")
spacy.cli.download("xx_ent_wiki_sm")
filename = Path(file_path).name
warnings.simplefilter("ignore")
book = epub.read_epub(file_path)
meta_title = book.get_metadata('DC', 'title')
title = meta_title[0][0] if meta_title else ''
meta_creator = book.get_metadata('DC', 'creator')
creator = meta_creator[0][0] if meta_creator else ''
cover_maybe = find_cover(book)
cover_image = cover_maybe.get_content() if cover_maybe else b""
if cover_maybe:
print(f'Found cover image {cover_maybe.file_name} in {cover_maybe.media_type} format')
intro = f'{title} {creator}.\n\n'
print(intro)
document_chapters = find_document_chapters_and_extract_texts(book)
if pick_manually is True:
selected_chapters = pick_chapters(document_chapters)
else:
selected_chapters = find_good_chapters(document_chapters)
print_selected_chapters(document_chapters, selected_chapters)
texts = [c.extracted_text for c in selected_chapters]
has_ffmpeg = shutil.which('ffmpeg') is not None
if not has_ffmpeg:
@@ -68,42 +64,52 @@ def main(kokoro, file_path, lang, voice, pick_manually, speed, providers):
print('Started at:', time.strftime('%H:%M:%S'))
print(f'Total characters: {total_chars:,}')
print('Total words:', len(' '.join(texts).split()))
chars_per_sec = 500 if torch.cuda.is_available() else 50
print(f'Estimated time remaining (assuming {chars_per_sec} chars/sec): {strfdelta((total_chars - processed_chars) / chars_per_sec)}')
chapter_wav_files = []
for i, chapter in enumerate(selected_chapters, start=1):
if max_chapters and i > max_chapters: break
text = chapter.extracted_text
xhtml_file_name = chapter.get_name().replace(' ', '_').replace('/', '_').replace('\\', '_')
chapter_filename = filename.replace('.epub', f'_chapter_{i}_{voice}_{xhtml_file_name}.wav')
chapter_wav_files.append(chapter_filename)
if Path(chapter_filename).exists():
print(f'File for chapter {i} already exists. Skipping')
continue
if len(text.strip()) < 10:
print(f'Skipping empty chapter {i}')
chapter_wav_files.remove(chapter_filename)
continue
if i == 1:
text = intro + '.\n\n' + text
start_time = time.time()
pipeline = KPipeline(lang_code=voice[0]) # a for american or b for british etc.
with yaspin(text=f'Reading chapter {i} ({len(text):,} characters)...', color="yellow") as spinner:
audio_segments = gen_audio_segments(pipeline, text, voice, speed)
if audio_segments:
final_audio = np.concatenate(audio_segments)
soundfile.write(chapter_filename, final_audio, sample_rate)
end_time = time.time()
delta_seconds = end_time - start_time
chars_per_sec = len(text) / delta_seconds
processed_chars += len(text)
spinner.ok("")
print(f'Estimated time remaining: {strfdelta((total_chars - processed_chars) / chars_per_sec)}')
print('Chapter written to', chapter_filename)
print(f'Chapter {i} read in {delta_seconds:.2f} seconds ({chars_per_sec:.0f} characters per second)')
progress = processed_chars * 100 // total_chars
print('Progress:', f'{progress}%\n')
else:
spinner.fail("")
print(f'Warning: No audio generated for chapter {i}')
chapter_wav_files.remove(chapter_filename)
if has_ffmpeg:
create_index_file(title, creator, chapter_wav_files)
create_m4b(chapter_wav_files, filename, cover_image)
def find_cover(book):
def is_image(item):
@@ -127,9 +133,32 @@ def find_cover(book):
return None
def print_selected_chapters(document_chapters, chapters):
print(tabulate([
[i, c.get_name(), len(c.extracted_text), '' if c in chapters else '', chapter_beginning_one_liner(c)]
for i, c in enumerate(document_chapters, start=1)
], headers=['#', 'Chapter', 'Text Length', 'Selected', 'First words']))
def gen_audio_segments(pipeline, text, voice, speed):
nlp = spacy.load('xx_ent_wiki_sm')
nlp.add_pipe('sentencizer')
audio_segments = []
doc = nlp(text)
sentences = list(doc.sents)
for sent in sentences:
for gs, ps, audio in pipeline(sent.text, voice=voice, speed=speed, split_pattern=r'\n\n\n'):
audio_segments.append(audio)
return audio_segments
def find_document_chapters_and_extract_texts(book):
"""Returns every chapter that is an ITEM_DOCUMENT and enriches each chapter with extracted_text."""
document_chapters = []
for chapter in book.get_items():
if chapter.get_type() != ebooklib.ITEM_DOCUMENT:
continue
xml = chapter.get_body_content()
soup = BeautifulSoup(xml, features='lxml')
chapter_text = ''
@@ -138,39 +167,46 @@ def extract_texts(chapters):
inner_text = child.text.strip() if child.text else ""
if inner_text:
chapter_text += inner_text + '\n'
chapter.extracted_text = chapter_text
document_chapters.append(chapter)
return document_chapters
def is_chapter(c):
name = c.get_name().lower()
has_min_len = len(c.extracted_text) > 100
title_looks_like_chapter = bool(
'chapter' in name.lower()
or re.search(r'part_?\d{1,3}', name)
or re.search(r'split_?\d{1,3}', name)
or re.search(r'ch_?\d{1,3}', name)
or re.search(r'chap_?\d{1,3}', name)
)
return has_min_len and title_looks_like_chapter
def chapter_beginning_one_liner(c, chars=20):
s = c.extracted_text[:chars].strip().replace('\n', ' ').replace('\r', ' ')
return s + '…' if len(s) > 0 else ''
def find_good_chapters(document_chapters):
chapters = [c for c in document_chapters if c.get_type() == ebooklib.ITEM_DOCUMENT and is_chapter(c)]
if len(chapters) == 0:
print('Not easy to recognize the chapters, defaulting to all non-empty documents.')
chapters = [c for c in document_chapters if c.get_type() == ebooklib.ITEM_DOCUMENT and len(c.extracted_text) > 10]
return chapters
def pick_chapters(chapters):
# Display the document name, the length and first 50 characters of the text
chapters_by_names = {
f'{c.get_name()}\t({len(c.extracted_text)} chars)\t[{chapter_beginning_one_liner(c, 50)}]': c
for c in chapters}
title = 'Select which chapters to read in the audiobook'
ret = pick(list(chapters_by_names.keys()), title, multiselect=True, min_selection_count=1)
selected_chapters_out_of_order = [chapters_by_names[r[0]] for r in ret]
selected_chapters = [c for c in chapters if c in selected_chapters_out_of_order]
return selected_chapters
@@ -187,7 +223,7 @@ def strfdelta(tdelta, fmt='{D:02}d {H:02}h {M:02}m {S:02}s'):
return f.format(fmt, **values)
def create_m4b(chapter_files, filename, cover_image):
tmp_filename = filename.replace('.epub', '.tmp.mp4')
if not Path(tmp_filename).exists():
combined_audio = AudioSegment.empty()
@@ -232,51 +268,46 @@ def probe_duration(file_name):
return float(proc.stdout.strip())
def create_index_file(title, creator, chapter_mp3_files):
with open("chapters.txt", "w") as f:
f.write(f";FFMETADATA1\ntitle={title}\nartist={creator}\n\n")
start = 0
i = 0
for c in chapter_mp3_files:
duration = probe_duration(c)
end = start + (int)(duration * 1000)
f.write(f"[CHAPTER]\nTIMEBASE=1/1000\nSTART={start}\nEND={end}\ntitle=Chapter {i}\n\n")
i += 1
start = end
def cli_main():
epilog = ('example:\n' +
' audiblez book.epub -v af_sky\n\n' +
'available voices:\n' +
available_voices_str)
default_voice = 'af_sky'
parser = argparse.ArgumentParser(epilog=epilog, formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument('epub_file_path', help='Path to the epub file')
parser.add_argument('-v', '--voice', default=default_voice, help=f'Choose narrating voice: {available_voices_str}')
parser.add_argument('-p', '--pick', default=False, help=f'Interactively select which chapters to read in the audiobook', action='store_true')
parser.add_argument('-s', '--speed', default=1.0, help=f'Set speed from 0.5 to 2.0', type=float)
parser.add_argument('-c', '--cuda', default=False, help=f'Use GPU via Cuda in Torch if available', action='store_true')
if len(sys.argv) == 1:
parser.print_help(sys.stderr)
sys.exit(1)
args = parser.parse_args()
if args.cuda:
if torch.cuda.is_available():
print('CUDA GPU available')
torch.set_default_device('cuda')
else:
print('CUDA GPU not available. Defaulting to CPU')
main(args.epub_file_path, args.voice, args.pick, args.speed)
if __name__ == '__main__':
cli_main()

2448
poetry.lock generated

File diff suppressed because it is too large


@@ -1,81 +1,21 @@
[project]
name = "audiblez"
version = "0.3.1"
description = "Generate audiobooks from e-books (epub to wav/m4b)"
authors = [
{ name = "Claudio Santini", email = "hireclaudio@gmail.com" }
]
readme = "README.md"
requires-python = ">=3.9,<3.13"
dependencies = [
"kokoro (>=0.2.3,<0.3.0)",
"ebooklib (>=0.18,<0.19)",
"soundfile (>=0.13.1,<0.14.0)",
"pick (>=2.4.0,<3.0.0)",
"bs4 (>=0.0.2,<0.0.3)",
"pydub (>=0.25.1,<0.26.0)",
"spacy (>=3.8.4,<4.0.0)",
"yaspin (>=3.1.0,<4.0.0)"
]
@@ -91,3 +31,6 @@ Issues = "https://github.com/santinic/audiblez/issues"
[project.scripts]
audiblez = "audiblez:cli_main"
[tool.poetry.group.dev.dependencies]
deptry = "^0.23.0"


@ -1,13 +1,15 @@
import unittest
from ebooklib import epub
from audiblez import find_good_chapters, find_document_chapters_and_extract_texts
@unittest.skip('Development only, not for CI')
class FindChaptersTest(unittest.TestCase):
def base(self, file, expected_chapter_names):
book = epub.read_epub(file)
document_chapters = find_document_chapters_and_extract_texts(book)
chapters = find_good_chapters(document_chapters)
chapter_names = [c.get_name() for c in chapters]
self.assertEqual(chapter_names, expected_chapter_names)
@@ -102,32 +104,32 @@ class FindChaptersTest(unittest.TestCase):
# 'Text/03_Dedi.xhtml', # 211
# 'Text/04_Contents.xhtml', # 6302
# 'Text/06_Intro.xhtml', # 34726
# 'Text/11_Part1.xhtml', # 332
'Text/12_Chapter01.xhtml', # 33940
'Text/12_Chapter02.xhtml', # 47738
# 'Text/12_Chapter02_Part2.xhtml', # 328
'Text/12_Chapter03.xhtml', # 42010
'Text/12_Chapter04.xhtml', # 42182
'Text/12_Chapter05.xhtml', # 51283
# 'Text/12_Chapter05_Part3.xhtml', # 327
'Text/12_Chapter06.xhtml', # 46570
'Text/12_Chapter07.xhtml', # 48752
'Text/12_Chapter08.xhtml', # 52048
'Text/12_Chapter09.xhtml', # 35717
# 'Text/12_Chapter09_Part4.xhtml', # 343
'Text/12_Chapter10.xhtml', # 43114
'Text/12_Chapter11.xhtml', # 53469
'Text/12_Chapter12.xhtml', # 29441
'Text/12_Chapter13.xhtml', # 34265
# 'Text/12_Chapter13_Part5.xhtml', # 327
'Text/12_Chapter14.xhtml', # 42381
'Text/12_Chapter15.xhtml', # 43875
'Text/12_Chapter16.xhtml', # 28861
# 'Text/12_Chapter16_Part6.xhtml', # 328
'Text/12_Chapter17.xhtml', # 45652
'Text/12_Chapter18.xhtml', # 43328
'Text/12_Chapter19.xhtml', # 34453
# 'Text/12_Chapter19_Part7.xhtml', # 334
'Text/12_Chapter20.xhtml', # 40554
'Text/12_Chapter21.xhtml', # 30382
'Text/12_Chapter22.xhtml', # 57467


@@ -1,28 +1,44 @@
import os
import unittest
from pathlib import Path
from kokoro import KPipeline
from audiblez import main
class MainTest(unittest.TestCase):
def base(self, name, url, **kwargs):
if not Path(f'{name}.epub').exists():
os.system(f'wget {url} -O {name}.epub')
Path(f'{name}.m4b').unlink(missing_ok=True)
os.system(f'rm {name}_chapter_*.wav')
merged_args = dict(voice='af_sky', pick_manually=False, speed=1.0, max_chapters=2)
merged_args.update(kwargs)
main(f'{name}.epub', **merged_args)
m4b_file = Path(f'{name}.m4b')
self.assertTrue(m4b_file.exists())
self.assertTrue(m4b_file.stat().st_size > 256 * 1024)
def test_poe(self):
url = 'https://www.gutenberg.org/ebooks/1064.epub.images'
self.base('poe', url)
@unittest.skip('too slow for CI')
def test_orwell(self):
url = 'https://archive.org/download/AnimalFarmByGeorgeOrwell/Animal%20Farm%20by%20George%20Orwell.epub'
self.base('orwell', url)
def test_italian_pirandello(self):
url = 'https://www.liberliber.eu/mediateca/libri/p/pirandello/cosi_e_se_vi_pare_1925/epub/pirandello_cosi_e_se_vi_pare_1925.epub'
self.base('pirandello', url, voice='im_nicola')
self.assertTrue(Path('pirandello.m4b').exists())
@unittest.skip('too slow for CI')
def test_italian_manzoni(self):
url = 'https://www.liberliber.eu/mediateca/libri/m/manzoni/i_promessi_sposi/epub/manzoni_i_promessi_sposi.epub'
self.base('manzoni', url, voice='im_nicola', max_chapters=1)
def test_french_baudelaire(self):
url = 'http://gallica.bnf.fr/ark:/12148/bpt6k70861t.epub'
self.base('baudelaire', url, voice='ff_siwis')

67
voices.py Normal file

@@ -0,0 +1,67 @@
flags = {
'a': '🇺🇸',
'b': '🇬🇧',
'e': '🇪🇸',
'f': '🇫🇷',
'h': '🇮🇳',
'i': '🇮🇹',
'j': '🇯🇵',
'p': '🇧🇷',
'z': '🇨🇳'
}
voices = {
'a': [
'af_alloy',
'af_aoede',
'af_bella',
'af_heart',
'af_jessica',
'af_kore',
'af_nicole',
'af_nova',
'af_river',
'af_sarah',
'af_sky',
'am_adam',
'am_echo',
'am_eric',
'am_fenrir',
'am_liam',
'am_michael',
'am_onyx',
'am_puck',
'am_santa'],
'b': [
'bf_alice',
'bf_emma',
'bf_isabella',
'bf_lily',
'bm_daniel',
'bm_fable',
'bm_george',
'bm_lewis'],
'e': ['ef_dora', 'em_alex', 'em_santa'],
'f': ['ff_siwis'],
'h': ['hf_alpha', 'hf_beta', 'hm_omega', 'hm_psi'],
'i': ['if_sara', 'im_nicola'],
'j': ['jf_alpha', 'jf_gongitsune', 'jf_nezumi', 'jf_tebukuro', 'jm_kumo'],
'p': ['pf_dora', 'pm_alex', 'pm_santa'],
'z': [
'zf_xiaobei',
'zf_xiaoni',
'zf_xiaoxiao',
'zf_xiaoyi',
'zm_yunjian',
'zm_yunxi',
'zm_yunxia',
'zm_yunyang'
]
}
available_voices_str = ('\n'.join([f' {flags[lang]} {", ".join(voices[lang])}' for lang in voices])
.replace(' af_sky,', '\n af_sky,'))
# for key, l in voices.items():
# ls = ', '.join([f'`{j}`' for j in l])
# print(f'| {flags[key]} | {ls} |')