Mirror of https://github.com/santinic/audiblez.git, synced 2025-09-18 21:40:39 +00:00

Merge pull request #40 from santinic/v3

v3 with Torch and Cuda, more languages, bug fixes and more

This commit is contained in commit 53a9f5d04b

9 changed files with 2176 additions and 846 deletions
26  .github/workflows/git-clone-and-run.yml  (vendored, new file)
@@ -0,0 +1,26 @@
+name: Git clone and run
+run-name: Git clone and run
+on: [ push, pull_request ]
+jobs:
+  git-clone-and-run:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+      - uses: actions/checkout@v3
+      - name: Install dependencies
+        run: pip install poetry && poetry install
+      - name: check it runs as script
+        run: poetry run audiblez --help
+      - name: install ffmpeg and espeak-ng
+        run: sudo apt-get update && sudo apt-get install ffmpeg espeak-ng --fix-missing
+#      - name: download test epub
+#        run: wget https://github.com/daisy/epub-accessibility-tests/releases/download/fundamental-2.0/Fundamental-Accessibility-Tests-Basic-Functionality-v2.0.0.epub
+#      - name: create audiobook
+#        run: poetry run audiblez Fundamental-Accessibility-Tests-Basic-Functionality-v2.0.0.epub
+#      - name: check m4b output file
+#        run: ls -lah Fundamental-Accessibility-Tests-Basic-Functionality-v2.0.0.m4b
+      - name: run unit-tests (unittest classes in /test)
+        run: poetry run python -m unittest discover test
4  .github/workflows/pip-install.yaml  (vendored)
@@ -19,8 +19,8 @@ jobs:
       run: python -m audiblez --help
     - name: check it runs as script
       run: audiblez --help
-    - name: install ffmpeg
-      run: sudo apt-get install ffmpeg
+    - name: install ffmpeg and espeak-ng
+      run: sudo apt update && sudo apt-get install ffmpeg espeak-ng
     - name: download test epub
       run: wget https://github.com/daisy/epub-accessibility-tests/releases/download/fundamental-2.0/Fundamental-Accessibility-Tests-Basic-Functionality-v2.0.0.epub
     - name: create audiobook
87  README.md
@@ -1,34 +1,41 @@
 # Audiblez: Generate audiobooks from e-books

 [](https://github.com/santinic/audiblez/actions/workflows/pip-install.yaml)
 [](https://github.com/santinic/audiblez/actions/workflows/git-clone-and-run.yml)

+### v3.0 Now with CUDA support!

 Audiblez generates `.m4b` audiobooks from regular `.epub` e-books,
 using Kokoro's high-quality speech synthesis.

-[Kokoro v0.19](https://huggingface.co/hexgrad/Kokoro-82M) is a recently published text-to-speech model with just 82M params and very natural sounding output.
+[Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) is a recently published text-to-speech model with just 82M params and very natural sounding output.
 It's released under Apache licence and it was trained on < 100 hours of audio.
-It currently supports American and British English in a bunch of very good voices.
+It currently supports American, British English, French, Korean, Japanese and Mandarin, and a bunch of very good voices.

-On my M2 MacBook Pro, **it takes about 2 hours to convert to mp3 the Selfish Gene by Richard Dawkins**, which is about 100,000 words (or 600,000 characters),
-at a rate of about 80 characters per second.
-Future support for French, Korean, Japanese and Mandarin is planned.
+On a Google Colab's T4 GPU via Cuda, **it takes about 5 minutes to convert "Animal Farm" by Orwell** (which is about 160,000 characters) to audiobook, at a rate of about 600 characters per second.
+
+On my M2 MacBook Pro, on CPU, it takes about 1 hour, at a rate of about 60 characters per second.

 ## How to install and run

 If you have Python 3 on your computer, you can install it with pip.
 Be aware that it won't work with Python 3.13.
-Then you also need to download a couple of additional files in the same folder, which are about ~360MB:
+You also need `espeak-ng` and `ffmpeg` installed on your machine:

 ```bash
 pip install audiblez
-wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx
-wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json
+
+sudo apt install ffmpeg espeak-ng   # on Ubuntu/Debian 🐧
+brew install ffmpeg espeak-ng       # on Mac 🍏
 ```

 Then, to convert an epub file into an audiobook, just run:

 ```bash
-audiblez book.epub -l en-gb -v af_sky
+audiblez book.epub -v af_sky
 ```

 It will first create a bunch of `book_chapter_1.wav`, `book_chapter_2.wav`, etc. files in the same directory,
@@ -36,57 +43,41 @@ and at the end it will produce a `book.m4b` file with the whole book you can lis
 audiobook player.
 It will only produce the `.m4b` file if you have `ffmpeg` installed on your machine.

-## Supported Languages
-Use `-l` option to specify the language, available language codes are:
-🇺🇸 `en-us`, 🇬🇧 `en-gb`, 🇫🇷 `fr-fr`, 🇯🇵 `ja`, 🇰🇷 `kr` and 🇨🇳 `cmn`.
-
 ## Speed

 By default the audio is generated using a normal speed, but you can make it up to twice slower or faster by specifying a speed argument between 0.5 to 2.0:

 ```bash
-audiblez book.epub -l en-gb -v af_sky -s 1.5
+audiblez book.epub -v af_sky -s 1.5
 ```

 ## Supported Voices
-Use `-v` option to specify the voice:
-available voices are `af`, `af_bella`, `af_nicole`, `af_sarah`, `af_sky`, `am_adam`, `am_michael`, `bf_emma`, `bf_isabella`, `bm_george`, `bm_lewis`.
 You can try them here: [https://huggingface.co/spaces/hexgrad/Kokoro-TTS](https://huggingface.co/spaces/hexgrad/Kokoro-TTS)

+Use `-v` option to specify the voice to use. Available voices are listed below.
+The first letter is the language code and the second is the gender of the speaker, e.g. `im_nicola` is an Italian male voice.
+
+| Language | Voices |
+|----------|--------|
+| 🇺🇸 | `af_alloy`, `af_aoede`, `af_bella`, `af_heart`, `af_jessica`, `af_kore`, `af_nicole`, `af_nova`, `af_river`, `af_sarah`, `af_sky`, `am_adam`, `am_echo`, `am_eric`, `am_fenrir`, `am_liam`, `am_michael`, `am_onyx`, `am_puck`, `am_santa` |
+| 🇬🇧 | `bf_alice`, `bf_emma`, `bf_isabella`, `bf_lily`, `bm_daniel`, `bm_fable`, `bm_george`, `bm_lewis` |
+| 🇪🇸 | `ef_dora`, `em_alex`, `em_santa` |
+| 🇫🇷 | `ff_siwis` |
+| 🇮🇳 | `hf_alpha`, `hf_beta`, `hm_omega`, `hm_psi` |
+| 🇮🇹 | `if_sara`, `im_nicola` |
+| 🇯🇵 | `jf_alpha`, `jf_gongitsune`, `jf_nezumi`, `jf_tebukuro`, `jm_kumo` |
+| 🇧🇷 | `pf_dora`, `pm_alex`, `pm_santa` |
+| 🇨🇳 | `zf_xiaobei`, `zf_xiaoni`, `zf_xiaoxiao`, `zf_xiaoyi`, `zm_yunjian`, `zm_yunxi`, `zm_yunxia`, `zm_yunyang` |

 ## How to run on GPU

-By default audiblez runs on CPU. If you want to use a GPU for faster performance, install the GPU-enabled ONNX Runtime and specify a runtime provider with the `--providers` flag. By default, the CPU-enabled ONNX Runtime is installed. The GPU runtime must be installed manually.
-
-```bash
-pip install onnxruntime-gpu
-```
-
-To specify ONNX providers, such as using an NVIDIA GPU, use the `--providers` flag. For example:
-
-```bash
-audiblez book.epub -l en-gb -v af_sky --providers CUDAExecutionProvider
-```
-
-To see the list of available providers on your system, run the following:
-
-```bash
-audiblez --help
-```
-
-or
-
-```bash
-python -c "import onnxruntime as ort; print(ort.get_available_providers())"
-```
-
-This will display the ONNX providers that can be used, such as `CUDAExecutionProvider` for NVIDIA GPUs or `CPUExecutionProvider` for CPU-only execution.
-
-You can specify a provider hierarchy by providing multiple providers separated by spaces.
-
-```bash
-audiblez book.epub -l en-gb -v af_sky --providers CUDAExecutionProvider CPUExecutionProvider
-```
+By default audiblez runs on CPU. If you pass the option `--cuda` it will try to use the Cuda device via Torch.
+
+Check out this example: [Audiblez running on a Google Colab Notebook with Cuda](https://colab.research.google.com/drive/164PQLowogprWQpRjKk33e-8IORAvqXKI?usp=sharing).
+
+We don't currently support Apple Silicon, as there is not yet a Kokoro implementation in MLX. As soon as it becomes available, we will support it.

 ## Author

 by [Claudio Santini](https://claudio.uk) in 2025, distributed under MIT licence.

 Related article: [Convert E-books into audiobooks with Kokoro](https://claudio.uk/posts/epub-to-audiobook.html)
237  audiblez.py
@@ -2,63 +2,59 @@
 # audiblez - A program to convert e-books into audiobooks using
 # Kokoro-82M model for high-quality text-to-speech synthesis.
 # by Claudio Santini 2025 - https://claudio.uk

+import torch
+import spacy
+import ebooklib
+import soundfile
+import numpy as np
 import argparse
 import sys
 import time
 import shutil
 import subprocess
-import soundfile as sf
-import ebooklib
 import warnings
 import re
+from tabulate import tabulate
 from pathlib import Path
 from string import Formatter
+from yaspin import yaspin
 from bs4 import BeautifulSoup
-from kokoro_onnx import config
-from kokoro_onnx import Kokoro
+from kokoro import KPipeline
 from ebooklib import epub
 from pydub import AudioSegment
 from pick import pick
-import onnxruntime as ort
 from tempfile import NamedTemporaryFile

-MODEL_FILE = 'kokoro-v0_19.onnx'
-VOICES_FILE = 'voices.json'
-config.MAX_PHONEME_LENGTH = 128
+from voices import voices, available_voices_str
+
+sample_rate = 24000


-def main(kokoro, file_path, lang, voice, pick_manually, speed, providers):
-    # Set ONNX providers if specified
-    if providers:
-        available_providers = ort.get_available_providers()
-        invalid_providers = [p for p in providers if p not in available_providers]
-        if invalid_providers:
-            print(f"Invalid ONNX providers: {', '.join(invalid_providers)}")
-            print(f"Available providers: {', '.join(available_providers)}")
-            sys.exit(1)
-        kokoro.sess.set_providers(providers)
-        print(f"Using ONNX providers: {', '.join(providers)}")
+def main(file_path, voice, pick_manually, speed, max_chapters=None):
+    if not spacy.util.is_package("xx_ent_wiki_sm"):
+        print("Downloading Spacy model xx_ent_wiki_sm...")
+        spacy.cli.download("xx_ent_wiki_sm")
     filename = Path(file_path).name
     warnings.simplefilter("ignore")
     book = epub.read_epub(file_path)
-    title = book.get_metadata('DC', 'title')[0][0]
-    creator = book.get_metadata('DC', 'creator')[0][0]
+    meta_title = book.get_metadata('DC', 'title')
+    title = meta_title[0][0] if meta_title else ''
+    meta_creator = book.get_metadata('DC', 'creator')
+    creator = meta_creator[0][0] if meta_creator else ''

     cover_maybe = find_cover(book)
     cover_image = cover_maybe.get_content() if cover_maybe else b""
     if cover_maybe:
         print(f'Found cover image {cover_maybe.file_name} in {cover_maybe.media_type} format')

-    intro = f'{title} by {creator}'
+    intro = f'{title} – {creator}.\n\n'
     print(intro)
-    print('Found Chapters:', [c.get_name() for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT])
-    if pick_manually:
-        chapters = pick_chapters(book)
+    document_chapters = find_document_chapters_and_extract_texts(book)
+    if pick_manually is True:
+        selected_chapters = pick_chapters(document_chapters)
     else:
-        chapters = find_chapters(book)
-        print('Automatically selected chapters:', [c.get_name() for c in chapters])
-    texts = extract_texts(chapters)
+        selected_chapters = find_good_chapters(document_chapters)
+    print_selected_chapters(document_chapters, selected_chapters)
+    texts = [c.extracted_text for c in selected_chapters]

     has_ffmpeg = shutil.which('ffmpeg') is not None
     if not has_ffmpeg:
@@ -68,42 +64,52 @@ def main(kokoro, file_path, lang, voice, pick_manually, speed, providers):
     print('Started at:', time.strftime('%H:%M:%S'))
     print(f'Total characters: {total_chars:,}')
     print('Total words:', len(' '.join(texts).split()))
-    chars_per_sec = 50  # assume 50 chars per second at the beginning
-    print(f'Estimated time remaining (assuming 50 chars/sec): {strfdelta((total_chars - processed_chars) / chars_per_sec)}')
+    chars_per_sec = 500 if torch.cuda.is_available() else 50
+    print(f'Estimated time remaining (assuming {chars_per_sec} chars/sec): {strfdelta((total_chars - processed_chars) / chars_per_sec)}')

-    chapter_mp3_files = []
-    durations = {}
-    for i, text in enumerate(texts, start=1):
-        chapter_filename = filename.replace('.epub', f'_chapter_{i}.wav')
-        chapter_mp3_files.append(chapter_filename)
+    chapter_wav_files = []
+    for i, chapter in enumerate(selected_chapters, start=1):
+        if max_chapters and i > max_chapters: break
+        text = chapter.extracted_text
+        xhtml_file_name = chapter.get_name().replace(' ', '_').replace('/', '_').replace('\\', '_')
+        chapter_filename = filename.replace('.epub', f'_chapter_{i}_{voice}_{xhtml_file_name}.wav')
+        chapter_wav_files.append(chapter_filename)
         if Path(chapter_filename).exists():
             print(f'File for chapter {i} already exists. Skipping')
             continue
         if len(text.strip()) < 10:
             print(f'Skipping empty chapter {i}')
-            chapter_mp3_files.remove(chapter_filename)
+            chapter_wav_files.remove(chapter_filename)
             continue
-        print(f'Reading chapter {i} ({len(text):,} characters)...')
         if i == 1:
             text = intro + '.\n\n' + text
         start_time = time.time()
-        samples, sample_rate = kokoro.create(text, voice=voice, speed=speed, lang=lang)
-        sf.write(f'{chapter_filename}', samples, sample_rate)
-        durations[chapter_filename] = len(samples) / sample_rate
-        end_time = time.time()
-        delta_seconds = end_time - start_time
-        chars_per_sec = len(text) / delta_seconds
-        processed_chars += len(text)
-        print(f'Estimated time remaining: {strfdelta((total_chars - processed_chars) / chars_per_sec)}')
-        print('Chapter written to', chapter_filename)
-        print(f'Chapter {i} read in {delta_seconds:.2f} seconds ({chars_per_sec:.0f} characters per second)')
-        progress = processed_chars * 100 // total_chars
-        print('Progress:', f'{progress}%\n')
+        pipeline = KPipeline(lang_code=voice[0])  # a for american or b for british etc.
+
+        with yaspin(text=f'Reading chapter {i} ({len(text):,} characters)...', color="yellow") as spinner:
+            audio_segments = gen_audio_segments(pipeline, text, voice, speed)
+            if audio_segments:
+                final_audio = np.concatenate(audio_segments)
+                soundfile.write(chapter_filename, final_audio, sample_rate)
+                end_time = time.time()
+                delta_seconds = end_time - start_time
+                chars_per_sec = len(text) / delta_seconds
+                processed_chars += len(text)
+                spinner.ok("✅")
+                print(f'Estimated time remaining: {strfdelta((total_chars - processed_chars) / chars_per_sec)}')
+                print('Chapter written to', chapter_filename)
+                print(f'Chapter {i} read in {delta_seconds:.2f} seconds ({chars_per_sec:.0f} characters per second)')
+                progress = processed_chars * 100 // total_chars
+                print('Progress:', f'{progress}%\n')
+            else:
+                spinner.fail("❌")
+                print(f'Warning: No audio generated for chapter {i}')
+                chapter_wav_files.remove(chapter_filename)

     if has_ffmpeg:
-        create_index_file(title, creator, chapter_mp3_files, durations)
-        create_m4b(chapter_mp3_files, filename, title, creator, cover_image)
+        create_index_file(title, creator, chapter_wav_files)
+        create_m4b(chapter_wav_files, filename, cover_image)


 def find_cover(book):
     def is_image(item):
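In the loop above, each chapter's output name is derived from the epub filename, the chapter index, the voice, and a sanitized version of the chapter's internal xhtml path. A tiny standalone sketch of just that naming step (the helper name and sample values here are invented for illustration, not part of the repo):

```python
def chapter_wav_name(epub_name, i, voice, xhtml_name):
    # Replace path separators and spaces so the chapter's internal
    # path can be embedded in a flat filename next to the epub.
    safe = xhtml_name.replace(' ', '_').replace('/', '_').replace('\\', '_')
    return epub_name.replace('.epub', f'_chapter_{i}_{voice}_{safe}.wav')

print(chapter_wav_name('book.epub', 1, 'af_sky', 'Text/12_Chapter01.xhtml'))
# book_chapter_1_af_sky_Text_12_Chapter01.xhtml.wav
```

Embedding the voice in the name means re-running with a different voice does not hit the "already exists, skipping" branch.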
@@ -127,9 +133,32 @@

     return None


-def extract_texts(chapters):
-    texts = []
-    for chapter in chapters:
+def print_selected_chapters(document_chapters, chapters):
+    print(tabulate([
+        [i, c.get_name(), len(c.extracted_text), '✅' if c in chapters else '', chapter_beginning_one_liner(c)]
+        for i, c in enumerate(document_chapters, start=1)
+    ], headers=['#', 'Chapter', 'Text Length', 'Selected', 'First words']))
+
+
+def gen_audio_segments(pipeline, text, voice, speed):
+    nlp = spacy.load('xx_ent_wiki_sm')
+    nlp.add_pipe('sentencizer')
+    audio_segments = []
+    doc = nlp(text)
+    sentences = list(doc.sents)
+    for sent in sentences:
+        for gs, ps, audio in pipeline(sent.text, voice=voice, speed=speed, split_pattern=r'\n\n\n'):
+            audio_segments.append(audio)
+    return audio_segments
+
+
+def find_document_chapters_and_extract_texts(book):
+    """Returns every chapter that is an ITEM_DOCUMENT and enriches each chapter with extracted_text."""
+    document_chapters = []
+    for chapter in book.get_items():
+        if chapter.get_type() != ebooklib.ITEM_DOCUMENT:
+            continue
         xml = chapter.get_body_content()
         soup = BeautifulSoup(xml, features='lxml')
         chapter_text = ''
@@ -138,39 +167,46 @@
             inner_text = child.text.strip() if child.text else ""
             if inner_text:
                 chapter_text += inner_text + '\n'
-        texts.append(chapter_text)
-    return texts
+        chapter.extracted_text = chapter_text
+        document_chapters.append(chapter)
+    return document_chapters


 def is_chapter(c):
     name = c.get_name().lower()
-    return bool(
-        'chapter' in name.lower()
-        or re.search(r'part\d{1,3}', name)
-        or re.search(r'ch\d{1,3}', name)
-        or re.search(r'chap\d{1,3}', name)
-    )
+    has_min_len = len(c.extracted_text) > 100
+    title_looks_like_chapter = bool(
+        'chapter' in name.lower()
+        or re.search(r'part_?\d{1,3}', name)
+        or re.search(r'split_?\d{1,3}', name)
+        or re.search(r'ch_?\d{1,3}', name)
+        or re.search(r'chap_?\d{1,3}', name)
+    )
+    return has_min_len and title_looks_like_chapter


-def find_chapters(book, verbose=False):
-    chapters = [c for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT and is_chapter(c)]
-    if verbose:
-        for item in book.get_items():
-            if item.get_type() == ebooklib.ITEM_DOCUMENT:
-                print(f"'{item.get_name()}'" + ', #' + str(len(item.get_body_content())))
-                # print(f'{item.get_name()}'.ljust(60), str(len(item.get_body_content())).ljust(15), 'X' if item in chapters else '-')
+def chapter_beginning_one_liner(c, chars=20):
+    s = c.extracted_text[:chars].strip().replace('\n', ' ').replace('\r', ' ')
+    return s + '…' if len(s) > 0 else ''
+
+
+def find_good_chapters(document_chapters):
+    chapters = [c for c in document_chapters if c.get_type() == ebooklib.ITEM_DOCUMENT and is_chapter(c)]
     if len(chapters) == 0:
-        print('Not easy to find the chapters, defaulting to all available documents.')
-        chapters = [c for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT]
+        print('Not easy to recognize the chapters, defaulting to all non-empty documents.')
+        chapters = [c for c in document_chapters if c.get_type() == ebooklib.ITEM_DOCUMENT and len(c.extracted_text) > 10]
     return chapters


-def pick_chapters(book):
-    all_chapters_names = [c.get_name() for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT]
+def pick_chapters(chapters):
+    # Display the document name, the length and first 50 characters of the text
+    chapters_by_names = {
+        f'{c.get_name()}\t({len(c.extracted_text)} chars)\t[{chapter_beginning_one_liner(c, 50)}]': c
+        for c in chapters}
     title = 'Select which chapters to read in the audiobook'
-    selected_chapters_names = pick(all_chapters_names, title, multiselect=True, min_selection_count=1)
-    selected_chapters_names = [c[0] for c in selected_chapters_names]
-    selected_chapters = [c for c in book.get_items() if c.get_name() in selected_chapters_names]
+    ret = pick(list(chapters_by_names.keys()), title, multiselect=True, min_selection_count=1)
+    selected_chapters_out_of_order = [chapters_by_names[r[0]] for r in ret]
+    selected_chapters = [c for c in chapters if c in selected_chapters_out_of_order]
    return selected_chapters
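The new chapter-detection heuristic combines a minimum extracted-text length with chapter-ish filename patterns. A rough standalone sketch of the same idea (the function name and threshold mirror the diff, but this is illustrative, not the packaged code):

```python
import re

def looks_like_chapter(name: str, text_len: int) -> bool:
    # Mirrors the heuristic in the diff: enough extracted text, plus a
    # filename like ch01, chap_2, part3 or split_004, or containing "chapter".
    name = name.lower()
    title_ok = bool(
        'chapter' in name
        or re.search(r'part_?\d{1,3}', name)
        or re.search(r'split_?\d{1,3}', name)
        or re.search(r'ch_?\d{1,3}', name)
        or re.search(r'chap_?\d{1,3}', name)
    )
    return text_len > 100 and title_ok

print(looks_like_chapter('Text/12_Chapter01.xhtml', 33940))  # True
print(looks_like_chapter('Text/03_Dedi.xhtml', 211))         # False
```

When nothing matches, the code above falls back to every non-empty document, so a badly named epub still produces an audiobook.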
@@ -187,7 +223,7 @@ def strfdelta(tdelta, fmt='{D:02}d {H:02}h {M:02}m {S:02}s'):
     return f.format(fmt, **values)


-def create_m4b(chapter_files, filename, title, author, cover_image):
+def create_m4b(chapter_files, filename, cover_image):
     tmp_filename = filename.replace('.epub', '.tmp.mp4')
     if not Path(tmp_filename).exists():
         combined_audio = AudioSegment.empty()
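The `strfdelta` helper visible in this hunk's context formats a duration with `string.Formatter` fields like `{D:02}d {H:02}h {M:02}m {S:02}s`. A minimal self-contained sketch of that kind of helper, taking a seconds count as the estimates in `main` do (an assumed-equivalent reimplementation, not copied from the repo):

```python
from string import Formatter

def strfdelta_sketch(seconds: float, fmt='{D:02}d {H:02}h {M:02}m {S:02}s') -> str:
    # Split a duration in seconds into days/hours/minutes/seconds and
    # substitute them into the format string's D/H/M/S fields.
    remainder = int(seconds)
    values = {}
    for unit, size in (('D', 86400), ('H', 3600), ('M', 60), ('S', 1)):
        values[unit], remainder = divmod(remainder, size)
    return Formatter().format(fmt, **values)

print(strfdelta_sketch(90061))  # 01d 01h 01m 01s
```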
@@ -232,51 +268,46 @@ def probe_duration(file_name):
     return float(proc.stdout.strip())


-def create_index_file(title, creator, chapter_mp3_files, durations):
+def create_index_file(title, creator, chapter_mp3_files):
     with open("chapters.txt", "w") as f:
         f.write(f";FFMETADATA1\ntitle={title}\nartist={creator}\n\n")
         start = 0
         i = 0
         for c in chapter_mp3_files:
-            if c not in durations:
-                durations[c] = probe_duration(c)
-            end = start + (int)(durations[c] * 1000)
+            duration = probe_duration(c)
+            end = start + (int)(duration * 1000)
             f.write(f"[CHAPTER]\nTIMEBASE=1/1000\nSTART={start}\nEND={end}\ntitle=Chapter {i}\n\n")
             i += 1
             start = end


 def cli_main():
-    if not Path(MODEL_FILE).exists() or not Path(VOICES_FILE).exists():
-        print('Error: kokoro-v0_19.onnx and voices.json must be in the current directory. Please download them with:')
-        print('wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx')
-        print('wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json')
-        sys.exit(1)
-    kokoro = Kokoro(MODEL_FILE, VOICES_FILE)
-    voices = list(kokoro.get_voices())
-    voices_str = ', '.join(voices)
-    epilog = 'example:\n' + \
-             '  audiblez book.epub -l en-us -v af_sky'
-    default_voice = 'af_sky' if 'af_sky' in voices else voices[0]
-
-    # Get available ONNX providers
-    available_providers = ort.get_available_providers()
-    providers_help = f"Available ONNX providers: {', '.join(available_providers)}"
-
+    epilog = ('example:\n' +
+              '  audiblez book.epub -l en-us -v af_sky\n\n' +
+              'available voices:\n' +
+              available_voices_str)
+    default_voice = 'af_sky'
     parser = argparse.ArgumentParser(epilog=epilog, formatter_class=argparse.RawDescriptionHelpFormatter)
     parser.add_argument('epub_file_path', help='Path to the epub file')
     parser.add_argument('-l', '--lang', default='en-gb', help='Language code: en-gb, en-us, fr-fr, ja, ko, cmn')
     parser.add_argument('-v', '--voice', default=default_voice, help=f'Choose narrating voice: {voices_str}')
-    parser.add_argument('-p', '--pick', default=False, help=f'Interactively select which chapters to read in the audiobook',
-                        action='store_true')
+    parser.add_argument('-p', '--pick', default=False, help=f'Interactively select which chapters to read in the audiobook', action='store_true')
     parser.add_argument('-s', '--speed', default=1.0, help=f'Set speed from 0.5 to 2.0', type=float)
-    parser.add_argument('--providers', nargs='+', metavar='PROVIDER', help=f"Specify ONNX providers. {providers_help}")
+    parser.add_argument('-c', '--cuda', default=False, help=f'Use GPU via Cuda in Torch if available', action='store_true')

     if len(sys.argv) == 1:
         parser.print_help(sys.stderr)
         sys.exit(1)
     args = parser.parse_args()
-    main(kokoro, args.epub_file_path, args.lang, args.voice, args.pick, args.speed, args.providers)
+
+    if args.cuda:
+        if torch.cuda.is_available():
+            print('CUDA GPU available')
+            torch.set_default_device('cuda')
+        else:
+            print('CUDA GPU not available. Defaulting to CPU')
+
+    main(args.epub_file_path, args.voice, args.pick, args.speed)


 if __name__ == '__main__':
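`create_index_file` writes an ffmpeg `FFMETADATA1` file with one `[CHAPTER]` block per wav, using millisecond timestamps, which `create_m4b` then feeds to ffmpeg. A small sketch of that file format with made-up durations (the `probe_duration` ffprobe call is replaced by a plain list, and the builder function name is invented):

```python
def build_ffmetadata(title, artist, durations_seconds):
    # Global metadata header, then one [CHAPTER] block per chapter.
    # START/END are milliseconds because TIMEBASE is 1/1000.
    lines = [f";FFMETADATA1\ntitle={title}\nartist={artist}\n"]
    start = 0
    for i, duration in enumerate(durations_seconds):
        end = start + int(duration * 1000)
        lines.append(f"[CHAPTER]\nTIMEBASE=1/1000\nSTART={start}\nEND={end}\ntitle=Chapter {i}\n")
        start = end
    return '\n'.join(lines)

print(build_ffmetadata('Animal Farm', 'George Orwell', [12.5, 30.0]))
```

Each chapter starts where the previous one ended, so the blocks tile the whole audiobook timeline without gaps.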
2448  poetry.lock  (generated)
File diff suppressed because it is too large
pyproject.toml

@@ -1,81 +1,21 @@
 [project]
 name = "audiblez"
-version = "0.2.2"
+version = "0.3.1"
 description = "Generate audiobooks from e-books (epub to wav/m4b)"
 authors = [
     { name = "Claudio Santini", email = "hireclaudio@gmail.com" }
 ]
 readme = "README.md"
-requires-python = ">=3.9,<3.13" # librosa/llvmlite have no support for python > 3.12
+requires-python = ">=3.9,<3.13"
 dependencies = [
-    "bs4 (==0.0.2)",
-    "attrs (==24.3.0)",
-    "audioread (==3.0.1)",
-    "babel (==2.16.0)",
-    "beautifulsoup4 (==4.12.3)",
-    "bibtexparser (==2.0.0b8)",
-    "certifi (==2024.12.14)",
-    "cffi (==1.17.1)",
-    "charset-normalizer (==3.4.1)",
-    "clldutils (==3.24.0)",
-    "colorama (==0.4.6)",
-    "coloredlogs (==15.0.1)",
-    "colorlog (==6.9.0)",
-    "csvw (==3.5.1)",
-    "decorator (==5.1.1)",
-    "dlinfo (==1.2.1)",
-    "ebooklib (==0.18)",
-    "espeakng-loader (==0.2.1)",
-    "flatbuffers (==24.12.23)",
-    "humanfriendly (==10.0)",
-    "idna (==3.10)",
-    "isodate (==0.7.2)",
-    "joblib (==1.4.2)",
-    "jsonschema (==4.23.0)",
-    "jsonschema-specifications (==2024.10.1)",
-    "kokoro-onnx (==0.2.6)",
-    "language-tags (==1.2.0)",
-    "lazy-loader (==0.4)",
-    "librosa (==0.10.2.post1)",
-    "llvmlite (==0.43.0)",
-    "lxml (==5.3.0)",
-    "markdown (==3.7)",
-    "markupsafe (==3.0.2)",
-    "mpmath (==1.3.0)",
-    "msgpack (==1.1.0)",
-    "numba (==0.60.0)",
-    "numpy (==2.0.2)",
-    "onnxruntime (==1.20.1)",
-    "packaging (==24.2)",
-    "phonemizer-fork (==3.3.1)",
-    "platformdirs (==4.3.6)",
-    "pooch (==1.8.2)",
-    "protobuf (==5.29.3)",
-    "pycparser (==2.22)",
-    "pylatexenc (==2.10)",
-    "pyparsing (==3.2.1)",
-    "python-dateutil (==2.9.0.post0)",
-    "rdflib (==7.1.2)",
-    "referencing (==0.35.1)",
-    "regex (==2024.11.6)",
-    "requests (==2.32.3)",
-    "rfc3986 (==1.5.0)",
-    "rpds-py (==0.22.3)",
-    "scikit-learn (==1.6.1)",
-    "scipy (==1.15.1)",
-    "segments (==2.2.1)",
-    "six (==1.17.0)",
-    "soundfile (==0.13.0)",
-    "soupsieve (==2.6)",
-    "soxr (==0.5.0.post1)",
-    "sympy (==1.13.3)",
-    "tabulate (==0.9.0)",
-    "threadpoolctl (==3.5.0)",
-    "typing-extensions (==4.12.2)",
-    "uritemplate (==4.1.1)",
-    "urllib3 (==2.3.0)",
-    "pydub (>=0.25.1,<0.26.0)",
+    "kokoro (>=0.2.3,<0.3.0)",
+    "ebooklib (>=0.18,<0.19)",
+    "soundfile (>=0.13.1,<0.14.0)",
+    "pick (>=2.4.0,<3.0.0)",
+    "bs4 (>=0.0.2,<0.0.3)",
+    "pydub (>=0.25.1,<0.26.0)",
+    "spacy (>=3.8.4,<4.0.0)",
+    "yaspin (>=3.1.0,<4.0.0)"
 ]

@@ -91,3 +31,6 @@ Issues = "https://github.com/santinic/audiblez/issues"

 [project.scripts]
 audiblez = "audiblez:cli_main"
+
+[tool.poetry.group.dev.dependencies]
+deptry = "^0.23.0"
@@ -1,13 +1,15 @@
 import unittest
 from ebooklib import epub

-from audiblez import find_chapters
+from audiblez import find_good_chapters, find_document_chapters_and_extract_texts


+@unittest.skip('Development only, not for CI')
 class FindChaptersTest(unittest.TestCase):
     def base(self, file, expected_chapter_names):
         book = epub.read_epub(file)
-        chapters = find_chapters(book)
+        document_chapters = find_document_chapters_and_extract_texts(book)
+        chapters = find_good_chapters(document_chapters)
         chapter_names = [c.get_name() for c in chapters]
         self.assertEqual(chapter_names, expected_chapter_names)

@@ -102,32 +104,32 @@ class FindChaptersTest(unittest.TestCase):
             # 'Text/03_Dedi.xhtml', # 211
             # 'Text/04_Contents.xhtml', # 6302
             # 'Text/06_Intro.xhtml', # 34726
-            'Text/11_Part1.xhtml', # 332
+            # 'Text/11_Part1.xhtml', # 332
             'Text/12_Chapter01.xhtml', # 33940
             'Text/12_Chapter02.xhtml', # 47738
-            'Text/12_Chapter02_Part2.xhtml', # 328
+            # 'Text/12_Chapter02_Part2.xhtml', # 328
             'Text/12_Chapter03.xhtml', # 42010
             'Text/12_Chapter04.xhtml', # 42182
             'Text/12_Chapter05.xhtml', # 51283
-            'Text/12_Chapter05_Part3.xhtml', # 327
+            # 'Text/12_Chapter05_Part3.xhtml', # 327
             'Text/12_Chapter06.xhtml', # 46570
             'Text/12_Chapter07.xhtml', # 48752
             'Text/12_Chapter08.xhtml', # 52048
             'Text/12_Chapter09.xhtml', # 35717
-            'Text/12_Chapter09_Part4.xhtml', # 343
+            # 'Text/12_Chapter09_Part4.xhtml', # 343
             'Text/12_Chapter10.xhtml', # 43114
             'Text/12_Chapter11.xhtml', # 53469
             'Text/12_Chapter12.xhtml', # 29441
             'Text/12_Chapter13.xhtml', # 34265
-            'Text/12_Chapter13_Part5.xhtml', # 327
+            # 'Text/12_Chapter13_Part5.xhtml', # 327
             'Text/12_Chapter14.xhtml', # 42381
             'Text/12_Chapter15.xhtml', # 43875
             'Text/12_Chapter16.xhtml', # 28861
-            'Text/12_Chapter16_Part6.xhtml', # 328
+            # 'Text/12_Chapter16_Part6.xhtml', # 328
             'Text/12_Chapter17.xhtml', # 45652
             'Text/12_Chapter18.xhtml', # 43328
             'Text/12_Chapter19.xhtml', # 34453
-            'Text/12_Chapter19_Part7.xhtml', # 334
+            # 'Text/12_Chapter19_Part7.xhtml', # 334
             'Text/12_Chapter20.xhtml', # 40554
             'Text/12_Chapter21.xhtml', # 30382
             'Text/12_Chapter22.xhtml', # 57467
@@ -1,28 +1,44 @@
 import os
 import unittest
 from pathlib import Path
-from kokoro_onnx import Kokoro
-
-from audiblez import VOICES_FILE, MODEL_FILE, main
+from kokoro import KPipeline
+
+from audiblez import main


 class MainTest(unittest.TestCase):
-    def base(self, **kwargs):
-        base_path = Path(__file__).parent / '..'
-        kokoro = Kokoro(base_path / MODEL_FILE, base_path / VOICES_FILE)
-        main(kokoro, lang='en-gb', voice='af_sky', providers=None, pick_manually=False, speed=1, **kwargs)
+    def base(self, name, url, **kwargs):
+        if not Path(f'{name}.epub').exists():
+            os.system(f'wget {url} -O {name}.epub')
+        Path(f'{name}.m4b').unlink(missing_ok=True)
+        os.system(f'rm {name}_chapter_*.wav')
+        merged_args = dict(voice='af_sky', pick_manually=False, speed=1.0, max_chapters=2)
+        merged_args.update(kwargs)
+        main(f'{name}.epub', **merged_args)
+        m4b_file = Path(f'{name}.m4b')
+        self.assertTrue(m4b_file.exists())
+        self.assertTrue(m4b_file.stat().st_size > 256 * 1024)

-    def test_1_mini(self):
-        Path('mini.m4b').unlink(missing_ok=True)
-        self.base(file_path='../epub/mini.epub')
-        self.assertTrue(Path('mini.m4b').exists())
+    def test_poe(self):
+        url = 'https://www.gutenberg.org/ebooks/1064.epub.images'
+        self.base('poe', url)

-    def test_2_allan_poe(self):
-        Path('poe.m4b').unlink(missing_ok=True)
-        self.base(file_path='../epub/poe.epub')
-        self.assertTrue(Path('poe.m4b').exists())
+    @unittest.skip('too slow for CI')
+    def test_orwell(self):
+        url = 'https://archive.org/download/AnimalFarmByGeorgeOrwell/Animal%20Farm%20by%20George%20Orwell.epub'
+        self.base('orwell', url)

-    def test_3_gene(self):
-        Path('gene.m4b').unlink(missing_ok=True)
-        self.base(file_path='../epub/gene.epub')
-        self.assertTrue(Path('gene.m4b').exists())
+    def test_italian_pirandello(self):
+        url = 'https://www.liberliber.eu/mediateca/libri/p/pirandello/cosi_e_se_vi_pare_1925/epub/pirandello_cosi_e_se_vi_pare_1925.epub'
+        self.base('pirandello', url, voice='im_nicola')
+        self.assertTrue(Path('pirandello.m4b').exists())
+
+    @unittest.skip('too slow for CI')
+    def test_italian_manzoni(self):
+        url = 'https://www.liberliber.eu/mediateca/libri/m/manzoni/i_promessi_sposi/epub/manzoni_i_promessi_sposi.epub'
+        self.base('manzoni', url, voice='im_nicola', max_chapters=1)
+
+    def test_french_baudelaire(self):
+        url = 'http://gallica.bnf.fr/ark:/12148/bpt6k70861t.epub'
+        self.base('baudelaire', url, voice='ff_siwis')
67  voices.py  (new file)
@@ -0,0 +1,67 @@
+flags = {
+    'a': '🇺🇸',
+    'b': '🇬🇧',
+    'e': '🇪🇸',
+    'f': '🇫🇷',
+    'h': '🇮🇳',
+    'i': '🇮🇹',
+    'j': '🇯🇵',
+    'p': '🇧🇷',
+    'z': '🇨🇳'
+}
+
+voices = {
+    'a': [
+        'af_alloy',
+        'af_aoede',
+        'af_bella',
+        'af_heart',
+        'af_jessica',
+        'af_kore',
+        'af_nicole',
+        'af_nova',
+        'af_river',
+        'af_sarah',
+        'af_sky',
+        'am_adam',
+        'am_echo',
+        'am_eric',
+        'am_fenrir',
+        'am_liam',
+        'am_michael',
+        'am_onyx',
+        'am_puck',
+        'am_santa'],
+    'b': [
+        'bf_alice',
+        'bf_emma',
+        'bf_isabella',
+        'bf_lily',
+        'bm_daniel',
+        'bm_fable',
+        'bm_george',
+        'bm_lewis'],
+    'e': ['ef_dora', 'em_alex', 'em_santa'],
+    'f': ['ff_siwis'],
+    'h': ['hf_alpha', 'hf_beta', 'hm_omega', 'hm_psi'],
+    'i': ['if_sara', 'im_nicola'],
+    'j': ['jf_alpha', 'jf_gongitsune', 'jf_nezumi', 'jf_tebukuro', 'jm_kumo'],
+    'p': ['pf_dora', 'pm_alex', 'pm_santa'],
+    'z': [
+        'zf_xiaobei',
+        'zf_xiaoni',
+        'zf_xiaoxiao',
+        'zf_xiaoyi',
+        'zm_yunjian',
+        'zm_yunxi',
+        'zm_yunxia',
+        'zm_yunyang'
+    ]
+}
+
+available_voices_str = ('\n'.join([f'  {flags[lang]} {", ".join(voices[lang])}' for lang in voices])
+                        .replace(' af_sky,', '\n af_sky,'))
+
+# for key, l in voices.items():
+#     ls = ', '.join([f'`{j}`' for j in l])
+#     print(f'| {flags[key]} | {ls} |')
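The `available_voices_str` expression above builds one flag-prefixed line per language for the CLI epilog. A tiny sketch of the same pattern on a reduced dict (this subset is chosen here for brevity, it is not the full table):

```python
flags = {'a': '🇺🇸', 'b': '🇬🇧'}
voices = {'a': ['af_sky', 'am_adam'], 'b': ['bf_emma', 'bm_george']}

# One line per language: flag followed by its comma-separated voices.
available = '\n'.join(f'  {flags[lang]} {", ".join(voices[lang])}' for lang in voices)
print(available)
```

Because both dicts preserve insertion order, the listing comes out in the same language order as the source file.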