Epub to Audiobook (M4B)

EPUB to M4B audiobook, using StyleTTS2 via a local TTS API.

Notes

  • This fork runs the TTS through a local server / predictor. I got it working with https://replicate.com/adirik/styletts2, but others should work as well.
  • You need approximately 5 GB of VRAM and 5 GB of RAM to run the model locally (a bit less without a custom voice).
  • Generation is designed to fail gracefully: if it crashes or errors partway through (this happens occasionally with the HF API), the next run skips the chapters that were already generated and resumes at the first missing one. This also makes it easy to split the generation of a large book across several runs. The m4b file is only built once all chapters have been generated.
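The skip-and-resume behaviour described in the last note can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the file naming in `chapter_path`, the `generate_missing` helper, and the `synthesize` callback are all hypothetical names chosen for the example.

```python
from pathlib import Path
from typing import Callable, Sequence


def chapter_path(out_dir: Path, index: int) -> Path:
    # Hypothetical naming scheme; the real script may name chapter files differently.
    return out_dir / f"chapter_{index:04d}.wav"


def generate_missing(
    out_dir: Path,
    chapters: Sequence[str],
    synthesize: Callable[[str], bytes],
) -> bool:
    """Generate audio only for chapters that have no output file yet.

    `synthesize` stands in for the call to the local TTS server.
    Returns True once every chapter file exists, which is the point at
    which the final m4b could be assembled.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    for index, text in enumerate(chapters):
        target = chapter_path(out_dir, index)
        if target.exists():
            continue  # finished in an earlier run, so skip it
        # Write to a temporary name first so a crash mid-chapter does not
        # leave a partial file that looks finished on the next run.
        partial = target.with_suffix(".part")
        partial.write_bytes(synthesize(text))
        partial.replace(target)
    return all(chapter_path(out_dir, i).exists() for i in range(len(chapters)))
```

Writing to a `.part` file and renaming it afterwards is a design choice of this sketch: it keeps "file exists" equivalent to "chapter finished", so rerunning after a crash never skips a half-written chapter.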

Directions

  • Clone this repository locally.

  • Install all dependencies, as needed: pip install -r requirements.txt. Also make sure you have ffmpeg installed!

  • Run a local StyleTTS2 server using docker run -d -p 5000:5000 --gpus=all nixolas1/styletts2-api

  • Run using python3 epub-to-audiobook-hf.py <filename-of-epub> --voice reference_voice.wav

  • You should pass the command line flag --voice reference_voice.wav. Leaving it out falls back to LJSpeech, which is faster but (imo) worse sounding, and has only one voice.
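Before starting a long run, it can help to confirm the two external prerequisites from the steps above: ffmpeg on the PATH and a server listening on port 5000. The sketch below is an optional preflight check and is not part of this repository; the function names are made up for the example, and it only tests that the port accepts a TCP connection, not that the StyleTTS2 API behind it is healthy.

```python
import shutil
import socket


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


def server_reachable(host: str = "127.0.0.1", port: int = 5000, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Port 5000 matches the `-p 5000:5000` mapping in the docker command;
    the host is an assumption (a server running on the same machine).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `ffmpeg_available()` and `server_reachable()` before launching the script gives a quick yes/no on each prerequisite without starting any generation.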

A Big Thanks To: