A command-line interface that combines the text information in subtitles with the voice data in a video. It provides a convenient way to generate training data for speech recognition.
The project offers a quick way to generate audio training data for speech-recognition machine-learning models by tapping into the vast bank of annotated voice data we already have: subtitles!
It reads the subtitles line by line and clips the audio from the video for the corresponding time interval.
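To make that flow concrete, here is a minimal sketch of the loop, assuming the moviepy 1.x and pysrt APIs; the input file names and output naming are placeholders, not the project's internals:

    import pysrt
    from moviepy.editor import VideoFileClip

    subs = pysrt.open('subtitles.srt')            # annotated text with timestamps
    audio = VideoFileClip('video.mkv').audio      # the video's audio track

    for i, sub in enumerate(subs, start=1):
        start = sub.start.ordinal / 1000.0        # SubRipTime.ordinal is milliseconds
        end = sub.end.ordinal / 1000.0
        clip = audio.subclip(start, end)          # audio for this subtitle's interval
        clip.write_audiofile('{}.mp3'.format(i))  # placeholder name; see naming below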
Example usage:
$ srt_voice -fv video.mkv -fs subtitles.srt -o output_dir
The tool then walks you through a series of prompts that let you decide whether to keep or discard each audio clip, like the one below:
I know what you are.
[y: Keep] [n: Delete] [r: Repeat] [q: Quit]
Kept as 5-I_know_what_you_are-f3nKAy.mp3
------------------------------------------
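The kept file's name combines the subtitle index, the sanitised subtitle text, and a short random suffix. A hedged sketch of how such a name could be built with shortuuid (the project's exact scheme may differ):

    import re
    import shortuuid

    def clip_name(index, text):
        # Replace runs of non-alphanumeric characters with underscores,
        # then append a 6-character random suffix so names never collide.
        safe = re.sub(r'[^A-Za-z0-9]+', '_', text).strip('_')
        return '{}-{}-{}.mp3'.format(index, safe, shortuuid.uuid()[:6])

    print(clip_name(5, 'I know what you are.'))  # e.g. 5-I_know_what_you_are-f3nKAy.mp3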
It creates the directory output_dir and neatly arranges the audio clips there. The training text (UTF-8 encoded) is kept intact in the title attribute of each MP3 file.
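Because the transcript travels inside the tag, a downstream script can pair each clip with its text again. A minimal sketch using mutagen's EasyID3 interface (the file name is the example from above):

    from mutagen.easyid3 import EasyID3

    tags = EasyID3('output_dir/5-I_know_what_you_are-f3nKAy.mp3')
    print(tags['title'])  # expected: ['I know what you are.']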
For more usage options:
$ srt_voice -h
You will need these:
- Audacious Music Player
- Python 3 (optional but recommended; moviepy has some audio-syncing issues under Python 2)
Then:
$ pip install srtvoiceext
This project has been possible only because of the hard work of the maintainers of packages like:
- moviepy
- pysrt
- mutagen
- shortuuid
This project has been set up using PyScaffold 2.5.7. For details and usage information on PyScaffold see http://pyscaffold.readthedocs.org/.