Check out my blog on my progress and process throughout GSoC 2019!
Given an audio file of some recognizable song, autosynch will try to align its lyrics to their temporal location in the song. The song lyrics must be available on Genius.
This project is still in its early stages and is inaccurate in many cases. Optimization is a work in progress, but feel free to try it out, modify it, or contribute!
Developed during Google Summer of Code 2019 with CCExtractor.
To install, do the following:
git clone
cd autosynch
pip install -r requirements.txt
Note: autosynch is supported only on Python 3.6+.
Using autosynch requires a trained model for vocal isolation as well as PortAudio. For mp3 support, SoX is also required. On macOS/Linux, get everything by executing:
chmod a+x setup.sh
./setup.sh
If you would like to download the weights manually or get a different version, check here:
Weights must be placed into autosynch/mad_twinnet/outputs/states.
On Mac:
brew install portaudio
On Linux:
sudo apt-get update
sudo apt-get install portaudio19-dev
Note: Installing SoX is optional and only required for processing mp3 files.
On Mac:
brew install ffmpeg
On Linux:
sudo apt install ffmpeg
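To confirm the prerequisites above before the first run, a small check like the following may help. It is only a sketch and not part of autosynch: the function name check_environment is hypothetical, the PortAudio lookup is a heuristic that can miss libraries installed in nonstandard locations, and both sox and ffmpeg are probed because both are named in the instructions above.

```python
import ctypes.util
import shutil
import sys
from pathlib import Path

# Weights location as given in the instructions above; adjust it if your
# checkout lives somewhere else.
WEIGHTS_DIR = Path("autosynch/mad_twinnet/outputs/states")


def check_environment(weights_dir=WEIGHTS_DIR):
    """Return a dict mapping each prerequisite to True (found) or False."""
    return {
        "python>=3.6": sys.version_info >= (3, 6),
        # Heuristic: asks the system linker configuration for libportaudio.
        "portaudio": ctypes.util.find_library("portaudio") is not None,
        # Only needed for mp3 input.
        "sox": shutil.which("sox") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # The directory must exist and contain at least one file.
        "weights": weights_dir.is_dir() and any(weights_dir.iterdir()),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

Run it from the outer autosynch directory so that the relative weights path resolves.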
To play a song with its lyrics displayed at its calculated position:
python autosynch/playback.py [audio_file.wav] [artist] [song_title]
It will take a few minutes to perform the alignment process. To save the alignment data and skip this processing on future plays of the same audio, add the flag -s SAVE_DIR, where SAVE_DIR is the directory in which to save the alignment data.
If you have already generated and saved an alignment data file:
python autosynch/playback.py [audio_file.wav] -f [align_file.yml]
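If you call playback from another script, the two invocation forms above can be assembled programmatically. The helper below is a hypothetical sketch, not part of autosynch; it uses only the arguments and flags shown above (-s and -f) and assumes the -s flag may follow the positional arguments.

```python
import sys


def build_playback_command(audio_file, artist=None, song_title=None,
                           save_dir=None, align_file=None):
    """Assemble the argument list for autosynch/playback.py."""
    cmd = [sys.executable, "autosynch/playback.py", audio_file]
    if align_file is not None:
        # Reuse a saved alignment file: artist and title are not needed.
        cmd += ["-f", align_file]
    else:
        if artist is None or song_title is None:
            raise ValueError(
                "artist and song_title are required without an alignment file"
            )
        cmd += [artist, song_title]
        if save_dir is not None:
            # Ask playback to store the alignment data for later runs.
            cmd += ["-s", save_dir]
    return cmd
```

Passing the returned list to subprocess.run(cmd, check=True) from the outer autosynch directory should then launch playback. Building a list rather than a single string avoids quoting problems with artist names and titles that contain spaces.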
If you would like to process an mp3 file, see the installation note on mp3 support above. Running with an mp3 will automatically generate a wav file in the same directory.
Note: If you did not use setup.sh, first set your Python path by running export PYTHONPATH=$PYTHONPATH:./ from the outer autosynch directory.
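When launching playback from Python rather than from a shell, the same path setting can be applied to the child process only. This is a minimal sketch: env_with_pythonpath is a hypothetical helper, and it assumes the script is started from the outer autosynch directory.

```python
import os


def env_with_pythonpath(repo_root="./", base_env=None):
    """Return a copy of the environment with repo_root appended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Mirror the shell form PYTHONPATH=$PYTHONPATH:./ but skip the leading
    # separator when PYTHONPATH was previously unset or empty.
    env["PYTHONPATH"] = existing + os.pathsep + repo_root if existing else repo_root
    return env
```

The result can be passed as the env argument of subprocess.run, which leaves the environment of the calling process unchanged.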
Bruno Mars - Finesse
(https://www.youtube.com/watch?v=csBDM14ssts)
The last chorus lags behind a bit, but for the most part sections and lines are nicely aligned.
Fun. - We Are Young
(https://www.youtube.com/watch?v=Z-yTGKd3ji8)
The instrumental at the beginning throws off the first verse, but everything catches up by line 4.
- de Jong, N. and T. Wempe. "Praat script to detect syllable nuclei and measure speech rate automatically." Behavior Research Methods 41(2), 2009, pp. 385–390.
- Dedina, M. J. and H. C. Nusbaum. "PRONOUNCE: a program for pronunciation by analogy." Computer Speech & Language 5(1), 1991, pp. 55–64.
- Drossos, K., S. I. Mimilakis, D. Serdyuk, G. Schuller, T. Virtanen, Y. Bengio. "MaD TwinNet: Masker-denoiser architecture with twin networks for monaural sound source separation." IJCNN 2018.
- Lee, K. and M. Cremer. "Segmentation-based lyrics-audio alignment using dynamic programming." ISMIR 2008.
- Marchand, Y. and R. I. Damper. "A multistrategy approach to improving pronunciation by analogy." Computational Linguistics 26(2), 2000, pp. 196–219.
- Marchand, Y. and R. I. Damper. "Can syllabification improve pronunciation by analogy of English?" Natural Language Engineering 13(1), 2007, pp. 1–24.
- Nieto, O. and J. P. Bello. "Systematic exploration of computational music structure research." ISMIR 2016.
- Sejnowski, T. J. and C. R. Rosenberg. "Parallel networks that learn to pronounce English text." Complex Systems 1(1), 1987, pp. 145–168.