1, speaker embedding extract
https://github.com/PlayVoice/VI-Speaker
2, ppg extract
https://github.com/PlayVoice/whisper_ppg
1, python prepare/preprocess_wave.py
2, python prepare/preprocess.py
3, python train.py -c configs/singing_base.json -m singing_base
4, python visvc_infer.py -s [waves] -p [ppg&hubert] -e [speaker_embedding]
VI-SVC model is just VITS without MAS and DurationPredictor. Big data [more and more wave] make things to be interesing!