Optimize WGS VCF File Annotation for Improved Performance and Speed #1769
Comments
Hi @Ananya-swi, thanks for reaching out to us. It would be useful to know the full VEP command you are using, so we can try to identify potential speed-ups. Even without that, I can suggest one option: our Nextflow VEP, which offers a degree of parallelisation to speed up processing of large inputs. I should say that we haven't tested it on cloud compute yet, so if you do decide to try it, let us know if you encounter any challenges.
Hi @jamie-m-a, Thank you for the recommendation! I’m sharing the full VEP command I used below for your reference:
I’ll also explore the Nextflow VEP option to see if it speeds up the annotation process. If any issues arise on the cloud platform, I’ll follow up accordingly. Best regards,
Hi @Ananya-swi, no problem! Now that I can see your command, I notice you're not using forks, which can have a significant impact on speed. Some general instructions for speeding up Ensembl VEP can be found here. Let us know how you get on.
Hi @jamie-m-a, Thank you for the feedback! I wanted to clarify that I did use the --fork option, setting it to 32. However, I still observed long runtimes, with the process taking around 15 hours for a 1.8 GB input VCF. It would be great if you could share any additional optimization tips, particularly regarding other parameters that could improve performance. I’ll also explore the general recommendations in the link you shared. Looking forward to hearing from you! Thanks again,
Apologies @Ananya-swi - I missed the fork flag. The other easy thing to check is whether your input VCF is properly sorted. Your run time does seem long for a file that size. Can you advise how many variants are in your input?
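For anyone wanting to run the sorting check suggested above, here is a minimal Python sketch. It assumes "sorted" means that records for each chromosome are contiguous and that positions never decrease within a chromosome; it does not enforce any particular chromosome order. The helper names `open_vcf` and `first_unsorted_record` are hypothetical and not part of Ensembl VEP.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def first_unsorted_record(lines):
    """Return (line_number, chrom, pos) of the first out-of-order record.

    Returns None if every chromosome forms one contiguous block and
    positions are non-decreasing inside each block (an assumed definition
    of "sorted"; chromosome order itself is not checked).
    """
    seen_chroms = set()
    prev_chrom, prev_pos = None, -1
    for line_number, line in enumerate(lines, start=1):
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        if chrom != prev_chrom:
            # A chromosome that reappears after another one is out of order.
            if chrom in seen_chroms:
                return (line_number, chrom, pos)
            seen_chroms.add(chrom)
            prev_chrom, prev_pos = chrom, pos
        elif pos < prev_pos:
            return (line_number, chrom, pos)
        else:
            prev_pos = pos
    return None
```

A typical call would be `with open_vcf("input.vcf.gz") as handle: print(first_unsorted_record(handle))`, where the file name is a placeholder. A result of `None` means no out-of-order record was found under the assumption stated above.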
Hi @jamie-m-a, Thank you for your response! I appreciate the suggestion about checking the sorting of my input VCF. I have confirmed that the VCF file is sorted correctly. Regarding your question, the input VCF contains 6,139,369 variants. Thanks again for your help! Best,
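To put the reported figures in perspective, the short calculation below converts them into a throughput. It uses only the numbers quoted in this thread (6,139,369 variants, roughly 15 hours, --fork 32). The per-fork figure assumes the work was spread evenly across forks, which is a simplification, and the function name is hypothetical.

```python
def variants_per_second(variants, hours):
    """Average number of variants processed per second of wall-clock time."""
    return variants / (hours * 3600)


# Figures reported in this thread; the 15 hours is approximate.
VARIANTS = 6_139_369
HOURS = 15
FORKS = 32

overall = variants_per_second(VARIANTS, HOURS)
per_fork = overall / FORKS  # assumes an even split across forks

print(f"{overall:.1f} variants/s overall, {per_fork:.2f} variants/s per fork")
# prints "113.7 variants/s overall, 3.55 variants/s per fork"
```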
Thanks for the update, @Ananya-swi. The run time does seem slow. I'll try running some tests on a similarly sized input and get back to you.
Hi,
I am working on annotating large datasets, specifically Whole Genome Sequencing (WGS) VCF files, using the Variant Effect Predictor (VEP). However, the annotation process is taking significantly longer than expected. For example, annotating a 1.8GB VCF file took approximately 15 hours.
Environment Details:
I am seeking guidance on how to optimize VEP for faster annotation. Could you provide recommendations on:
Thank you for your support and insights.
Best regards,
Ananya Saji