Annotation fails randomly with structural variants coming from Manta #1800

Open
fperezcobos opened this issue Nov 20, 2024 · 7 comments

@fperezcobos

Describe the issue

I'm annotating a VCF produced by Manta. In the output, some of the variants are not annotated and I don't know why. In my VCF, every variant has all fields populated (including the alternative allele), and VEP emits no warning explaining why these variants are not annotated.

Additional information

Please fill in the following sections to help us find the source of your issue as quickly as possible.

System

  • VEP version: 112
  • tabix installed ? Yes

Full VEP command line

./vep -i formatted_SRX338100_SRR955759_vs_SRX2581517_SRR5277628_processed.vcf --gtf gtf_file.gz --fasta ntab.fa -o test_vep_annotation.txt --stats_file test_vep_annotation.stats.html --tab --fork 10 --buffer_size 500 --pick --minimal --dont_skip --force_overwrite

Full error message

No warning

Data files (if applicable)

They include:
gtf_file.gz
formatted_SRX338100_SRR955759_vs_SRX2581517_SRR5277628_processed.vcf.gz
test_vep_annotation.txt

@fperezcobos changed the title from "Annotations fails randomly with structural variants coming from Manta" to "Annotation fails randomly with structural variants coming from Manta" on Nov 20, 2024
@olaaustine olaaustine self-assigned this Nov 20, 2024
@olaaustine
Contributor

Hi @fperezcobos,
I hope this finds you well.
Can you tell us which species and assembly you are using so we can try to recreate the issue?
Thank you
Ola

@fperezcobos
Author

Hi @olaaustine,
I'm working with tobacco. This is the link to download the genome:
ntab.fa.gz

And here you can find more files, including gtf and gff:
https://zenodo.org/records/8256256

Thank you for the quick reply :)

@olaaustine
Contributor

olaaustine commented Nov 20, 2024

Hi @fperezcobos,
I hope you are well.
Do you have index files for the FASTA and GTF files you are using?
If not, could you create a FASTA index (.fai) with faidx and a GTF index with tabix, then run the command again?
Thank you.
Ola

@fperezcobos
Author

Hi @olaaustine,
Yes! Here they are:
ntab.fa.fai.gz
gtf_file.gz.tbi.gz
I manually added the .gz extension to both file names so that I could upload them to GitHub; please remove it.

@olaaustine
Contributor

Hi @fperezcobos,
I have not been able to recreate the issue.
There were warnings for the variants that were not annotated; I have attached them below:

WARNING: Ignoring 'five_prime_UTR' feature_type from gtf_file.gz GFF/GTF file. This feature_type is not supported in VEP.
WARNING: Ignoring 'three_prime_UTR' feature_type from gtf_file.gz GFF/GTF file. This feature_type is not supported in VEP.
WARNING: line 27 skipped (Chr22 44334344 Chr22_44334344_A_<DEL> A <DEL> ...): variant size (77552255) is bigger than --max_sv_size (10000000)
WARNING: line 33 skipped (Un00045 294530 Un00045_294530_A_A[Un00523:424[...): Chromosome Un00045 not found in annotation sources or synonyms; chromosome Un00045 does not overlap any features
WARNING: line 34 skipped (Un00047 89539 Un00047_89539_G_G]Un00432:41413]...): Chromosome Un00047 not found in annotation sources or synonyms; chromosome Un00047 does not overlap any features
WARNING: line 35 skipped (Un00090 175760 Un00090_175760_C_[Un00090:19815...): Chromosome Un00090 not found in annotation sources or synonyms; chromosome Un00090 does not overlap any features
WARNING: line 36 skipped (Un00090 198158 Un00090_198158_G_[Un00090:17576...): Chromosome Un00090 not found in annotation sources or synonyms; chromosome Un00090 does not overlap any features
WARNING: line 37 skipped (Un00138 44611 Un00138_44611_T_[Chr23:145107553...): Chromosome Un00138 not found in annotation sources or synonyms; chromosome Un00138 does not overlap any features
WARNING: line 38 skipped (Un00260 78544 Un00260_78544_A_A]Chr07:7672953]...): Chromosome Un00260 not found in annotation sources or synonyms; chromosome Un00260 does not overlap any features
WARNING: line 39 skipped (Un00387 21576 Un00387_21576_G_G]Un00041:113942...): Chromosome Un00387 not found in annotation sources or synonyms; chromosome Un00387 does not overlap any features
WARNING: line 40 skipped (Un00396 43504 Un00396_43504_G_]Un00432:41498]G...): Chromosome Un00396 not found in annotation sources or synonyms; chromosome Un00396 does not overlap any features
WARNING: line 41 skipped (Un00432 41413 Un00432_41413_A_A]Un00047:89539]...): Chromosome Un00432 not found in annotation sources or synonyms; chromosome Un00432 does not overlap any features
WARNING: line 42 skipped (Un00432 41498 Un00432_41498_G_G[Un00396:43504[...): Chromosome Un00432 not found in annotation sources or synonyms; chromosome Un00432 does not overlap any features
WARNING: line 43 skipped (Un00523 424 Un00523_424_T_]Un00045:294530]T T ...): Chromosome Un00523 not found in annotation sources or synonyms; chromosome Un00523 does not overlap any features
WARNING: line 44 skipped (Un00597 4502 Un00597_4502_G_[Un00597:8590[G G ...): Chromosome Un00597 not found in annotation sources or synonyms; chromosome Un00597 does not overlap any features
WARNING: line 45 skipped (Un00597 8590 Un00597_8590_A_[Un00597:4502[A A ...): Chromosome Un00597 not found in annotation sources or synonyms; chromosome Un00597 does not overlap any features
WARNING: line 46 skipped (Un00660 403 Un00660_403_C_]Un00662:2809]C C ]U...): Chromosome Un00660 not found in annotation sources or synonyms; chromosome Un00660 does not overlap any features
WARNING: line 47 skipped (Un00662 2809 Un00662_2809_G_G[Un00660:403[ G G...): Chromosome Un00662 not found in annotation sources or synonyms; chromosome Un00662 does not overlap any features
WARNING: line 48 skipped (Un00668 2421 Un00668_2421_T_T]Chr18:127887142]...): Chromosome Un00668 not found in annotation sources or synonyms; chromosome Un00668 does not overlap any features

It is an issue with the annotation source being used.
Let me know if this helps.
Thank you
Ola.
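
For triage, the warnings above can be grouped by cause. What follows is a minimal sketch, assuming the warnings have been saved to a text file. The function name `tally_warnings` and the category labels are illustrative and not part of VEP; the matching relies only on the message wording shown in the log above.

```python
from collections import Counter


def tally_warnings(lines):
    """Count VEP warning lines by cause, based on the message wording.

    The substrings below are copied from the warnings in this thread;
    other VEP versions may word them differently (assumption).
    """
    counts = Counter()
    for line in lines:
        if not line.startswith("WARNING:"):
            continue  # ignore anything that is not a warning line
        if "--max_sv_size" in line:
            counts["exceeds --max_sv_size"] += 1
        elif "not found in annotation sources" in line:
            counts["chromosome not in annotation"] += 1
        elif "feature_type is not supported" in line:
            counts["unsupported feature_type"] += 1
        else:
            counts["other"] += 1
    return counts
```

It could be called as `tally_warnings(open("warnings.txt"))`, where `warnings.txt` is a placeholder for wherever the warnings were saved. Only the first two categories correspond to skipped input lines; the `feature_type` warnings concern the GTF, not individual variants.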

@fperezcobos
Author

fperezcobos commented Nov 21, 2024

Hi @olaaustine,
Thanks for trying to recreate the issue :)
Actually, I reported that behaviour in issue #1798. In those cases, the FASTA file has some small contigs with no annotated genes. I think this is normal, and variants located in contigs with no genes should not raise an error.

In this issue, I am referring to other variants. The input VCF contains 48 variants. VEP emits 18 warnings during the run, so the VEP output should contain 30 annotated variants, but it contains only 14. Sixteen variants are not annotated and have no associated warning. That is the behaviour I'm reporting in this issue.

I hope I have explained the issue clearly.

Thank you for your time :)
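
The count mismatch described above can be checked by listing every input variant that is neither in the VEP output nor named in a "skipped" warning. This is a rough sketch under stated assumptions, not a definitive method: it assumes the first column of the `--tab` output (`#Uploaded_variation`) carries the VCF ID column, as the IDs in the warnings suggest, and that each "skipped" warning starts its parenthesised excerpt with the chromosome and position. The function name `silently_dropped` is hypothetical.

```python
import re

# Matches e.g. "WARNING: line 27 skipped (Chr22 44334344 ..." and captures
# the chromosome and position, which appear before any "..." truncation.
SKIPPED = re.compile(r"WARNING: line \d+ skipped \((\S+)\s+(\d+)\s")


def silently_dropped(vcf_lines, vep_tab_lines, warning_lines):
    """Return (chrom, pos, id) for input variants with no annotation and no warning."""
    skipped = set()
    for line in warning_lines:
        match = SKIPPED.search(line)
        if match:
            skipped.add((match.group(1), int(match.group(2))))

    annotated = set()
    for line in vep_tab_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        annotated.add(line.split("\t", 1)[0])  # assumed: first column is the VCF ID

    missing = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, var_id = line.rstrip("\n").split("\t")[:3]
        if var_id in annotated or (chrom, int(pos)) in skipped:
            continue
        missing.append((chrom, int(pos), var_id))
    return missing
```

Passing the opened input VCF, the VEP output file and the saved warnings to `silently_dropped` yields candidate variants that are neither annotated nor skipped. If the VCF ID column were empty (`.`), the IDs would not be comparable and the sketch would need to match on location instead.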

@olaaustine
Contributor

olaaustine commented Dec 3, 2024

Hi @fperezcobos,
I hope you are well.
Thank you for explaining the issue.
Can you give me an example of a variant in the shared input file that was neither annotated nor skipped?
Thank you
Ola
