Allow c. positions for intergenic variants in the context of a genomic reference sequence #652
Comments
@ifokkema, we are waiting on the outcome of HGVSnomenclature/hgvs-nomenclature#186 (comment) before updating.
I would not support this format because it does not support the intron format. It breaks it :). See the link to the HGVS discussion above.
How do you mean it breaks the intron format? It's basically the same format as the current UTR format, but with positions calculated into the NC, the same way intronic variants are. Or do you mean that an "intronic" implementation of this description would natively be "c.-100-100", which should now become "c.-200"?
I think c.-100-100 is more correct, assuming a 100 nt UTR. My reasoning is that a position in an intron within the UTR would be c.-50-100, where the intron requires a statement of the last exonic base at the intron boundary. We should treat positions beyond the UTR in exactly the same way. c.-200 would therefore not be correct, because you need to state the last position of the UTR and then go into the flanking region. The same applies at the 3' end: c.*200 should be c.*100+100, not c.*200. Using c.-200 or c.*200 would imply that the UTR is still going.
Perhaps I misunderstood your initial example?
No, you nailed it!
There are indeed good arguments for that format, and I believe the earlier suggestion was close to it (c.-100-u100). The HVNC, however, voted against using that "intron-like" format. I believe the general idea was that it was easily confused with a (perhaps deep) intronic sequence, whereas the c.-200 format more clearly indicated that the variant could affect the promoter region. It could also, indeed, falsely suggest that this position is located in the RNA, but this was considered a less important misinterpretation than reading it as a (deeply) intronic position.
Well, I will trust you to be sensible in the vote. I think that the only format that is usable is c.-X-X etc. The addition of the u is indeed unnecessary and overcomplicated. The -200 format is misleading, and the position does not need to be deeply intronic/intergenic for the format to be clear, e.g. -100-1 is very clear.
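To make the difference between the two notations in this exchange concrete, here is a minimal Python sketch that converts between the "intron-like" form (c.-100-100, c.*100+100) and the continuous form the HVNC voted for (c.-200, c.*200). The function names and the `utr_length` parameter are hypothetical and only illustrate the arithmetic from the comments above; this is not VariantValidator code.

```python
def continuous_to_intron_like(utr_length: int, distance: int, side: str = "5'") -> str:
    """Rewrite c.-N (or c.*N) as c.-U-k (or c.*U+k) for a UTR of U nucleotides.

    `distance` is the N in c.-N or c.*N. Positions still inside the UTR
    are returned unchanged, because both notations agree there.
    """
    if utr_length < 1 or distance < 1:
        raise ValueError("utr_length and distance must be positive")
    prefix, sign = ("-", "-") if side == "5'" else ("*", "+")
    if distance <= utr_length:
        return f"c.{prefix}{distance}"
    return f"c.{prefix}{utr_length}{sign}{distance - utr_length}"


def intron_like_to_continuous(utr_length: int, offset: int, side: str = "5'") -> str:
    """Rewrite c.-U-k (or c.*U+k) as the continuous c.-(U+k) (or c.*(U+k))."""
    if utr_length < 1 or offset < 1:
        raise ValueError("utr_length and offset must be positive")
    prefix = "-" if side == "5'" else "*"
    return f"c.{prefix}{utr_length + offset}"


# The example from the thread: a 100 nt UTR and a position 100 nt beyond it.
print(continuous_to_intron_like(100, 200))        # c.-100-100
print(intron_like_to_continuous(100, 100, "3'"))  # c.*200
```

The conversion needs the UTR length in one direction only: the intron-like form carries the UTR boundary in the description itself, whereas the continuous form requires the transcript annotation to tell whether c.-200 is still inside the UTR.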
Although the vote has already occurred, the I honestly think the HVNC should start meeting on a monthly basis since we have so many things to discuss, but we'll have to make do with the current schedule. The next meeting is Dec 9th (moved from Dec 2nd), and I do believe/hope the
Don't get me started on NC_(NM_). As I said in a previous email, it is not sufficient as a unique identifier. It is also dangerous. In some very important clinical genes, an NC(NM) UCSC alignment gives a different outcome than an NC(NM) RefSeq alignment. There is also the handling of gapped alignments. This really needs a lot more thought. It is being oversimplified and is going to lead to misdiagnoses and missed diagnoses, and to a lack of reproducibility and findability in some clinical genes. P.S. I have lots of examples :)
I have never understood the rationale of the choice of reference sequence order when specifying an intronic variant. The variant description The extra information required to confirm that the nucleotide at the +1 position is derived from the genomic sequence, in this instance GRCh38, which is In written English (and perhaps in other languages) brackets are commonly used to enclose information that might be regarded as being subsidiary to what is being described or explained. For that reason, I think that the logical order for the presentation of intronic variants ought to be, for example, I have never (to my knowledge) seen a reasoned argument on behalf of "NC_(NM_)" as the logical order.
As far as I know, the logic was that the annotation of the NM was located within the GenBank file of the NC (or NC slice, more likely). As we already had the format reference_sequence(selector), the result was NC(NM). But please do take part in the (very long, by now) discussion here: HGVSnomenclature/hgvs-nomenclature#182
Thank you for the pointer to the discussion. I will have another look in the early morning when my head is clearer.
Is your feature request related to a problem? Please describe.
Yesterday, the HVNC approved the suggestion to update the nomenclature to allow c. positions for intergenic variants in the context of a genomic reference sequence. See the HVNC issue on this subject. This allows notations like
NC_000023.10(NM_004006.2):c.-128354C>T
and treats positions beyond the UTR the same as positions in introns. Currently, VV does not support this, which causes issues with variants that are (partially) intergenic.
Describe the solution you'd like
Whereas it would be entirely up to you when to take initiative in providing mappings to transcripts when given a genomic variant, I see a few changes that should be considered:
When descriptions like NC_000023.10(NM_004006.2):c.-128354C>T are given as input, VV should support mapping these to the NC (NC_000023.10:g.33357783G>A, in this case, for hg19).
Describe alternatives you've considered
N/A
Additional context
Related to #333.
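The mapping requested above can be sketched in Python as follows. This is a minimal illustration, not VariantValidator's implementation. It assumes the 5' UTR is contiguous on the genome (no intron between c.-1 and the transcript start), and the anchor `G_OF_C_MINUS_1` is a hypothetical value back-calculated from the issue's own example pair, not read from an annotation source.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Assumption: genomic position of c.-1 for NM_004006.2 on NC_000023.10,
# back-calculated from the c.-128354 <-> g.33357783 pair in this issue.
G_OF_C_MINUS_1 = 33229430


def upstream_c_to_g(c_pos: int, g_of_c_minus_1: int, strand: int) -> int:
    """Map an upstream position c.-N to a genomic coordinate.

    `strand` is +1 or -1. On the minus strand, moving upstream of the
    transcript means moving to higher genomic coordinates.
    """
    if c_pos >= 0:
        raise ValueError("only upstream (negative) c. positions are handled")
    if strand not in (1, -1):
        raise ValueError("strand must be +1 or -1")
    distance = -c_pos - 1  # nucleotides further upstream than c.-1
    return g_of_c_minus_1 - strand * distance


def c_to_g_substitution(ref_seq: str, c_pos: int, ref: str, alt: str,
                        g_of_c_minus_1: int, strand: int) -> str:
    """Build the g. description for an upstream c. substitution."""
    g_pos = upstream_c_to_g(c_pos, g_of_c_minus_1, strand)
    if strand == -1:
        # Alleles are reported on the plus strand in g. descriptions.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref_seq}:g.{g_pos}{ref}>{alt}"


print(c_to_g_substitution("NC_000023.10", -128354, "C", "T", G_OF_C_MINUS_1, -1))
# NC_000023.10:g.33357783G>A
```

A real implementation would take the anchor and strand from the transcript-to-genome alignment and would also have to handle introns in the UTR, downstream (c.*N) positions, and variants that span the transcript boundary.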