gene2transcripts and gene2transcripts_v2 don't like HGNC IDs. #578

ifokkema · 2024-01-23T11:58:12Z

Describe the bug
API endpoints gene2transcripts and gene2transcripts_v2 allow for genes to be passed as "HGNC:2197". That's great for genes that have recently changed their symbols, and I'm going to use this now. However, the "HGNC:" addition is required but undocumented. If sent as "2197", calls return an HTTP 500. It actually took me some time to realize I needed to add "HGNC:" and I was preparing this bug report as an "it doesn't work" when I realized what the required format was.

To Reproduce
Steps to reproduce the behavior:

Sending a gene symbol works and lists the transcripts. https://rest.variantvalidator.org/VariantValidator/tools/gene2transcripts_v2/COL1A1/False/refseq/GRCh37?content-type=application%2Fjson
Sending "HGNC:2197" works and lists the transcripts. https://rest.variantvalidator.org/VariantValidator/tools/gene2transcripts_v2/HGNC:2197/False/refseq/GRCh37?content-type=application%2Fjson
Sending just the numeric ID doesn't work and returns an HTTP 500. https://rest.variantvalidator.org/VariantValidator/tools/gene2transcripts_v2/2197/False/refseq/GRCh37?content-type=application%2Fjson
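For scripted checks, the three requests above can be rebuilt from their parts. This is a minimal sketch: `g2t_v2_url` is a hypothetical helper, and the names given to the path segments (a transcript-limiting flag, the transcript set, the genome build) are assumptions inferred only from the URLs in this report.

```python
from urllib.parse import quote, urlencode

BASE = "https://rest.variantvalidator.org/VariantValidator/tools/gene2transcripts_v2"


def g2t_v2_url(gene: str, limit: str = "False", transcript_set: str = "refseq",
               genome_build: str = "GRCh37") -> str:
    """Build a gene2transcripts_v2 URL (hypothetical helper; segment names are assumptions)."""
    # Keep ':' literal so "HGNC:2197" appears exactly as in the URLs above.
    segments = [quote(s, safe=":") for s in (gene, limit, transcript_set, genome_build)]
    # urlencode percent-encodes the '/' in the MIME type as %2F.
    query = urlencode({"content-type": "application/json"})
    return "/".join([BASE, *segments]) + "?" + query


for gene in ("COL1A1", "HGNC:2197", "2197"):
    print(g2t_v2_url(gene))
```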

Expected behavior

Either the numeric ID should be interpreted as an HGNC ID, or the Swagger interface should document how the HGNC ID must be passed, preferably with an example.
The gene2transcripts endpoint ("v1") could also document that it accepts HGNC IDs; this is currently undocumented on the Swagger interface as well.

Thank you!

EDIT

Also, not all HGNC IDs work. HGNC:7414 doesn't work, while its gene symbol, MT-ATP6, does.

[
  {
    "error": "Unable to recognise gene symbol NO DATA",
    "requested_symbol": "NO DATA"
  }
]
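This failure comes back inside the JSON array as an entry with an `error` key, alongside any successful entries, so a client has to inspect each element. A minimal sketch, assuming only the response shape shown in this thread; `split_results` is a hypothetical helper.

```python
import json


def split_results(payload: str):
    """Split a gene2transcripts_v2 JSON array into (found, errors).

    Assumes every element is an object and that failures carry an "error" key.
    """
    entries = json.loads(payload)
    found = [e for e in entries if "error" not in e]
    errors = [e for e in entries if "error" in e]
    return found, errors


response = """[
  {"error": "Unable to recognise gene symbol NO DATA", "requested_symbol": "NO DATA"}
]"""
found, errors = split_results(response)
for e in errors:
    print(f"{e['requested_symbol']}: {e['error']}")
```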

Also, when using MT-ATP6, VV uses "MT" as the name for chromosome "M".

[
  {
    "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
    "current_symbol": "MT-ATP6",
    "hgnc": "HGNC:7414",
    "previous_symbol": "MTATP6,RP",
    "requested_symbol": "MT-ATP6",
    "transcripts": [
      {
        "annotations": {
          "chromosome": "MT",
          "db_xref": {
            "CCDS": null,
            "ensemblgene": "ENSG00000198899",
            "hgnc": "HGNC:7414",
            "ncbigene": null,
            "select": "Ensembl"
          },
          "ensembl_select": true,
          "mane_plus_clinical": false,
          "mane_select": false,
          "map": "chrMT:8527:9207",
          "note": "mitochondrially encoded ATP synthase membrane subunit 6",
          "refseq_select": false,
          "variant": "ATP6"
        },
        "coding_end": 681,
        "coding_start": 1,
        "description": "MT-ATP6-201",
        "genomic_spans": {},
        "length": 681,
        "reference": "ENST00000361899.2",
        "translation": "ENSP00000354632.2"
      }
    ]
  }
]
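Ensembl names the mitochondrial chromosome "MT", while UCSC-style names use "chrM" (the `map` field above even mixes the two as "chrMT"). A client that prefers one convention can normalise on its side. This is a minimal sketch covering only primary-assembly chromosome names, not alt contigs or patches; `to_ucsc_chrom` is a hypothetical helper.

```python
def to_ucsc_chrom(name: str) -> str:
    """Map Ensembl-style names ("1", "X", "MT") to UCSC-style ("chr1", "chrX", "chrM")."""
    name = name.strip()
    # Drop an existing "chr" prefix so "chrMT" and "MT" are handled the same way.
    if name.lower().startswith("chr"):
        name = name[3:]
    # Ensembl calls the mitochondrial genome "MT"; UCSC calls it "chrM".
    if name.upper() in ("MT", "M"):
        return "chrM"
    return "chr" + name


for raw in ("MT", "chrMT", "1", "X"):
    print(raw, "->", to_ucsc_chrom(raw))
```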


Peter-J-Freeman · 2024-02-16T12:16:58Z

This will be a documentation change @ifokkema. I will not just accept the numeric value, as it may get confused with the numeric value of NIH gene IDs.

Peter-J-Freeman · 2024-02-16T12:39:39Z

Also, when using MT-ATP6, VV uses "MT" as the name for chromosome "M".

This is annotation provided by Ensembl directly. Not from us. Ensembl need to be more responsible for their standards. We correct as much as we can :)

Peter-J-Freeman · 2024-02-16T14:44:38Z

OK, all these are fixed. Going to close, but I still need to update the server @ifokkema, so please nudge me next week.

ifokkema · 2024-02-21T09:21:47Z

This will be a documentation change @ifokkema. I will not just accept the numeric value, as it may get confused with the numeric value of NIH gene IDs.

Makes sense! And a doc fix is just fine!

This is annotation provided by Ensembl directly. Not from us. Ensembl need to be more responsible for their standards. We correct as much as we can :)

Weeiiiird! OK, thanks!

OK, all these are fixed. Going to close, but I still need to update the server @ifokkema, so please nudge me next week.

Excellent, thanks a lot! There's no rush, but when you do update the server, please let me know and I'll have another look!

Peter-J-Freeman · 2024-02-21T09:27:52Z

I want to say weird, but Ensembl do this sort of thing.

There was a bug, though. The gene symbol was coming out as MT, not MT-ATP6; that is now fixed. Now that the code is fixed, some slight changes will also appear once we update the databases in the next few days.

Hope to release the new software version next week.

ifokkema · 2024-07-15T13:25:44Z

Hi Pete!
Just to be sure:

Sending a numeric HGNC ID still returns an HTTP 500 - is this intended, or should it show a warning/error now?
Using HGNC:7414 for mitochondrial genes doesn't work yet; the gene symbol shows up as "MT", as you mentioned in February.

Peter-J-Freeman · 2024-07-15T13:48:45Z

r.e. HGNC:7414, it looks like there is another issue that is causing MT to migrate into the db instead of MT-something. I will look at this and see if I can patch rather than do a new release.

The numeric HGNC entry should return an error. But do we want it to? The main reason we may want to require the "HGNC:" prefix is that in the future we may WANT TO use other numeric gene searches???

ifokkema · 2024-07-15T14:01:43Z

r.e. HGNC:7414, it looks like there is another issue that is causing MT to migrate into the db instead of MT-something. I will look at this and see if I can patch rather than do a new release.

Cool, thanks!

The numeric HGNC entry should return an error. But do we want it to? The main reason we may want to require the "HGNC:" prefix is that in the future we may WANT TO use other numeric gene searches???

Personally, in LOVD, I consider all numeric references to genes as HGNC IDs. My logic is simply that the HGNC hands out the gene symbols and they name the genes. They're the representative source, so I use their IDs. I actually don't know why they prefix their numeric IDs with "HGNC:" as I've never seen other resources prefix their numeric IDs. I do see the benefit, of course, as it identifies the ID. However, I also see a downside, as it causes inconsistent use of the prefix and, therefore, ambiguity in the ID. Either way, I show NCBI gene IDs, but I don't use them as keys or so. So they don't clash in LOVD. NCBI gene IDs are only used for linking to the NCBI. If you want to keep the possibility open to use multiple numeric identifiers, by all means, don't accept the numeric input. However, I would recommend returning an error rather than an HTTP 500.
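A minimal sketch of returning an error entry instead of an HTTP 500. `lookup_gene`, `by_symbol` and `by_hgnc` are hypothetical stand-ins, not VariantValidator's code; the error wording mirrors the responses in this thread. A bare number is deliberately not treated as an HGNC ID here, so it falls through to the symbol lookup and comes back as an ordinary error entry.

```python
def lookup_gene(query: str, by_symbol: dict, by_hgnc: dict) -> dict:
    """Resolve a gene query, returning an error entry instead of raising.

    `by_symbol` and `by_hgnc` are hypothetical in-memory stand-ins for the database.
    """
    key = query.strip()
    if key.upper().startswith("HGNC:"):
        record = by_hgnc.get("HGNC:" + key[5:])
    else:
        record = by_symbol.get(key)
    if record is None:
        # Unknown or unparseable input becomes a normal JSON entry, not a server error.
        return {"error": f"Unable to recognise gene symbol {key}", "requested_symbol": key}
    return {**record, "requested_symbol": record["current_symbol"]}


col1a1 = {"current_symbol": "COL1A1", "hgnc": "HGNC:2197"}
by_symbol = {"COL1A1": col1a1}
by_hgnc = {"HGNC:2197": col1a1}

print(lookup_gene("HGNC:2197", by_symbol, by_hgnc))
print(lookup_gene("2197", by_symbol, by_hgnc))
```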

Peter-J-Freeman · 2024-07-15T14:35:31Z

The only other one I am aware of is GenBank gene IDs. But we do not currently use these, so I see no real issue with dropping the HGNC prefix from the input.

Peter-J-Freeman · 2024-07-15T15:39:24Z

OK, the code is fixed, but I think it will need another database build for HGNC:7414. This is not a quick process. I need to liaise with @John-F-Wagstaff.

John-F-Wagstaff · 2024-07-15T16:17:11Z

We may still want to allow users to include 'HGNC:', even if we do allow just the plain number, as others including the NCBI do (in their genbank records for example you get '/db_xref="HGNC:HGNC:25180"'). Also the number of users that write things down without context, unless prompted, is large enough that I would prefer to keep the 'HGNC:' prefix on the output too.
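A minimal sketch of input parsing along those lines: it accepts a bare number, "HGNC:<n>", and NCBI's doubled "HGNC:HGNC:<n>", and always returns the prefixed form for output. `normalise_hgnc_id` is a hypothetical helper. Accepting the bare number follows the decision later in this thread to treat "1234" and "HGNC:1234" alike; it would clash with NCBI Gene IDs if those were ever accepted as plain numbers too.

```python
import re

# Optional "HGNC:" prefix, possibly repeated as in NCBI's /db_xref="HGNC:HGNC:25180",
# followed by the numeric ID.
_HGNC_ID = re.compile(r"^(?:HGNC:)*(\d+)$", re.IGNORECASE)


def normalise_hgnc_id(value: str):
    """Return the canonical "HGNC:<n>" form, or None if `value` is not an HGNC ID.

    Anything else (e.g. a gene symbol) yields None so the caller can try a symbol lookup.
    """
    match = _HGNC_ID.match(value.strip())
    return f"HGNC:{match.group(1)}" if match else None


for raw in ("25180", "HGNC:25180", "HGNC:HGNC:25180", "MT-ATP6"):
    print(raw, "->", normalise_hgnc_id(raw))
```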

The only transcript we include currently for this in the underlying VVTA is ENST00000361899.2. The RefSeq record that can be found in the HGNC record for this is a 'YP_' with a DBSOURCE of " REFSEQ: accession NC_012920.1", it is currently a "PROVISIONAL REFSEQ" and has no associated transcript. We don't include any protein sequences without transcripts so this is missed out.

I am intending to build a new version of the VVTA soon, @Peter-J-Freeman should I bump this up the priority queue?

Peter-J-Freeman · 2024-07-15T16:19:20Z

I think new versions of all dbs are needed. I found some more errors in the validator: newline characters in some fields. That explains why the updates weren't successful! 🙄

John-F-Wagstaff · 2024-07-15T16:35:29Z

The RefSeq alignments have moved to https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/historical/GRCh38/current/ but have not been updated since 2023-09-18. Hence I am not moving too fast on this: new RefSeq transcript data is all well and good, but without alignments to go with it, it is not much use for the alignment database, because I can't load identifiers that don't have alignments.

I can get you updated RefSeqGene data, HGNC gene data for 2024/06, and Ensembl Release 112 (released in May), though. I will try to get it done before Friday.

Have you seen any issues in the VVTA data other than the Ensembl stuff that we already patched?

Peter-J-Freeman · 2024-07-16T08:35:19Z

Nope, no additional errors that I am aware of. Just the patch to make sure of. The RefSeqGene FE data will be handled via the validator database. We just need to make sure the sequences are in SeqRepo, which I believe they are now. Perhaps we need to contact RefSeq and find out why they stopped producing alignments. We need this data and it is vital. Surely they need it too.

Peter-J-Freeman · 2024-07-16T08:42:11Z

Also @John-F-Wagstaff once done, please can I have the file of all transcript IDs in VVTA. Thanks

leicray · 2024-07-16T08:55:48Z

The link to the RefSeq alignments just does not look right. The URL path includes the directory historical, and that simply feels wrong. Why would the current alignments be classified as historical?

I wonder if, instead, the current data are to be found in Homo_sapiens.gene_info.gz

John-F-Wagstaff · 2024-07-16T09:17:11Z

@leicray They archived the current alignments and put a link in the FTP to this location, after producing this newer file set in parallel for a while. It is called historical because it includes all historic transcript variants back to a certain cut-off date, as well as the current data. Yes, the naming is bad. I have checked elsewhere, and the RefSeq annotation pipeline last ran to completion on a human genome at that date too, so there should not be newer alignment data either way. Unfortunately, the gene_info files only include map locations per whole gene, in the form 19q13.43, which does not work for us.

@Peter-J-Freeman I will get the transcript IDs to you as soon as the database is finished.

ifokkema · 2024-07-16T10:08:32Z

Thanks, guys!

We may still want to allow users to include 'HGNC:', even if we do allow just the plain number, as others including the NCBI do (in their genbank records for example you get '/db_xref="HGNC:HGNC:25180"'). Also the number of users that write things down without context, unless prompted, is large enough that I would prefer to keep the 'HGNC:' prefix on the output too.

Oh, yes, never remove allowed input or change the formatting of a variable in the output in a "live" API that doesn't have versioning! I'm personally OK with adding additional fields to a JSON API, as I assume that existing implementations won't crash if additional data is returned. Other implementers are more strict and even increase the version number when adding fields. In any case, allowing more diverse input doesn't change existing implementations ever, so IMO never requires an increment of the version number.
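The rule above (fields may be added, but existing ones must not disappear or change type) can be checked mechanically by comparing an old response with a new one. A minimal sketch; `removed_or_retyped` is a hypothetical helper. It checks structure and types only, so a change of value format, such as dropping the "HGNC:" prefix from a string, would need a separate check.

```python
def removed_or_retyped(old, new, path="$"):
    """List paths present in `old` but missing from, or retyped in, `new`.

    Extra fields in `new` are allowed. Lists are compared element by element
    up to the shorter length; null on either side is not treated as a type change.
    """
    problems = []
    if isinstance(old, dict):
        if not isinstance(new, dict):
            return [f"{path}: expected an object"]
        for key, value in old.items():
            if key not in new:
                problems.append(f"{path}.{key}: removed")
            else:
                problems.extend(removed_or_retyped(value, new[key], f"{path}.{key}"))
    elif isinstance(old, list):
        if not isinstance(new, list):
            return [f"{path}: expected an array"]
        for i, (a, b) in enumerate(zip(old, new)):
            problems.extend(removed_or_retyped(a, b, f"{path}[{i}]"))
    elif old is not None and new is not None and type(old) is not type(new):
        problems.append(f"{path}: type changed from {type(old).__name__} to {type(new).__name__}")
    return problems


old = {"hgnc": "HGNC:7414", "transcripts": [{"length": 681}]}
added = {"hgnc": "HGNC:7414", "transcripts": [{"length": 681, "genomic_spans": {}}]}
retyped = {"hgnc": 7414, "transcripts": [{"length": 681}]}

print(removed_or_retyped(old, added))
print(removed_or_retyped(old, retyped))
```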

Peter-J-Freeman · 2024-07-16T11:19:16Z

The updated code will accept "HGNC:1234" or "1234" and return the same result.

Just not pushing yet because having a few database difficulties :)

Peter-J-Freeman · 2024-07-16T11:20:27Z

We will be updating the version numbers for all tools because recent changes to the VV engine required breaking changes, and I like to keep the major versions of all tools the same. It may not be correct engineering practice, but it prevents my brain from hurting.

Peter-J-Freeman · 2024-07-17T08:24:38Z

Sending a numeric HGNC ID still returns an HTTP 500 - is this intended, or should it show a warning/error now?

Now working and active on the server @ifokkema

on my system, this

import json
import VariantValidator
vval = VariantValidator.Validator()
gene = '7414'
select_transcripts = None
g_and_t = vval.gene2transcripts(gene, validator=vval, select_transcripts=select_transcripts, transcript_set="ensembl")
print(json.dumps(g_and_t, sort_keys=True, indent=4, separators=(',', ': ')))

will now return

{
    "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
    "current_symbol": "MT-ATP6",
    "hgnc": "HGNC:7414",
    "previous_symbol": "MTATP6,RP",
    "requested_symbol": "MT-ATP6",
    "transcripts": [
        {
            "annotations": {
                "chromosome": "MT",
                "db_xref": {
                    "CCDS": null,
                    "ensemblgene": "ENSG00000198899",
                    "hgnc": "HGNC:7414",
                    "ncbigene": null,
                    "select": "Ensembl"
                },
                "ensembl_select": true,
                "mane_plus_clinical": false,
                "mane_select": false,
                "map": "mitochondria",
                "note": "mitochondrially encoded ATP synthase membrane subunit 6",
                "refseq_select": false,
                "variant": "201"
            },
            "coding_end": 681,
            "coding_start": 1,
            "description": "ATP6-201",
            "genomic_spans": {},
            "length": 681,
            "reference": "ENST00000361899.2",
            "translation": "ENSP00000354632.2"
        }
    ]
}

We will roll out new database builds ASAP to make this work on the server. This shows what a patch would look like, but we want to make a full db release.

Peter-J-Freeman · 2024-07-17T08:26:43Z

Hmm, it seems I need to fix the alignments. They are missing!!! I will look into this, since it works for other genes, e.g. COL1A1.

Peter-J-Freeman · 2024-07-17T08:41:37Z

Now also fixed, but again, will not work until the dbs are recreated. Will take a few weeks

ifokkema · 2024-07-24T15:07:13Z

We will be updating the version numbers for all tools because recent changes to the VV engine required breaking changes, and I like to keep the major versions of all tools the same. It may not be correct engineering practice, but it prevents my brain from hurting.

I meant the API version, e.g., /api/v1/method?arguments vs /api/v2/method?arguments. The versions in the meta data of the output are a different thing. I meant that as long as the endpoint isn't versioned, stuff from the output shouldn't be removed, input requirements shouldn't be changed, but additions to the output are generally OK.

Now also fixed, but again, will not work until the dbs are recreated. Will take a few weeks

Excellent, thanks!

ifokkema · 2024-11-14T12:39:28Z

OK, the code is fixed, but I think it will need another database build for HGNC:7414. This is not a quick process. I need to liaise with @John-F-Wagstaff.

Hi Pete, I'm going through old emails; this doesn't work yet (sending HGNC:7414 to the gene2transcripts_v2 when the gene is a mitochondrial gene). Is the mentioned database build delayed, or didn't it fix the problem? Thanks!

Peter-J-Freeman · 2024-11-14T12:51:13Z

Not sure why this keeps popping back up. Will look asap

Peter-J-Freeman · 2024-11-15T10:08:10Z

Looking at this again.

On my local setup, I see:

>>> import json
>>> import VariantValidator
>>> vval = VariantValidator.Validator()
>>> gene = '["HGNC:7414", "MT-APT6"]'
>>> g_and_t = vval.gene2transcripts(gene)
>>> print(json.dumps(g_and_t, sort_keys=True, indent=4, separators=(',', ': ')))
[
    {
        "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
        "current_symbol": "MT-ATP6",
        "hgnc": "HGNC:7414",
        "previous_symbol": "MTATP6,RP",
        "requested_symbol": "MT-ATP6",
        "transcripts": []
    },
    {
        "error": "Unable to recognise gene symbol MT-APT6",
        "requested_symbol": "MT-APT6"
    }
]
>>>

Which looks like the HGNC ID is working but the Symbol is not.

Peter-J-Freeman · 2024-11-15T10:11:29Z

And now without the typo

>>> import json
>>> import VariantValidator
>>> vval = VariantValidator.Validator()
>>> gene = '["HGNC:7414", "MT-ATP6"]'
>>> g_and_t = vval.gene2transcripts(gene)
>>> print(json.dumps(g_and_t, sort_keys=True, indent=4, separators=(',', ': ')))
[
    {
        "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
        "current_symbol": "MT-ATP6",
        "hgnc": "HGNC:7414",
        "previous_symbol": "MTATP6,RP",
        "requested_symbol": "MT-ATP6",
        "transcripts": []
    },
    {
        "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
        "current_symbol": "MT-ATP6",
        "hgnc": "HGNC:7414",
        "previous_symbol": "MTATP6,RP",
        "requested_symbol": "MT-ATP6",
        "transcripts": []
    }
]
>>>

So all is working. Now to test the server since the db is good and the code is good

Peter-J-Freeman · 2024-11-15T10:14:54Z

The server setup is showing

>>> import json
>>> import VariantValidator
>>> vval = VariantValidator.Validator()
>>> gene = '["HGNC:7414", "MT-ATP6"]'
>>> g_and_t = vval.gene2transcripts(gene)
>>> print(json.dumps(g_and_t, sort_keys=True, indent=4, separators=(',', ': ')))
[
    {
        "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
        "current_symbol": "MT-ATP6",
        "hgnc": "HGNC:7414",
        "previous_symbol": "MTATP6,RP",
        "requested_symbol": "MT-ATP6",
        "transcripts": []
    },
    {
        "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
        "current_symbol": "MT-ATP6",
        "hgnc": "HGNC:7414",
        "previous_symbol": "MTATP6,RP",
        "requested_symbol": "MT-ATP6",
        "transcripts": []
    }
]
>>>

So is working. So, now to look at whether the REST interface is the issue

Peter-J-Freeman · 2024-11-15T10:21:08Z

local rest interface
http://127.0.0.1:8000/VariantValidator/tools/gene2transcripts_v2/HGNC%3A7414%7CMT-ATP6/False/all/GRCh38?content-type=application%2Fjson
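The same two-gene query is written differently for the two interfaces: the Python sessions above pass a JSON-encoded list as one string, while the REST URL joins the genes with "|" and percent-encodes the path segment. This sketch only reproduces those two encodings; that "|" is the REST separator is an assumption based on these examples alone.

```python
import json
from urllib.parse import quote

genes = ["HGNC:7414", "MT-ATP6"]

# Python API, as in the sessions above: a JSON-encoded list passed as a single string.
python_arg = json.dumps(genes)

# REST path segment: genes joined with "|", then ':' and '|' percent-encoded ('-' is left alone).
rest_segment = quote("|".join(genes), safe="")

print(python_arg)
print(rest_segment)
```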

[
  {
    "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
    "current_symbol": "MT-ATP6",
    "hgnc": "HGNC:7414",
    "previous_symbol": "MTATP6,RP",
    "requested_symbol": "MT-ATP6",
    "transcripts": [
      {
        "annotations": {
          "chromosome": "MT",
          "db_xref": {
            "CCDS": null,
            "ensemblgene": "ENSG00000198899",
            "hgnc": "HGNC:7414",
            "ncbigene": null,
            "select": "Ensembl"
          },
          "ensembl_select": true,
          "mane_plus_clinical": false,
          "mane_select": false,
          "map": "mitochondria",
          "note": "mitochondrially encoded ATP synthase membrane subunit 6",
          "refseq_select": false,
          "variant": "201"
        },
        "coding_end": 681,
        "coding_start": 1,
        "description": "ATP6-201",
        "genomic_spans": {
          "NC_012920.1": {
            "end_position": 9207,
            "exon_structure": [
              {
                "cigar": "681=",
                "exon_number": 1,
                "genomic_end": 9207,
                "genomic_start": 8527,
                "transcript_end": 681,
                "transcript_start": 1
              }
            ],
            "orientation": 1,
            "start_position": 8527,
            "total_exons": 1
          }
        },
        "length": 681,
        "reference": "ENST00000361899.2",
        "translation": "ENSP00000354632.2"
      }
    ]
  },
  {
    "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
    "current_symbol": "MT-ATP6",
    "hgnc": "HGNC:7414",
    "previous_symbol": "MTATP6,RP",
    "requested_symbol": "MT-ATP6",
    "transcripts": [
      {
        "annotations": {
          "chromosome": "MT",
          "db_xref": {
            "CCDS": null,
            "ensemblgene": "ENSG00000198899",
            "hgnc": "HGNC:7414",
            "ncbigene": null,
            "select": "Ensembl"
          },
          "ensembl_select": true,
          "mane_plus_clinical": false,
          "mane_select": false,
          "map": "mitochondria",
          "note": "mitochondrially encoded ATP synthase membrane subunit 6",
          "refseq_select": false,
          "variant": "201"
        },
        "coding_end": 681,
        "coding_start": 1,
        "description": "ATP6-201",
        "genomic_spans": {
          "NC_012920.1": {
            "end_position": 9207,
            "exon_structure": [
              {
                "cigar": "681=",
                "exon_number": 1,
                "genomic_end": 9207,
                "genomic_start": 8527,
                "transcript_end": 681,
                "transcript_start": 1
              }
            ],
            "orientation": 1,
            "start_position": 8527,
            "total_exons": 1
          }
        },
        "length": 681,
        "reference": "ENST00000361899.2",
        "translation": "ENSP00000354632.2"
      }
    ]
  }
]

Which makes me happy because we can now generate c. and p. for mito genes thanks to Ensembl

Peter-J-Freeman · 2024-11-15T10:24:06Z

http://127.0.0.1:8000/VariantValidator/tools/gene2transcripts_v2/HGNC%3A7414%7CMT-ATP6/False/all/GRCh38?content-type=application%2Fjson

gives an error

[
  {
    "error": "Unable to recognise gene symbol MT",
    "requested_symbol": "MT"
  },
  {
    "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
    "current_symbol": "MT-ATP6",
    "hgnc": "HGNC:7414",
    "previous_symbol": "MTATP6,RP",
    "requested_symbol": "MT-ATP6",
    "transcripts": [
      {
        "annotations": {
          "chromosome": "MT",
          "db_xref": {
            "CCDS": null,
            "ensemblgene": "ENSG00000198899",
            "hgnc": "HGNC:7414",
            "ncbigene": null,
            "select": "Ensembl"
          },
          "ensembl_select": true,
          "mane_plus_clinical": false,
          "mane_select": false,
          "map": "chrMT:8527:9207",
          "note": "mitochondrially encoded ATP synthase membrane subunit 6",
          "refseq_select": false,
          "variant": "ATP6"
        },
        "coding_end": 681,
        "coding_start": 1,
        "description": "MT-ATP6-201",
        "genomic_spans": {
          "NC_012920.1": {
            "end_position": 9207,
            "exon_structure": [
              {
                "cigar": "681=",
                "exon_number": 1,
                "genomic_end": 9207,
                "genomic_start": 8527,
                "transcript_end": 681,
                "transcript_start": 1
              }
            ],
            "orientation": 1,
            "start_position": 8527,
            "total_exons": 1
          }
        },
        "length": 681,
        "reference": "ENST00000361899.2",
        "translation": "ENSP00000354632.2"
      }
    ]
  }
]

I will look at the rest interface. May need an update

Peter-J-Freeman · 2024-11-15T10:40:44Z

OK, I updated the server with the latest local version.

local versions are

[VariantValidator](https://github.com/openvar/rest_variantValidator) version 2.2.1.dev685+g607f552
[VariantFormatter](https://github.com/openvar/variantFormatter) version 2.2.1.dev66+g99f5b9a
[vv_hgvs](https://github.com/openvar/vv_hgvs) version 2.2.0
[VVTA](https://www528.lamp.le.ac.uk/) release vvta_2024_09
[vvSeqRepo](https://www528.lamp.le.ac.uk/) release VV_SR_2024_09

Live versions are

[VariantValidator](https://github.com/openvar/rest_variantValidator) version 2.2.1.dev734+ga70a50c
[VariantFormatter](https://github.com/openvar/variantFormatter) version 2.2.1.dev73+g6cb7954
[vv_hgvs](https://github.com/openvar/vv_hgvs) version 2.2.0
[VVTA](https://www528.lamp.le.ac.uk/) release vvta_2024_01
[vvSeqRepo](https://www528.lamp.le.ac.uk/) release VV_SR_2024_04

These were out of sync because they were on different branches. I'm going to test a local install from master.

The versions are now in line with the live ones:

{'variantvalidator_version': '2.2.1.dev734+ga70a50c', 'variantvalidator_hgvs_version': '2.2.0', 'vvta_version': 'vvta_2024_09', 'vvseqrepo_db': '/Users/user/variantvalidator_data/seqdata/VV_SR_2024_09/master', 'vvdb_version': 'vvdb_2024_8'}

Note: the updated VVTA and SR do not affect this; we already know from the above that the validator db has the correct info.
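Comparing the two version listings by eye is how the branch mismatch was spotted. A small sketch that does the same comparison using the local and live versions quoted above; `version_drift` and `commit_of` are hypothetical helpers, and `commit_of` assumes the versions follow the setuptools_scm convention in which `+g<hash>` carries the abbreviated git commit.

```python
local = {
    "VariantValidator": "2.2.1.dev685+g607f552",
    "VariantFormatter": "2.2.1.dev66+g99f5b9a",
    "vv_hgvs": "2.2.0",
    "VVTA": "vvta_2024_09",
    "vvSeqRepo": "VV_SR_2024_09",
}
live = {
    "VariantValidator": "2.2.1.dev734+ga70a50c",
    "VariantFormatter": "2.2.1.dev73+g6cb7954",
    "vv_hgvs": "2.2.0",
    "VVTA": "vvta_2024_01",
    "vvSeqRepo": "VV_SR_2024_04",
}


def version_drift(a: dict, b: dict) -> dict:
    """Return {component: (a_version, b_version)} for every component that differs."""
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


def commit_of(version: str):
    """Extract the abbreviated git commit from a "...+g<hash>" version string, if present."""
    _, sep, local_part = version.partition("+g")
    # A dirty build may append ".dYYYYMMDD" after the hash; keep only the hash.
    return local_part.split(".")[0] if sep else None


for component, (loc, liv) in version_drift(local, live).items():
    print(f"{component}: local {loc} vs live {liv}")

# Commits behind the two VariantValidator builds, for comparison with `git log`.
print(commit_of(local["VariantValidator"]), commit_of(live["VariantValidator"]))
```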

http://127.0.0.1:8000/VariantValidator/tools/gene2transcripts_v2/HGNC%3A7414%7CMT-ATP6/False/all/GRCh38?content-type=application%2Fjson

[
  {
    "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
    "current_symbol": "MT-ATP6",
    "hgnc": "HGNC:7414",
    "previous_symbol": "MTATP6,RP",
    "requested_symbol": "MT-ATP6",
    "transcripts": [
      {
        "annotations": {
          "chromosome": "MT",
          "db_xref": {
            "CCDS": null,
            "ensemblgene": "ENSG00000198899",
            "hgnc": "HGNC:7414",
            "ncbigene": null,
            "select": "Ensembl"
          },
          "ensembl_select": true,
          "mane_plus_clinical": false,
          "mane_select": false,
          "map": "mitochondria",
          "note": "mitochondrially encoded ATP synthase membrane subunit 6",
          "refseq_select": false,
          "variant": "201"
        },
        "coding_end": 681,
        "coding_start": 1,
        "description": "ATP6-201",
        "genomic_spans": {
          "NC_012920.1": {
            "end_position": 9207,
            "exon_structure": [
              {
                "cigar": "681=",
                "exon_number": 1,
                "genomic_end": 9207,
                "genomic_start": 8527,
                "transcript_end": 681,
                "transcript_start": 1
              }
            ],
            "orientation": 1,
            "start_position": 8527,
            "total_exons": 1
          }
        },
        "length": 681,
        "reference": "ENST00000361899.2",
        "translation": "ENSP00000354632.2"
      }
    ]
  },
  {
    "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
    "current_symbol": "MT-ATP6",
    "hgnc": "HGNC:7414",
    "previous_symbol": "MTATP6,RP",
    "requested_symbol": "MT-ATP6",
    "transcripts": [
      {
        "annotations": {
          "chromosome": "MT",
          "db_xref": {
            "CCDS": null,
            "ensemblgene": "ENSG00000198899",
            "hgnc": "HGNC:7414",
            "ncbigene": null,
            "select": "Ensembl"
          },
          "ensembl_select": true,
          "mane_plus_clinical": false,
          "mane_select": false,
          "map": "mitochondria",
          "note": "mitochondrially encoded ATP synthase membrane subunit 6",
          "refseq_select": false,
          "variant": "201"
        },
        "coding_end": 681,
        "coding_start": 1,
        "description": "ATP6-201",
        "genomic_spans": {
          "NC_012920.1": {
            "end_position": 9207,
            "exon_structure": [
              {
                "cigar": "681=",
                "exon_number": 1,
                "genomic_end": 9207,
                "genomic_start": 8527,
                "transcript_end": 681,
                "transcript_start": 1
              }
            ],
            "orientation": 1,
            "start_position": 8527,
            "total_exons": 1
          }
        },
        "length": 681,
        "reference": "ENST00000361899.2",
        "translation": "ENSP00000354632.2"
      }
    ]
  }
]

So it does not seem to be the software. @John-F-Wagstaff, I can only think that there is something odd with the mounting to Apache.
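Spotting which fields differ between a local response and a live one is easier with a small recursive diff than by eye. A minimal sketch; `leaf_diff` is a hypothetical helper, and the two inputs are trimmed excerpts of the MT-ATP6 transcript entry: the local output above versus the live rest.variantvalidator.org output in this thread that still carries "map": "chrMT:8527:9207".

```python
def leaf_diff(a, b, path="$"):
    """Yield (path, a_value, b_value) for every leaf that differs between two parsed JSON documents.

    Missing keys are treated as None; lists of different lengths are reported as one difference.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(a.keys() | b.keys()):
            yield from leaf_diff(a.get(key), b.get(key), f"{path}.{key}")
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for i, (x, y) in enumerate(zip(a, b)):
            yield from leaf_diff(x, y, f"{path}[{i}]")
    elif a != b:
        yield path, a, b


# Trimmed excerpts of the two MT-ATP6 transcript entries quoted in this thread.
local = {"description": "ATP6-201", "annotations": {"map": "mitochondria", "variant": "201"}}
live = {"description": "MT-ATP6-201", "annotations": {"map": "chrMT:8527:9207", "variant": "ATP6"}}

for path, here, there in leaf_diff(local, live):
    print(f"{path}: local={here!r} live={there!r}")
```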

Peter-J-Freeman · 2024-11-15T10:43:13Z

@John-F-Wagstaff @ifokkema , It looks to me from the error

[
  {
    "error": "Unable to recognise gene symbol MT",
    "requested_symbol": "MT"
  },

That decoding in mod_wsgi (Apache) is deleting the "-" character. We have seen this before when trying to pass HGVS intronic descriptions. I think it is somewhere in the VVweb code. The "-" character, when passed, can become a space.

ifokkema · 2024-11-15T10:58:18Z

Thank you for the research! How awesome is it, by the way, being able to handle MT variants 😍 !

That decoding in mod_wsgi (Apache) is deleting the "-" character. We have seen this before when trying to pass HGVS intronic descriptions. I think it is somewhere in the VVweb code. The "-" character, when passed, can become a space.

Very interesting! But wasn't the issue with intronic variants the "+" character, maybe? That needs to be URL-encoded to "%2B" so that it is not interpreted as a space, indeed. However, the hyphen is an unreserved character and doesn't need URL encoding. As far as I know, there is no encoding that translates a hyphen into a space. Google doesn't help me much here. The only thing I can think of is that hyphens can be used as argument separators, but then they still need whitespace... I don't know enough about mod_wsgi to know what's going on here... 🤔
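Python's standard library illustrates the distinction: with form-style decoding, a literal "+" becomes a space, so an intronic "+" must travel as "%2B", whereas a hyphen is never turned into a space (even its percent-encoded form "%2D" decodes back to "-"). This only shows `urllib.parse` behaviour; it does not show what mod_wsgi or the REST framework actually do with a URL path, and the HGVS strings are illustrative.

```python
from urllib.parse import quote, unquote, unquote_plus

# Form-style decoding treats a literal "+" as a space...
print(unquote_plus("c.100+1G>A"))        # c.100 1G>A
# ...so "+" has to be sent as %2B to survive decoding.
encoded = quote("c.100+1G>A", safe="")
print(encoded)                           # c.100%2B1G%3EA
print(unquote_plus(encoded))             # c.100+1G>A

# A hyphen is unreserved: it is not encoded, and decoding never turns it into a space.
print(quote("MT-ATP6", safe=""))         # MT-ATP6
print(unquote_plus("MT-ATP6"))           # MT-ATP6
print(unquote("%2D"))                    # -
```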

ifokkema · 2024-11-15T11:00:40Z

It doesn't seem to be the hyphen. I realized there are other gene symbols with hyphens, like A1BG-AS1.

Both
https://rest.variantvalidator.org/VariantValidator/tools/gene2transcripts_v2/HGNC%3A37133/mane/all/GRCh38?content-type=application%2Fjson
and
https://rest.variantvalidator.org/VariantValidator/tools/gene2transcripts_v2/A1BG-AS1/mane/all/GRCh38?content-type=application%2Fjson
output:

[
  {
    "current_name": "A1BG antisense RNA 1",
    "current_symbol": "A1BG-AS1",
    "hgnc": "HGNC:37133",
    "previous_symbol": "NCRNA00181,A1BGAS,A1BG-AS",
    "requested_symbol": "A1BG-AS1",
    "transcripts": []
  }
]

So it's not the hyphen. Right?

Peter-J-Freeman · 2024-11-15T11:04:02Z

It doesn't seem to be the hyphen. I realized there are other gene symbols with hyphens, like A1BG-AS1.

Thanks @ifokkema. This is useful, although really confusing. Why is it happening with this symbol? I'll keep digging.

p.s. Can LOVD use ensembl transcripts for MT?

Peter-J-Freeman · 2024-11-15T11:07:49Z

Hold on

https://rest.variantvalidator.org/VariantValidator/tools/gene2transcripts_v2/MT-ATP6/False/all/GRCh38?content-type=application%2Fjson

[
  {
    "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
    "current_symbol": "MT-ATP6",
    "hgnc": "HGNC:7414",
    "previous_symbol": "MTATP6,RP",
    "requested_symbol": "MT-ATP6",
    "transcripts": [
      {
        "annotations": {
          "chromosome": "MT",
          "db_xref": {
            "CCDS": null,
            "ensemblgene": "ENSG00000198899",
            "hgnc": "HGNC:7414",
            "ncbigene": null,
            "select": "Ensembl"
          },
          "ensembl_select": true,
          "mane_plus_clinical": false,
          "mane_select": false,
          "map": "chrMT:8527:9207",
          "note": "mitochondrially encoded ATP synthase membrane subunit 6",
          "refseq_select": false,
          "variant": "ATP6"
        },
        "coding_end": 681,
        "coding_start": 1,
        "description": "MT-ATP6-201",
        "genomic_spans": {
          "NC_012920.1": {
            "end_position": 9207,
            "exon_structure": [
              {
                "cigar": "681=",
                "exon_number": 1,
                "genomic_end": 9207,
                "genomic_start": 8527,
                "transcript_end": 681,
                "transcript_start": 1
              }
            ],
            "orientation": 1,
            "start_position": 8527,
            "total_exons": 1
          }
        },
        "length": 681,
        "reference": "ENST00000361899.2",
        "translation": "ENSP00000354632.2"
      }
    ]
  }
]

It just worked. Submitted as a single entry

Peter-J-Freeman · 2024-11-15T11:08:20Z

@ifokkema , please test

ifokkema · 2024-11-15T11:09:27Z

It doesn't seem to be the hyphen. I realized there are other gene symbols with hyphens, like A1BG-AS1.

Thanks @ifokkema. This is useful, although really confusing. Why is it happening with this symbol? I'll keep digging.

I guess all MT symbols, but I could double-check that, if you'd like.

p.s. Can LOVD use ensembl transcripts for MT?

We have had an "Ensembl ID" field for our transcripts since forever, but we never really used it. For MT genes, we used to have a "fake" NCBI ID that triggered Mutalyzer to use the annotation given in the MT GenBank file as a transcript. Now, I guess we'll solve it using the Ensembl IDs that VV gives us!

ifokkema · 2024-11-15T11:11:43Z

It just worked. Submitted as a single entry

Interesting?

https://rest.variantvalidator.org/VariantValidator/tools/gene2transcripts_v2/HGNC%3A7415/mane/all/GRCh38?content-type=application%2Fjson
still gives:

[
  {
    "error": "Unable to recognise gene symbol MT",
    "requested_symbol": "MT"
  }
]

Using the gene symbol,
https://rest.variantvalidator.org/VariantValidator/tools/gene2transcripts_v2/MT-ATP6/mane/all/GRCh38?content-type=application%2Fjson
now works:

[
  {
    "current_name": "mitochondrially encoded ATP synthase membrane subunit 6",
    "current_symbol": "MT-ATP6",
    "hgnc": "HGNC:7414",
    "previous_symbol": "MTATP6,RP",
    "requested_symbol": "MT-ATP6",
    "transcripts": []
  }
]

Uhhhh, odd! So something is up in the conversion from HGNC ID to gene symbol, but only for the MT genes?

Peter-J-Freeman · 2024-11-15T11:15:22Z

Oh yes, so it is decoding of the HGNC ID. I concur

https://rest.variantvalidator.org/VariantValidator/tools/gene2transcripts_v2/HGNC%3A7414/False/all/GRCh38?content-type=application%2Fjson works fine on my laptop running on local but not on the live server

Peter-J-Freeman · 2024-11-15T11:20:02Z

I am now wondering if that version of the database contains a duplicate entry. My local version does not.

I am making a new build anyway. Will complete over the weekend. We will then NUKE the old database and install the new and test again before digging further
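Before replacing the database, the duplicate-entry hypothesis can be checked directly against an export of the gene table. A minimal sketch; `duplicate_ids` is a hypothetical helper, the `(hgnc_id, symbol)` row format is an assumption about such an export (the real schema may differ), and the "MT" row is a hypothetical stale entry of the kind suspected above.

```python
from collections import defaultdict


def duplicate_ids(rows):
    """Return {hgnc_id: [symbols...]} for every HGNC ID that appears in more than one row."""
    seen = defaultdict(list)
    for hgnc_id, symbol in rows:
        seen[hgnc_id].append(symbol)
    return {k: v for k, v in seen.items() if len(v) > 1}


rows = [
    ("HGNC:2197", "COL1A1"),
    ("HGNC:7414", "MT-ATP6"),
    ("HGNC:7414", "MT"),  # hypothetical stale row
]
print(duplicate_ids(rows))
```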

ifokkema · 2024-11-15T11:34:20Z

Alright! Sounds good! I'm anyway working on other stuff right now. I decided to torture myself and rebuild our HGVS tool from the ground up before doing the data analysis and writing that paper... let's hope that was a good idea 😆

Peter-J-Freeman · 2024-11-15T12:06:47Z

Sounds fun :P

ifokkema · 2024-11-15T12:28:31Z

Attempting to convert over 3000 lines of unreadable and unmanageable code into something readable and manageable in a completely different structure, what's not to like? 😂

Peter-J-Freeman closed this as completed Feb 16, 2024

Peter-J-Freeman reopened this Nov 15, 2024
