-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add error codes to all warnings #637
Comments
We also need to document the codes and come up with a release strategy |
This arose from a recent push that broke LOVD. Sorry @ifokkema. So I have updated the warning to include the error code However, we have lots of error messages still lacking codes. This will be a big effort that I think is vital though @John-F-Wagstaff |
Hello Pete Code wise we probably want to create a organised pattern for our warning/error/info codes and probably want to work out and document the codes before we start to work on it. How are you thinking of structuring this? The current suggestion seems to follow the pattern of camel case with an order of Either way we need a constant pattern, since a lot of the time other consumers of these codes will only care about some specific part of the code, e.g. is this an error a significant warning or cautionary information, or is this a transcript error. If it is not consistent internally then our other tools, and the other users that consume these errors won't be able to actually make consistent use of them. Any thoughts on the pattern? |
I like the camel case codes. We have a few in the test set alrady. Look in the test_warnings. Lets drag these into this issue and agree a pattern. We can work together on the coding. Yes, I think this woul be best to prevent us from going crazy. What we can do is find the errors, and come up with the error code for each error together, then apply throughout in turn. I think the low hanging fruit is our own errors. Then we get more complex trying to resolve how to handle the HGVS.py errors |
The core search term to spot our warnings in the VV Repo will be |
Thanks for this, guys! It will make our code a lot simpler and reduce the chances of breaking stuff in the LOVD/VV communication.
Agreed; this was a lot of effort for me (I use codes rather than description parsing for internal communication between functions), but it really helped get everything consistent. I personally started with my test sets, assuming that most possible errors would be in there. I resorted my tests based on the type of error and then started adding the codes.
Exactly; currently, I have a list of regexes that parse the descriptions from VV and categorize them so that I know what is what. I also rewrite some warnings and errors but still rely on consistent descriptions to recognize what we're receiving. Being able to check for some codes that I want to rewrite and knowing what the rest is based on the structure of the code will reduce the need for me to make updates whenever VV is updated. I may, in the future, even incorporate the VV error codes in the LOVD3 source code. Currently, I'm assigning my own codes based on the VV descriptions. |
We are adding code to better handle allele variant descriptions. This is done by separating the variant out into position edits, validating the parts and finally re-composing the variant(s) from the component validated position edits. This generates the possibility of new and "exiting" errors/warnings when data fails to match up, which will also need their own prefix. So in the case of a Genomic/Alt/LRG/RSG mapping fails/does not exist, for some but not all of the position edits, or a, Genomic/LRG/RSG mapping where the target of the mapping is different for some of the position edits (this seems most likely as a non-error case for LRG/RSG mappings for genomic alleles), would there be a useful difference between an error/warning suffix like AlleleReasembalyDataMissmatchWarning and AlleleReasembalyDataMissingWarning, for end user purposes, or would just the more generic AlleleReasembalyWarning be sufficient on its own for both types of problem. If you think that AlleleReasembalyLRGWarning/AlleleReasembalyGenomicWarning etc would be more useful then this is also a possibility. |
Hi John, was this question (also) for me or for Pete? In case my input is requested, I'm not sure I fully follow the question. Do you have (theoretic if needed) examples? |
Hello Ivo Yes it was mostly a question for you, Pete thinks the more generic warning for any issue putting an allele back together probably should be OK, but I was concerned that they might be too simple to actually be usefully informative that way, so I just wanted to check before putting them in. We don't want to have to change error prefixes after the fact. Both of these errors/warnings would be triggered by hgvs alleles (see Here for examples if you are unfamiliar) which need splitting up and analysing as parts, when the parts are re-composed into a singe allele again. So the question was, could you as an end user see some utility in having the separate error prefixes for the different types of problem our would just one (provisionally AlleleReasembalyWarning) be better? If we did split this up I would currently split the warnings into 2 categories: The first type of warning (provisionally AlleleReasembalyDataMissmatchWarning unless you or Pete had a better name) would be triggered if we had an allele description that was genomic type (but which could be main Genomic, Alt, LRG, or RSG) but for which there was no single transcript mapping that contained the locations of all of the variants described. An example of this would be an allele description that included changes to 2 neighbouring genes, all of which validate but not on the same transcript (which could then have differing LRG/RSG mappings too). There could be no single set of transcript mapping data for this allele, and any between genome build mappings (GRCh37<->GRCh38) will be using data from multiple transcript alignments. (I can also see some arguments for returning this when the input positions are partly intergenic but sometimes map to one transcript.) The second type of warning (which might be better as an error, provisionally AlleleReasembalyDataMissingWarning) would be triggered if we had an allele describing a set of positions, but only some of them could be mapped to the current reference. I can see three types of data that might trigger this, without actual errors
As an example of how you might use this, you might allow data to be used if it only has a AlleleReasembalyDataMissmatchWarning attached, since it is complete if not ideal in it's consistency, but exclude data with AlleleReasembalyDataMissingWarning as incomplete. Alternatively for users wanting data specific to individual genes data with AlleleReasembalyDataMissingWarning: attached might be fit for their purposes, if incomplete, but AlleleReasembalyDataMissmatchWarning would indicate data that might be affecting multiple genes and could need to be excluded. The question was whether you thought this would be useful in practice, or whether the detail was overkill? Sometimes it is hard to get perspective on this kind of thing when you are in the middle of writhing the code. With thanks |
@John-F-Wagstaff
Hmm... but that's not a problem of the actual variant... it's a technical issue when mapping/projecting that variant onto the transcript level. In fact, whether you can or can not create a description of that allele syntax on the transcript level at all, i.e., one description with multiple reference sequences, is under debate. That said, there are some typos in
That's an error if the positions don't apply to the current reference, indeed, of the same type of error as g.9999999999999999999del on an NC that's shorter than that. But only if we're talking about the current (input) reference.
However, this is again not a problem of the input variant, but of the technical process/decision making when mapping/projecting the variant onto the transcript level.
Also here, not a problem of the variant. I'd say that if your input is transcript-level, NO other transcript-level output should be generated that does not include all of the input.
This one I don't follow, sorry. What's the input here?
I think we should first discuss what variants should be handled at all... because we might not want to / be able to handle any or all of those variants. For my use case, in LOVD, we discourage the use of the allele syntax. Within LOVD, you can indicate that two variants are on the same allele. This allows you to submit them separately, so they'll collect all their annotation separately, as well. Within the same gene, it may make sense to use the allele syntax, especially for frameshift variants that are sequential and together cause only a deletion/insertion instead of a fully destructive frameshift, but since tooling (mapping etc) never used to handle it, we don't support it either. |
Hello @ifokkema Sorry about my own delay replying, I have had fairly bad covid, which resurged on me and caused complications. I went from sick enough that I felt unable to produce a good reply to needing to catch up with stuff urgently needed for our next VV release. Thanks for spotting the typos on "AlleleReassemblyDataMismatchWarning" this is actually intended to go on an "intergenic" type variant, which would not count as mapping to multiple references as it would not contain any transcript data, beyond any warnings generated by component mappings. It would be paired with an explanatory text like "AlleleReassemblyDataMismatchWarning: This allele variant is reported as intergenic due to not fully mapping inside any single transcript, however some transcript mappings are present. You are encouraged to submit the individual components of this allele variant for validation to get further detail on these mappings." Although this might count as a "technical issue when mapping/projecting that variant onto the transcript level" you can imagine that even somewhat informed users who, for example, submit an allele affecting a promoter and it's associated transcript would react badly to just getting an intergenic variant returned without explanation. Many users are not informed, and hurried uses will often not consider things logically without clear reasons given either, people do in fact use our site to check things just before submission deadlines. We could switch this to an "Info" rather than "Warning" but regardless of the error code we chose to attach we must inform users when we make this kind of decision. For AlleleReasembalyDataMissingWarning this would turn up in the same example as above (a genomic allele affecting a promoter and locations within it's associated transcript) as a warning attached to the set of variation affecting just the transcript. This would happen even if we dispose of any incomplete transcript mappings when we have complete transcript inputs, as you suggest. Again if we return any type of incomplete data but don't tell the users what is going on they will at least occasionally get confused and need help at best, or at worst use the data improperly and consider it our fault that we did not warn them. An example warning text (with the varying data set based on user input in angle brackets) would be something like "AlleleReasembalyDataMissingWarning: This mapping of the input allele is incomplete, and included in part to show which allele components map to the transcript reference , please consult the full <genomic/transcript> mapping of <intergenic/transcript variant> for more details." The key question I had was, when phrased in terms of this example of a genomic allele affecting a promoter, and locations within it's associated transcript: If you got back an output containing at least 2 allele variant descriptions, one of which is complete in allele terms, but lacks all transcript type annotations, and the other of which is incomplete (being limited to transcript mapping variants) but is more fully annotated, would you expect it to be useful to have different error codes/prefixes for the different types of output? |
Is your feature request related to a problem? Please describe.
We need consistent error codes for all errors and warnings so other tools can searh for them without error messages breaking flow.
Loop in @ifokkema
The text was updated successfully, but these errors were encountered: