-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add VRS representations to the VV output #655
Comments
@John-F-Wagstaff One for you to start looking into. I recommend opening dialogue with Reece Hart, Larry Babb and Alex Wagner. We have their detais. Leaving this one to you so that you can investigate how VRS could run on VVTA rather than UTA. We may need a custom dataprovider in VRS-Python, or a modification that we may be able to request with the team there. |
Firstly there is a VRS repo, separate from the vrs-python project, namely https://github.com/ga4gh/vrs . If we want an easy VRS input/output filter vrs-python is probably the best starting project, but it is not required. Given this VRS compatible output should not need any of that, if we don't want it to. The fundamental nature of the spec is relatively simple, if clunky from a human perspective. The question is, is it more complex to build a VRS output generator for our pre-validated output, including the risk of adding our own bugs? or is it more complex to lug the extra set of (unused by us and redundant) validation from "VRS-python" around? The spec was basically written to be easy to implement in python, so worst case complexity for this is the slightly clunky version. Unless you specifically want to get into contact with other people over this there will be no need for any outside help to get this done reasonably quickly. Part of this depends on how easy it is to use just the bits of VRS python that we actually need. This should be possible, if the documentation is valid, both the SeqRepo and UTA dataproviders are supposed to only be necessary if we want to validate inputs. We don't want to validate, since this would be an output filter for pre-validated variant inputs. As such, if we can trust the docs, it looks like missing them should not cause issues for the VRS output parts. If we do have to include them then SeqRepo part should also be fine with our local version of SeqRepo, if we want to use it, we have not needed to branch the code, we just don't want to be tied to their data release rules/limits/schedule, though we might chose to do different for performance reasons etc. We might even want to use the VVTA->UTA compatibility views to run the UTA code unmodified, which should work as-is (though this wont have the best performance and unlike the SeqRepo connection this does not seem to have any use case for us). The biggest inconvenience I can see if we don't use either is generating the ids for the sequence. We would otherwise get our hash ids via our local build of SeqRepo, but we will want to generate them in advance if we don't, on the other hand this is one table in a database of our choice at worst. This is likely to be a fairly big project but probably only on the same scale as the allele work at worst, how urgently do you want this scheduled? I am going to keep on working on the performance issues particularly WRT to dels/indels that you asked me to drop my allele work to do, as a high priority due to our server performance issues, and then switch back to trying to finish off the tests for the fallback on the allele type variants, to finally get that finished, before pushing on this, unless you specifically want this bumped up the priority list. |
@John-F-Wagstaff lots of great points, ideas, and concerns. I'm going to ask two of our devs, Kori @korikuzma and Kyle @theferrit32, that have considerable experience creating and applying VRS-python to weigh in along with @ahwagner. My cursory feedback is that I think the larger concern may be around the use of the I think a small pilot task to try it out would be super helpful. (assuming you agree and have the bandwidth to do it and report back with any concerning findings). @korikuzma @theferrit32 could you please validate paragraph 3 in @John-F-Wagstaffs post above?
|
The main README.md says that these are both only required for ga4gh.vrs.extras. As such unless I have misread things completely, if we make the object directly we should be fine. Otherwise they can't by any means be called "extras". Obviously the convenience functions in vrs.extras look useful, making the objects without them might be needed if there is extra validation going on (we have had performance issues caused by re-validation of known good data) but hopefully won't. I don't intend to re-implement anything I don't have too, and they also don't look like they would use anything that might cause problems, but even if we do we should have a better fallback than re-doing the VRS creation code, if that assumption (that VRS extras is not needed to create basic VRS objects directly from known good input and ready derived IDs) is valid. |
@John-F-Wagstaff correct. You do not need SeqRepo or UTA if you are just generating VRS objects. For example, from ga4gh.vrs import models
allele_dict = {
"type": "Allele",
"location": {
"type": "SequenceLocation",
"sequenceReference": {
"type": "SequenceReference",
"refgetAccession": "SQ.Pw3Ch0x3XWD6ljsnIfmk_NERcZCI9sNM"
},
"start": 80656509,
"end": 80656510
},
"state": {
"type": "LiteralSequenceExpression",
"sequence": "TT"
}
}
allele = models.Allele(**allele_dict) You are also able to generate VRS computed identifiers without SeqRepo/UTA: from ga4gh.core import ga4gh_identify
ga4gh_identify(allele) Note: Above used |
Thank you. It is as I hoped. We may be able to benefit from more than this, and it would be a shame to not take advantage of the existing work where it can be productive, but knowing that this is possible means I can just get started with the simpler integration as/when I get time to work on this, and we can expand our out/inputs later as necessary/useful. |
This is really great news @John-F-Wagstaff. Thanks @larrybabb and @korikuzma . We willk deploy VRS ASAP and look at what else we can do with the VRS-Python too. @John-F-Wagstaff when you have something up and running paste some data here so we can get feedback from the others. Make sure we are using VRS optimally |
Awesome. Please don't hesitate to reach out if y'all have any other questions! Always happy to assist if I can. |
Is your feature request related to a problem? Please describe.
There is a good reason to represent VRS in journal article metadata.
Describe the solution you'd like
VRS representations in VV outputs. It would seem like each variant requires a single VRS representation. I would like to get input from the authors on this.
Describe alternatives you've considered
We could also add SPDI and ClinGen Allele Registry IDs, if these can be also obtained.
Additional context
Would VRS be able to use VVTA? What is the data provider? We would beed to check this and potentially add in some glue code
Here is the VRS Repo
https://github.com/ga4gh/vrs-python
The text was updated successfully, but these errors were encountered: