-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partial DOIs #101
base: master
Are you sure you want to change the base?
Partial DOIs #101
Conversation
to make internal PLOS server processes easier for checking articles likely to have updated XML, adds `from_partial_doi` class method & `find_valid_partial_dois` regex.
The same as it can initialize an Article object from DOI, it can do that from a partial DOI as well.
return 'pone.0040259' | ||
|
||
|
||
def test_partial_doi_regex(test_partial_doi): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you test the regexes directly as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this test is confusingly named… its not testing the doi regexes it's testing the validate_partial_doi function.
@@ -22,6 +22,10 @@ | |||
full_doi_regex_match = re.compile(regex_match_prefix+regex_body_match) | |||
full_doi_regex_search = re.compile(r"10\.1371/journal\.p[a-zA-Z]{3}\.[\d]{7}" | |||
"|10\.1371/annotation/[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}") | |||
partial_doi_regex_search = re.compile(r"p[a-zA-Z]{3}\.[\d]{7}" | |||
"|annotation/[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}") | |||
partial_doi_regex_match = re.compile(r"^p[a-zA-Z]{3}\.[\d]{7}$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is just the same string with ^
+your_str+$
, am I correct? and the last part of it is almost the same as the last part of the full_doi_regex_search?
Why not make this based on a single string that you then combine as needed to make the search and match?
I'm not sure you need to define search vs. match separately anyway (in this case at least). I ask below… but it'd be a lot easier to know whether that is the case if you tested the regexes directly.
Example: 'pbio.2000777' is True, but '10.1371/journal.pbio.2000777' is False | ||
:return: True if string is in valid PLOS partial DOI format; False if not | ||
""" | ||
return bool(partial_doi_regex_match.search(partial_doi)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the match regex used to search?
@property | ||
def partial_doi(self): | ||
"""Convert a DOI to a partial DOI.""" | ||
return self.doi.lstrip('10.1371/').replace('journal.', '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why aren't you using doi_to_partial
?
if isinstance(key, int): | ||
return Article(self.dois[key], directory=self.directory) | ||
elif isinstance(key, slice): | ||
return (Article(doi, directory=self.directory) | ||
for doi in self.dois[key]) | ||
elif key not in self.dois: | ||
if partial_to_doi(key) in self.dois: | ||
return Article.from_partial_doi(key, directory=self.directory) | ||
elif validate_partial_doi(key): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it be safer to validate first?
@@ -1336,3 +1342,15 @@ def from_filename(cls, filename): | |||
else: | |||
directory = None | |||
return cls(filename_to_doi(filename), directory=directory) | |||
|
|||
@classmethod | |||
def from_partial_doi(cls, partial_doi, directory=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you validate the partial doi first?
|
||
|
||
@pytest.fixture | ||
def corpus(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider the corpus a module/session level scoped fixture since this is basically the same as in some of the other tests. You don't need to do it for this PR but it's worth considering
|
||
|
||
@pytest.fixture | ||
def test_article(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally fixtures don't have test prefixed to them (see corpus above)
|
||
@pytest.fixture | ||
def test_article(): | ||
return Article('10.1371/journal.pone.0040259', directory=starterdir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use the corpus & doi fixtures directly here, since fixtures can be used in other fixtures:
@pytest.fixture(scope='module')
def article(corpus, doi):
return corpus[doi]
|
||
|
||
@pytest.fixture | ||
def test_doi(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as with test_article
shouldn't be prefixed with test, better is to just use def doi():
|
||
|
||
@pytest.fixture | ||
def test_partial_doi(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be called partial_doi
|
||
def test_partial_doi_regex(test_partial_doi): | ||
assert validate_partial_doi(test_partial_doi) | ||
assert not validate_partial_doi(' pone.0040259') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're going to have a test like this it's better to make explicit what the manipulation is " " + partial_doi
would do that.
def test_partial_doi_regex(test_partial_doi): | ||
assert validate_partial_doi(test_partial_doi) | ||
assert not validate_partial_doi(' pone.0040259') | ||
assert not validate_partial_doi('pone.0040259 ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above… reuse partial_doi
… partial_doi + " "
assert doi == test_doi | ||
|
||
|
||
def test_partial_doi_method_article(test_partial_doi, test_article): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically I would prefer this to be in its own test_article.py
for article functionality but I don't think that exists.
assert article == test_article | ||
|
||
|
||
def test_partial_doi_method_corpus(corpus, test_article, test_partial_doi): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be in the test_corpus.py
testing module.
I was thinking about this for a bit… why do you ever need to validate a partial doi? Why not validate only on dois since dois are the only things that actually have canonical representations? That way the canonical way to do everything is always to transform the partial doi into the doi first and then validate the doi. This seems like it introduces a lot of complexity and it's not clear to me what the wins are. |
This PR adds a number of methods to the Article and Corpus class to allow for Article() creation from a partial DOI, including from Corpus, transformations between partial and full DOIs, as well as regex to validate partial DOIs. I added a new pytest file as well to cover these changes.