Documentation, UI and examples #15

Open
WolfgangFahl opened this issue Jun 15, 2020 · 5 comments
Labels
bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request


@WolfgangFahl

WolfgangFahl commented Jun 15, 2020

wdumper looks like a very promising and potentially very helpful tool.

When trying out wdumper I was not able to achieve what I wanted. I had expected that if I specified P31 ("instance of") and Q13442814 (https://www.wikidata.org/wiki/Q13442814, "scholarly article"), I'd get a dump with triples for all scholarly articles (hopefully with all their properties).
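For reference, the expected behaviour of a "P31 = Q13442814" filter can be sketched in Python against the Wikidata JSON dump format (`claims`, then the property, then its statements and their `mainsnak`). This is a minimal illustration of the intended semantics under that assumption, not wdumper's actual implementation; the function name `has_claim_value` and the trimmed sample entity are made up.

```python
from __future__ import annotations

from typing import Any


def has_claim_value(entity: dict[str, Any], prop: str, item_id: str) -> bool:
    """Return True if any statement for `prop` has `item_id` as its main value."""
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "somevalue" / "novalue" snaks
        value = snak.get("datavalue", {}).get("value")
        if isinstance(value, dict) and value.get("id") == item_id:
            return True
    return False


# Made-up, heavily trimmed entity in the JSON dump shape.
article = {
    "claims": {
        "P31": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P31",
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "id": "Q13442814"},
                    },
                }
            }
        ]
    }
}

print(has_claim_value(article, "P31", "Q13442814"))  # True
print(has_claim_value(article, "P31", "Q5"))         # False
```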

The dump ended up being:
https://tools.wmflabs.org/wdumps/dump/414
It took hours to finish and included just 38 triples after processing 86,949,976 items.

So I tried again, this time after seeing that P31 had not been used, and ended up with:
https://tools.wmflabs.org/wdumps/dump/415

It had the same run time and the same result. I find this very frustrating, since a simple "give me all entities of type xy1, xy2, xy3" should be straightforward. Improved documentation, UI and examples would be great, and they would save a lot of wasted processing time that others might want to use. A very important factor would be "limit" options: for tests, it should be possible to restrict processing to a subset of the input data and a subset of the results, to speed up finding out what a query should look like.
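The requested "limit" options could look roughly like this: one cap on how many input entities are read and one on how many matches are emitted. This is a sketch only; the parameter names `max_items` and `max_results` and the function `limited_filter` are hypothetical, not existing wdumper settings.

```python
from __future__ import annotations

from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def limited_filter(
    items: Iterable[T],
    predicate: Callable[[T], bool],
    max_items: int | None = None,    # hypothetical: stop reading input after N items
    max_results: int | None = None,  # hypothetical: stop after N matches
) -> Iterator[T]:
    """Yield matching items, honouring optional input and output limits."""
    source = islice(items, max_items)  # islice(x, None) means "no limit"
    matches = (item for item in source if predicate(item))
    return islice(matches, max_results)


# Stand-in data: integers instead of Wikidata entities.
print(list(limited_filter(range(1_000_000), lambda n: n % 7 == 0,
                          max_items=100, max_results=5)))  # [0, 7, 14, 21, 28]
```

Because both limits are applied lazily with `itertools.islice`, reading stops as soon as either cap is reached, which is what would make a quick test run cheap.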

@bennofs bennofs added bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request labels Jun 15, 2020
@bennofs
Owner

bennofs commented Jun 15, 2020

Thank you for the feedback! This is very valuable to me, as those are issues I didn't see myself since I'm blinded by already knowing how it's supposed to be used 😄

Empty properties/values should either be an error or default to the hint text, and property/ID comparison shouldn't be case-sensitive (the second example has lowercase p31, while the actual property is P31). We should probably also check whether those entities exist.
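A minimal sketch of the proposed validation, assuming property IDs look like `P<digits>` and item IDs like `Q<digits>`. `normalize_entity_id` is a hypothetical helper; checking that an entity actually exists on Wikidata would need an additional lookup that is not shown here.

```python
import re

# Assumed shapes: properties are "P" + digits, items are "Q" + digits (no leading zero).
_ID_PATTERN = re.compile(r"[PQ][1-9][0-9]*")


def normalize_entity_id(raw: str, expected_prefix: str) -> str:
    """Trim and upper-case a filter value (e.g. ' p31 ' -> 'P31'), rejecting bad input."""
    value = raw.strip().upper()
    if not value:
        raise ValueError("filter value must not be empty")
    if not _ID_PATTERN.fullmatch(value) or not value.startswith(expected_prefix):
        raise ValueError(f"{raw!r} is not a valid {expected_prefix} identifier")
    return value


print(normalize_entity_id(" p31 ", "P"))      # P31
print(normalize_entity_id("q13442814", "Q"))  # Q13442814
```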

So there are three tasks here:

  • validate filter values (disallow empty or non-existent ones)
  • more examples and documentation
  • provide a preview (sample) of the full dump in less time
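One way to get a preview in less time would be to scan only the first N entities of the dump, show a few matches and extrapolate the match count to the full dump. This is a hypothetical sketch (`preview_dump` is not a wdumper function); since a dump prefix is not a random sample, the estimate is only a rough indication.

```python
from __future__ import annotations

from itertools import islice
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def preview_dump(
    items: Iterable[T],
    predicate: Callable[[T], bool],
    sample_size: int,
    total_items: int,
    preview_size: int = 10,
) -> tuple[list[T], float]:
    """Scan only the first `sample_size` items; return a few matches plus an
    extrapolated estimate of how many matches the full dump would contain."""
    shown: list[T] = []
    seen = hits = 0
    for item in islice(items, sample_size):
        seen += 1
        if predicate(item):
            hits += 1
            if len(shown) < preview_size:
                shown.append(item)
    estimate = hits * total_items / seen if seen else 0.0
    return shown, estimate


# Stand-in data: integers instead of Wikidata entities.
shown, estimate = preview_dump(range(1000), lambda n: n % 10 == 0,
                               sample_size=100, total_items=1000, preview_size=3)
print(shown, estimate)  # [0, 10, 20] 100.0
```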

I will hopefully be able to fix the first one this week, so that at least there is an error if you enter invalid data.

@WolfgangFahl
Author

Please don't forget the limit options. They seem important to me and should probably be on by default, so that the first dumps run for only a few minutes to check that things are as expected; then the limits can be relaxed.

@AtilioA

AtilioA commented Sep 22, 2020

Limit and preview options would be great. There are a lot of empty requests that keep running for days on end, just wasting resources, because whoever requested them couldn't figure out how to use this tool. These things should be easy to implement as well.

@WolfgangFahl
Author

Yesterday I came up with this query:

```sparql
# get a list of cities
# for geograpy3 library
# see https://github.com/somnathrakshit/geograpy3/issues/15
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
# get City details with Country
SELECT DISTINCT ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGDP_perCapita ?region ?regionLabel ?regionIsoCode ?city ?cityLabel ?coord ?cityPopulation ?date ?ratio WHERE {
  # run for Paris (Q90) as an example only
  # if you comment out this line, this query might run for some 3 hours on a local Wikidata copy using Apache Jena
  VALUES ?city {wd:Q90}.
  # instance of City Q515
  # instance of (a subclass of) human settlement https://www.wikidata.org/wiki/Q486972
  ?city wdt:P31/wdt:P279* wd:Q486972 .
  # label of the city
  ?city rdfs:label ?cityLabel filter (lang(?cityLabel) = "en").
  # get the coordinates
  ?city wdt:P625 ?coord.
  # region this city belongs to
  OPTIONAL {
    # located in the administrative territorial entity
    # https://www.wikidata.org/wiki/Property:P131
    ?city wdt:P131 ?region.
    # first-level administrative country subdivision
    ?region wdt:P31/wdt:P279* wd:Q10864048.
    ?region rdfs:label ?regionLabel filter (lang(?regionLabel) = "en").
    # https://www.wikidata.org/wiki/Property:P300 ISO 3166-2 code
    ?region wdt:P300 ?regionIsoCode
  }
  # country this city belongs to
  ?city wdt:P17 ?country .
  # label for the country
  ?country rdfs:label ?countryLabel filter (lang(?countryLabel) = "en").
  # https://www.wikidata.org/wiki/Property:P297 ISO 3166-1 alpha-2 code
  ?country wdt:P297 ?countryIsoCode.
  # population of country
  ?country wdt:P1082 ?countryPopulation.
  # https://www.wikidata.org/wiki/Property:P2132
  # nominal GDP per capita
  ?country wdt:P2132 ?countryGDP_perCapita.
  # population of city, keeping only the statement with the latest point in time (P585)
  ?city p:P1082 ?populationStatement .
  ?populationStatement ps:P1082 ?cityPopulation.
  ?populationStatement pq:P585 ?date
  FILTER NOT EXISTS { ?city p:P1082/pq:P585 ?date_ . FILTER (?date_ > ?date) }
  # city population as a percentage of the country population
  BIND ( concat(str(round(10000*?cityPopulation/?countryPopulation)/100), '%') AS ?ratio)
}
```
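Two parts of the query may be easier to follow in plain Python: the `FILTER NOT EXISTS` clause keeps only the population statement with the most recent point-in-time (P585) qualifier, and the `BIND` computes the city's share of the country population in percent. This is an illustrative approximation with hypothetical helper names and made-up numbers: unlike SPARQL, `max` keeps only one statement when dates tie, Python's `round` rounds halves to even, and the number formatting may differ slightly from SPARQL's `str()`.

```python
from __future__ import annotations

from datetime import date


def latest_population(statements: list[tuple[int, date]]) -> int | None:
    """Return the population whose point-in-time (P585) date is the latest."""
    if not statements:
        return None
    population, _ = max(statements, key=lambda s: s[1])
    return population


def ratio_percent(city_population: int, country_population: int) -> str:
    """Approximate concat(str(round(10000*city/country)/100), '%')."""
    return f"{round(10000 * city_population / country_population) / 100}%"


# Made-up numbers for illustration.
city_statements = [(2_100_000, date(2010, 1, 1)), (2_000_000, date(2020, 1, 1))]
city = latest_population(city_statements)
print(city)                             # 2000000
print(ratio_percent(city, 65_000_000))  # 3.08%
```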

It takes 3.5 hours on my local copy of Wikidata; see http://wiki.bitplan.com/index.php/WikiData_Import_2020-08-15.

I'd love to have this as a regular dump, e.g. monthly, but I don't know how to create a dump from a query. I think the dumper should be changed to accept SPARQL queries as input.
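Until the dumper accepts SPARQL input, a scheduled job (for example a monthly cron entry) could run the query against a SPARQL endpoint and store the result. The sketch below only covers the last step, flattening a result in the standard SPARQL 1.1 Query Results JSON format into CSV; `sparql_json_to_csv` is a hypothetical name, and how the query is submitted is left out.

```python
from __future__ import annotations

import csv
import io
from typing import Any


def sparql_json_to_csv(result: dict[str, Any]) -> str:
    """Flatten a SPARQL 1.1 JSON SELECT result into CSV text (one row per binding)."""
    variables = result["head"]["vars"]
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(variables)
    for binding in result["results"]["bindings"]:
        # Unbound variables (e.g. from OPTIONAL) are missing from the binding; write "".
        writer.writerow([binding.get(v, {}).get("value", "") for v in variables])
    return out.getvalue()


# Trimmed, hand-written result in the SPARQL JSON results shape.
sample = {
    "head": {"vars": ["city", "cityLabel", "region"]},
    "results": {
        "bindings": [
            {
                "city": {"type": "uri", "value": "http://www.wikidata.org/entity/Q90"},
                "cityLabel": {"type": "literal", "xml:lang": "en", "value": "Paris"},
            }
        ]
    },
}
print(sparql_json_to_csv(sample), end="")
```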

@WolfgangFahl
Author

Any news on this?
For a start, the transitive selection

?city wdt:P31/wdt:P279* wd:Q486972

is something I'd love to be able to specify.
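The transitive selection `?city wdt:P31/wdt:P279* wd:Q486972` means: follow one "instance of" (P31) edge, then zero or more "subclass of" (P279) edges until the target class is reached. Below is a minimal breadth-first sketch over in-memory edge maps; `is_instance_of` and the toy data are hypothetical and not taken from Wikidata.

```python
from __future__ import annotations

from collections import deque


def is_instance_of(
    entity: str,
    target_class: str,
    instance_of: dict[str, set[str]],  # P31 edges: entity -> its classes
    subclass_of: dict[str, set[str]],  # P279 edges: class -> its superclasses
) -> bool:
    """Evaluate `entity P31/P279* target_class` with a breadth-first search."""
    queue = deque(instance_of.get(entity, set()))  # exactly one P31 step
    visited: set[str] = set()
    while queue:
        cls = queue.popleft()
        if cls == target_class:
            return True  # zero or more P279 steps reached the target
        if cls in visited:
            continue  # the P279 graph may contain cycles
        visited.add(cls)
        queue.extend(subclass_of.get(cls, set()) - visited)
    return False


# Toy edges for illustration only (not real Wikidata data).
instance_of = {"paris": {"city"}, "eiffel_tower": {"tower"}}
subclass_of = {"city": {"urban_area"}, "urban_area": {"human_settlement"}}

print(is_instance_of("paris", "human_settlement", instance_of, subclass_of))         # True
print(is_instance_of("eiffel_tower", "human_settlement", instance_of, subclass_of))  # False
```

For a dump-wide filter it would likely be cheaper to compute the set of all subclasses of the target class once and then test each entity's P31 values against that set, instead of searching per entity.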
