Where to store rasters? #23
Isn't this just going back to your original proposal? Downloading 30GB will still go into … It's not clear to me why traits won't work. The idea is that the variable is set outside the session, and that will only affect large datasets anyway? Maybe we should just use the new preferences system to set this instead of the env var. But a PR would make this a more substantial thing to discuss. We can't use Artifacts because there are millions of files.
Agreed that we can't use Artifacts. I'm slowly realizing the size of the other datasets, and I may end up convinced that asking users to set a path is not unreasonable. I also don't want to write into …
Let's use this: https://github.com/JuliaPackaging/Preferences.jl? Then you can set preferences in the session after an error shows you an example of what to do, and the path will persist once you set it. The trait can work as planned earlier, and we only throw the error requiring the preference to be set for the large weather datasets.
Yeah, literally multiple terabytes! Preferences.jl could be the middle ground we need; it's less weird than …
For cesar it's important to have both scales. We'll run GrowthMaps.jl with tiny climate datasets for exploration and sharing ideas, but swap to hundred-GB datasets for fitting real models. The GrowthMaps/GeoData/RasterDataSources combination abstracts that away, and the output is in the same format.
But I think you are right about Bioclim and Climate. I had to set the path in the supporting scripts for a paper, and it's pretty awful: it makes the scripts non-reproducible without editing.
Let's definitely go with Preferences. I'll work on this once I've made progress on the future bioclim data.
Would it make sense to use a scratch space from Scratch.jl as the default, and give users the option to override it through some mechanism (either Preferences.jl or an environment variable)?
The Scratch.jl docs more or less say not to use it for this use case:
I personally occasionally manage these files in a browser, say to copy them for someone else when I've downloaded a lot. But I know the current solution kind of sucks too. Some of these future-climate and current-weather datasets are many GB and can be downloaded with a single command, so we need to be a bit careful about the location and let users access and manage it.
#21 ended up being about org admin, so let me restate the issue here:
The current solution is to require `ENV["RASTERDATASOURCES_PATH"]`. This can work, but it requires setting the variable even for small datasets, which is an extra step for users. The solution I suggested in #21 was to use traits for the different types, but this is also potentially confusing: sometimes things will stop working unless the variable is set (and setting it from the session will not make it permanent).
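A minimal sketch of the trait approach, assuming hypothetical dataset types `TinyClimate` and `HugeWeather`, a hypothetical `storagepath` function, and an assumed default of a `rasters` folder in `tempdir()` (none of these are RasterDataSources.jl's actual API). Small sources fall back to a default directory; large ones throw an error unless the variable is set.

```julia
# Hypothetical dataset types, for illustration only (not RasterDataSources.jl's API).
struct TinyClimate end
struct HugeWeather end

# Trait values describing how much data a source downloads.
abstract type DownloadSize end
struct SmallDownload <: DownloadSize end
struct LargeDownload <: DownloadSize end

# Sources are treated as small unless declared otherwise.
downloadsize(::Type) = SmallDownload()
downloadsize(::Type{HugeWeather}) = LargeDownload()

const PATH_VAR = "RASTERDATASOURCES_PATH"

# Dispatch on the trait: the default directory is an assumption for this sketch.
storagepath(T::Type; default = joinpath(tempdir(), "rasters")) =
    storagepath(downloadsize(T), default)

# Small sources: use the environment variable if set, otherwise the default.
storagepath(::SmallDownload, default) = get(ENV, PATH_VAR, default)

# Large sources: refuse to guess a location.
function storagepath(::LargeDownload, default)
    haskey(ENV, PATH_VAR) ||
        error("Set ENV[\"$PATH_VAR\"] before downloading large datasets")
    return ENV[PATH_VAR]
end
```

This also shows the caveat raised above: assigning `ENV["RASTERDATASOURCES_PATH"]` inside a session only lasts for that session, which is what motivated the suggestion to use Preferences.jl instead.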
Solutions like Artifacts don't work because we don't want to download ALL data when the package is built.
Here is my current thinking: we might want to keep the idea of an `ENV["RASTERDATASOURCES_PATH"]`, and have a `greet()` function that reminds users what it does. Specifically, if no such path is set, could we use a folder in `@__DIR__` to store the data? Users who don't want to make a choice will have their data there, and users who want to specify a path will still have that option. @rafaqz, what do you think?
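A rough sketch of this fallback proposal, assuming a hypothetical `rasterpath` helper, an assumed default of a `data` folder under `@__DIR__`, and a `greet()` whose wording is purely illustrative. The environment variable wins if it is set; otherwise the default folder is used.

```julia
const PATH_VAR = "RASTERDATASOURCES_PATH"

# Assumed default: a "data" folder next to the file defining this code.
# (From the REPL or `julia -e`, @__DIR__ is the current working directory.)
const DEFAULT_PATH = joinpath(@__DIR__, "data")

# Resolve the storage directory: the environment variable wins if set,
# otherwise use the fallback. Creates the directory if it is missing.
function rasterpath(fallback::AbstractString = DEFAULT_PATH)
    path = get(ENV, PATH_VAR, fallback)
    mkpath(path)
    return path
end

# Hypothetical greet(): tell users where data will go and how to change it.
function greet(io::IO = stdout; fallback::AbstractString = DEFAULT_PATH)
    if haskey(ENV, PATH_VAR)
        println(io, "Rasters will be stored in ", ENV[PATH_VAR])
    else
        println(io, "No ", PATH_VAR, " set; rasters will be stored in ", fallback,
                ". Set ENV[\"", PATH_VAR, "\"] to choose another location.")
    end
    return nothing
end
```

Users who never set the variable get the default silently, while `greet()` makes the choice visible so they can relocate large downloads.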