-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add future bioclim from CHELSA #24
Conversation
As an FYI (maybe for @rafaqz to comment on), an alternative way to specify these models would be |
That sounds good to me, although dates are an argument for other sources. You can add a helper method to get the type params, like _model(::Type{<:Future{M,X}}) where M = M The other option is we switch to objects, which just adds |
I see your point about date - they have a slightly different meaning in climate scenarios, but we can keep them as an argument. I don't think we need to add fields and constructors and all of that to the package - types work for what we do, and if we parameterize something as |
Sorry should it maybe be Or are the multiple kinds of Future data? (I havent really read it) |
CHELSA has future raw data as well, like worldclim. An alternative would be |
Your point made me think about the fact that we can have future data from multiple things (e.g. landcover, species richness, etc). What I've done is separate the getraster(CHELSA{BioClim}, FutureClimate{CCSM4,RCP26}, 5) and the date is an optional argument on top - I think it makes more sense as it keeps the same idea of sources within datasets, and the "future" aspect is just a complication on top. |
I'm concurrently workin on PoisotLab/SimpleSDMLayers.jl#86 - this makes the API super easy to write. I am now fully ❤️ about the central raster location even for small files. |
So WorldClim has finally released the future BioClim data, and the code is ~ ready, but their zip files seem to be corrupted? I can see the contents, but can't extract them. Anyways as soon as I figure it out, we'll also get the scenarios for tmin, tmax, and prec, as they all follow the same naming convention. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this is great.
struct RCP85 <: RCP end | ||
|
||
""" | ||
Abstract type for Shared Socio-economic Pathways (SSPs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indented type as first line of docstring
struct SSP585 <: SSP end | ||
|
||
""" | ||
A ClimateScenario can be a RCP or a SSP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a comment?
export BCCCSM2MR, CNRMCM61, CNRMESM21, CanESM5, GFDLESM4, IPSLCM6ALR, MIROCES2L, MIROC6, MRIESM2 | ||
|
||
# Future datasets | ||
export FutureClimate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just Future
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might have FutureLandscape
or FutureBiodiversity
at some point. I'm trying to future-proof the package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably you can specify Landscape
as the internal type, like Future{Landscape}
, as with Future{BioClim}
.
export SSP126, SSP245, SSP370, SSP585 | ||
|
||
# CC models from CMIP5 (used in CHELSA) | ||
export ACCESS1, BNUESM, CCSM4, CESM1BGC, CESM1CAM5, CMCCCMS, CMCCCM, CNRMCM5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about these being symbols instead? to reduce the exports? I was thinking of switching ALWB to symbols too - but it does require users typing one more confusing character (:
), and a big elseif
block... not sure anyway, just flagging this because I've been thinking about which is best to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think dispatching on values is awkward - and it also requires the big elseif
(like I have to do for years), so it's overall less maintainable. I think the exports are not too bad, since the names are really quite specific and it's unlikely that users would have variables like that.
The alternative is... a sub-module named FutureClimate
and we call FutureClimate.CanEMS5
? It seems a bit more verbose than we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, there are for and against for both.
src/worldclim/futures.jl
Outdated
return raster_path | ||
end | ||
|
||
rasterpath(T::Type{<:WorldClim{BioClim}}, ::Type{F}) where {F <: FutureClimate} = joinpath(rasterpath(T), "Future", _format(WorldClim, _scenario(F)), _format(WorldClim, _model(F))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably don't need to use where, although we should recombine these types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have a suggestion about how to recombine them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about WorldClim{Future{BioClim}}
? reads like a modifier to BioClim
, which it kind of is, and lets you have WorldClim{Future{Climate}}
if they have the monthly climate data as well.
@@ -6,6 +6,7 @@ include("worldclim-weather.jl") | |||
include("earthenv-landcover.jl") | |||
include("earthenv-heterogeneity.jl") | |||
include("chelsa-bioclim.jl") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests for WorldClim
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, because the zip files don't seem to contain any data... this is weird and I need to dig into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok no worries, that sounds annoying. What kind of zip is it?
You could ould also move WorldClim Future to another PR if you want to get CHELSA merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might do that. Let me think about the types a bit more, and I'll get back with some infos tomorrow.
src/worldclim/futures.jl
Outdated
Without a layer argument, all layers will be downloaded, and a tuple of paths is returned. | ||
If the data is already downloaded the path will be returned. | ||
""" | ||
function getraster(T::Type{WorldClim{BioClim}}, ::Type{F}, layer::Integer, date=Year(2050); res::String=defres(T)) where {F <: FutureClimate} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again probably better if this is a single type argument to keep things standardised
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's going to be a big type though - do you have an idea of what the type would look like? I ended up picking this syntax because it makes sense to me from a semantic point of view (we want the data, and then we want them in the future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Its not the best, but I think method signature consistency is overall more important.
So:
getraster(T::Type{WorldClim{Future{BioClim}}}, layer::Integer, date=Year(2050); res::String=defres(T))
And this is actually less characters...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also makes writing the GeoData.jl wrapper easier, it's one function for future bioclim and bioclim, instead of 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I have a few counter-arguments for that.
- I think it makes sense to separate two distinct elements, i.e. the type of data we want (bioclim from CHELSA) to the specific (no argument: extant data, argument: future data).
- I also think it makes sense because we can define the "future" part independently from the data source - multiple data sources will have future scenarios, and we want to be able to specify them correctly
- the type signature you give as an example isn't actually shorter in practice because there is not a valid default model for the future climate, so the user still needs to specify all of it when getting the data
- I get that this makes the GeoData wrapper simpler to write, but it's not going to be a majority of datasets at the moment
- I think having the data source contains sometimes a dataset and sometimes a future data specification is actually less consistent for the user - I agree that the date should be kept outside of it, tho
If you think of it in terms of path, as well, I think it would make sense that CHELSA{BioClim}, Future{X,Y}
goes into CHELSA/BioClim/Future/X_Y...
- there is an implicit semantics to the types that should also be present in the folder hierarchy.
Almost all functions of this type in Julia accept multiple inputs, I think that as long as it is properly documented, it would make things less confusing for users. An example is if you want to change from extant to future data: with a second type argument, you simply need to add it to the function call. With nested types, you need to go into the type and edit it. I'd rather pay the cost of writing more methods for the datasets who require it than have an akward typing, especially since we have multiple dispatch for this reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear where you are coming from here, so maybe I'll explain myself a little more. A lot of the work I did to this package was to introduce strict stardardiations. Partly for users, but mostly for devs and for interop and reducing workload downstream. Not just in GeoData.jl but every package and script that builds on this in future. This is the only way I can maintain 20 or so julia packages - simplicity and strict standardisations. It also helps everyone else building on this in future.
This PR means now instead of one consistent method signature, this package has 2. I'm really strongly against that happening for only aesthetic reasons. There is no technical reason 3 args is better than 2 here.
- I think it makes sense to separate two distinct elements, i.e. the type of data we want (bioclim from CHELSA) to the specific (no argument: extant data, argument: future data).
This could be argued either way.
- I also think it makes sense because we can define the "future" part independently from the data source - multiple data sources will have future scenarios, and we want to be able to specify them correctly
This is possible with Future
how I defined it? That generality was the point of removing the ...Climate
part... You can add as many params for it as you want, and use however many you need to in each case? The trailing params are just ignored.
- the type signature you give as an example isn't actually shorter in practice because there is not a valid default model for the future climate, so the user still needs to specify all of it when getting the data
Its still shorter, because future is just Future
, not FutureClimate
, and you have an extra space and comma. So 2 chars at minimum ;)
- I get that this makes the GeoData wrapper simpler to write, but it's not going to be a majority of datasets at the moment
Not just geodata, ever wrapper anyone wants to write. Now there are 2 methods instead of 1. Downstream packages and scripts shouldn't know or care if its bioclim or future bioclim being passed through. We need to consider ecosystem consequences.
- I think having the data source contains sometimes a dataset and sometimes a future data specification is actually less consistent for the user - I agree that the date should be kept outside of it, tho
I dont understand this point... but again its an aesthetic choice not a technical one. The user consistency I mean is argument number and placement, which this complicates.
How I organised this package was that besides source and layer all other args should be exposed as kw arguments, so as not to require more methods everywhere downstream. It can still be improved, but I think its fairly simple currently. But I think we should push to standardise it even more.
I can edit this into the PR, im just mostly here from my phone and busy currently.
Codecov Report
@@ Coverage Diff @@
## master #24 +/- ##
===========================================
- Coverage 77.51% 59.20% -18.31%
===========================================
Files 13 16 +3
Lines 249 326 +77
===========================================
Hits 193 193
- Misses 56 133 +77
Continue to review full report at Codecov.
|
@rafaqz I've made some changes - I disagree with your rationale and design decision, but respect your lead on the project. I will let you edit the PR as you see fit, as you seem to have a very set idea of how to express this. Feel free to merge when you have setup the types in a way that matches your vision of how it should be done, my work on this branch is done at this point. When this is merged, I'll move on to #25 . |
Great. I was thinking even if you really wanted 3 args I would still define a 2 arg version as well, instead of propagating that complexity into the ecosystem. I honestly dont mind the syntax either way. But if they are not all the same we objectively create downstream work. This decision determines how many methods and tests will be needed in packages and scripts that use That concern its central to all of my code and allways my first concern, hopefully you will come to see the sense in it ;) |
Also I'll try to lay out my reasoning about package design more clearly in the docs. |
I think I'll keep the two args version downstream (i.e. in |
I missed this. It probably means this PR wont be merged for quite a while, and is kind of unusual. |
Just getting back to old issues - what I meant is, I'd rather let you do this one so I have no doubt about how you want to organize things. I've been suggesting a way you don't like, and I am OK with that. I'm used to finishing PRs from people when I'm really specific about how something should be done. This is kinda blocking development on SimpleSDMLayers, but this isn't urgent. Worse case scenario I will add a temporary way to get the bioclim future data there while you decide how to implement this one. |
Ok no problem. I usually finish PRs incorporating requested changes, especially if I need them for something. Also I dont have to decide how to implement it, its straightforward and clear - use two arguments like the other sources. |
How I would do it is something like this, pretty much as expressed earlier: struct Future{A,B,C} end
function getraster(T::Type{Chelsa{Future{BioClim,B,C}}}, layer; kw...) where {B,C}
path = rasterpath(T, layer)
url = rasterurl(T, layer)
# etc
end So that: source = CHELSA{Future{BioCLim,RCP26,CSM24}}
layer = 7
path = getraster(source, layer) I'm on the road travelling and also trying to finish a methods paper and all the packages that it relies on. I won't need this functionality for a long time, so I wouldn't be able to prioritise getting this finished and tested to the point of merging. |
Refactor CHELSA future from #24, add climate and CMIP6
See #18 - I am not super set on the interface, but I added a number of types for models and RCPs. This is mostly because I expect the models and RCP to pop up at different places, so we can use them to e.g. build URLs.