Downscale Function created for hourly data in nc #3322
Merged
23 commits
50cca2c v1 hourly downscale (Snafkin547)
bf7adcb Merge branch 'develop' into downscale/hourly (Snafkin547)
95e9691 changelog updated (Snafkin547)
32115a4 Merge branch 'downscale/hourly' of https://github.com/Snafkin547/peca… (Snafkin547)
a214027 man file created created (Snafkin547)
502bf89 import package list updated (Snafkin547)
29272cf Merge branch 'develop' into downscale/hourly (Snafkin547)
df6de03 Merge branch 'develop' into downscale/hourly (Snafkin547)
e143b38 Name added to CITATION (Snafkin547)
c418c86 Merge branch 'downscale/hourly' of https://github.com/Snafkin547/peca… (Snafkin547)
9c840cc Merge branch 'develop' into downscale/hourly (Snafkin547)
85ed698 Suggested change in namespace and file input style modified (Snafkin547)
c6bb0d0 Time units uses lubridate (Snafkin547)
16b84a4 Merge branch 'develop' into downscale/hourly (Snafkin547)
900acee downscale func takes time series (Snafkin547)
8c4234a downscale func takes time series (Snafkin547)
a23e600 Merge branch 'downscale/hourly' of https://github.com/Snafkin547/peca… (Snafkin547)
bfeb8e5 Time Zone Checked (Snafkin547)
8cad31f Merge branch 'develop' into downscale/hourly (Snafkin547)
f56fd9c Updated downscale based on comment (Snafkin547)
30de7b9 Merge branch 'downscale/hourly' of https://github.com/Snafkin547/peca… (Snafkin547)
6ca6b65 name space added (Snafkin547)
84780a7 Merge branch 'develop' into downscale/hourly (mdietze)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
#' SDA Downscale Function for Hourly Data | ||
#' | ||
#' This function uses the randomForest model to downscale forecast data (hourly) to unmodeled locations using covariates and site locations | ||
#' | ||
#' @author Harunobu Ishii | ||
#' @param nc_file In quotes, file path for .nc containing ensemble data. | ||
#' @param coords In quotes, file path for .csv file containing the site coordinates, columns named "lon" and "lat". | ||
#' @param yyyy Character string giving the year of interest in yyyy format. | ||
#' @param covariates SpatRaster stack, used as predictors in randomForest. Layers within stack should be named. Recommended that this stack be generated using 'covariates' instructions in assim.sequential/inst folder | ||
#' @return A list named by time stamp. Each element holds the training and testing data sets, the fitted models, the predicted maps, and the test-set predictions for each ensemble member. | ||
#' @export | ||
|
||
SDA_downscale_hrly <- function(nc_file, coords, yyyy, covariates){ | ||
|
||
# Read the input data and site coordinates | ||
nc_data <- ncdf4::nc_open(nc_file) | ||
on.exit(ncdf4::nc_close(nc_data)) | ||
input_data <- ncdf4::ncvar_get(nc_data, "NEE") | ||
covariate_names <- names(covariates) | ||
|
||
|
||
# Extract time and units | ||
time <- nc_data$dim$time$vals | ||
time_units <- nc_data$dim$time$units | ||
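# Drop the "<unit> since" prefix so the origin is located for both "hours since" and "seconds since" | ||
time_origin_str <- substr(sub("^.*since", "", time_units), 1, 20) | ||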
|
||
# Check if timezone is specified in the time units string | ||
if (grepl("UTC|GMT", time_units)) { | ||
time_origin <- lubridate::ymd_hm(time_origin_str, tz = "UTC") | ||
} else if (grepl("EST", time_units)) { | ||
time_origin <- lubridate::ymd_hm(time_origin_str, tz = "EST") | ||
} else { | ||
time_origin <- lubridate::ymd_hm(time_origin_str, tz = "UTC") # Default to UTC if not specified | ||
} | ||
|
||
# Convert the time offsets to readable time stamps | ||
if (grepl("hours", time_units)) { | ||
time_readable <- time_origin + lubridate::dhours(time) | ||
} else if (grepl("seconds", time_units)) { | ||
time_readable <- time_origin + lubridate::dseconds(time) | ||
} else { | ||
stop("Unsupported time units") | ||
} | ||
|
||
# Extract predictors from covariates raster using site coordinates | ||
site_coordinates <- terra::vect(readr::read_csv(coords), geom=c("lon", "lat"), crs="EPSG:4326") | ||
predictors <- as.data.frame(terra::extract(covariates, site_coordinates,ID = FALSE)) | ||
|
||
downscale_output<- list() | ||
|
||
# Train & Test split | ||
sample <- sample(1:nrow(predictors), size = round(0.75*nrow(predictors))) | ||
|
||
# Predict for each time stamp of the year selected | ||
time_indices <- which(lubridate::year(time_readable) == as.numeric(yyyy)) | ||
for (index in time_indices) { | ||
data <- input_data[index, , ] | ||
carbon_data <- as.data.frame(data) | ||
names(carbon_data) <- paste0("ensemble", seq_len(ncol(carbon_data))) | ||
|
||
# Combine carbon data and covariates/predictors and split into training/test | ||
full_data <- cbind(carbon_data, predictors) | ||
train_data <- full_data[sample, ] | ||
test_data <- full_data[-sample, ] | ||
|
||
# Fit one random forest per ensemble member, using all predictors | ||
models <- list() | ||
maps <- list() | ||
predictions <- list() | ||
ensembles <- list() | ||
for (i in seq_along(carbon_data)) { | ||
ensemble_col <- paste0("ensemble", i) | ||
formula <- stats::as.formula(paste(ensemble_col, "~", paste(covariate_names, collapse = " + "))) | ||
models[[i]] <- randomForest::randomForest(formula, | ||
data = train_data, | ||
ntree = 1000, | ||
na.action = stats::na.omit, | ||
keep.forest = TRUE, | ||
importance = TRUE) | ||
|
||
maps[[i]] <- terra::predict(covariates, model = models[[i]], na.rm = TRUE) | ||
predictions[[i]] <- stats::predict(models[[i]], test_data) | ||
} | ||
|
||
# Organize the results into a single output list | ||
curr_downscaled <- list( data = list(training = train_data, testing = test_data), | ||
models = models, | ||
maps = maps, | ||
predictions = predictions | ||
) | ||
|
||
# Name the per-ensemble results; the training/testing data frames keep their own column names, | ||
# since assigning only the ensemble names to them would set the predictor column names to NA | ||
ensemble_names <- paste0("ensemble", seq_len(ncol(carbon_data))) | ||
names(curr_downscaled$models) <- ensemble_names | ||
names(curr_downscaled$maps) <- ensemble_names | ||
names(curr_downscaled$predictions) <- ensemble_names | ||
|
||
downscale_output[[as.character(time_readable[index])]]<-curr_downscaled | ||
} | ||
return(downscale_output) | ||
} |
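The time handling in the function can be checked in isolation. The following is a minimal base-R sketch, not the code from this PR: `decode_cf_time` is a hypothetical helper name, it assumes the units string has the form "<unit> since <origin>", and it uses `as.POSIXct` rather than lubridate so that it runs without extra packages.

```r
# Hypothetical helper (not part of the PR): decode "<unit> since <origin>" time values.
decode_cf_time <- function(values, units, tz = "UTC") {
  # Everything after "since" is treated as the origin time stamp
  origin_str <- trimws(sub("^.*since", "", units))
  origin <- as.POSIXct(
    origin_str,
    tz = tz,
    tryFormats = c("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d")
  )
  # Seconds per step for the two units the function supports
  step <- if (grepl("^hours", units)) {
    3600
  } else if (grepl("^seconds", units)) {
    1
  } else {
    stop("Unsupported time units: ", units)
  }
  origin + values * step
}

stamps <- decode_cf_time(c(0, 1, 24), "hours since 2020-01-01 00:00:00")
format(stamps, "%Y-%m-%d %H:%M")

# Select the indices that fall in one year, as the prediction loop does
which(format(stamps, "%Y") == "2020")
```

Locating the origin relative to the word "since", rather than at a fixed character position, keeps the same code path valid for both unit names.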
Doesn't need any change for this PR, but making a note for those doing future development that this would be a great place to either parallelize this work or to modify the ML model to take time into account explicitly.
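As a rough illustration of the parallelization idea, the sketch below fits one model per time step with `parallel::mclapply`. Everything in it is an assumption made for illustration: the data are simulated, `fit_one_time` is a hypothetical worker, and `lm()` stands in for `randomForest::randomForest()` so the example runs with base R alone.

```r
library(parallel)  # ships with base R

set.seed(1)
n_sites <- 40
n_times <- 6

# Toy stand-ins (assumptions): two covariates and one response column per time step
predictors <- data.frame(temp = rnorm(n_sites), precip = rnorm(n_sites))
response <- sapply(seq_len(n_times), function(t) {
  2 * predictors$temp - predictors$precip + rnorm(n_sites, sd = 0.1)
})

# Hypothetical per-time-step worker; lm() stands in for randomForest here
fit_one_time <- function(t) {
  dat <- cbind(y = response[, t], predictors)
  stats::lm(y ~ temp + precip, data = dat)
}

# mclapply forks on Unix-alikes; forking is unavailable on Windows, so fall back to one core
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
fits <- parallel::mclapply(seq_len(n_times), fit_one_time, mc.cores = n_cores)
names(fits) <- paste0("time", seq_len(n_times))

# Each element is an independent model for one time step
round(coef(fits[[1]]), 1)
```

Because the time steps are fitted independently, the loop body can be moved into a worker function without changing the results structure. With a stochastic learner such as a random forest, reproducible parallel runs would also need a parallel-safe random number setup, which this sketch does not cover.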