blockCV

Spatial and environmental blocking for k-fold and LOO cross-validation

The blockCV package provides functions for generating train and test folds for k-fold and leave-one-out (LOO) cross-validation (CV). It separates data spatially and environmentally, with various options for block construction. It also includes a function for assessing the level of spatial autocorrelation in the response or in raster covariates, which helps in selecting an appropriate distance band for data separation. The package is suitable for evaluating a variety of spatial modelling applications, including classification of remote sensing imagery, soil mapping, and species distribution modelling (SDM). It supports different SDM scenarios, including presence-absence and presence-background species data, rare and common species, and raster data for predictor variables.

Main features

There are four blocking methods: spatial, clustering, buffers, and NNDM (Nearest Neighbour Distance Matching) blocks
Several ways to construct spatial blocks
The assignment of the spatial blocks to cross-validation folds can be done in three different ways: random, systematic and checkerboard pattern
The spatial blocks can be assigned to cross-validation folds to have evenly distributed records for binary (e.g. species presence-absence/background) or multi-class responses (e.g. land cover classes for remote sensing image classification)
The buffering and NNDM functions can account for presence-absence and presence-background data types
Using geostatistical techniques to inform the choice of a suitable distance band by which to separate the data sets
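
The three block-to-fold assignment patterns listed above can be illustrated on a toy grid. The following is a minimal base R sketch and not blockCV's implementation: the 4 x 6 grid, the `blocks` data frame and its column names are made up for illustration.

```r
# Toy grid of blocks: 4 rows x 6 columns (hypothetical layout, not blockCV output)
blocks <- expand.grid(col = 1:6, row = 1:4)
k <- 5                 # number of folds
n <- nrow(blocks)      # number of blocks

# systematic: fold numbers repeat in block order (1, 2, ..., k, 1, 2, ...)
blocks$systematic <- rep_len(seq_len(k), n)

# random: the same balanced set of fold numbers, shuffled across blocks
set.seed(42)
blocks$random <- sample(rep_len(seq_len(k), n))

# checkerboard: alternate two folds like a chessboard
blocks$checkerboard <- (blocks$row + blocks$col) %% 2 + 1

# number of blocks per fold under random assignment
table(blocks$random)

# the checkerboard pattern laid out as the 4 x 6 grid
matrix(blocks$checkerboard, nrow = 4, byrow = TRUE)
```

In this sketch the checkerboard pattern gives two folds by construction, whatever the value of `k`. In the package, the pattern is selected through the `selection` argument of cv_spatial.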

New updates in version 3.0

The latest version of blockCV (v3.0) features significant updates and changes. All functions have been renamed to more general names beginning with cv_. The previous functions (version 2.x) will continue to work for an extended period but will be removed in a future update, so it is highly recommended to update your code to the new functions listed below.

Some new updates:

Function names have been changed, with all functions now starting with cv_
The CV blocking functions are now: cv_spatial, cv_cluster, cv_buffer, and cv_nndm
Spatial blocks now support hexagonal (now the default), rectangular, and user-defined blocks
A fast C++ implementation of the Nearest Neighbour Distance Matching (NNDM) algorithm (Milà et al. 2022) has been added
The NNDM algorithm can handle species presence-background data and other types of data
The cv_cluster function generates blocks based on kmeans clustering. It now works on both environmental rasters and the spatial coordinates of sample points
The cv_spatial_autocor function now calculates the spatial autocorrelation range for both the response (i.e. binary or continuous data) and a set of continuous raster covariates
The new cv_plot function allows for visualization of folds from all blocking strategies using ggplot facets
All raster processing now uses the terra package; stars and raster objects, as well as files on disk, are also supported
The new cv_similarity function provides measures of possible extrapolation into the testing folds
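
When updating old scripts, a lookup table of old and new names can help. The sketch below is an assumption based on the version 2.x function names as I recall them; it is not taken from this README, so check NEWS.md or the help pages before relying on it. The `script` lines are a made-up example.

```r
# Assumed correspondence between blockCV 2.x and 3.0 function names
# (not stated in this README; verify against NEWS.md)
v2_to_v3 <- c(
  spatialBlock     = "cv_spatial",
  buffering        = "cv_buffer",
  envBlock         = "cv_cluster",
  spatialAutoRange = "cv_spatial_autocor",
  rangeExplorer    = "cv_block_size",
  foldExplorer     = "cv_plot"
)

# Hypothetical lines from an old analysis script
script <- c("sb <- spatialBlock(pa_data, k = 5)",
            "foldExplorer(sb, myrasters, pa_data)")

# Which old functions does the script call?
is_used <- vapply(names(v2_to_v3),
                  function(f) any(grepl(paste0(f, "("), script, fixed = TRUE)),
                  logical(1))
used <- names(v2_to_v3)[is_used]

# Suggested replacements for the old calls that were found
v2_to_v3[used]
```

Argument names also changed between versions, so renaming the function alone is unlikely to be enough.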

Installation

To install the latest update of the package from GitHub, use:

remotes::install_github("rvalavi/blockCV", dependencies = TRUE)

Or install from CRAN:

install.packages("blockCV", dependencies = TRUE)

Vignettes

For practical examples, see the package vignettes:

blockCV introduction: how to create block cross-validation folds
Block cross-validation for species distribution modelling
Using blockCV with caret and tidymodels (coming soon!)

Basic usage

This code snippet showcases some of the package's functionality. For more comprehensive tutorials, please refer to the vignettes listed above, which are included with the package.

# loading the package
library(blockCV)
library(sf) # working with spatial vector data
library(terra) # working with spatial raster data

# load raster data; the pipe operator |> is available for R v4.1 or higher
myrasters <- system.file("extdata/au/", package = "blockCV") |>
  list.files(full.names = TRUE) |>
  terra::rast()

# load species presence-absence data and convert to sf
pa_data <- read.csv(system.file("extdata/", "species.csv", package = "blockCV")) |>
  sf::st_as_sf(coords = c("x", "y"), crs = 7845)

# spatial blocking by specified range and random assignment
sb <- cv_spatial(x = pa_data, # sf or SpatialPoints of sample data (e.g. species data)
                 column = "occ", # the response column (binary or multi-class)
                 r = myrasters, # a raster for background (optional)
                 size = 450000, # size of the blocks in metres
                 k = 5, # number of folds
                 hexagon = TRUE, # use hexagonal blocks - default
                 selection = "random", # random blocks-to-fold
                 iteration = 100, # to find evenly dispersed folds
                 biomod2 = TRUE) # also create folds for biomod2
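
Once folds exist, they are typically consumed in a loop that fits a model on the training indices and evaluates it on the testing indices. The sketch below is self-contained base R and uses stand-ins throughout: `dat`, `bio1` and `fold_ids` are made-up data, and the GLM is only a placeholder model. In a real analysis, the `folds_list` element of the object returned by cv_spatial would replace the mock `folds_list`, assuming (per the package documentation) that each fold holds the training indices first and the testing indices second.

```r
# Toy data standing in for the sample points (hypothetical values)
set.seed(1)
n <- 200
dat <- data.frame(occ = rbinom(n, 1, 0.5), bio1 = rnorm(n))
dat$bio1 <- dat$bio1 + dat$occ  # give the predictor some signal

# Stand-in for the fold number of each record
fold_ids <- rep_len(1:5, n)

# Mock of the assumed folds_list shape: per fold, list(train indices, test indices)
folds_list <- lapply(1:5, function(i) {
  list(which(fold_ids != i), which(fold_ids == i))
})

# Fit on the training part of each fold, score on the testing part
acc <- vapply(folds_list, function(f) {
  train <- dat[f[[1]], ]
  test <- dat[f[[2]], ]
  fit <- glm(occ ~ bio1, data = train, family = binomial())
  pred <- predict(fit, newdata = test, type = "response") > 0.5
  mean(pred == (test$occ == 1))  # proportion of correct predictions
}, numeric(1))

# one accuracy value per fold, and their mean
acc
mean(acc)
```

Accuracy at a 0.5 threshold is used only to keep the example short; any metric suited to the data can be substituted inside the loop.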

Or create spatial clusters for k-fold cross-validation:

# create spatial clusters
set.seed(6)
sc <- cv_cluster(x = pa_data, 
                 column = "occ", # optionally count data in folds (binary or multi-class)
                 k = 5)

# now plot the created folds
cv_plot(cv = sc, # a blockCV object
        x = pa_data, # sample points
        r = myrasters[[1]], # optionally add a raster background
        points_alpha = 0.5,
        nrow = 2)
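
The buffering strategy listed under the main features is not shown in the snippets above. Its idea can be sketched in base R: each point is tested in turn, and training points closer than a chosen distance are dropped. This is a conceptual illustration under made-up coordinates and an arbitrary `size`, not the package's cv_buffer implementation.

```r
set.seed(2)
# Hypothetical sample locations in projected coordinates (metres)
pts <- data.frame(x = runif(50, 0, 1000), y = runif(50, 0, 1000))
size <- 150  # buffer radius in metres (arbitrary value for this toy example)

# Pairwise Euclidean distances between all points
d <- as.matrix(dist(pts))

# One fold per point: test on point i, train on points farther than `size` from it
buffer_folds <- lapply(seq_len(nrow(pts)), function(i) {
  list(train = unname(which(d[i, ] > size)), test = i)
})

# Training-set size varies with the local density of points
summary(vapply(buffer_folds, function(f) length(f$train), integer(1)))
```

A larger `size` removes more neighbours from each training set, so the distance is usually informed by the range of spatial autocorrelation. For real data, cv_buffer is the package function for this strategy; see its help page for the exact arguments.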

Investigate spatial autocorrelation in the landscape to choose a suitable size for spatial blocks:

# exploring the effective range of spatial autocorrelation in raster covariates or sample data
cv_spatial_autocor(r = myrasters, # a SpatRaster object or path to files
                   num_sample = 5000, # number of cells to be used
                   plot = TRUE)
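
The estimated range can then be passed on as the block size. The sketch below assumes that the object returned by cv_spatial_autocor has a `range` element holding the suggested size, and that the automap package is needed for the variogram fitting; both are my reading of the package documentation rather than something stated in this README. The `requireNamespace` guard, `block_size` and `sb_auto` are names introduced here for illustration.

```r
# Packages this sketch assumes are installed
pkgs <- c("blockCV", "sf", "terra", "automap")
block_size <- NA_real_

if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  # same example data as in the snippets above
  myrasters <- system.file("extdata/au/", package = "blockCV") |>
    list.files(full.names = TRUE) |>
    terra::rast()
  pa_data <- read.csv(system.file("extdata/", "species.csv", package = "blockCV")) |>
    sf::st_as_sf(coords = c("x", "y"), crs = 7845)

  # estimate the spatial autocorrelation range of the raster covariates
  sac <- blockCV::cv_spatial_autocor(r = myrasters, num_sample = 5000, plot = FALSE)
  block_size <- sac$range  # assumed element name; in metres for this projected CRS

  # reuse the estimate as the size of the spatial blocks
  sb_auto <- blockCV::cv_spatial(x = pa_data, column = "occ",
                                 size = block_size, k = 5, plot = FALSE)
}

block_size
```

The estimate is a starting point rather than a fixed rule: very large ranges can leave too few blocks to fill `k` folds, in which case a smaller size or fewer folds may be needed.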

Alternatively, you can manually choose the size of spatial blocks in an interactive session using a Shiny app.

# shiny app to aid selecting a size for spatial blocks
cv_block_size(r = myrasters[[1]],
              x = pa_data, # optionally add sample points
              column = "occ",
              min_size = 2e5,
              max_size = 9e5)

Reporting issues

Please report issues at: https://github.com/rvalavi/blockCV/issues

Citation

To cite package blockCV in publications, please use:

Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G. blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods Ecol Evol. 2019; 10:225–232. https://doi.org/10.1111/2041-210X.13107
