Run regression sub-models: run_regression_submodels

Wrapper that fits multiple regression sub-models using the caret package.

Usage

run_regression_submodels(
  input_data,
  id_raster,
  covariates,
  cv_settings,
  model_settings,
  family = "binomial",
  clamping = TRUE,
  use_admin_bounds = FALSE,
  admin_bounds = NULL,
  admin_bounds_id = "polygon_id",
  prediction_range = c(-Inf, Inf),
  verbose = TRUE
)

Arguments

input_data

A data.frame with at least the following columns:

'indicator': number of "hits" per site, e.g. the number of people who tested positive for malaria
'samplesize': total population sampled at the site
'x': x position, often longitude
'y': y position, often latitude
'mean': outcome value used as the training target when family is Gaussian (see family)
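As an illustration only, a minimal input_data for the default binomial family might look like the toy table below. The site values are invented, and the sanity checks (non-negative counts, hits not exceeding the sample size) are sensible assumptions, not checks the function is documented to perform.

```r
# Hypothetical toy survey data; all values are made up for illustration
input_data <- data.frame(
  indicator  = c(3L, 10L, 0L, 7L),        # "hits" per site, e.g. positive tests
  samplesize = c(20L, 50L, 15L, 30L),     # people sampled at each site
  x          = c(36.8, 37.1, 36.5, 37.4), # longitude
  y          = c(-1.3, -1.1, -1.6, -0.9)  # latitude
)

# Confirm the required columns exist and the counts are internally consistent
required_cols <- c("indicator", "samplesize", "x", "y")
stopifnot(
  all(required_cols %in% names(input_data)),
  all(input_data$indicator >= 0),
  all(input_data$indicator <= input_data$samplesize)
)
```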

id_raster

terra::SpatRaster with non-NA pixels delineating the extent of the study area

covariates

(list) Named list of all covariate effects included in the model, typically generated by load_covariates().

cv_settings

Named list of cross-validation settings, passed to caret::trainControl.
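A sketch of what cv_settings could contain, assuming the list is spliced into caret::trainControl with do.call(). The names method, number and repeats are real trainControl arguments; the values are arbitrary choices, and the caret call is skipped when caret is not installed.

```r
# Illustrative cross-validation settings using standard trainControl() arguments
cv_settings <- list(
  method  = "repeatedcv", # repeated k-fold cross-validation
  number  = 5,            # number of folds
  repeats = 3             # number of complete repetitions
)

# A named list like this can be passed to trainControl() via do.call()
if (requireNamespace("caret", quietly = TRUE)) {
  ctrl <- do.call(caret::trainControl, cv_settings)
  print(ctrl$method)
}
```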

model_settings

Named list in which each element's name is a model type (method) to fit with caret::train, and each element's contents are the model-specific settings for that model type.
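A hypothetical model_settings list. gbm, treebag and glmnet are valid caret method names, but treating each element as extra arguments for caret::train (here tuneLength) is an assumption about how the wrapper consumes them. The loop only reports what each sub-model would receive; it does not fit anything.

```r
# Hypothetical settings: names are caret methods, elements are assumed to be
# extra arguments forwarded to caret::train() for that method
model_settings <- list(
  gbm     = list(tuneLength = 5),
  treebag = list(),
  glmnet  = list(tuneLength = 3)
)

# Walk the sub-models one caret method at a time, as a wrapper might
for (method_name in names(model_settings)) {
  extra_args <- model_settings[[method_name]]
  message("Sub-model '", method_name, "' with ",
          length(extra_args), " extra argument(s)")
}
```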

family

(character(1), default 'binomial') Statistical model family being evaluated. For Gaussian models, this function trains against the 'mean' field; for all other families, it trains against the ratio 'indicator' / 'samplesize'.
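The rule above for choosing the training target can be sketched as follows. get_training_target() is a hypothetical helper, not a package function, and the lowercase family string "gaussian" is an assumption.

```r
# Hypothetical helper: pick the training target according to `family`
get_training_target <- function(input_data, family = "binomial") {
  if (family == "gaussian") {
    input_data$mean                              # Gaussian: train on 'mean'
  } else {
    input_data$indicator / input_data$samplesize # otherwise: observed proportion
  }
}

sites <- data.frame(indicator = c(3, 10), samplesize = c(20, 50), mean = c(1.2, 0.8))
get_training_target(sites)             # 0.15 0.20
get_training_target(sites, "gaussian") # 1.2 0.8
```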

clamping

(logical(1), default TRUE) Should the predictions of individual ML models be limited to the range observed in the data?
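One plausible reading of clamping, assuming "the range observed in the data" means the minimum and maximum of the training target. clamp_to_observed() is a hypothetical helper for illustration.

```r
# Hypothetical helper: limit predictions to the range of the observed target
clamp_to_observed <- function(predictions, observed) {
  obs_range <- range(observed, na.rm = TRUE)
  pmin(pmax(predictions, obs_range[1]), obs_range[2])
}

observed    <- c(0.15, 0.20, 0.00, 0.23)
predictions <- c(-0.05, 0.10, 0.31)
clamp_to_observed(predictions, observed) # 0.00 0.10 0.23
```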

use_admin_bounds

(logical(1), default FALSE) Use one-hot encoding of administrative boundaries as a candidate feature?

admin_bounds

(sf, default NULL) Administrative boundaries to use. Only considered if use_admin_bounds is TRUE.

admin_bounds_id

(character, default 'polygon_id') Field to use for administrative boundary one-hot encoding. Only considered if use_admin_bounds is TRUE.
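One-hot encoding of administrative units can be sketched with base R's model.matrix(). This assumes each site has already been matched to a polygon (for example by a spatial join against admin_bounds, not shown) and uses the default field name 'polygon_id'; the package's own feature construction may differ.

```r
# Hypothetical admin-unit IDs per site, keyed by the admin_bounds_id field
admin_bounds_id <- "polygon_id"
site_admin <- data.frame(polygon_id = c("A", "B", "A", "C"))
admin_factor <- factor(site_admin[[admin_bounds_id]])

# "- 1" drops the intercept so every admin unit gets its own 0/1 column
one_hot <- model.matrix(~ admin_factor - 1)
colnames(one_hot) <- levels(admin_factor)
one_hot
```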

prediction_range

(numeric(2), default c(-Inf, Inf)) Lower and upper limits for the predicted outcome. Set these when the outcome is bounded, for example 0 to 1 or -1 to 1.
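Unlike clamping, which depends on the data, prediction_range is a fixed pair of limits. A minimal sketch, with apply_prediction_range() as a hypothetical helper, showing that the default c(-Inf, Inf) leaves predictions untouched while c(0, 1) bounds proportions:

```r
# Hypothetical helper: apply fixed lower/upper limits to predictions
apply_prediction_range <- function(predictions, prediction_range = c(-Inf, Inf)) {
  pmin(pmax(predictions, prediction_range[1]), prediction_range[2])
}

preds <- c(-0.02, 0.45, 1.07)
apply_prediction_range(preds)          # unchanged with the default limits
apply_prediction_range(preds, c(0, 1)) # 0.00 0.45 1.00
```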

verbose

(logical(1), default TRUE) Log progress for ML model fitting?

Value

List with two items:

"models": A list containing summary objects for each regression model
"predictions": Model predictions covering the entire id_raster