R6 class to run a full MBG model and make predictions.
Details
To see examples of this object, run vignette('mbg')
Public fields
input_data
(data.table::data.table)
Table containing at least the following fields:
x (numeric) location longitude in decimal degrees
y (numeric) location latitude in decimal degrees
indicator (integer) The number of events in the population
samplesize (integer) The total population, denominator for indicator
id_raster
(terra::SpatRaster)
Raster showing the total area that will be predicted using this model.
covariate_rasters
(list())
A list containing all predictor covariates. Each covariate is a terra::SpatRaster object with the same extent and dimensions as id_raster.
aggregation_table
(data.table::data.table)
A table created by build_aggregation_table, used to link each grid cell to higher-level administrative units.
aggregation_levels
(list())
A named list: for each named item, the name is the label for that aggregation level, and the value is a character vector of all fields in the original polygons to be used for aggregation at that level.
population_raster
(terra::SpatRaster)
A raster giving population for each grid cell, to be used for population-weighted aggregation from grid cells to polygon boundaries. Should have the same dimensions as id_raster. If no population raster is passed and the results are aggregated, aggregation will be by simple mean rather than population-weighted mean.
admin_bounds
(sf::sf)
Polygons showing the boundaries of administrative divisions within the study region. Only required if use_admin_effect OR stacking_use_admin_bounds is TRUE.
admin_bounds_id
(character)
Field containing unique identifiers for admin_bounds, if passed.
use_covariates
(logical(1))
Should covariate effects be included in the predictive model?
use_gp
(logical(1))
Should a smoothed spatial surface be included in the predictive model?
use_admin_effect
(logical(1))
Should IID administrative-level effects be included in the predictive model?
use_nugget
(logical(1))
Should an IID effect by pixel be included in the predictive model?
use_stacking
(logical(1))
Should machine learning submodels be trained to relate the covariate rasters with the outcome data? Only run if use_covariates is TRUE.
stacking_model_settings
(list())
A named list of submodels to be run. For more information about this term, see run_regression_submodels. Only considered if use_stacking is TRUE.
stacking_cv_settings
(list())
How should the stacking submodels be cross-validated? For more information about this term, see run_regression_submodels. Only considered if use_stacking is TRUE.
stacking_use_admin_bounds
(logical(1))
Should admin boundaries be included as features in the stacking submodels? For more information about this term, see run_regression_submodels. Only considered if use_stacking is TRUE.
stacking_prediction_range
(numeric(2))
Range of possible predictions for the stacking submodels. For more information about this term, see run_regression_submodels. Only considered if use_stacking is TRUE.
mesh_max_edge
(numeric(2) or NULL)
Maximum size of the INLA SPDE mesh inside (1) and outside (2) of the modeled region. Only considered if use_gp is TRUE.
mesh_cutoff
(numeric(1))
Minimum size of the INLA mesh, usually reached in data-dense areas. Only considered if use_gp is TRUE.
spde_integrate_to_zero
(logical(1))
Should the 'volume' under the SPDE mesh integrate to zero? Only considered if use_gp is TRUE.
prior_spde_range
(list())
A named list specifying the penalized complexity prior for the SPDE range. The two named items are "threshold", the test threshold (set as a proportion of the overall mesh extent), and "prob_below", the prior probability that the value is BELOW that range threshold. The function automatically converts "threshold" from a proportion of the overall mesh extent into a distance. Only considered if use_gp is TRUE.
prior_spde_sigma
(list())
A named list specifying the penalized complexity prior for sigma (standard deviation) of the SPDE object. The two named items are "threshold", the test threshold for the standard deviation, and "prob_above", the prior probability that sigma will EXCEED that threshold. Only considered if use_gp is TRUE.
prior_nugget
(list())
A named list specifying the penalized complexity prior for the nugget term. The two named items are "threshold", the test threshold for the nugget standard deviation, and "prob_above", the prior probability that the standard deviation will EXCEED that threshold. Only considered if use_nugget is TRUE.
prior_admin_effect
(list())
A named list specifying the penalized complexity prior for the admin-level IID term. The two named items are "threshold", the test threshold for the standard deviation of admin-level effects, and "prob_above", the prior probability that the standard deviation will EXCEED that threshold. Only considered if use_admin_effect is TRUE.
prior_covariate_effect
(list())
A named list specifying the penalized complexity prior for all covariate effects except for the intercept, if an intercept is included. The two named items are "threshold", the test threshold for the size of each fixed effect, and "prob_above", the prior probability that the beta for each covariate will EXCEED that threshold. Only considered if use_covariates is TRUE and use_stacking is FALSE.
inla_link
(character(1))
Link function for fitting the INLA model, typically related to the GLM family.
inverse_link
(character(1))
Inverse function of inla_link.
inla_family
(character(1))
GLM family to use. For more information, see stats::family.
nugget_in_predict
(logical(1))
If the nugget is used in model fitting, should it also be included as an IID effect by pixel in the model prediction step?
verbose
(logical(1))
Should model progress be timed?
model_covariates
(list())
A list of covariates to be included in the INLA model. Either equal to covariate_rasters, or ML model predictions for stacked generalization.
inla_inputs_list
(list())
List of model inputs yielded by prepare_inla_data_stack.
inla_fitted_model
(list())
List of model outputs yielded by fit_inla_model.
grid_cell_predictions
List of predictive surfaces yielded by generate_cell_draws_and_summarize.
aggregated_predictions
List of predictions by administrative unit. Only created if aggregation_table and aggregation_levels are both defined.
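The "threshold" and "prob_above" items of the prior fields can be read as a tail statement about a standard deviation. The sketch below assumes the standard penalized complexity prior for the standard deviation of a Gaussian effect, which is an exponential distribution; whether every prior field in this class is parameterized exactly this way is an assumption, and prior_spde_range (which uses "prob_below") is not covered. The helper pc_prior_rate is hypothetical and not part of the package.

```r
# Hypothetical helper: rate of the exponential prior on a standard deviation
# implied by the statement P(sigma > threshold) = prob_above.
pc_prior_rate <- function(threshold, prob_above) {
  stopifnot(threshold > 0, prob_above > 0, prob_above < 1)
  -log(prob_above) / threshold
}

# Default nugget prior from the constructor usage
prior_nugget <- list(threshold = 3, prob_above = 0.05)
rate <- pc_prior_rate(prior_nugget$threshold, prior_nugget$prob_above)

# Check: the prior mass above the threshold equals prob_above
tail_prob <- pexp(prior_nugget$threshold, rate = rate, lower.tail = FALSE)

# Prior median of the standard deviation under this assumption
prior_median <- qexp(0.5, rate = rate)
```

Under this reading, lowering "threshold" or "prob_above" pulls the prior toward zero and shrinks the corresponding effect more strongly.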
Methods
Method new()
Create a new MbgModelRunner object
Usage
MbgModelRunner$new(
input_data,
id_raster,
covariate_rasters = NULL,
aggregation_table = NULL,
aggregation_levels = NULL,
population_raster = NULL,
admin_bounds = NULL,
admin_bounds_id = NULL,
use_covariates = TRUE,
use_gp = TRUE,
use_admin_effect = FALSE,
use_nugget = TRUE,
use_stacking = FALSE,
stacking_cv_settings = list(method = "repeatedcv", number = 5, repeats = 5),
stacking_model_settings = list(gbm = NULL, treebag = NULL, rf = NULL),
stacking_use_admin_bounds = FALSE,
stacking_prediction_range = NULL,
mesh_max_edge = c(0.2, 5),
mesh_cutoff = c(0.04),
spde_integrate_to_zero = FALSE,
prior_spde_range = list(threshold = 0.1, prob_below = 0.05),
prior_spde_sigma = list(threshold = 3, prob_above = 0.05),
prior_nugget = list(threshold = 3, prob_above = 0.05),
prior_admin_effect = list(threshold = 3, prob_above = 0.05),
prior_covariate_effect = list(threshold = 3, prob_above = 0.05),
inla_link = "logit",
inverse_link = "plogis",
inla_family = "binomial",
nugget_in_predict = TRUE,
verbose = TRUE
)
Arguments
input_data
(data.table::data.table) Table containing at least the following fields:
x (numeric) location x position, in the same projection as the id_raster
y (numeric) location y position, in the same projection as the id_raster
indicator (integer) The number of events in the population
samplesize (integer) The total population, denominator for indicator
id_raster
(terra::SpatRaster) raster showing the total area that will be predicted using this model
covariate_rasters
(list(), default NULL) A list containing all predictor covariates. Each covariate is a terra::SpatRaster object with the same extent and dimensions as id_raster.
aggregation_table
(data.table::data.table) A table created by build_aggregation_table, linking each grid cell to one or more polygons
aggregation_levels
(list()) A named list: for each named item, the name is the label for that aggregation level, and the value is a character vector of all fields in the original polygons to be used for aggregation at that level.
population_raster
(terra::SpatRaster) A raster giving population for each grid cell, to be used for population-weighted aggregation from grid cells to polygon boundaries. Should have the same dimensions as id_raster. If no population raster is passed and the results are aggregated, aggregation will be by simple mean rather than population-weighted mean.
admin_bounds
(sf::sf, default NULL) Polygons showing the boundaries of administrative divisions within the study region. Only required if use_admin_effect OR stacking_use_admin_bounds is TRUE.
admin_bounds_id
(character, default NULL) Field containing unique identifiers for admin_bounds, if passed.
use_covariates
(logical(1), default TRUE) Should covariate effects be included in the predictive model?
use_gp
(logical(1), default TRUE) Should a smoothed spatial surface be included in the predictive model?
use_admin_effect
(logical(1), default FALSE) Should IID administrative-level effects be included in the predictive model?
use_nugget
(logical(1), default TRUE) Should an IID effect by pixel be included in the predictive model?
use_stacking
(logical(1), default FALSE) Should machine learning submodels be trained to relate the covariate rasters with the outcome data? Only run if use_covariates is TRUE.
stacking_cv_settings
(list()) How should the stacking submodels be cross-validated? For more information about this term, see run_regression_submodels. Only considered if use_stacking is TRUE.
stacking_model_settings
(list()) A named list of submodels to be run. For more information about this term, see run_regression_submodels. Only considered if use_stacking is TRUE.
stacking_use_admin_bounds
(logical(1), default FALSE) Should admin boundaries be included as features in the stacking submodels? For more information about this term, see run_regression_submodels. Only considered if use_stacking is TRUE.
stacking_prediction_range
(numeric(2), default NULL) Range of possible predictions for the stacking submodels. For more information about this term, see run_regression_submodels. Only considered if use_stacking is TRUE.
mesh_max_edge
(numeric(2), default c(0.2, 5)) Maximum size of the INLA SPDE mesh inside (1) and outside (2) of the modeled region. Only considered if use_gp is TRUE.
mesh_cutoff
(numeric(1), default 0.04) Minimum size of the INLA mesh, usually reached in data-dense areas. Only considered if use_gp is TRUE.
spde_integrate_to_zero
(logical(1), default FALSE) Should the 'volume' under the SPDE mesh integrate to zero? Only considered if use_gp is TRUE.
prior_spde_range
(list()) A named list specifying the penalized complexity prior for the SPDE range. The two named items are "threshold", the test threshold (set as a proportion of the overall mesh extent), and "prob_below", the prior probability that the value is BELOW that range threshold. The function automatically converts "threshold" from a proportion of the overall mesh extent into a distance. Only considered if use_gp is TRUE.
prior_spde_sigma
(list()) A named list specifying the penalized complexity prior for sigma (standard deviation) of the SPDE object. The two named items are "threshold", the test threshold for the standard deviation, and "prob_above", the prior probability that sigma will EXCEED that threshold. Only considered if use_gp is TRUE.
prior_nugget
(list()) A named list specifying the penalized complexity prior for the nugget term. The two named items are "threshold", the test threshold for the nugget standard deviation, and "prob_above", the prior probability that the standard deviation will EXCEED that threshold. Only considered if use_nugget is TRUE.
prior_admin_effect
(list()) A named list specifying the penalized complexity prior for the admin-level IID term. The two named items are "threshold", the test threshold for the standard deviation of admin-level effects, and "prob_above", the prior probability that the standard deviation will EXCEED that threshold. Only considered if use_admin_effect is TRUE.
prior_covariate_effect
(list()) A named list specifying the penalized complexity prior for all covariate effects except for the intercept, if an intercept is included. The two named items are "threshold", the test threshold for the size of each fixed effect, and "prob_above", the prior probability that the beta for each covariate will EXCEED that threshold. Only considered if use_covariates is TRUE and use_stacking is FALSE.
inla_link
(character(1), default 'logit') Link function for fitting the INLA model, typically related to the GLM family.
inverse_link
(character(1), default 'plogis') Inverse function of inla_link.
inla_family
(character(1), default 'binomial') GLM family to use. For more information, see stats::family().
nugget_in_predict
(logical(1), default TRUE) If the nugget is used in model fitting, should it also be included as an IID effect by pixel in the model prediction step?
verbose
(logical(1), default TRUE) Should model progress be timed?
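As a rough illustration of how these arguments fit together, the sketch below assembles a toy input_data table and a list of non-spatial settings using base R only. All data values are made up, the spatial inputs (id_raster and so on) are omitted, and the commented-out constructor call is an assumption about typical usage that requires the mbg, terra, data.table, and INLA packages.

```r
# Made-up point observations with the four required input_data columns
input_data <- data.frame(
  x = c(0.5, 1.5, 2.5),
  y = c(0.5, 1.5, 0.5),
  indicator = c(3L, 10L, 0L),
  samplesize = c(20L, 25L, 12L)
)
# The numerator should never exceed the denominator
stopifnot(all(input_data$indicator <= input_data$samplesize))

# Non-spatial settings, mirroring the defaults in the usage block above.
# use_covariates is FALSE because no covariate rasters are supplied here.
model_settings <- list(
  use_covariates = FALSE,
  use_gp = TRUE,
  use_admin_effect = FALSE,
  use_nugget = TRUE,
  prior_spde_range = list(threshold = 0.1, prob_below = 0.05),
  prior_spde_sigma = list(threshold = 3, prob_above = 0.05),
  inla_link = "logit",
  inverse_link = "plogis",
  inla_family = "binomial"
)

# inverse_link names a function that should undo the link:
# qlogis() is the logit, and plogis() maps it back to a probability.
p <- c(0.1, 0.5, 0.9)
link_round_trip <- do.call(model_settings$inverse_link, list(qlogis(p)))

# Assumed usage once an id_raster (terra::SpatRaster) is available:
# runner <- do.call(
#   mbg::MbgModelRunner$new,
#   c(list(input_data = data.table::as.data.table(input_data),
#          id_raster = id_raster),
#     model_settings)
# )
# runner$run_mbg_pipeline()
```

Keeping the settings in a named list makes it easy to rerun the same data with different effects switched on or off.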
Method aggregate_predictions()
Aggregate grid cell predictions
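The difference between population-weighted and simple-mean aggregation, described under population_raster, can be sketched in base R. This is a simplified illustration with made-up values, not the package's implementation: it works on a single prediction per cell rather than on predictive draws, ignores partial cell coverage, and the helper aggregate_cells is hypothetical.

```r
# Made-up grid cells, each assigned to one polygon
cells <- data.frame(
  polygon_id = c("A", "A", "A", "B", "B"),
  prediction = c(0.10, 0.20, 0.60, 0.30, 0.50),
  population = c(100, 300, 600, 50, 50)
)

# Hypothetical helper: aggregate cell predictions to polygons, either
# weighting by population or taking a simple mean.
aggregate_cells <- function(cells, weighted = TRUE) {
  groups <- split(cells, cells$polygon_id)
  vapply(groups, function(g) {
    if (weighted) {
      weighted.mean(g$prediction, g$population)
    } else {
      mean(g$prediction)
    }
  }, numeric(1))
}

weighted_means <- aggregate_cells(cells, weighted = TRUE)
simple_means <- aggregate_cells(cells, weighted = FALSE)
```

In polygon A, most people live in the cell with the highest prediction, so the population-weighted mean sits above the simple mean. In polygon B, the cells have equal population and the two methods agree.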
Method run_mbg_pipeline()
Run a full MBG pipeline, including stacking, MBG model fitting, and prediction
Method get_predictive_validity()
Get predictive validity metrics for the fitted model
Usage
MbgModelRunner$get_predictive_validity(
in_sample = TRUE,
validation_data = NULL,
na.rm = FALSE
)
Arguments
in_sample
(logical(1), default TRUE) Compare model predictions to the data used to generate the model? If FALSE, does not return the WAIC, which is only useful for in-sample predictive validity.
validation_data
(data.table::data.table, default NULL) Observed data to compare against. Expected for out-of-sample model validation. Table containing at least the following fields:
x (numeric) location x position, in the same projection as the id_raster
y (numeric) location y position, in the same projection as the id_raster
indicator (integer) The number of events in the population
samplesize (integer) The total population, denominator for indicator
na.rm
(logical(1), default FALSE) Should NA values be dropped from the RMSE and log predictive density calculations?
Details
Returns the point RMSE (compared against the mean estimates by pixel), the log posterior predictive density (LPD, compared against the predictive draws), and the Watanabe-Akaike Information Criterion (WAIC, only returned for in-sample predictive validity).
Returns
data.table::data.table Containing the following fields:
'rmse': Root mean squared error when compared against the mean estimates by pixel. Lower RMSE is better.
'lpd': Log posterior predictive density when compared against pixel-level samples from the model. Higher LPD is better.
'waic' (in-sample only): Watanabe-Akaike information criterion estimated by INLA. Lower WAIC is better.
For clarity, these fields will have the suffix "_is" for in-sample models, and "_oos" for out-of-sample models.
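To make the two draw-based metrics concrete, the sketch below computes an RMSE and a log predictive density by hand for a binomial outcome. It is an illustration under stated assumptions rather than the method's actual code: the observations and draws are made up, the RMSE is taken on the observed rate (indicator / samplesize) without weighting, and the LPD is summed across observations.

```r
# Made-up observations
observed <- data.frame(
  indicator = c(3L, 10L, 0L),
  samplesize = c(20L, 25L, 12L)
)

# Made-up predictive draws of the underlying probability:
# one row per observation, one column per draw
draws <- rbind(
  c(0.10, 0.15, 0.20, 0.15),
  c(0.35, 0.40, 0.45, 0.40),
  c(0.02, 0.05, 0.08, 0.05)
)

# RMSE: observed rate against the mean of the draws at each location
observed_rate <- observed$indicator / observed$samplesize
mean_prediction <- rowMeans(draws)
rmse <- sqrt(mean((observed_rate - mean_prediction)^2))

# LPD: for each observation, average the binomial likelihood over draws,
# then take the log and sum across observations
pointwise_density <- vapply(seq_len(nrow(draws)), function(i) {
  mean(dbinom(observed$indicator[i], observed$samplesize[i], draws[i, ]))
}, numeric(1))
lpd <- sum(log(pointwise_density))
```

RMSE only uses the mean surface, while the LPD also rewards draws whose spread matches the data, which is why the two metrics can rank models differently.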