Wrapper to generate multi-response predictive models.
Usage
mrIMLpredicts(
X,
X1 = NULL,
Y,
Model,
balance_data = "no",
mode = "regression",
dummy = FALSE,
prop = 0.5,
morans = F,
tune_grid_size = 10,
k = 10,
racing = T,
seed = sample.int(1e+08, 1)
)
Arguments
- X: A dataframe of predictor (feature) data.
- X1: A dataframe containing an extra predictor set used in each model. For the MrIML joint species distribution model (JSDM), this is just a copy of the response data.
- Y: A dataframe of response variable data (species, OTUs, SNPs, etc.).
- Model: A list. Can be any model from the tidymodels package. See examples.
- balance_data: A character. One of 'up', 'down' or 'no'.
- mode: A character. Either 'classification' or 'regression', i.e., whether the generative model is a regression or a classification model.
- dummy: A logical. TRUE or FALSE.
- morans: A logical. TRUE or FALSE. If TRUE, global Moran's I is calculated for each response.
- tune_grid_size: A numeric. Sets the grid size for hyperparameter tuning. Larger grid sizes increase computational time. Ignored if racing = T.
- k: A numeric. Sets the number of folds in the k-fold cross-validation. 10 is the default.
- racing: A logical. TRUE or FALSE. If TRUE, MrIML performs the grid search using the 'racing' ANOVA method. See https://finetune.tidymodels.org/reference/tune_race_anova.html
- seed: A numeric. As these models have a stochastic component, a seed is set to make the analysis reproducible. Defaults to a random integer between 1 and 100 million.
Details
This function produces the yhats that are used in all subsequent functions. It fits a separate classification or regression model for each response variable in a data set. Rows in X (features) must have the same id (host/site/population) as the rows in Y. Class imbalance can be a real issue for classification analyses. It can be addressed for each response variable using 'up' (upsampling using ROSE bootstrapping), 'down' (downsampling) or 'no' (no balancing of classes).
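
The idea of fitting one model per response column can be illustrated with a minimal base R sketch. This is a conceptual illustration only, not the package's implementation: the simulated data, the column names (temp, rain, sp1, sp2) and the objects fits and yhat are all hypothetical, and a plain logistic regression via glm() stands in for whatever tidymodels model would be passed as Model.

```r
# Conceptual sketch only: one independent model per response column.
# All object and column names are hypothetical; this is not mrIML's internal code.
set.seed(42)  # fixed seed so the simulated data are reproducible

n <- 200

# X: feature data, one row per site
X <- data.frame(temp = rnorm(n), rain = rnorm(n))

# Y: binary responses (e.g. presence/absence of two species),
# with rows in the same order (same site id) as X
Y <- data.frame(
  sp1 = rbinom(n, 1, plogis(1.5 * X$temp)),
  sp2 = rbinom(n, 1, plogis(-1.0 * X$rain))
)

# Fit a separate logistic regression for each response column
fits <- lapply(names(Y), function(resp) {
  dat <- cbind(X, response = Y[[resp]])
  glm(response ~ ., data = dat, family = binomial())
})
names(fits) <- names(Y)

# Fitted probabilities: one row per site, one column per response
yhat <- sapply(fits, fitted)
dim(yhat)
```

In this sketch each response is modelled independently from the same features, which is why the rows of X and Y have to line up by id.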