--- title: "On prediction from multivariate repeated measures DI models" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{On prediction from multivariate repeated measures DI models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{css styling, echo=FALSE} span.R { font-family: Courier New; } ``` ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(crayon.enabled = TRUE) ansi_aware_handler <- function(x, options) { paste0( "
",
    fansi::sgr_to_html(x = x, warn = FALSE, term.cap = "256"),
    "
" ) } old_hooks <- fansi::set_knit_hooks(knitr::knit_hooks, which = c("output", "message", "error", "warning")) knitr::knit_hooks$set( output = ansi_aware_handler, message = ansi_aware_handler, warning = ansi_aware_handler, error = ansi_aware_handler ) ``` ```{r setup} library(DImodelsMulti) ``` For this vignette, we will use the final model achieved in the vignette [workflow](DImulti_workflow.html) as an example. ```{r DImulti_modelEx} modelFinal <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "AV", method = "REML") print(modelFinal) ```

Prediction how-to overview

To predict for any data from this model, which has custom class DImulti, we use the predict() function, which is formatted as below, where object is the DImulti model object, newdata is a dataframe or tibble containing the community designs that you wish to predict from, if left NULL then the data used to train the model will be predicted from instead, and stacked is a boolean which determines whether the output from this function will be given in a stacked/long format (TRUE) or wide format (FALSE). ```{r predict_layout, eval=FALSE} predict.DImulti(object, newdata = NULL, stacked = TRUE, ...) ``` The first option for prediction is to simply provide the model object to the function to predict from the dataframe we used to train it (simMVRM). By default, the prediction dataframe is output in a stacked format, as it is more commonly used for plotting than a wide output. ```{r predict_default} head(predict(modelFinal)) ``` If we would rather a wide output, which can be easier to infer from without plotting, we can set stacked = FALSE. ```{r predict_wide} head(predict(modelFinal, stacked = FALSE)) ``` We can also provide some subset of the original dataset rather than using it all. ```{r predict_subset} predict(modelFinal, newdata = simMVRM[c(1, 4, 7, 10, 21), ]) ``` Or we can use a dataset which follows the same format as simMVRM but is entirely new data. If no information is supplied for which ecosystem functions or time points from which you wish to predict, then all will be included automatically. ```{r predict_newSim} newSim <- data.frame(plot = c(1, 2), p1 = c(0.25, 0.6), p2 = c(0.25, 0.2), p3 = c(0.25, 0.1), p4 = c(0.25, 0.1)) predict(modelFinal, newdata = newSim) ``` Otherwise, only the ecosystem functions/time points specified will be predicted from. As our dataset is in a wide format, we will need to supply some arbitrary value to our desired ecosystem function column. ```{r predict_Y1} newSim <- data.frame(plot = c(1, 2), p1 = c(0.25, 0.6), p2 = c(0.25, 0.2), p3 = c(0.25, 0.1), p4 = c(0.25, 0.1), Y1 = 0) predict(modelFinal, newdata = newSim) ``` In the case that some information is missing from this new data, the function will try to set a value for the column and will inform the user through a warning printed to the console. ```{r predict_newSim_missingID} newSim <- data.frame(p1 = c(0.25, 0.6), p2 = c(0.25, 0.2), p3 = c(0.25, 0.1), p4 = c(0.25, 0.1)) predict(modelFinal, newdata = newSim) ```

Caution

Merging predictions

You may wish to merge your predictions to your newdata dataframe for plotting, printing, or further analysis. As the function DImulti(), and as a consequence, the function predict.DImulti(), sorts the data it is provided, to ensure proper labelling, you may not be able to directly use cbind() to append the predictions to your dataset. In this case, ensure the unit_IDs column contains unique identifiers for your data rows and that you specify stacked to correctly match your data layout. Then use the function merge(). ```{r predict_newSim_merge} newSim <- data.frame(plot = c(1, 2), p1 = c(0.25, 0.6), p2 = c(0.25, 0.2), p3 = c(0.25, 0.1), p4 = c(0.25, 0.1)) preds <- predict(modelFinal, newdata = newSim, stacked = FALSE) merge(newSim, preds, by = "plot") ```

Non-unique unit_IDs

In the case that your newdata contains non-unique unit_IDs values and stacked = FALSE, any rows with common unit_IDs will be aggregated using the mean() function. ```{r predict_newSim_aggregate} newSim <- data.frame(plot = c(1, 1), p1 = c(0.25, 0.6), p2 = c(0.25, 0.2), p3 = c(0.25, 0.1), p4 = c(0.25, 0.1)) predict(modelFinal, newdata = newSim, stacked = FALSE) ```