---
title: "On prediction from multivariate repeated measures DI models"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{On prediction from multivariate repeated measures DI models}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{css styling, echo=FALSE}
span.R {
font-family: Courier New;
}
```
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
# Render ANSI-coloured console output as HTML in the knitted vignette
options(crayon.enabled = TRUE)
ansi_aware_handler <- function(x, options)
{
  paste0(
    "<pre class=\"r-output\"><code>",
    fansi::sgr_to_html(x = x, warn = FALSE, term.cap = "256"),
    "</code></pre>"
  )
}
old_hooks <- fansi::set_knit_hooks(knitr::knit_hooks,
                                   which = c("output", "message", "error", "warning"))
knitr::knit_hooks$set(
  output = ansi_aware_handler,
  message = ansi_aware_handler,
  warning = ansi_aware_handler,
  error = ansi_aware_handler
)
```
```{r setup}
library(DImodelsMulti)
```
For this vignette, we will use the final model fitted in the
[workflow](DImulti_workflow.html) vignette as an example.
```{r DImulti_modelEx}
modelFinal <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"),
                      time = c("time", "CS"), unit_IDs = 1, prop = 2:5,
                      data = simMVRM, DImodel = "AV", method = "REML")
print(modelFinal)
```

## Prediction how-to overview

To predict from this model, which has the custom class `DImulti`, we use the
`predict()` function, whose format is shown below. Here, `object` is the
`DImulti` model object, and `newdata` is a dataframe or tibble containing the
community designs that you wish to predict from. If `newdata` is left `NULL`,
the data used to train the model is predicted from instead. Finally, `stacked`
is a boolean which determines whether the output is given in a stacked/long
format (`TRUE`) or a wide format (`FALSE`).
```{r predict_layout, eval=FALSE}
predict.DImulti(object, newdata = NULL, stacked = TRUE, ...)
```
The first option for prediction is to provide only the model object, in which case predictions
are made for the dataframe used to train it (`simMVRM`). By default, the predictions are
returned in a stacked format, as this is more commonly used for plotting than a wide format.
```{r predict_default}
head(predict(modelFinal))
```
If we would rather have a wide output, which can be easier to read without plotting, we can set
`stacked = FALSE`.
```{r predict_wide}
head(predict(modelFinal, stacked = FALSE))
```
We can also provide a subset of the original dataset rather than using all of it.
```{r predict_subset}
predict(modelFinal, newdata = simMVRM[c(1, 4, 7, 10, 21), ])
```
Or we can use a dataset which follows the same format as `simMVRM` but contains entirely new
data. If the new data does not specify which ecosystem functions or time points to predict for,
then all will be included automatically.
```{r predict_newSim}
newSim <- data.frame(plot = c(1, 2),
                     p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1))
predict(modelFinal, newdata = newSim)
```
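Before predicting from entirely new data, it can be worth checking that each row describes a valid community, since DI models assume that the species proportions of a community sum to 1. The following is a minimal base R sketch of such a check, not a feature of the package. The names `designCheck`, `propCols`, `propSums`, and `validRows` are illustrative only.

```{r predict_newSim_check}
# Illustrative data in the same layout as newSim (hypothetical object name)
designCheck <- data.frame(plot = c(1, 2),
                          p1 = c(0.25, 0.6),
                          p2 = c(0.25, 0.2),
                          p3 = c(0.25, 0.1),
                          p4 = c(0.25, 0.1))
# Names of the proportion columns (the columns passed to prop when fitting)
propCols <- c("p1", "p2", "p3", "p4")
# Row sums of the proportions; each should equal 1 up to rounding error
propSums <- rowSums(designCheck[, propCols])
validRows <- abs(propSums - 1) < 1e-8
validRows
```

Any row flagged `FALSE` would need to be corrected before being passed to `predict()`.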
Otherwise, predictions are made only for the ecosystem functions/time points specified. As our
dataset is in a wide format, we need to supply some arbitrary value in the column of our desired
ecosystem function.
```{r predict_Y1}
newSim <- data.frame(plot = c(1, 2),
                     p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1),
                     Y1 = 0)
predict(modelFinal, newdata = newSim)
```
If some information is missing from the new data, the function will try to set a value for the
missing column and will inform the user through a warning printed to the console.
```{r predict_newSim_missingID}
newSim <- data.frame(p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1))
predict(modelFinal, newdata = newSim)
```
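If you would rather not rely on this fallback, one option is to add the identifier column yourself before predicting. The sketch below uses base R only and assumes that the identifier column should be named `plot`, as in `simMVRM`. The object name `designNoID` is illustrative.

```{r predict_newSim_addID}
# Illustrative data with no identifier column (hypothetical object name)
designNoID <- data.frame(p1 = c(0.25, 0.6),
                         p2 = c(0.25, 0.2),
                         p3 = c(0.25, 0.1),
                         p4 = c(0.25, 0.1))
# Add one unique identifier per row, named to match the unit_IDs column
designNoID$plot <- seq_len(nrow(designNoID))
# Move the identifier to the front so the layout mirrors the training data
designNoID <- designNoID[, c("plot", "p1", "p2", "p3", "p4")]
designNoID
```

The resulting dataframe has the same layout as the earlier `newSim` examples that include a `plot` column.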

## Caution

### Merging predictions

You may wish to merge your predictions with your `newdata` dataframe for
plotting, printing, or further analysis.
The function `DImulti()`, and as a consequence the function
`predict.DImulti()`, sorts the data it is provided to ensure proper labelling,
so you may not be able to use `cbind()` directly to append the predictions to
your dataset.
In this case, ensure that the `unit_IDs` column contains a unique identifier for each row of
your data and that you set `stacked` to match your data layout.
Then use the function `merge()`.
```{r predict_newSim_merge}
newSim <- data.frame(plot = c(1, 2),
                     p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1))
preds <- predict(modelFinal, newdata = newSim, stacked = FALSE)
merge(newSim, preds, by = "plot")
```
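To see why matching by identifier matters, consider the toy illustration below. It does not call the package at all. `toyPreds` is a made-up stand-in for a sorted prediction table, and its `pred` column is a hypothetical name, not the actual output of `predict()`.

```{r predict_merge_toy}
# Toy design with identifiers deliberately out of order
toyDesign <- data.frame(plot = c(3, 1, 2),
                        p1 = c(0.7, 0.1, 0.4))
# Hypothetical stand-in for predictions that were sorted by identifier
toyPreds <- data.frame(plot = c(1, 2, 3),
                       pred = c(10, 20, 30))
# cbind() pairs rows by position, so plot 3 wrongly receives the value of plot 1
misaligned <- cbind(toyDesign, pred = toyPreds$pred)
# merge() pairs rows by identifier, so each plot receives its own value
aligned <- merge(toyDesign, toyPreds, by = "plot")
misaligned
aligned
```

Because `merge()` matches on the `plot` column rather than on row position, the row order of either table no longer matters.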

### Non-unique unit_IDs

If your `newdata` contains non-unique `unit_IDs` values and
`stacked = FALSE`, any rows sharing a `unit_IDs` value will be aggregated using the
`mean()` function.
```{r predict_newSim_aggregate}
newSim <- data.frame(plot = c(1, 1),
                     p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1))
predict(modelFinal, newdata = newSim, stacked = FALSE)
```
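If this aggregation is not what you intend, a quick check for repeated identifiers before predicting may help. The following is a minimal base R sketch, assuming the identifier column is named `plot`. The names `dupDesign` and `hasDuplicates` are illustrative.

```{r predict_duplicate_check}
# Illustrative data in which both rows share the same identifier
dupDesign <- data.frame(plot = c(1, 1),
                        p1 = c(0.25, 0.6),
                        p2 = c(0.25, 0.2),
                        p3 = c(0.25, 0.1),
                        p4 = c(0.25, 0.1))
# anyDuplicated() returns 0 when every identifier is unique
hasDuplicates <- anyDuplicated(dupDesign$plot) > 0
hasDuplicates
# One possible remedy: overwrite the column with unique row identifiers
dupDesign$plot <- seq_len(nrow(dupDesign))
dupDesign
```

Whether to relabel the rows or to keep the aggregated prediction depends on whether the repeated identifier is intentional in your data.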