--- title: "The recommended workflow of DImulti() analyses" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{The recommended workflow of DImulti() analyses} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{css styling, echo=FALSE} span.R { font-family: Courier New; } ``` ```{r, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(crayon.enabled = TRUE) ansi_aware_handler <- function(x, options) { paste0( "

",
    fansi::sgr_to_html(x = x, warn = FALSE, term.cap = "256"),
    "

" ) } old_hooks <- fansi::set_knit_hooks(knitr::knit_hooks, which = c("output", "message", "error", "warning")) knitr::knit_hooks$set( output = ansi_aware_handler, message = ansi_aware_handler, warning = ansi_aware_handler, error = ansi_aware_handler ) ``` ```{r setup, message=FALSE, warning=FALSE} library(DImodelsMulti) library(dplyr) library(nlme) library(ggplot2) ``` This vignette aims to give a rough outline for the process to be followed when examining multivariate and/or repeated measures BEF relationship study data using the DImulti() function from the DImodelsMulti R package. We will use the dataset simMVRM, which is included in the package, to illustrate the workflow of the package.

Examining the data

We always start by looking at our data using View() or head(), to make sure it is as expected and includes all required information. ```{r data_head} head(simMVRM) ``` We can then look at some summarising metrics, such as the mean of each ecosystem function, separated by time as below, which can help us know what to expect in the analysis, in this case we see that ecosystem functioning improves at time point 2 for the average multifunctionality index, MFindex, but the function Y2 performs better at time 1. ```{r data_group} simMVRM_group <- dplyr::summarise(dplyr::group_by(simMVRM, time), Y1 = mean(Y1), Y2 = mean(Y2), Y3 = mean(Y3), MFindex = mean(Y1 + Y2 + Y3)) simMVRM_group ``` We can also produce plots to the same effect. ```{r data_hist, out.width="75%"} hist(simMVRM[which(simMVRM$time == 1), ]$Y1, main = "Y1 at time 1", xlab = "Y1") ```

Fitting the first model

We use the main function of the DImodelsMulti R package, DImulti(), to fit a repeated measures Diversity-Interactions model to the data. It is recommended to begin with the simplest model options and increase complexity as model selection/comparisons permit. The simplest model available to be fit by the package is the structural model, "STR", which only includes an intercept for each time and ecosystem function. $$ y_{kmt} = \beta_{0kt} + \epsilon_{kmt} $$ ```{r DImulti_modelSTR} modelSTR <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "STR", method = "ML") ``` As our dataset is multivariate and in a wide format, we specify our response columns through the parameter y, "NA" through the first index of the vector eco_func, and select the unstructured ("UN") autocorrelation structure for our ecosystem functions through the second index of eco_func. The data also contains multiple time points, so we pass the column holding the time point indicator through the parameter time, along with the chosen autocorrelation structure, in this case we use compound symmetry, "CS". To facilitate grouping the recorded responses, a column index for the unique identifier of the experimental units is passed through unit_IDs. We choose to use the maximum likelihood ("ML") estimation method as we intend to compare models with differing fixed effects. We can use print() or summary() to view information on the model in our console. ```{r DImulti_modelSTR_print} print(modelSTR) ``` From this evaluation, we can see that each of our coefficients are significant at an alpha significance level of 0.05 ($\alpha = 0.05$). We can also see the variance covariance matrices for our repeated measures and multiple ecosystem functions, along with the combined matrix, from which we can see the estimated strength and direction of the covarying relationships between response types.

Fitting a Diversity-Interactions model

The next model to be fit is the simplest Diversity-Interactions (DI) model available in the package, the identity ("ID") structure. $$ y_{kmt} = \sum^{S}_{i=1}{\beta_{ikt} p_{im}} + \epsilon_{kmt} $$ The DI modelling framework assumes that the main driver behind changes in ecosystem functioning is the initial relative abundance/proportion of the species.To reflect this, the intercept from the structural model we fit previously is replaced by the initial proportion of the species in the study, each with their own $\beta$ coefficient. These fixed effects form a simplex space. The only code that we need to change for this model is the DImodel tag, from "STR" to "ID". ```{r DImulti_modelID} modelID <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "ID", method = "ML") ``` We can now view this model, as before. In this case, we simply extract the t-table from the model summary. ```{r DImulti_modelID_tTable} summary(modelID)$tTable ``` The models within the DI modelling framework can be easily compared as they are hierarchical in nature, i.e. they are nested within one another. We can compare nested models using a likelihood ratio test, through anova(), or information criteria, such as AIC() or BIC() (the correct versions of which, AICc() or BICc() should be used in the cased of models with a large number of terms). ![](Hierarchy.png){width=100%} Image Credit: Kirwan *et al.* 2009 We choose to compare these models using a likelihood ratio test, where the null hypothesis states that the likelihoods of the two models do not significantly differ from one another, i.e. the extra terms in our larger model are not worth the added complexity. ```{r DImulti_STR_ID_LRT} anova(modelSTR, modelID) ``` As the p-value of the test is less than our chosen $\alpha$ value of 0.05, we reject the null hypothesis and proceed with the more complex model. We can confirm this selection by choosing the model with the lower AIC and BIC values, which are also printed by the anova() call.

Fitting and comparing interactions structures

We continue to increase the complexity of the models by fitting the next interaction structure in the nesting series, the average interaction model, "AV", a reparameterisation of the evenness model, "E". Again, the only code change required is the change of the DImodel tag. We then compare the two models. ```{r DImulti_modelAV} modelAV <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "AV", method = "ML") coef(modelAV) anova(modelID, modelAV) ``` Once again, as our p-value < 0.05, we select the more complex model and continue increasing complexity.

Estimating theta

Now that we are fitting interactions, we may elect to estimate the non-linear parameter $\theta$, see vignette [onTheta](DImulti_onTheta.html) for a more in depth look at this process. We include the parameter estimate_theta with argument TRUE to have DImulti() automatically fit and compare different values of $\theta$, or if a user has *a priori* information on the nature of the term, they may pass values through the parameter theta. We can then test the two models against one another using information criteria, to see if the theta values significantly differ from 1. ```{r DImulti_modelAV_theta} modelAV_theta <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "AV", method = "ML", estimate_theta = TRUE) thetaVals <- modelAV_theta$theta thetaVals AICc(modelAV) AICc(modelAV_theta) ``` In this case, the inclusion of theta does improve the model. The next models in the series are not nested within one another, so either can be used next but they cannot be compared to one another using a likelihood ratio test. As this dataset was simulated without any functional groupings, i.e. any groupings would be arbitrary, we choose to fit the additive interaction structure, "ADD". ```{r DImulti_modelADD} modelADD <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "ADD", method = "ML", theta = thetaVals) modelADD$coefficients anova(modelAV_theta, modelADD) ``` In this case, the p-value is greater than our chosen alpha level of 0.05, therefore we fail to reject the null hypothesis that the added complexity of the model does not significantly increase the model likelihood, and proceed in our analysis with the simpler model, modelAV. As we do not need to increase the complexity of the model through the interaction structures any further, we turn to other options. It is at this stage that one could introduce treatment effects or environmental variables. As the simMVRM dataset does not contain any such information, example code is included below, where model modelAV_treat1 includes an intercept for treat for each level of ecosystem function and time point, while modelAV_treat2 crosses treat with each other fixed effect term. ```{r DImulti_modelADD_treat, eval=FALSE} modelAV_treat1 <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "ADD", method = "ML", theta = thetaVals, extra_fixed = ~treat) modelAV_treat2 <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "ADD", method = "ML", theta = thetaVals, extra_fixed = ~1:treat) ``` The model could also alternatively be simplified here, by adding ID grouping, i.e. introducing functional redundancy within species ID effects. As simMVRM is simulated with no ID grouping in mind, any groups selected would be arbitrary and thus not recommended, however, example code for doing so is included below. These groupings do not affect the interaction term(s). ```{r DImulti_modelAV_ID} modelAV_ID <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "AV", method = "ML", theta = thetaVals, ID = c("Group1", "Group1", "Group2", "Group2")) summary(modelAV_ID)$tTable anova(modelAV_ID, modelAV) ``` As the p-value is lower than 0.05, we reject the null hypothesis and continue with the more complex model, without any ID groupings. If any errors are encountered in the process of fitting these models, please see vignette [commonErrors](DImulti_commonErrors.html) for a guide.

Final model

As we have concluded our model selection process, the final step before interpretation is to refit the chosen model design using the method "REML", for unbiased estimates. ```{r DImulti_modelFinal} modelFinal <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"), unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "AV", method = "REML", theta = thetaVals) summary(modelFinal) ```

Interpretation

The final and most important step in the workflow of any analysis is the interpretation of the final model. We first look at the fixed effect coefficients, their significance and direction. For example, we can see that the identity effect of species p1 has a significant, negative effect on ecosystem function Y1 at time point 1, but a significant positive effect on ecosystem function Y3 at the same time point. We can also see that the interaction effect AV has a significant positive effect on all ecosystem functions at both time points. As this interaction term is maxmimised in an ecosystem design where all four species are represented evenly, it may be beneficial to include species p1 despite its negative effect on Y1, the presence of other species, which have a positive effect on this ecosystem function could also 'make up' for this fault of species p1 (assuming that our goal is the maximisation of all ecosystem functions). ```{r DImulti_modelFinal_tTable} summary(modelFinal)$tTable ``` Next we could examine the variance covariance matrices of the model. These can be extracted from the model by either using the function getVarCov() from the R package nlme or by using the $ operator (the components of a model in R can be viewed using View(model)). ```{r DImulti_modelFinal_varCovs, eval=FALSE} nlme::getVarCov(modelFinal) modelFinal$vcov ``` ```{r DImulti_modelFinal_getVarCov, echo=FALSE} nlme::getVarCov(modelFinal) ``` An example of an interpretation that we can make from these variance covariance matrices, is that the ecosystem functions Y1 and Y3 are negatively correlated, therefore would likely result in trade-offs when trying to estimated one over the other. As multivariate repeated measures DI models can contain a large number of fixed effects, they can be difficult to interpret from. Our suggestions to circumvent this issue is to first predict from the model for community designs of interest, using the predict() function, as seen below, see vignette [prediction](DImulti_prediction.html) for further details on this. ```{r DImulti_modelFinal_predict} comms <- simMVRM[c(1, 4, 7, 10, 21), ] print(comms) commPred <- predict(modelFinal, newdata = comms) commPred ``` Then, with these predictions, we can employ summarising metrics, such as a multifunctionality index, to summarise our findings while maintaining the benefits of having modelled each ecosystem function separately but simultaneously, alla Suter *et al* 2021. We could also use these predictions to generate a plot/graph to display our findings, such as a histogram, grouped bar chart, or ternary diagram. ```{r DImulti_modelFinal_grouped, out.width="75%"} ggplot2::ggplot(commPred, ggplot2::aes(fill=Ytype, y=Yvalue, x=plot)) + ggplot2::geom_bar(position="dodge", stat="identity") ``` A new R package, DImodelsVis, is in development, which will make plotting DI models fit using the R packages DImodels or DImodelsMulti a simple process.

References

Suter, M., Huguenin-Elie, O. and Lüscher, A., 2021. Multispecies for multifunctions: combining four complementary species enhances multifunctionality of sown grassland. Scientific Reports, 11(1), p.3835. Kirwan, L., Connolly, J., Finn, J.A., Brophy, C., Lüscher, A., Nyfeler, D. and Sebastià, M.T., 2009. Diversity-interaction modeling: estimating contributions of species identities and interactions to ecosystem function. Ecology, 90(8), pp.2032-2038.