# predict

##### Predict indicator scores

Predict the indicator scores of endogenous constructs.

##### Usage

```
predict(
  .object               = NULL,
  .benchmark            = c("lm", "unit", "PLS-PM", "GSCA", "PCA", "MAXVAR"),
  .cv_folds             = 10,
  .handle_inadmissibles = c("stop", "ignore", "set_NA"),
  .only_common_factors  = TRUE,
  .r                    = 10,
  .test_data            = NULL
)
```

##### Arguments

- .object
  An R object of class cSEMResults resulting from a call to `csem()`.
- .benchmark
  Character string. The procedure used to obtain benchmark predictions. One of "*lm*", "*unit*", "*PLS-PM*", "*GSCA*", "*PCA*", or "*MAXVAR*". Defaults to "*lm*".
- .cv_folds
  Integer. The number of cross-validation folds to use. Setting `.cv_folds` to `N` (the number of observations) produces leave-one-out cross-validation samples. Defaults to `10`.
- .handle_inadmissibles
  Character string. How should inadmissible results be treated? One of "*stop*", "*ignore*", or "*set_NA*". If "*stop*", `predict()` stops immediately if estimation yields an inadmissible result. For "*ignore*", all results are returned even if all or some of the estimates yielded inadmissible results. For "*set_NA*", predictions based on inadmissible parameter estimates are set to `NA`. Defaults to "*stop*".
- .only_common_factors
  Logical. Should only indicator scores for concepts modeled as common factors be predicted? Defaults to `TRUE`.
- .r
  Integer. The number of repetitions to use. Defaults to `10`.
- .test_data
  A matrix of test data with the same column names as the training data.

##### Details

`predict()` uses the procedure introduced by Shmueli et al. (2016) in the context of
PLS (commonly called "PLSPredict"; Shmueli et al. 2019).
It uses k-fold cross-validation to randomly
split the data into training and test data and then predicts the
relevant values in the test data based on the model parameter estimates obtained
from the training data. The number of cross-validation folds is 10 by default but
may be changed using the `.cv_folds` argument.
By default, the procedure is repeated `.r = 10` times to avoid irregularities
due to a particular split. See Shmueli et al. (2019) for
details.
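
The repeated k-fold scheme described above can be sketched in base R. This is a minimal illustration only: it uses an ordinary linear model on simulated data rather than a cSEM model, the helper `cv_predict()` is hypothetical and not part of cSEM, and averaging the out-of-sample predictions over repetitions is an assumption made here for simplicity.

```r
# Simulated data standing in for real indicators (illustration only)
set.seed(1)
n   <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- 0.5 * dat$x + rnorm(n)

# Hypothetical helper: one round of k-fold cross-validation.
# Every observation is predicted exactly once, by a model that never saw it.
cv_predict <- function(data, k) {
  # Fold labels 1..k of near-equal size, in random order
  folds <- sample(rep_len(seq_len(k), nrow(data)))
  pred  <- numeric(nrow(data))
  for (f in seq_len(k)) {
    test <- folds == f
    fit  <- lm(y ~ x, data = data[!test, , drop = FALSE])
    # stats::predict is called explicitly because cSEM defines its own predict()
    pred[test] <- stats::predict(fit, newdata = data[test, , drop = FALSE])
  }
  pred
}

# Repeat the whole procedure r = 10 times with k = 10 folds each
preds    <- replicate(10, cv_predict(dat, k = 10)) # n x r matrix
pred_avg <- rowMeans(preds)                        # one value per observation
```

Repeating the split reduces the dependence of the result on any single random partition, which is the motivation given for `.r`.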

Alternatively, users may supply a matrix of `.test_data` with the same column names
as those in the data used to obtain `.object` (the training data).
In this case, arguments `.cv_folds` and `.r` are
ignored and `predict()` uses the estimated coefficients from `.object` to
predict the values in the columns of `.test_data`.
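
Because the test data must carry the same column names as the training data, it can help to check this before calling `predict()`. The following base-R sketch uses made-up matrices and column names; whether `predict()` itself reorders or validates columns is not stated here, so the explicit alignment step is a precaution and an assumption.

```r
# Made-up training and test matrices (column names are illustrative)
set.seed(1)
train <- matrix(rnorm(20), ncol = 4,
                dimnames = list(NULL, c("AA0", "AA1", "VX0", "VX1")))
test  <- matrix(rnorm(12), ncol = 4,
                dimnames = list(NULL, c("VX1", "AA0", "AA1", "VX0")))

# Columns present in the training data but absent from the test data
missing_cols <- setdiff(colnames(train), colnames(test))
if (length(missing_cols) > 0) {
  stop("Test data lacks columns: ", paste(missing_cols, collapse = ", "))
}

# Put the test columns in the same order as the training columns
test_aligned <- test[, colnames(train), drop = FALSE]
```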

In Shmueli et al. (2016), PLS-based predictions for indicator `i`
are compared to predictions based on a multiple regression of indicator `i`
on all available exogenous indicators (`.benchmark = "lm"`) and to
a simple mean-based prediction; the latter comparison is summarized in the Q2_predict metric.
`predict()` is more general in that it allows users to compare the predictions
based on a so-called target model/specification to predictions based on an
alternative benchmark. Available benchmarks include predictions
based on a linear model, PLS-PM weights, unit weights (i.e., sum scores), GSCA weights, PCA weights, and
MAXVAR weights.
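
To make the comparison concrete, the sketch below computes MAE, RMSE, and Q2_predict for a single indicator in base R. It assumes the common definition of Q2_predict as one minus the ratio of the squared prediction errors to the squared errors of a naive prediction by the training-sample mean; the helper `prediction_metrics()` and the numbers are hypothetical, and the exact computation inside cSEM may differ.

```r
# Hypothetical helper: prediction metrics for one indicator
prediction_metrics <- function(actual, predicted, train_mean) {
  err_target <- actual - predicted  # errors of the model-based prediction
  err_mean   <- actual - train_mean # errors of the naive mean-based prediction
  c(
    MAE        = mean(abs(err_target)),
    RMSE       = sqrt(mean(err_target^2)),
    Q2_predict = 1 - sum(err_target^2) / sum(err_mean^2)
  )
}

# Made-up test-sample values for illustration
actual    <- c(3, 5, 4, 6)
predicted <- c(2.5, 5.5, 4, 5)

m <- prediction_metrics(actual, predicted, train_mean = 4)
m
```

A positive Q2_predict indicates that the model-based prediction has a smaller squared error than the mean-based one. Applying the same function to the benchmark predictions gives the MAE and RMSE values against which the target model is judged.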

By default, only the indicator scores of
constructs modeled as common factors are predicted (`.only_common_factors = TRUE`).
While technically possible, prediction for constructs modeled
as composites is conceptually difficult since composites are by design built
from their indicators, i.e., composites are not thought of as being predictive of
their indicators.

Each estimation run is checked for admissibility using `verify()`. If the
estimation yields inadmissible results, `predict()` stops with an error (`"stop"`).
Users may choose to `"ignore"` inadmissible results or to simply set predictions
to `NA` (`"set_NA"`) for the particular run that failed.
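
The three strategies can be illustrated with a small base-R sketch. The helper `handle_run()` and its `admissible` flag are hypothetical stand-ins for the admissibility check; this is not cSEM's internal code.

```r
# Hypothetical helper: treat the predictions of one estimation run according
# to the chosen strategy
handle_run <- function(pred, admissible,
                       how = c("stop", "ignore", "set_NA")) {
  how <- match.arg(how) # defaults to the first option, "stop"
  if (admissible || how == "ignore") {
    return(pred)        # keep the predictions as they are
  }
  if (how == "stop") {
    stop("Estimation yielded an inadmissible result.")
  }
  pred[] <- NA          # "set_NA": blank out values, keep dimensions
  pred
}

pred <- matrix(c(1.2, 0.8, 1.1, 0.9), nrow = 2)

kept    <- handle_run(pred, admissible = FALSE, how = "ignore")
blanked <- handle_run(pred, admissible = FALSE, how = "set_NA")
failed  <- try(handle_run(pred, admissible = FALSE), silent = TRUE)
```

With `"set_NA"` only the failed run is blanked out, so the remaining runs still contribute to the results.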

##### Value

An object of class `cSEMPredict` with print and plot methods.
Technically, `cSEMPredict` is a
named list containing the following list elements:

`$Actual`

A matrix of the actual values/indicator scores of the endogenous constructs.

`$Prediction_target`

A matrix of the predicted indicator scores of the endogenous constructs based on the target model. Target refers to the procedure used to estimate the parameters in `.object`.

`$Residuals_target`

A matrix of the residual indicator scores of the endogenous constructs based on the target model.

`$Residuals_lm`

A matrix of the residual indicator scores of the endogenous constructs based on a linear model in which the indicator scores of endogenous constructs are predicted by exogenous indicator scores. This serves as a benchmark for comparisons.

`$Prediction_metrics`

A data frame containing the prediction metrics MAE, RMSE, and Q2_predict.

`$Information`

A list with elements `Target`, `Benchmark`, `Number_of_observations_training`, `Number_of_observations_test`, `Number_of_folds`, `Number_of_repetitions`, and `Handle_inadmissibles`.

##### References

##### See Also

##### Examples

```
library(cSEM)

### Anime example taken from https://github.com/ISS-Analytics/pls-predict
# Load data
data(Anime) # data is similar to the Anime.csv found on
            # https://github.com/ISS-Analytics/pls-predict but with irrelevant
            # columns removed

# Split into training and test data the same way as it is done on
# https://github.com/ISS-Analytics/pls-predict
set.seed(123)
index     <- sample.int(dim(Anime)[1], 83, replace = FALSE)
dat_train <- Anime[-index, ]
dat_test  <- Anime[index, ]

# Specify model
model <- "
# Structural model
ApproachAvoidance ~ PerceivedVisualComplexity + Arousal

# Measurement/composite model
ApproachAvoidance =~ AA0 + AA1 + AA2 + AA3
PerceivedVisualComplexity <~ VX0 + VX1 + VX2 + VX3 + VX4
Arousal <~ Aro1 + Aro2 + Aro3 + Aro4
"

# Estimate on the training data (replicating the results of the
# `simplePLS()` function)
res <- csem(dat_train,
            model,
            .disattenuate = FALSE, # original PLS
            .iter_max = 300,
            .tolerance = 1e-07,
            .PLS_weight_scheme_inner = "factorial"
)

# Predict using a user-supplied test data set
pp <- predict(res, .test_data = dat_test)
pp$Predictions_target[1:6, ]
pp

### Compute prediction metrics ------------------------------------------------
res2 <- csem(Anime, # whole data set
             model,
             .disattenuate = FALSE, # original PLS
             .iter_max = 300,
             .tolerance = 1e-07,
             .PLS_weight_scheme_inner = "factorial"
)

# Predict using 10-fold cross-validation with 10 repetitions (the defaults),
# benchmarked against a linear model
pp2 <- predict(res2, .benchmark = "lm")
pp2
```

*Documentation reproduced from package cSEM, version 0.1.0, License: GPL-3*