# csem

##### Composite-based SEM

Estimate linear, nonlinear, hierarchical or multigroup structural equation models using a composite-based approach. In cSEM any method or approach that involves linear compounds (scores/proxies/composites) of observables (indicators/items/manifest variables) is defined as composite-based. See the Get started section of the cSEM website for a general introduction to composite-based SEM and cSEM.

##### Usage

```
csem(
  .data = NULL,
  .model = NULL,
  .approach_2ndorder = c("2stage", "mixed"),
  .approach_nl = c("sequential", "replace"),
  .approach_paths = c("OLS", "2SLS"),
  .approach_weights = c("PLS-PM", "SUMCORR", "MAXVAR", "SSQCORR",
                        "MINVAR", "GENVAR", "GSCA", "PCA",
                        "unit", "bartlett", "regression"),
  .disattenuate = TRUE,
  .id = NULL,
  .instruments = NULL,
  .normality = FALSE,
  .reliabilities = NULL,
  .starting_values = NULL,
  .resample_method = c("none", "bootstrap", "jackknife"),
  .resample_method2 = c("none", "bootstrap", "jackknife"),
  .R = 499,
  .R2 = 199,
  .handle_inadmissibles = c("drop", "ignore", "replace"),
  .user_funs = NULL,
  .eval_plan = c("sequential", "multiprocess"),
  .seed = NULL,
  .sign_change_option = c("none", "individual", "individual_reestimate",
                          "construct_reestimate"),
  ...
)
```

##### Arguments

- .data
A `data.frame` or a `matrix` of standardized or unstandardized data (indicators/items/manifest variables). Additionally, a `list` of data sets (data frames or matrices) is accepted, in which case estimation is repeated for each data set. Possible column types or classes of the data provided are: "`logical`", "`numeric`" ("`double`" or "`integer`"), "`factor`" ("`ordered`" and/or "`unordered`"), "`character`" (will be converted to factor), or a mix of several types.

- .model
A model in lavaan model syntax or a cSEMModel list.

- .approach_2ndorder
Character string. Approach used for models containing second-order constructs. One of: "*2stage*" or "*mixed*". Defaults to "*2stage*".

- .approach_nl
Character string. Approach used to estimate nonlinear structural relationships. One of: "*sequential*" or "*replace*". Defaults to "*sequential*".

- .approach_paths
Character string. Approach used to estimate the structural coefficients. One of: "*OLS*" or "*2SLS*". If "*2SLS*", instruments need to be supplied to `.instruments`. Defaults to "*OLS*".

- .approach_weights
Character string. Approach used to obtain composite weights. One of: "*PLS-PM*", "*SUMCORR*", "*MAXVAR*", "*SSQCORR*", "*MINVAR*", "*GENVAR*", "*GSCA*", "*PCA*", "*unit*", "*bartlett*", or "*regression*". Defaults to "*PLS-PM*".

- .disattenuate
Logical. Should composite/proxy correlations be disattenuated to yield consistent loadings and path estimates if at least one of the constructs is modeled as a common factor? Defaults to `TRUE`.

- .id
Character string or integer. A character string giving the name, or an integer giving the position, of the column of `.data` whose levels are used to split `.data` into groups. Defaults to `NULL`.

- .instruments
A named list of vectors of instruments. The names of the list elements are the names of the dependent (LHS) constructs of the structural equation whose explanatory variables are endogenous. The vectors contain the names of the instruments corresponding to each equation. Note that exogenous variables of a given equation **must** be supplied as instruments for themselves. Defaults to `NULL`.

- .normality
Logical. Should joint normality of \([\eta_{1:p}; \zeta; \epsilon]\) be assumed in the nonlinear model? See Dijkstra & Schermelleh-Engel (2014) for details. Defaults to `FALSE`. Ignored if the model is not nonlinear.

- .reliabilities
A character vector of `"name" = value` pairs, where `value` is a number between 0 and 1 and `"name"` a character string of the corresponding construct name, or `NULL`. Reliabilities may be given for a subset of the constructs. Defaults to `NULL`, in which case reliabilities are estimated by `csem()`. Currently, only supported for `.approach_weights = "PLS-PM"`.

- .starting_values
A named list of vectors where the list names are the construct names whose indicator weights the user wishes to set. The vectors must be named vectors of `"indicator_name" = value` pairs, where `value` is the (scaled or unscaled) starting weight. Defaults to `NULL`.

- .resample_method
Character string. The resampling method to use. One of: "*none*", "*bootstrap*" or "*jackknife*". Defaults to "*none*".

- .resample_method2
Character string. The resampling method to use when resampling from a resample. One of: "*none*", "*bootstrap*" or "*jackknife*". For "*bootstrap*" the number of draws is provided via `.R2`. Currently, resampling from each resample is only required for the studentized confidence interval ("*CI_t_interval*") computed by the `infer()` function. Defaults to "*none*".

- .R
Integer. The number of bootstrap replications. Defaults to `499`.

- .R2
Integer. The number of bootstrap replications to use when resampling from a resample. Defaults to `199`.

- .handle_inadmissibles
Character string. How should inadmissible results be treated? One of "*drop*", "*ignore*", or "*replace*". If "*drop*", all replications/resamples yielding an inadmissible result will be dropped (i.e., the number of results returned will potentially be less than `.R`). For "*ignore*" all results are returned even if all or some of the replications yielded inadmissible results (i.e., the number of results returned is equal to `.R`). For "*replace*" resampling continues until there are exactly `.R` admissible solutions. Depending on the frequency of inadmissible solutions this may significantly increase computing time. Defaults to "*drop*".

- .user_funs
A function or a (named) list of functions to apply to every resample. Each function must take `.object` as its first argument (e.g., `myFun <- function(.object, ...) {body-of-the-function}`). Function output should preferably be a (named) vector, but matrices are also accepted. However, the output will be vectorized (columnwise) in this case. See the examples section for details.

- .eval_plan
Character string. The evaluation plan to use. One of "*sequential*" or "*multiprocess*". In the latter case all available cores will be used. Defaults to "*sequential*".

- .seed
Integer or `NULL`. The random seed to use. Defaults to `NULL`, in which case an arbitrary seed is chosen. Note that the scope of the seed is limited to the body of the function it is used in. Hence, the global seed will not be altered!

- .sign_change_option
Character string. Which sign change option should be used to handle flipping signs when resampling? One of "*none*", "*individual*", "*individual_reestimate*", "*construct_reestimate*". Defaults to "*none*".

- ...
Further arguments to be passed down to lower level functions of `csem()`. See args_csem_dotdotdot for a complete list of available arguments.

##### Details

`csem()` estimates linear, nonlinear, hierarchical or multigroup structural equation models using a composite-based approach.

### Data and model:

The `.data` and `.model` arguments are required. `.data` must be given as a `matrix` or a `data.frame` with column names matching the indicator names used in the model description. Alternatively, a `list` of data sets (matrices or data frames) may be provided, in which case estimation is repeated for each data set. Possible column types/classes of the data provided are: "`logical`", "`numeric`" ("`double`" or "`integer`"), "`factor`" ("`ordered`" and/or "`unordered`"), "`character`", or a mix of several types. Character columns will be treated as (unordered) factors.

Depending on the type/class of the indicator data provided, cSEM computes the indicator correlation matrix in different ways. See `calculateIndicatorCor()` for details.

In the current version `.data` must not contain missing values. Future versions are likely to handle missing values as well.

To provide a model, use the lavaan model syntax. Note, however, that cSEM currently only supports the "standard" lavaan model syntax (Types 1, 2, 3, and 7 as described on the help page). Therefore, specifying, e.g., thresholds or scaling factors is ignored. Alternatively, a standardized (possibly incomplete) cSEMModel-list may be supplied. See `parseModel()` for details.
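The two data requirements above (no missing values, column names matching the indicator names) can be checked before estimation. The following is a minimal sketch in base R; the helper `check_csem_data()` and the toy data are hypothetical and not part of cSEM.

```r
# Hypothetical helper (not part of cSEM): checks that the data contain no
# missing values and that every indicator named in the model is a column.
check_csem_data <- function(data, indicators) {
  missing_cols <- setdiff(indicators, colnames(data))
  list(
    has_na       = anyNA(data),
    missing_cols = missing_cols,
    ok           = !anyNA(data) && length(missing_cols) == 0
  )
}

# Toy data for illustration only
toy <- data.frame(y11 = c(1, 2, 3), y12 = c(2, NA, 1), y13 = c(0, 1, 1))

# Fails the check because y12 contains a missing value
check_csem_data(toy, c("y11", "y12", "y13"))$ok

# Passes once incomplete rows are removed (listwise deletion, shown only
# as one possible way to obtain complete data)
check_csem_data(na.omit(toy), c("y11", "y12", "y13"))$ok

# Reports indicators that are used in the model but absent from the data
check_csem_data(na.omit(toy), c("y11", "y21"))$missing_cols
```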

### Weights and path coefficients:

By default weights are estimated using the partial least squares path modeling algorithm (`"PLS-PM"`). A range of alternative weighting algorithms may be supplied to `.approach_weights`. Currently, the following approaches are implemented:

- (Default) Partial least squares path modeling (`"PLS-PM"`). The algorithm can be customized. See `calculateWeightsPLS()` for details.
- Generalized structured component analysis (`"GSCA"`) and generalized structured component analysis with uniqueness terms (GSCAm). The algorithms can be customized. See `calculateWeightsGSCA()` and `calculateWeightsGSCAm()` for details. Note that GSCAm is called indirectly when the model contains constructs modeled as common factors only and `.disattenuate = TRUE`. See below.
- Generalized canonical correlation analysis (*GCCA*), including `"SUMCORR"`, `"MAXVAR"`, `"SSQCORR"`, `"MINVAR"`, `"GENVAR"`.
- Principal component analysis (`"PCA"`).
- Factor score regression using sum scores (`"unit"`), regression (`"regression"`) or Bartlett scores (`"bartlett"`).

It is possible to supply starting values for the weighting algorithm via `.starting_values`. The argument accepts a named list of vectors where the list names are the construct names whose indicator weights the user wishes to set. The vectors must be named vectors of `"indicator_name" = value` pairs, where `value` is the starting weight. See the examples section below for details.

Composite-indicator and composite-composite correlations are properly disattenuated by default to yield consistent loadings, construct correlations, and path coefficients if any of the concepts are modeled as a common factor.

For *PLS-PM* disattenuation is done using *PLSc* (Dijkstra & Henseler, 2015). For *GSCA* disattenuation is done implicitly by using *GSCAm* (Hwang et al., 2017). Weights obtained by *GCCA*, *unit*, *regression*, *bartlett* or *PCA* are disattenuated using Croon's approach (Croon, 2002). Disattenuation may be suppressed by setting `.disattenuate = FALSE`. Note, however, that quantities in this case are inconsistent estimates for their construct level counterparts if any of the constructs in the structural model are modeled as a common factor!

By default path coefficients are estimated using ordinary least squares (`.approach_paths = "OLS"`). For linear models, two-stage least squares (`"2SLS"`) is available, however, *only if instruments are internal*, i.e., part of the structural model. Future versions will add support for external instruments if possible. Instruments must be supplied to `.instruments` as a named list where the names of the list elements are the names of the dependent constructs of the structural equations whose explanatory variables are believed to be endogenous. The list consists of vectors of names of instruments corresponding to each equation. Note that exogenous variables of a given equation **must** be supplied as instruments for themselves.
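As an illustration of the list structure described above, consider a hypothetical structural equation `y ~ x1 + x2` in which `x2` is believed to be endogenous and a further construct `z` from the structural model acts as its instrument. All construct names here are assumptions made for this sketch; they do not refer to any data set shipped with cSEM.

```r
# Hypothetical equation: y ~ x1 + x2
# - x1 is exogenous, so it is listed as an instrument for itself
# - z is an internal instrument (part of the structural model) for x2
instruments <- list("y" = c("x1", "z"))

# The list name is the dependent construct of the affected equation
names(instruments)

# The list would then be passed together with the 2SLS option, e.g.:
# res <- csem(.data = my_data, .model = my_model,
#             .approach_paths = "2SLS", .instruments = instruments)
```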

If reliabilities are known they can be supplied as `"name" = value` pairs to `.reliabilities`, where `value` is a numeric value between 0 and 1. Currently, this is only supported for `.approach_weights = "PLS-PM"`.
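A short sketch of such `"name" = value` pairs follows. The construct names are taken from the `threecommonfactors` example used on this page; the reliability values are made up for illustration.

```r
# Reliabilities for a subset of the constructs (illustrative values only)
rel <- c("eta1" = 0.8, "eta2" = 0.7)

# Sanity check: every supplied value lies between 0 and 1
all(rel >= 0 & rel <= 1)

# The vector would then be passed to csem(), e.g.:
# res <- csem(threecommonfactors, model, .reliabilities = rel)
```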

### Nonlinear models:

If the model contains nonlinear terms, `csem()` estimates a polynomial structural equation model using a non-iterative method of moments approach described in Dijkstra & Schermelleh-Engel (2014). Nonlinear terms include interactions and exponential terms. The latter is described in model syntax as an "interaction with itself", e.g., `xi^3 = xi.xi.xi`. Currently only exponential terms up to a power of three (e.g., three-way interactions or cubic terms) are allowed.
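The dot notation described above can be sketched in a model string as follows. The construct and indicator names follow the `threecommonfactors` example; whether this particular nonlinear specification is sensible for that data set is not claimed here.

```r
# Model with an interaction term (eta1.eta2) and a quadratic term (eta1.eta1),
# both written with the dot notation
model_nl <- "
# Structural model
eta3 ~ eta1 + eta2 + eta1.eta2 + eta1.eta1

# (Reflective) measurement model
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33
"

# The model would then be estimated with the chosen nonlinear approach, e.g.:
# res_nl <- csem(threecommonfactors, model_nl, .approach_nl = "sequential")
```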

The current version of the package allows two kinds of estimation: estimation of the reduced form equation (`.approach_nl = "replace"`) and sequential estimation (`.approach_nl = "sequential"`, the default). The latter does not allow for multivariate normality of all exogenous variables, i.e., the latent variables and the error terms.

Distributional assumptions are kept to a minimum (an i.i.d. sample from a population with finite moments for the relevant order); for higher order models that go beyond interaction, this version works with the assumption that, as far as the relevant moments are concerned, certain combinations of measurement errors behave as if they were Gaussian. For details see Dijkstra & Schermelleh-Engel (2014).

### Second-order model

Second-order models are specified using the operators `=~` and `<~`. These operators are usually used with indicators on their right-hand side. For second-order models the right-hand side variables are constructs instead. If c1 and c2 are constructs forming or measuring a higher order construct, a model would look like this:

    my_model <- "
    # Structural model
    SAT ~ QUAL
    VAL ~ SAT

    # Measurement/composite model
    QUAL =~ qual1 + qual2
    SAT =~ sat1 + sat2

    c1 =~ x11 + x12
    c2 =~ x21 + x22

    # Second-order term (in this case a second-order composite built by common
    # factors)
    VAL <~ c1 + c2
    "

Currently, two approaches are explicitly implemented:

- (Default) `"2stage"`. The (disjoint) two stage approach as proposed by Agarwal & Karahanna (2000).
- `"mixed"`. The mixed repeated indicators/two-stage approach as proposed by Ringle et al. (2012).

The repeated indicators approach as proposed by Jöreskog & Wold (1982) and the extension proposed by Becker et al. (2012) are not directly implemented as they simply require a respecification of the model. In the above example the repeated indicators approach would require changing the model and appending the repeated indicators to the data supplied to `.data`. Note that the indicators need to be renamed in this case as `csem()` does not allow for one indicator to be attached to multiple constructs.

    my_model <- "
    # Structural model
    SAT ~ QUAL
    VAL ~ SAT

    VAL ~ c1 + c2

    # Measurement/composite model
    QUAL =~ qual1 + qual2
    SAT =~ sat1 + sat2
    VAL =~ x11_temp + x12_temp + x21_temp + x22_temp

    c1 =~ x11 + x12
    c2 =~ x21 + x22
    "

According to the extended approach, indirect effects of `QUAL` on `VAL` via `c1` and `c2` would have to be specified as well.

### Multigroup analysis

To perform multigroup analysis provide either a list of data sets or one data set containing a group-identifier-column whose column name must be provided to `.id`. Values of this column are taken as levels of a factor and are interpreted as group identifiers. `csem()` will split the data by levels of that column and run the estimation for each level separately. Note that the more levels the group-identifier-column has, the more estimation runs are required. This can considerably slow down estimation, especially if resampling is requested. For the latter it will generally be faster to use `.eval_plan = "multiprocess"`.
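The splitting by group identifier described above can be pictured with base R's `split()`. This is only a conceptual sketch with made-up data and column names; it does not show how `csem()` implements the split internally.

```r
# Toy data with a group-identifier column (names and values are illustrative)
toy <- data.frame(
  y11   = c(1, 2, 3, 4),
  y12   = c(2, 1, 4, 3),
  group = c("a", "b", "a", "b")
)

# One data set per level of the identifier column, without the column itself
groups <- split(toy[, setdiff(names(toy), "group")], toy$group)
names(groups)         # one list element per group level
sapply(groups, nrow)  # number of observations in each group

# Either form could then be handed to csem(), e.g.:
# res <- csem(.data = toy, .model = my_model, .id = "group")
# res <- csem(.data = groups, .model = my_model)
```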

### Inference:

Inference is done via resampling. See `resamplecSEMResults()` and `infer()` for details.
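A minimal sketch of this workflow is given below, assuming the cSEM package is installed and that `infer()` can be called with its default arguments on a resampled object; the small `.R` and the seed value are arbitrary choices to keep the sketch fast and reproducible.

```r
# Same model as in the Examples section of this page
model <- "
# Structural model
eta2 ~ eta1
eta3 ~ eta1 + eta2
# (Reflective) measurement model
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33
"

# Only attempt estimation if cSEM is available
if (requireNamespace("cSEM", quietly = TRUE)) {
  res_boot <- cSEM::csem(
    .data            = cSEM::threecommonfactors,
    .model           = model,
    .resample_method = "bootstrap",
    .R               = 40,   # small number of replications for illustration
    .seed            = 123   # arbitrary seed
  )
  # Inferential quantities computed from the bootstrap distribution
  inf <- cSEM::infer(res_boot)
}
```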

##### Value

An object of class `cSEMResults` with methods for all postestimation generics. Technically, a call to `csem()` results in an object with at least two class attributes. The first class attribute is always `cSEMResults`. The second is one of `cSEMResults_default`, `cSEMResults_multi`, or `cSEMResults_2ndorder` and depends on the estimated model and/or the type of data provided to the `.model` and `.data` arguments. The third class attribute `cSEMResults_resampled` is only added if resampling was conducted. For details see the cSEMResults help file.
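The class layout described above can be inspected with `inherits()`. The object below is a hand-built mock that only mimics the class attributes; real objects are created by `csem()` and carry the actual estimates.

```r
# Mock object with the class attributes of a resampled default result
# (illustration only, contains no estimates)
mock <- structure(
  list(),
  class = c("cSEMResults", "cSEMResults_default", "cSEMResults_resampled")
)

inherits(mock, "cSEMResults")            # first class attribute, always present
inherits(mock, "cSEMResults_resampled")  # present only after resampling
inherits(mock, "cSEMResults_multi")      # not a multigroup result here
```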

##### Postestimation

- `assess()`
Assess results using common quality criteria, e.g., reliability, fit measures, HTMT, R2 etc.

- `infer()`
Calculate common inferential quantities, e.g., standard errors, confidence intervals.

- `predict()`
Predict endogenous indicator scores and compute common prediction metrics.

- `summarize()`
Summarize the results. Mainly called for its side effect, the print method.

- `verify()`
Verify/Check admissibility of the estimates.

Tests are performed using the test-family of functions. Currently the following tests are implemented:

- `testOMF()`
Bootstrap-based test for overall model fit based on Beran & Srivastava (1985).

- `testMICOM()`
Permutation-based test for measurement invariance of composites proposed by Henseler et al. (2016).

- `testMGD()`
Several (mainly) permutation-based tests for multi-group comparisons.

- `testHausman()`
Regression-based Hausman test to test for endogeneity.

Other miscellaneous postestimation functions belong to the do-family of functions. Currently two do functions are implemented:

- `doFloodlightAnalysis()`
Perform a floodlight analysis as described in Spiller et al. (2013).

- `doRedundancyAnalysis()`
Perform a redundancy analysis (RA) as proposed by Hair et al. (2016) with reference to Chin (1998).

##### See Also

`args_default()`, cSEMArguments, cSEMResults, `foreman()`, `resamplecSEMResults()`, `assess()`, `infer()`, `predict()`, `summarize()`, `verify()`, `testOMF()`, `testMGD()`, `testMICOM()`, `testHausman()`

##### Examples

```
# NOT RUN {
# ===========================================================================
# Basic usage
# ===========================================================================
### Linear model ------------------------------------------------------------
# Most basic usage requires a dataset and a model. We use the
# `threecommonfactors` dataset.
## Take a look at the dataset
#?threecommonfactors
## Specify the (correct) model
model <- "
# Structural model
eta2 ~ eta1
eta3 ~ eta1 + eta2
# (Reflective) measurement model
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33
"
## Estimate
res <- csem(threecommonfactors, model)
## Postestimation
verify(res)
summarize(res)
assess(res)
# Notes:
# 1. By default no inferential quantities (e.g. Std. errors, p-values, or
# confidence intervals) are calculated. Use resampling to obtain
# inferential quantities. See "Resampling" in the "Extended usage"
# section below.
# 2. `summarize()` prints the full output by default. For a more condensed
# output use:
print(summarize(res), .full_output = FALSE)
## Dealing with endogeneity -------------------------------------------------
# See: ?testHausman()
### Models containing second-order constructs -------------------------------
## Take a look at the dataset
#?dgp_2ndorder_cf_of_c
model <- "
# Path model / Regressions
c4 ~ eta1
eta2 ~ eta1 + c4
# Measurement/composite model
c1 <~ y11 + y12
c2 <~ y21 + y22 + y23 + y24
c3 <~ y31 + y32 + y33 + y34 + y35 + y36 + y37 + y38
eta1 =~ y41 + y42 + y43
eta2 =~ y51 + y52 + y53
# Second-order construct (a common factor measured by composites)
c4 =~ c1 + c2 + c3
"
res_2stage <- csem(dgp_2ndorder_cf_of_c, model, .approach_2ndorder = "2stage")
res_mixed <- csem(dgp_2ndorder_cf_of_c, model, .approach_2ndorder = "mixed")
# The standard repeated indicators approach is done by 1.) respecifying the model
# and 2.) adding the repeated indicators to the data set
# 1.) Respecify the model
model_RI <- "
# Path model / Regressions
c4 ~ eta1
eta2 ~ eta1 + c4
c4 ~ c1 + c2 + c3
# Measurement/composite model
c1 <~ y11 + y12
c2 <~ y21 + y22 + y23 + y24
c3 <~ y31 + y32 + y33 + y34 + y35 + y36 + y37 + y38
eta1 =~ y41 + y42 + y43
eta2 =~ y51 + y52 + y53
# c4 is a common factor measured by composites
c4 =~ y11_temp + y12_temp + y21_temp + y22_temp + y23_temp + y24_temp +
      y31_temp + y32_temp + y33_temp + y34_temp + y35_temp + y36_temp +
      y37_temp + y38_temp
"
# 2.) Update data set
data_RI <- dgp_2ndorder_cf_of_c
coln <- c(colnames(data_RI), paste0(colnames(data_RI), "_temp"))
data_RI <- data_RI[, c(1:ncol(data_RI), 1:ncol(data_RI))]
colnames(data_RI) <- coln
# Estimate
res_RI <- csem(data_RI, model_RI)
summarize(res_RI)
### Multigroup analysis -----------------------------------------------------
# See: ?testMGD()
# ===========================================================================
# Extended usage
# ===========================================================================
# `csem()` provides defaults for all arguments except `.data` and `.model`.
# Below some common options/tasks that users are likely to be interested in.
# We use the threecommonfactors data set again:
model <- "
# Structural model
eta2 ~ eta1
eta3 ~ eta1 + eta2
# (Reflective) measurement model
eta1 =~ y11 + y12 + y13
eta2 =~ y21 + y22 + y23
eta3 =~ y31 + y32 + y33
"
### PLS vs PLSc and disattenuation
# In the model all concepts are modeled as common factors. If
# .approach_weights = "PLS-PM", csem() uses PLSc to disattenuate composite-indicator
# and composite-composite correlations.
res_plsc <- csem(threecommonfactors, model, .approach_weights = "PLS-PM")
res_plsc$Information$Model$construct_type # all common factors
# To obtain "original" (inconsistent) PLS estimates use `.disattenuate = FALSE`
res_pls <- csem(threecommonfactors, model,
  .approach_weights = "PLS-PM",
  .disattenuate = FALSE
)
s_plsc <- summarize(res_plsc)
s_pls <- summarize(res_pls)
# Compare
data.frame(
  "Path" = s_plsc$Estimates$Path_estimates$Name,
  "Pop_value" = c(0.6, 0.4, 0.35), # see ?threecommonfactors
  "PLSc" = s_plsc$Estimates$Path_estimates$Estimate,
  "PLS" = s_pls$Estimates$Path_estimates$Estimate
)
### Resampling --------------------------------------------------------------
# }
# NOT RUN {
## Basic resampling
res_boot <- csem(threecommonfactors, model, .resample_method = "bootstrap", .R = 40)
res_jack <- csem(threecommonfactors, model, .resample_method = "jackknife")
# See ?resamplecSEMResults for more examples
### Choosing a different weighting scheme -----------------------------------
res_gscam <- csem(threecommonfactors, model, .approach_weights = "GSCA")
res_gsca <- csem(threecommonfactors, model,
  .approach_weights = "GSCA",
  .disattenuate = FALSE
)
s_gscam <- summarize(res_gscam)
s_gsca <- summarize(res_gsca)
# Compare
data.frame(
  "Path" = s_gscam$Estimates$Path_estimates$Name,
  "Pop_value" = c(0.6, 0.4, 0.35), # see ?threecommonfactors
  "GSCAm" = s_gscam$Estimates$Path_estimates$Estimate,
  "GSCA" = s_gsca$Estimates$Path_estimates$Estimate
)
### Fine-tuning a weighting scheme ------------------------------------------
## Setting starting values
sv <- list("eta1" = c("y12" = 10, "y13" = 4, "y11" = 1))
res <- csem(threecommonfactors, model, .starting_values = sv)
## Choosing a different inner weighting scheme
#?args_csem_dotdotdot
res <- csem(threecommonfactors, model,
  .PLS_weight_scheme_inner = "factorial",
  .PLS_ignore_structural_model = TRUE
)
## Choosing different modes for PLS
# By default, concepts modeled as common factors use PLS Mode A weights.
modes <- list("eta1" = "unit", "eta2" = "modeB", "eta3" = "unit")
res <- csem(threecommonfactors, model, .PLS_modes = modes)
summarize(res)
# }
```

*Documentation reproduced from package cSEM, version 0.1.0, License: GPL-3*