lslSEM-class: A Reference Class for Learning a SEM model via penalized likelihood.

Description

A Reference Class for Learning a SEM model via penalized likelihood.

Fields

$data

An N x P data frame containing the responses of N observations on P observed variables. In the present version, all variables must be numeric and missing values should be coded as NA. Only complete observations will be used for further analysis.
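The complete-observation rule above can be previewed before building the object. The following is a minimal base R sketch, not the package's own code; the data frame dat and its values are made up for illustration.

```r
# Toy data frame with P = 3 numeric variables; NA marks a missing value.
dat <- data.frame(
  y1 = c(1.2, 0.7, NA, 2.1),
  y2 = c(0.3, NA, 1.5, 0.9),
  y3 = c(2.0, 1.1, 0.4, 1.8)
)

# Every column must be numeric.
all_numeric <- all(vapply(dat, is.numeric, logical(1)))

# Keep only the rows without any missing value (listwise deletion).
dat_complete <- dat[complete.cases(dat), , drop = FALSE]
n_used <- nrow(dat_complete)
```

Here n_used is the number of observations that would remain if incomplete rows were dropped in this way.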

$pattern

A list of matrices representing the pattern of a specified SEM model. $pattern contains six matrices whose elements are 0, 1, or -1, representing a fixed, free, or penalized parameter, respectively. The six matrices are

Ldp: a P x M pattern matrix for factor loadings. Ldp must be given; no default value will be generated through method check().
Psp: a P x P pattern matrix for measurement error covariances. In the present framework, the diagonal elements of Psp can only be set to 1, i.e., measurement error variances must be freely estimated. The default value of Psp is the P x P identity matrix.
Btp: an M x M pattern matrix for path coefficients. The default value of Btp is the M x M zero matrix.
Php: an M x M pattern matrix for latent factor or residual covariances. In the present framework, the diagonal elements of Php can only be set to 1, i.e., residual variances must be freely estimated. The default value of Php is the M x M identity matrix.
nup: a P x 1 pattern matrix for intercepts of observed variables. The default value of nup is the P x 1 matrix with all elements being one.
app: a M x 1 pattern matrix for intercepts of latent factors. The default value of app is the M x 1 matrix with all elements being zero.
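As an illustration of the 0 / 1 / -1 coding, the sketch below builds all six pattern matrices by hand for a toy model with P = 6 and M = 2. The sizes and the choice of which loadings to fix, free, or penalize are assumptions made for this example; the last five matrices simply mirror the defaults described above.

```r
P <- 6  # number of observed variables (toy size)
M <- 2  # number of latent factors (toy size)

# Ldp: start with every loading penalized (-1), free the main loadings (1),
# and fix one loading per factor (0) to set the scale of that factor.
Ldp <- matrix(-1, P, M)
Ldp[1:3, 1] <- 1
Ldp[4:6, 2] <- 1
Ldp[1, 1] <- 0
Ldp[4, 2] <- 0

Psp <- diag(1, P)        # free error variances, error covariances fixed
Btp <- matrix(0, M, M)   # all path coefficients fixed
Php <- diag(1, M)        # free factor variances, factor covariances fixed
nup <- matrix(1, P, 1)   # free intercepts of observed variables
app <- matrix(0, M, 1)   # fixed intercepts of latent factors

pattern <- list(Ldp = Ldp, Psp = Psp, Btp = Btp, Php = Php, nup = nup, app = app)

# Tabulate how many matrix entries fall into each parameter type.
counts <- table(factor(unlist(pattern), levels = c(-1, 0, 1),
                       labels = c("penalized", "fixed", "free")))
```

The table counts matrix entries, so off-diagonal elements of the symmetric matrices Psp and Php appear twice in it.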

$value

A list of matrices specifying the starting values and fixed values of a specified SEM model. $value also contains six matrices, each corresponding to a matrix in $pattern:

Ld: a P x M matrix for the starting value and fixed value of factor loadings. Ld must be given; no default value will be generated through method check().
Ps: a P x P matrix for the starting value and fixed value of measurement error covariances.
Bt: a M x M matrix for the starting value and fixed value of path coefficients.
Ph: a M x M matrix for the starting value and fixed value of latent factor or residual covariances.
nu: a P x 1 matrix for the starting value and fixed value of intercepts of observed variables.
ap: an M x 1 matrix for the starting value and fixed value of intercepts of latent factors.

An element in a matrix of value represents a starting value if the corresponding element in the matching matrix of pattern is 1 or -1; otherwise, it represents a fixed value.
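The rule above can be made concrete with a small base R sketch. The 3 x 1 matrices below are toy inputs chosen for illustration, not package output.

```r
# Toy 3 x 1 loading pattern: first loading fixed, second free, third penalized.
Ldp <- matrix(c(0, 1, -1), 3, 1)

# Matching values: 1 is a fixed value; 0.7 and 0 are starting values.
Ld <- matrix(c(1, 0.7, 0), 3, 1)

is_fixed     <- Ldp == 0   # pattern 0: the value is held fixed
is_estimated <- Ldp != 0   # pattern 1 or -1: the value is only a start

fixed_values    <- Ld[is_fixed]
starting_values <- Ld[is_estimated]
```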

$penalty

A list of vectors to specify penalization related parameters. penalty contains three elements:

type: a string specifying the penalty function. Three penalty functions are implemented: "l1", "scad", and "mcp". The default penalty is "l1".
gm_all: a numeric vector to specify a candidate set of regularization parameter gamma. The default value is gm_all = seq(0.01, 0.1, 0.01).
dt_all: a numeric vector to specify a candidate set of delta. The default value is
- dt_all = Inf if type = "l1"
- dt_all = c(3, 4) if type = "scad"
- dt_all = c(2, 3) if type = "mcp"
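For orientation, the three penalty types can be written as functions of a single parameter. The sketch below uses the common textbook forms of the lasso, SCAD, and MCP penalties, with gm as the regularization parameter and dt as the shape parameter. Whether the package uses exactly this parameterization is an assumption; the function names pen_l1, pen_scad, and pen_mcp are hypothetical.

```r
# Penalty value for a parameter theta (textbook forms, assumed here).
pen_l1 <- function(theta, gm) gm * abs(theta)

pen_scad <- function(theta, gm, dt) {
  a <- abs(theta)
  ifelse(a <= gm,
         gm * a,
         ifelse(a <= gm * dt,
                (2 * gm * dt * a - a^2 - gm^2) / (2 * (dt - 1)),
                gm^2 * (dt + 1) / 2))
}

pen_mcp <- function(theta, gm, dt) {
  a <- abs(theta)
  ifelse(a <= gm * dt, gm * a - a^2 / (2 * dt), gm^2 * dt / 2)
}

# Compare the three penalties on a few parameter values.
theta <- c(-1, -0.05, 0, 0.05, 0.2, 1)
gm <- 0.1
penalties <- rbind(l1   = pen_l1(theta, gm),
                   scad = pen_scad(theta, gm, dt = 3),
                   mcp  = pen_mcp(theta, gm, dt = 2))
```

Under these forms, SCAD and MCP level off for large parameters while the l1 penalty keeps growing, and the MCP penalty with dt = Inf coincides with the l1 penalty, which fits the default dt_all = Inf for type = "l1".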

$control

A list of numerical values to specify optimization related parameters. control contains two elements:

itmax: a numeric value to specify the maximal number of iterations. The default value is 500.
eps: a numeric value to specify the convergence criterion. The default value is 10^-5.
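The roles of itmax and eps can be sketched with a generic iterative loop. The example below minimizes a simple quadratic by gradient descent and stops when the change between iterates falls below eps or the iteration cap is reached. It is only an illustration of the two settings; the optimizer and the exact convergence measure used by the package are not described here and are not reproduced.

```r
control <- list(itmax = 500, eps = 10^-5)

# Toy objective f(x) = (x - 2)^2 with gradient 2 * (x - 2).
grad <- function(x) 2 * (x - 2)

x <- 0
converged <- FALSE
for (it in seq_len(control$itmax)) {
  x_new <- x - 0.25 * grad(x)                 # one gradient step
  converged <- abs(x_new - x) < control$eps   # change small enough?
  x <- x_new
  if (converged) break
}
```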

$analysis_info

A list of numerical values to represent analysis related numbers. analysis_info includes four elements:

N: the sample size.
P: the number of observed variables.
M: the number of latent factors.
Qall: the number of total free and penalized parameters.

The value of each element of analysis_info is assigned automatically when learn() is executed.
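To see how these numbers relate to a specification, the sketch below derives P, M, and a parameter count from a toy set of pattern matrices. Counting each covariance in the symmetric matrices Psp and Php only once is an assumption about how Qall is defined; the helper names are hypothetical.

```r
P <- 4; M <- 2
Ldp <- matrix(c(0, 1, -1, -1,
                -1, -1, 0, 1), P, M)   # filled column by column
Psp <- diag(1, P)
Btp <- matrix(0, M, M)
Php <- matrix(1, M, M)
nup <- matrix(1, P, 1)
app <- matrix(0, M, 1)

# Non-fixed entries are those coded 1 (free) or -1 (penalized).
n_nonfixed <- function(x) sum(x != 0)
# For symmetric matrices, count the lower triangle including the diagonal.
n_nonfixed_sym <- function(x) sum(x[lower.tri(x, diag = TRUE)] != 0)

Qall <- n_nonfixed(Ldp) + n_nonfixed_sym(Psp) + n_nonfixed(Btp) +
  n_nonfixed_sym(Php) + n_nonfixed(nup) + n_nonfixed(app)

info <- c(P = nrow(Ldp), M = ncol(Ldp), Qall = Qall)
```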

$obs_moment

A list of sample moments created by executing learn().

Sg: a P x P sample covariance matrix.
mu: a P x 1 sample mean vector.
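Sample moments of this shape can be computed in base R as follows. The data are simulated for illustration. Which divisor the package uses for the covariance matrix (N - 1 or N) is not stated above, so both versions are shown.

```r
set.seed(1)
Y <- as.data.frame(matrix(rnorm(50 * 3), 50, 3))  # toy data, N = 50, P = 3
N <- nrow(Y)
P <- ncol(Y)

mu <- matrix(colMeans(Y), P, 1)       # P x 1 sample mean vector
Sg_unbiased <- cov(Y)                 # P x P covariance, divisor N - 1
Sg_ml <- Sg_unbiased * (N - 1) / N    # same matrix rescaled to divisor N
```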

$learn_summary

A 3-dimensional array containing the overall analysis result created by executing method learn().

$learn_theta

A 3-dimensional array containing the parameter estimates created by executing method learn().

$fit_summary

A matrix containing the overall model information and the values of goodness-of-fit indices obtained by method fit().

$fit_theta

A Qall x 1 matrix containing parameter estimates obtained through method fit().

$fit_value

A list of estimated parameter matrices obtained by fit().

Ld: a P x M estimated factor loading matrix.
Ps: a P x P estimated measurement error covariance matrix.
Bt: a M x M estimated path coefficient matrix.
Ph: a M x M estimated residual covariance matrix.
nu: a P x 1 estimated observed variable intercept.
ap: a M x 1 estimated latent factor intercept.

$fit_moment

A list of model implied moments obtained by fit().

Sg: a P x P estimated model implied covariance.
mu: a P x 1 estimated model implied mean.
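Given the six parameter matrices, model implied moments can be computed by hand. The covariance formula below is the one used to build Sg0 in Example 2; the mean formula is the standard one for this type of model and is an assumption here. All parameter values are toy inputs.

```r
P <- 4; M <- 2
Ld <- matrix(c(1, 0.8, 0, 0,
               0, 0, 1, 0.7), P, M)   # filled column by column
Ps <- diag(0.4, P)
Bt <- matrix(0, M, M)
Bt[2, 1] <- 0.5                        # factor 1 predicts factor 2
Ph <- diag(c(1, 0.75))
nu <- matrix(0, P, 1)
ap <- matrix(c(0.2, 0), M, 1)

Bt_iv <- solve(diag(1, M) - Bt)        # (I - Bt)^{-1}

# Model implied covariance and mean of the observed variables.
Sg <- Ld %*% Bt_iv %*% Ph %*% t(Bt_iv) %*% t(Ld) + Ps
mu <- nu + Ld %*% Bt_iv %*% ap
```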

Methods

check(): Method check() checks the correctness of specifying pattern, value, penalty, and control. If possible, check() also generates default values for the unspecified field elements.
fit(criterion): Method fit() fits a SEM model given the criterion for selecting optimal gm (gamma) and dt (delta). Argument criterion can be 'dml' (likelihood), 'aic', or 'bic'. The default value of criterion is 'bic'. The final model information and goodness-of-fit will be stored in the field fit_summary. The final parameter estimate will be stored in the field fit_theta. The estimated parameter matrix will be stored in the field fit_value. The model implied moments will be stored in the field fit_moment.
learn(): Method learn() calculates penalized likelihood (PL) estimates under each combination of gm (gamma) and dt (delta) given in gm_all and dt_all of the field penalty. The overall model information will be stored in the field learn_summary and the parameter estimates will be stored in the field learn_theta.
plot_path(mat_name, row_idx, col_idx): Method plot_path() draws a plot of solution path given mat_name, row_idx, and col_idx. Argument mat_name must be 'Ld', 'Ps', 'Bt', 'Ph', 'nu', or 'ap'.
plot_validation(): Method plot_validation() draws a plot of likelihood, aic, and bic under the values in gm_all and dt_all.
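The division of labor between learn() and fit() can be pictured as a grid search followed by a selection step. The base R sketch below enumerates every (gm, dt) pair and picks the pair with the smallest criterion value. The column bic is filled by a hypothetical stand-in formula, not by the package, and the layout of learn_summary is not reproduced here.

```r
gm_all <- seq(0.03, 0.12, 0.03)
dt_all <- c(2, 3)

# One row per combination of gamma and delta, as visited by a grid search.
grid <- expand.grid(gm = gm_all, dt = dt_all)

# Stand-in criterion for illustration only; a real analysis would use the
# information criterion computed from each fitted model.
grid$bic <- (grid$gm - 0.06)^2 + 0.01 / grid$dt

# Selection step: keep the combination with the smallest criterion value.
best <- grid[which.min(grid$bic), ]
```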

Examples

#Example 1: Factor Analysis Model#
#create a P x M population factor loading matrix#
Ld0 <- diag(1, 4) %x% matrix(c(.8, .65, .75, .65, .8), 5, 1)
Ld0[2, 2] = Ld0[7, 3] = Ld0[12, 4] = Ld0[17, 1] = .5
Ld0[9, 1] = Ld0[14, 2] = Ld0[19, 3] = Ld0[4, 4] = .5

#create a M x M population factor covariance matrix#
Ph0 <- 0.3 * matrix(1, 4, 1) %*% matrix(1, 1, 4) + diag( .7, 4)

#create a P x P population covariance matrix#
Sg0 <- Ld0 %*% Ph0 %*% t(Ld0)
diag(Sg0) <- 1

#create a P x P population measurement error covariance matrix#
Ps0 <- Sg0 - Ld0 %*% Ph0 %*% t(Ld0)

#create a P x M pattern matrix for factor loadings#
Ldp <- 1*(Ld0!=0)
Ldp[Ld0 < 0.6] = -1
Ldp[1, 1] = Ldp[6, 2] = Ldp[11, 3] = Ldp[16, 4] = 0

#create a M x M pattern matrix for factor covariance#
Php <- matrix(1, 4, 4)

#specify field pattern, value, and penalty#
pattern <- list(Ldp = Ldp, Php = Php)
value <- list(Ld = Ld0, Ps = Ps0, Ph = Ph0)
penalty <- list(type = "mcp", gm_all = seq(0.03, .12, .03), dt_all = 1.5)

#generate data with N = 400 and P = 20#
Z <- matrix(rnorm(400 * 20, 0, 1), 400, 20)
Y <- Z %*% eigen(Sg0)$vectors %*% diag(sqrt(eigen(Sg0)$values)) %*% t(eigen(Sg0)$vectors)
Y <- as.data.frame(Y)

#create lslSEM object#
rc_sem <- lsl:::lslSEM(data = Y, pattern = pattern, value = value, penalty = penalty)

#check the specification through method check()#
rc_sem$check()

#obtain the estimates under each pair of gamma and delta through method learn()#
rc_sem$learn()

#obtain the final model based on bic through method fit()#
rc_sem$fit(criterion = "bic")

#see overall model information and fit indices of final model#
rc_sem$fit_summary

#see estimated Ld of final model#
rc_sem$fit_value$Ld

#see the plot for likelihood, aic, and bic under the given gamma and delta values#
rc_sem$plot_validation()

#################################################################################

#Example 2: A general SEM Model#
#create a P x M population factor loading matrix#
Ld0 <- diag(1, 9) %x% matrix(c(.8,.75,.8), 3, 1)

#create a M x M population path coefficients matrix#
Bt0 <- matrix(0, 9, 9)
Bt0[2, 1] = Bt0[3, 2] = Bt0[5, 4] = Bt0[6, 5] = Bt0[8, 7] = Bt0[9, 8] =.45
Bt0[4, 1] = Bt0[5, 2] = Bt0[6, 3] = Bt0[7, 4] = Bt0[8, 5] = Bt0[9, 6]= .55
Bt0iv <- solve(diag(1, 9) - Bt0)

#create a M x M population residual covariance matrix#
Ph0 <- diag(0, 9)
Ph0[1, 1] <- 1
for (m in 2:9) {Ph0[m, m] <- (1 - sum((Bt0iv[m,] ^ 2) * diag(Ph0)))}

#create a P x P population measurement error covariance matrix#
Ps0 <- diag(c(.36, 0.4375, .36), 27)

#create a P x P population covariance matrix#
Sg0 <- Ld0 %*% Bt0iv %*% Ph0 %*% t(Bt0iv) %*% t(Ld0) + Ps0

#create a P x M pattern matrix for factor loadings (numeric 0/1 coding)#
Ldp <- 1 * (Ld0 != 0)
Ldp[1, 1] = Ldp[4, 2] = Ldp[7, 3] = Ldp[10, 4] = Ldp[13, 5] = 0
Ldp[16, 6] = Ldp[19, 7] = Ldp[22, 8] = Ldp[25, 9] = 0

#create a M x M pattern matrix for path coefficients#
Btp <- matrix(0, 9, 9)
Btp[lower.tri(Btp)] <- -1

#specify field pattern, value, and penalty#
pattern <- list(Ldp = Ldp, Btp = Btp)
value <- list(Ld = Ld0, Bt = Bt0)
penalty <- list(type = "mcp", gm_all = seq(0.03, .12, .03), dt_all = 2)

#generate data with N = 400 and P = 27#
Z <- matrix(rnorm(400 * 27, 0, 1), 400, 27)
Y <- Z %*% eigen(Sg0)$vectors %*% diag(sqrt(eigen(Sg0)$values)) %*% t(eigen(Sg0)$vectors)
Y <- as.data.frame(Y)

#create lslSEM object#
rc_sem <- lslSEM(data = Y, pattern = pattern, value = value, penalty = penalty)

#check the specification through method check()#
rc_sem$check()

#obtain the estimates under each pair of gamma and delta through method learn()#
rc_sem$learn()

#obtain the final model based on bic through method fit()#
rc_sem$fit(criterion = "bic")

#see overall model information and fit indices of final model#
rc_sem$fit_summary

#see estimated Bt of final model#
rc_sem$fit_value$Bt

#see the solution paths of the parameters in Bt#
rc_sem$plot_path(mat_name = "Bt")
