start_em: Starting values for parameters

Description

Generates starting values for the parameters of the EM algorithm used in the functions mult.em_1level, mult.em_2level, mult.reg_1level and mult.reg_2level.

Value

A list of starting values for the parameters of the models \(x_{i} = \alpha + \beta z_k + \Gamma v_i + \varepsilon_i\) (Zhang and Einbeck, 2024) and \(x_{ij} = \alpha + \beta z_k + \Gamma v_{ij} + \varepsilon_{ij}\) (Zhang et al., 2023), which are used in the four functions mult.em_1level, mult.em_2level, mult.reg_1level and mult.reg_2level.

p: The starting value for the parameter \(\pi_k\), which is a vector of length \(K\).
alpha: The starting value for the parameter \(\alpha\), which is a vector of length \(m\).
z: The starting value for the parameter \(z_k\), which is a vector of length \(K\).
beta: The starting value for the parameter \(\beta\), which is a vector of length \(m\).
gamma: The starting value for the parameter \(\Gamma\), which is a matrix.
sigma: The starting value for the parameter \(\Sigma_k\). When var_fun = 1, \(\Sigma_k = \Sigma\) is a diagonal matrix common to all components, returned as a vector of its diagonal elements; when var_fun = 2, each \(\Sigma_k\) is a diagonal matrix, returned as \(K\) vectors of diagonal elements; when var_fun = 3, \(\Sigma_k = \Sigma\) is a full variance-covariance matrix common to all components, returned as a matrix \(\Sigma\); when var_fun = 4, each \(\Sigma_k\) is a full variance-covariance matrix, returned as \(K\) different matrices \(\Sigma_k\).
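
To make the roles of these parameters concrete, the following base-R sketch simulates data from the 1-level model \(x_{i} = \alpha + \beta z_k + \Gamma v_i + \varepsilon_i\). It is only an illustration: every numeric value and every variable name (n, sd_eps, k, and so on) is made up here and does not come from the package, and a common diagonal \(\Sigma\) (the var_fun = 1 case) is assumed.

```r
# Illustrative simulation of the 1-level model; all numeric values are made up.
set.seed(1)
n <- 200   # number of observations
m <- 2     # dimension of the response
K <- 2     # number of mixture components
r <- 1     # dimension of the covariates

p      <- c(0.4, 0.6)                  # mixing proportions pi_k (length K)
z      <- c(-1, 1)                     # mass points z_k (length K)
alpha  <- c(3, 70)                     # intercept vector (length m)
beta   <- c(1, 12)                     # coefficients of z_k (length m)
Gamma  <- matrix(c(0.5, -0.2), m, r)   # covariate coefficients (m x r)
sd_eps <- c(0.3, 5)                    # assumed error sds; Sigma = diag(sd_eps^2)

k   <- sample(K, n, replace = TRUE, prob = p)        # component label per observation
v   <- matrix(rnorm(n * r), n, r)                    # covariates v_i
eps <- matrix(rnorm(n * m), n, m) %*% diag(sd_eps)   # scale column j by sd_eps[j]

# x_i = alpha + beta * z_k + Gamma v_i + eps_i, stacked row by row
x <- matrix(alpha, n, m, byrow = TRUE) + outer(z[k], beta) + v %*% t(Gamma) + eps
dim(x)
```

Each row of x is one observation, so a list of starting values for this model needs a length-K p and z, a length-m alpha and beta, an m x r gamma, and a sigma whose shape depends on var_fun.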

Arguments

data: A data set object; \(m\) denotes its dimension (the number of variables).
v: Covariate(s); \(r\) denotes their dimension.
K: Number of mixture components; the default is K = 2.
steps: Number of iterations. It is used when option = 2, for both the 1-level model and the 2-level model. It is also used when option = 3 or option = 4 for the 1-level model, provided var_fun is set to either 3 or 4. The default is steps = 20.
option: One of four options for selecting the starting values for the parameters. The default is option = 1. When option = 1: \(\pi_k\) = \(\frac{1}{K}\), \(z_k\) ~ rnorm(\(K\), mean = 0, sd = 1), \(\alpha\) = column means, \(\beta\) = a random row minus alpha, \(\Gamma\) = coefficient estimates from separate linear models, and \(\Sigma\) is a diagonal matrix whose diagonal elements are the column standard deviations over \(K\). When option = 2: a short run (steps = 5) of the EM function, started with option = 1 and var_fun = 1, is performed and its estimates are used as the starting values for all the parameters. When option = 3: the starting value of \(\beta\) is the first principal component, and the starting values for the remaining parameters are the same as for option = 1. When option = 4: the scores of the first principal component of the data are clustered with \(K\)-means; \(\pi_k\) is the proportion of observations assigned to each cluster, \(z_k\) takes the values of the \(K\)-means centers, and the starting values for the remaining parameters are the same as for option = 1.
var_fun: One of four variance specifications. When var_fun = 1, the same diagonal variance matrix is used for all \(K\) components of the mixture; when var_fun = 2, different diagonal variance matrices are used for different components; when var_fun = 3, the same full (unrestricted) variance matrix is used for all components; when var_fun = 4, different full (unrestricted) variance matrices are used for different components. If unspecified, var_fun = 2. Note that, for application purposes, var_fun can only take the values 1 or 2 in two-level models.
p: Optional; starting values for \(\pi_k\), supplied as a \(K\)-dimensional vector.
z: Optional; starting values for \(z_k\), supplied as a \(K\)-dimensional vector.
beta: Optional; starting values for \(\beta\), supplied as an \(m\)-dimensional vector.
alpha: Optional; starting values for \(\alpha\), supplied as an \(m\)-dimensional vector.
sigma: Optional; starting values for \(\Sigma_k\) (\(\Sigma\), when var_fun = 1 or var_fun = 3). When var_fun = 1, it is supplied as an \(m\)-dimensional vector; when var_fun = 2, as a list (of length \(K\)) of \(m\)-dimensional vectors; when var_fun = 3, as an \(m \times m\) matrix; when var_fun = 4, as a list (of length \(K\)) of \(m \times m\) matrices.
gamma: Optional; starting values for \(\Gamma\), the coefficients for the covariates, supplied as an \(m \times r\) matrix.
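
The shape expected for a user-supplied sigma changes with var_fun, so a small base-R sketch may help. It only builds objects of the four shapes described above from the faithful data. The names sigma1 to sigma4 are hypothetical, and the numeric values (sample variances and covariances) are placeholders chosen for illustration, not a recommendation from the package.

```r
# Placeholder sigma objects in the four shapes described for var_fun = 1..4.
data(faithful)
m <- ncol(faithful)   # dimension of the data (here 2)
K <- 2                # number of mixture components

diag_vals <- as.numeric(apply(faithful, 2, var))   # one value per variable
full_mat  <- cov(faithful)                         # m x m covariance matrix

sigma1 <- diag_vals                                    # var_fun = 1: m-dimensional vector
sigma2 <- replicate(K, diag_vals, simplify = FALSE)    # var_fun = 2: list of K m-vectors
sigma3 <- full_mat                                     # var_fun = 3: m x m matrix
sigma4 <- replicate(K, full_mat, simplify = FALSE)     # var_fun = 4: list of K m x m matrices

# Assumed usage, following the argument names documented above (not run here):
# start <- start_em(faithful, K = K, var_fun = 4, sigma = sigma4)
```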

References

Zhang, Y., Einbeck, J. and Drikvandi, R. (2023). A multilevel multivariate response model for data with latent structures. In: Proceedings of the 37th International Workshop on Statistical Modelling, pages 343-348. Link on RG: https://www.researchgate.net/publication/375641972_A_multilevel_multivariate_response_model_for_data_with_latent_structures.

Zhang, Y. and Einbeck, J. (2024). A Versatile Model for Clustered and Highly Correlated Multivariate Data. J Stat Theory Pract 18(5). doi:10.1007/s42519-023-00357-0

Examples

## Example using the faithful data.
data(faithful)
start <- start_em(faithful, option = 1)
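
The option = 1 and option = 4 recipes described under Arguments can be approximated in base R, which may help when checking what start_em returns. This is a rough sketch and not the package's implementation: the variable names are hypothetical, "column standard deviations over K" is read here as the standard deviations divided by K, the principal components are computed on centered but unscaled data, and \(\Gamma\) is omitted because no covariates are used.

```r
# Rough base-R approximation of the option = 1 and option = 4 recipes.
data(faithful)
X <- as.matrix(faithful)
K <- 2
m <- ncol(X)
set.seed(1)

# option = 1
p1     <- rep(1 / K, K)                      # equal mixing proportions
z1     <- rnorm(K, mean = 0, sd = 1)         # random mass points
alpha1 <- colMeans(X)                        # column means
beta1  <- X[sample(nrow(X), 1), ] - alpha1   # a random row minus alpha
sigma1 <- apply(X, 2, sd) / K                # assumed reading of "sd over K"

# option = 4: K-means on the scores of the first principal component
pc1 <- prcomp(X)$x[, 1]                      # assumes centering without scaling
km  <- kmeans(matrix(pc1, ncol = 1), centers = K, nstart = 10)
p4  <- km$size / nrow(X)                     # cluster proportions
z4  <- as.numeric(km$centers)                # cluster centers

str(list(p = p4, alpha = alpha1, z = z4, beta = beta1, sigma = sigma1))
```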
