modelsearch2: Data-driven Extension of a Latent Variable Model

Description

Procedure that adds relationships between variables when they are supported by the data.

Usage

modelsearch2(
  object,
  link,
  data,
  method.p.adjust,
  method.maxdist,
  n.sample,
  na.omit,
  alpha,
  nStep,
  trace,
  cpus
)
# S3 method for lvmfit
modelsearch2(
  object,
  link = NULL,
  data = NULL,
  method.p.adjust = "fastmax",
  method.maxdist = "approximate",
  n.sample = 1e+05,
  na.omit = TRUE,
  alpha = 0.05,
  nStep = NULL,
  trace = TRUE,
  cpus = 1
)

Value

A list containing:

sequenceTest: the sequence of tests that have been performed.
sequenceModel: the sequence of models that have been obtained.
sequenceQuantile: the sequence of rejection thresholds. Optional.
sequenceIID: the influence functions relative to each test. Optional.
sequenceSigma: the covariance matrix relative to each test. Optional.
initialModel: the model before the sequential search.
statistic: the argument statistic.
method.p.adjust: the argument method.p.adjust.
alpha: [numeric 0-1] the significance cutoff for the p-values.
cv: whether the procedure has converged.

Arguments

object: an lvmfit object.
link: [character, optional for lvmfit objects] the names of the additional relationships to consider when expanding the model. Should be a vector containing strings like "Y~X". See the details section.
data: [data.frame, optional] the dataset used to identify the model.
method.p.adjust: [character] the method used to adjust the p-values for multiple comparisons. Can be any method that is valid for the stats::p.adjust function (e.g. "fdr"). Can also be "max", "fastmax", or "gof".
method.maxdist: [character] the method used to estimate the distribution of the max statistic. "resampling" resamples the score under the null to estimate the null distribution. "bootstrap" performs a wild bootstrap of the iid decomposition of the score to estimate the null distribution. "approximate" attempts to identify the latent Gaussian variable corresponding to each score statistic (which is chi-squared distributed). It approximates the correlation matrix between these latent Gaussian variables and uses numerical integration to compute the distribution of the max.
n.sample: [integer, >0] number of samples used in the resampling approach.
na.omit: [logical] should tests leading to NA for the test statistic be ignored? Otherwise an NA test statistic stops the selection process.
alpha: [numeric 0-1] the significance cutoff for the p-values. When the p-value is below this cutoff, the corresponding link is added to the model and the search continues. Otherwise the search stops.
nStep: the maximum number of links that can be added to the model.
trace: [logical] should the execution of the function be traced?
cpus: the number of CPUs that can be used for the computations.
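
The interplay between method.p.adjust and alpha can be illustrated with base R alone. The sketch below is not output of modelsearch2: the link names and p-values are made up for illustration. It adjusts the p-values with stats::p.adjust and then applies the stopping rule described for alpha.

```r
## hypothetical unadjusted p-values for four candidate links (illustrative only)
p.raw <- c("u~x1" = 0.0004, "u~x2" = 0.012, "y1~x1" = 0.21, "y1~~y3" = 0.64)
alpha <- 0.05

## adjust for multiple comparisons, here with the "fdr" method of stats::p.adjust
p.adj <- stats::p.adjust(p.raw, method = "fdr")

## the candidate with the smallest adjusted p-value is considered first
best.link <- names(p.adj)[which.min(p.adj)]

## add the link and continue if it is below alpha, otherwise stop
add.link <- unname(p.adj[best.link] < alpha)
```

In this toy case the adjusted p-value of "u~x1" is below alpha, so that link would be added and the search would move on to the next step.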

Details

method.p.adjust = "max" computes the p-values based on the distribution of the max statistic. This max statistic is the maximum of the square roots of the score statistics. The p-values are computed by integrating the multivariate normal distribution.
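
The idea behind the max statistic can be sketched in base R. This is a minimal illustration under stated assumptions, not the implementation used by the package: the correlation matrix Sigma and the score values are hypothetical, and the null distribution of the max is obtained by Monte Carlo simulation rather than by the numerical integration described above.

```r
set.seed(1)

## hypothetical correlation matrix between three test statistics (an assumption)
Sigma <- matrix(c(1.0, 0.5, 0.2,
                  0.5, 1.0, 0.3,
                  0.2, 0.3, 1.0), nrow = 3, byrow = TRUE)

## hypothetical score statistics (chi-squared, 1 df) and their square roots
score <- c(9.0, 4.0, 1.0)
z.obs <- sqrt(score)

## draw Gaussian vectors under the null: rows of Z have covariance Sigma
n.sample <- 1e4
Z <- matrix(rnorm(n.sample * 3), ncol = 3) %*% chol(Sigma)

## simulated null distribution of the max statistic
max.null <- apply(abs(Z), 1, max)

## adjusted p-value: how often the max under the null reaches each statistic
p.max <- sapply(z.obs, function(z) mean(max.null >= z))
```

Because every statistic is compared with the distribution of the maximum, larger statistics always receive smaller adjusted p-values, and the adjustment accounts for the correlation between tests.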

method.p.adjust = "fastmax" only computes the p-value for the largest statistic. It is faster than "max" and leads to identical results.
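
One way to read this equivalence is that the search only needs the smallest adjusted p-value at each step, and that p-value always belongs to the largest statistic. The sketch below illustrates this in base R under a simplifying assumption of independent Gaussian statistics, which gives a closed form for the distribution of the max. The statistics and the helper p.adjust.max are hypothetical and are not part of the package.

```r
## hypothetical square-root score statistics for five candidate links
z.obs <- c(3.1, 2.2, 1.7, 0.9, 0.4)
k <- length(z.obs)
alpha <- 0.05

## adjusted p-value under the simplifying assumption of independent
## Gaussian statistics: P(max_j |Z_j| >= z) = 1 - (2 * pnorm(z) - 1)^k
p.adjust.max <- function(z) 1 - (2 * pnorm(z) - 1)^k

## "max"-like: adjust every statistic, then look at the smallest p-value
p.all <- p.adjust.max(z.obs)
decision.max <- min(p.all) < alpha

## "fastmax"-like: adjust only the largest statistic
p.largest <- p.adjust.max(max(z.obs))
decision.fastmax <- p.largest < alpha
```

Both routes lead to the same decision about whether to add a link; the second one evaluates the distribution of the max only once.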

method.p.adjust = "gof" keeps adding links until the chi-squared test (of correct specification of the covariance matrix) is no longer significant.

Examples

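## load the packages providing lvm(), estimate() and modelsearch2()
library(lava)
library(lavaSearch2)

## simulate data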
mSim <- lvm()
regression(mSim) <- c(y1,y2,y3,y4)~u
regression(mSim) <- u~x1+x2
categorical(mSim,labels=c("A","B","C")) <- "x2"
latent(mSim) <- ~u
covariance(mSim) <- y1~y2
transform(mSim, Id~u) <- function(x){1:NROW(x)}

set.seed(10)
df.data <- lava::sim(mSim, n = 1e2, latent = FALSE)

## only identifiable extensions
m <- lvm(c(y1,y2,y3,y4)~u)
latent(m) <- ~u
addvar(m) <- ~x1+x2

e <- estimate(m, df.data)

if (FALSE) {
resSearch <- modelsearch(e)
resSearch

resSearch2 <- modelsearch2(e, nStep = 2)
resSearch2
}
## restrict the search to a user-specified set of candidate links
search.link <- c("u~x1","u~x2","y1~x1","y1~x2","y1~~y2","y1~~y3")
resSearch2 <- modelsearch2(e, nStep = 2, link = search.link)
resSearch2

## some extensions are not identifiable
m <- lvm(c(y1,y2,y3)~u)
latent(m) <- ~u
addvar(m) <- ~x1+x2 

e <- estimate(m, df.data)

if (FALSE) {
resSearch <- modelsearch(e)
resSearch
resSearch2 <- modelsearch2(e)
resSearch2
}

## for instance
mNI <- lvm(c(y1,y2,y3)~u)
latent(mNI) <- ~u
covariance(mNI) <- y1~y2
## estimate(mNI, data = df.data)
## does not converge
