sem.fit: Goodness-of-fit tests for piecewise SEM

Description

Tests independence claims and calculates Fisher's C statistic and associated p-value, and AIC and AICc, for a piecewise structural equation model (SEM).

Usage

sem.fit(modelList, data, conditional = FALSE, corr.errors = NULL, add.vars = NULL, 
  grouping.vars = NULL, grouping.fun = mean, adjust.p = FALSE, basis.set = NULL, 
  pvalues.df = NULL, model.control = NULL, .progressBar = TRUE)

Arguments

modelList

a list of regressions representing the structural equation model.

data

a data.frame used to construct the structured equations.

conditional

whether conditional variables should be shown in the independence claim (unless the formula is fewer than 30 characters). Default is FALSE.

corr.errors

a vector of variables with correlated errors (separated by "~~").

add.vars

a vector of additional variables whose independence claims should be evaluated, but which do not appear in the model list.

grouping.vars

an optional variable that represents the levels of data aggregation for a multi-level dataset.

grouping.fun

a function defining how variables are aggregated in grouping.vars. Default is mean.

adjust.p

whether p-values degrees of freedom should be adjusted (see below). Default is FALSE.

basis.set

provide an optional basis set.

pvalues.df

an optional data.frame corresponding to p-values for independence claims.

model.control

a list of model control arguments to be passed to d-sep models.

.progressBar

enable optional text progress bar. Default is TRUE.

Value

Returns a list with the following:

missing.paths

A data.frame where the first column is the independence claim, and the second through sixth columns the model estimates corresponding to the response variable in the independence claim.

fishers.c

A data.frame with the first entry corresponding to Fisher's C statistic, the second corresponding to the Chi-squared test degrees of freedom, and the third corresponding to the outcome (p-value) of the significance test derived from a Chi-squared distribution.

AIC

A data.frame where the first entry is the AIC score, and the second is the AICc score, and the third is the likelihood degrees of freedom (K).

Details

Independence claims are constructed based on how the variables are treated as in the model list. For example, if the indepedence claim includes a binary variable that is fit to a binomial distribution using an identity link, the function will evaluate the any claims using the same parameters.

Similarly, for linear mixed effects models construted in lme4 or nlme, varying slopes and intercepts are treated as in the model list. For example, if a variable is modeled with both a random slope and intercept in any model in the model list, that variable will be modeled with a random slope and intercept when evaluating all independence claims in which it appears. If slopes and intercepts vary for multiple variables, they will appear as such, even if they are conditional.

For models of class lmerMod, denominator degrees of freedom and resulting P-values are calculated using the Kenward-Rogers approximation from the pbkrtest package.

For linear mixed effects models, p-values can be adjusted to accommodate the full model degrees of freedom using the argument p.adjust = TRUE. For more information, see Shipley 2013.

Variables with correlated errors have no direct relationship but rather are hypothesized to be driven by the same underlying factor. This covariance should be reflected as correlated errors (double-headed arrow). Correlated errors are specified using the syntax from the lavaan package: var1 ~~ var2. Variables with correlated errors are ignored in the basis set under the assumption that their correlations will be quantified later using the function sem.coefs.

The argument add.vars requires a vector of character strings corresponding to column names in the dataset used to construct the models in modelList. This is useful if comparing nested SEMs where one wishes to account for additional variables whose independence claims should be evaluated, but which do not have any hypothesized paths in the current SEM. The default assumes there is no additional independence claims that do not appear in the model list.

If the data is hierarchical dataset and one or more responses are identical for each level of a grouping factors -- artificially inflating the degrees of freedom -- users can summarize the dataset for each grouping factor(s) specified in the argument grouping.vars. For example, consider a two-level hierarchy, where variables at the top level have identical values for each level of the grouping variable. If the response is a top level variable (is identically replicated for the grouping variable), this function takes a mean of the lower level variables for each level of the grouping variable, then runs the test of d-separation. If the response is fully replicated (occurs at the lower level), then no aggregation occurs.

References

Shipley, Bill. "Confirmatory path analysis in a generalized multilevel context." Ecology 90.2 (2009): 363-368.

Shipley, Bill. "The AIC model selection method applied to path analytic models compared using a d-separation test." Ecology 94.3 (2013): 560-564.

Examples

Run this code

# NOT RUN {
# Load example data
data(shipley2009)

# Reduce dataset for example
shipley2009.reduced = shipley2009[1:200, ]

# Load model packages
library(lme4)
library(nlme)

# Create list of models 
  shipley2009.reduced.modlist = list(
    
    lme(DD ~ lat, random = ~1|site/tree, na.action = na.omit, 
        data = shipley2009.reduced),
    
    lme(Date ~ DD, random = ~1|site/tree, na.action = na.omit, 
        data = shipley2009.reduced),
    
    lme(Growth ~ Date, random = ~1|site/tree, na.action = na.omit, 
        data = shipley2009.reduced),
    
    glmer(Live ~ Growth+(1|site)+(1|tree), 
          family=binomial(link = "logit"), data = shipley2009.reduced) 
    
  )

# Conduct goodness-of-fit tests
sem.fit(shipley2009.reduced.modlist, shipley2009.reduced)

# }
# NOT RUN {
  # Repeat with full dataset as in Shipley (2009)

  # Create list of models 
  shipley2009.modlist = list(
    
    lme(DD ~ lat, random = ~1|site/tree, na.action = na.omit, 
        data = shipley2009),
    
    lme(Date ~ DD, random = ~1|site/tree, na.action = na.omit, 
        data = shipley2009),
    
    lme(Growth ~ Date, random = ~1|site/tree, na.action = na.omit, 
        data = shipley2009),
    
    glmer(Live ~ Growth+(1|site)+(1|tree), 
          family=binomial(link = "logit"), data = shipley2009) 
    
  )

  # Conduct goodness-of-fit tests
  sem.fit(shipley2009.modlist, shipley2009)
  
  ###
  
  # Repeat but use lme4 to construct all models
  shipley2009.modlist.lme4 = list(
  
    lmer(DD~lat + (1|site/tree), na.action = na.omit,
         data = shipley2009),
  
    lmer(Date~DD + (1|site/tree), na.action = na.omit,
         data = shipley2009),
  
    lmer(Growth~Date + (1|site/tree), na.action = na.omit,
         data = shipley2009),
  
    glmer(Live~Growth+(1|site)+(1|tree),
          family=binomial(link = "logit"), data = shipley2009)
  
  )
  
  sem.fit(shipley2009.modlist.lme4, shipley2009)
  
  # Add new variable with no hypothesized links
  set.seed(1)
  add.var = rnorm(nrow(shipley2009), 50, 20)
  
  # Test for independence
  sem.fit(shipley2009.modlist, shipley2009, add.vars = c("add.var"))
  
  ###
  
  # Add new variable grouped at site level
  set.seed(2)
  
  shipley2009$site.var = rep(rnorm(20, 10, 5), each = 95)
  
  # Add a final model regressing at the level of the grouping variable
  shipley2009.modlist2 = append(
  
    shipley2009.modlist, list(
  
      lm(site.var~Growth, data = aggregate(shipley2009, by = list(site = shipley2009[,"site"]),
                                           mean, na.rm = TRUE)))
  
  )
  
  # Test for independence
  sem.fit(shipley2009.modlist2, shipley2009, grouping.vars = c("site"))
  
  ###
  
  # Fit model from Shipley (2013)
  data(shipley2013)
  
  shipley2013.modlist = list(
  
    lme(x2~x1, random = ~x1 | species, data = shipley2013),
  
    lme(x3~x2, random = ~x2 | species, data = shipley2013),
  
    lme(x4~x2, random = ~x2 | species, data = shipley2013),
  
    lme(x5~x3+x4, random = ~x3+x4 | species, data = shipley2013)
  
  )
  
  sem.fit(shipley2013.modlist, shipley2013) # Convergence error!
  
  # Specify model controls by switching to old optimizer
  sem.fit(shipley2013.modlist, shipley2013, model.control = list(lmeControl(opt = "optim")))
  
  ###
  
  # Compare to output from `ggm` package
  
  # Load library
  library(ggm)
  
  # Load data
  data(marks)
  # Generate direct acyclic graph (DAG)
  dag = DAG(mechanics ~ vectors+algebra,
            vectors ~ algebra,
            statistics ~ algebra+analysis,
            analysis ~ algebra)
  # Run test of directed separation
  shipley.test(dag, cov(marks), n=88)
  
  # Now create list of structured equations using lm()
  modelList = list(
    lm(mechanics ~ vectors+algebra, data = marks),
    lm( vectors ~ algebra, data = marks),
    lm(statistics ~ algebra+analysis, data = marks),
    lm(analysis ~ algebra, data = marks)
  )
  
  # Test model fit
  sem.fisher.c(modelList, marks)
  
# }

Run the code above in your browser using DataLab