CompareCausalNetworks (version 0.2.6.2)

getParentsStable: Estimate the connectivity matrix of a causal graph using stability selection.

Description

Estimates the connectivity matrix of a directed causal graph, using various possible methods. Supported methods at the moment are ARGES, backShift, bivariateANM, bivariateCAM, CAM, FCI, FCI+, GES, GIES, hiddenICP, ICP, LINGAM, MMHC, rankARGES, rankFci, rankGES, rankGIES, rankPC, regression, RFCI and PC. Uses stability selection to select an appropriate sparseness.

Usage

getParentsStable(
  X,
  environment,
  interventions = NULL,
  EV = 1,
  nodewise = TRUE,
  threshold = 0.75,
  nsim = 100,
  sampleSettings = 1/sqrt(2),
  sampleObservations = 1/sqrt(2),
  parentsOf = 1:ncol(X),
  method = c("ICP", "hiddenICP", "backShift", "pc", "LINGAM", "ges", "gies", "CAM",
    "fci", "rfci", "regression", "bivariateANM", "bivariateCAM")[1],
  alpha = 0.1,
  mode = c("raw", "parental", "ancestral")[1],
  variableSelMat = NULL,
  excludeTargetInterventions = TRUE,
  onlyObservationalData = FALSE,
  indexObservationalData = NULL,
  setOptions = list(),
  verbose = FALSE
)

Arguments

X

A (nxp)-data matrix with n observations of p variables.

environment

A vector of length n, where the entry for observation i is an index for the environment in which observation i took place (simplest case entries 1 for observational data and entries 2 for interventional data of unspecified type). Is required for methods ICP, hiddenICP, backShift.

interventions

A optional list of length n. The entry for observation i is a numeric vector that specifies the variables on which interventions happened for observation i (a scalar if an intervention happened on just one variable and numeric(0) if no intervention occured for this observation). Is used for method gies but will generate the vector environment if this is set to NULL (even though it might generate too many different environments for some data so a hand-picked vector environment is preferable). Is also used for ICP and hiddenICP to exclude interventions on the target variable of interest.

EV

A bound on the expected number of falsely selected edges.

nodewise

If FALSE, stability selection retains for each subsample the largest overall entries in the connectivity matrix. If TRUE, values are ordered row- and node-wise first and then the largest entries in each row and column are retained. Error control is valid (under exchangeability assumption) in both cases. The latter setting TRUE is perhaps more robust and is the default.

threshold

The empirical selection frequency in (0.5,1) under subsampling that needs to be surpassed for an edge to be selected.

nsim

The number of resamples for stability selection.

sampleSettings

The fraction of different environments to resample in each resampling (at least two different environments will be selected so the argument is without effect if there are just two different environments in total).

sampleObservations

The fraction of samples to resample in each environment.

parentsOf

The variables for which we would like to estimate the parents. Default are all variables.

method

A string that specfies the method to use. The methods pc (PC-algorithm), LINGAM (LINGAM), arges (Adaptively restricted greedy equivalence search), ges (Greedy equivalence search), gies (Greedy interventional equivalence search), fci (Fast causal inference) and rfci (Really fast causal inference) are imported from the package "pcalg" and are documented there in more detail, including the additional options that can be supplied via setOptions. The method CAM (Causal additive models) is documented in the package "CAM" and the methods ICP (Invariant causal prediction), hiddenICP (Invariant causal prediction with hidden variables) are from the package "InvariantCausalPrediction". The method backShift comes from the package "backShift". The method mmhc comes from the package "bnlearn". Finally, the methods bivariateANM and bivariateCAM are for now implemented internally but will hopefully be part of another package at some point in the near future.

alpha

The level at which tests are done. This leads to confidence intervals for ICP and hiddenICP and is used internally for pc and rfci.

mode

Output type - can be "raw", "parental" or "ancestral". If "raw" output is the output of the underlying method, without modifications. If "parental" output described parental relations; if "ancestral" output is casted to ancestral relations. #TODO explain further

variableSelMat

An optional logical matrix of dimension (pxp). An entry TRUE for entry (i,j) says that variable i should be considered as a potential parent for variable j and vice versa for FALSE. If the default value of NULL is used, all variables will be considered, but this can be very slow, especially for methods pc, ges, gies, rfci and CAM.

excludeTargetInterventions

When looking for parents of variable k in 1,...,p, set to TRUE if observations where an intervention on variable k occured should be excluded. Default is TRUE.

onlyObservationalData

If set to TRUE, only observational data is used. It will take the index in environment specified by indexObservationalData. If environment is NULL, all observations are used. Default is FALSE.

indexObservationalData

Index in environment that encodes observational data. Default is 1.

setOptions

A list that can take method-specific options; see the individual documentations of the methods for more options and their possible values.

verbose

If TRUE, detailed output is provided.

Value

A sparse matrix, where a 0 entry in (j,k) corresponds to an estimate of 'no edge' j -> parentsOf[k]. Entries between 0 and 100 give the selection percentage of this edge over all resamples (set to 0 if below critical threshold) and all non-zero values are considered as selected edges.

References

Stability selection (2010): N. Meinshausen and P. Buhlmann, Journal of the Royal Statistical Society: Series B, 72, 417-473

See Also

getParents for the underlying point-estimate of the causal graph.