pcalg (version 1.0-0)

pcSelect.presel: Estimate the subgraph around a response variable using PC-Select with preselection

Description

This function uses pcSelect to preselect some covariates and then runs pcSelect again on the reduced data set.

Usage

pcSelect.presel(y, dm, alpha, alphapre, corMethod = "standard", verbose = 0, directed=FALSE)

Arguments

y
Response vector (length(y) = nrow(dm))
dm
Data matrix (rows: samples, cols: nodes)
alpha
Significance level of individual partial correlation tests
alphapre
Significance level used by pcSelect in the preselection step
corMethod
"standard" or "Qn" for standard or robust correlation estimation
verbose
0: no output; 1: brief output; 2: detailed output. Values 1 and 2 make the function considerably slower.
directed
Boolean; should the output graph be directed?

Value

  • pcs: A boolean vector indicating which columns of dm are associated with y.
  • zMin: The minimal z-values obtained when testing partial correlations between y and each column of dm. The larger the value, the more consistent the edge is with the data.
  • Xnew: The preselected variables (the columns of dm retained by the preselection step).
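As a reading aid for zMin, the base-R snippet below turns z-values into a keep/drop decision and an approximate p-value. It assumes, as for Gaussian partial-correlation tests of the kind pcalg uses, that each |z| is compared with the two-sided normal cutoff qnorm(1 - alpha/2) within a single pcSelect run. The zMin values are made up for illustration. Because pcSelect.presel uses two different levels (alphapre, then alpha), this comparison is only indicative for its output.

```r
# Illustrative (made-up) zMin values for four covariates
zMin <- c(0.4, 2.1, 7.3, 1.5)
alpha <- 0.05

# Two-sided normal cutoff for |z| at level alpha (about 1.96 for alpha = 0.05)
cutoff <- qnorm(1 - alpha / 2)

# Approximate p-value of the weakest (least significant) test for each covariate
p_approx <- 2 * pnorm(-abs(zMin))

# An edge survives only if even its weakest test rejects independence
keep <- zMin > cutoff
keep
```

Reading zMin on the p-value scale makes the statement "the larger the value, the more consistent the edge" concrete: a large zMin means that even the least favourable conditioning set still gave strong evidence of dependence.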

Details

This function essentially applies pcAlgo to the data matrix obtained by joining y and dm. Since the output concerns only the edges between y and the columns of dm, not the edges among the columns of dm, the algorithm is adapted accordingly. As a result, the runtime is typically reduced considerably and much larger datasets can be handled.
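The adaptation described above can be illustrated with a simplified base-R sketch. The function pc_select_sketch is hypothetical and written only for this illustration; it is not pcalg's implementation. It assumes Gaussian data, standard (non-robust) correlations and the Fisher z test for partial correlations with the two-sided cutoff qnorm(1 - alpha/2). Only edges between y and the columns of dm are tested, conditioning sets of increasing size are drawn from the columns still adjacent to y, and edges among the columns of dm are never examined.

```r
# Hypothetical, simplified PC-style selection of the neighbours of y
pc_select_sketch <- function(y, dm, alpha) {
  n <- length(y)
  p <- ncol(dm)
  cutoff <- qnorm(1 - alpha / 2)  # two-sided cutoff for |z|
  G <- rep(TRUE, p)               # TRUE = column j is still adjacent to y
  zMin <- rep(Inf, p)             # smallest |z| observed for each column

  # Fisher z statistic for the partial correlation of y and x given dm[, S]
  z_stat <- function(x, S) {
    if (length(S) == 0) {
      r <- cor(y, x)
    } else {
      Q <- qr(cbind(1, dm[, S, drop = FALSE]))
      r <- cor(qr.resid(Q, y), qr.resid(Q, x))
    }
    sqrt(n - length(S) - 3) * atanh(r)
  }

  ord <- 0  # size of the conditioning sets
  while (sum(G) - 1 >= ord) {
    for (j in which(G)) {
      others <- setdiff(which(G), j)  # other current neighbours of y
      if (length(others) < ord) next
      # all conditioning sets of size ord drawn from the other neighbours
      sets <- if (ord == 0) list(integer(0)) else
        lapply(combn(length(others), ord, simplify = FALSE),
               function(i) others[i])
      for (S in sets) {
        z <- abs(z_stat(dm[, j], S))
        zMin[j] <- min(zMin[j], z)
        if (z <= cutoff) {  # independence not rejected: drop edge y - x_j
          G[j] <- FALSE
          break
        }
      }
    }
    ord <- ord + 1
  }
  list(G = G, zMin = zMin)
}
```

By construction, every retained column has zMin above the cutoff and every removed column has zMin at or below it. The savings come from never testing pairs of columns of dm against each other.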

First, pcSelect is run with significance level alphapre. Then only the variables selected in this first step are kept, and pcSelect is run on them again with significance level alpha.
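The two-stage procedure can be sketched generically in base R. Both two_stage_select and the toy selector marginal_select are hypothetical helpers written for this illustration, not part of pcalg. The selector argument stands in for pcSelect and must return a logical vector with one entry per column of its data matrix. The toy selector applies a marginal Fisher z test to each column, which (unlike pcSelect) ignores the other columns.

```r
# Hypothetical generic two-stage selection (not part of pcalg).
# select_fn(y, X, level) must return a logical vector, one entry per column of X.
two_stage_select <- function(y, dm, alpha, alphapre, select_fn) {
  # Stage 1: preselection at level alphapre
  pcs <- select_fn(y, dm, alphapre)
  keep <- which(pcs)
  Xnew <- dm[, keep, drop = FALSE]
  # Stage 2: rerun the selector on the preselected columns at level alpha
  if (length(keep) > 0) {
    pcs[keep] <- select_fn(y, Xnew, alpha)
  }
  list(pcs = pcs, Xnew = Xnew)
}

# Toy selector for illustration: marginal Fisher z test of each column against y
marginal_select <- function(y, X, level) {
  n <- length(y)
  z <- apply(X, 2, function(x) sqrt(n - 3) * atanh(cor(y, x)))
  abs(z) > qnorm(1 - level / 2)
}
```

With pcalg loaded, a selector such as function(y, X, a) pcSelect(y, X, a)$G could be passed instead. In that case the second stage genuinely differs from the first, because the conditioning sets are drawn only from the preselected columns.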

References

P. Spirtes, C. Glymour and R. Scheines (2000) Causation, Prediction, and Search, 2nd edition, The MIT Press.

See Also

pcAlgo, which is the more general version of this function; pcSelect, which this function calls in both stages.

Examples

library(pcalg)
p <- 10
## generate and draw a random DAG (plotting requires the Rgraphviz package)
set.seed(101)
myDAG <- randomDAG(p, prob = 0.2)
plot(myDAG, main = "randomDAG(10, prob = 0.2)")

## generate 1000 samples from the DAG using a standard normal error distribution
n <- 1000
d.mat <- rmvDAG(n, myDAG, errDist = "normal")

## Let's pretend that the 10th column is the response and the first 9
## columns are explanatory variables. Which of the first 9 variables
## "cause" the tenth variable?
y <- d.mat[, 10]
dm <- d.mat[, -10]
res <- pcSelect.presel(y, dm, alpha = 0.05, alphapre = 0.6)
which(res$pcs)  # columns of dm selected as associated with y