
SparseLearner (version 1.0-2)

Sparse.graph: Graphical Modeling Using a LASSO-Type Sparse Learning Algorithm.

Description

This function builds a Gaussian or binary graph based on the bootstrap ranking LASSO regression method.

Usage

Sparse.graph(x, graph.type = c("gaussian"), B = 5, Boots = 100, edge.rule = c("AND"), kfold = 10, plot = TRUE, seed = 0123)

Arguments

x
input matrix of dimension nobs x nvars; each row is a vector of observations of the variables. Gaussian or binary data are supported.
graph.type
the type of graph, either "gaussian" or "binary". Defaults to "gaussian".
B
the number of external loops for the intersection operation. Defaults to 5.
Boots
the number of internal loops for bootstrap sampling. Defaults to 100.
edge.rule
the rule indicating whether the AND-rule or the OR-rule should be used to define the edges in the graph. Defaults to AND.
kfold
the number of folds for cross-validation; the default is 10. Although kfold can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is kfold=3.
plot
logical. Should the resulting graph be plotted? Defaults to TRUE.
seed
the seed for random sampling, with the default value 0123.
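
For illustration, a call using non-default settings might look as follows (a hypothetical example: dat stands for a numeric nobs x nvars matrix of binary data, and the object names are illustrative only):

fit <- Sparse.graph(dat, graph.type = c("binary"), B = 5, Boots = 100,
                    edge.rule = c("OR"), kfold = 10, plot = FALSE, seed = 0123)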

Value

adj.matrix
the adjacency matrix.
graph.type
the type of graph. Currently, this procedure supports Gaussian and binary data.
B
the number of external loops for the intersection operation.
Boots
the number of internal loops for bootstrap sampling.
edge.rule
the rule used to define the edges in the graph.
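
For example, the returned adjacency matrix can be converted into an edge list (a hypothetical post-processing step, not part of the package; fit stands for a Sparse.graph result):

edges <- which(fit$adj.matrix != 0, arr.ind = TRUE)
edges <- edges[edges[, 1] < edges[, 2], , drop = FALSE]  # keep each undirected edge once
edges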

Details

The Sparse.graph estimation procedure is based on the L1-regularized regression model and combines a bootstrap ranking strategy with model selection via the glmnet algorithm, which fits LASSO model paths for linear and logistic regression using coordinate descent. The procedure identifies relevant relationships between Gaussian or binary variables and estimates network structures from the data; the resulting graph consists of variables as nodes and relevant relationships as edges. Combining the LASSO penalized regression model with a bootstrap ranking strategy yields higher power and a lower false positive rate in variable selection, and the procedure is proposed for identifying significant associations between variables in epidemiological analyses.
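
In spirit, this is related to the neighborhood selection approach of Meinshausen and Buehlmann [4]: each variable is regressed on all others with the LASSO across bootstrap resamples, and edges are kept where coefficients are selected frequently. The following is only a minimal illustrative sketch of that idea for a Gaussian data matrix x; the helper sketch_graph and the frequency threshold freq.cut are hypothetical and do not reproduce the package's implementation.

library(glmnet)
# Illustrative sketch only: bootstrapped LASSO neighborhood selection.
sketch_graph <- function(x, Boots = 100, freq.cut = 0.5, edge.rule = "AND") {
  p <- ncol(x); n <- nrow(x)
  sel.freq <- matrix(0, p, p)                      # selection counts per (response, predictor) pair
  for (j in seq_len(p)) {
    for (b in seq_len(Boots)) {
      idx <- sample(n, n, replace = TRUE)          # bootstrap resample
      fit <- cv.glmnet(x[idx, -j, drop = FALSE], x[idx, j], family = "gaussian")
      beta <- as.numeric(coef(fit, s = "lambda.min"))[-1]   # drop intercept
      sel.freq[j, -j] <- sel.freq[j, -j] + (beta != 0)
    }
  }
  sel <- (sel.freq / Boots) >= freq.cut
  # AND-rule keeps an edge only if both neighborhood regressions select it;
  # OR-rule keeps it if either one does.
  adj <- if (edge.rule == "AND") sel & t(sel) else sel | t(sel)
  diag(adj) <- FALSE
  adj * 1
}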

References

[1] Guo, P., Zeng, F., Hu, X., Zhang, D., Zhu, S., Deng, Y., & Hao, Y. (2015). Improved Variable Selection Algorithm Using a LASSO-Type Penalty, with an Application to Assessing Hepatitis B Infection Relevant Factors in Community Residents. PLoS ONE, 10(7): e0134151.

[2] Friedman, J., Hastie, T. and Tibshirani, R. (2008). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22.

[3] Strobl, R., Grill, E., Mansmann, U. (2012). Graphical modeling of binary data using the LASSO: a simulation study. BMC Medical Research Methodology, 12:16.

[4] Meinshausen, N., Buehlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. The Annals of Statistics, 34: 1436-1462.

Examples

# Example 1: Gene network estimation using the bootstrap ranking LASSO method.
# Gaussian graph with OR-rule.
library(SIS)
library(glmnet)
data(leukemia.train)
# Genes screened by the LASSO algorithm as candidates for graphical modeling.
x <- as.matrix(leukemia.train[, -7130])
y <- as.numeric(leukemia.train[, 7130])
set.seed(0123)
cvfit <- cv.glmnet(x=x, y=y, type.measure="deviance", nfolds=3, family="binomial")
model.final <- cvfit$glmnet.fit
nzero <- as.matrix(coef(model.final, s=cvfit$lambda.min))
# To reduce the running time, only the top half of the genes with nonzero coefficients are used.
var_nz <- sort(abs(nzero[nzero[,1]!=0, ][-1]), decreasing=TRUE)
var_nz <- names(var_nz[1:(length(var_nz)/2)])
sub_data <- leukemia.train[, c(var_nz, "V7130")]
# Gene expression data subset from patients with acute myeloid leukemia.
subset_1 <- subset(sub_data, sub_data$V7130==1)
subset_1 <- as.matrix(subset_1[, -dim(subset_1)[2]])
# The parameters B and Boots in the following example are set to small values to
# reduce the running time; in practice the default values are recommended.
Sparse.graph.fit1 <- Sparse.graph(subset_1, graph.type=c("gaussian"), 
                                   B=2, Boots=1, edge.rule=c("OR"))
# Output the adjacency matrix of the variables.
Sparse.graph.fit1$adj.matrix

# Example 2: Gaussian graph with AND-rule.
# The parameters B and Boots in the following example are set to small values to
# reduce the running time; in practice the default values are recommended.
Sparse.graph.fit2 <- Sparse.graph(subset_1, graph.type=c("gaussian"), 
                        B=2, Boots=1, edge.rule=c("AND"), plot=FALSE)
# Output the adjacency matrix of the variables.
Sparse.graph.fit2$adj.matrix
# Plot the graph based on the adjacency matrix of variables using the qgraph package.
library(qgraph)
qgraph(Sparse.graph.fit2$adj.matrix, directed=FALSE, color="blue", 
        negCol="red", edge.labels=TRUE, layout="circle")
