netEst.dir: Constrained estimation of directed networks

Description

Estimates a directed network using a lasso (L1) penalty.

Usage

netEst.dir(X, zero = NULL, one = NULL, lambda, verbose = FALSE, eps = 1e-08)

Arguments

X

The $n \times p$ data matrix.

zero

(Optional) A $p \times p$ matrix indicating the entries of the adjacency matrix that are constrained to be zero: 1 at entries to be constrained to be zero and 0 elsewhere.

one

(Optional) A $p \times p$ matrix indicating the entries of the adjacency matrix that are always kept in the model, regardless of the lasso regularization parameter. The input has the same format as zero: 1 at entries to be kept and 0 elsewhere.

lambda

(Non-negative) numeric scalar or a vector of length $p-1$ representing the regularization parameters for nodewise lasso. If lambda is a scalar, the same penalty will be used for all $p-1$ lasso regressions. By default (lambda=NULL), the vector of lambda is defined as $$\lambda_j(\alpha) = 2 n^{-1/2} Z^*_{\frac{\alpha}{2p(j-1)}}, \quad j=2,\ldots,p.$$ Here $Z^*_q$ represents the $(1-q)$-th quantile of the standard normal distribution and $\alpha$ is a positive constant between 0 and 1. See Shojaie and Michailidis (2010a) for details on the choice of tuning parameters.

verbose

Whether to print out information as estimation proceeds. Default = FALSE.

eps

(Non-negative) numeric scalar indicating the tolerance level for differentiating zero and non-zero edges: entries with magnitude $<$ eps will be set to 0.
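The default penalty formula given for lambda above can be evaluated directly in base R, which may help when choosing a vector of penalties by hand. The helper default_lambda below is a hypothetical name used only for illustration (it is not part of the package), and alpha = 0.1 is an arbitrary choice; this is a sketch of the formula, not necessarily how netEst.dir computes its defaults internally.

```r
# Hypothetical helper (not part of the package): evaluates
#   lambda_j(alpha) = 2 n^{-1/2} Z*_{alpha / (2 p (j - 1))},  j = 2, ..., p,
# where Z*_q is the (1 - q)-th quantile of the standard normal distribution.
default_lambda <- function(n, p, alpha) {
  stopifnot(n > 0, p >= 2, alpha > 0, alpha < 1)
  j <- 2:p
  q <- alpha / (2 * p * (j - 1))
  # qnorm(q, lower.tail = FALSE) returns the (1 - q)-th quantile
  2 / sqrt(n) * qnorm(q, lower.tail = FALSE)
}

lam <- default_lambda(n = 100, p = 100, alpha = 0.1)
length(lam)   # one penalty per nodewise regression, i.e. p - 1
range(lam)
```

Because $q$ shrinks as $j$ grows, the penalty increases with $j$: nodes later in the causal order, which have more candidate parents, are penalized more heavily.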

Value

A list with components

Adj

The weighted adjacency matrix of dimension $p \times p$. This is the matrix that will be used in NetGSA.

infmat

The influence matrix of dimension $p \times p$.

lambda

The values of tuning parameters used.
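This page does not state how infmat is computed. In the NetGSA framework of Shojaie and Michailidis (2009, 2010b), the influence matrix of a network with weighted adjacency matrix $A$ is $(I - A)^{-1}$; assuming infmat follows that definition, it can be recomputed from Adj as sketched below. The helper influence_from_adj and the small example network are hypothetical.

```r
# Hypothetical helper. Assumption: infmat = (I - Adj)^{-1}, the influence
# matrix used in the NetGSA framework; this is not confirmed by this page.
influence_from_adj <- function(Adj) {
  solve(diag(nrow(Adj)) - Adj)
}

# A small DAG in causal order: 1 -> 2 -> 3 -> 4 (strictly lower triangular).
Adj <- matrix(0, 4, 4)
Adj[2, 1] <- 0.5; Adj[3, 2] <- 0.4; Adj[4, 3] <- -0.3
infl <- influence_from_adj(Adj)

# A strictly lower-triangular matrix is nilpotent, so
# (I - Adj)^{-1} = I + Adj + Adj^2 + ... + Adj^(p-1) exactly.
series <- diag(4); term <- diag(4)
for (k in 1:3) { term <- term %*% Adj; series <- series + term }
all.equal(infl, series)   # TRUE
infl[4, 1]                # approximately -0.06 = 0.5 * 0.4 * (-0.3), the path 1 -> 2 -> 3 -> 4
```

Under this assumption, entry $[i, j]$ of the influence matrix aggregates the effect of node $j$ on node $i$ over all directed paths from $j$ to $i$.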

Details

The function netEst.dir performs constrained estimation of a directed network using a lasso (L1) penalty, as described in Shojaie and Michailidis (2010a). Two sets of constraints determine subsets of entries of the weighted adjacency matrix that should be exactly zero (argument zero) or should take non-zero values (argument one). The remaining entries are estimated from the data.

The arguments one and/or zero can come from external knowledge of the 0-1 structure of the underlying network, such as a list of edges and/or non-edges obtained from available databases. The function edgelist2adj can then be used to construct one and/or zero.
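Since the exact signature of edgelist2adj is not shown on this page, the sketch below instead builds a zero or one matrix directly in base R from a two-column edge list. The helper edges_to_mask is hypothetical, and the convention that row $(i, j)$ of the edge list marks entry $[i, j]$, with $i > j$ so that it falls in the lower triangle, is an assumption chosen to match the Examples section.

```r
# Hypothetical helper: build a p x p 0/1 constraint matrix (the format
# expected by `zero` and `one`) from a two-column edge list.
# Assumption: each row (i, j) marks entry [i, j]; with nodes in causal order,
# only entries with i > j (the lower triangle) are used by netEst.dir.
edges_to_mask <- function(edges, p) {
  edges <- as.matrix(edges)
  stopifnot(ncol(edges) == 2, all(edges >= 1), all(edges <= p))
  mask <- matrix(0, p, p)
  mask[edges] <- 1   # matrix indexing: sets one entry per row of `edges`
  mask
}

p <- 12
zeros <- edges_to_mask(rbind(c(5, 1), c(10, 1)), p)   # known non-edges
ones  <- edges_to_mask(rbind(c(4, 1), c(11, 2)), p)   # known edges

# Sanity check: an entry should not be both forced to zero and kept.
stopifnot(!any(zeros == 1 & ones == 1))
```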

In this function, it is assumed that the columns of $X$ are ordered according to a correct (Wald) causal order, such that no $X_j$ is a parent of $X_k$ ($k \le j$). Given the causal ordering of nodes, the resulting adjacency matrix is lower triangular (see Shojaie & Michailidis, 2010b). Thus, only the lower-triangular parts of zero and one are used in this function. For this reason, it is important that both of these matrices are also ordered according to the causal order of the nodes in $X$. To estimate the network, each node is first regressed on its known edges (one). The residual obtained from this regression is then used to find additional edges among the nodes that could potentially interact with the given node (those not in zero).
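To make the two-step procedure concrete, the sketch below reimplements it in base R. It is not the package's code: a plain coordinate-descent lasso minimizing $\frac{1}{2n}\|y - Zb\|_2^2 + \lambda \|b\|_1$ (no intercept) stands in for glmnet, the known edges are fitted by ordinary least squares, and the names soft, lasso_cd and sketch_dir are hypothetical. It assumes the columns of X, and the rows and columns of zero and one, are already in causal order.

```r
# Soft-thresholding operator used by the lasso update.
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Coordinate-descent lasso (stand-in for glmnet) for
# (1 / (2n)) ||y - Z b||^2 + lam * ||b||_1, without intercept.
lasso_cd <- function(Z, y, lam, iters = 200, tol = 1e-8) {
  n <- nrow(Z); b <- numeric(ncol(Z)); r <- y
  zz <- colSums(Z^2) / n
  for (it in seq_len(iters)) {
    b_old <- b
    for (k in seq_along(b)) {
      if (zz[k] == 0) next
      rho_k <- sum(Z[, k] * r) / n + zz[k] * b[k]
      b_new <- soft(rho_k, lam) / zz[k]
      r <- r - Z[, k] * (b_new - b[k])   # keep the residual up to date
      b[k] <- b_new
    }
    if (max(abs(b - b_old)) < tol) break
  }
  b
}

# Hypothetical two-step nodewise estimator (sketch of the idea, not netEst.dir).
sketch_dir <- function(X, zero = NULL, one = NULL, lambda, eps = 1e-8) {
  p <- ncol(X)
  if (is.null(zero)) zero <- matrix(0, p, p)
  if (is.null(one))  one  <- matrix(0, p, p)
  if (length(lambda) == 1) lambda <- rep(lambda, p - 1)
  stopifnot(length(lambda) == p - 1)
  Adj <- matrix(0, p, p)
  for (j in 2:p) {
    prev  <- seq_len(j - 1)                           # possible parents of node j
    known <- prev[one[j, prev] == 1]                  # edges kept by `one`
    cand  <- setdiff(prev[zero[j, prev] == 0], known) # not excluded by `zero`
    y <- X[, j]
    if (length(known) > 0) {                          # step 1: regress on known edges
      fit <- lm.fit(X[, known, drop = FALSE], y)
      Adj[j, known] <- fit$coefficients
      y <- fit$residuals
    }
    if (length(cand) > 0) {                           # step 2: lasso on the residual
      Adj[j, cand] <- lasso_cd(X[, cand, drop = FALSE], y, lambda[j - 1])
    }
  }
  Adj[abs(Adj) < eps] <- 0                            # treat tiny entries as zero
  Adj
}

# Usage on simulated data from a small DAG 1 -> 2 -> 3, 4 -> 5.
set.seed(1)
n <- 200; p <- 5
A <- matrix(0, p, p); A[2, 1] <- A[3, 2] <- A[5, 4] <- 0.6
X <- scale(t(solve(diag(p) - A) %*% t(matrix(rnorm(n * p), n, p))))
zero <- matrix(0, p, p); zero[5, 1] <- 1   # known non-edge
one  <- matrix(0, p, p); one[3, 2]  <- 1   # known edge
Adj <- sketch_dir(X, zero = zero, one = one, lambda = 0.1)
round(Adj, 2)
```

Because node $j$ only ever uses columns $1, \ldots, j-1$, the estimate is lower triangular by construction, which is why only the lower-triangular parts of zero and one matter.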

This function is closely related to NetGSA, which requires the weighted adjacency matrix as input. When the user does not have complete information on the weighted adjacency matrix, but has data (X, not necessarily the same as the x in NetGSA) and external information (one and/or zero) on the adjacency matrix, netEst.dir can be used to estimate the remaining interactions in the adjacency matrix from the data. Further, when the adjacency matrices are expected to differ across conditions and data from each condition are available, netEst.dir should be run separately on each condition's data to obtain an estimate of the adjacency matrix for that condition.

The algorithm used in netEst.dir is based on glmnet. Please refer to glmnet for computational details.

References

Shojaie, A., & Michailidis, G. (2010a). Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs. Biometrika, 97(3), 519-538. http://biomet.oxfordjournals.org/content/97/3/519.short

Shojaie, A., & Michailidis, G. (2010b). Network enrichment analysis in complex experiments. Statistical Applications in Genetics and Molecular Biology, 9(1), Article 22. http://www.ncbi.nlm.nih.gov/pubmed/20597848

Shojaie, A., & Michailidis, G. (2009). Analysis of gene sets based on the underlying regulatory network. Journal of Computational Biology, 16(3), 407-426. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3131840/

Examples


# NOT RUN {
library(netgsa)
library(MASS)
library(glmnet)
set.seed(1)

p <- 100 # number of variables
s <- 0.1 # probability of having an edge in the adjacency matrix
rho <- 0.6 # edge weights

## form an adjacency matrix (lower triangular, consistent with the causal order)
A <- matrix(rbinom(p*p,1,s),p,p)
A[upper.tri(A)] = 0
diag(A) <- 0 # no self-loops in a DAG

I <- diag(rep(1,p))

## generate data (see Shojaie & Michailidis, 2010b)
n <- 100
X <- solve(I-rho*A) %*% t(matrix(rnorm(n*p), n, p))
X <- t(X) # n x p data matrix
X <- scale(X)

## known non-edges (zeros) and known edges (ones); only lower-triangular entries are used
zeros <- matrix(0,p,p)
ones <- matrix(0,p,p)

zeros[5,1] <- zeros[10,1] <- 1
ones[4,1] <- ones[11,2] <- 1

fit.dir <- netEst.dir(X=X, zero=zeros, one=ones, lambda=0.1)
# }
