wfgl: weighted fused graphical lasso

Description

wfgl estimates joint partial correlation matrices from two multivariate normal distributed datasets using an ADMM based algorithm. Allows for paired data.

Usage

wfgl(D1, D2, lambda1, lambda2, paired = TRUE, automLambdas = TRUE, 
     sigmaEstimate = "CRmad", pairedEst = "Reg-based-sim", maxiter = 30, 
     tol = 1e-05, nsubset = 10000, weights = c(1,1), rho = 1, rho.increment = 1, 
     triangleCorrection = TRUE, alphaTri = 0.01, temporalFolders = FALSE, 
     notOnlyLambda2 = TRUE, roundDec = 16, burn = 0, lambda1B = NULL, lambda2B = NULL)

Arguments

first population dataset in matrix $n_1 \times p$ form.

second population dataset in matrix $n_2 \times p$ form.

lambda1

tuning parameter for sparsity in the precision matrices (sequence of lambda1 is allowed).

lambda2

tuning parameter for similarity between the precision matrices in the two populations (only one value allowed at a time).

paired

if TRUE, observations in D1 and D2 are assumed to be matched ($n_1$ must be equal to $n_2$).

automLambdas

if TRUE the lambda's are estimated automatically with lambda1 and lambda2 being expected false positive rate levels.

sigmaEstimate

robust method used to estimate the variance of estimated partial correlations: name that uniquely identifies "mad", "IQR" or "CRmad" (default). This measure is used to automatically select the tuning parameter (when automLambdas = TRUE).

pairedEst

type of estimator for the correlation of estimated partial correlation coefficients when "paired = TRUE": to select from Reg-based and Reg-based-sim (default). This measure is used to weight similarity penalization lambda2 for different pairs of variables.

maxiter

maximum number of iterations for the ADMM algorithm.

tol

convergence tolerance

nsubset

maximum number of estimated partial correlation coefficients (chosen randomly) used to select lambda1 and lambda2 automatically (when automLambdas = TRUE).

weights

weights for the two populations to find the inverse covariance matrices.

rho

regularization parameter used to compute matrix inverse by eigen value decomposition (default of 1).

rho.increment

default of 1.

triangleCorrection

if TRUE the estimated triangle graph structures are tested.

alphaTri

significance level for the tested triangle graph structures.

temporalFolders

if TRUE temporal files are created and eliminated within the procedure. It is used to free R memory space when the dimension is very large (order of thousands).

notOnlyLambda2

if FALSE only lambda2 is found automatically.

roundDec

number of decimals to be stored, if low it reduces de memory space used.

burn

initial number of iterations which consider the original interpretation of lambda1 (given by lambda1B) and of lambda2 (given by lambda1B) even when automLambdas = TRUE.

lambda1B

lambda1 interpreted as when automLambdas = FALSE (see details).

lambda2B

lambda2 interpreted as when automLambdas = FALSE (see details).

Value

An object of class wfgl containing the following components:

path

adjacency matrices.

omega

precision matrices.

triangleCorrection

determines if triangle structures are tested.

weakTriangEdges

weakest edges in triangle structures which have been tested.

weakTriangEdgesPval

p-values for the weakest edge in triangle structures.

diff_value

convergence control.

iters

number of iterations used.

corEst

dependence structure estimated measure used in the estimation to account for dependent datasets.

Details

wfgl uses a weighted-fused graphical lasso maximum likelihood estimator by solving:

$$ \hat{\Omega}_{WFGL}^{\lambda} = \arg\max\limits_{\Omega_X,\Omega_Y} [\sum_{k=X,Y} \log\det\Omega_k -tr(\Omega_k S_k) - P_{\lambda_1,\lambda_2,V}(\Omega_X,\Omega_Y)], $$ with $$ P_{\lambda_1,\lambda_2, V}(\Omega_X,\Omega_Y) = \lambda_1||\Omega_X||_1 + \lambda_1||\Omega_Y||_1 +\lambda_2\sum_{i,j} v_{ij} |\Omega_{Y_{ij}}-\Omega_{X_{ij}}|, $$ where $\lambda_1$ is the sparsity tuning parameter, $\lambda_2$ is the similarity tuning parameter, and $V = [v_{ij}]$ is a $p\times p$ matrix to weight $\lambda_2$ for each coefficient of the differential precision matrix. If datasets are independent (paired = "FALSE"), then it is assumed that $v_{ij} = 1$ for all pairs $(i,j)$. Otherwise (paired = "TRUE"), weights are estimated in order to account for the dependence structure between datasets in the differential network estimation.

Lambdas can be estimated in each iteration by controlling the expected false positive rate (EFPR) in case automLambdas = TRUE. This transforms the problem of selecting the tuning parameters $\lambda_1$ and $\lambda_2$ to the selection of the desired EFPR. In case lambda2 is a single value and lambda1 is a vector with several values, then lambda selection approaches implemented at lambdaSelection can also be used.

If triangleCorrection = TRUE, the weakest edges of estimated triangular motifs are further tested. The reason is that edges that complete triangular graph structures suffer an overestimation when applying the ADMM due to using regularized inverse procedures.

References

Danaher, P., P. Wang, and D. Witten (2014). The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) (2006), 1-20.

Boyd, S. (2010). Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning 3(1), 1-122.

Examples

Run this code

# NOT RUN {
					
# example to use of wfgl
EX2 <- pcorSimulatorJoint(nobs =50, nclusters = 3, nnodesxcluster = c(30, 30,30), 
                          pattern = "pow", diffType = "cluster", dataDepend = "diag", 
                          low.strength = 0.5, sup.strength = 0.9, pdiff = 0.5, nhubs = 5, 
                          degree.hubs = 20,  nOtherEdges = 30, alpha = 2.3, plus = 0, 
                          prob = 0.05, perturb.clust = 0.2, mu = 0, diagCCtype = "dicot", 
                          diagNZ.strength = 0.6, mixProb = 0.5, probSign = 0.7,  
                          exactZeroTh = 0.05)
## not run
#wfgl1 <- wfgl(EX2$D1, EX2$D2, lambda1 = 0.05, lambda2 = 0.1, paired = TRUE, 
#              automLambdas = TRUE, sigmaEstimate = "CRmad", pairedEst = "Reg-based-sim", 
#              maxiter = 30)
#print(wfgl1)

# }

Run the code above in your browser using DataLab