Learn R Programming

PCGII

R Package for Information-incorporated Gene Network Construction with FDR Control

Authors:

Hao Wang, Yumou Qiu and Peng Liu.

Contact:

[haydo.wang@outlook.com] (Hao Wang)

Citation:

Wang, H., Qiu, Y.*, Guo, H., Yin, Y., Liu, P.*, 2024. Information-incorporated Gene Network Construction with FDR Control. Under review.

Installation and Package loading

# R version is required >= 3.4.4
# When the first time to use the package, please make sure dependent packages are installed under your R environment, if not, please use commands below to install
> #install.packages("tidyverse")
> #install.packages("glmnet")
> #install.packages("mvtnorm")
> #install.packages("igraph")
> #install.packages("Matrix")
# install "devtools" package in your R environment
> # devtools::install_github("HaoWang47/PCGII")
> library(PCGII)
> library(corpcor)
> library(glmnet)
> library(igraph)
> library(Matrix)
> library(mvtnorm)
> library(tidyverse)

This is a tutorial script for researchers who are interested in applying PCGII on omics data to learn the direct association structure of omics features. The main function PGCII() takes a biologically pre-processed expression data matrix as input, and returns a list of statistics (estimates and test statistics). The function inference() takes a list returned by PGCII() as input and conduct simultaneous test to identify significant partial correlations with False Discovery Rate (FDR) controlled at a pre-determined nominal level (0.05 by default).

Usage

PCGII()

  • Input:
    • df: the main expression data, an $n$ by $p$ matrix/dataframe, in which each row corresponds to a sample and each column represents expression/abundance of an omics feature;
    • prior: the prior set, a $k$ by $2$ dataframe, in which each row corresponds to a pair of nodes (any omics features) that are connected under prior belief. Note, prior input has to be dataframe with column names "row" and "col";
    • lambda: the regularization parameter, used in the node-wise regression. If missing, default lambda will be used which is at the order of sqrt(2log(p/sqrt(n))/n).
  • Remark: mathematical standardization will be automatically done within the function.
  • Output: This function returns a list of
    • Est: estimated partial correlation matrix;
    • EstThresh: sparse partial correlation estimation matrix with threshold;
    • kappa: estimated ratio of forth and squared second moment of residuals, please refer to the manuscript for details;
    • tscore: estimated test statistics matrix of partial correlations;
    • n: sample size;
    • p: number of genes under study.

Inference()

  • Input:
    • list: a list returned by either PCGII() or clevel().
    • alpha: pre-determined False Discovery Rate. Nominal FDR is set at 0.05 by default.
  • Output: an adjacency matrix of significant partial correlations.

Network Analysis

Simulate data $X$ from a scale-free network $g$.

> # Simulating data
> set.seed(1234567)
> n=50 # sample size
> p=30 # number of nodes
>
> omega=make_random_precision_mat(eta=.01, p=p)
>
> Sigma=solve(omega) # population covariance matrix, which is used to generate data
> X = rmvnorm(n = n, sigma = Sigma) # simulate expression data

Network analysis of data matrix X.

> # determine tuning parameter: fixed lambda
> lam=2*sqrt(log(p)/n)
>
> # create prior set: directed prior network
> prior_set=matrix(data=c(6,5, 28,14), nrow=2, ncol=2, byrow = TRUE)
> colnames(prior_set)=c("row", "col")
> PCGII_out=PCGII(df=X, prior=as.data.frame(prior_set), lambda = lam)
> inference_out=inference(list=PCGII_out)
> diag(inference_out)=0
> # Visualization
> inference_out %>%
+   graph_from_adjacency_matrix(mode = "undirected") %>%
+   plot(vertex.size=4, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.5)
>
> # create prior set: undirected prior network
> PCGII_out=PCGII(df=X, prior=undirected_prior(prior_set), lambda = lam)
> inference_out=inference(list=PCGII_out)
> diag(inference_out)=0
> # Visualization
> inference_out %>%
+   graph_from_adjacency_matrix(mode = "undirected") %>%
+   plot(vertex.size=4, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.5)

Copy Link

Version

Install

install.packages('PCGII')

Monthly Downloads

156

Version

1.1.2

License

MIT + file LICENSE

Maintainer

Hao Wang

Last Published

February 2nd, 2024

Functions in PCGII (1.1.2)

undirected_prior

Pre-process the input prior set to ensure the input prior set corresponds to an undirected prior network
PCGII

Get the estimated partial correlation graph with information incorporation
inference

Conduct simultaneous inference of estimated partial correlations
make_random_precision_mat

Generate unstructured/random network skeleton and simulates corresponding precision matrix
sigs2mat

Utility function for PCGII inference results
clevel

Get the estimated partial correlation graph without information incorporation
makeBlockDiag

Generate block-diagonal matrix of size p by p
make_sf_precision_mat

Generate scale-free network skeleton and simulates corresponding precision matrix