netsnp: Reconstructs intra- and inter-chromosomal conditional interactions among genetic loci

Description

This is one of the main functions of the netgwas package. It reconstructs the intra- and inter-chromosomal interactions among genetic loci in diploids and polyploids. The input can be any biparental genotype data containing at least two genotype states. Two methods are available for reconstructing the network within the Gaussian copula graphical model: (1) an approximation method and (2) Gibbs sampling. Both methods can handle missing genotypes.

Usage

netsnp(data, method = "gibbs", rho = NULL, n.rho = NULL, rho.ratio = NULL, 
		ncores = "all", em.iter = 5, em.tol = .001, verbose = TRUE)

Arguments

data

An (\(n \times p\)) matrix or data.frame of genotype data, where \(n\) is the sample size and \(p\) is the number of variables. It can also be an object of class "simgeno". The input data may contain missing values.

method

The method used to reconstruct the network of intra- and inter-chromosomal conditional interactions (epistatic selection): "gibbs", "approx", or "npn". We recommend "gibbs" for a medium number of variables (~500), "approx" for a large number of variables, and "npn" for a very large number of variables (> 2000). The default method is "gibbs".
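The size guidance above can be written as a small helper. This is only a sketch: `choose_method` is a hypothetical function, not part of netgwas, and the cutoffs of 500 and 2000 variables are assumptions read off the approximate recommendation.

```r
# Hypothetical helper (not part of netgwas): map the number of variables p
# to one of the documented method names. The cutoffs are assumed, because
# the recommendation in the text is only approximate.
choose_method <- function(p, medium = 500, very_large = 2000) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 1)
  if (p <= medium) {
    "gibbs"
  } else if (p <= very_large) {
    "approx"
  } else {
    "npn"
  }
}

m_small <- choose_method(90)    # medium-sized problem
m_large <- choose_method(1200)  # large problem
m_huge  <- choose_method(5000)  # very large problem
```

The result could then be passed on, for example as netsnp(data, method = choose_method(ncol(data))).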

rho

A decreasing sequence of non-negative numbers that control the sparsity level. If rho = NULL, the program automatically computes a sequence of rho values based on n.rho and rho.ratio. Users can also supply their own decreasing sequence of values to override this.

n.rho

The number of regularization parameters. The default value is 10.

rho.ratio

Determines the distance between the elements of the rho sequence. A small value of rho.ratio results in a large distance between the elements, and a large value results in a small distance. If left as NULL, the program chooses a value internally.
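To make the roles of rho, n.rho and rho.ratio concrete, the sketch below builds a decreasing, log-spaced penalty grid. It is an illustration only: the rule netsnp uses internally is not stated here, and `make_rho_grid`, `rho_max` and the geometric spacing that ends at rho_max * rho_ratio are all assumptions.

```r
# Hypothetical grid builder (not the netgwas implementation): a decreasing,
# log-spaced sequence from rho_max down to rho_max * rho_ratio.
make_rho_grid <- function(rho_max, n_rho = 10, rho_ratio = 0.3) {
  stopifnot(rho_max > 0, n_rho >= 2, rho_ratio > 0, rho_ratio < 1)
  exp(seq(log(rho_max), log(rho_max * rho_ratio), length.out = n_rho))
}

# Same starting penalty and length, two different ratios.
rho_wide   <- make_rho_grid(rho_max = 0.5, n_rho = 5, rho_ratio = 0.02)
rho_narrow <- make_rho_grid(rho_max = 0.5, n_rho = 5, rho_ratio = 0.2)

# Under this assumed rule, the smaller ratio spreads the elements further apart.
gap_wide   <- abs(diff(rho_wide))
gap_narrow <- abs(diff(rho_narrow))
```

A grid built this way could be supplied directly through the rho argument, which overrides the automatic sequence.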

ncores

The number of cores to use for the calculations. Using ncores = "all" automatically detects the number of available cores and runs the computations in parallel on (available cores - 1).
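The "available cores - 1" rule can be reproduced with the parallel package that ships with R. This is a sketch of the described behaviour, not the code netsnp runs; the variable names and the fallback to a single worker are assumptions.

```r
library(parallel)

# Detect the cores on this machine; detectCores() may return NA on some
# platforms, in which case we assume a single core.
available <- detectCores()
if (is.na(available)) available <- 1L

# Keep one core free, but never drop below one worker.
n_workers <- max(1L, available - 1L)
n_workers
```

A value computed this way could also be passed explicitly as ncores instead of "all".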

em.iter

The number of EM iterations. The default value is 5.

em.tol

A criterion for stopping the EM iterations. The default value is .001.
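The interplay between em.iter and em.tol can be illustrated with a toy EM algorithm that estimates a mean from data with missing values. This is not the EM used by netsnp: `em_mean`, the starting value of 0 and the use of the absolute change in the estimate as the stopping criterion are assumptions made for illustration.

```r
# Toy EM for the mean of a variable with missing values (illustration only).
# E-step: replace missing values by the current mean.
# M-step: recompute the mean from the completed data.
em_mean <- function(x, em.iter = 5, em.tol = 0.001) {
  stopifnot(em.iter >= 1, any(!is.na(x)))
  miss <- is.na(x)
  mu <- 0
  for (iter in seq_len(em.iter)) {
    filled <- x
    filled[miss] <- mu               # E-step
    mu_new <- mean(filled)           # M-step
    change <- abs(mu_new - mu)
    mu <- mu_new
    if (change < em.tol) break       # tolerance reached before em.iter
  }
  list(mu = mu, iterations = iter, change = change)
}

x <- c(1, 2, 3, NA, NA)
fit_short <- em_mean(x)                # stops because em.iter is exhausted
fit_long  <- em_mean(x, em.iter = 50)  # stops because change < em.tol
```

With the defaults the loop ends at the iteration cap, while a larger em.iter lets the tolerance decide, which is the trade-off the two arguments control.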

verbose

Whether to print detailed messages for tracing the output. The default value is TRUE.

Value

An object with S3 class "netgwas" is returned:

Theta

A list of estimated p by p precision matrices that show the conditional independence relationships among genetic loci.

path

A list of estimated p by p adjacency matrices. This is the graph path corresponding to Theta.

A list of estimated p by p conditional expectations corresponding to rho.

A list of n by p transformed data based on Gaussian copula.

rho

An n.rho-dimensional vector containing the penalty terms.

loglik

An n.rho-dimensional vector containing the maximized log-likelihood values along the graph path.

data

The \(n\) by \(p\) input data matrix.
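As an illustration of how a precision matrix relates to a graph, the sketch below derives an adjacency matrix and partial correlations from a toy 3 by 3 matrix. The numbers are made up and are not netsnp output, and the zero threshold used for the adjacency matrix is an assumption about how an element of path corresponds to an element of Theta. The partial correlation formula is the standard one.

```r
# Toy precision matrix standing in for one element of Theta
# (illustrative numbers, not produced by netsnp).
Theta_k <- matrix(c( 2.0, -0.8, 0.0,
                    -0.8,  2.0, 0.5,
                     0.0,  0.5, 1.5), nrow = 3, byrow = TRUE)

# Adjacency matrix: an edge wherever an off-diagonal entry is non-zero
# (assumed numerical threshold of 1e-8).
adj <- (abs(Theta_k) > 1e-8) * 1
diag(adj) <- 0

# Partial correlations: -theta_ij / sqrt(theta_ii * theta_jj),
# with the diagonal set to 1 by convention.
par_cor <- -cov2cor(Theta_k)
diag(par_cor) <- 1
```

In this toy case variables 1 and 3 have a zero entry in the precision matrix, so they are conditionally independent given variable 2 and share no edge.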

Details

Viability is a phenotype that can be considered. This function detects the conditionally dependent short- and long-range linkage disequilibrium structure of genomes and thus reveals aberrant marker-marker associations that are due to epistatic selection. It can also be used to estimate conditional independence relationships in partially observed data that do not follow the Gaussianity assumption (e.g. continuous non-Gaussian, discrete, or mixed datasets).
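The kind of data described here, discrete and partially observed with dependence carried by latent Gaussian variables, can be mimicked in base R. The sketch below is a made-up simulation, not a netgwas routine: the correlation structure, the cut points and the 5% missingness rate are all assumptions.

```r
set.seed(1)
n <- 200  # samples
p <- 4    # markers

# Latent Gaussian variables with an assumed AR(1)-type correlation.
Sigma <- 0.6^abs(outer(seq_len(p), seq_len(p), "-"))
Z <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Discretize each latent variable into three genotype states 0, 1, 2,
# using assumed cut points at the 25% and 75% normal quantiles.
cuts <- qnorm(c(0.25, 0.75))
geno <- apply(Z, 2, function(z) findInterval(z, cuts))

# Set 5% of the entries to NA to mimic missing genotypes.
n_missing <- round(0.05 * length(geno))
geno[sample(length(geno), size = n_missing)] <- NA

table(geno, useNA = "ifany")
# With netgwas installed, a matrix like `geno` could be passed to netsnp().
```

Neighbouring columns of `geno` remain dependent after discretization, which is the type of structure a copula graphical model aims to recover.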

References

1. Behrouzi, P., and Wit, E. C. (2017a). Detecting Epistatic Selection with Partially Observed Genotype Data Using Copula Graphical Models. arXiv preprint, arXiv:1710.00894.

2. Behrouzi, P., and Wit, E. C. (2017c). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint, arXiv:1710.01236.

3. Witten, D., and Friedman, J. (2011). New insights and faster computations for the graphical lasso. Journal of Computational and Graphical Statistics, to appear.

4. Guo, J., et al. (2015). Graphical models for ordinal data. Journal of Computational and Graphical Statistics, 24(1), 183-204.

Examples

# NOT RUN {
library(netgwas)

# Load the example genotype data and reconstruct the network path
data(CviCol)
out <- netsnp(CviCol)
out
plot(out)

# Select the optimal graph
epi <- selectnet(out)
plot(epi, vis = "CI", xlab = "markers", ylab = "markers",
     n.var = c(24, 14, 17, 16, 19), vertex.size = 4)

# Visualize an interactive plot of the selected network,
# with a different color for each chromosome
cl <- c(rep("red", 24), rep("white", 14), rep("tan1", 17),
        rep("gray", 16), rep("lightblue2", 19))
plot(epi, vis = "interactive", vertex.color = cl)

# Partial correlations between markers on the genome
image(epi$par.cor, xlab = "markers", ylab = "markers", sub = "")
# }
