Genetic regression: Linear and Multilinear Genetic Regressions

Description

These functions estimate genetic effects from a population in which both genotypes and phenotypes are known.

Usage

linearRegression(phen, gen=NULL, genZ=NULL, 
	reference="noia", max.level=NULL, max.dom=NULL, fast=FALSE)
multilinearRegression(phen, gen=NULL, genZ=NULL, 
	reference="noia", max.level=NULL, max.dom=NULL, fast=FALSE, 
	e.unique=FALSE, start.algo = "linear", start.values=NULL, 
	robust=FALSE, bilinear.steps=1, ...)
prepareRegression(phen, gen=NULL, genZ=NULL, 
	reference="noia", max.level=NULL, max.dom=NULL, fast=FALSE)

Arguments

phen

The vector of individual phenotypes measured in the population.

gen

The matrix of individual genotypes in the population, one column per locus. See genNames for the genotype encoding. Not necessary if genZ is provided.

genZ

The matrix of individual genotypic probabilities in the population, 3 columns per locus, corresponding to the probabilities of the 3 genotypes (for each individual, the 3 probabilities at a locus must sum to 1). Not necessary if gen is provided.

reference

The reference point from which the regression is performed. By default, the "noia" reference point is used, since it provides fairly good orthogonality. Other possibilities include "G2A", "F2" and "F1"; see Details for the complete list.

max.level

Maximum level of interactions.

max.dom

Maximum level for dominance effects. Does not have any effect if >= max.level. In the multilinear regression, the maximum level for dominance effects cannot be > 1.

fast

This "fast" algorithm should be used when (i) the number of loci is high (> 8) and (ii) there are uncertainties in the dataset (missing values or Haley-Knott regression). This algorithm computes the regression matrix directly through the

e.unique

Whether the multilinear term is the same for all pairs of loci.

start.algo

Algorithm used to compute the starting values. Can be "linear", "multilinear", "subset" or "bilinear". Ignored if start.values are provided.

start.values

Vector of starting values.

robust

If TRUE, tries all starting-value algorithms in sequence.

bilinear.steps

Number of calls of the bilinearStep function. Ignored if start.algo is not "bilinear". If NULL, the bilinear algorithm is run until (almost) convergence.

...

Extra parameters to the non-linear regression function nls, including nls.control.

Value

linearRegression and multilinearRegression return an object of class "noia.linear" or "noia.multilinear", both having their own print methods: print.noia.linear and print.noia.multilinear.

Details

If a gen data set is provided, it is converted into a genZ matrix through the gen2genZ function. Missing data (unknown genotypes) are treated as loci for which the genotypic probabilities are identical to the genotypic frequencies in the population.

The algebraic framework is described extensively in Alvarez-Castro & Carlborg (2007). The default reference point ("noia") provides an orthogonal decomposition of genetic effects in the 1-locus case, whatever the genotypic frequencies. It remains a good approximation of orthogonality in the multi-locus case if linkage disequilibrium is small. Other optional reference points are those of the "G2A" model (Zeng et al. 2005) and of the unweighted regression model "UWR" (Cheverud & Routman 1995). Several key populations can be taken as reference as well: "F2", "F1", "Finf" (F infinity), and the two parental homozygous populations "P1" and "P2".

The multilinear model for genetic interactions is an alternative way to model epistatic interactions between at least two loci (see Hansen & Wagner 2001). Computing multilinear estimates requires a non-linear regression step that relies on the nls function. Good starting values for the non-linear regression are key to ensuring convergence, and several algorithms are available through the start.algo option. "linear" performs a linear regression and approximates the genetic effects from it, while "multilinear" performs a simpler multilinear regression (without dominance) to initialize the genetic effects. "subset" estimates all genetic effects from a random subset (50%) of the population, and "bilinear" alternately estimates marginal and epistatic effects. See startingValues for more information.

prepareRegression performs all preliminary calculations on the dataset but does not run any regression. It is called internally by both linearRegression and multilinearRegression.
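The treatment of missing genotypes described above can be illustrated with a minimal base R sketch. The helper toGenZ below is hypothetical and is not the package's gen2genZ implementation; it assumes genotypes are coded 1, 2 and 3 with NA for unknown genotypes (see genNames for the actual encoding), and it only shows the shape of a 3-columns-per-locus probability matrix.

```r
# Hypothetical helper (not part of the package): builds a genotype
# probability matrix with 3 columns per locus from genotype codes.
# Assumption: genotypes are coded 1, 2, 3 and unknown genotypes are NA.
toGenZ <- function(gen) {
  gen <- as.matrix(gen)
  blocks <- lapply(seq_len(ncol(gen)), function(j) {
    g <- gen[, j]
    # Genotype frequencies among individuals with a known genotype
    freq <- as.numeric(table(factor(g, levels = 1:3))) / sum(!is.na(g))
    # One 0/1 indicator column per genotype (rows with NA stay NA for now)
    z <- outer(g, 1:3, "==") * 1
    # Unknown genotypes receive the population genotype frequencies
    if (anyNA(g)) {
      z[is.na(g), ] <- rep(freq, each = sum(is.na(g)))
    }
    z
  })
  do.call(cbind, blocks)
}

# Two loci, five individuals, one unknown genotype at each locus
gen <- cbind(Loc1 = c(1, 2, 2, 3, NA), Loc2 = c(3, 3, NA, 1, 2))
genZ <- toGenZ(gen)
genZ
```

Each block of 3 columns sums to 1 for every individual, and the rows that had an unknown genotype carry the observed genotype frequencies at that locus.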

References

Alvarez-Castro JM, Carlborg O. (2007). A unified model for functional and statistical epistasis and its application in quantitative trait loci analysis. Genetics 176(2):1151-1167.

Alvarez-Castro JM, Le Rouzic A, Carlborg O. (2008). How to perform meaningful estimates of genetic effects. PLoS Genetics 4(5):e1000062.

Cheverud JM, Routman EJ. (1995). Epistasis and its contribution to genetic variance components. Genetics 139:1455-1461.

Hansen TF, Wagner G. (2001). Modeling genetic architecture: A multilinear theory of gene interactions. Theoretical Population Biology 59:61-86.

Le Rouzic A, Alvarez-Castro JM. (2008). Estimation of genetic effects and genotype-phenotype maps. Evolutionary Bioinformatics 4.

Zeng ZB, Wang T, Zou W. (2005). Modelling quantitative trait loci and interpretation of models. Genetics 169:1711-1725.

Examples

library(noia)
set.seed(123456789)

map <- c(0.25, -0.75, -0.75, -0.75, 2.25, 2.25, -0.75, 2.25, 2.25)
pop <- simulatePop(map, N=500, sigmaE=0.2, type="F2")

# Regressions

linear <- linearRegression(phen=pop$phen, gen=cbind(pop$Loc1, pop$Loc2))

multilinear <- multilinearRegression(phen=pop$phen, 
	gen=cbind(pop$Loc1, pop$Loc2))

# Linear effects, associated variances and stderr
linear

# Multilinear effects
multilinear
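
The reference and max.level arguments documented above can be explored along the same lines. The following is a sketch rather than a recommended analysis: it assumes the noia package is installed, re-simulates the same F2 population so that it can be run on its own, and uses only arguments listed in the Usage section. The object names (linearF2, linearG2A, linearMarginal) are illustrative.

```r
library(noia)
set.seed(123456789)

# Same simulated two-locus F2 population as in the example above
map <- c(0.25, -0.75, -0.75, -0.75, 2.25, 2.25, -0.75, 2.25, 2.25)
pop <- simulatePop(map, N=500, sigmaE=0.2, type="F2")
gen <- cbind(pop$Loc1, pop$Loc2)

# Linear regression from the "F2" reference point instead of "noia"
linearF2 <- linearRegression(phen=pop$phen, gen=gen, reference="F2")

# Linear regression from the "G2A" reference point
linearG2A <- linearRegression(phen=pop$phen, gen=gen, reference="G2A")

# Marginal effects only: interactions between loci are left out
linearMarginal <- linearRegression(phen=pop$phen, gen=gen, max.level=1)

# Print the estimates for comparison
linearF2
linearG2A
linearMarginal
```

Comparing the printed estimates shows how the choice of reference point changes the reported genetic effects for the same data.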