marker_h2_means: Compute a marker-based estimate of heritability, given genotypic means.

Description

Given a genetic relatedness matrix and genotypic means, this function computes REML-estimates of the genetic and residual variance and their standard errors, using the AI-algorithm (Gilmour et al. 1995). Based on this, heritability estimates and confidence intervals are given (the estimator \(h_m^2\) in Kruijer et al.).

Usage

marker_h2_means(data.vector, geno.vector, K, Dm=NULL, alpha = 0.05, eps = 1e-06,
       max.iter = 100, fix.h2 = FALSE, h2 = 0.5, grid.size=99)

Value

A list with the following components:

va: REML-estimate of the (additive) genetic variance.
ve: REML-estimate of the residual variance.
h2: Plug-in estimate of heritability: \(va / (va + ve)\).
conf.int1: 1-alpha confidence interval for heritability.
conf.int2: 1-alpha confidence interval for heritability, obtained by application of the delta method on a logarithmic scale.
inv.ai: The inverse of the average information (AI) matrix.
loglik: The log-likelihood.
loglik.vector: Empty numeric vector if the AI-algorthm converged within \(max.iter\) iterations. Otherwise it contains the log-likelihood on a grid.

Arguments

data.vector: A vector of phenotypic observations, typically genotypic means. Needs to be of type numeric. May contain missing values.
geno.vector: A vector of genotype labels, either a factor or character. This vector should correspond to data.vector, and hence needs to be of the same length.
K: A genetic relatedness or kinship matrix, typically marker-based. Must have row- and column-names corresponding to the levels of geno.vector
Dm: Covariance of the genotypic means contained in data.vector; see details. Should be of class matrix, with row- and column-names corresponding to the levels of geno.vector
alpha: Confidence level, for the 1-alpha confidence intervals.
eps: Numerical precision, used as convergence criterion in the AI-algorithm.
max.iter: Maximal number of iterations in the AI-algorithm.
fix.h2: Compute the log-likelihood and inverse AI-matrix for a fixed heritability value. Default is FALSE.
h2: When fix.h2 is TRUE, the value of the heritability. Must be of type numeric, between 0 and 1.
grid.size: If the AI-algorithm has not converged after max.iter iterations, the likelihood is computed on the grid of heritability values 1/(grid.size+1),...,grid.size/(grid.size+1); see details.

Author

Willem Kruijer.

Details

Given phenotypic observations \(Y_{i}\) for genotypes \(i=1,...,n\), the mixed model \(Y_{i} = \mu + G_i + E_{i}\) is assumed. Typically, the \(Y_{i}\) are genotypic means or BLUEs obtained from fitting a linear (mixed) model to the raw data, containing several plants or plots for each genotype. The vector of additive genetic effects \((G_1,...,G_n)'\) follows a multivariate normal distribution with mean zero and covariance \(\sigma_A^2 K\), where \(\sigma_A^2\) is the additive genetic variance, and \(K\) is a genetic relatedness matrix derived from a dense set of markers. The vector of errors \((E_1,...,E_n)'\) follows a multivariate normal distribution with mean zero and covariance \(\sigma_E^2 D_m\), where \(D_m\) is the covariance of the means obtained from the initial analysis. In case of a completely randomized design with \(r_i\) replicates for genotypes \(i=1,...,n\), \(D_m\) is diagonal with elements \(1 / r_i\). Under certain assumptions (see Speed et al. 2012) the marker- or chip-heritability \(h^2 = \sigma_A^2 / (\sigma_A^2 + \sigma_E^2)\) equals the narrow-sense heritability.
As in the marker_h2 function, it is assumed that the genetic relatedness matrix \(K\) is scaled such that \(trace(P K P) = n - 1\), where \(P\) is the projection matrix \(I_n - 1_n 1_n' / n\), for the identity matrix \(I_n\) and \(1_n\) being a column vector of ones. If this is not the case, \(K\) is automatically scaled prior to fitting the mixed model.
No covariates can be included, as it is assumed that these are available at plant- or plot level, and accounted for in the genotypic means.
The resulting heritability estatimes are less accurate than those obtained from individual plant or plot data, and the likelihood can be monotone in \(h^2 = \sigma_A^2 / (\sigma_A^2 + \sigma_E^2)\). If the AI-algorithm has not converged after max.iter iterations, the likelihood is computed on the grid of heritability values 1/(grid.size+1),...,grid.size/(grid.size+1)
As in the marker_h2 function, confidence intervals for heritability are constructed using the delta-method and the inverse AI-matrix. The delta-method can be applied either directly to the function \((\sigma_A^2,\sigma_E^2) -> \sigma_A^2 / (\sigma_A^2 + \sigma_E^2)\) or to the function \((\sigma_A^2,\sigma_E^2) -> log(\sigma_A^2 / \sigma_E^2)\). In the latter case, a confidence interval for \(log(\sigma_A^2 / \sigma_E^2)\) is obtained, which is back-transformed to a confidence interval for heritability. This approach (proposed in Kruijer et al.) has the advantage that intervals are always contained in the unit interval.

References

Gilmour et al. Gilmour, A.R., R. Thompson and B.R. Cullis (1995) Average Information REML: An Efficient Algorithm for Variance Parameter Estimation in Linear Mixed Models. Biometrics, volume 51, number 4, 1440-1450.
Kruijer, W. et al. (2015) Marker-based estimation of heritability in immortal populations. Genetics, Vol. 199(2), p. 1-20.
Speed, D., G. Hemani, M. R. Johnson, and D.J. Balding (2012) Improved heritability estimation from genome-wide snps. the American journal of human genetics 91: 1011-1021.

Examples

Run this code

data(means_LDV)
data(R_matrix_LDV)
data(K_atwell)
out <- marker_h2_means(data.vector=means_LDV$LDV,geno.vector=means_LDV$genotype,
                       K=K_atwell,Dm=R_matrix_LDV)
# Takes about a minute:
#data(means_LD)
#data(R_matrix_LD)
#out <- marker_h2_means(data.vector=means_LD$LD,geno.vector=means_LD$genotype,
#                       K=K_atwell,Dm=R_matrix_LD)
# The likelihood is monotone increasing:
#plot(x=(1:99)/100,y=out$loglik.vector,type="l",ylab="log-likelihood",lwd=2,
#     main='',xlab='h2',cex.lab=2,cex.axis=2.5)

Run the code above in your browser using DataLab