Learn R Programming

heritability (version 1.4)

marker_h2_means: Compute a marker-based estimate of heritability, given genotypic means.

Description

Given a genetic relatedness matrix and genotypic means, this function computes REML-estimates of the genetic and residual variance and their standard errors, using the AI-algorithm (Gilmour et al. 1995). Based on this, heritability estimates and confidence intervals are given (the estimator \(h_m^2\) in Kruijer et al.).

Usage

marker_h2_means(data.vector, geno.vector, K, Dm=NULL, alpha = 0.05, eps = 1e-06,
       max.iter = 100, fix.h2 = FALSE, h2 = 0.5, grid.size=99)

Value

A list with the following components:

  • va: REML-estimate of the (additive) genetic variance.

  • ve: REML-estimate of the residual variance.

  • h2: Plug-in estimate of heritability: \(va / (va + ve)\).

  • conf.int1: 1-alpha confidence interval for heritability.

  • conf.int2: 1-alpha confidence interval for heritability, obtained by application of the delta method on a logarithmic scale.

  • inv.ai: The inverse of the average information (AI) matrix.

  • loglik: The log-likelihood.

  • loglik.vector: Empty numeric vector if the AI-algorthm converged within \(max.iter\) iterations. Otherwise it contains the log-likelihood on a grid.

Arguments

data.vector

A vector of phenotypic observations, typically genotypic means. Needs to be of type numeric. May contain missing values.

geno.vector

A vector of genotype labels, either a factor or character. This vector should correspond to data.vector, and hence needs to be of the same length.

K

A genetic relatedness or kinship matrix, typically marker-based. Must have row- and column-names corresponding to the levels of geno.vector

Dm

Covariance of the genotypic means contained in data.vector; see details. Should be of class matrix, with row- and column-names corresponding to the levels of geno.vector

alpha

Confidence level, for the 1-alpha confidence intervals.

eps

Numerical precision, used as convergence criterion in the AI-algorithm.

max.iter

Maximal number of iterations in the AI-algorithm.

fix.h2

Compute the log-likelihood and inverse AI-matrix for a fixed heritability value. Default is FALSE.

h2

When fix.h2 is TRUE, the value of the heritability. Must be of type numeric, between 0 and 1.

grid.size

If the AI-algorithm has not converged after max.iter iterations, the likelihood is computed on the grid of heritability values 1/(grid.size+1),...,grid.size/(grid.size+1); see details.

Author

Willem Kruijer.

Details

  • Given phenotypic observations \(Y_{i}\) for genotypes \(i=1,...,n\), the mixed model \(Y_{i} = \mu + G_i + E_{i}\) is assumed. Typically, the \(Y_{i}\) are genotypic means or BLUEs obtained from fitting a linear (mixed) model to the raw data, containing several plants or plots for each genotype. The vector of additive genetic effects \((G_1,...,G_n)'\) follows a multivariate normal distribution with mean zero and covariance \(\sigma_A^2 K\), where \(\sigma_A^2\) is the additive genetic variance, and \(K\) is a genetic relatedness matrix derived from a dense set of markers. The vector of errors \((E_1,...,E_n)'\) follows a multivariate normal distribution with mean zero and covariance \(\sigma_E^2 D_m\), where \(D_m\) is the covariance of the means obtained from the initial analysis. In case of a completely randomized design with \(r_i\) replicates for genotypes \(i=1,...,n\), \(D_m\) is diagonal with elements \(1 / r_i\). Under certain assumptions (see Speed et al. 2012) the marker- or chip-heritability \(h^2 = \sigma_A^2 / (\sigma_A^2 + \sigma_E^2)\) equals the narrow-sense heritability.

  • As in the marker_h2 function, it is assumed that the genetic relatedness matrix \(K\) is scaled such that \(trace(P K P) = n - 1\), where \(P\) is the projection matrix \(I_n - 1_n 1_n' / n\), for the identity matrix \(I_n\) and \(1_n\) being a column vector of ones. If this is not the case, \(K\) is automatically scaled prior to fitting the mixed model.

  • No covariates can be included, as it is assumed that these are available at plant- or plot level, and accounted for in the genotypic means.

  • The resulting heritability estatimes are less accurate than those obtained from individual plant or plot data, and the likelihood can be monotone in \(h^2 = \sigma_A^2 / (\sigma_A^2 + \sigma_E^2)\). If the AI-algorithm has not converged after max.iter iterations, the likelihood is computed on the grid of heritability values 1/(grid.size+1),...,grid.size/(grid.size+1)

  • As in the marker_h2 function, confidence intervals for heritability are constructed using the delta-method and the inverse AI-matrix. The delta-method can be applied either directly to the function \((\sigma_A^2,\sigma_E^2) -> \sigma_A^2 / (\sigma_A^2 + \sigma_E^2)\) or to the function \((\sigma_A^2,\sigma_E^2) -> log(\sigma_A^2 / \sigma_E^2)\). In the latter case, a confidence interval for \(log(\sigma_A^2 / \sigma_E^2)\) is obtained, which is back-transformed to a confidence interval for heritability. This approach (proposed in Kruijer et al.) has the advantage that intervals are always contained in the unit interval.

References

  • Gilmour et al. Gilmour, A.R., R. Thompson and B.R. Cullis (1995) Average Information REML: An Efficient Algorithm for Variance Parameter Estimation in Linear Mixed Models. Biometrics, volume 51, number 4, 1440-1450.

  • Kruijer, W. et al. (2015) Marker-based estimation of heritability in immortal populations. Genetics, Vol. 199(2), p. 1-20.

  • Speed, D., G. Hemani, M. R. Johnson, and D.J. Balding (2012) Improved heritability estimation from genome-wide snps. the American journal of human genetics 91: 1011-1021.

See Also

For marker-based estimation of heritability using individual plant or plot data, see marker_h2.

Examples

Run this code
data(means_LDV)
data(R_matrix_LDV)
data(K_atwell)
out <- marker_h2_means(data.vector=means_LDV$LDV,geno.vector=means_LDV$genotype,
                       K=K_atwell,Dm=R_matrix_LDV)
# Takes about a minute:
#data(means_LD)
#data(R_matrix_LD)
#out <- marker_h2_means(data.vector=means_LD$LD,geno.vector=means_LD$genotype,
#                       K=K_atwell,Dm=R_matrix_LD)
# The likelihood is monotone increasing:
#plot(x=(1:99)/100,y=out$loglik.vector,type="l",ylab="log-likelihood",lwd=2,
#     main='',xlab='h2',cex.lab=2,cex.axis=2.5)

Run the code above in your browser using DataLab