hugeRR_update: Updating a hugeRR fit to be a heteroscedastic effects model (HEM) fit

Description

This function updates a bigRR object obtained from hugeRR into a new fit under the heteroscedasticity assumption, i.e. a heteroscedastic effects model (HEM) with marker-specific shrinkage.
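To make the contrast between the two fits concrete, here is a plain base-R sketch of the closed-form Gaussian ridge solution with one shared shrinkage parameter (as in SNP-BLUP) and a generalized ridge solution with one shrinkage parameter per column. The data and the lambda values are made up and hand-picked. HEM estimates the marker-specific shrinkage from the data (Shen et al. 2013). This sketch reproduces neither that estimation nor the GLM fitting done by hugeRR_update, and the helpers ridge() and gen_ridge() are hypothetical.

```r
# Toy data (hypothetical): Gaussian response, no fixed effects
set.seed(3)
n <- 50; m <- 5
Z <- matrix(rnorm(n * m), n)
y <- as.numeric(Z %*% c(2, 0, 0, 1, 0) + rnorm(n))

# Ridge regression: one shrinkage parameter shared by every column
ridge <- function(Z, y, lambda) {
  solve(crossprod(Z) + diag(lambda, ncol(Z)), crossprod(Z, y))
}

# Generalized ridge regression: a separate shrinkage parameter per column
gen_ridge <- function(Z, y, lambdas) {
  solve(crossprod(Z) + diag(lambdas), crossprod(Z, y))
}

u.ridge <- ridge(Z, y, lambda = 10)
u.gen   <- gen_ridge(Z, y, lambdas = c(1, 100, 100, 1, 100))
```

With equal lambdas the two solutions coincide. Letting lambdas differ is what allows large effects to be shrunk less than small ones.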

Usage

hugeRR_update(obj, Z.name, Z.index, family = gaussian(link = identity), tol.err = 1e-6, tol.conv = 1e-8, save.cache = FALSE)

Arguments

obj

A bigRR object, as returned by hugeRR.

Z.name

File name prefix, combined with Z.index to identify the files that hold the design matrix associated with the shrinkage parameters (i.e. the random effects in the mixed-model framework). The files must be in DatABEL format (see Details).

Z.index

File index or indices to be combined with Z.name. See Details.

family

The distribution family of y; see help('family') for details.

tol.err

Internal tolerance level for extremely small values; the default is 1e-6.

tol.conv

Tolerance level for convergence; the default is 1e-8.

save.cache

Logical; specifies whether internal cache files should be saved to speed up repeated analyses. If TRUE, some R data files are saved in the current working directory, so that later analyses involving the same cached data run substantially faster.

Details

The function does a similar job to bigRR_update, but handles data (the Z matrix) too large to be loaded into memory as a whole. Instead of specifying the entire design matrix for random effects (Z in bigRR), Z can be split column-wise as Z = cbind(Z1, Z2, ..., Zk), with each piece stored in DatABEL format under file names specified by the arguments Z.name and Z.index. For example (see also Examples), if the genotype data for each chromosome are stored in DatABEL format with file names chr1.fvd & chr1.fvi, ..., chr22.fvd & chr22.fvi, the arguments should be specified as Z.name = 'chr' and Z.index = 1:22.
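The naming convention can be sketched in base R. The helper databel_files() below is hypothetical and not part of bigRR. It only spells out which .fvd/.fvi file pairs a given Z.name and Z.index refer to, following the chr1 ... chr22 example. The second half uses a small in-memory toy matrix to show why a column-wise split loses nothing for the linear predictor: with Z = cbind(Z1, Z2) and u = c(u1, u2), Z %*% u equals Z1 %*% u1 + Z2 %*% u2. It does not show how hugeRR itself reads or processes the pieces.

```r
# Hypothetical helper (not part of bigRR): list the DatABEL file pairs
# implied by a Z.name / Z.index combination
databel_files <- function(Z.name, Z.index) {
  stems <- paste0(Z.name, Z.index)
  data.frame(piece = stems,
             fvd = paste0(stems, ".fvd"),
             fvi = paste0(stems, ".fvi"),
             stringsAsFactors = FALSE)
}

files <- databel_files("chr", 1:22)
head(files, 2)

# A column-wise split of Z keeps Z %*% u intact when u is split the same way
set.seed(1)
Z  <- matrix(rnorm(5 * 6), nrow = 5)  # toy matrix with 6 "markers"
u  <- rnorm(6)
Z1 <- Z[, 1:4]                        # first column block
Z2 <- Z[, 5:6]                        # second column block
full    <- Z %*% u
blocked <- Z1 %*% u[1:4] + Z2 %*% u[5:6]
all.equal(full, blocked)
```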

References

Shen X, Alam M, Fikse F and Ronnegard L (2013). A novel generalized ridge regression method for quantitative genetics. Genetics, 193, 1255-1268.

Examples


# --------------------------------------------- #  
#              Arabidopsis example              #
# --------------------------------------------- #  

require(bigRR)
data(Arabidopsis)
X <- matrix(1, length(y), 1)

## Not run: 
# # splitting the genotype data into two pieces and re-saving in DatABEL format
# #
# dimnames(Z) <- list(NULL, NULL)
# Z <- scale(Z)
# matrix2databel(Z[,1:100000], 'part1')
# matrix2databel(Z[,100001:ncol(Z)], 'part2')
# 
# # fitting SNP-BLUP, i.e. a ridge regression on all the markers across the genome
# #
# SNP.BLUP.result <- hugeRR(y = y, X = X, Z.name = 'part', Z.index = 1:2, 
#                           family = binomial(link = 'logit'), save.cache = TRUE)
#                           
# # re-run SNP-BLUP: much faster now that the cache data are stored
# SNP.BLUP.result <- hugeRR(y = y, X = X, Z.name = 'part', Z.index = 1:2, 
#                           family = binomial(link = 'logit'))
# 
# # fitting HEM, i.e. a generalized ridge regression with marker-specific shrinkage
# #
# HEM.result <- hugeRR_update(SNP.BLUP.result, Z.name = 'part', Z.index = 1:2, 
#                             family = binomial(link = 'logit'))
# 
# # plot and compare the estimated effects from both methods
# #
# split.screen(c(1, 2))
# split.screen(c(2, 1), screen = 1)
# screen(3); plot(abs(SNP.BLUP.result$u), cex = .6, col = 'slateblue')
# screen(4); plot(abs(HEM.result$u), cex = .6, col = 'olivedrab')
# screen(2); plot(abs(SNP.BLUP.result$u), abs(HEM.result$u), cex = .6, pch = 19, 
#                 col = 'darkmagenta')
# 
# # create random new genotypes for 10 individuals with the same number of markers
# # and predict their outcomes on the linear-predictor (logit) scale using the fitted HEM
# #
# Z.new <- matrix(sample(c(-1, 1), 10*ncol(Z), TRUE), 10)
# y.predict <- as.numeric(HEM.result$beta + Z.new %*% HEM.result$u)
# #
# # NOTE: The above prediction may be poor because Z was scaled before fitting
# #       while Z.new is not. To avoid this, either remove the scaling above
# #       or scale Z.new together with the original (unscaled) Z by row-binding them.
# ## End(Not run)
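The NOTE above suggests scaling Z.new by row-binding it with the original Z. Below is a minimal base-R sketch of that approach, using small made-up stand-ins: Z.raw for the unscaled genotype matrix, and beta and u for HEM.result$beta and HEM.result$u. All of these are hypothetical. The sketch also back-transforms the linear predictor with plogis(), assuming the binomial(link = 'logit') family used in the example.

```r
# Hypothetical stand-ins for the unscaled genotypes and the fitted effects
set.seed(2)
n <- 20; m <- 8
Z.raw <- matrix(sample(c(-1, 1), n * m, TRUE), n)  # training genotypes
beta  <- 0.5                                       # plays HEM.result$beta
u     <- rnorm(m, sd = 0.1)                        # plays HEM.result$u
Z.new <- matrix(sample(c(-1, 1), 10 * m, TRUE), 10)

# Scale Z.new together with the original Z by row-binding,
# then keep only the rows that belong to Z.new
Z.all        <- scale(rbind(Z.raw, Z.new))
Z.new.scaled <- Z.all[(n + 1):(n + nrow(Z.new)), , drop = FALSE]

# Linear predictor (logit scale), then inverse logit to get probabilities
eta <- as.numeric(beta + Z.new.scaled %*% u)
p   <- plogis(eta)
```

Scaling the combined matrix shifts the column means and standard deviations slightly relative to the training-only scaling used in the fit. Another option, not part of the original note, is to reuse the training values: scale(Z.new, center = attr(Zs, 'scaled:center'), scale = attr(Zs, 'scaled:scale')), where Zs is the result of scale() on the training Z.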
