haplo.ccs: Estimate Haplotype Relative Risks in Case-Control Data

Description

'haplo.ccs' estimates haplotype and covariate relative risks in case-control data by weighted logistic regression. Diplotype probabilities, which are estimated by EM computation with progressive insertion of loci, are used as weights. The model is specified by a symbolic description of the linear predictor, which includes specification of an allele matrix, inheritance mode, and preferences for rare haplotypes using 'haplo'. Note that use of this function requires installation of the 'haplo.stats' and 'survival' packages. See 'haplo.em' for a description of EM computation of diplotype probabilities.
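The weighting idea can be sketched in base R without 'haplo.ccs' itself. This is a minimal illustration, not the package's implementation: the data frame 'expanded', the column 'hapA', and all values are hypothetical. Each subject contributes one row per plausible diplotype, and the diplotype probabilities enter as prior weights in a quasibinomial 'glm' fit.

```r
# Hypothetical expanded data: one row per subject per plausible diplotype.
# All values are made up for illustration.
expanded <- data.frame(
  id     = c(1, 1, 2, 3, 3, 4, 5, 6),         # subject identifier
  case   = c(1, 1, 0, 1, 1, 0, 0, 1),         # case-control status, repeated per row
  hapA   = c(2, 1, 0, 1, 0, 1, 0, 2),         # copies of a hypothetical haplotype "A"
  weight = c(0.7, 0.3, 1, 0.6, 0.4, 1, 1, 1)  # diplotype probabilities
)

# Within each subject the diplotype probabilities sum to one.
weight.sums <- tapply(expanded$weight, expanded$id, sum)

# Weighted logistic regression; the quasibinomial family accepts
# non-integer weights without warnings.
fit <- glm(case ~ hapA, family = quasibinomial, data = expanded,
           weights = weight)

# Exponentiated coefficients on the odds scale.
exp(coef(fit))
```

The standard errors printed by 'summary(fit)' for this toy fit ignore the repeated rows per subject; the help page states that 'haplo.ccs' reports sandwich variance estimates instead.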

Usage

haplo.ccs(formula, data=NULL, ...)

haplo.ccs.fit(y, x, int, geno, inherit.mode, group.rare, rare.freq, 
              referent, names.x, names.int, ...)

Arguments

formula

a symbolic description of the model to be fit, which requires specification of an allele matrix and inheritance mode using 'haplo'. Note that 'additive' is the default inheritance mode for 'haplo'. Preferences for grouping rare haplotypes a

data

an optional data frame, list, or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)', typically the environm

referent

a character string representing the haplotype to be used as the referent. The haplotype with the highest estimated population frequency is the default referent.

...

optional model-fitting arguments to be passed to 'glm'.

y

a vector of observations.

x

the design matrix for environmental covariates.

int

the design matrix for haplotype-environment interaction.

geno

the allele matrix.

inherit.mode

the inheritance mode specified by 'haplo'.

group.rare

a logical value indicating whether rare haplotypes should be grouped, specified by 'haplo'.

rare.freq

the population haplotype frequency used to define the rare haplotypes, specified by 'haplo'.

names.x

the column names of the design matrix for covariates.

names.int

the column names of the design matrix for haplotype-environment interaction.
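To make the roles of 'referent', 'group.rare', and 'rare.freq' concrete, here is a small base R sketch. The frequencies, the haplotype labels, and the threshold are hypothetical, and treating "rare" as strictly below the threshold is an assumption of this sketch rather than a documented rule.

```r
# Hypothetical estimated haplotype frequencies (made-up values).
hap.freq <- c("111" = 0.52, "121" = 0.30, "211" = 0.15,
              "221" = 0.02, "222" = 0.01)
rare.freq <- 0.05  # hypothetical threshold defining rare haplotypes

# Default referent: the haplotype with the highest estimated frequency.
referent <- names(hap.freq)[which.max(hap.freq)]

# Group rare haplotypes and sum their individual frequencies.
# Assumption: "rare" means frequency strictly below rare.freq.
is.rare <- hap.freq < rare.freq
grouped <- c(hap.freq[!is.rare], rare = sum(hap.freq[is.rare]))

referent
grouped
```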

Value

'haplo.ccs' returns an object of class inheriting from '"haplo.ccs"'. More details appear later in this section. The function 'summary' (i.e., 'summary.haplo.ccs') obtains or prints a summary of the results, which include haplotype and covariate relative risks, robust standard error estimates, and estimated haplotype frequencies. The generic accessor functions 'coefficients', 'fitted.values', and 'residuals' extract corresponding features of the object returned by 'haplo.ccs'. The function 'vcov' (i.e., 'vcov.haplo.ccs') returns sandwich variance-covariance estimates. The function 'haplo.freq' extracts information returned by the EM computation of haplotype frequencies. Note that if rare haplotypes are grouped, then their individual estimated frequencies are summed.

An object of class '"haplo.ccs"' is a list containing at least the following components:
formula: the formula supplied.
call: the matched call.
coefficients: a named vector of coefficients.
covariance: a named matrix of sandwich variance-covariance estimates, computed using 'sandcov'.
residuals: the working residuals, i.e., the residuals from the final iteration of the IWLS fit.
fitted.values: the fitted mean values, obtained by transforming the linear predictors by the expit function.
linear.predictors: the linear fit on the logit scale.
df: the model degrees of freedom.
rank: the numeric rank of the fitted model.
family: the family object, in this case, quasibinomial.
iter: the number of iterations of IWLS used.
weights: the working weights, i.e., the weights from the final iteration of the IWLS fit.
prior.weights: the weights initially supplied, in this case, the diplotype probabilities estimated by the EM computation.
y: a vector indicating case-control status, expanded for each subject by the number of plausible diplotypes for that subject.
id: the numeric vector used to identify subjects, expanded for each subject by the number of plausible diplotypes for that subject.
converged: a logical indicating whether the IWLS fit converged.
boundary: a logical indicating whether the fitted values are on the boundary of the attainable values.
model: the model matrix used.
terms: the terms object used.
offset: the offset vector used.
contrasts: the contrasts used.
xlevels: a record of the levels of the factors used in fitting.
inheritance.mode: the method of inheritance.
rare.freq: the value used to define the rare haplotypes.
em.lnlike: the value of the log likelihood at the last EM iteration.
em.lr: the likelihood ratio statistic used to test the assumed model against the model that assumes complete linkage equilibrium among all loci.
em.df.lr: the degrees of freedom for the likelihood ratio statistic.
em.nreps: the number of haplotype pairs that map to the marker genotypes for each subject.
hap1: character strings representing the possible first haplotype for each subject.
hap2: character strings representing the possible second haplotype for each subject.
hap.names: character strings representing the unique haplotypes.
hap.probs: the estimated frequency of each unique haplotype. Note that if rare haplotypes are grouped, then their individual estimated frequencies are summed.
em.converged: a logical indicating whether the EM computation converged.
em.max.pairs: the maximum number of pairs of haplotypes per subject that are consistent with their marker data.
em.control: a list of control parameters for the EM computation.
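Several of these components share names with the components of an ordinary 'glm' fit. The sketch below uses a plain 'glm' object on made-up data, not a '"haplo.ccs"' object, to illustrate two of the relationships described above: the fitted values are the expit of the linear predictors, and the stored residuals are the working residuals. The helper 'expit' and the data frame 'toy' are hypothetical names introduced for this illustration.

```r
# Hypothetical helper: inverse of the logit link.
expit <- function(eta) 1 / (1 + exp(-eta))

# Made-up, deterministic toy data.
toy <- data.frame(y = rep(c(0, 1, 1, 0, 1), 4), x = (1:20) / 10)
fit <- glm(y ~ x, family = quasibinomial, data = toy)

# Fitted means are the linear predictors transformed by expit.
same.fitted <- all.equal(unname(fit$fitted.values),
                         expit(unname(fit$linear.predictors)))

# The stored residuals are the working residuals from the final IWLS step.
same.resid <- all.equal(fit$residuals, residuals(fit, type = "working"))

# Component names that a glm fit has in common with the list above.
shared <- c("coefficients", "fitted.values", "linear.predictors",
            "weights", "prior.weights", "converged", "boundary")
shared %in% names(fit)
```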

Details

A formula has the form 'y ~ terms' where 'y' is a numeric vector indicating case-control status and 'terms' is a series of terms which specifies a linear predictor for 'y'. A terms specification of the form 'first + second' indicates all the terms in 'first' together with all the terms in 'second' with duplicates removed. The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on. The specification 'first*second' indicates the cross of 'first' and 'second'.

Note that 'haplo.ccs.fit' is the workhorse function. The inputs 'y', 'x', 'geno', and 'int' represent, respectively, case-control status, the matrix of covariates, the matrix of alleles, and the matrix of terms that interact with the haplotypes to be estimated from the alleles. The argument 'inherit.mode' corresponds to the inheritance mode specified by 'haplo', and the arguments 'group.rare' and 'rare.freq' correspond to the preferences for grouping rare haplotypes specified by 'haplo'. 'names.x' and 'names.int' correspond to the column names of 'x' and 'int', respectively.

The background functions 'one', 'count.haps', and 'return.haps' are used in specifying the model terms and neatly packaging the results.
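The formula rules above are those of base R's 'terms', which can be inspected without fitting anything, because 'terms' does not evaluate the variables or the call to 'haplo'. A minimal sketch, using the variable names from the examples on this page (the object names 'crossed', 'spelled.out', and 'with.duplicate' are introduced here for illustration):

```r
# 'first*second' expands to main effects plus their interaction.
crossed <- attr(terms(case ~ age + factor(race) + gender * haplo(geno)),
                "term.labels")

# Writing the interaction first gives the same labels: main effects
# are re-ordered ahead of second-order terms.
spelled.out <- attr(terms(case ~ gender:haplo(geno) + age + factor(race) +
                            gender + haplo(geno)),
                    "term.labels")

# Duplicate terms are removed.
with.duplicate <- attr(terms(case ~ age + gender + age), "term.labels")

crossed
spelled.out
with.duplicate
```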

References

French B, Lumley T, Monks SA, Rice KM, Hindorff LA, Reiner AP, Psaty BM. Simple estimates of haplotype relative risks in case-control data. Genetic Epidemiology 2006; 30(6):485-494.

The help files for 'glm', 'haplo.em', and 'haplo.glm' were instrumental in creating this help file.

Examples

## Load the package and the example data.

library(haplo.ccs)

data(renin)

## Fit a model for haplotype effects.

haplo.ccs(case ~ haplo(geno))

## Fit a model for haplotype and covariate effects.

haplo.ccs(case ~ gender + age + factor(race) + haplo(geno))

## Fit a model for haplotype interaction with gender.

haplo.ccs(case ~ age + factor(race) + gender*haplo(geno))
