
hapassoc (version 0.2)

EM: EM algorithm to fit maximum likelihood estimates of trait associations with SNP haplotypes

Description

This function takes a dataset of haplotypes in which rows for individuals of uncertain phase have been augmented by "pseudo-individuals" who carry the possible multilocus genotypes consistent with the single-locus phenotypes. The EM algorithm is used to find maximum likelihood estimates (MLEs) for trait associations with covariates in generalized linear models.
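
The augmentation described above can be illustrated with a small base-R sketch. It is not hapassoc code: the haplotype names, the frequencies, and the Hardy-Weinberg assumption used to weight each pseudo-individual are all chosen for illustration, and the trait model is ignored.

```r
# Toy illustration only (not hapassoc internals).
# Two SNPs; one individual is heterozygous at both loci, so the phase is
# ambiguous: the haplotype pair could be h00/h11 or h01/h10.
hap_freq <- c(h00 = 0.4, h01 = 0.1, h10 = 0.2, h11 = 0.3)  # assumed frequencies

# One row ("pseudo-individual") per haplotype pair compatible with the
# observed single-locus genotypes of individual 1.
pseudo <- data.frame(
  id   = c(1, 1),
  hap1 = c("h00", "h01"),
  hap2 = c("h11", "h10"),
  stringsAsFactors = FALSE
)

# Assuming Hardy-Weinberg equilibrium, an unordered pair of two different
# haplotypes has probability 2 * p1 * p2.
pair_prob <- 2 * hap_freq[pseudo$hap1] * hap_freq[pseudo$hap2]

# Normalise within each individual so the weights sum to 1 per person.
pseudo$wt <- as.numeric(pair_prob / ave(pair_prob, pseudo$id, FUN = sum))

print(pseudo)
```

In an EM fit these weights would be recomputed at every iteration from the current parameter estimates; the sketch shows a single such computation.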

Usage

EM(form, haplos.list, baseline = "missing", family = binomial(),
   gamma = FALSE, maxit = 50, tol = 0.001, ...)

Arguments

form
model equation in usual R format
haplos.list
list of haplotype data from PreEM
baseline
optional, haplotype to be used for baseline coding. Default is the most frequent haplotype.
family
binomial, poisson, gaussian or gamma are supported, default=binomial
gamma
initial estimates of haplotype frequencies, default values are calculated in PreEM using standard haplotype-counting (i.e. EM algorithm without adjustment for non-haplotype covariates)
maxit
maximum iterations of the EM loop, default=50
tol
convergence tolerance in terms of the maximum difference in parameter estimates between iterations; default=0.001
...
additional arguments to be passed to the glm function such as starting values for parameter estimates in the risk model
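
How maxit and tol interact can be sketched in base R. The helper below is hypothetical and not part of hapassoc; it assumes the stopping rule is "largest absolute change in any parameter estimate is below tol", and it uses a simple fixed-point update in place of a real EM step.

```r
# Hypothetical helper (not from hapassoc): iterate an update function until
# the largest absolute change in the estimates is below tol, or until maxit
# iterations have been used.
run_until_converged <- function(update, start, maxit = 50, tol = 0.001) {
  est <- start
  for (it in seq_len(maxit)) {
    new_est <- update(est)
    if (max(abs(new_est - est)) < tol) {
      return(list(it = it, estimate = new_est, converged = TRUE))
    }
    est <- new_est
  }
  # Mirror the documented behaviour: on failure only the indicator is returned.
  list(converged = FALSE)
}

# Stand-in for an EM update: Newton's iteration for sqrt(2).
newton_sqrt2 <- function(x) (x + 2 / x) / 2

res_ok   <- run_until_converged(newton_sqrt2, start = 1)
res_fail <- run_until_converged(newton_sqrt2, start = 1, maxit = 2)
```

A smaller tol or a slowly converging model may need a larger maxit; the second call shows what a run looks like when the iteration budget is too small.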

Value

  • it: number of iterations of the EM algorithm
  • beta: estimated regression coefficients
  • gamma: estimated haplotype frequencies
  • fits: fitted values of the trait
  • wts: final weights calculated in the last iteration of the EM loop. These are estimates of the conditional probabilities of each multilocus genotype given the observed single-locus genotypes.
  • var: joint variance-covariance matrix of the estimated regression coefficients and the estimated haplotype frequencies
  • dispersionML: maximum likelihood estimate of the dispersion parameter (to get the moment estimate, use summary.EM)
  • family: family of the generalized linear model (e.g. binomial, gaussian, etc.)
  • response: trait value
  • converged: TRUE/FALSE indicator of convergence. If the algorithm fails to converge, only the converged indicator is returned.
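
Because a failed fit returns only the converged indicator, code that uses the result should test it before touching the other components. A minimal sketch follows; summarise_em is a hypothetical helper, and the two lists are mock objects shaped like the components listed above, not real EM output.

```r
# Hypothetical helper: describe a result list shaped like the Value section.
summarise_em <- function(res) {
  # On non-convergence the other components are absent, so check this first.
  if (!isTRUE(res$converged)) {
    return("EM did not converge; no estimates available")
  }
  sprintf("converged in %d iterations; %d coefficients, %d haplotype frequencies",
          as.integer(res$it), length(res$beta), length(res$gamma))
}

# Mock results standing in for the output of EM() (values are made up).
ok  <- list(it = 7, beta = c(a = 0.1, b = -0.3), gamma = c(0.6, 0.4),
            converged = TRUE)
bad <- list(converged = FALSE)

summarise_em(ok)
summarise_em(bad)
```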

See Also

PreEM, summary.EM, glm, family.

Examples

data(hypoDat)
example.preEM<-PreEM(hypoDat, 3)

names(example.preEM$haploDM)
# "h000"   "h001"   "h010"   "h011"   "h100"   "pooled"

# Logistic regression, baseline group: '001/001'

example.regr <- EM(affected ~ attr + h000 + h010 + h011 + h100 + pooled,
                   example.preEM, family = binomial())
