haplin: Fitting log-linear models to case-parent triad and/or case-control data

Description

Fits log-linear models to case-parent triad and/or case-control data and returns the result as an object of class haplin.

Usage

haplin(filename, 
markers = "ALL", n.vars = 0, sep = " ", allele.sep = ";", 
na.strings = "NA", design = "triad", use.missing = FALSE, 
xchrom = FALSE, maternal = FALSE, test.maternal = FALSE, 
scoretest = "no", ccvar = NULL, covar = NULL, sex = NULL, 
reference = "reciprocal", response = "free", 
threshold = 0.01, max.haplos = NULL, haplo.file = NULL, 
resampling = FALSE, max.EM.iter = 50, data.out = "no", 
verbose = TRUE, printout = TRUE)

Arguments

Of the following arguments, only filename is required. Use of the remaining arguments will depend on the type of analysis.

filename

A character string giving the name and path of the ASCII data file to be read.

markers

Default is "ALL", which means HAPLIN uses all available markers in the data set in the analysis. For the current version of HAPLIN the number of markers used at a single run should probably not exceed 4 or 5 due to the computational burden. The markers ar

n.vars

Numeric. The number of variables (columns) in the data file before (to the left of) the genetic data.

sep

The character used in the data file to separate "columns", where each column contains the two alleles of a single individual at a single marker.

allele.sep

The character used in the data file to separate the two alleles of a single individual at a single marker. The recommended (default) separator is ";", but for SNPs the empty string "" is also common.

na.strings

The character string indicating missing data in the data file. Default is to use "NA" in place of, for instance, C;T for a SNP that hasn't been typed in that individual.
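
The roles of n.vars, sep, allele.sep and na.strings can be illustrated with a minimal base R sketch. It does not use Haplin's own file reader; the record in `line` and the variable names `fields`, `covars`, `genos` and `alleles` are made up for illustration only.

```r
# Hypothetical single record: two covariate columns (n.vars = 2),
# followed by three genotype columns, one of them missing.
line <- "1 0 C;T NA T;T"

n.vars     <- 2     # columns to the left of the genetic data
sep        <- " "   # separates columns
allele.sep <- ";"   # separates the two alleles within a column
na.strings <- "NA"  # marks a genotype that has not been typed

# Split the record into columns.
fields <- strsplit(line, sep, fixed = TRUE)[[1]]

# The first n.vars columns are covariates; the rest are genotypes.
# (A logical index is used so that n.vars = 0 also works.)
covars <- fields[seq_along(fields) <= n.vars]
genos  <- fields[seq_along(fields) > n.vars]

# Recode the missing-data string as NA, then split each genotype
# into its two alleles. A missing genotype stays NA.
genos[genos == na.strings] <- NA
alleles <- strsplit(genos, allele.sep, fixed = TRUE)

print(covars)
print(alleles)
```

With allele.sep = "" the same `strsplit` call would split a genotype such as "CT" into its individual characters.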

design

The value "triad" is used for the standard case triad design, without independent controls. The value "cc.triad" means a combination of case triads and control triads. This requires the argument ccvar to point to the data column containing t

use.missing

A logical value used to determine whether triads with missing data should be included in the analysis. When set to TRUE, Haplin uses the EM algorithm to obtain risk estimates, also taking into account triads with missing data. The standard errors and p-va

xchrom

Logical, defaults to FALSE. If set to TRUE, haplin assumes the markers are on the X chromosome. This option should be combined with specifying the sex argument, and setting (for the time being) response = "mult", reference = "ref.ca

maternal

If TRUE, maternal effects are estimated as well as the standard fetal effects.

test.maternal

Not yet implemented.

scoretest

Special interest only. If "no", no score test is computed. If "yes", an overall score p-value is included in the output, and the individual score values are returned in the haplin object. If "only", haplin is only run under the null hypothesis, and a simp

ccvar

Numeric. The column number of the column containing the case-control indicator in the data file. Needed for the "cc" and "cc.triad" designs. The column should contain two numeric values, of which the larger one is always used to denote cases

covar

Not yet implemented.

sex

A numeric value specifying which data column contains the sex variable. The variable should be coded 1 for males and 2 for females. To be used with xchrom = TRUE.

reference

Decides how HAPLIN chooses its reference category for the effect estimates. Default value is "reciprocal". With the reciprocal reference the effect of a single or double dose of each haplotype is measured relative to the remaining haplotypes. This means t

response

The default value "free" means that both single- and double-dose effects are estimated. Choosing "mult" instead specifies a multiplicative dose-response model.
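
The difference between the two settings can be sketched numerically. This is an illustration only, assuming the usual reading of a multiplicative dose-response model, in which the double-dose relative risk is the square of the single-dose relative risk. All numbers and variable names are hypothetical.

```r
# Hypothetical single-dose relative risk, chosen only for illustration.
rr.single <- 1.5

# response = "mult": the double-dose effect is implied by the single dose
# (assumed here to be the single-dose relative risk squared).
rr.double.mult <- rr.single^2

# response = "free": the double-dose effect is a separate parameter,
# so it can take any value. The number below is made up.
rr.double.free <- 3.1

print(c(single      = rr.single,
        double.mult = rr.double.mult,
        double.free = rr.double.free))
```

The "mult" model has one effect parameter fewer per haplotype than the "free" model.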

threshold

Sets the (approximate) lower limit for the frequencies of the haplotypes to be retained in the analysis. Haplotypes that are less frequent are removed, and information about this is given in the output.
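
The effect of the threshold can be sketched in base R as follows. The haplotype labels and frequencies are hypothetical, and because the limit is described as approximate, Haplin's own selection rule may differ in detail from this simple comparison.

```r
# Hypothetical estimated haplotype frequencies (they sum to 1).
haplo.freq <- c("c-A" = 0.62, "t-A" = 0.30, "c-G" = 0.075, "t-G" = 0.005)

# The default threshold from the Usage section.
threshold <- 0.01

# Keep haplotypes at or above the threshold, drop the rest.
keep     <- haplo.freq >= threshold
retained <- names(haplo.freq)[keep]
removed  <- names(haplo.freq)[!keep]

print(retained)
print(removed)
```

Raising `threshold` to 0.05 in this sketch would still keep "c-G" (7.5%); raising it to 0.10 would drop it.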

max.haplos

Not yet implemented.

haplo.file

Not yet implemented.

resampling

Default is FALSE. When FALSE, the individual haplotypes reconstructed by the EM algorithm are assumed known when computing CIs and p-values. If set to "jackknife" a jackknife-based resampling procedure is used when computing confidence intervals and p-valu

max.EM.iter

The maximum number of iterations used by the EM algorithm. This value can be increased if necessary, which sometimes is the case with e.g. case-control data with a substantial amount of missing information. However, for triad data with little missing information the

data.out

Character. Accepts values "no", "prelim", "null" or "full", with "no" as default. For values other than default, haplin returns the data file prepared for analysis rather than the usual haplin estimation results. The data file co

verbose

Logical. Default is TRUE. During the EM algorithm, HAPLIN prints the estimated parameters and deviance for each step. To suppress this output, set the argument to FALSE.

printout

Logical. If TRUE (default), haplin prints a full summary of the results after finishing the estimation. If FALSE, no such printout is given, but the summary function can later be applied to a saved result to get the same summary.

Value

An object of class haplin is returned. (The only exception is when data.out is set to a value other than "no", in which case haplin will produce a data file with haplotypes identified.)

Warning

Typically, some of the included haplotypes will be relatively rare, with frequencies of, say, 1% to 5%. For those haplotypes there may be too little data to estimate the double-dose effects properly, so those estimates may be unreliable; this shows up as extremely wide confidence intervals. The rare double-dose estimates should be disregarded, but the remaining single- and double-dose estimates are valid. To avoid the problem, one can also reduce the model to a purely multiplicative one by setting response = "mult" combined with reference = "ref.cat".
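
A rough back-of-the-envelope calculation shows why double doses of rare haplotypes carry so little information. The sketch below assumes Hardy-Weinberg proportions and a haplotype with no effect on risk, and uses a made-up sample of 500 case triads; none of these assumptions come from Haplin itself.

```r
# Assumptions (not from Haplin): 500 case triads, Hardy-Weinberg
# proportions, and no effect of the haplotype on disease risk.
n.triads   <- 500
haplo.freq <- c(0.01, 0.05)

# Expected number of case children carrying one or two copies.
expected.single <- n.triads * 2 * haplo.freq * (1 - haplo.freq)
expected.double <- n.triads * haplo.freq^2

print(data.frame(haplo.freq, expected.single, expected.double))
```

Under these assumptions a haplotype with 1% frequency is expected in double dose in fewer than one child, and a 5% haplotype in only one or two, so a separate double-dose parameter has almost no data behind it.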

Details

The output can be examined by print, summary and plot.

References

Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.

Web site: http://www.uib.no/smis/gjessing/genetics/software/haplin/

Examples

# Standard run:
haplin("data.dat")

# Specify path, estimate maternal effects:
haplin("C:/work/data.dat", maternal = TRUE)

# Specify path, use haplotype no. 2 as reference:
haplin("C:/work/data.dat", reference = 2)

# Remove more haplotypes from estimation by increasing the threshold 
# to 5%:
haplin("C:/work/data.dat", threshold = 0.05)

# Estimate maternal effects, using the most frequent haplotype as reference. 
# Use all data, including triads with missing data. Select 
# markers 3, 4 and 8 from the supplied data.
haplin("C:/work/data.dat", use.missing = TRUE, maternal = TRUE,
       reference = "ref.cat", markers = c(3, 4, 8))
# Note: in this version of Haplin, the jackknife is 
# no longer necessary since the standard errors are already corrected.

# Some examples showing how to save the Haplin result and later 
# recall plot and summary results:

# Same analysis as above, saving the result in the object "result.1":
result.1 <- haplin("C:/work/data.dat", use.missing = TRUE, maternal = TRUE,
                   reference = "ref.cat", markers = c(3, 4, 8))

# Replot the saved result (fetal effects):
plot(result.1)

# Replot the saved result (maternal effects):
plot(result.1, plot.maternal = TRUE)

# Print a very short summary of saved result:
result.1

# A full summary of saved result, with confidence intervals and 
# p-values (the same as haplin prints when running):
summary(result.1)

# Some examples when the data file contains two covariates, 
# the second is the case-control variable:

# The following standard triad run is INCORRECT since it disregards 
# case status:
haplin("data.dat", use.missing = TRUE, n.vars = 2, design = "triad")

# Combined run on "hybrid" design, correctly using both case-parent 
# triads and control-parent triads:
haplin("data.dat", use.missing = TRUE, n.vars = 2, ccvar = 2,
       design = "cc.triad")

# If parent columns are not in the file, a plain case-control 
# run can be used:
haplin("data.dat", use.missing = TRUE, n.vars = 2, ccvar = 2,
       design = "cc", response = "mult", reference = "ref.cat")

# An example of how to produce a data file with all possible haplotypes
# identified for each triad, together with their probability weights:
result.data <- haplin("C:/work/data.dat", use.missing = TRUE,
                      markers = c(3, 4, 8), data.out = "prelim")
# result.data will then contain the data file, with a vector of 
# probabilities (freq) computed from the preliminary haplotype
# frequencies.
