Usage
haplin(filename, data, pedIndex,
markers = "ALL", n.vars = 0, sep = " ", allele.sep = ";",
na.strings = "NA", design = "triad", use.missing = FALSE,
xchrom = FALSE, maternal = FALSE, test.maternal = FALSE,
poo = FALSE, scoretest = "no", ccvar = NULL, strata = NULL,
sex = NULL, comb.sex = "double",
reference = "reciprocal", response = "free",
threshold = 0.01, max.haplos = NULL, haplo.file = NULL,
resampling = "no", max.EM.iter = 50, data.out = "no",
verbose = TRUE, printout = TRUE)
Arguments
Of the following arguments, either data or filename is required. The data argument is usually combined with the pedIndex argument. Use of the remaining arguments depends on the type of analysis.
filename
A character string giving the name and path of the ASCII data file to be read. The file should be in the haplin data format.
data
An R object produced by using load.gwaa.data to load data into R. See the web page for a description of how to convert a ped file into a file that can be loaded. The conversion uses prepPed and convert.snp.ped.
pedIndex
A file of family indexes constructed by using prepPed on the original ped file. This file is used by haplin to extract and store family information.
markers
Default is "ALL", which means haplin uses all available markers in the data set in the analysis. For the current version of haplin, the number of markers used in a single run should probably not exceed 4 or 5 because of the computational burden. The markers argument can be used to select appropriate markers from the file without creating a new file for the selected markers. For instance, if markers is set to c(2,4), haplin will use only the second and fourth markers supplied in the data set. When running haplin, it may be a good idea to start by exploring a few markers at a time, using this argument.
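As a rough sketch of exploring a few markers at a time, the snippet below builds overlapping two-marker windows and passes each one to the markers argument. The file name triads.dat, the count of six markers, and the window size are illustrative assumptions, and the call assumes the haplin version whose signature is shown under Usage.

```r
n_markers   <- 6   # assumed number of markers in the data file
window_size <- 2   # kept well below the suggested limit of 4 or 5

# Start positions of consecutive, overlapping marker windows.
starts  <- seq_len(n_markers - window_size + 1)
windows <- lapply(starts, function(i) seq(i, length.out = window_size))

triad_file <- "triads.dat"  # hypothetical file name

# Run only if the Haplin package is installed and the file exists.
if (requireNamespace("Haplin", quietly = TRUE) && file.exists(triad_file)) {
  results <- lapply(windows, function(m) {
    Haplin::haplin(filename = triad_file, markers = m,
                   verbose = FALSE, printout = FALSE)
  })
}
```

Each element of results would then hold the fit for one window, so that promising marker combinations can be examined more closely in later runs.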
n.vars
Numeric. The number of variables (columns) in the data file before (to the left of) the genetic data.
sep
The character separator used in the data file to separate "columns", where each column contains the two alleles of a single individual at a single marker.
allele.sep
The character separator used in the data file to separate the two alleles for a single individual in a single marker. The recommended (default) separator is ";", but for SNPs an empty "" is also common.
na.strings
The character string indicating missing data in the data file. Default is to use "NA" in place of, for instance, C;T for a SNP that hasn't been typed in that individual.
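To make the roles of n.vars, sep, allele.sep and na.strings concrete, the following base R sketch splits one toy data line by hand. It is only an illustration of how the separators partition a line, not haplin's own parser; the line contents are made up.

```r
# Toy line: one covariate column, then three genotype columns,
# the second of which is missing.
line <- "1 C;T NA T;T"

n.vars     <- 1
sep        <- " "
allele.sep <- ";"
na.strings <- "NA"

# Split the line into columns using the column separator.
fields <- strsplit(line, sep, fixed = TRUE)[[1]]

# The first n.vars columns are covariates; the rest are genotypes.
covariates <- fields[seq_along(fields) <= n.vars]
genotypes  <- fields[seq_along(fields) >  n.vars]

# Recode the missing-data string, then split each genotype into two alleles.
genotypes[genotypes %in% na.strings] <- NA
alleles <- strsplit(genotypes, allele.sep, fixed = TRUE)
```

Here alleles is a list with one element per genotype column: a pair of alleles where the genotype was typed, and NA where it was missing.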
design
The value "triad" is used for the standard case triad design, without independent controls. The value "cc.triad" means a combination of case triads and control triads. This requires the argument ccvar to point to the data column containing the case-control variable. The value "cc" means a simple case-control design, where the parents have not been genotyped (there are no data columns for parental genes).
use.missing
A logical value used to determine whether triads with missing data should be included in the analysis. When set to TRUE, haplin uses the EM algorithm to obtain risk estimates, also taking into account triads with missing data. The standard errors and p-values are adjusted to correct for this. The default, however, is FALSE. When FALSE, all triads having any sort of missing data are excluded before the analysis is run. Note that haplin only looks at markers actually used in the analysis, so that if the markers argument (see above) is used to select a collection of markers for analysis, haplin only excludes triads with missing data on the included markers.
xchrom
Logical, defaults to FALSE. If set to TRUE, haplin assumes the markers are on the X-chromosome. This option should be combined with specifying the sex argument. In addition, comb.sex can be useful. xchrom = TRUE can be combined with poo = TRUE and/or maternal = TRUE.
maternal
If TRUE, maternal effects are estimated as well as the standard fetal effects.
test.maternal
Not yet implemented.
poo
If TRUE, haplin will split single-dose effects into two separate effect estimates, one for the maternally inherited haplotype, and one for the paternally inherited haplotype. Double dose will be estimated as before.
scoretest
Special interest only. If "no", no score test is computed. If "yes", an overall score p-value is included in the output, and the individual score values are returned in the haplin object. If "only", haplin is only run under the null hypothesis, and a simple score object is returned instead of the full haplin object. Useful if only score testing is needed.
ccvar
Numeric. Should give the column number for the column containing the case-control indicator in the data file. Needed for the "cc" and "cc.triad" designs. The column should contain two numeric values, of which the larger one is always used to denote cases.
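The coding rule for the case-control column can be sketched as follows, together with a guarded call for the "cc.triad" design. The toy codes, the file name cc_triads.dat and the column layout (one variable column, holding the case-control indicator) are assumptions for illustration, and the call assumes the haplin version whose signature is shown under Usage.

```r
# Toy case-control column with two numeric codes (assumed values).
cc <- c(0, 1, 1, 0)

# The larger of the two codes denotes the cases.
is_case <- cc == max(cc)

# Arguments for a combined case-triad and control-triad analysis.
cc_args <- list(
  filename = "cc_triads.dat",  # hypothetical file name
  n.vars   = 1,                # one variable column before the genetic data
  ccvar    = 1,                # that column holds the case-control indicator
  design   = "cc.triad"
)

# Run only if the Haplin package is installed and the file exists.
if (requireNamespace("Haplin", quietly = TRUE) && file.exists(cc_args$filename)) {
  res <- do.call(Haplin::haplin, cc_args)
}
```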
strata
Not yet implemented.
sex
To be used with xchrom = TRUE. A numeric value specifying which data column contains the sex variable. The variable should be coded 1 for males and 2 for females.
comb.sex
To be used with xchrom = TRUE. A character value that specifies how to handle gender differences on the X-chromosome. If set to "males" or "females", analyses are done either for just males or just females, respectively. If set to "single" or "double", males and females are used in a combined analysis. Specifically, when "single", the effect of a (single) allele in males is assumed to equal the effect of a single allele dose in females, and similarly, when "double", a single allele in males is assumed to have the same effect as a double allele dose in females. Default is "double", which corresponds to X-inactivation. See separate description for more details.
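An X-chromosome analysis combines xchrom, sex and comb.sex. The sketch below checks a toy sex column against the expected coding and then sets up a guarded call. The file name xchrom_triads.dat and the column layout are assumptions for illustration, and the call assumes the haplin version whose signature is shown under Usage.

```r
# Toy sex column: 1 = male, 2 = female, as the sex argument expects.
sex_codes <- c(1, 2, 2, 1)
stopifnot(all(sex_codes %in% c(1, 2)))

# Arguments for a combined analysis of males and females.
x_args <- list(
  filename = "xchrom_triads.dat",  # hypothetical file name
  n.vars   = 1,                    # one variable column before the genetic data
  sex      = 1,                    # that column holds the sex variable
  xchrom   = TRUE,
  comb.sex = "double"              # the default, corresponding to X-inactivation
)

# Run only if the Haplin package is installed and the file exists.
if (requireNamespace("Haplin", quietly = TRUE) && file.exists(x_args$filename)) {
  res <- do.call(Haplin::haplin, x_args)
}
```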
reference
Decides how haplin chooses its reference category for the effect estimates. Default value is "reciprocal". With the reciprocal reference the effect of a single or double dose of each haplotype is measured relative to the remaining haplotypes. This means that a new reference category is used for each single haplotype. Other possible values are "population" (which is similar to reciprocal, but where the reference category is always the total population), and "ref.cat", where a single haplotype is used as reference for all the rest. For ref.cat, the default is to choose the most frequent haplotype as the reference haplotype. The reference haplotype can be set explicitly by giving a numeric value for the reference argument. Note that the numeric value refers to the haplotype's position among the haplotypes selected for analysis by haplin. This means that one should run haplin once first to see what haplotypes are used before giving a numeric value to reference.
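The two-step procedure for choosing a numeric reference might look like the sketch below: a first run with "ref.cat" to see which haplotypes are retained, then a second run with a numeric position. The file name triads.dat and the position 2 are purely illustrative, and the calls assume the haplin version whose signature is shown under Usage.

```r
ref_file <- "triads.dat"  # hypothetical file name

# First run: let haplin pick the most frequent haplotype as reference.
first_args <- list(filename = ref_file, reference = "ref.cat")

# Second run: same settings, but with the haplotype in position 2
# (among those selected for analysis) as reference. The value 2 is an
# example; the right position must be read off the first run's output.
second_args <- utils::modifyList(first_args, list(reference = 2))

# Run only if the Haplin package is installed and the file exists.
if (requireNamespace("Haplin", quietly = TRUE) && file.exists(ref_file)) {
  first  <- do.call(Haplin::haplin, first_args)
  second <- do.call(Haplin::haplin, second_args)
}
```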
response
The default value "free" means that both single- and double-dose effects are estimated. Choosing "mult" instead specifies a multiplicative dose-response model.
threshold
Sets the (approximate) lower limit for the haplotype frequencies of those haplotypes that should be retained in the analysis. Haplotypes that are less frequent are removed, and information about this is given in the output. Default is 0.01.
max.haplos
Not yet implemented.
haplo.file
Not yet implemented.
resampling
Mostly for testing. Default is "no". When "no", the individual haplotypes reconstructed by the EM algorithm are assumed known when computing CIs and p-values. If set to "jackknife", a jackknife-based resampling procedure is used when computing confidence intervals and p-values for effect estimates. This takes more time, but corrects the CIs and p-values for the uncertainty contained in unphased data. Note: in all recent versions of haplin, the resampling is no longer needed since the confidence intervals and p-values are already corrected in the standard computation.
max.EM.iter
The maximum number of iterations used by the EM algorithm. This value can be increased if necessary, which is sometimes the case with, e.g., case-control data with a substantial amount of missing data. However, for triad data with little missing information there is usually no need for many iterations.
data.out
Character. Accepts values "no", "prelim", "null" or "full", with "no" as default. For values other than default, haplin returns the data file prepared for analysis rather than the usual haplin estimation results. The data file contains the haplotypes identified for each triad, and a vector of weights giving the probability distribution of different haplotype configurations within a triad. The probabilities are computed from preliminary haplotype frequency estimates, from the null model or from the full likelihood model. The "prelim" option will be much faster but somewhat less precise than the likelihood models.
verbose
Logical. Default is TRUE. During the EM algorithm, haplin prints the estimated parameters and deviance for each step. To suppress this output, set the argument to FALSE.
printout
Logical. If TRUE (default), haplin prints a full summary of the results after finishing the estimation. If FALSE, no such printout is given, but the summary function can later be applied to a saved result to get the same summary.
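A quiet run that saves the result and prints the summary afterwards could be sketched like this. The file name triads.dat is hypothetical, and the call assumes the haplin version whose signature is shown under Usage.

```r
# Suppress both the per-iteration EM output and the final printout.
quiet_args <- list(
  filename = "triads.dat",  # hypothetical file name
  verbose  = FALSE,
  printout = FALSE
)

# Run only if the Haplin package is installed and the file exists.
if (requireNamespace("Haplin", quietly = TRUE) && file.exists(quiet_args$filename)) {
  res <- do.call(Haplin::haplin, quiet_args)

  # Later, recover the summary that printout = FALSE suppressed.
  print(summary(res))
}
```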