- ploidy
The ploidy level, 2 or higher: 2 for diploids, 3 for triploids
etc.
- markers
NA or a character or numeric vector specifying the markers to be
fitted. If a character vector, names should match the MarkerName column of
data; if numeric, the numbers index the markers based on the alphabetic order
of the MarkerNames in data.
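When markers is numeric, the indices count positions in the alphabetically sorted set of marker names, not rows of data. The base-R sketch below uses a made-up toy data frame to show which names a numeric selection would correspond to. It assumes that sort() in the current locale reproduces the ordering fitMarkers uses, which may not hold for names that mix upper and lower case or contain punctuation.

```r
# Toy data (made up): marker names deliberately not in alphabetical order
data <- data.frame(
  MarkerName = c("snp_c", "snp_a", "snp_b", "snp_a"),
  SampleName = c("s1", "s1", "s1", "s2"),
  ratio      = c(0.10, 0.55, 0.90, 0.48)
)
# Numeric indices refer to the alphabetically sorted unique marker names
markers  <- c(1, 3)
selected <- sort(unique(as.character(data$MarkerName)))[markers]
selected  # "snp_a" "snp_c"
```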
- data
A data frame with the polyploid samples, with (at least) columns
MarkerName, SampleName and ratio, where ratio is the Y-allele signal
divided by the sum of the X- and Y-allele signals: ratio == Y/(X+Y)
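A minimal sketch of preparing the ratio column from raw allele signals. The signal values, marker name and sample names are invented for illustration.

```r
# Hypothetical raw allele signals for one marker in three samples
X <- c(1200, 640, 80)    # X-allele signal
Y <- c(100, 610, 1500)   # Y-allele signal
# Fraction of the total signal coming from the Y allele, between 0 and 1
ratio <- Y / (X + Y)
data <- data.frame(MarkerName = "m1",
                   SampleName = c("s1", "s2", "s3"),
                   ratio      = ratio)
round(data$ratio, 3)  # 0.077 0.488 0.949
```

A sample with X + Y equal to 0 would get ratio NaN (0/0); such rows may be worth removing, or excluding via select, before fitting.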
- diplo
NULL or a data frame like data, with the diploid samples and (a subset
of) the same markers as in data. Genotypic scores for diploid samples are
assigned according to the best-fitting model for the polyploid samples and
can therefore range from 0 (nulliplex) to <ploidy>; the expected dosages are
0 and <ploidy> for the homozygotes and <ploidy/2> for the heterozygotes.
diplo can also be used for any other samples that need to be scored but
should not affect the fitted models.
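The sketch below keeps only the diploid rows whose markers also occur in data, and lists the dosages one would expect for diploid genotypes on the polyploid scale. The genotype labels XX, XY and YY are illustrative. They assume that dosage counts Y alleles, which is consistent with 0 being nulliplex and with ratio measuring the Y signal. The text does not say whether fitMarkers itself drops unmatched diploid markers, so the filtering step is only a precaution. All data are made up.

```r
ploidy <- 4
# Toy polyploid and diploid data frames with the required columns
data  <- data.frame(MarkerName = c("m1", "m2"), SampleName = "tetra1",
                    ratio = c(0.3, 0.7))
diplo <- data.frame(MarkerName = c("m1", "m3"), SampleName = "dip1",
                    ratio = c(0.5, 0.1))
# Keep only diploid rows for markers that are also present in data
diplo <- diplo[diplo$MarkerName %in% data$MarkerName, ]
# Expected dosages of diploid genotypes on the polyploid dosage scale
expected <- c(XX = 0, XY = ploidy / 2, YY = ploidy)
expected  # XX 0, XY 2, YY 4
```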
- select
A logical vector indicating which rows of data are to be used, recycled if
shorter than nrow(data); default TRUE, i.e. keep all rows.
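One common way to build select is to flag the rows that have an observed ratio. The toy data below are made up. The second vector uses rep_len() only to mimic R's standard recycling rule for a shorter logical vector such as c(TRUE, FALSE).

```r
data <- data.frame(
  MarkerName = rep(c("m1", "m2"), each = 3),
  SampleName = rep(c("s1", "s2", "s3"), 2),
  ratio      = c(0.1, NA, 0.5, 0.9, 0.4, NA)
)
# Use only rows with an observed ratio
select <- !is.na(data$ratio)
# A shorter vector is recycled: c(TRUE, FALSE) keeps every other row
alternate <- rep_len(c(TRUE, FALSE), nrow(data))
which(select)     # 1 3 4 5
which(alternate)  # 1 3 5
```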
- diploselect
A logical vector like select, matching diplo instead of data
- pop.parents
NULL or a data.frame specifying the population structure. The
data frame has 3 columns: the first contains population IDs; the 2nd and 3rd
contain the IDs of the parental populations of these populations (for F1
populations) or NA (otherwise). The population IDs should match those in
parameter population. If pop.parents is NULL, all samples are considered to be
in one population, and parameter population should be NULL (default).
- population
NULL or a data.frame specifying to which population each
sample belongs. The data frame has two columns: the first contains the
SampleName (every SampleName occurring in data must be present), the second
contains population IDs that match pop.parents. NA values are not allowed in
either column. Parameters pop.parents and population should either both be
NULL (default) or both be specified.
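A sketch of a pop.parents and population pair for one F1 population with two parental populations, together with a few consistency checks that mirror the requirements stated above. The column names popID, parent1 and parent2 are hypothetical, since the text only fixes the column order; the sample and population IDs are invented.

```r
# Two parental populations (P1, P2) and their F1 progeny population (F1)
pop.parents <- data.frame(
  popID   = c("P1", "P2", "F1"),
  parent1 = c(NA, NA, "P1"),
  parent2 = c(NA, NA, "P2")
)
# Assign every sample to a population ID
population <- data.frame(
  SampleName = c("mum", "dad", "off1", "off2"),
  popID      = c("P1", "P2", "F1", "F1")
)
# Checks: no NA in population, IDs match pop.parents,
# and every named parent is itself a listed population
stopifnot(
  !any(is.na(population)),
  all(population$popID %in% pop.parents$popID),
  all(na.omit(c(pop.parents$parent1, pop.parents$parent2)) %in% pop.parents$popID)
)
```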
- parentalPriors
NULL or a data frame specifying the prior dosages for
the parental populations. The data frame has one column MarkerName
followed by one column for each F1 parental population. Column names (except
the first) are population IDs matching the parental populations in pop.parents.
If there is just one F1 population in pop.parents, each parental population
may have two columns instead of one (allowing two different prior dosages to
be specified); in that case both columns for a parent have the same column
name. Each row specifies the priors for one marker. The contents of the data
frame are dosages, as integers from 0 to <ploidy>; NA values are allowed.
Note: when reading the data frame with read.table or read.csv, set
check.names=FALSE so column names (population IDs) are not changed.
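The check.names note matters because read.csv() by default passes the header through make.names(). IDs that are not syntactically valid R names, such as purely numeric IDs, gain an X prefix, and duplicated names (as in the two-columns-per-parent layout) get a .1 suffix. The population IDs 101 and 102 below are invented for illustration.

```r
# Priors for one F1 population: two columns per parent, sharing the parent's ID
csv <- "MarkerName,101,101,102,102
m1,0,1,4,4
m2,2,NA,NA,NA"
bad  <- read.csv(text = csv)                       # default check.names = TRUE
good <- read.csv(text = csv, check.names = FALSE)  # names kept as written
names(bad)   # "MarkerName" "X101" "X101.1" "X102" "X102.1"
names(good)  # "MarkerName" "101" "101" "102" "102"
```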
- samplePriors
NULL or a data.frame specifying prior dosages for individual
samples. The first column, MarkerName, is followed by one column per
sample; not all samples in data need to have a column here, only
those samples for which prior dosages for one or more markers are available.
Each row specifies the priors for one marker. The contents of the data frame
are dosages, as integers from 0 to <ploidy>; NA values are allowed.
Note: when reading the data frame with read.table or read.csv, set
check.names=FALSE so column names (sample names) are not changed.
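Before passing samplePriors it may help to check that every value is a whole number from 0 to ploidy, or NA. check_priors() below is a hypothetical helper, not part of fitPoly, and the sample names are made up. The name with a space is one that read.csv() would alter unless check.names=FALSE.

```r
# Hypothetical helper (not part of fitPoly): validate a priors data frame
check_priors <- function(priors, ploidy) {
  stopifnot(names(priors)[1] == "MarkerName")
  vals <- as.matrix(priors[, -1, drop = FALSE])
  # NA is allowed; otherwise values must be whole numbers in 0..ploidy
  ok <- is.na(vals) | (vals == round(vals) & vals >= 0 & vals <= ploidy)
  all(ok)
}
samplePriors <- data.frame(
  MarkerName = c("m1", "m2"),
  `sample 7` = c(2L, NA),
  s12        = c(4L, 0L),
  check.names = FALSE
)
check_priors(samplePriors, ploidy = 4)  # TRUE
bad <- samplePriors
bad$s12[1] <- 5L                        # dosage above ploidy
check_priors(bad, ploidy = 4)           # FALSE
```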
- startmeans
NULL or a data.frame specifying the prior means of
the mixture distributions. The data frame has one column MarkerName,
followed by <ploidy+1> columns with the prior ratio means on the original
(untransformed) scale. Each row specifies the
means for one marker in strictly ascending order (a row with all means NA is
allowed, but markers without start means can also simply be omitted).
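A sketch of a startmeans table for ploidy 4, with one marker given explicit means and one marker left entirely NA. The column names mean0 to mean4 are hypothetical because the text does not prescribe them. valid_row() is an illustrative check that treats a partially missing row as invalid, which is one reading of the requirement.

```r
ploidy <- 4
# Hypothetical names for the ploidy + 1 mean columns
mean_cols <- paste0("mean", 0:ploidy)
startmeans <- data.frame(
  MarkerName = c("m1", "m2"),
  matrix(c(0.05, 0.25, 0.50, 0.72, 0.95,
           NA,   NA,   NA,   NA,   NA),
         nrow = 2, byrow = TRUE,
         dimnames = list(NULL, mean_cols))
)
# A row is valid if it is entirely NA or strictly ascending without NA
valid_row <- function(m) all(is.na(m)) || (!anyNA(m) && all(diff(m) > 0))
apply(as.matrix(startmeans[, mean_cols]), 1, valid_row)  # TRUE TRUE
```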
- maxiter
A single integer, passed to CodomMarker; see there for an explanation.
- maxn.bin
A single integer, passed to CodomMarker; see there for an explanation.
- nbin
A single integer, passed to CodomMarker; see there for an explanation.
- sd.threshold
The maximum value allowed for the (constant) standard
deviation of each peak on the arcsine-square-root-transformed scale;
default 0.1. If the optimal model has a larger standard deviation, the marker
is rejected. Set to a large value (e.g. 1) to disable this filter.
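To get a feel for the magnitudes involved, the sketch assumes the transform is asin(sqrt(ratio)), a scale running from 0 to pi/2, and computes the empirical standard deviation of one made-up cluster of ratios. fitMarkers instead estimates a single constant standard deviation for all peaks as part of the mixture model, so this is only an approximation of the quantity being thresholded.

```r
# Arcsine-square-root transform (assumed form)
asin_sqrt <- function(ratio) asin(sqrt(ratio))
# Toy ratios of samples that all belong to one peak
peak_ratios <- c(0.45, 0.50, 0.52, 0.48, 0.55)
sd_transformed <- sd(asin_sqrt(peak_ratios))
sd_transformed <= 0.1  # TRUE: well below the default sd.threshold
```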
- p.threshold
The minimum P-value required to assign a genotype (dosage)
to a sample; default 0.9. If the P-value for all possible genotypes is less
than p.threshold the sample is assigned genotype NA. Set to 1 to disable
this filter.
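The calling rule can be sketched as follows, given a matrix of genotype probabilities with samples in rows and dosages in columns. The probabilities are invented; in fitMarkers they come from the fitted mixture model rather than being supplied by the user.

```r
# Hypothetical genotype probabilities: rows = samples, columns = dosages 0..4
probs <- rbind(
  s1 = c(0.95, 0.05, 0.00, 0.00, 0.00),
  s2 = c(0.00, 0.40, 0.55, 0.05, 0.00),
  s3 = c(0.00, 0.00, 0.02, 0.08, 0.90)
)
p.threshold <- 0.9
# Most probable dosage per sample (column index minus 1)
best <- max.col(probs, ties.method = "first") - 1
# Assign it only if its probability reaches p.threshold, else NA
geno <- ifelse(apply(probs, 1, max) >= p.threshold, best, NA)
geno  # s1 = 0, s2 = NA, s3 = 4
```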
- call.threshold
The minimum fraction of samples to have genotypes
assigned ("called"); default 0.6. If under the optimal model the fraction of
"called" samples is less than call.threshold the marker is rejected. Set to 0
to disable this filter.
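The call-rate check compares the fraction of non-NA genotypes for a marker with call.threshold. Here is a sketch on an invented vector of assigned dosages:

```r
# Dosages assigned to the samples of one marker (NA = not called)
geno <- c(0, 1, NA, 2, 2, NA, 4, 3, NA, 1)
call.threshold <- 0.6
call_rate <- mean(!is.na(geno))  # fraction of called samples
call_rate                        # 0.7
call_rate >= call.threshold      # TRUE: this filter keeps the marker
```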
- peak.threshold
The maximum allowed fraction of the scored samples that
are in one peak; default 0.85. If any of the possible genotypes (peaks in the
ratio histogram) contains more than peak.threshold of the samples the marker
is rejected (because the remaining samples offer too little information for
reliable model fitting).
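The following is a sketch of the peak filter on an invented vector of assigned dosages. It assumes that "scored samples" means samples with a non-NA genotype and that the per-peak fraction is computed from assigned genotypes. The exact quantity fitMarkers uses may differ.

```r
# Assigned dosages for one marker (NA = not called)
geno <- c(2, 2, 2, 2, 2, 2, 2, 2, 2, 1, NA)
peak.threshold <- 0.85
scored <- geno[!is.na(geno)]
# Largest fraction of scored samples falling in a single dosage class
peak_frac <- max(table(scored)) / length(scored)
peak_frac                   # 0.9
peak_frac > peak.threshold  # TRUE: the marker would be rejected
```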
- try.HW
Logical: if TRUE (default), try models with and without a
constraint on the mixing proportions according to Hardy-Weinberg equilibrium
ratios. If FALSE, only try models without this constraint. Even when the HW
assumption is not applicable, setting try.HW to TRUE often still leads to
a better model. For more details on how try.HW is used see the Details
section of function fitOneMarker.
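Under a Hardy-Weinberg constraint, the mixing proportions are no longer free. Instead, they follow from a single allele frequency. The sketch assumes random chromosome segregation, where the proportions of dosages 0 to ploidy are binomial. The exact parameterization used by fitOneMarker is described in its own documentation and may differ. The allele frequency is invented.

```r
ploidy <- 4
p <- 0.3  # hypothetical frequency of the Y allele
# Binomial mixing proportions for dosages 0..ploidy
hw_props <- dbinom(0:ploidy, size = ploidy, prob = p)
round(hw_props, 4)  # 0.2401 0.4116 0.2646 0.0756 0.0081
sum(hw_props)       # 1
```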
- dip.filter
If 1 (default), select the best model only from models
that do not have a dip (a lower peak surrounded by higher peaks: these are not
expected under Hardy-Weinberg equilibrium or in cross progenies). If all
fitted models have a dip, the best of these is still selected. If 2, similar,
but if all fitted models have a dip the marker is rejected. If 0, select the
best model among all fitted models, including those with a dip.
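One way to express the dip criterion in code is to treat a peak as a dip if some peak to its left and some peak to its right are both higher. has_dip() is a hypothetical helper that illustrates this reading, applied here to invented mixing proportions ordered by dosage.

```r
# Hypothetical helper: TRUE if any interior peak is lower than
# some peak on its left AND some peak on its right
has_dip <- function(props) {
  n <- length(props)
  if (n < 3) return(FALSE)
  any(vapply(2:(n - 1), function(i) {
    props[i] < max(props[1:(i - 1)]) && props[i] < max(props[(i + 1):n])
  }, logical(1)))
}
has_dip(c(0.10, 0.30, 0.40, 0.15, 0.05))  # FALSE: single-peaked
has_dip(c(0.35, 0.10, 0.40, 0.10, 0.05))  # TRUE: dosage 1 is a dip
```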
- sd.target
If the fitted standard deviation of the peaks on the
transformed scale is larger than sd.target a penalty is given (see Details
section of function fitOneMarker);
default NA, i.e. no penalty is given.
- filePrefix
Partial file name, possibly including an absolute or
relative file path; filePrefix must always be specified.
All output files have filePrefix prepended to their names, so it is clear
they are all derived from the same call to fitMarkers.
If filePrefix includes a file path, all output files are saved there; if it
does not include a path, the output is saved in the working directory.
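The directory part of filePrefix, which dirname() exposes, determines whether output lands in a subdirectory or in the working directory. The prefix run1_ and the directory results are placeholders. Creating the directory beforehand is a cautious step, since the text does not say whether missing directories are created.

```r
# Directory part of a prefix: "." means the current working directory
dirname("run1_")          # "."
dirname("results/run1_")  # "results"
# Cautious setup: make sure the target directory exists before fitting
out_dir <- file.path(tempdir(), "results")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
filePrefix <- file.path(out_dir, "run1_")
```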
- rdaFiles
Logical, default FALSE. If FALSE, the tabular output (scorefile,
diploscorefile, modelfile, allmodelsfile) is saved as tab-separated text files
with extension .dat; if TRUE, it is saved as .RData files instead.
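The sketch below reads a tabular output file back into R in either format. read_output() is hypothetical. It assumes that the .dat files are tab-separated with a header row and that each .RData file holds a single data frame. The round trip uses a made-up table written to temporary files, not real fitMarkers output.

```r
# Hypothetical helper: read one tabular output file in either format
read_output <- function(file) {
  if (grepl("\\.RData$", file, ignore.case = TRUE)) {
    env <- new.env()
    obj_names <- load(file, envir = env)  # load() returns the object names
    get(obj_names[1], envir = env)
  } else {
    read.table(file, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  }
}
# Round-trip demonstration with a made-up score table
scores <- data.frame(MarkerName = "m1", SampleName = c("s1", "s2"),
                     geno = c(0L, 2L), stringsAsFactors = FALSE)
dat_file <- tempfile(fileext = ".dat")
rda_file <- tempfile(fileext = ".RData")
write.table(scores, dat_file, sep = "\t", quote = FALSE, row.names = FALSE)
save(scores, file = rda_file)
isTRUE(all.equal(read_output(dat_file), read_output(rda_file)))  # TRUE
```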
- allModelsFile
Logical, default FALSE. If TRUE, an allmodelsfile is saved
with all models that have been tried for each marker; also the log file will
contain a few lines for each marker. This information is mostly useful
for debugging and locating problems.
- plot
String, "none" (default), "fitted" or "all". If "fitted" a plot
of the best fitting model and the assigned genotypes is saved with filename
<marker number><marker name>.<plot.type>, preceded by "rejected_" if the
marker was rejected. If "all", small plots of all models are saved to files
(8 per file) with filename
<"plots"><marker number><marker name><pagenr>.<plot.type> in addition to the
plot of the best fitting model.
- plot.type
String, "png" (default), "emf", "svg" or "pdf". Indicates the
format for saving the plots.
- ncores
The number of processor cores to use for parallel processing;
default 1. Specifying more cores than are available may cause problems.
Note that under Windows the parallel implementation duplicates the input data
(this does not happen under Linux, nor under Windows with ncores=1). If memory
is a problem under Windows, it may be better to run several R instances
simultaneously, each with ncores=1 and each processing part of the data.
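The following sketch chooses ncores from the number of available cores with parallel::detectCores(), which can return NA. It also splits marker names into chunks for the Windows alternative of running several R instances. Leaving one core free is a convention, not a requirement, and the marker names are invented.

```r
library(parallel)
# detectCores() may return NA on some platforms; fall back to 1
avail  <- detectCores()
ncores <- if (is.na(avail)) 1L else max(1L, avail - 1L)  # leave one core free
# Windows alternative: one chunk of markers per R instance, each with ncores = 1
marker_names <- sprintf("m%03d", 1:10)
n_instances  <- 3
chunks <- split(marker_names,
                cut(seq_along(marker_names), n_instances, labels = FALSE))
lengths(chunks)
```

Each instance could then pass its own chunk as the markers argument, together with a distinct filePrefix, so the output files of the separate runs can be told apart.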