read.cross reads data for an experimental cross from a set of files and converts them into an object of class cross. The comma-delimited format (csv) is recommended. All formats require chromosome assignments for the genetic markers, and all assume that the markers are in their correct order.

Usage:

read.cross(format=c("csv","mm","gary","karl"), dir=".", file,
           genfile, mapfile, phefile, chridfile, mnamesfile,
           pnamesfile, sep=",", na.strings="-",
           genotypes=c("A","H","B","C","D"), estimate.map=FALSE)
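Arguments:

format: The format of the data files: one of "csv", "mm", "gary" or "karl".
dir: Directory in which the data files are found. Use forward slashes ("/") or double backslashes ("\\") to specify directory trees.
file: The main input file, used by the csv and mm formats.
genfile: File with genotype data (karl and gary formats only).
mapfile: File with marker position information (all formats except csv).
phefile: File with phenotype data (karl and gary formats only).
chridfile: File with the chromosome ID for each marker (gary format only).
mnamesfile: File with marker names (gary format only).
pnamesfile: File with phenotype names (gary format only).
sep: The field separator (csv format only). This is generally ",", but it could be any other character (such as "\t" for tab), as long as that character does not appear in any of the records.
na.strings: Strings to be interpreted as missing values (csv format only). They are interpreted globally for the entire file, so missing-value codes for phenotypes must not be valid genotypes, and vice versa.
genotypes: Character strings specifying the genotype codes (csv format only). Generally this is a vector of length 5, whose elements correspond to AA, AB, BB, not AA (i.e., AB or BB), and not BB (i.e., AA or AB).
estimate.map: For csv and mm only: if TRUE and the input files contain no marker positions, the genetic map is estimated using the function est.map.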
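Base R is enough to show the warning about na.strings in action. The sketch below does not call read.cross. It uses read.csv on a small, made-up file in the csv layout to show two things: a missing-value code applies to every column, and adding a genotype code such as "B" to the missing-value codes silently discards those genotypes.

```r
# Made-up file in the csv layout: phenotypes wt and id, markers m1 and m2.
# Line 2 holds the chromosome IDs; "-" marks missing values.
txt <- "wt,id,m1,m2
,,1,1
42.5,1,A,-
-,2,B,H"

# "-" is read as missing in phenotype and genotype columns alike.
ok <- read.csv(text = txt, na.strings = "-", colClasses = "character")
print(ok)

# A missing-value code that is also a genotype code destroys data:
# every BB genotype ("B") in m1 becomes NA.
bad <- read.csv(text = txt, na.strings = c("-", "B"), colClasses = "character")
print(bad$m1)
```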
Value:

An object of class cross, which is a list with two components:

geno: A list with one element per chromosome; names(geno) contains the names of the chromosomes. Each chromosome is itself a list and is given class A or X, depending on whether it is an autosome or the X chromosome.
Each chromosome has two components. The first is data, a matrix whose rows are individuals and whose columns are markers. The second is map, which is either a vector of marker positions (in cM) or a matrix of dim (2 x n.mar) whose rows give the marker positions in female and male genetic distance, respectively.
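The toy object below shows the layout of one chromosome. It is built by hand with the components described above. It is not output from read.cross, and its marker names and positions are hypothetical.

```r
# Hypothetical autosome with 3 individuals (rows) and 2 markers (columns).
chr1 <- list(
  data = matrix(c(1, 2, NA,     # marker m1
                  2, 2, 1),     # marker m2
                nrow = 3, dimnames = list(NULL, c("m1", "m2"))),
  map = c(m1 = 0, m2 = 12.5)    # marker positions in cM
)
class(chr1) <- "A"              # "X" would mark the X chromosome

# Sex-specific alternative for map: a 2 x n.mar matrix,
# female positions in row 1 and male positions in row 2.
sexmap <- rbind(female = c(m1 = 0, m2 = 12.5),
                male   = c(m1 = 0, m2 = 9.8))

geno <- list("1" = chr1)
names(geno)   # "1"
```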
The genotype data for a backcross are coded as follows: NA = missing, 1 = AA, 2 = AB.
For an F2 intercross, the coding is NA = missing, 1 = AA, 2 = AB, 3 = BB, 4 = not BB (i.e., AA or AB; D in mapmaker/qtl), 5 = not AA (i.e., AB or BB; C in mapmaker/qtl).
For a 4-way cross, the mother and father are assumed to have genotypes AB and CD, respectively. The genotype data for the progeny are assumed to be phase-known, with the following coding scheme: NA = missing, 1 = AC, 2 = BC, 3 = AD, 4 = BD, 5 = A = AC or AD, 6 = B = BC or BD, 7 = C = AC or BC, 8 = D = AD or BD, 9 = AC or BD, 10 = AD or BC.

pheno: A data frame of size (n.ind x n.phe) containing the phenotypes.
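When an intercross is read from csv data, the coding above works as a lookup: the position of each string in genotypes determines its numeric code. The helper code_f2 is hypothetical and is not read.cross's internal code. In this mapping, the fourth element of genotypes (not AA) becomes 5 and the fifth element (not BB) becomes 4.

```r
# Hypothetical helper: convert csv genotype strings to intercross codes.
# genotypes lists the codes for AA, AB, BB, not AA, not BB, in that order.
code_f2 <- function(x, genotypes = c("A", "H", "B", "C", "D")) {
  codes <- c(1L, 2L, 3L, 5L, 4L)  # not AA -> 5, not BB -> 4
  codes[match(x, genotypes)]      # unknown strings and NA give NA
}

code_f2(c("A", "H", "B", "C", "D", NA))   # 1 2 3 5 4 NA
```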
CSV format:

The input file is a text file whose fields are separated by the character given in sep (a comma is recommended). The first line should contain the phenotype names followed by the marker names. At least one phenotype must be included; for example, you can include a numerical index for each individual.
The second line should contain blanks in the phenotype columns, followed by a chromosome identifier for each marker in all other columns. If a chromosome has the identifier X or x, it is assumed to be the X chromosome; otherwise, it is assumed to be an autosome.
An optional third line should contain blanks in the phenotype columns, followed by the marker positions in cM.
Each subsequent line should give the data for one individual, with phenotypes followed by genotypes. Phenotypes are made numeric where possible; otherwise, they are converted to factors.
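Base R can parse the line-by-line layout above, as the following sketch shows. The function parse_csv_layout and its argument n_phe (the number of phenotype columns) are hypothetical. The sketch recognizes the optional third line by its blank phenotype fields; that rule is an assumption of this sketch and does not describe how read.cross detects the line.

```r
# Hypothetical sketch, not read.cross's own parser.
parse_csv_layout <- function(txt, n_phe, sep = ",", na.strings = "-") {
  # Line 1 becomes the header: phenotype names, then marker names.
  raw <- read.table(text = txt, sep = sep, header = TRUE,
                    na.strings = na.strings, colClasses = "character",
                    check.names = FALSE)
  ph  <- seq_len(n_phe)
  mar <- (n_phe + 1):ncol(raw)
  # Line 2: chromosome ID per marker; "X" or "x" means the X chromosome.
  chr  <- unlist(raw[1, mar], use.names = FALSE)
  is_x <- toupper(chr) == "X"
  # Optional line 3 (assumption: recognized by blank phenotype fields).
  third   <- unlist(raw[2, ph], use.names = FALSE)
  has_pos <- all(!is.na(third) & third == "")
  pos   <- if (has_pos) as.numeric(unlist(raw[2, mar], use.names = FALSE)) else NULL
  first <- if (has_pos) 3 else 2
  dat   <- raw[first:nrow(raw), , drop = FALSE]
  # Phenotypes: numeric where possible, otherwise factors.
  pheno <- as.data.frame(lapply(dat[ph], type.convert, as.is = FALSE))
  list(pheno = pheno, chr = chr, is_x = is_x, pos = pos,
       geno = as.matrix(dat[mar]))
}

txt <- "wt,m1,m2,m3
,1,1,X
,0,12.5,3
42.5,A,H,A
-,B,-,H"
res <- parse_csv_layout(txt, n_phe = 1)
res$is_x   # FALSE FALSE TRUE
```

Reading every column as character first means that the chromosome and position lines cannot force the wrong type onto the data columns. Type conversion happens only after those lines have been split off.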
The cross is taken to be a backcross if only the first two elements of the genotypes argument occur in the data; otherwise, it is assumed to be an intercross.
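The backcross/intercross rule can be written as a short check. cross_type is a hypothetical helper, and its return labels "backcross" and "intercross" are chosen for illustration only.

```r
# Hypothetical helper: decide the cross type from observed genotype strings.
cross_type <- function(geno, genotypes = c("A", "H", "B", "C", "D")) {
  seen <- unique(geno[!is.na(geno)])
  # Only the first two codes (AA and AB) observed -> backcross.
  if (all(seen %in% genotypes[1:2])) "backcross" else "intercross"
}

cross_type(matrix(c("A", "H", NA, "A"), nrow = 2))  # "backcross"
cross_type(matrix(c("A", "B", "H", "D"), nrow = 2)) # "intercross"
```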
Details:

The available formats are comma-delimited (csv), Mapmaker (mm), Gary Churchill's format (gary) and Karl Broman's format (karl). The required files depend on the format; the layout of the comma-delimited format, which is recommended, is specified above.
Note that these formats work only for backcross and intercross data.