Data for a QTL experiment are read from a set of files and converted into an object of class cross. The comma-delimited format (csv) is recommended. All formats require chromosome assignments for the genetic markers, and assume that markers are in their correct order.

    read.cross(format=c("csv", "csvr", "csvs", "csvsr", "mm", "qtx",
                        "qtlcart", "gary", "karl"),
               dir="", file, genfile, mapfile, phefile, chridfile,
               mnamesfile, pnamesfile, na.strings=c("-","NA"),
               genotypes=c("A","H","B","D","C"), estimate.map=TRUE,
               convertXdata=TRUE, ...)
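dir: Directory in which the data files will be found. In Windows, use forward slashes ("/") or double backslashes ("\\") to specify directory trees.
file: The main input file for formats csv, csvr and mm.
genfile: File with genotype data (karl and gary formats only).
mapfile: File with marker positions (… csv and csvr).
phefile: File with phenotype data (karl and gary formats only).
chridfile: File with the chromosome ID for each marker (gary format only).
mnamesfile: File with marker names (gary format only).
pnamesfile: File with phenotype names (gary format only).
na.strings: A vector of strings to be interpreted as missing values (csv, csvr, and gary formats only). For the csv and csvr formats, these are interpreted globally for the entire file.
genotypes: A vector of character strings specifying the genotype codes (csv and csvr formats only). Generally this is a vector of length 5, with the elements corresponding to AA, AB, BB, not BB (i.e., AA or AB), and not AA (i.e., AB or BB).
estimate.map: For formats csv, csvr, qtx, mm, and gary only: if TRUE and marker positions are not included in the input files, the genetic map is estimated using the function est.map.
convertXdata: If TRUE, any X chromosome genotype data are converted to the internal standard, using the columns sex and pgm in the phenotype data if they are available, or by inference if they are not. If FALSE, the X chromosome data are read as is.
... : Additional arguments, passed to the function read.table in the case of the csv and csvr formats. In particular, one may use the argument sep to specify the field separator.

read.cross returns an object of class cross, which is a list with two components:

geno, a list of chromosomes; names(geno) contains the names of the chromosomes. Each chromosome is itself a list, and is given class A or X according to whether it is autosomal or the X chromosome.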
There are two components for each chromosome: data
, a matrix
whose rows are individuals and whose columns are markers, and
map
, either a vector of marker positions (in cM) or a matrix
of dim (2 x n.mar
) where the rows correspond to marker
positions in female and male genetic distance, respectively.
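To make the nesting concrete, here is a toy object assembled by hand in base R that mimics the geno layout described above (chromosome names, class A or X, a data matrix and a map vector). It is only an illustration under the assumption that these components are all that matter: it is not produced by read.cross, and real cross objects may carry further classes or attributes. The chromosome names, marker names and values are made up.

```r
# Toy object built by hand (NOT created by read.cross) mimicking the layout
# described above. matrix() fills column by column, so each column is one
# marker and each row is one individual.
chr1 <- list(
  data = matrix(c(1, 2, 3,       # marker m1 for individuals 1..3
                  2, 2, NA),     # marker m2 for individuals 1..3
                nrow = 3, dimnames = list(NULL, c("m1", "m2"))),
  map  = c(m1 = 0, m2 = 12.5)    # marker positions in cM
)
class(chr1) <- "A"               # autosome

chrX <- list(
  data = matrix(c(1, 2, 1), nrow = 3, dimnames = list(NULL, "m3")),
  map  = c(m3 = 3)
)
class(chrX) <- "X"               # the X chromosome

toy <- list(geno = list("1" = chr1, X = chrX),
            pheno = data.frame(id = 1:3))
class(toy) <- "cross"

names(toy$geno)                                   # chromosome names
sapply(toy$geno, class)                           # "A" or "X" per chromosome
sapply(toy$geno, function(chr) ncol(chr$data))    # markers per chromosome
```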
The genotype data for a backcross is coded as follows: NA = missing,
1 = AA, 2 = AB.
For an F2 intercross, the coding is NA = missing, 1 = AA, 2 = AB, 3
= BB, 4 = not BB (i.e., AA or AB; D in Mapmaker/QTL), 5 = not AA (i.e., AB
or BB; C in Mapmaker/QTL).
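With the default genotypes=c("A","H","B","D","C"), the numeric code is simply the position of a call in that vector, which reproduces the F2 coding above (and codes 1 and 2 for a backcross). The sketch below shows that mapping in base R; encode_calls is a hypothetical helper for illustration, not a function of the package, and rejecting unknown codes is a choice made here.

```r
# Hypothetical helper (not part of R/qtl): converts character genotype calls
# to numeric codes. The code is the position of the call in 'genotypes':
# A=1 (AA), H=2 (AB), B=3 (BB), D=4 (not BB), C=5 (not AA).
encode_calls <- function(calls, genotypes = c("A", "H", "B", "D", "C"),
                         na.strings = c("-", "NA")) {
  calls[calls %in% na.strings] <- NA       # missing-value strings become NA
  codes <- match(calls, genotypes)
  # A call that is neither a genotype code nor a missing-value string would
  # otherwise become NA silently; this sketch stops with an error instead.
  bad <- !is.na(calls) & is.na(codes)
  if (any(bad)) {
    stop("unrecognized genotype code(s): ",
         paste(unique(calls[bad]), collapse = ", "))
  }
  codes
}

encode_calls(c("A", "H", "B", "D", "C", "-"))   # returns c(1, 2, 3, 4, 5, NA)
```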
For a 4-way cross, the mother and father are assumed to have
genotypes AB and CD, respectively. The genotype data for the
progeny is assumed to be phase-known, with the following coding
scheme: NA = missing, 1 = AC, 2 = BC, 3 = AD, 4 = BD, 5 = A = AC or AD,
6 = B = BC or BD, 7 = C = AC or BC, 8 = D = AD or BD, 9 = AC or BD,
10 = AD or BC.

pheno, a data frame of size (n.ind x n.phe) containing the
phenotypes. The phenotype data should contain a column named "sex"
which indicates the sex of each individual, coded either as 0 = female and
1 = male, or as a factor with levels female/male or f/m. Case is
ignored both in the column name and in the factor levels. If no such
phenotype column is included, all individuals are assumed to be
of the same sex.
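The accepted encodings of the sex column can be normalized to 0/1 as sketched below in base R. normalize_sex is a hypothetical helper, not the package's implementation; when no sex column exists, the sketch arbitrarily codes everyone as 0 (female), since the text above only says all individuals are assumed to be of the same sex.

```r
# Hypothetical helper (not the R/qtl implementation): returns 0 = female,
# 1 = male from a "sex" column given as 0/1 or as female/male or f/m labels.
# Case is ignored in the column name and in the labels.
normalize_sex <- function(pheno) {
  col <- which(tolower(names(pheno)) == "sex")
  if (length(col) == 0) {
    # No sex column: all individuals get the same code. Using 0 here is an
    # arbitrary choice made for this sketch.
    return(rep(0L, nrow(pheno)))
  }
  x <- pheno[[col[1]]]
  if (is.numeric(x)) return(as.integer(x))
  lab <- tolower(as.character(x))          # works for factors and strings
  out <- rep(NA_integer_, length(lab))     # unrecognized labels stay NA
  out[lab %in% c("female", "f")] <- 0L
  out[lab %in% c("male", "m")] <- 1L
  out
}

normalize_sex(data.frame(Sex = c("Female", "M", "f", "male")))  # 0 1 0 1
```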
In the case of an intercross, the phenotype data may also contain a
column named "pgm" (for "paternal grandmother") indicating the
direction of the cross. It should be coded as 0/1 with 0 indicating
the cross (AxB)x(AxB) or (BxA)x(AxB) and 1 indicating the cross
(AxB)x(BxA) or (BxA)x(BxA). If no such phenotype column is included,
it will be assumed that all individuals come from the same direction
of cross.
The internal storage of X chromosome data is quite different from that
of autosomal data. Males are coded 1=AA and 2=BB; females with pgm==0
are coded 1=AA and 2=AB; and females with pgm==1 are coded 1=BB and
2=AB. If the argument convertXdata
is TRUE, conversion to this
format is made automatically; if FALSE, no conversion is done,
summary.cross
will likely return a warning, and
most analyses will not work properly.
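The internal X chromosome coding above depends on both sex and pgm. The base-R sketch below applies those three rules to calls written as "AA", "AB" and "BB"; x_internal is a hypothetical helper for illustration, not what convertXdata actually runs, and turning impossible calls (such as a heterozygous male) into NA is an assumption of this sketch.

```r
# Hypothetical helper (not the R/qtl implementation) applying the rules above:
#   males:              AA -> 1, BB -> 2
#   females, pgm == 0:  AA -> 1, AB -> 2
#   females, pgm == 1:  BB -> 1, AB -> 2
# sex: 0 = female, 1 = male. Calls impossible for an individual become NA.
x_internal <- function(calls, sex, pgm) {
  stopifnot(length(calls) == length(sex), length(calls) == length(pgm))
  out  <- rep(NA_integer_, length(calls))
  male <- sex == 1
  fem0 <- sex == 0 & pgm == 0
  fem1 <- sex == 0 & pgm == 1
  # which() drops NA comparisons, so missing calls simply stay NA
  out[which(male & calls == "AA")] <- 1L
  out[which(male & calls == "BB")] <- 2L
  out[which(fem0 & calls == "AA")] <- 1L
  out[which(fem0 & calls == "AB")] <- 2L
  out[which(fem1 & calls == "BB")] <- 1L
  out[which(fem1 & calls == "AB")] <- 2L
  out
}

x_internal(c("AA", "AB", "AB"), sex = c(1, 0, 0), pgm = c(0, 0, 1))  # 1 2 2
```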
The csv format uses a single comma-delimited file (a different field separator may be specified with the argument sep, which will be passed to the function read.table).
The first line should contain the phenotype names followed by the marker names. At least one phenotype must be included; for example, include a numerical index for each individual.
The second line should contain blanks in the phenotype columns,
followed by chromosome identifiers for each marker in all other
columns. If a chromosome has the identifier X or x, it
is assumed to be the X chromosome; otherwise, it is assumed to be an
autosome.
An optional third line should contain blanks in the phenotype columns, followed by marker positions, in cM.
Marker order is taken from the cM positions, if provided; otherwise, it is taken from the column order.
Subsequent lines should give the data, with one line for each individual, and with phenotypes followed by genotypes. If possible, phenotypes are made numeric; otherwise they are converted to factors.
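The layout just described can be explored with base R alone. The sketch below writes a made-up three-individual file to a temporary path, reads it with read.csv, separates phenotype from marker columns by the blank chromosome ID, detects the optional position row, and orders markers by cM within chromosome. It only illustrates the file layout; it is not how read.cross parses the file, and the data and variable names are invented.

```r
# Hypothetical three-individual intercross in the csv layout described above.
csv_lines <- c("id,m1,m2,m3",   # line 1: phenotype names, then marker names
               ",1,1,X",        # line 2: blank for phenotypes, then chromosome IDs
               ",12.5,0,3",     # line 3 (optional): blank, then positions in cM
               "1,A,H,A",       # remaining lines: phenotypes, then genotypes
               "2,B,-,H",
               "3,H,H,B")
f <- tempfile(fileext = ".csv")
writeLines(csv_lines, f)

# Read every cell as text, since the header rows mix names and numbers.
raw <- read.csv(f, header = FALSE, colClasses = "character",
                na.strings = c("-", "NA"))

names_row <- unlist(raw[1, ], use.names = FALSE)
chr_row   <- unlist(raw[2, ], use.names = FALSE)
is_phe    <- chr_row == ""        # phenotype columns have no chromosome ID
# Treat line 3 as positions if its phenotype cells are also blank.
has_pos    <- isTRUE(all(unlist(raw[3, is_phe], use.names = FALSE) == ""))
first_data <- if (has_pos) 4 else 3

mcol <- which(!is_phe)            # file column of each marker
markers <- data.frame(name = names_row[mcol], chr = chr_row[mcol],
                      pos = if (has_pos) as.numeric(unlist(raw[3, mcol])) else NA_real_,
                      col = mcol, stringsAsFactors = FALSE)
if (has_pos) {
  # Order by position within each chromosome, keeping chromosomes in file order.
  markers <- markers[order(match(markers$chr, unique(markers$chr)), markers$pos), ]
}

rows <- raw[first_data:nrow(raw), , drop = FALSE]
geno <- as.matrix(rows[, markers$col, drop = FALSE])
colnames(geno) <- markers$name
# Numeric-looking phenotypes become numbers; anything else becomes a factor.
pheno <- data.frame(lapply(rows[which(is_phe)], type.convert, as.is = FALSE))
names(pheno) <- names_row[is_phe]

markers$name   # m2 (0 cM) now precedes m1 (12.5 cM) on chromosome 1
```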
The cross is determined to be a backcross if only the first two elements
of the genotypes argument (the codes for AA and AB) are found in the data;
otherwise, it is assumed to be an intercross.
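The backcross/intercross rule can be written as a one-line test on the observed calls. guess_cross_type and its return labels are hypothetical, chosen for this base-R sketch; it is not the package's own detection code.

```r
# Hypothetical helper (not the R/qtl implementation) applying the rule above:
# if every non-missing call is one of the first two genotype codes (AA, AB),
# the data are treated as a backcross; otherwise as an intercross.
guess_cross_type <- function(calls, genotypes = c("A", "H", "B", "D", "C"),
                             na.strings = c("-", "NA")) {
  observed <- unique(calls[!is.na(calls) & !(calls %in% na.strings)])
  if (all(observed %in% genotypes[1:2])) "backcross" else "intercross"
}

guess_cross_type(c("A", "H", "-", "A"))   # "backcross"
guess_cross_type(c("A", "H", "B", "D"))   # "intercross"
```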
The csvr format is the same as the csv format, but rotated (or, more precisely,
transposed), so that rows are columns and columns are rows.

The csvs format is like the csv format, but with separate files for the
genotype and phenotype data. The first column in the genotype data must specify the individuals' identifiers; there must be a column in the phenotype data with exactly the same information, and the individuals must be in exactly the same order in the two files.
In the genotype data file, the second row gives the chromosome IDs, and the cell in the second row, first column, must be blank. A third row giving cM positions of markers may be included, in which case the cell in the third row, first column, must also be blank.
There need be no blank rows in the phenotype data file.
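The csvs requirement that identifiers match exactly, in the same order, can be checked before reading. The base-R sketch below assumes the genotype file is read with header=FALSE (so its first two or three rows are the header rows described above) and the phenotype file with a normal header; csvs_ids_match and the example data are hypothetical.

```r
# Hypothetical helper (not part of R/qtl): TRUE if some phenotype column holds
# exactly the identifiers in the genotype file's first column, in the same order.
# n.header = 2 skips the names and chromosome-ID rows; use 3 if the genotype
# file also has a row of cM positions.
csvs_ids_match <- function(geno_raw, pheno, n.header = 2) {
  ids <- geno_raw[[1]][-seq_len(n.header)]
  any(vapply(pheno, function(col) identical(as.character(col), ids),
             logical(1)))
}

g <- read.csv(text = c("id,m1,m2", ",1,1", "a1,A,H", "a2,B,H"),
              header = FALSE, colClasses = "character")
p <- read.csv(text = c("y,id", "1.5,a1", "2.1,a2"))
csvs_ids_match(g, p)   # TRUE: column "id" matches, in the same order
```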
The csvsr format is the same as the csvs format, but rotated (or, more precisely,
transposed), so that rows are columns and columns are rows.

Data can be read in several formats: comma-delimited (csv), rotated
comma-delimited (csvr), comma-delimited with separate files for
genotype and phenotype data (csvs), rotated comma-delimited
with separate files for genotype and phenotype data (csvsr),
Mapmaker (mm), Map Manager QTX (qtx), Gary Churchill's
format (gary) and Karl Broman's format (karl). The
required files and their specifications for each format are given in the format descriptions.
The comma-delimited format is recommended. Note that these formats
work only for backcross and intercross data. The sampledata
directory in the package distribution contains
sample data files in all formats except Gary's.