Reads in a file of allele frequencies in one of two formats.
readFreqs(strPath, FSIGenFormat = TRUE, delim = ",")
A list containing two vectors and a list: loci, counts, and freqs. The vector loci contains the locus names in the frequency file. The vector counts contains the number of individuals (or sometimes alleles) typed at each locus; it will be NULL if the 'Curran' format is used. The list freqs is a list of vectors, with each vector containing the frequencies of the alleles at the locus. The names of the elements of the vectors are the STR allele designations.
The file from which to read the frequencies
Tells the function whether the file is in FSI Genetics format (see below) or 'Curran' format
This argument is used when FSIGenFormat is TRUE, and is the regular expression used to delimit columns of the table. It is set to a single comma by default, and consecutive delimiters are treated as separating empty fields. There probably should be an additional argument which specifies the missing or empty cell symbol, but I won't programme this unless somebody asks for it.
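As an illustration of how a regular-expression delimiter behaves, here is a minimal base R sketch. It uses strsplit on hypothetical rows and is not the function's own code; the sample rows and variable names are assumptions made for the example.

```r
# Hypothetical table rows; not taken from any real frequency file.
rowComma <- "15,,0.2"         # two consecutive commas: an empty middle field
rowMixed <- "15;0.1\t0.2"     # columns separated by a semicolon or a tab

# A single comma as the delimiter (the documented default for delim).
fieldsComma <- strsplit(rowComma, ",")[[1]]

# A character class as the delimiter, matching either ';' or a tab.
fieldsMixed <- strsplit(rowMixed, "[;\t]")[[1]]

# An empty field can then be read as "allele not observed at this locus".
isEmpty <- fieldsComma == ""
```

Here fieldsComma has three elements with an empty string in the middle, which is the sense in which consecutive delimiters mark an empty field.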
James M. Curran
This function reads frequencies in the rectangular allele frequency table format used by FSI Genetics and other journals. This file format assumes a comma separated value (CSV) file, although the column delimiter can be specified. The first column should be labelled 'Allele' and contain the STR allele designations that are used in the data set. The remaining columns will have the locus name as a header, and frequencies that are either blank, zero, or non-zero. Blanks or zeros are used to specify that the allele is not observed (and not used) at the locus. The final row of the file should start with 'N' or 'n' in the first column and give the number of individuals typed (or the number of alleles recorded) in assessing the frequency of the alleles.
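To make the layout concrete, the following sketch writes a small, entirely made-up two-locus table in this format to a temporary file and parses it with base R. It is an illustration of the file structure under stated assumptions, not the code readFreqs itself uses; the locus names, frequencies, and counts are hypothetical.

```r
# Hypothetical two-locus table in the rectangular layout described above.
lines <- c(
  "Allele,D3S1358,vWA",
  "14,0.10,",            # blank cell: allele 14 not observed at vWA
  "15,0.30,0.20",
  "16,0.60,0.80",
  "N,200,200"            # final row: number typed at each locus
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# Read everything as text so blank cells stay as empty strings.
tbl <- read.csv(path, header = TRUE, check.names = FALSE,
                colClasses = "character", strip.white = TRUE)

isN <- toupper(tbl$Allele) == "N"   # locate the final 'N' / 'n' row
loci <- names(tbl)[-1]              # every column after 'Allele' is a locus
counts <- sapply(loci, function(l) as.numeric(tbl[isN, l]))

# For each locus keep only alleles with a non-blank, non-zero frequency.
freqs <- lapply(loci, function(l) {
  f <- as.numeric(tbl[!isN, l])
  names(f) <- tbl$Allele[!isN]
  f[!is.na(f) & f > 0]
})
names(freqs) <- loci
```

With a file like this, the equivalent call to the documented function would be readFreqs(path), with FSIGenFormat left at its default of TRUE.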
The second format is a very particular 'Curran' text format. The first line contains the number of loci in the multiplex. The next line will contain the name of the first locus and the number of alleles, nA, at the locus, separated by a comma. The next nA lines contain the allele number (from 1 to nA), the STR designation of the allele, and the frequency, separated by commas. This pattern is repeated for each locus.
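A small made-up example of the 'Curran' layout, with a base R sketch that walks through it line by line, may help. This is not the function's own implementation, and the locus names and frequencies are hypothetical.

```r
# Hypothetical two-locus file in the 'Curran' layout described above.
lines <- c(
  "2",            # number of loci
  "D3S1358,3",    # locus name, number of alleles nA
  "1,14,0.10",    # allele number, STR designation, frequency
  "2,15,0.30",
  "3,16,0.60",
  "vWA,2",
  "1,15,0.20",
  "2,16,0.80"
)

nLoci <- as.integer(lines[1])
loci <- character(nLoci)
freqs <- vector("list", nLoci)

pos <- 2  # index of the current locus header line
for (i in seq_len(nLoci)) {
  header <- strsplit(lines[pos], ",")[[1]]
  loci[i] <- header[1]
  nA <- as.integer(header[2])

  # The next nA lines hold the alleles for this locus.
  rows <- strsplit(lines[pos + seq_len(nA)], ",")
  f <- as.numeric(sapply(rows, `[`, 3))
  names(f) <- sapply(rows, `[`, 2)
  freqs[[i]] <- f

  pos <- pos + nA + 1  # skip to the next locus header
}
names(freqs) <- loci
```

Note that this layout carries no per-locus sample size, which is consistent with counts being NULL for this format.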