cape (version 2.0.2)

read.geno: Read in and format data for analysis by cape

Description

This function reads in genotype data for cape analysis and formats it into a genotype object used by other functions in cape. The file can be in cape format (see read.population), a csv file, or a compressed RData file generated by saveRDS. See Details for further descriptions of the files.

Usage

read.geno(file.format = c("cape", "csv", "rdata"), filename = NULL, geno.col = NULL, delim = ",", na.strings = "-", check.chr.order = TRUE)

Arguments

file.format
A character string indicating which of the accepted formats describes the file to be read in. See Details for specifics.
filename
An optional character string giving the path of the file to be read in. If this argument is omitted, a dialog box opens for selecting a file.
geno.col
An optional numeric vector specifying which columns the genotypes of interest are in. If omitted, all genotypes are read in.
delim
A character string indicating the delimiter in the data file. The default indicates a comma-separated file (",").
na.strings
The symbol used to denote missing data in the file. Misspecifying this character can lead to errors in processing the file in which cape mistakenly thinks some phenotypes have character values in them.
check.chr.order
A logical value indicating whether the order of the chromosomes should be checked. In general, chromosomes should be entered in increasing numerical order. CAPE does not sort chromosomes, and they will be plotted in the order in which they are entered. If the chromosomes have names other than numbers, X, or Y and so cannot be checked appropriately, or if an alternate order is desired, set check.chr.order to FALSE.
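
The effect of a mismatched na.strings value can be illustrated with base R alone. The following is a minimal sketch that uses read.csv rather than read.geno, with made-up marker names m1 and m2; it only shows the general mechanism by which an unrecognized missing-data symbol turns a numeric column into a character column, and is not a description of cape's internal code.

```r
# Two hypothetical markers coded 0/1/2, with "-" marking a missing genotype.
csv_text <- "m1,m2\n0,1\n-,2\n2,-\n"

# When "-" is declared as the missing-data symbol, it becomes NA and the
# columns stay numeric.
ok <- read.csv(text = csv_text, na.strings = "-", stringsAsFactors = FALSE)

# When the declared symbol does not match the file, "-" is kept as text,
# so every value in the column is read as a character string.
bad <- read.csv(text = csv_text, na.strings = "NA", stringsAsFactors = FALSE)

sapply(ok, class)   # numeric (integer) columns
sapply(bad, class)  # character columns
```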

Value

This function is used primarily when genotype data are too large to include in the data.obj, but can be used for small data as well. This function converts genotype data into a list object called the geno.obj. Upon creation the geno.obj contains five elements: "geno", "marker.names", "chromosome", "marker.location", and "marker.num".
  • geno: A matrix containing the genotype data for the population. Each genotype is stored in a column, and individuals are stored in rows. Regardless of original format, the genotypes are converted to probabilities for use in the data object. Genotypes originally coded as A,H,B, for example, will be encoded as 0,0.5,1 respectively.
  • marker.names: A vector containing the names of all markers.
  • chromosome: A vector containing the chromosome on which each marker is found.
  • marker.location: A vector containing the chromosomal position of each marker.
  • marker.num: A vector containing a numerical identifier for each marker.
These elements are used by later functions to identify markers and should not be changed. After running both read.population and read.geno, the function make.data.obj should be run to transfer marker information from the geno.obj to the data.obj.
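
The A,H,B to 0,0.5,1 mapping described for the geno element can be sketched in base R. This is only an illustration of the encoding, not cape's own implementation; the individual names and the second marker name are hypothetical.

```r
# Hypothetical letter-coded genotypes: individuals in rows, markers in columns.
# matrix() fills column by column, so the first marker is A, H, B.
geno_letters <- matrix(
  c("A", "H", "B",
    "B", "H", NA),
  nrow = 3,
  dimnames = list(paste0("ind", 1:3), c("D15MIT80", "marker2"))
)

# Lookup table for the mapping described above.
code <- c(A = 0, H = 0.5, B = 1)

# Look up each letter by name; missing genotypes stay NA.
# The matrix shape and dimnames are restored afterwards.
geno_prob <- matrix(
  unname(code[as.vector(geno_letters)]),
  nrow = nrow(geno_letters),
  dimnames = dimnames(geno_letters)
)

geno_prob
```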

Details

Genotype data can be contained in one of three file formats: cape, csv, or rdata. For a description of the cape format, see read.population. The csv format must contain the following:
  • header: A header labeling each column is required. The headers typically contain a name for each marker, for example "D15MIT80".
  • chromosomes: The second line of the file must contain the chromosome on which each marker is found.
  • marker location: The third line of the file must contain the chromosomal locations of the markers.
  • genotypes: Genotypes may be coded in one of three different formats: (1) As letters, for example A,H,B, indicating homozygous for allele 1, heterozygous, and homozygous for allele 2 respectively. "H" must be used for heterozygotes, but the other genotypes may be coded with any other letters. (2) As the numbers 0,1,2 indicating homozygous for allele 1, heterozygous, and homozygous for allele 2 respectively. (3) As continuous probabilities of the presence of the reference allele. An individual homozygous for allele 1 would be coded as 0, a heterozygous individual as 0.5, and an individual homozygous for allele 2 as 1. The continuous probabilities allow for uncertainty in genotyping that is not automatically available in the A,H,B or 0,1,2 encodings.
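
The layout described above can be sketched by writing a small file and reading it back with base R. This is a minimal sketch under stated assumptions: the marker names other than D15MIT80, the chromosome labels, and the positions are invented, the file contains only marker columns, and the parsing uses read.csv rather than read.geno, so it has not been checked against cape itself.

```r
# Hypothetical markers; only "D15MIT80" comes from the text above.
markers <- c("D15MIT80", "marker2", "marker3")

csv_lines <- c(
  paste(markers, collapse = ","),  # line 1: header with marker names
  "15,15,X",                       # line 2: chromosome of each marker
  "10.5,42.0,7.3",                 # line 3: position of each marker (made up)
  "A,H,B",                         # remaining lines: one individual per row
  "H,-,A"                          # "-" marks a missing genotype
)

csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# Read everything as text first, because rows mix chromosome labels,
# numeric positions, and letter genotypes.
raw <- read.csv(csv_path, header = TRUE, colClasses = "character",
                na.strings = "-", check.names = FALSE)

chromosome      <- unlist(raw[1, ], use.names = FALSE)
marker_location <- as.numeric(unlist(raw[2, ], use.names = FALSE))
genotypes       <- as.matrix(raw[-(1:2), ])
rownames(genotypes) <- NULL
```

Reading all columns as character and converting afterwards keeps the chromosome and position rows from forcing an unintended type on the genotype rows.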

The rdata format follows the same layout as the csv file, but is used for large data that cannot reasonably be stored in csv format. The file should be saved using the function saveRDS.
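
A file of this kind can be written and checked with base R as sketched below. The exact object cape expects inside the file is not spelled out here, so the table in this example (laid out like the csv description, with invented marker names and values) is an assumption made for illustration only.

```r
# Hypothetical table laid out like the csv description: first row chromosomes,
# second row positions, remaining rows genotypes. Values are made up.
geno_table <- data.frame(
  D15MIT80 = c("15", "10.5", "A", "H"),
  marker2  = c("15", "42.0", "H", "B"),
  stringsAsFactors = FALSE
)

rds_path <- tempfile(fileext = ".RData")

# saveRDS() stores a single object and compresses the file by default.
saveRDS(geno_table, file = rds_path)

# readRDS() is the matching reader; confirm the round trip is lossless.
restored <- readRDS(rds_path)
identical(geno_table, restored)
```

Note that saveRDS and save are not interchangeable: a file written with save must be read with load and cannot be read with readRDS.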

See Also

read.population, read.pheno, make.data.obj