
polysat (version 0.1)

read.Structure: Read Genotypes and Other Data from a Structure File

Description

read.Structure creates a genotype object of the standard polysat format by reading a text file formatted for the software Structure. Optionally, it can also extract PopData and any other extra columns from the file.

Usage

read.Structure(infile, missingin = -9, missingout = -9, sep = "\t",
               markernames = TRUE, labels = TRUE, extrarows = 1, extracols = 0,
               ploidy = 4, getexcols = FALSE)

Arguments

infile
  Character string. The file path to be read.

missingin
  The symbol used to represent missing data in the Structure file.

missingout
  The symbol to be used to represent missing data in the genotype object that is produced.

sep
  The character used to delimit the fields of the Structure file (tab by default).

markernames
  Boolean, indicating whether the file has a header containing marker names.

labels
  Boolean, indicating whether the file has a column containing sample names.

extrarows
  Integer. The number of extra rows that the file has, not counting marker names. This could include rows for recessive alleles, inter-marker distances, or phase information.

extracols
  Integer. The number of extra columns that the file has, not counting sample names (labels). This could include PopData, PopFlag, LocData, Phenotype, or any other extra columns.

ploidy
  Integer. The ploidy of the file, i.e. how many rows there are for each individual.

getexcols
  Boolean, indicating whether the function should return the data from any extra columns.
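
The row-related arguments together determine how many lines the file should contain. The following is a minimal sketch in base R, assuming a hypothetical helper named expected_data_rows (it is not part of polysat), that checks whether a line count is consistent with the markernames, extrarows, and ploidy settings.

```r
# Hypothetical helper, not part of polysat: given the total number of lines
# in a Structure file, work out how many rows hold genotypes and whether that
# number is a whole multiple of the ploidy.
expected_data_rows <- function(n_lines, ploidy, markernames = TRUE, extrarows = 1) {
  # Header rows: one for marker names (if present) plus any extra rows.
  header_rows <- as.integer(markernames) + extrarows
  data_rows <- n_lines - header_rows
  list(
    data_rows   = data_rows,
    individuals = data_rows / ploidy,
    consistent  = data_rows >= 0 && data_rows %% ploidy == 0
  )
}

# A 34-line file with one marker-name row, one extra row, and ploidy = 8
layout <- expected_data_rows(n_lines = 34, ploidy = 8)
print(layout)
```

If consistent is FALSE, the ploidy or extrarows value probably does not match the layout of the file.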

Details

The current version of read.Structure does not support the ONEROWPERIND option of the Structure file format; each locus must occupy exactly one column. If your data are in ONEROWPERIND format, it should be fairly simple to rearrange them in a spreadsheet program so that they can be read by read.GeneMapper instead.

read.Structure uses read.table to read the file into a data frame, then extracts information from that data frame. Because of this, any header rows (particularly the one containing marker names) should have leading tabs (or spaces if sep="") so that the marker names align with their corresponding genotypes. You should be able to open the file in a spreadsheet program and have everything align correctly.

If the file does not contain sample names, set labels=FALSE. The samples will be numbered instead, and you can edit the dimnames of the genotype object afterward if you have the sample names stored separately. Likewise, if markernames=FALSE, the loci are named by the column names that read.table creates, and these can also be edited afterward.
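
The alignment requirement can be illustrated with read.table directly. This is a minimal sketch, not the internals of read.Structure: the marker names (Loc1, Loc2), sample labels, population column, and allele values are all invented for illustration. The header row starts with two tabs so that the marker names sit above the genotype columns rather than above the label and extra columns.

```r
# Write a small, hypothetical tab-delimited file (diploid, one extra column).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "\t\tLoc1\tLoc2",        # two leading tabs: label column + one extra column
  "ind1\t1\t100\t200",
  "ind1\t1\t102\t-9",
  "ind2\t2\t100\t204",
  "ind2\t2\t-9\t-9"
), path)

# Read it the way a spreadsheet would see it: four aligned columns.
raw <- read.table(path, sep = "\t", header = TRUE)
print(raw)

# The marker names land on the third and fourth columns.
print(names(raw)[3:4])
```

Without the leading tabs, the header would have fewer fields than the data rows and the marker names would no longer line up with their genotypes.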

Value

If getexcols=FALSE, the function returns only a genotype object in the standard polysat format. This is a two-dimensional list, where samples are represented in the first dimension and loci in the second dimension. Each element of the list is an integer vector containing all unique alleles for a given locus and sample.

If getexcols=TRUE, the function returns a list with two elements. The first, named ExtraCol, is a data frame, where the row names are the sample names and each column is one of the extra columns from the file (but with each sample appearing only once instead of being repeated ploidy times). The second element is named Genotypes and is the genotype object described above.

References

http://pritch.bsd.uchicago.edu/structure_software/release_versions/v2.3.3/structure_doc.pdf

Hubisz, M. J., Falush, D., Stephens, M. and Pritchard, J. K. (2009) Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources 9, 1322-1332.

Falush, D., Stephens, M. and Pritchard, J. K. (2007) Inferences of population structure using multilocus genotype data: dominant markers and null alleles. Molecular Ecology Notes 7, 574-578.
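
The shape of the return value described under Value can be sketched in base R. The code below is an illustration under stated assumptions, not the implementation used by read.Structure: the data frame raw, its column names, and all values are hypothetical, and -9 is taken as the missing data symbol.

```r
# Hypothetical diploid data as it might look after reading a Structure file:
# one row per allele copy, so each sample appears ploidy (here 2) times.
raw <- data.frame(
  label   = c("ind1", "ind1", "ind2", "ind2"),
  PopData = c(1L, 1L, 2L, 2L),
  Loc1    = c(100L, 102L, 100L, -9L),
  Loc2    = c(200L, -9L, 204L, -9L),
  stringsAsFactors = FALSE
)
missing <- -9L
samples <- unique(raw$label)
loci    <- c("Loc1", "Loc2")

# Two-dimensional list: samples in the first dimension, loci in the second.
genotypes <- array(list(missing), dim = c(length(samples), length(loci)),
                   dimnames = list(samples, loci))
for (s in samples) {
  for (L in loci) {
    alleles <- unique(raw[raw$label == s, L])
    alleles <- alleles[alleles != missing]
    # Keep the missing symbol when no alleles were scored.
    if (length(alleles) > 0) genotypes[[s, L]] <- alleles
  }
}

# Extra columns, one row per sample instead of one row per allele copy.
first    <- !duplicated(raw$label)
ExtraCol <- data.frame(PopData = raw$PopData[first], row.names = raw$label[first])

result <- list(ExtraCol = ExtraCol, Genotypes = genotypes)
print(result$Genotypes["ind1", ])
```

Indexing with double brackets, as in result$Genotypes[["ind1", "Loc1"]], returns the allele vector for a single sample and locus.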

See Also

write.Structure, read.GeneMapper, read.Tetrasat, read.ATetra, read.GenoDive, dominant.to.codominant, read.SPAGeDi

Examples

# create a file to read (normally done in a text editor or spreadsheet
# software)
cat("", "-9-9-9-9-9-9-9-9",
    "WIN1B", "WIN1B", "WIN1B", "WIN1B",
    "WIN1B-9-9-9-9-9-9-9-9", "WIN1B-9-9-9-9-9-9-9-9",
    "WIN1B-9-9-9-9-9-9-9-9", "WIN1B-9-9-9-9-9-9-9-9",
    "MCD1", "MCD1", "MCD1", "MCD1", "MCD1", "MCD1",
    "MCD1-9-9-9-9-9-9-9-9", "MCD1-9-9-9-9-9-9-9-9",
    "MCD2", "MCD2", "MCD2", "MCD2", "MCD2", "MCD2", "MCD2", "MCD2",
    "MCD3", "MCD3",
    "MCD3-9-9-9-9-9-9-9-9", "MCD3-9-9-9-9-9-9-9-9",
    "MCD3-9-9-9-9-9-9-9-9", "MCD3-9-9-9-9-9-9-9-9",
    "MCD3-9-9-9-9-9-9-9-9", "MCD3-9-9-9-9-9-9-9-9",
    sep="\n", file="structtest.txt")

# view the file
cat(readLines("structtest.txt"), sep="\n")

# read the structure file into genotypes and populations
testdata <- read.Structure("structtest.txt", extracols=1, ploidy=8,
                           getexcols=TRUE)

# examine the results
testdata$ExtraCol
testdata$Genotypes
testdata$Genotypes["WIN1B",]
