PopGenome (version 2.1.6)

readData: Read alignments and calculate summary data

Description

This function reads alignments/SNP data in several formats and calculates some summary data.

Usage

readData(path,populations=FALSE,outgroup=FALSE,include.unknown=FALSE,
         gffpath=FALSE,format="fasta",parallized=FALSE,
         progress_bar_switch=TRUE, FAST=FALSE,big.data=FALSE,
         SNP.DATA=FALSE
        )

## S3 method for class 'GENOME':
get.sum.data(object)

Arguments

object
object of class "GENOME"
path
the basepath (folder) of the alignments
outgroup
vector of outgroup sequences
include.unknown
whether positions with unknown nucleotides should be considered.
populations
list of populations. Default: FALSE
gffpath
the basepath (folder) of the corresponding GFF files. Default: FALSE
format
data format. "fasta" is the default. See Details.
parallized
parallel processing to accelerate the reading process. See Details.
progress_bar_switch
whether to show a progress bar
FAST
fast computation. See Details.
big.data
use the ff-package
SNP.DATA
important for reference positions; should be TRUE if you use SNP-data in alignment format

Value

The function creates an object of class "GENOME".

The following slots will be filled in the "GENOME" object:

    Slot                      Description
 1. n.sites                   total number of sites
 2. n.biallelic.sites         number of biallelic sites
 3. n.gaps                    number of sites with gaps
 4. n.unknowns                number of sites with unknown nucleotides
 5. n.valid.sites             number of valid sites
 6. n.polyallelic.sites       number of sites with >2 nucleotides
 7. trans.transv.ratio        transition/transversion ratio of biallelic sites
 8. region.names              names of regions
 9. region.data               some detailed information about the data read

Details

All data (alignments or SNP files) have to be stored in one folder. The folder is the input of this function. If no GFF files (which also have to be stored in a folder) are specified, an alignment in the correct reading frame (starting at a first codon position) is expected. Otherwise synonymous and non-synonymous positions are not identified correctly.

Note: The GFF files have to have EXACTLY the same names (without any extensions like .fas or .gff) as the files storing the nucleotide data to ensure correct matching.

format:
"fasta", "nexus", "phylip", "MAF", "MEGA", "HapMap", "VCF", "RData"

Valid nucleotides are T,t,U,u,G,g,A,a,C,c,N,n,-

parallized:
  - will speed up calculations if you use a very large number of alignments

FAST:
  - will not classify synonymous/non-synonymous SNPs directly
  - fast computation (via compiled C code) of biallelic matrix, biallelic sites, transversions/transitions and biallelic substitutions
  - can be switched to TRUE in case of SNP data without loss of information

big.data:
  - use the ff-package
  - the ff mechanism is used for biallelic.matrix and GFF/GTF information
  - is automatically activated for readVCF or readSNP
  - Note: you should set this to TRUE if you use big chunks of data and you want to concatenate them later in the PopGenome framework (for example: sliding windows of the whole dataset).

SNP.DATA:
  - should be switched to TRUE if you use SNP data in alignment format
  - the corresponding SNP positions can be set via set.ref.positions
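Because the GFF files are matched to the alignments by file name, it can help to compare the two folders before calling readData. The following is a minimal sketch in base R, not part of PopGenome: the helper check_gff_names, the folder names and the file names gene1/gene2 are all hypothetical, and it assumes that "same name" means identical names once any extension is stripped.

```r
# Hypothetical helper (NOT a PopGenome function): compare the file names in
# the alignment folder and the GFF folder after stripping any extension.
check_gff_names <- function(align_dir, gff_dir) {
  strip_ext <- function(dir) tools::file_path_sans_ext(list.files(dir))
  align_names <- strip_ext(align_dir)
  gff_names   <- strip_ext(gff_dir)
  list(
    matched       = intersect(align_names, gff_names),  # present in both folders
    missing_gff   = setdiff(align_names, gff_names),    # alignment without GFF
    unmatched_gff = setdiff(gff_names, align_names)     # GFF without alignment
  )
}

# Demonstration with empty placeholder files in a temporary directory.
# The folder and file names are made up for this illustration.
base      <- tempfile("popgenome_demo_")
align_dir <- file.path(base, "Alignments")
gff_dir   <- file.path(base, "Alignments_GFF")
dir.create(align_dir, recursive = TRUE)
dir.create(gff_dir)
file.create(file.path(align_dir, c("gene1.fas", "gene2.fas")))
file.create(file.path(gff_dir, "gene1.gff"))

res <- check_gff_names(align_dir, gff_dir)
print(res)
```

Any entry in missing_gff or unmatched_gff points to a file that would probably not be paired as intended and is worth renaming before the data are read.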

Examples

# GENOME.class <- readData(".../Alignments", FAST=TRUE)
# GENOME.class <- readData("VCF", format="VCF")
# Note, "Alignments" and "VCF" are folders !
# GENOME.class@region.names
# GENOME.class <- readData(".../Alignments", big.data=TRUE)
# object.size(GENOME.class)
# GENOME.class <- readData(".../Alignments", gffpath=".../Alignments_GFF")
# GENOME.class
# show the result:
# get.sum.data(GENOME.class)
# GENOME.class@region.data
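
The examples above are commented out because they depend on local folders. As a rough, self-contained sketch, the following writes a tiny toy alignment (invented sequences, hypothetical file name toy_region.fas) into its own temporary folder and reads that folder only if PopGenome is installed. It uses only the arguments listed under Usage; the resulting summary values are not shown here because they have not been verified.

```r
# Create a folder that contains nothing but the alignment file,
# since readData() takes a folder (not a single file) as input.
align_dir <- file.path(tempfile("popgenome_example_"), "Alignments")
dir.create(align_dir, recursive = TRUE)

# Toy FASTA alignment: three made-up sequences of equal length,
# starting at a first codon position.
fasta_file <- file.path(align_dir, "toy_region.fas")
writeLines(c(
  ">seq1", "ATGGCTAAATTTGGG",
  ">seq2", "ATGGCTAAGTTTGGG",
  ">seq3", "ATGGCAAAATTTGGG"
), fasta_file)

if (requireNamespace("PopGenome", quietly = TRUE)) {
  library(PopGenome)
  GENOME.class <- readData(align_dir, format = "fasta",
                           progress_bar_switch = FALSE)
  print(get.sum.data(GENOME.class))   # summary data per region
  print(GENOME.class@region.names)    # names of the regions that were read
} else {
  message("PopGenome is not installed; only the toy alignment was written.")
}
```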
