
PopGenome (version 1.2.6)

readData: Reading alignments and calculating summary data

Description

This function reads alignments/SNP-data in several formats and calculates some summary data.

Usage

readData(path,populations=FALSE,outgroup=FALSE,include.unknown=FALSE,
         gffpath=FALSE,format="fasta",parallized=FALSE,
         progress_bar_switch=TRUE, FAST=FALSE,big.data=FALSE,
         SNP.DATA=FALSE
        )

## S3 method for class 'GENOME': get.sum.data(object)

Arguments

object
object of class "GENOME"
path
the basepath (folder) of the alignments
outgroup
vector of outgroup sequences
include.unknown
whether unknown positions should be considered.
populations
list of populations. default: FALSE
gffpath
the basepath of the corresponding gff-files. default:FALSE
format
data format. "fasta" is the default. See Details.
parallized
parallel processing. See Details.
progress_bar_switch
whether to show a progress bar. default: TRUE
FAST
fast computation. See Details.
big.data
using the ff-package
SNP.DATA
important for reference positions; should be TRUE if you use SNP data in alignment format

Value

The function creates an object of class "GENOME".

The following slots will be filled in the "GENOME" object:

   Slot                    Description
1. n.sites                 total number of sites
2. n.biallelic.sites       number of biallelic sites
3. n.gaps                  number of sites with gaps
4. n.unknowns              number of sites with unknown nucleotides
5. n.valid.sites           number of valid sites
6. n.polyallelic.sites     number of sites with >2 nucleotides
7. trans.transv.ratio      transition/transversion ratio of biallelic sites
8. region.names            names of each region
9. region.data             some detailed data information
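To make the per-site counts above more concrete, here is a minimal base-R sketch that derives similar numbers by hand from a toy alignment. It is not PopGenome's implementation: the sequences are invented, the object name toy_summary is hypothetical, and the sketch assumes that any site containing a gap or an unknown nucleotide is left out of the allele counts. n.valid.sites is omitted because this page does not define what counts as a valid site.

```r
# Toy alignment: four invented sequences of eight sites each.
seqs <- c(s1 = "ATGCA-GT", s2 = "ATGTAANT", s3 = "ACGCAAGA", s4 = "ATGGAAGT")
aln <- do.call(rbind, strsplit(seqs, ""))  # rows = sequences, columns = sites

has_gap     <- apply(aln == "-", 2, any)   # sites with at least one gap
has_unknown <- apply(aln == "N", 2, any)   # sites with at least one unknown
clean       <- !(has_gap | has_unknown)    # assumption: only these are scored

n_alleles   <- apply(aln, 2, function(col) length(unique(col)))
biallelic   <- clean & n_alleles == 2
polyallelic <- clean & n_alleles > 2

# A biallelic site is a transition if its two alleles are A/G or C/T.
is_transition <- function(col) {
  alleles <- paste(sort(unique(col)), collapse = "")
  alleles %in% c("AG", "CT")
}
ts <- sum(apply(aln[, biallelic, drop = FALSE], 2, is_transition))
tv <- sum(biallelic) - ts

toy_summary <- c(
  n.sites             = ncol(aln),
  n.gaps              = sum(has_gap),
  n.unknowns          = sum(has_unknown),
  n.biallelic.sites   = sum(biallelic),
  n.polyallelic.sites = sum(polyallelic),
  trans.transv.ratio  = ts / tv
)
print(toy_summary)
```

The values reported by readData for real data may differ from such a hand count, for example depending on include.unknown.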

Details

The data (alignments or SNP files) have to be stored in a folder; that folder is the input of this function. If no GFF file is specified, the alignments are expected to be in the correct reading frame; otherwise the examination of synonymous and nonsynonymous positions is meaningless.

format: "fasta", "nexus", "phylip", "MAF", "MEGA", "HapMap", "VCF", "VCFhap" (haploid), "RData"

parallized:
- only works on UNIX, because of the multicore package.
- will speed up the calculation if you use a large number of alignments.

FAST:
- fast computation of the biallelic matrix, biallelic sites, transversions/transitions and biallelic substitutions.
- can be switched to TRUE for SNP data without losing information.

big.data:
- uses the ff package.
- ff mechanism for biallelic.matrix and gff/gtf information.
- is done automatically for readVCF or readSNP.
- Note: should be switched to TRUE if you use big chunks and want to concatenate them in the PopGenome framework (for example, a sliding window over the whole data).

SNP.DATA:
- should be switched to TRUE if you use SNP data in alignment format.
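The folder-based input described above can be tried out with a small self-contained sketch. It writes one invented FASTA alignment into a temporary folder and passes that folder to readData. The folder name "Alignments", the file name "region1.fas" and the sequences are assumptions made up for illustration; the PopGenome calls only run if the package is installed.

```r
# Create a temporary folder holding one toy alignment (all names are hypothetical).
aln_dir <- file.path(tempdir(), "Alignments")
dir.create(aln_dir, showWarnings = FALSE)

writeLines(
  c(">s1", "ATGCATGTA",
    ">s2", "ATGTATGTA",
    ">s3", "ACGCATGAA",
    ">s4", "ATGCATGTA"),
  file.path(aln_dir, "region1.fas")
)

# readData takes the folder, not a single file, as its input.
if (requireNamespace("PopGenome", quietly = TRUE)) {
  library(PopGenome)
  genome <- readData(aln_dir, format = "fasta")
  print(get.sum.data(genome))    # summary values per region
  print(genome@region.names)     # one entry per file in the folder
}
```

Each file in the folder is treated as one region, so adding further alignment files to the folder should add further entries to the result.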

Examples

# The paths are placeholders: replace "..." with the location of your data.
# Forward slashes are used because a single backslash is an escape
# character inside R strings.

# GENOME.class <- readData(".../Alignments", FAST=TRUE)
# GENOME.class@region.names

# GENOME.class <- readData(".../Alignments", big.data=TRUE)
# object.size(GENOME.class)

# GENOME.class <- readData(".../Alignments", gffpath=".../Alignments_GFF")
# GENOME.class

# show the result:
# get.sum.data(GENOME.class)
# GENOME.class@region.data
