Learn R Programming

cleanUpdTSeq (version 1.10.2)

buildFeatureVector: buildFeatureVector

Description

This function creates a data frame. Fields include peak name, upstream sequence, downstream sequence, and features to be used in classifying the putative polyadenylation site.

Usage

buildFeatureVector(peaks, BSgenomeName = Drerio, upstream = 50L, downstream = 40L, wordSize = 6L, alphabet = c("ACGT"), sampleType = c("TP", "TN", "unknown"), replaceNAdistance = 40L, method = c("NaiveBayes", "SVM"), ZeroBasedIndex = 1L, fetchSeq = FALSE)

Arguments

peaks
An object of GRanges that may contain the upstream and downstream sequence information. This item is created by the function BED2GRangesSeq.
BSgenomeName
Name of the genome to use to get sequence. To find out a list of available genomes, please type available.genomes() in R.
upstream
This is the length of upstream sequence to use in the analysis.
downstream
This is the length of downstream sequence to use in the analysis.
wordSize
This is the size of the word to use as a feature for the upstream sequence. wordSize = 6 should always be used.
alphabet
These are the bases to use, for example DNA bases ACTG.
sampleType
This is the type of sample. For example TP (true positive) or TN (true negative) for training data or unknown for test data.
replaceNAdistance
If there is no A in the downstream sequence, then use this for the average distance of As to the putative polyadenylation site.
method
This is which machine learning method to specify. For this release, method should always be set to NaiveBayes.
ZeroBasedIndex
If the coordinates are set using Zero Based indexing, set this = 1.
fetchSeq
Boolean, for getting upstream and downstream sequence at this step or not.

Value

An object of "featureVector"

Examples

Run this code
testFile = system.file("extdata", "test.bed", package="cleanUpdTSeq")
testSet = read.table(testFile, sep = "\t", header = TRUE)
	
#convert the test set to GRanges with upstream and downstream sequence information
peaks = BED2GRangesSeq(testSet[1:10, ], upstream.seq.ind = 7, downstream.seq.ind = 8, withSeq=TRUE)
#build the feature vector for the test set with sequence information 
testSet.NaiveBayes = buildFeatureVector(peaks,BSgenomeName = Drerio, upstream = 40,
	   downstream = 30, wordSize = 6, alphabet=c("ACGT"),
	   sampleType = "unknown",replaceNAdistance = 30, 
	   method = "NaiveBayes", ZeroBasedIndex = 1, fetchSeq = FALSE)
    
#convert the test set to GRanges without upstream and downstream sequence information
peaks = BED2GRangesSeq(testSet[1:10, ],withSeq=FALSE)
        
#build the feature vector for the test set without sequence information
testSet.NaiveBayes = buildFeatureVector(peaks,BSgenomeName = Drerio, upstream = 40,
        downstream = 30, wordSize = 6, alphabet=c("ACGT"),
        sampleType = "unknown",replaceNAdistance = 30,
        method = "NaiveBayes", ZeroBasedIndex = 1, fetchSeq = TRUE)

Run the code above in your browser using DataLab