SELEX (version 1.4.0)

SELEX: SELEX Package

Description

Functions to assist in discovering transcription factor DNA binding specificities from SELEX-seq experimental data, following the approach of Slattery et al. (2011). For a more comprehensive example, see the package vignette. The sample data used in Slattery et al. is stored in the package's extdata folder and can be accessed with either the base R function system.file or the package function selex.exampledata.
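
As a minimal sketch of the base R route to the sample data, the following locates the package's extdata folder with system.file. It makes no assumption about the file names inside the folder, and because system.file returns an empty string when the package is not installed, the check guards against that case.

```r
# Locate the extdata folder shipped with the SELEX package (base R only).
# system.file() returns "" if the package or folder cannot be found.
extdataDir = system.file("extdata", package = "SELEX")

if (nzchar(extdataDir)) {
  # List whatever example files the installed package provides
  print(list.files(extdataDir))
} else {
  message("SELEX is not installed, so no example data folder was found.")
}
```

The package function selex.exampledata, shown in the Examples section, copies these files into a working directory instead of only locating them.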

Functions available:

selex.affinities       Construct a K-mer affinity table
selex.config           Set SELEX system parameters
selex.counts           Construct or retrieve a K-mer count table
selex.countSummary     Summarize available K-mer count tables
selex.defineSample     Define annotation for an individual sample
selex.exampledata      Extract example data files
selex.fastqPSFM        Construct a diagnostic PSFM for a FASTQ file
selex.getAttributes    Display sample handle attributes
selex.getRound0        Obtain round zero sample handle
selex.getSeqfilter     Display sequence filter attributes
selex.infogain         Compute or retrieve information gain between rounds
selex.infogainSummary  Summarize available information gain values
selex.jvmStatus        Display current JVM memory usage
selex.kmax             Calculate kmax for a dataset
selex.kmerPSFM         Construct a PSFM from a K-mer table
selex.loadAnnotation   Load a sample annotation file
selex.mm               Build or retrieve a Markov model
selex.mmProb           Compute prior probability of sequence using Markov model
selex.mmSummary        Summarize Markov model properties
selex.revcomp          Create forward-reverse complement data pairs
selex.run              Run a standard SELEX analysis
selex.sample           Create a sample handle
selex.sampleSummary    Show samples visible to the current SELEX session
selex.saveAnnotation   Save sample annotations to file
selex.seqfilter        Create a sequence filter
selex.setwd            Set or change the working directory
selex.split            Randomly split a dataset
selex.summary          Display all count table, Markov model, and information gain summaries

Details

Package: SELEX
Type: Package
Version: .99
Date: 2014-11-3
License: GPL

References

Slattery, M., Riley, T.R., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T., Rohs, R., Honig, B., Bussemaker, H.J., and Mann, R.S. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147:1270–1282.

Riley, T.R., Slattery, M., Abe, N., Rastogi, C., Liu, D., Mann, R.S., and Bussemaker, H.J. (2014) SELEX-seq: a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. Methods Mol. Biol. 1196:255–278.

Examples

# Initialize the SELEX package
# (set the JVM heap size before the package is loaded)
options(java.parameters="-Xmx1500M")
library(SELEX)

# Configure the current session
workDir = file.path(".", "SELEX_workspace")
selex.config(workingDir=workDir, verbose=FALSE, maxThreadNumber=4)

# Extract sample data from package, including XML database
sampleFiles = selex.exampledata(workDir)

# Load & display all sample files using XML database
selex.loadAnnotation(sampleFiles[3])
selex.sampleSummary()

# Create sample handles
r0 = selex.sample(seqName="R0.libraries", sampleName="R0.barcodeGC", round=0)
r2 = selex.sample(seqName="R2.libraries", sampleName="ExdHox.R2", round=2)

# Split the r0 sample into testing and training sets
r0.split = selex.split(sample=r0)
r0.split

# Display all currently loaded samples
selex.sampleSummary() 

# Find kmax on the test dataset
k = selex.kmax(sample=r0.split$test)

# Build the Markov model on the training dataset
mm = selex.mm(sample=r0.split$train, order=NA, crossValidationSample=r0.split$test)
# See Markov model R^2 values
selex.mmSummary()

# K-mer counting with an offset
t1 = selex.counts(sample=r2, k=2, offset=14, markovModel=NULL)
# K-mer counting with a Markov model (produces expected counts)
t2 = selex.counts(sample=r2, k=4, markovModel=mm)
# Display all available K-mer statistics
selex.countSummary()

# Calculate information gain
ig = selex.infogain(sample=r2, k=8, markovModel=mm)
# View information gain results
selex.infogainSummary()

# Perform the default analysis
selex.run(trainingSample=r0.split$train, crossValidationSample=r0.split$test, 
  infoGainSample=r2)

# View all stats
selex.summary()
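
To make the offset idea in the K-mer counting step concrete, here is a small base R sketch that counts K-mers in a single fixed window across a few reads. It does not use the SELEX package: the function countKmersAtOffset and the toy reads are hypothetical, and the zero-based offset convention is an assumption for illustration, not a statement of how selex.counts works internally.

```r
# Hypothetical helper (not part of SELEX): count the K-mers that start
# `offset` bases into each read, assuming a zero-based offset.
countKmersAtOffset = function(reads, k, offset) {
  # substring() is one-based, so shift the zero-based offset by one
  kmers = substring(reads, offset + 1, offset + k)
  # Drop reads too short to contain a full window
  kmers = kmers[nchar(kmers) == k]
  counts = table(kmers)
  # Most frequent K-mers first
  sort(counts, decreasing = TRUE)
}

# Toy reads, made up for illustration
toyReads = c("ACGTAC", "ACGTTT", "TTGTAC", "ACG")
kmerCounts = countKmersAtOffset(toyReads, k = 2, offset = 2)
print(kmerCounts)
```

Restricting the count to one window keeps the table small, which is why a short K with a fixed offset is a cheap first check before counting longer K-mers across all windows.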
