SELEX: SELEX Package

Description

Functions to assist in discovering transcription factor DNA binding specificities from SELEX-seq experimental data, following the approach of Slattery et al. (2011). For a more comprehensive example, see the package vignette. The sample data used in Slattery et al. is stored in the package's extdata folder and can be accessed using either the base R function system.file or the package function selex.exampledata.
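
As a minimal sketch of the system.file route mentioned above, the snippet below looks up the extdata folder and lists its contents using base R only. The variable names extdata_dir and example_files are chosen for illustration, and the code makes no assumption about which files the folder contains; if the package is not installed, system.file returns an empty string and the sketch reports that instead.

```r
# Locate the example data folder of an installed package with base R.
# system.file() returns "" when the package or folder is not found.
extdata_dir <- system.file("extdata", package = "SELEX")

if (nzchar(extdata_dir)) {
  # List the bundled files with their full paths
  example_files <- list.files(extdata_dir, full.names = TRUE)
  print(basename(example_files))
} else {
  # Fall back to an empty result so later code can still check the length
  example_files <- character(0)
  message("SELEX is not installed, so no example files were found.")
}
```

The package function selex.exampledata, used in the Examples section, is the alternative route: it extracts the files into a directory of your choice rather than reading them in place.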

Functions available:

selex.affinities

Construct a K-mer affinity table

selex.config

Set SELEX system parameters

selex.counts

Construct or retrieve a K-mer count table

selex.countSummary

Summarize available K-mer count tables

selex.defineSample

Define annotation for an individual sample

selex.exampledata

Extract example data files

selex.fastqPSFM

Construct a diagnostic PSFM for a FASTQ file

selex.getAttributes

Display sample handle attributes

selex.getRound0

Obtain round zero sample handle

selex.getSeqfilter

Display sequence filter attributes

selex.infogain

Compute or retrieve information gain between rounds

selex.infogainSummary

Summarize available information gain values

selex.jvmStatus

Display current JVM memory usage

selex.kmax

Calculate kmax for a dataset

selex.kmerPSFM

Construct a PSFM from a K-mer table

selex.loadAnnotation

Load a sample annotation file

selex.mm

Build or retrieve a Markov model

selex.mmProb

Compute prior probability of sequence using Markov model

selex.mmSummary

Summarize Markov model properties

selex.revcomp

Create forward-reverse complement data pairs

selex.run

Run a standard SELEX analysis

selex.sample

Create a sample handle

selex.sampleSummary

Show samples visible to the current SELEX session

selex.saveAnnotation

Save sample annotations to file

selex.seqfilter

Create a sequence filter

selex.setwd

Set or change the working directory

selex.split

Randomly split a dataset

selex.summary

Display all count table, Markov model, and information gain summaries

Details

Package: SELEX
Type: Package
Version: .99
Date: 2014-11-3
License: GPL

References

Slattery, M., Riley, T.R., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T., Rohs, R., Honig, B., Bussemaker, H.J., and Mann, R.S. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147:1270-1282.

Riley, T.R., Slattery, M., Abe, N., Rastogi, C., Liu, D., Mann, R.S., and Bussemaker, H.J. (2014) SELEX-seq: a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. Methods Mol. Biol. 1196:255-278.

Examples

# Initialize the SELEX package. The JVM heap size has to be set
# before the package (and with it rJava) is loaded.
options(java.parameters="-Xmx1500M")
library(SELEX)

# Configure the current session
workDir = file.path(".", "SELEX_workspace")
selex.config(workingDir=workDir, verbose=FALSE, maxThreadNumber=4)

# Extract sample data from package, including XML database
sampleFiles = selex.exampledata(workDir)

# Load & display all sample files using XML database
selex.loadAnnotation(sampleFiles[3])
selex.sampleSummary()

# Create sample handles
r0 = selex.sample(seqName="R0.libraries", sampleName="R0.barcodeGC", round=0)
r2 = selex.sample(seqName='R2.libraries', sampleName='ExdHox.R2', round=2)

# Split the r0 sample into testing and training sets
r0.split = selex.split(sample=r0)
r0.split

# Display all currently loaded samples
selex.sampleSummary() 

# Find kmax on the test dataset
k = selex.kmax(sample=r0.split$test)

# Build the Markov model on the training dataset
mm = selex.mm(sample=r0.split$train, order=NA, crossValidationSample=r0.split$test)
# See Markov model R^2 values
selex.mmSummary()

# K-mer counting with an offset
t1 = selex.counts(sample=r2, k=2, offset=14, markovModel=NULL)
# K-mer counting with a Markov model (produces expected counts)
t2 = selex.counts(sample=r2, k=4, markovModel=mm)
# Display all available K-mer statistics
selex.countSummary()

# Calculate information gain
ig = selex.infogain(sample=r2, k=8, markovModel=mm)
# View information gain results
selex.infogainSummary()

# Perform the default analysis
selex.run(trainingSample=r0.split$train, crossValidationSample=r0.split$test, 
  infoGainSample=r2)

# View all stats
selex.summary()
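
To make the idea behind "K-mer counting with an offset" in the example above more concrete, here is a rough base R sketch that does not use the package. The helper count_kmers, its column names, and the toy reads are all made up for illustration, and treating offset as a 0-based window start is an assumption of this sketch rather than a statement about how selex.counts is implemented.

```r
# Count K-mers in a set of equal-length sequences using base R only.
# ASSUMPTION: 'offset' is a 0-based window start; offset = NULL counts
# every window. This is an illustration, not the package's algorithm.
count_kmers <- function(seqs, k, offset = NULL) {
  n <- nchar(seqs[1])
  # Window start positions, converted to R's 1-based indexing
  starts <- if (is.null(offset)) seq_len(n - k + 1) else offset + 1
  # Extract the K-mer at each start position from every sequence
  kmers <- unlist(lapply(starts, function(s) substr(seqs, s, s + k - 1)))
  counts <- sort(table(kmers), decreasing = TRUE)
  data.frame(Kmer = names(counts), ObservedCount = as.integer(counts),
             stringsAsFactors = FALSE)
}

toy_reads <- c("ACGTAC", "ACGTTT", "TTGTAC")  # made-up reads, not package data
count_kmers(toy_reads, k = 2)              # all five windows of each read
count_kmers(toy_reads, k = 2, offset = 4)  # only the last window of each read
```

Restricting the count to one window, as in the second call, gives a table for a single position in the reads, whereas the first call pools all positions.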
