selex.run: Run a standard SELEX analysis

Description

Runs a standard SELEX analysis in a single call. The function:

1. Determines kmax on the crossValidationSample, using minCount as the minimum count.
2. Builds a Markov model on the trainingSample using mmMethod, and tests it on the crossValidationSample, with K-mers of length kmax used to determine model fit.
3. Calculates information gain on the infoGainSample for the K-mer lengths in infoRange, using the Markov model order with the highest R^2 to predict previous-round values.
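The three steps can also be run one at a time. The sketch below assumes the individual SELEX functions selex.kmax, selex.mm, and selex.infogain accept the argument names shown (threshold, order, Kmax, k, markovModel). The wrapper name run_steps is hypothetical, and the code is not necessarily how selex.run is implemented internally. Unlike selex.run, this sketch requires an explicit infoRange.

```r
# Hypothetical wrapper illustrating the three steps; not the package's own code.
# Assumes the SELEX package is installed and a session has been configured
# with selex.config() before run_steps() is called.
run_steps <- function(trainingSample, crossValidationSample, infoGainSample,
                      infoRange, minCount = 100,
                      mmMethod = "DIVISION", mmWithLeftFlank = FALSE) {
  # Step 1: determine kmax on the cross-validation sample
  # (argument name 'threshold' is an assumption about selex.kmax)
  kmax <- SELEX::selex.kmax(sample = crossValidationSample,
                            threshold = minCount)

  # Step 2: build Markov models on the training sample and test them on the
  # cross-validation sample; order = NA is assumed to try every order and
  # keep the one with the highest R^2
  mm <- SELEX::selex.mm(sample = trainingSample, order = NA,
                        crossValidationSample = crossValidationSample,
                        Kmax = kmax, mmMethod = mmMethod,
                        mmWithLeftFlank = mmWithLeftFlank)

  # Step 3: information gain for the requested K-mer lengths
  SELEX::selex.infogain(sample = infoGainSample, k = infoRange,
                        markovModel = mm)
}
```

Defining the wrapper does not touch the SELEX package; the package is only needed once run_steps is called with real sample handles.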

Usage

selex.run(trainingSample, crossValidationSample, minCount=100, infoGainSample, infoRange=NULL, mmMethod="DIVISION", mmWithLeftFlank=FALSE)

Arguments

trainingSample

A sample handle to the training dataset.

crossValidationSample

A sample handle to the cross-validation dataset.

minCount

The minimum count used to determine kmax on the cross-validation sample. Defaults to 100.

infoGainSample

A sample handle to the dataset on which to perform the information gain analysis.

infoRange

The range of K-mer lengths for which the information gain should be calculated. If NULL, the range is automatically set to start from the optimal Markov model order + 1 to the length of the variable region. This is the same as k in selex.infogain.
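As a rough illustration of the default described above, the sketch below picks the order with the highest R^2 from a made-up summary table and derives the corresponding range. The data frame, its column names, the helper default_info_range, and the variable region length of 16 are all hypothetical and chosen only for illustration. They are not output from the package.

```r
# Illustrative, made-up cross-validation results (not real SELEX output)
mm_summary <- data.frame(Order = 0:5,
                         R2 = c(0.90, 0.95, 0.97, 0.985, 0.99, 0.988))

# Optimal order = the Markov model order with the highest R^2
optimal_order <- mm_summary$Order[which.max(mm_summary$R2)]

# Hypothetical helper: default range runs from (optimal order + 1)
# up to the length of the variable region
default_info_range <- function(optimal_order, var_length) {
  start <- optimal_order + 1
  if (start > var_length) {
    stop("optimal order + 1 exceeds the variable region length")
  }
  seq(start, var_length)
}

# Assumed variable region length of 16 for this example
info_range <- default_info_range(optimal_order, var_length = 16)
print(info_range)
```

With these illustrative numbers the optimal order is 4, so the range covers K-mer lengths 5 through 16. Passing an explicit infoRange to selex.run overrides this default.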

mmMethod

A character string indicating the algorithm used to evaluate the Markov model conditional probabilities. Can be either "DIVISION" (default) or "TRANSITION".

mmWithLeftFlank

Logical. If TRUE, expected counts are predicted by also considering the sequences in the left flank of the variable region. Defaults to FALSE.

Value

Details

Please see the individual functions or 'References' for more details.

References

Slattery, M., Riley, T.R., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T., Rohs, R., Honig, B., Bussemaker, H.J., and Mann, R.S. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147:1270-1282.

Riley, T.R., Slattery, M., Abe, N., Rastogi, C., Liu, D., Mann, R.S., and Bussemaker, H.J. (2014) SELEX-seq: a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. Methods Mol. Biol. 1196:255-278.

Examples

# Initialize the SELEX package (uncomment if it is not already loaded)
# options(java.parameters="-Xmx1500M")
# library(SELEX)

# Configure the current session
workDir = file.path(".", "SELEX_workspace")
selex.config(workingDir=workDir, verbose=FALSE, maxThreadNumber=4)

# Extract sample data from package, including XML database
sampleFiles = selex.exampledata(workDir)

# Load all sample files using XML database
selex.loadAnnotation(sampleFiles[3])

# Create sample handles
r0 = selex.sample(seqName="R0.libraries", sampleName="R0.barcodeGC", round=0)
r2 = selex.sample(seqName='R2.libraries', sampleName='ExdHox.R2', round=2)

# Split the r0 sample into testing and training datasets
r0.split = selex.split(sample=r0)

# Run entire analysis
selex.run(trainingSample=r0.split$train, crossValidationSample=r0.split$test,
  infoGainSample=r2)

# Display results
selex.mmSummary()[, 1:6]
selex.infogainSummary()[, 1:5]
