ArrayExpressHTSFastQ: ExpressionSet for RNA-Seq raw data files

Description

ArrayExpressHTSFastQ runs the RNA-Seq pipeline on raw RNA-Seq data files and an .sdrf experiment descriptor and produces an ExpressionSet R object.

Usage

ArrayExpressHTSFastQ( accession, 
            organism = c("automatic", "Homo_sapiens", "Mus_musculus" ), 
            quality = c("automatic", "FastqQuality", "SFastqQuality"), 
            options = list(
                    stranded          = FALSE,
                    insize            = NULL,
                    insizedev         = NULL,
                    reference         = "genome",
                    aligner           = "tophat",
                    aligner_options   = NULL,
                    count_feature     = "transcript",
                    count_options     = "",
                    count_method      = "cufflinks",
                    filter            = TRUE,
                    filtering_options = NULL),
            usercloud = TRUE,
            rcloudoptions = list(
                    nnodes        = "automatic",
                    pool          = c("4G", "8G", "16G", "32G", "64G"),
                    nretries      = 4),
            steplist = c("align", "count", "eset"),
            dir = getwd(),
            refdir = getDefaultReferenceDir(),
            want.reports = TRUE,
            stop.on.warnings = FALSE )

Arguments

accession

name of the folder where experiment data is kept and computaton data is stored. The actual data files and descriptor .sdrf and .idf fles should be stored in the 'data' folder, which should be created in this folder.

organism

defines organism filter, "automatic" means all organism found in the descript files will be used. If a specific organism or an array e.g. c("Homo_sapiens", "Mus_musculus") is defined, those organisms will be used to filter out other unwanted organisms and associated data files from the result object. The value canot be "automatic" if no .sdrf is found and no automatic discovery can be performed.

quality

this parameter is used to explicitly select the quality scale used in the fastq data files. By default "automatic" is used. In case quality scale cannot be automatically detected, the user will be prompted to increase the detection depth or set the quality scale manually. "FastqQuality" corresponds to Phred+33, "SFastqQuality" corresponds to Phred+64.

options

defines pipeline options. See getDefaultProcessingOptions

usercloud

defines if the R Cloud will be used to parallel experiment computation. If set to FALSE, the experiment files will be processed sequentially.

rcloudoptions

defines R Cloud options. See getDefaultRCloudOptions

steplist

defines the steps the pipeline will perform

dir

folder where experiment data will be stored and processed. Default is current directory.

refdir

the directory where reference data is located.

want.reports

defines if quality reports are produced. Reports usually make computation longer and eat up more memory. For faster computation use FALSE.

stop.on.warnings

self explanatory. Warnings are normally producesd when there are inconsistencies, which however would allow the result to be produced.

Value

The output is an object of class ExpressionSet containing expression values in assayData (corresponding to the raw sequencing data files), the information contained in the .sdrf file in phenoData, the information in the adf file in featureData and the idf file content in experimentData.

Examples

Run this code

if (isRCloud()) { # disabled on local configs 
    # so as not to affect package building process
    
    # In ArrayExpressHTS/expdata there is testExperiment, which is
    # a very short version of E-GEOD-16190 experiment, placed there 
    # for testing reasons.
    #
    # Experiment in ArrayExpress:
    # http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-16190
    #
    # the following piece of code will take ~1.5 hours to compute
    # on local PC and ~10 minutes on R-Cloud
    #
    
    # if executed on a local PC, make sure tools are available
    # to the pipeline.
    # 
    
    # create a temporary folder where experiment will be copied
    # computing experiment in the package folder may cause issues
    # with file permissions and therefore failures.
    # 
    #
    srcfolder <- system.file("expdata", 
                "testExperiment", package="ArrayExpressHTS");
    
    dstfolder <- tempdir();
    
    file.copy(srcfolder, dstfolder, recursive = TRUE);
    
    # run the pipeline
    #
    # set usercloud = FALSE if executing on local PC, 
    # therefore parallel computation will be disabled
    #
    aehts = ArrayExpressHTSFastQ(accession = "testExperiment", 
        organism = "Homo_sapiens", dir = dstfolder);
        
    # load the expression set object
    loadednames = load(paste(dstfolder, 
                        "/testExperiment/eset_notstd_rpkm.RData", sep=""));
    loadednames;
    
    get('library')(Biobase);
    
    # print out the expression values
    #
    head(assayData(eset)$exprs);
        
    # print out the experiment meta data 
    experimentData(eset);
    pData(eset);
}

Run the code above in your browser using DataLab

Description

Usage

Arguments

Value

See Also

Examples