scalpel: Perform the entire SCALPEL pipeline.

Description

Segmentation, Clustering, and Lasso Penalties (SCALPEL) is a method for neuronal calcium imaging data that identifies the locations of neurons and estimates their calcium concentrations over time. The pipeline involves several steps, each of which is described briefly in the documentation for its corresponding function; see scalpelStep0, scalpelStep1, scalpelStep2, and scalpelStep3 for more details. Full details of the SCALPEL method are provided in Petersen, A., Simon, N., and Witten, D. (Forthcoming). SCALPEL: Extracting Neurons from Calcium Imaging Data.

Usage

scalpel(
  outputFolder,
  rawDataFolder,
  videoHeight,
  minClusterSize = 1,
  lambdaMethod = "trainval",
  lambda = NULL,
  cutoff = 0.18,
  omega = 0.2,
  fileType = "R",
  processSeparately = TRUE,
  minSize = 25,
  maxSize = 500,
  maxWidth = 30,
  maxHeight = 30,
  removeBorder = FALSE,
  alpha = 0.9,
  thresholdVec = NULL,
  maxSizeToCluster = 3000
)

Arguments

outputFolder

Step 0 parameter: The existing directory where the results should be saved.

rawDataFolder

Step 0 parameter: The directory where the raw data version of Y is saved. Y should be a P x T matrix, where P is the total number of pixels per image frame and T is the number of frames of the video; its (i,j)th element contains the fluorescence of the ith pixel in the jth frame. To create Y, vectorize each 2-dimensional image frame by concatenating its columns. If the data is saved in a single file, it should be named "Y_1.mat", "Y_1.rds", "Y_1.txt", or "Y_1.txt.gz" (depending on fileType). If the data is split over multiple files, each file should hold a consecutive chunk of columns, and the files should be named consecutively ("Y_1.mat", "Y_2.mat", etc.; "Y_1.rds", "Y_2.rds", etc.; "Y_1.txt", "Y_2.txt", etc.; or "Y_1.txt.gz", "Y_2.txt.gz", etc.).
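
The layout of Y can be illustrated with a small base-R sketch. It assumes a hypothetical toy video stored as a height x width x frames array, and a temporary directory standing in for rawDataFolder; none of this is part of the scalpel package. Because R stores matrices column-major, as.vector() on a frame concatenates its columns, which matches the required vectorization. The frames are then split into two consecutive chunks of columns and saved as Y_1.rds and Y_2.rds, the naming expected when fileType = "R".

```r
# Hypothetical toy video: 6 frames, each 4 pixels high and 5 pixels wide.
set.seed(1)
videoHeight <- 4
videoWidth <- 5
nFrames <- 6
frames <- array(rnorm(videoHeight * videoWidth * nFrames),
                dim = c(videoHeight, videoWidth, nFrames))

# Vectorize each frame column by column (as.vector() is column-major),
# giving a P x T matrix with P = videoHeight * videoWidth.
Y <- sapply(seq_len(nFrames), function(j) as.vector(frames[, , j]))

# Split the frames (columns of Y) into consecutive chunks and save them
# as Y_1.rds, Y_2.rds, ... in a stand-in for rawDataFolder.
dataDir <- tempdir()
chunks <- split(seq_len(nFrames), rep(1:2, each = nFrames / 2))
for (k in seq_along(chunks)) {
  saveRDS(Y[, chunks[[k]]], file.path(dataDir, paste0("Y_", k, ".rds")))
}
```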

videoHeight

Step 0 parameter: The height of the video (in pixels).
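
Going the other way, videoHeight is what allows a column of Y to be folded back into a 2-dimensional frame. The sketch below uses a hypothetical 20 x 3 matrix for a video 4 pixels high and 5 pixels wide; refilling a column with matrix(..., nrow = videoHeight) restores the frame because R fills matrices column by column, mirroring how the frames were vectorized.

```r
# Hypothetical P x T matrix for a video 4 pixels high and 5 pixels wide.
videoHeight <- 4
Y <- matrix(seq_len(20 * 3), nrow = 20, ncol = 3)

# The video width follows from P = videoHeight * videoWidth.
videoWidth <- nrow(Y) / videoHeight

# Fold column 2 (the second frame) back into a videoHeight x videoWidth image.
frame2 <- matrix(Y[, 2], nrow = videoHeight, ncol = videoWidth)
```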

minClusterSize

Step 3 parameter: The minimum number of preliminary dictionary elements that a cluster must contain in order to be included in the sparse group lasso.

lambdaMethod

Step 3 parameter: How lambda should be chosen: either "trainval" (default), "distn", or "user". With "trainval", lambda is chosen using a training/validation set approach. With "distn", lambda is set to the negative of the 0.1% quantile of the elements of Y belonging to active pixels (i.e., pixels contained in at least one dictionary element); this is computationally faster than "trainval". With "user", the value of lambda is specified directly through the lambda argument.
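
The "distn" rule can be written in a few lines of base R. This is a sketch under stated assumptions, not the package's internal code: Y and the logical vector active (marking pixels contained in at least one dictionary element) are hypothetical simulated inputs, and quantile() is used with its default type.

```r
# Hypothetical inputs: a P x T data matrix Y and a logical vector marking
# active pixels (those in at least one dictionary element).
set.seed(2)
Y <- matrix(rnorm(100 * 50), nrow = 100)
active <- rep(c(TRUE, FALSE), each = 50)

# "distn": the negative of the 0.1% quantile of the entries of Y that
# belong to active pixels (rows of Y).
lambda <- -unname(quantile(Y[active, ], probs = 0.001))
```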

lambda

Step 3 parameter: The value of lambda to use when fitting the sparse group lasso. By default, the value is automatically chosen using the approach specified by lambdaMethod. If a value is provided for lambda, then lambdaMethod will be ignored.

cutoff

Step 2 parameter: A value in [0,1] indicating where to cut the dendrogram that results from hierarchical clustering of the preliminary dictionary elements. The default value is 0.18.
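
To see what cutoff does, the sketch below builds a dendrogram with hclust() from a hypothetical dissimilarity matrix among five preliminary dictionary elements, then cuts it at height 0.18 with cutree(). Average linkage is an assumption made for illustration only; consult scalpelStep2 for the details of the actual clustering.

```r
# Hypothetical dissimilarities (in [0, 1]) among five preliminary elements.
d <- as.dist(matrix(c(
  0.00, 0.05, 0.90, 0.95, 0.85,
  0.05, 0.00, 0.92, 0.90, 0.88,
  0.90, 0.92, 0.00, 0.10, 0.80,
  0.95, 0.90, 0.10, 0.00, 0.82,
  0.85, 0.88, 0.80, 0.82, 0.00), nrow = 5))

# Hierarchical clustering (average linkage assumed), then cut the
# dendrogram at height = cutoff to assign cluster labels.
cutoff <- 0.18
tree <- hclust(d, method = "average")
clusters <- cutree(tree, h = cutoff)
```

Here elements 1 and 2, and elements 3 and 4, merge below the cutoff while element 5 stays on its own; raising cutoff yields fewer, larger clusters.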

omega

Step 2 parameter: A value in [0,1] indicating how to weight spatial vs. temporal information in the dissimilarity metric used for clustering. If omega=1, only spatial information is used. The default value is 0.2.
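
The description above states only that omega weights spatial against temporal information and that omega = 1 uses spatial information alone. One form consistent with that description is a convex combination of a spatial and a temporal dissimilarity matrix, sketched below. The combineDissimilarity() helper and both input matrices are hypothetical; see scalpelStep2 and the paper for the metric SCALPEL actually uses.

```r
# Hypothetical spatial and temporal dissimilarities (entries in [0, 1])
# for three preliminary dictionary elements.
spatialD <- matrix(c(0.0, 0.2, 0.9,
                     0.2, 0.0, 0.8,
                     0.9, 0.8, 0.0), nrow = 3)
temporalD <- matrix(c(0.0, 0.1, 0.3,
                      0.1, 0.0, 0.4,
                      0.3, 0.4, 0.0), nrow = 3)

# Assumed form: a convex combination, so omega = 1 keeps only the
# spatial dissimilarity and omega = 0 keeps only the temporal one.
combineDissimilarity <- function(spatialD, temporalD, omega = 0.2) {
  stopifnot(omega >= 0, omega <= 1)
  omega * spatialD + (1 - omega) * temporalD
}

combined <- combineDissimilarity(spatialD, temporalD, omega = 0.2)
```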

fileType

Step 0 parameter: The format of the raw data files: .rds (fileType="R", the default), .mat (fileType="matlab"), .txt (fileType="text"), or .txt.gz (fileType="zippedText"). Text files should not contain row or column names.
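
For the text formats, base R's write.table() can produce files without row or column names. The sketch writes a hypothetical matrix as Y_1.txt (fileType = "text") and as a gzip-compressed Y_1.txt.gz (fileType = "zippedText") in a temporary directory, then reads the compressed file back to check the round trip; read.table() decompresses gzip files transparently.

```r
# Hypothetical small P x T matrix (4 pixels, 3 frames).
set.seed(5)
Y <- matrix(round(rnorm(12), 3), nrow = 4)
dataDir <- tempdir()

# fileType = "text": plain text with no row or column names.
write.table(Y, file.path(dataDir, "Y_1.txt"),
            row.names = FALSE, col.names = FALSE)

# fileType = "zippedText": the same content, gzip-compressed.
con <- gzfile(file.path(dataDir, "Y_1.txt.gz"), "w")
write.table(Y, con, row.names = FALSE, col.names = FALSE)
close(con)

# Read back; with header = FALSE (the default) the first line is data.
Yback <- as.matrix(read.table(file.path(dataDir, "Y_1.txt.gz")))
```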

processSeparately

Step 0 parameter: A logical scalar indicating whether multiple raw data files should be processed individually or all at once. Processing the files separately may be preferable for larger videos. The default value is TRUE; this argument is ignored if the raw data is saved in a single file.

minSize, maxSize

Step 1 parameter: The minimum and maximum size (in pixels) of a preliminary dictionary element. The default values are 25 and 500, respectively.

maxWidth, maxHeight

Step 1 parameter: The maximum width and height (in pixels), respectively, of a preliminary dictionary element. Both default to 30.
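
A minimal sketch of these Step 1 size checks, assuming a candidate element is represented as a logical mask over the frame, that width and height are measured as the extent of its bounding box, and that the limits are inclusive. The passesSizeLimits() helper is hypothetical and only illustrates how minSize, maxSize, maxWidth, and maxHeight constrain a candidate.

```r
# Hypothetical helper: does a candidate element (a logical mask over the
# frame) satisfy the Step 1 size limits? Bounds are assumed inclusive.
passesSizeLimits <- function(mask, minSize = 25, maxSize = 500,
                             maxWidth = 30, maxHeight = 30) {
  nPixels <- sum(mask)
  rows <- which(rowSums(mask) > 0)  # rows touched by the element
  cols <- which(colSums(mask) > 0)  # columns touched by the element
  height <- if (length(rows) > 0) max(rows) - min(rows) + 1 else 0
  width <- if (length(cols) > 0) max(cols) - min(cols) + 1 else 0
  nPixels >= minSize && nPixels <= maxSize &&
    width <= maxWidth && height <= maxHeight
}

blob <- matrix(FALSE, 60, 60)
blob[11:16, 21:26] <- TRUE   # compact 6 x 6 element, 36 pixels
streak <- matrix(FALSE, 60, 60)
streak[5, 1:40] <- TRUE      # 40 pixels, but 40 pixels wide

passesSizeLimits(blob)    # TRUE
passesSizeLimits(streak)  # FALSE: wider than maxWidth
```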

removeBorder

Step 3 parameter: A logical scalar indicating whether the dictionary elements containing pixels in the 10-pixel border of the video should be removed prior to fitting the sparse group lasso. The default value is FALSE.
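
The border check behind removeBorder can be sketched by converting an element's pixel indices back to row and column positions. This assumes that elements are stored as linear indices into the column-major vectorized frame (as described for rawDataFolder) and that "in the 10-pixel border" means within 10 pixels of any edge; touchesBorder() is a hypothetical helper, not a package function.

```r
# Hypothetical helper: does an element, given as linear pixel indices into
# the column-major vectorized frame, touch the border of the video?
touchesBorder <- function(pixelIdx, videoHeight, videoWidth, border = 10) {
  row <- (pixelIdx - 1) %% videoHeight + 1   # row of each pixel
  col <- (pixelIdx - 1) %/% videoHeight + 1  # column of each pixel
  any(row <= border | row > videoHeight - border |
        col <= border | col > videoWidth - border)
}

# For a 50 x 60 video: pixel (row 25, col 30) is interior,
# pixel (row 5, col 30) lies in the top border.
touchesBorder((30 - 1) * 50 + 25, videoHeight = 50, videoWidth = 60)
touchesBorder((30 - 1) * 50 + 5, videoHeight = 50, videoWidth = 60)
```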

alpha

Step 3 parameter: The value of alpha to use when fitting the sparse group lasso. The default value is 0.9.

thresholdVec

Step 1 parameter (optional, for advanced users): A vector of thresholds to use for image segmentation. If not specified, the defaults are the negative of the minimum of the processed Y data, the negative of the 0.1% quantile of the processed Y data, and the mean of these two values. If multiple raw data files were processed separately, these values are calculated using only the first part of the data and then applied to the remaining parts.
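
The default thresholds can be computed directly in base R. This sketch uses a hypothetical simulated matrix in place of the processed Y data, and the order of the three values in thresholdVec is an assumption.

```r
# Hypothetical stand-in for the processed Y data (P x T).
set.seed(3)
Yprocessed <- matrix(rnorm(400 * 100), nrow = 400)

# Negative minimum, negative 0.1% quantile, and the mean of the two.
t1 <- -min(Yprocessed)
t2 <- -unname(quantile(Yprocessed, probs = 0.001))
thresholdVec <- c(t1, t2, mean(c(t1, t2)))
```

Since the minimum never exceeds the 0.1% quantile, the first threshold is always the largest and the mean falls between the other two.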

maxSizeToCluster

Step 2 parameter (optional, for advanced users): The maximum number of preliminary dictionary elements to cluster at once. We attempt to cluster each overlapping set of preliminary dictionary elements, but if one of these sets is very large (e.g., more than 10,000 elements), memory issues may result. To avoid this, we perform two-stage clustering: we first cluster random subsets of approximately maxSizeToCluster elements each, and then cluster together the representatives from the first stage. Finally, we recalculate the representatives using all of the preliminary dictionary elements in the final clusters. The default value is 3000. If maxSizeToCluster is set to NULL, single-stage clustering is performed regardless of the size of the overlapping sets; this may cause memory issues if the largest overlapping set of preliminary dictionary elements is very large (e.g., more than 10,000 elements).
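
The first stage of the two-stage clustering needs a random partition into groups of roughly maxSizeToCluster elements. One way to do this in base R is sketched below; randomGroups() is a hypothetical helper that deals shuffled indices round-robin, so group sizes differ by at most one and never exceed maxSizeToCluster. It is not claimed to reproduce the package's own partitioning.

```r
# Hypothetical first-stage partition: split n preliminary elements at
# random into the fewest groups of at most maxSizeToCluster elements.
randomGroups <- function(n, maxSizeToCluster = 3000) {
  nGroups <- ceiling(n / maxSizeToCluster)
  shuffled <- sample.int(n)
  # Deal shuffled indices round-robin so group sizes differ by at most one.
  split(shuffled, rep_len(seq_len(nGroups), n))
}

set.seed(4)
groups <- randomGroups(10000, maxSizeToCluster = 3000)
lengths(groups)
```

With 10,000 elements and the default of 3000, this yields four groups of 2500 elements each.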

Value

An object of class scalpel. It can be summarized using summary; passed to scalpelStep1, scalpelStep2, and scalpelStep3 to rerun SCALPEL Steps 1-3 with new parameters; or used with any of the plotting functions: plotFrame, plotThresholdedFrame, plotVideoVariance, plotCandidateFrame, plotCluster, plotResults, plotResultsAllLambda, plotSpatial, plotTemporal, and plotBrightest. The individual elements are described in detail in the documentation for the corresponding step: scalpelStep0, scalpelStep1, scalpelStep2, and scalpelStep3.

Details

Several files containing data from the pipeline, as well as summaries of each step, are saved in various subdirectories of "outputFolder".

Examples


# NOT RUN {
### many of the functions in this package are interconnected so the
### easiest way to learn to use the package is by working through the vignette,
### which is available at ajpete.com/software

# folder to save results (update this to a folder on your computer)
outputFolder = "scalpelResults"
# scalpel expects an existing folder, so create it if needed
dir.create(outputFolder, showWarnings = FALSE)
# location on computer of raw data in R package to use
rawDataFolder = gsub("Y_1.rds", "", system.file("extdata", "Y_1.rds", package = "scalpel"), fixed = TRUE)
#video height of raw data in R package
videoHeight = 30
#run SCALPEL pipeline
scalpelOutput = scalpel(outputFolder = outputFolder, rawDataFolder = rawDataFolder,
                       videoHeight = videoHeight)
#summarize each step
summary(scalpelOutput, step = 0)
summary(scalpelOutput, step = 1)
summary(scalpelOutput, step = 2)
summary(scalpelOutput, step = 3)
# }
