Segmentation, Clustering, and Lasso Penalties (SCALPEL) is a method for neuronal calcium imaging
data that identifies the locations of neurons and estimates their calcium concentrations over time.
The pipeline involves several steps, each of which is described briefly in its corresponding
function. See scalpelStep0, scalpelStep1, scalpelStep2, and scalpelStep3 for more details.
Full details for the SCALPEL method are provided in Petersen, A., Simon, N., and Witten, D. (Forthcoming).
SCALPEL: Extracting Neurons from Calcium Imaging Data.
scalpel(
  outputFolder,
  rawDataFolder,
  videoHeight,
  minClusterSize = 1,
  lambdaMethod = "trainval",
  lambda = NULL,
  cutoff = 0.18,
  omega = 0.2,
  fileType = "R",
  processSeparately = TRUE,
  minSize = 25,
  maxSize = 500,
  maxWidth = 30,
  maxHeight = 30,
  removeBorder = FALSE,
  alpha = 0.9,
  thresholdVec = NULL,
  maxSizeToCluster = 3000
)
outputFolder
Step 0 parameter: The existing directory where the results should be saved.
rawDataFolder
Step 0 parameter: The directory where the raw data version of Y is saved. The data should be a
P x T matrix, where P is the total number of pixels per image frame and T is
the number of frames of the video, for which the (i,j)th element contains the
fluorescence of the ith pixel in the jth frame. To create Y, you should
vectorize each 2-dimensional image frame by concatenating the columns of the image frame. If the data is
saved in a single file, it should be named "Y_1.mat", "Y_1.rds", "Y_1.txt", or "Y_1.txt.gz" (depending on fileType),
and if the data is split over multiple files, they should be split into chunks of the columns
and named consecutively ("Y_1.mat", "Y_2.mat", etc.; "Y_1.rds", "Y_2.rds", etc.; "Y_1.txt", "Y_2.txt", etc.; or "Y_1.txt.gz", "Y_2.txt.gz", etc.).
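As a minimal sketch of the layout described above, the following base R code builds a P x T matrix from a toy video array and saves it both as a single file and as consecutive column chunks. The array `video`, its dimensions, and the temporary folders are made-up assumptions for illustration; only the file naming and the column-wise vectorization come from the description above.

```r
# Hypothetical toy video: 4 x 5 pixel frames, 6 frames (values are made up).
videoHeight <- 4
videoWidth <- 5
nFrames <- 6
set.seed(1)
video <- array(rnorm(videoHeight * videoWidth * nFrames),
               dim = c(videoHeight, videoWidth, nFrames))

# R stores arrays in column-major order, so flattening each frame
# concatenates its columns, which is the vectorization described above.
Y <- matrix(as.vector(video), nrow = videoHeight * videoWidth, ncol = nFrames)

# Single-file layout: everything in Y_1.rds.
singleFolder <- file.path(tempdir(), "rawDataSingle")
dir.create(singleFolder, showWarnings = FALSE)
saveRDS(Y, file.path(singleFolder, "Y_1.rds"))

# Multi-file layout: consecutive chunks of columns (frames) in Y_1.rds, Y_2.rds.
splitFolder <- file.path(tempdir(), "rawDataSplit")
dir.create(splitFolder, showWarnings = FALSE)
chunks <- split(seq_len(nFrames), rep(1:2, each = nFrames / 2))
for (k in seq_along(chunks)) {
  saveRDS(Y[, chunks[[k]], drop = FALSE],
          file.path(splitFolder, paste0("Y_", k, ".rds")))
}
```

With this layout, the value passed as videoHeight would be 4, since each column of Y stacks the columns of a 4-row frame.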
videoHeight
Step 0 parameter: The height of the video (in pixels).
minClusterSize
Step 3 parameter: The minimum number of preliminary dictionary elements that a cluster must contain in order to be included in the sparse group lasso.
lambdaMethod
Step 3 parameter: A description of how lambda should be chosen: either "trainval" (default),
"distn", or "user". A value of "trainval" means lambda will be chosen using a training/validation
set approach. A value of "distn" means lambda will be chosen as the negative of the 0.1% quantile
of elements of active pixels (i.e., those contained in at least one dictionary element) of Y.
Using "distn" is computationally faster than "trainval". Alternatively, with "user",
the value of lambda can be directly specified using lambda.
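The "distn" rule can be illustrated with a short base R sketch. The matrices `Y` and `A` below are made-up stand-ins (random data and a hand-built binary dictionary), and the code only mirrors the verbal description above; it is not taken from the package's internals.

```r
set.seed(1)
nPixels <- 50
nFrames <- 200

# Stand-in for the data matrix Y (pixels x frames); values are made up.
Y <- matrix(rnorm(nPixels * nFrames), nrow = nPixels)

# Hypothetical binary dictionary: one column per dictionary element.
A <- matrix(0, nrow = nPixels, ncol = 3)
A[1:5, 1] <- 1
A[4:9, 2] <- 1
A[20:24, 3] <- 1

# Active pixels are those contained in at least one dictionary element.
activePixels <- rowSums(A) > 0

# Negative of the 0.1% quantile of the elements of Y at active pixels.
lambdaDistn <- -unname(quantile(as.vector(Y[activePixels, ]), probs = 0.001))
```

Because the 0.1% quantile of centered data is typically negative, taking its negative yields a positive value of lambda.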
lambda
Step 3 parameter: The value of lambda to use when fitting the sparse group lasso. By default, the value is automatically
chosen using the approach specified by lambdaMethod. If a value is provided for lambda, then lambdaMethod
will be ignored.
cutoff
Step 2 parameter: A value in [0,1] indicating where to cut the dendrogram that results from hierarchical clustering of the preliminary dictionary elements. The default value is 0.18.
omega
Step 2 parameter: A value in [0,1] indicating how to weight spatial vs. temporal information in the dissimilarity metric
used for clustering. If omega=1, only spatial information is used. The default value is 0.2.
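To make the roles of omega and cutoff concrete, here is a rough sketch in base R. It assumes the dissimilarity is a convex combination of a spatial and a temporal dissimilarity, each in [0,1], and it uses complete-linkage `hclust` as a stand-in; the exact metric and linkage used by the package are described in the SCALPEL paper and may differ. The dissimilarity matrices are random placeholders.

```r
set.seed(1)
nElements <- 6

# Hypothetical symmetric dissimilarity matrices with entries in [0, 1].
makeDissimilarity <- function(n) {
  m <- matrix(runif(n * n), nrow = n)
  m <- (m + t(m)) / 2
  diag(m) <- 0
  m
}
dSpatial <- makeDissimilarity(nElements)
dTemporal <- makeDissimilarity(nElements)

omega <- 0.2    # weight on spatial information (assumed convex combination)
cutoff <- 0.18  # height at which the dendrogram is cut

# omega = 1 keeps only the spatial term; omega = 0 keeps only the temporal term.
dCombined <- omega * dSpatial + (1 - omega) * dTemporal

# Stand-in linkage (assumption): complete linkage from base R.
tree <- hclust(as.dist(dCombined), method = "complete")
clusters <- cutree(tree, h = cutoff)
```

Under this sketch, raising cutoff merges more preliminary dictionary elements into each cluster, while lowering it leaves more of them as separate clusters.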
fileType
Step 0 parameter: Indicates whether raw data is an .rds (default value; fileType="R"), .mat (fileType="matlab"),
.txt (fileType="text"), or .txt.gz (fileType="zippedText") file. Any text files should not have row or column names.
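For the text formats, a small base R sketch of writing a matrix without row or column names is shown below. The matrix values and the temporary file locations are made up, and the whitespace-delimited layout produced by `write.table` is an assumption about what the reader expects rather than something stated above.

```r
# Toy 4 x 3 data matrix (values are made up).
Y <- matrix(1:12 / 10, nrow = 4)

# Plain text file with no row or column names (for fileType = "text").
textFile <- file.path(tempdir(), "Y_1.txt")
write.table(Y, file = textFile, row.names = FALSE, col.names = FALSE)

# Gzip-compressed text file (for fileType = "zippedText").
gzPath <- file.path(tempdir(), "Y_1.txt.gz")
gzConnection <- gzfile(gzPath, open = "w")
write.table(Y, file = gzConnection, row.names = FALSE, col.names = FALSE)
close(gzConnection)

# Reading the plain text file back recovers the same matrix.
Yback <- unname(as.matrix(read.table(textFile)))
```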
processSeparately
Step 0 parameter: Logical scalar giving whether the multiple raw data files should be
processed individually, versus all at once. Processing the files separately may be preferable for larger videos.
Default value is TRUE; this argument is ignored if the raw data is saved in a single file.
minSize, maxSize
Step 1 parameter: The minimum and maximum size, respectively, for a preliminary dictionary element with default values of 25 and 500, respectively.
maxWidth, maxHeight
Step 1 parameter: The maximum width and height, respectively, for a preliminary dictionary element with default values of 30.
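The size and shape limits can be pictured as a filter on candidate components. The helper `passesSizeFilter` below is hypothetical and not part of the package; it assumes size is counted in pixels and that width and height are the bounding-box extents of a binary mask.

```r
# Hypothetical helper: does a binary mask (rows x columns of one frame)
# satisfy the size, width, and height limits?
passesSizeFilter <- function(mask, minSize = 25, maxSize = 500,
                             maxWidth = 30, maxHeight = 30) {
  size <- sum(mask)
  if (size == 0) return(FALSE)
  occupiedRows <- which(rowSums(mask) > 0)
  occupiedCols <- which(colSums(mask) > 0)
  height <- diff(range(occupiedRows)) + 1  # bounding-box height in pixels
  width <- diff(range(occupiedCols)) + 1   # bounding-box width in pixels
  size >= minSize && size <= maxSize && width <= maxWidth && height <= maxHeight
}

# A 6 x 6 blob (36 pixels) passes with the default limits.
blob <- matrix(FALSE, nrow = 40, ncol = 40)
blob[5:10, 5:10] <- TRUE

# A 2 x 2 blob (4 pixels) is smaller than minSize.
tiny <- matrix(FALSE, nrow = 40, ncol = 40)
tiny[5:6, 5:6] <- TRUE

# A 1 x 35 streak has enough pixels but is wider than maxWidth.
streak <- matrix(FALSE, nrow = 40, ncol = 40)
streak[5, 1:35] <- TRUE

results <- c(blob = passesSizeFilter(blob),
             tiny = passesSizeFilter(tiny),
             streak = passesSizeFilter(streak))
```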
removeBorder
Step 3 parameter: A logical scalar indicating whether the dictionary elements containing pixels in the 10-pixel
border of the video should be removed prior to fitting the sparse group lasso. The default value is FALSE.
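A sketch of what such a border check could look like in base R follows. The video dimensions, the dictionary `A`, and the helper `pixelIndex` are hypothetical; the code assumes pixels are indexed by concatenating the columns of each frame, as described for Y, and it is not the package's own implementation.

```r
videoHeight <- 30
videoWidth <- 40
borderWidth <- 10

# Row and column of each pixel under column-wise vectorization of a frame.
pixelRow <- rep(seq_len(videoHeight), times = videoWidth)
pixelCol <- rep(seq_len(videoWidth), each = videoHeight)

# Pixels lying within borderWidth pixels of any edge of the frame.
inBorder <- pixelRow <= borderWidth | pixelRow > videoHeight - borderWidth |
  pixelCol <= borderWidth | pixelCol > videoWidth - borderWidth

# Hypothetical helper mapping (row, column) to the vectorized pixel index.
pixelIndex <- function(r, cc) (cc - 1) * videoHeight + r

# Hypothetical dictionary: element 1 is central, element 2 sits in a corner.
A <- matrix(0, nrow = videoHeight * videoWidth, ncol = 2)
A[as.vector(outer(14:16, 19:21, pixelIndex)), 1] <- 1
A[as.vector(outer(1:3, 1:3, pixelIndex)), 2] <- 1

# Drop every element that contains at least one border pixel.
touchesBorder <- colSums(A[inBorder, , drop = FALSE]) > 0
Akept <- A[, !touchesBorder, drop = FALSE]
```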
alpha
Step 3 parameter: The value of alpha to use when fitting the sparse group lasso. The default value is 0.9.
thresholdVec
Optional advanced user argument: Step 1 parameter: A vector with the desired thresholds to use for image segmentation. If not specified, the default is to use the negative of the minimum of the processed Y data, the negative of the 0.1% quantile of the processed Y data, and the mean of these. If there were multiple raw data files that were processed separately, these values are calculated on only the first part of the data, and then these thresholds are used for the remaining parts.
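The three default thresholds described above can be reproduced with a few lines of base R. The matrix `Yprocessed` is a random stand-in for the processed data, so the numbers themselves are meaningless; the sketch only follows the verbal rule and is not copied from the package.

```r
set.seed(1)

# Stand-in for the processed Y data (pixels x frames); values are made up.
Yprocessed <- matrix(rnorm(1000), nrow = 20)

# Negative of the minimum and negative of the 0.1% quantile.
negMinimum <- -min(Yprocessed)
negQuantile <- -unname(quantile(as.vector(Yprocessed), probs = 0.001))

# The third threshold is the mean of the first two.
thresholdVec <- c(negMinimum, negQuantile, mean(c(negMinimum, negQuantile)))
```

A vector built this way could then be adjusted by hand and supplied through thresholdVec instead of relying on the default.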
maxSizeToCluster
Optional advanced user argument: Step 2 parameter: The maximum number of preliminary dictionary elements to cluster at once. We attempt to cluster each
overlapping set of preliminary dictionary elements, but if one of these sets is very large (e.g., >10,000), memory issues may
result. Thus we perform a two-stage clustering in which we first cluster together random sets of size
approximately equaling maxSizeToCluster and then cluster together the representatives from the first stage.
Finally, we recalculate the representatives using all of the preliminary dictionary elements in the final clusters. The default value is 3000.
If maxSizeToCluster is set to NULL, single-stage clustering is done, regardless of the size of the overlapping sets.
Memory issues may result when using this option to force single-stage clustering if the size of
the largest overlapping set of preliminary dictionary elements is very large (e.g., >10,000).
An object of class scalpel, which can be summarized using summary, used to rerun SCALPEL Steps 1-3 with new parameters using scalpelStep1, scalpelStep2, and scalpelStep3,
or used with any of the plotting functions: plotFrame, plotThresholdedFrame, plotVideoVariance, plotCandidateFrame,
plotCluster, plotResults, plotResultsAllLambda, plotSpatial, plotTemporal, and plotBrightest.
The individual elements are described in detail in the documentation for the corresponding step: scalpelStep0, scalpelStep1, scalpelStep2, and scalpelStep3.
Several files containing data from the pipeline, as well as summaries of each step, are saved in various subdirectories of "outputFolder".
The individual steps in the pipeline can be run using the scalpelStep0, scalpelStep1, scalpelStep2, and scalpelStep3 functions.
Results can be summarized using summary, loaded at a later time using getScalpel, and plotted using plotResults,
plotSpatial, plotTemporal, plotCluster, plotVideoVariance, plotFrame, plotThresholdedFrame, plotCandidateFrame, and plotBrightest.
# NOT RUN {
### many of the functions in this package are interconnected so the
### easiest way to learn to use the package is by working through the vignette,
### which is available at ajpete.com/software
#existing folder to save results (update this to an existing folder on your computer)
outputFolder = "scalpelResults"
#location on computer of raw data in R package to use
rawDataFolder = gsub("Y_1.rds", "", system.file("extdata", "Y_1.rds", package = "scalpel"), fixed = TRUE)
#video height of raw data in R package
videoHeight = 30
#run SCALPEL pipeline
scalpelOutput = scalpel(outputFolder = outputFolder, rawDataFolder = rawDataFolder,
                        videoHeight = videoHeight)
#summarize each step
summary(scalpelOutput, step = 0)
summary(scalpelOutput, step = 1)
summary(scalpelOutput, step = 2)
summary(scalpelOutput, step = 3)
# }