clst (version 1.20.0)

classify: classify

Description

Functions to perform classification by local similarity threshold.

Usage

classify(dmat, groups, dvect, method = "mutinfo", minScore = 0.45, doffset = 0.5, dStart = NA, maxDepth = 10, minGroupSize = 2, objNames = names(dvect), keep.data = TRUE, ..., verbose = FALSE)
classifyIter(dmat, groupTab, dvect, dStart = NA, multiple = FALSE, keep.data = TRUE, ..., verbose = FALSE)
classifier(dmat, groups, dvect, method = 'mutinfo', minScore = 0.45, doffset = 0.5, dStart = NA, minGroupSize = 2, objNames = names(dvect), keep.data = TRUE, ..., verbose = FALSE, depth = 1)
pull(dmat, groups, index)
pullTab(dmat, groupTab, index)

Arguments

dmat
Square matrix of pairwise distances.
groups
Object coercible to a factor identifying group membership of objects corresponding to either margin of dmat.
groupTab
a data.frame representing a taxonomy, with columns in increasing order of specificity from left to right (i.e., Kingdom --> Species). Column names are used to name taxonomic ranks. Rows correspond to margins of dmat.
dvect
numeric vector of distance from query sequence to each reference corresponding to margins of dmat.
method
The method for calculating the threshold; only 'mutinfo' is currently implemented.
minScore
Threshold value for the match score to define a match.
doffset
Offset used in the denominator of the expression to calculate match score to penalize very small groups of reference objects.
dStart
Initial value of the distance threshold D.
multiple
if TRUE, stops at the rank that yields at least one match; if FALSE, continues to perform classification until exactly one match is identified.
maxDepth
Maximum number of iterations that will be attempted to perform classification.
minGroupSize
Minimum number of members that at least one group must contain for classification to be attempted.
objNames
Optional character identifiers for objects corresponding to margin of dmat.
keep.data
Populates thresh$distances (see findThreshold) if TRUE.
verbose
Terminal output is produced if TRUE.
index
an integer specifying an element in dmat
...
see Details
depth
specifies iteration number (not meant to be user-defined)
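
The shapes the arguments above are expected to have can be checked before calling the classifier. The following is a minimal base-R sketch on made-up toy data: the point coordinates, group labels, and taxon names are all hypothetical, and the checks reflect only what the argument descriptions state (a square dmat, and groups, groupTab, and dvect aligned with its margins).

```r
# Toy inputs for illustration only; all values and names are made up.
set.seed(1)
pts  <- matrix(rnorm(12), ncol = 2)   # six reference objects in 2-D
dmat <- as.matrix(dist(pts))          # 6 x 6 matrix of pairwise distances

# One group label per reference object (for classify / classifier)
groups <- factor(c("a", "a", "a", "b", "b", "b"))

# One taxonomy row per reference object (for classifyIter);
# columns run from least to most specific rank
groupTab <- data.frame(
  Genus   = c("G1", "G1", "G1", "G1", "G2", "G2"),
  Species = c("G1 s1", "G1 s1", "G1 s2", "G1 s2", "G2 s1", "G2 s1")
)

# Distance from a hypothetical query point to every reference object
query <- c(0, 0)
dvect <- sqrt(rowSums(sweep(pts, 2, query)^2))
names(dvect) <- rownames(dmat)

# Consistency checks implied by the argument descriptions
stopifnot(
  nrow(dmat) == ncol(dmat),
  length(groups) == nrow(dmat),
  nrow(groupTab) == nrow(dmat),
  length(dvect) == nrow(dmat)
)
```

With only six objects this data is meant to show the expected layout, not to produce a meaningful classification.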

Value

classify and classifyIter return x, a list of lists, one for each iteration of the classifier. Each sub-list contains the following named elements:
depth
An integer indicating the number of the iteration (where x[[i]]$depth == i)
tally
a data.frame with one row for each group of reference objects. Columns below and above contain counts of reference objects with distance values less than or greater than D, respectively; score contains the match score $S$; match is 1 if $S \ge minScore$, 0 otherwise; the remaining columns give the minimum, median, and maximum values of distances to all members of the indicated group.
details
a list of two matrices, named "below" and "above", itemizing each object with index i in the reference set with distances below or above the distance threshold D, respectively. Columns include index, the index i; dist, the distance between the object and the query; and group, indicating the classification of the object.
matches
Character vector naming groups to which query object belongs.
thresh
object returned by findThreshold
params
a list of input arguments and their values
input
list containing copies of dvect and groups
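
As a sketch of how the returned list might be inspected, the code below walks over the iterations and prints the elements named above for the last one. It assumes the clst package is installed and that the result behaves as a plain list of lists as described; the call is guarded so the snippet still runs when the package is absent.

```r
data(iris)
dmat   <- as.matrix(dist(iris[, 1:4], method = "euclidean"))
groups <- iris$Species
ind    <- 1   # treat the first observation as the "unknown" query

if (requireNamespace("clst", quietly = TRUE)) {
  cc <- clst::classify(dmat[-ind, -ind], groups[-ind], dvect = dmat[ind, -ind])

  # One sub-list per iteration; depth records the iteration number
  print(sapply(cc, function(x) x$depth))

  # Per-group summary and resulting matches from the last iteration
  last <- cc[[length(cc)]]
  print(last$tally)
  print(last$matches)
}
```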

Details

classify performs iterative classification. See the vignette for package clst for a description of the classification algorithm. classifier performs non-iterative classification, and is typically not called directly by the user.

The functions pull and pullTab are used to remove a single element of dmat for the purpose of performing classification against the remaining elements. The value of these two functions (a list) can be passed directly to classify or classifyIter (see examples).
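
To make the leave-one-out idea concrete, here is a base-R sketch of what a function like pull plausibly returns. The helper pull_sketch is hypothetical and is not the package's implementation; its element names are an assumption chosen to match the dmat, groups, and dvect arguments of classify, mirroring the manual subsetting used in the examples.

```r
# Hypothetical stand-in for pull, for illustration only.
# Removes object `index` from the reference set and keeps its
# distances to the remaining objects as the query vector.
pull_sketch <- function(dmat, groups, index) {
  list(
    dmat   = dmat[-index, -index, drop = FALSE],  # reference-only distances
    groups = groups[-index],                      # reference-only labels
    dvect  = dmat[index, -index]                  # query-to-reference distances
  )
}

data(iris)
dmat   <- as.matrix(dist(iris[, 1:4], method = "euclidean"))
groups <- iris$Species

args <- pull_sketch(dmat, groups, 51)
str(args, max.level = 1)
```

Because the result is a named list, it can be handed to a function through do.call, which is the pattern the examples use with the real pull.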

See Also

findThreshold

Examples

## illustrate classification using the Iris data set
library(clst)
data(iris)
dmat <- as.matrix(dist(iris[,1:4], method="euclidean"))
groups <- iris$Species

## remove one element from the data set and perform classification using
## the remaining elements as the reference set
ind <- 1
cat(paste('class of "unknown" sample is Iris',groups[ind]),fill=TRUE)
cc <- classify(dmat[-ind,-ind], groups[-ind], dvect=dmat[ind, -ind])
printClst(cc)

## this operation can be performed conveniently using the `pull` function
ind <- 51
cat(paste('class of "unknown" sample is Iris',groups[ind]),fill=TRUE)
cc <- do.call(classify, pull(dmat, groups, ind)) 
printClst(cc)
str(cc)
