analogue (version 0.10-0)

roc: ROC curve analysis

Description

Fits Receiver Operating Characteristic (ROC) curves to training set data. Used to determine the critical value of a dissimilarity coefficient that best discriminates between assemblage types in palaeoecological data sets, whilst minimising the false positive error rate (FPF).

Usage

roc(object, groups, k = 1, ...)

## S3 method for class 'default'
roc(object, groups, k = 1, thin = FALSE, max.len = 10000, ...)

## S3 method for class 'mat'
roc(object, groups, k = 1, ...)

## S3 method for class 'analog'
roc(object, groups, k = 1, ...)

Arguments

object
an R object.
groups
a vector of group memberships, one entry per sample in the training set data. Can be a factor, and will be coerced to one if the supplied vector is not a factor.
k
numeric; the k closest analogues to use to calculate ROC curves.
thin
logical; should the points on the ROC curve be thinned? See Details, below.
max.len
numeric; length of the analogue and non-analogue vectors. Used as the limit to which points on the ROC curve are thinned.
...
arguments passed to/from other methods.

Value

A list with two components: (i) statistics, a summary of ROC statistics for each level of groups and for a combined ROC analysis, and (ii) roc, a list of ROC objects, one per level of groups. Each ROC object is itself a list with the following components:

  • TPF: The true positive fraction.
  • FPE: The false positive error.
  • optimal: The optimal dissimilarity value, assessed where the slope of the ROC curve is maximal.
  • AUC: The area under the ROC curve.
  • se.fit: Standard error of the AUC estimate.
  • n.in: numeric; the number of samples within the current group.
  • n.out: numeric; the number of samples not in the current group.
  • p.value: The p-value of a Wilcoxon rank sum test on the two sets of dissimilarities. This is also known as a Mann-Whitney test.
  • roc.points: The unique dissimilarities at which the ROC curve was evaluated.
  • max.roc: numeric; the position along the ROC curve at which the slope of the ROC curve is maximal. This is the index of this point on the curve.
  • prior: numeric, length 2. Vector of observed prior probabilities of true analogues and true non-analogues in the group.
  • analogue: a list with components yes and no containing the dissimilarities for true analogues and true non-analogues in the group.

Details

A ROC curve is generated from the within-group and between-group dissimilarities.

For each level of the grouping vector (groups), the dissimilarities between each group member and its k closest analogues within that group are compared with the k closest dissimilarities between each non-group member and the group member samples.
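The selection of the k closest dissimilarities described above can be sketched in base R. This is only one reading of the description, not the code used by roc: the helper k_closest, the simulated data and the use of dist are all assumptions made for illustration.

```r
## Hypothetical data: 8 samples, 5 taxa, two groups of 4 (illustration only)
set.seed(10)
x <- matrix(runif(8 * 5), nrow = 8)
d <- as.matrix(dist(x))                  # full dissimilarity matrix
groups <- factor(rep(c("A", "B"), each = 4))
k <- 2

## k_closest is a hypothetical helper, not part of the analogue package
k_closest <- function(d, groups, level, k) {
  members <- which(groups == level)
  others  <- which(groups != level)
  ## within group: each member's k smallest dissimilarities to the
  ## other members of its group (the sample itself is excluded)
  yes <- unlist(lapply(members, function(i) {
    sort(d[i, setdiff(members, i)])[seq_len(k)]
  }))
  ## between groups: each non-member's k smallest dissimilarities
  ## to the members of the group
  no <- unlist(lapply(others, function(i) {
    sort(d[i, members])[seq_len(k)]
  }))
  list(yes = unname(yes), no = unname(no))
}

res <- k_closest(d, groups, level = "A", k = k)
str(res)
```

The yes and no names mirror the analogue component described under Value, but the contents here are produced by the sketch alone.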

If one is able to discriminate between members of different groups on the basis of assemblage dissimilarity, then the dissimilarities between samples within a group will be small compared to the dissimilarities between group members and non-group members.
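A minimal base R sketch of this idea follows, using simulated dissimilarities rather than real assemblage data. The vectors within and between and their gamma distributions are assumptions chosen only so that within-group values tend to be smaller; the calculation is a generic ROC construction and is not taken from the package source.

```r
## Hypothetical dissimilarities, simulated purely for illustration
set.seed(42)
within  <- rgamma(50,  shape = 2, rate = 8)  # group member vs group member
between <- rgamma(200, shape = 6, rate = 8)  # non-member vs group member

## Candidate critical values: every unique observed dissimilarity
cutoffs <- sort(unique(c(within, between)))

## A pair is classed as an analogue when its dissimilarity is <= the cutoff
tpf <- sapply(cutoffs, function(d) mean(within  <= d))  # true positive fraction
fpf <- sapply(cutoffs, function(d) mean(between <= d))  # false positive fraction

## AUC as the probability that a randomly chosen within-group
## dissimilarity is smaller than a randomly chosen between-group one
## (ties count one half)
auc <- mean(outer(within, between, "<")) +
  0.5 * mean(outer(within, between, "=="))

## Wilcoxon rank sum (Mann-Whitney) test on the two sets
wt <- wilcox.test(within, between)

auc
wt$p.value
```

An AUC near 1 indicates that the two sets of dissimilarities are well separated; an AUC near 0.5 indicates that dissimilarity does not discriminate the group.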

thin is useful for large problems, where the number of analogue and non-analogue distances can conceivably be large and thus overflow the largest number R can work with. This option is also useful to speed up computations for large problems. If thin == TRUE, then the larger of the analogue or non-analogue distances is thinned to a maximum length of max.len. The smaller set of distances is scaled proportionally. In thinning, we approximate the distribution of distances by taking max.len (or a fraction of max.len for the smaller set of distances) equally-spaced probability quantiles of the distribution as a new set of distances.
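The quantile-based thinning described above might look roughly like the following base R sketch. The helper thin_dists, the simulated distances and the rounding of the scaled length are assumptions; the package's internal implementation may differ in detail.

```r
## Hypothetical distances; the non-analogue set is the larger one here
set.seed(1)
analogue_d     <- runif(2000)
non_analogue_d <- runif(50000)
max.len <- 10000

## thin_dists is a hypothetical helper: replace x by `len` equally
## spaced probability quantiles of its empirical distribution
thin_dists <- function(x, len) {
  probs <- seq(0, 1, length.out = len)
  unname(quantile(x, probs = probs))
}

## The larger set is thinned to max.len; the smaller set is scaled
## by the same fraction (rounding is an assumption of this sketch)
frac    <- max.len / max(length(analogue_d), length(non_analogue_d))
len_ana <- round(length(analogue_d) * frac)
len_non <- round(length(non_analogue_d) * frac)

thin_ana <- thin_dists(analogue_d, len_ana)
thin_non <- thin_dists(non_analogue_d, len_non)

c(analogue = length(thin_ana), non_analogue = length(thin_non))
```

Because the probabilities run from 0 to 1, the minimum and maximum of each original set are retained in the thinned set.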

References

Brown, C.D., and Davis, H.T. (2006) Receiver operating characteristics curves and related decision measures: A tutorial. Chemometrics and Intelligent Laboratory Systems 80, 24–38.

Gavin, D.G., Oswald, W.W., Wahl, E.R. and Williams, J.W. (2003) A statistical approach to evaluating distance metrics and analog assignments for pollen records. Quaternary Research 60, 356–367.

Henderson, A.R. (1993) Assessing test accuracy and its clinical consequences: a primer for receiver operating characteristic curve analysis. Annals of Clinical Biochemistry 30, 834–846.

See Also

mat for fitting of MAT models. bootstrap.mat and mcarlo for alternative methods for selecting critical values of analogue-ness for dissimilarity coefficients.

Examples

## load the package and the example data
library(analogue)
data(swapdiat, swappH, rlgh)

## merge training and test set on columns
dat <- join(swapdiat, rlgh, verbose = TRUE)

## extract the merged data sets and convert to proportions
swapdiat <- dat[[1]] / 100
rlgh <- dat[[2]] / 100

## fit an analogue matching (AM) model using the squared chord distance
## measure - need to keep the training set dissimilarities
swap.ana <- analog(swapdiat, rlgh, method = "SQchord",
                   keep.train = TRUE)

## fit the ROC curve to the SWAP diatom data using the AM results
## Generate a grouping for the SWAP lakes
## (hclust's "ward" method was renamed "ward.D" in R >= 3.1.0)
clust <- hclust(as.dist(swap.ana$train), method = "ward.D")
grps <- cutree(clust, 12)

## fit the ROC curve
swap.roc <- roc(swap.ana, groups = grps)
swap.roc

## draw the ROC curve
plot(swap.roc, 1)

## draw the four default diagnostic plots
layout(matrix(1:4, ncol = 2))
plot(swap.roc)
layout(1)
