presence.absence.hist: Presence/Absence Histogram

Description

Produces a histogram of predicted probabilities with each bar subdivided by observed values. presence.absence.hist also includes an option to mark several types of optimal thresholds along each plot.

Usage

presence.absence.hist(DATA, which.model = 1, na.rm = FALSE, 
xlab = "predicted probability", ylab = "number of plots", 
main = NULL, model.names = NULL, color = NULL, N.bars = 20, 
truncate.tallest = FALSE, ylim = 1.25 * range(0, apply(counts, 2, sum)),
opt.thresholds = NULL, threshold = 101, opt.methods = NULL, 
req.sens, req.spec, obs.prev = NULL, smoothing = 1, add.legend = TRUE, 
legend.text=c("present","absent"), legend.cex = 0.8, add.opt.legend = TRUE, 
opt.legend.text = NULL, opt.legend.cex = 0.7, pch = NULL, FPC, FNC)

Value

creates a graphical plot

Arguments

DATA

a matrix or dataframe of observed and predicted values where each row represents one plot and where columns are:

`DATA[,1]`	plot ID	text
`DATA[,2]`	observed values	zero-one values
`DATA[,3]`	predicted probabilities from first model	numeric (between 0 and 1)
`DATA[,4]`	predicted probabilities from second model, etc...
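As an illustration of the layout above, here is a minimal sketch that builds a toy data frame in base R. The object name `toy`, its column names, and the simulated values are all hypothetical; only the column order (ID, observed, then one column of predictions per model) comes from the description above.

```r
# Hypothetical toy data in the column order described above.
set.seed(1)
n <- 50
observed <- rbinom(n, size = 1, prob = 0.3)   # zero-one observed values

toy <- data.frame(
  plotID   = paste0("plot", seq_len(n)),      # column 1: plot ID (text)
  observed = observed,                         # column 2: observed 0/1
  # column 3: predictions loosely related to the observations
  model1   = plogis(rnorm(n, mean = ifelse(observed == 1, 1, -1))),
  # column 4: uninformative predictions from a second "model"
  model2   = runif(n),
  stringsAsFactors = FALSE
)

str(toy)
```

With a data frame shaped like this, `which.model = 1` would refer to the `model1` column and `which.model = 2` to `model2`.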

which.model

a number indicating which model from DATA should be used

na.rm

a logical indicating whether missing values should be removed

xlab

a title for the x axis

ylab

a title for the y axis

main

an overall title for the plot

model.names

a vector of the names of each model included in DATA

color

colors for presence/absence. Defaults to Presence = dark gray, Absence = light gray.

N.bars

number of bars in histogram

truncate.tallest

a logical indicating if the tallest bar should be truncated to fit on plot

ylim

limit for y axis. To allow room for legend box ylim should be somewhat larger than largest bar.
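The default `ylim` expression in Usage refers to a `counts` object whose columns are summed per bar. The sketch below shows one way such a matrix could be built in base R and how the default limit follows from it. It is an approximation for illustration, not the package's internal code; the binning with `cut()` and the simulated data are assumptions.

```r
# Simulated observations and predictions (hypothetical data).
set.seed(42)
observed  <- rbinom(200, 1, 0.4)
predicted <- plogis(rnorm(200, mean = ifelse(observed == 1, 1, -1)))

N.bars <- 20
breaks <- seq(0, 1, length.out = N.bars + 1)
bins   <- cut(predicted, breaks = breaks, include.lowest = TRUE)

# Rows: present / absent; columns: one per histogram bar.
counts <- table(
  observed = factor(observed, levels = c(1, 0),
                    labels = c("present", "absent")),
  bin = bins
)

# Same expression as the default in Usage: 25% headroom above the tallest bar.
ylim <- 1.25 * range(0, apply(counts, 2, sum))
ylim
```

The extra 25% above the tallest bar is what leaves room for the legend box.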

opt.thresholds

a logical indicating whether the optimal thresholds should be calculated and plotted, or a vector specifying thresholds to plot

threshold

cutoff values between zero and one used for translating predicted probabilities into 0/1 values. It can be a single value between zero and one, a vector of values between zero and one, or a positive integer representing the number of evenly spaced thresholds to calculate; the default shown in Usage is 101. To get reasonably good optimizations, there should be a large number of thresholds. (Only used if opt.thresholds = TRUE.)
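The three accepted forms of `threshold` can be sketched with a small helper. `expand_threshold` is a hypothetical function written for this illustration, and the use of `seq()` to space the thresholds evenly over [0, 1] is an assumption about how an integer count is interpreted.

```r
# Hypothetical helper: turn the 'threshold' argument into a vector of cutoffs.
expand_threshold <- function(threshold) {
  is_count <- length(threshold) == 1 &&
    threshold > 1 &&
    threshold == round(threshold)
  if (is_count) {
    # A positive integer: that many evenly spaced thresholds on [0, 1].
    seq(0, 1, length.out = threshold)
  } else {
    # A single value or a vector: must already lie between zero and one.
    stopifnot(all(threshold >= 0 & threshold <= 1))
    threshold
  }
}

expand_threshold(0.5)            # a single cutoff
expand_threshold(c(0.25, 0.75))  # a vector of cutoffs
head(expand_threshold(101))      # 101 cutoffs spaced 0.01 apart
```

A finer grid gives the optimization methods more candidate cutoffs to choose from, which is why a large count is recommended.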

opt.methods

what methods should be used to optimize thresholds. Argument can be given either as a vector of method names or method numbers. Possible values are:

1	`Default`	threshold=0.5
2	`Sens=Spec`	sensitivity=specificity
3	`MaxSens+Spec`	maximizes (sensitivity+specificity)/2
4	`MaxKappa`	maximizes Kappa
5	`MaxPCC`	maximizes PCC (percent correctly classified)
6	`PredPrev=Obs`	predicted prevalence=observed prevalence
7	`ObsPrev`	threshold=observed prevalence
8	`MeanProb`	mean predicted probability
9	`MinROCdist`	minimizes distance between ROC plot and (0,1)
10	`ReqSens`	user defined required sensitivity
11	`ReqSpec`	user defined required specificity
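To make a few of the criteria in this table concrete, the following base R sketch computes sensitivity and specificity over a grid of thresholds and then picks cutoffs for `Sens=Spec`, `MaxSens+Spec`, and `MinROCdist`. It is an illustration under stated assumptions, not the package's implementation: the data are simulated, and treating a prediction at or above the threshold as "present" is an assumed convention.

```r
# Simulated observations and predictions (hypothetical data).
set.seed(7)
observed  <- rbinom(300, 1, 0.35)
predicted <- plogis(rnorm(300, mean = ifelse(observed == 1, 1, -1)))
thresholds <- seq(0, 1, length.out = 101)

# Assumed convention: predicted present when probability >= threshold.
sens <- sapply(thresholds, function(t) mean(predicted[observed == 1] >= t))
spec <- sapply(thresholds, function(t) mean(predicted[observed == 0] <  t))

# Method 2: threshold where sensitivity and specificity are closest.
sens_eq_spec  <- thresholds[which.min(abs(sens - spec))]

# Method 3: threshold maximizing (sensitivity + specificity) / 2.
max_sens_spec <- thresholds[which.max((sens + spec) / 2)]

# Method 9: threshold whose ROC point (1 - spec, sens) is nearest to (0, 1).
min_roc_dist  <- thresholds[which.min(sqrt((1 - spec)^2 + (1 - sens)^2))]

c(sens_eq_spec = sens_eq_spec,
  max_sens_spec = max_sens_spec,
  min_roc_dist = min_roc_dist)
```

Different criteria generally give different cutoffs, which is why plotting several of them on one histogram can be informative.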

req.sens

a value between zero and one giving the user defined required sensitivity. Only used if opt.thresholds = TRUE. Note that req.sens = (1-maximum allowable errors for points with positive observations).

req.spec

a value between zero and one giving the user defined required specificity. Only used if opt.thresholds = TRUE. Note that req.spec = (1 - maximum allowable errors for points with negative observations).
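The relationship between a required sensitivity and a threshold can be sketched as follows. This is one plausible reading of the `ReqSens` criterion (the highest threshold that still meets the requirement), stated here as an assumption rather than the package's documented rule; the data are simulated.

```r
# Simulated observations and predictions (hypothetical data).
set.seed(3)
observed  <- rbinom(200, 1, 0.4)
predicted <- plogis(rnorm(200, mean = ifelse(observed == 1, 1, -1)))
thresholds <- seq(0, 1, length.out = 101)

# Sensitivity at each threshold (assumed: present when probability >= threshold).
sens <- sapply(thresholds, function(t) mean(predicted[observed == 1] >= t))

# Allow at most 15% of observed presences to be misclassified.
req.sens <- 0.85

# Assumed rule: the highest threshold that still achieves req.sens.
req_sens_threshold <- max(thresholds[sens >= req.sens])
req_sens_threshold
```

The same idea applies to `req.spec`, with specificity computed from the observed absences instead.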

obs.prev

observed prevalence for opt.methods = "PredPrev=Obs" and "ObsPrev". Defaults to observed prevalence from DATA.

smoothing

smoothing factor for maximizing/minimizing. Only used if opt.thresholds = TRUE. Instead of finding the single threshold that gives the max/min value, the function will average the thresholds of the given number of max/min values.
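The averaging described here can be sketched with a short helper. `smooth_optimum` and the example values are hypothetical and only illustrate the idea for a criterion that is being maximized; the package's own tie handling and ordering may differ.

```r
# Hypothetical helper: average the thresholds of the 'smoothing' best values.
smooth_optimum <- function(thresholds, criterion, smoothing = 1) {
  # Indices of the 'smoothing' largest criterion values.
  top <- order(criterion, decreasing = TRUE)[seq_len(smoothing)]
  mean(thresholds[top])
}

thresholds <- c(0.1, 0.2, 0.3, 0.4, 0.5)
kappa_like <- c(0.20, 0.30, 0.55, 0.54, 0.50)  # made-up criterion values

# smoothing = 1: the threshold of the single best value.
smooth_optimum(thresholds, kappa_like, smoothing = 1)

# smoothing = 3: the mean of the thresholds of the three best values.
smooth_optimum(thresholds, kappa_like, smoothing = 3)
```

When the criterion curve is nearly flat around its peak, averaging several near-optimal thresholds makes the result less sensitive to noise in any one value.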

add.legend

a logical indicating if a legend for presence/absence should be added to plot

legend.text

a two item vector of text for presence/absence legend. Defaults to "present" and "absent".

legend.cex

cex for presence/absence legend

add.opt.legend

logical indicating if a legend for optimal threshold criteria should be included on the plot

opt.legend.text

a vector of text for optimal threshold criteria legend. Defaults to text corresponding to 'opt.methods'.

opt.legend.cex

cex for optimization criteria legend

pch

plotting "character", i.e., symbol to use for marking the optimal thresholds. pch can either be a single character or an integer code for one of a set of graphics symbols. See help(points) for details.

FPC

False Positive Costs, or for C/B ratio C = 'net costs of treating nondiseased individuals'.

FNC

False Negative Costs, or for C/B ratio B = 'net benefits of treating diseased individuals'.

Author

Elizabeth Freeman eafreeman@fs.fed.us

Details

When examining a Presence/Absence histogram to evaluate model quality, a good model will produce a clear separation of 'present' and 'absent' with little overlap in any bars.

The truncate.tallest argument is useful when one bar (often the bar for predicted probability of zero) is much larger than all the other bars. If truncate.tallest = TRUE, the tallest bar is truncated to slightly taller than the next highest bar, and the actual count is plotted above the bar. The truncated bar is also crosshatched to avoid confusion by making it more obviously different from the other bars.
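The truncation rule described above can be sketched in a few lines of base R. `truncate_tallest` is a hypothetical helper, and the `headroom` factor of 1.1 is an assumed stand-in for "slightly taller"; the source does not state the exact factor.

```r
# Hypothetical helper: cap the tallest bar just above the next highest one.
truncate_tallest <- function(totals, headroom = 1.1) {
  tallest      <- which.max(totals)
  next_highest <- max(totals[-tallest])
  shown <- totals
  # Assumed factor: draw the tallest bar 10% taller than the runner-up.
  shown[tallest] <- min(totals[tallest], headroom * next_highest)
  list(shown = shown,              # heights to draw
       truncated = tallest,        # index of the truncated bar
       actual = totals[tallest])   # true count, to be printed above the bar
}

# Made-up bar totals with one dominant bar at predicted probability near zero.
totals <- c(120, 14, 9, 6, 8, 11, 15, 20, 18, 22)
tr <- truncate_tallest(totals)
tr$shown
tr$actual
```

Capping the dominant bar lets the remaining bars use most of the vertical space, while the printed count preserves the true value.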

If opt.thresholds = TRUE, the function will find optimal thresholds by several methods and plot them along the X axis. See optimal.thresholds for more details on the optimization methods, and on the arguments req.sens, req.spec, obs.prev, smoothing, FPC, and FNC.

Note: if too many methods are included in opt.methods, the graph will get very crowded.

Examples

library(PresenceAbsence)
data(SIM3DATA)

### EXAMPLE 1 - Comparing three models ###
par(mfrow=c(1,3))
for (i in 1:3) {
  presence.absence.hist(SIM3DATA,
                        which.model = i,
                        na.rm = TRUE,
                        model.names = c("Model 1", "Model 2", "Model 3"),
                        N.bars = 10,
                        truncate.tallest = FALSE,
                        opt.thresholds = TRUE,
                        opt.methods = c("Default", "Sens=Spec", "MaxKappa"))
}

### EXAMPLE 2 - Effect of 'truncate.tallest' argument ###
par(mfrow=c(1,2))
presence.absence.hist(SIM3DATA,
                      which.model = 1,
                      model.names = c("Model 1", "Model 2", "Model 3"),
                      N.bars = 10,
                      truncate.tallest = FALSE,
                      main = "truncate.tallest=FALSE")
presence.absence.hist(SIM3DATA,
                      which.model = 1,
                      model.names = c("Model 1", "Model 2", "Model 3"),
                      N.bars = 10,
                      truncate.tallest = TRUE,
                      main = "truncate.tallest=TRUE")