crimeClust_bayes: Bayesian model-based partially-supervised clustering for crime series identification

Description

Bayesian model-based partially-supervised clustering for crime series identification

Usage

crimeClust_bayes(crimeID, spatial, t1, t2, Xcat, Xnorm, maxcriminals = 1000, iters = 10000, burn = 5000, plot = TRUE, update = 100, seed = NULL, use_space = TRUE, use_time = TRUE, use_cats = TRUE)

Arguments

crimeID

n-vector of criminal IDs for the n crimes in the dataset. For unsolved crimes, the value should be NA.

spatial

(n x 2) matrix of spatial locations, represent missing locations with NA

earliest possible time for crime

latest possible time for crime. Crime occurred between t1 and t2.

Xcat

(n x q) matrix of categorical crime features. Each column is a variable, such as mode of entry. The different factors (window, door, etc) should be coded as integers 1,2,...,m.

Xnorm

(n x p) matrix of continuous crime features.

maxcriminals

maximum number of clusters in the model.

iters

Number of MCMC samples to generate.

burn

Number of MCMC samples to discard as burn-in.

plot

(logical) Should plots be produced during run.

update

Number of MCMC iterations between graphical displays.

seed

seed for random number generation

use_space

(logical) should the spatial locations be used in clustering?

use_time

(logical) should the event times be used in clustering?

use_cats

(logical) should the categorical crime features be used in clustering?

Value

(list) p.equal is the (n x n) matrix of probabilities that each pair of crimes are committed by the same criminal.if plot=TRUE, then progress plots are produced.

References

Reich, B. J. and Porter, M. D. (2015), Partially supervised spatiotemporal clustering for burglary crime series identification. Journal of the Royal Statistical Society: Series A (Statistics in Society). 178:2, 465--480. http://www4.stat.ncsu.edu/~reich/papers/CrimeClust.pdf

Examples

Run this code

# Toy dataset with 12 crimes and three criminals.

 # Make IDs: Criminal 1 committed crimes 1-4, etc.
 id <- c(1,1,1,1,
         2,2,2,2,
                 3,3,3,3)

 # spatial locations of the crimes:
 s <- c(0.8,0.9,1.1,1.2,
        1.8,1.9,2.1,2.2,
        2.8,2.9,3.1,3.2)
 s <- cbind(0,s)

 # Categorical crime features, say mode of entry (1=door, 2=other) and
 # type of residence (1=apartment, 2=other)
 Mode <- c(1,1,1,1,  #Different distribution by criminal
           1,2,1,2,
           2,2,2,2)
 Type <- c(1,2,1,2,  #Same distribution for all criminals
           1,2,1,2,
           1,2,1,2)
 Xcat <- cbind(Mode,Type)

 # Times of the crimes
 t <- c(1,2,3,4,
        2,3,4,5,
        3,4,5,6)

 # Now let's pretend we don't know the criminal for crimes 1, 4, 6, 8, and 12.
 id <- c(NA,1,1,NA,2,NA,2,NA,3,3,3,NA)

 # Fit the model (nb: use much larger iters and burn on real problem)
 fit <- crimeClust_bayes(crimeID=id, spatial=s, t1=t,t2=t, Xcat=Xcat,
                   maxcriminals=12,iters=500,burn=100,update=100)

 # Plot the posterior probability matrix that each pair of crimes was
 # committed by the same criminal:
 if(require(fields,quietly=TRUE)){
 fields::image.plot(1:12,1:12,fit$p.equal,
            xlab="Crime",ylab="Crime",
            main="Probability crimes are from the same criminal")
 }

 # Extract the crimes with the largest posterior probability
 bayesPairs(fit$p.equal)
 bayesProb(fit$p.equal[1,])

Run the code above in your browser using DataLab