Learn R Programming

prim (version 1.0.0)

prim.box: PRIM for multivariate data

Description

PRIM for multivariate data. Result is an estimate of the highest density region (HDR).

Usage

prim.box(x, y, box.init=NULL, peel.alpha=0.05, paste.alpha=0.01,
     mass.min=0.05, threshold, pasting=TRUE, verbose=FALSE,
     threshold.type=0)

prim.hdr(prim, threshold, threshold.type) prim.combine(prim1, prim2)

Arguments

x
matrix of data values
y
vector of response values
box.init
initial covering box
peel.alpha
peeling quantile tuning parameter
paste.alpha
pasting quantile tuning parameter
mass.min
minimum mass tuning parameter
threshold
threshold tuning parameter(s)
threshold.type
1 = positive HDR, -1 = negative HDR, 0 = both HDR
pasting
flag for pasting
verbose
flag for printing output during execution
prim,prim1,prim2
objects of type prim

Value

  • -- prim.box produces a PRIM estimate of HDRs, an object of type prim, which is a list with 8 fields:
  • xlist of data matrices
  • ylist of response variable vectors
  • y.meanlist of vectors of box mean for y
  • boxlist of matrices of box limits (first row = minima, second row = maxima)
  • massvector of box masses (proportion of points inside a box)
  • num.classtotal number of PRIM boxes
  • num.hdr.classtotal number of PRIM boxes which form the HDR
  • indHDR indicator: 1 = positive HDR, -1 = negative HDR
  • The above lists have num.class fields, one for each box.

    -- prim.hdr takes a prim object and computes HDRs with different threshold values. Returns another prim object. This is much faster for experimenting with different threshold values than calling prim.box each time.

    -- prim.combine combines two prim objects into a single prim object. Useful for combining positive and negative HDRs. Usually used in conjunction with prim.hdr. See examples below.

code

prim.hdr

Details

The data are $(\bold{X}_1, Y_1), \dots, (\bold{X}_n, Y_n)$ where $\bold{X}_i$ is d-dimensional and $Y_i$ is a scalar response. PRIM finds modal (and/or anti-modal) regions in the conditional expectation $m(\bold{x}) = \bold{E} (Y | \bold{x}).$ These regions are also called the highest density regions (HDR).

In general, $Y_i$ can be real-valued. See vignette("prim"). Here, we focus on the special case for binary $Y_i$. Let $Y_i$ = 1 when $\bold{X}_i \sim F^+$; and $Y_i$ = -1 when $\bold{X}_i \sim F^-$ where $F^+$ and $F^-$ are different distribution functions. In this set-up, PRIM finds the regions where $F^+$ and $F^-$ are most different.

The tuning parameters peel.alpha and paste.alpha control the `patience' of PRIM. Smaller values involve more patience. Larger values less patience. The peeling steps remove data from a box till either the box mean is smaller than threshold or the box mass is less than mass.min. Pasting is optional, and is used to correct any possible over-peeling. The default values for peel.alpha, paste.alpha and mass.min are taken from Friedman & Fisher (1999).

Specifying the type of HDR is controlled by threshold and threshold.type:

{For threshold.type=1, then we search for positive HDR {$m(\bold{x}) \geq$ threshold}.} {For threshold.type=-1, then we search for negative HDR {$m(\bold{x}) \leq$ threshold}.} {For threshold.type=0, then we search for both the positive and negative HDR. In this case make sure that threshold is (positive HDR threshold, negative HDR threshold).}

References

Friedman, J.H. & Fisher, N.I. (1999) Bump-hunting for high dimensional data, Statistics and Computing, 9, 123--143.

Examples

Run this code
n <- 1000
set.seed(88192)

mus.p <- rbind(c(0,0), c(2,0), c(1, 2), c(2.5, 2))
Sigmas.p <- 0.125*rbind(diag(2), diag(c(0.5, 0.5)),
   diag(c(0.125, 0.25)), diag(c(0.125, 0.25))) 
props.p <- c(0.5, 0.25, 0.125, 0.125)

mus.m <- rbind(c(0,0), c(2,0), c(2.5, 2))
Sigmas.m <- 0.125*rbind(invvech(c(1,-0.6,1)),
   diag(c(0.5, 0.5)),diag(c(0.125, 0.25))) 
props.m <- c(0.625, 0.25, 0.125)

x.p <- rmvnorm.mixt(n, mus.p, Sigmas.p, props.p)
x.m <- rmvnorm.mixt(n, mus.m, Sigmas.m, props.m)
x <- rbind(x.p, x.m)
y <- c(rep(1, nrow(x.p)), rep(-1, nrow(x.m)))
  ## 1 = positive sample, -1 = negative sample

y.thr <- c(1, -0.35)

## using only one command

x.prim1 <- prim.box(x=x, y=y, threshold=y.thr, threshold.type=0)

## alternative - requires more commands but allows more control
## in intermediate stages

x.prim.hdr.plus <- prim.box(x=x, y=y, threshold.type=1,
   threshold=1)

x.prim.minus <- prim.box(x=x, y=y, threshold.type=-1)
summary(x.prim.minus)
   ## threshold too high, try lower one

x.prim.hdr.minus <- prim.hdr(x.prim.minus, threshold=-0.35,
   threshold.type=-1)
x.prim2 <- prim.combine(x.prim.hdr.plus, x.prim.hdr.minus)
       
plot(x.prim2)

col <- x.prim2$ind
col[col==1] <- "orange"
col[col==-1] <- "blue" 
plot(x.prim2, col=col)

summary(x.prim1)
summary(x.prim2) ## should be exactly the same as command above

Run the code above in your browser using DataLab