prim (version 1.0.13)

prim.box: PRIM for multivariate data

Description

PRIM for multivariate data.

Usage

prim.box(x, y, box.init=NULL, peel.alpha=0.05, paste.alpha=0.01,
     mass.min=0.05, threshold, pasting=TRUE, verbose=FALSE,
     threshold.type=0, y.fun=mean)

prim.hdr(prim, threshold, threshold.type, y.fun=mean)

prim.combine(prim1, prim2, y.fun=mean)

Arguments

x
matrix of data values
y
vector of response values
y.fun
function applied to response y. Default is mean.
box.init
initial covering box
peel.alpha
peeling quantile tuning parameter
paste.alpha
pasting quantile tuning parameter
mass.min
minimum mass tuning parameter
threshold
threshold tuning parameter(s)
threshold.type
threshold direction indicator: 1 = ">= threshold", -1 = "<= threshold", 0 = both ">= threshold[1]" and "<= threshold[2]"
pasting
flag for pasting
verbose
flag for printing output during execution
prim,prim1,prim2
objects of type prim

Value

    -- prim.box produces a PRIM estimate, an object of type prim, which is a list with 8 fields:

  • x: list of data matrices
  • y: list of response variable vectors
  • y.mean: list of vectors of box mean for y
  • box: list of matrices of box limits (first row = minima, second row = maxima)
  • mass: vector of box masses (proportion of points inside a box)
  • num.class: total number of PRIM boxes
  • num.hdr.class: total number of PRIM boxes which form the HDR
  • ind: threshold direction indicator: 1 = ">= threshold", -1 = "<= threshold"

    The above lists have num.class fields, one for each box.

    -- prim.hdr takes a prim object and prunes it using different threshold values. Returns another prim object. This is much faster for experimenting with different threshold values than calling prim.box each time.

    -- prim.combine combines two prim objects into a single prim object. Usually used in conjunction with prim.hdr. See examples below.
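The box and mass fields described above can be illustrated with a minimal base R sketch. The helpers in_box and box_summary are hypothetical names introduced here, not functions of the prim package, and the sketch assumes that points lying exactly on a box boundary count as inside.

```r
# Hypothetical helpers for illustration; not part of the prim package.
# box: 2 x d matrix, first row = minima, second row = maxima.
in_box <- function(x, box) {
  x <- as.matrix(x)
  inside <- rep(TRUE, nrow(x))
  for (j in seq_len(ncol(x))) {
    # Assumption: boundary points are treated as inside the box.
    inside <- inside & x[, j] >= box[1, j] & x[, j] <= box[2, j]
  }
  inside
}

# Box mass = proportion of points inside; box mean = y.fun of y inside.
box_summary <- function(x, y, box, y.fun = mean) {
  inside <- in_box(x, box)
  list(mass = mean(inside), y.mean = y.fun(y[inside]))
}

# Toy data: four points in two dimensions with binary labels.
x <- cbind(c(0.1, 0.4, 0.6, 0.9), c(0.2, 0.5, 0.5, 0.8))
y <- c(1, 1, -1, -1)
box <- rbind(c(0, 0), c(0.5, 1))   # x1 in [0, 0.5], x2 in [0, 1]

s <- box_summary(x, y, box)
print(s)
```

Here the box holds the first two points, so its mass is 0.5 and its mean response is 1.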

Details

The data are $(\mathbf{X}_1, Y_1), \dots, (\mathbf{X}_n, Y_n)$ where $\mathbf{X}_i$ is d-dimensional and $Y_i$ is a scalar response. PRIM finds modal (and/or anti-modal) regions in the conditional expectation $m(\mathbf{x}) = \mathbf{E}(Y | \mathbf{x})$. In general, $Y_i$ can be real-valued; see vignette("prim"). Here, we focus on the special case of binary $Y_i$. Let $Y_i = 1$ when $\mathbf{X}_i \sim F^+$, and $Y_i = -1$ when $\mathbf{X}_i \sim F^-$, where $F^+$ and $F^-$ are different distribution functions. In this set-up, PRIM finds the regions where $F^+$ and $F^-$ are most different.
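For the binary case, the conditional expectation can be written out explicitly. The following is a short derivation, assuming that $F^+$ and $F^-$ have densities $f^+$ and $f^-$, and writing $\pi^+ = P(Y = 1)$ and $\pi^- = P(Y = -1)$ for the class proportions. These symbols are introduced here for illustration only.

```latex
\begin{align*}
m(\mathbf{x}) &= \mathbf{E}(Y \mid \mathbf{x})
  = P(Y = 1 \mid \mathbf{x}) - P(Y = -1 \mid \mathbf{x}) \\
 &= \frac{\pi^+ f^+(\mathbf{x}) - \pi^- f^-(\mathbf{x})}
         {\pi^+ f^+(\mathbf{x}) + \pi^- f^-(\mathbf{x})}
\end{align*}
```

Under these assumptions $m(\mathbf{x})$ lies in $[-1, 1]$. It is positive where $\pi^+ f^+$ dominates and negative where $\pi^- f^-$ dominates, which is why a positive and a negative threshold pick out the two kinds of region.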

The tuning parameters peel.alpha and paste.alpha control the 'patience' of PRIM: smaller values mean more patience, larger values less. The peeling steps remove data from a box until either the box mean is smaller than threshold or the box mass is less than mass.min. Pasting is optional and is used to correct possible over-peeling. The default values for peel.alpha, paste.alpha and mass.min are taken from Friedman & Fisher (1999).
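To make the role of peel.alpha and mass.min concrete, here is a simplified base R sketch of peeling. It is not the code used inside prim.box: the functions peel_once and peel are hypothetical, the sketch only searches for high box means, it stops on the mass.min criterion alone (the threshold comparison is left out), and it has no pasting step.

```r
# Hypothetical sketch of peeling; not the prim package's implementation.
# x: n x d numeric matrix, y: numeric response,
# inside: logical vector marking the points currently in the box.
peel_once <- function(x, y, inside, peel.alpha = 0.05, y.fun = mean) {
  best <- list(mean = -Inf, inside = inside)
  for (j in seq_len(ncol(x))) {
    xj <- x[inside, j]
    lo <- quantile(xj, peel.alpha, names = FALSE)
    hi <- quantile(xj, 1 - peel.alpha, names = FALSE)
    # Candidates: drop the lower or the upper peel.alpha tail of variable j.
    candidates <- list(inside & x[, j] >= lo, inside & x[, j] <= hi)
    for (cand in candidates) {
      # Only consider peels that actually remove at least one point.
      if (any(cand) && sum(cand) < sum(inside)) {
        m <- y.fun(y[cand])
        if (m > best$mean) best <- list(mean = m, inside = cand)
      }
    }
  }
  best
}

peel <- function(x, y, peel.alpha = 0.05, mass.min = 0.05, y.fun = mean) {
  inside <- rep(TRUE, nrow(x))
  repeat {
    step <- peel_once(x, y, inside, peel.alpha, y.fun)
    # Stop when nothing can be peeled or the box would fall below mass.min.
    if (!is.finite(step$mean) || mean(step$inside) < mass.min) break
    inside <- step$inside
  }
  inside
}

# Toy data: one variable, response is 1 above 0.5 and -1 otherwise.
x <- cbind(seq(0.01, 1, by = 0.01))
y <- ifelse(x[, 1] > 0.5, 1, -1)
inside <- peel(x, y, peel.alpha = 0.05, mass.min = 0.05)
c(mass = mean(inside), box.mean = mean(y[inside]))
```

A smaller peel.alpha removes fewer points per step, so the box shrinks more slowly and more candidate boxes are examined, which is the sense in which smaller values are more patient.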

The type of PRIM estimate is controlled by threshold and threshold.type:

  • For threshold.type=1, we search for {$m(\mathbf{x}) \geq$ threshold}.
  • For threshold.type=-1, we search for {$m(\mathbf{x}) \leq$ threshold}.
  • For threshold.type=0, we search for both {$m(\mathbf{x}) \geq$ threshold[1]} and {$m(\mathbf{x}) \leq$ threshold[2]}.
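The three cases can be sketched as a small base R function that labels boxes by their mean response. The function classify_boxes is hypothetical and not part of the prim package, the box means are made-up numbers, and the label 0 for boxes that meet neither condition is a convention chosen for this sketch.

```r
# Hypothetical helper for illustration; not part of the prim package.
# Returns 1 for boxes with mean >= threshold, -1 for boxes with
# mean <= threshold, and 0 (this sketch's convention) otherwise.
classify_boxes <- function(y.mean, threshold, threshold.type = 0) {
  ind <- rep(0, length(y.mean))
  if (threshold.type == 1) {
    ind[y.mean >= threshold[1]] <- 1
  } else if (threshold.type == -1) {
    ind[y.mean <= threshold[1]] <- -1
  } else {
    # threshold.type = 0: threshold[1] is the upper cut, threshold[2] the lower.
    ind[y.mean >= threshold[1]] <- 1
    ind[y.mean <= threshold[2]] <- -1
  }
  ind
}

box.means <- c(0.41, 0.05, -0.52)   # made-up box means
res <- classify_boxes(box.means, threshold = c(0.25, -0.3), threshold.type = 0)
print(res)
```

With thresholds 0.25 and -0.3, the first box is flagged as a high region, the third as a low region, and the second as neither.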

Examples

library(prim)
data(quasiflow)
qf <- quasiflow[1:1000,1:2]
qf.label <- quasiflow[1:1000,4]

## using only one command
thr <- c(0.25, -0.3)
qf.prim1 <- prim.box(x=qf, y=qf.label, threshold=thr, threshold.type=0)

## alternative - requires more commands but allows more control
## in intermediate stages
qf.primp <- prim.box(x=qf, y=qf.label, threshold.type=1)
   ## default threshold too low, try higher one

qf.primp.hdr <- prim.hdr(prim=qf.primp, threshold=0.25, threshold.type=1)
qf.primn <- prim.box(x=qf, y=qf.label, threshold=-0.3, threshold.type=-1)
qf.prim2 <- prim.combine(qf.primp.hdr, qf.primn)

plot(qf.prim1)   ## orange=x1>x2, blue x2<x1
points(qf[qf.label==1,], cex=0.5)
points(qf[qf.label==-1,], cex=0.5, col=2)
