binsamp: Bin-Samples Strategic Knot Indices

Description

Breaks the predictor domain into a user-specified number of disjoint subregions, and randomly samples a user-specified number of observations from each (nonempty) subregion.

Usage

binsamp(x,xrng=NULL,nmbin=11,nsamp=1,alg=c("new","old"))

Arguments

Matrix of predictors \(\mathbf{X}=\{x_{ij}\}_{n \times p}\) where \(n\) is the number of observations, and \(p\) is the number of predictors.

xrng

Optional matrix of predictor ranges: \(\mathbf{R}=\{r_{kj}\}_{2 \times p}\) where \(r_{1j}=\min_{i}x_{ij}\) and \(r_{2j}=\max_{i}x_{ij}\).

nmbin

Vector \(\mathbf{b}=(b_{1},\ldots,b_{p})'\), where \(b_{j}\geq1\) is the number of marginal bins to use for the \(j\)-th predictor. If length(nmbin)<ncol(x), then nmbin[1] is used for all columns. Default is nmbin=11 marginal bins for each dimension.

nsamp

Scalar \(s\geq1\) giving the number of observations to sample from each bin. Default is sample nsamp=1 observation from each bin.

alg

Bin-sampling algorithm. New algorithm forms equidistant grid, whereas old algorithm forms approximately equidistant grid. New algorithm is default for versions 1.0-1 and later.

Value

Returns an index vector indicating the rows of x that were bin-sampled.

Warnings

If \(x_{ij}\) is nominal with \(g\) levels, the function requires \(b_{j}=g\) and \(x_{ij}\in\{1,\ldots,g\}\) for \(i\in\{1,\ldots,n\}\).

Examples

Run this code

# NOT RUN {
##########   EXAMPLE 1   ##########

# create 2-dimensional predictor (both continuous)
set.seed(123)
xmat <- cbind(runif(10^6),runif(10^6))

# Default use:
#   10 marginal bins for each predictor
#   sample 1 observation from each subregion
xind <- binsamp(xmat)

# get the corresponding knots
bknots <- xmat[xind,]

# compare to randomly-sampled knots
rknots <- xmat[sample(1:(10^6),100),]
par(mfrow=c(1,2))
plot(bknots,main="bin-sampled")
plot(rknots,main="randomly sampled")



##########   EXAMPLE 2   ##########

# create 2-dimensional predictor (continuous and nominal)
set.seed(123)
xmat <- cbind(runif(10^6),sample(1:3,10^6,replace=TRUE))

# use 10 marginal bins for x1 and 3 marginal bins for x2 
# and sample one observation from each subregion
xind <- binsamp(xmat,nmbin=c(10,3))

# get the corresponding knots
bknots <- xmat[xind,]

# compare to randomly-sampled knots
rknots <- xmat[sample(1:(10^6),30),]
par(mfrow=c(1,2))
plot(bknots,main="bin-sampled")
plot(rknots,main="randomly sampled")



##########   EXAMPLE 3   ##########

# create 3-dimensional predictor (continuous, continuous, nominal)
set.seed(123)
xmat <- cbind(runif(10^6),runif(10^6),sample(1:2,10^6,replace=TRUE))

# use 10 marginal bins for x1 and x2, and 2 marginal bins for x3 
# and sample one observation from each subregion
xind <- binsamp(xmat,nmbin=c(10,10,2))

# get the corresponding knots
bknots <- xmat[xind,]

# compare to randomly-sampled knots
rknots <- xmat[sample(1:(10^6),200),]
par(mfrow=c(2,2))
plot(bknots[1:100,1:2],main="bin-sampled, x3=1")
plot(bknots[101:200,1:2],main="bin-sampled, x3=2")
plot(rknots[rknots[,3]==1,1:2],main="randomly sampled, x3=1")
plot(rknots[rknots[,3]==2,1:2],main="randomly sampled, x3=2")

# }

Run the code above in your browser using DataLab