resample-package: Overview of the resample package

Description

Resampling functions, including one- and two-sample bootstrap and permutation tests, with an easy-to-use syntax.

Data Sets

A list of datasets is at resample-data.

Main resampling functions

The main resampling functions are: bootstrap, bootstrap2, permutationTest, permutationTest2.
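
As a rough illustration of what a one-sample bootstrap and a two-sample permutation test compute, here is a minimal base R sketch. It does not call the package and is not its implementation; the data vectors x and y, the replicate count R, and all other variable names are assumptions made up for the example.

```r
# Base R sketch only; does not use or reproduce the resample package.
set.seed(1)
x <- rexp(30, rate = 1 / 8)   # hypothetical skewed sample
y <- rexp(40, rate = 1 / 5)   # hypothetical second sample
R <- 2000                     # number of resamples (arbitrary choice)

# One-sample bootstrap: resample x with replacement, recompute the mean
boot_means <- replicate(R, mean(sample(x, replace = TRUE)))
boot_se <- sd(boot_means)                  # bootstrap standard error
boot_bias <- mean(boot_means) - mean(x)    # bootstrap bias estimate

# Two-sample permutation test: shuffle the pooled data, split it into
# groups of the original sizes, and recompute the difference in means
observed <- mean(x) - mean(y)
pooled <- c(x, y)
perm_diffs <- replicate(R, {
  shuffled <- sample(pooled)
  mean(shuffled[seq_along(x)]) - mean(shuffled[-seq_along(x)])
})

# One-sided p-value (alternative "greater"), counting the observed value
p_greater <- (sum(perm_diffs >= observed) + 1) / (R + 1)
```

The package functions wrap this kind of loop and add printing, plotting, and confidence-interval support.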

Methods

Methods for generic functions include: print.resample, plot.resample, hist.resample, qqnorm.resample, and quantile.resample.

Confidence Intervals

Functions that calculate confidence intervals for bootstrap and bootstrap2 objects: CI.bca, CI.bootstrapT, CI.percentile, CI.t.
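
To show the idea behind two of these intervals, the following base R sketch computes a plain percentile interval and a t interval using the bootstrap standard error. It is an illustration under assumed data, not the code behind CI.percentile or CI.t; in particular it omits the small-sample "expanded" adjustment that the package's percentile interval applies by default.

```r
# Base R sketch only; x and reps are hypothetical example objects.
set.seed(2)
x <- rexp(30, rate = 1 / 8)
reps <- replicate(2000, mean(sample(x, replace = TRUE)))

# Percentile interval: empirical quantiles of the bootstrap replicates
ci_percentile <- quantile(reps, probs = c(0.025, 0.975))

# t interval: observed estimate plus t quantiles times the bootstrap SE
n <- length(x)
ci_t <- mean(x) + qt(c(0.025, 0.975), df = n - 1) * sd(reps)
```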

Samplers

Functions that generate indices for random samples: samp.bootstrap, samp.permute.
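
The index-based approach can be sketched in base R as follows. This only illustrates what "indices for random samples" means; it does not call samp.bootstrap or samp.permute, and the variable names are assumptions for the example.

```r
# Base R sketch only; not the package's samplers.
set.seed(3)
x <- c(10, 20, 30, 40, 50)
n <- length(x)

# Bootstrap indices: n draws from 1..n with replacement (repeats allowed)
boot_idx <- sample.int(n, size = n, replace = TRUE)

# Permutation indices: each of 1..n exactly once, in random order
perm_idx <- sample.int(n)

x[boot_idx]   # one bootstrap sample of x
x[perm_idx]   # one permutation of x
```

Working with indices rather than values lets the same draw be applied to vectors, rows of a data frame, or several objects at once.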

Low-level Resampling Function

This is called by the main resampling functions, but can also be called directly: resample.

New Versions

I will post the newest versions to http://www.timhesterberg.net/r-packages. See that page to join a list for announcements of new versions.

Details

See library(help = resample) for version number, date, etc.

Examples

# NOT RUN {
data(Verizon)
ILEC <- with(Verizon, Time[Group == "ILEC"])
CLEC <- with(Verizon, Time[Group == "CLEC"])

#### Sections in this set of examples
### Different ways to specify the data and statistic
### Example with plots and confidence intervals.


### Different ways to specify the data and statistic
# This code is flexible; there are different ways to call it,
# depending on how the data are stored and on the statistic.

## One-sample Bootstrap

# Ordinary vector, give statistic as a function
bootstrap(CLEC, mean)

# Vector by name, give statistic as an expression
bootstrap(CLEC, mean(CLEC))

# Vector created by an expression, use the name 'data'
bootstrap(with(Verizon, Time[Group == "CLEC"]), mean(data))

# A column in a data frame; use the name of the column
temp <- data.frame(foo = CLEC)
bootstrap(temp, mean(foo))

# Put function arguments into an expression
bootstrap(CLEC, mean(CLEC, trim = .25))

# Put function arguments into a separate list
bootstrap(CLEC, mean, args.stat = list(trim = .25))

## One-sample jackknife

# Like bootstrap. E.g.
jackknife(CLEC, mean)


## One-sample permutation test

# To test H0: two variables are independent, exactly
# one of them must be permuted. For the CLEC data,
# we'll create an artificial variable.
CLEC2 <- data.frame(Time = CLEC, index = 1:length(CLEC))

permutationTest(CLEC2, cor(Time, index),
                resampleColumns = "index")
# Could permute "Time" instead.

# resampleColumns not needed for variables outside 'data'
permutationTest(CLEC, cor(CLEC, 1:length(CLEC)))

### Two-sample problems
## Different ways to specify data and statistic

## Two-sample bootstrap

# Two data objects (one for each group)
bootstrap2(CLEC, data2 = ILEC, mean)
# data frame containing y variable(s) and a treatment variable
bootstrap2(Verizon, mean(Time), treatment = Group)
# treatment variable as a separate object
temp <- Verizon$Group
bootstrap2(Verizon$Time, mean, treatment = temp)

## Two-sample permutation test

# Like bootstrap2. E.g.
permutationTest2(CLEC, data2 = ILEC, mean)

### Example with plots and confidence intervals.
boot <- bootstrap2(CLEC, data2 = ILEC, mean)
perm <- permutationTest2(CLEC, data2 = ILEC, mean,
                         alternative = "greater")
par(mfrow = c(2,2))
hist(boot)
qqnorm(boot)
qqline(boot$replicates)
hist(perm)
# P-value
perm

# Standard error, and bias estimate
boot

# Confidence intervals
CI.percentile(boot) # Percentile interval
CI.t(boot)  # t interval using bootstrap SE
# CI.bootstrapT and CI.bca don't currently support two-sample problems.

# Statistic can be multivariate.
# For the bootstrap t, it must have the estimate first, and a standard
# error second (don't need to divide by sqrt(n), that cancels out).
bootC <- bootstrap(CLEC, mean, seed = 0)
bootC2 <- bootstrap(CLEC, c(mean = mean(CLEC), sd = sd(CLEC)), seed = 0)
identical(bootC$replicates[, 1], bootC2$replicates[, 1])

CI.percentile(bootC)
CI.t(bootC)
CI.bca(bootC)
CI.bootstrapT(bootC2)
# The bootstrapT is the most accurate for skewed data, especially
# for small samples.

# By default the percentile interval is "expanded", for better coverage
# in small samples. To turn this off:
CI.percentile(bootC, expand = FALSE)
# }