AppliedPredictiveModeling (version 1.1-4)

quadBoundaryFunc: Functions for Simulating Data

Description

These functions simulate the two-class, two-predictor data sets that are used in the text.

Usage

quadBoundaryFunc(n)

easyBoundaryFunc(n, intercept = 0, interaction = 2)

Arguments

n
the sample size
intercept
the coefficient for the logistic regression intercept term
interaction
the coefficient for the logistic regression interaction term

Value

Both functions return data frames with columns:

  • X1: numeric predictor value
  • X2: numeric predictor value
  • prob: numeric value reflecting the true probability of the first class
  • class: a factor variable with levels 'Class1' and 'Class2'

Details

The quadBoundaryFunc function creates a nonlinear class boundary that is a function of both predictors. The probability values are based on a logistic regression model whose linear predictor (the log-odds) is $-1 - 2X_1 - 0.2X_1^2 + 2X_2^2$. The predictors are multivariate normal with mean (1, 0) and a moderate degree of positive correlation.
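As a rough illustration of the model described above, the following base R sketch draws correlated normal predictors, evaluates the quadratic linear predictor, and samples class labels from the resulting probabilities. It is not the package's implementation: the helper name `sim_quad_boundary`, the unit variances, and the correlation `rho = 0.5` are assumptions made here, since this page only states that the correlation is moderate and positive.

```r
# Hypothetical helper; a sketch of the documented model, not the package source.
sim_quad_boundary <- function(n, rho = 0.5) {
  # Assumed covariance: unit variances with correlation rho (not from the docs).
  sigma <- matrix(c(1, rho, rho, 1), nrow = 2)

  # Draw independent standard normals, then induce the correlation with the
  # Cholesky factor and shift the columns to the documented mean (1, 0).
  z <- matrix(rnorm(2 * n), ncol = 2)
  x <- sweep(z %*% chol(sigma), 2, c(1, 0), "+")
  X1 <- x[, 1]
  X2 <- x[, 2]

  # Linear predictor from the Details section, mapped to a probability
  # with the logistic function.
  lp <- -1 - 2 * X1 - 0.2 * X1^2 + 2 * X2^2
  prob <- plogis(lp)

  # 'prob' is the probability of the first class, so label an observation
  # Class1 with probability 'prob' and Class2 otherwise.
  class <- factor(ifelse(runif(n) <= prob, "Class1", "Class2"),
                  levels = c("Class1", "Class2"))

  data.frame(X1 = X1, X2 = X2, prob = prob, class = class)
}

set.seed(1)
sim <- sim_quad_boundary(500)
str(sim)
```

The returned data frame has the same four columns listed under Value, so it can stand in for the package output when experimenting with the shape of the boundary.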

Similarly, the easyBoundaryFunc function uses a logistic regression model whose linear predictor is $\text{intercept} - 4X_1 + 4X_2 + \text{interaction} \times X_1 \times X_2$. The predictors are multivariate normal with mean (1, 0) and a strong positive correlation.
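To see how the intercept and interaction arguments enter the model, the sketch below evaluates the stated equation at a few points and converts it to a probability with the logistic function. The helper `easy_prob` is hypothetical and only mirrors the equation above; it does not simulate predictors or call the package.

```r
# Hypothetical helper: class probability implied by the documented equation.
easy_prob <- function(x1, x2, intercept = 0, interaction = 2) {
  lp <- intercept - 4 * x1 + 4 * x2 + interaction * x1 * x2
  plogis(lp)  # logistic function: 1 / (1 + exp(-lp))
}

# At the origin only the intercept contributes to the linear predictor.
p_origin_default <- easy_prob(0, 0)                 # linear predictor is 0
p_origin_shifted <- easy_prob(0, 0, intercept = 3)  # linear predictor is 3

# At (1, 1) the two main effects cancel, leaving only the interaction term.
p_no_interaction <- easy_prob(1, 1, interaction = 0)
p_interaction    <- easy_prob(1, 1, interaction = 3)

print(c(p_origin_default, p_origin_shifted, p_no_interaction, p_interaction))
```

Raising the intercept shifts every probability toward the first class, while the interaction term matters most where X1 and X2 are both far from zero.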

Examples

library(AppliedPredictiveModeling)

## in Chapter 11, 'Measuring Performance in Classification Models'
set.seed(975)
training <- quadBoundaryFunc(500)
testing <- quadBoundaryFunc(1000)
 

## in Chapter 20, 'Factors That Can Affect Model Performance'
set.seed(615)
dat <- easyBoundaryFunc(200, interaction = 3, intercept = 3)
dat$X1 <- scale(dat$X1)
dat$X2 <- scale(dat$X2)
dat$Data <- "Original"
dat$prob <- NULL

## in Chapter X, 'An Introduction to Feature Selection'

set.seed(874)
reliefEx3 <- easyBoundaryFunc(500)
reliefEx3$X1 <- scale(reliefEx3$X1)
reliefEx3$X2 <- scale(reliefEx3$X2)
reliefEx3$prob <- NULL
