VSURF (version 1.2.0)

toys: A simulated dataset called toys data

Description

toys is a simple simulated dataset for a binary classification problem, introduced by Weston et al.

Format

The format is a list of 2 components:

x

a data frame containing the input variables, with 100 observations of 200 variables

y

output variable: a factor with 2 levels "-1" and "1"
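
The structure described above can be checked directly. This is a minimal sketch that assumes the VSURF package is installed; the guard skips the check otherwise.

```r
# Assumes the VSURF package is installed; the block is skipped otherwise.
if (requireNamespace("VSURF", quietly = TRUE)) {
  data(toys, package = "VSURF")
  str(toys, max.level = 1)   # top-level components of the list
  dim(toys$x)                # observations by variables
  levels(toys$y)             # class labels of the output factor
  table(toys$y)              # number of observations per class
}
```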

Details

It is an equiprobable two-class problem, Y in {-1,1}, with six true variables; the others are noise. The simulation model is defined through the conditional distribution of the X^j given Y=y:

  • with probability 0.7, X^j ~ N(yj,1) for j=1,2,3 and X^j ~ N(0,1) for j=4,5,6 ;

  • with probability 0.3, X^j ~ N(0,1) for j=1,2,3 and X^j ~ N(y(j-3),1) for j=4,5,6 ;

  • the other variables are noise, X^j ~ N(0,1) for j=7,...,p.

After simulation, the variables are standardized.
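
The simulation model can be sketched in a few lines of R. This is an illustrative reconstruction, not the code used to generate the packaged dataset. The function name `simulate_toys` is hypothetical. The sketch assumes that the 0.7/0.3 choice is drawn independently for each observation, that N(yj,1) means a normal distribution with mean y times j, and that standardization means centering and scaling each column.

```r
# Hypothetical helper (not part of VSURF): simulate data following the toys model.
simulate_toys <- function(n = 100, p = 200) {
  stopifnot(p >= 6)
  # Equiprobable class labels in {-1, 1}
  y <- sample(c(-1, 1), n, replace = TRUE)
  # Start from pure N(0, 1) noise for every variable
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # Assumption: the 0.7 / 0.3 choice is made independently per observation
  first_group <- runif(n) < 0.7
  for (j in 1:3) {
    # With probability 0.7: X^j ~ N(y * j, 1) for j = 1, 2, 3
    x[first_group, j] <- x[first_group, j] + y[first_group] * j
    # With probability 0.3: X^(j + 3) ~ N(y * j, 1), i.e. N(y * (k - 3), 1) for k = 4, 5, 6
    x[!first_group, j + 3] <- x[!first_group, j + 3] + y[!first_group] * j
  }
  # Standardize each column to mean 0 and standard deviation 1
  x <- scale(x)
  colnames(x) <- paste0("V", seq_len(p))
  list(x = as.data.frame(x), y = factor(y, levels = c(-1, 1)))
}

set.seed(1)
sim <- simulate_toys()
dim(sim$x)     # observations by variables
table(sim$y)   # class counts
```

The returned list mirrors the format of toys (a data frame `x` and a two-level factor `y`), so it can be passed to the same functions as the packaged data.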

Examples

library(VSURF)  # provides the toys data
data(toys)
# Fit a random forest on all 200 input variables
toys.rf <- randomForest::randomForest(toys$x, toys$y)
toys.rf

if (FALSE) {
# VSURF applied for toys data:
# (a few minutes to execute)
data(toys)
toys.vsurf <- VSURF(toys$x, toys$y)
toys.vsurf
}
