VSURF (version 0.7.6)

toys: A simulated dataset called toys data

Description

toys is a simple simulated dataset for a binary classification problem, introduced by Weston et al. (2003).

Usage

data(toys)

Arguments

format

The format is a list of 2 components:

$x: A data frame containing the input variables, with 100 obs. of 200 variables

$y: Output variable: a factor with 2 levels "-1" and "1"

source

Weston, J., Elisseeff, A., Schoelkopf, B., Tipping, M. (2003), Use of the zero norm with linear models and kernel methods, J. Machine Learn. Res. 3, 1439-1461

Details

It is an equiprobable two-class problem, Y belongs to {-1,1}, with six true variables, the others being noise. The simulation model is defined through the conditional distribution of the X^j given Y=y:

with probability 0.7, X^j ~ N(yj,1) for j=1,2,3 and X^j ~ N(0,1) for j=4,5,6;

with probability 0.3, X^j ~ N(0,1) for j=1,2,3 and X^j ~ N(y(j-3),1) for j=4,5,6;

the other variables are noise, X^j ~ N(0,1) for j=7,...,p.

After simulation, the obtained variables are standardized.
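The model above can be sketched in a few lines of base R. This is only an illustrative reconstruction, not the code used to produce the shipped dataset: the function name simulate_toys, the column names V1, ..., Vp, and the reading that the 0.7/0.3 choice is drawn independently for each observation are all assumptions.

```r
# Hypothetical helper, not part of VSURF: draws data from the model described above.
simulate_toys <- function(n = 100, p = 200) {
  stopifnot(p >= 6)
  # Equiprobable labels in {-1, 1}
  y <- sample(c(-1, 1), n, replace = TRUE)
  # Start from pure noise, X^j ~ N(0, 1), for every variable
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # Assumption: each observation independently falls in the first case
  # with probability 0.7 and in the second case otherwise
  first <- runif(n) < 0.7
  for (j in 1:3) {
    # First case: X^j ~ N(y * j, 1) for j = 1, 2, 3
    x[first, j] <- x[first, j] + y[first] * j
    # Second case: X^(j+3) ~ N(y * j, 1), i.e. N(y * (j - 3), 1) for j = 4, 5, 6
    x[!first, j + 3] <- x[!first, j + 3] + y[!first] * j
  }
  # Standardize every column (mean 0, standard deviation 1)
  x <- scale(x)
  colnames(x) <- paste0("V", seq_len(p))  # assumed naming scheme
  list(x = as.data.frame(x), y = factor(y, levels = c(-1, 1)))
}

set.seed(1)
sim <- simulate_toys()
dim(sim$x)
table(sim$y)
```

The returned list mirrors the documented format ($x as a data frame, $y as a factor with levels "-1" and "1"), so it can be passed to randomForest or VSURF in the same way as toys.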

Examples

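library(VSURF)         # provides the toys data and VSURF()
library(randomForest)  # provides randomForest()
data(toys)
system.time(toys.rf <- randomForest(x=toys$x, y=toys$y))
toys.rf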

# VSURF applied for toys data:
# (a few minutes to execute)
data(toys)
toys.vsurf <- VSURF(x=toys$x, y=toys$y)
toys.vsurf