This is a two-class classification problem. The difficulty is that the problem is multivariate and highly non-linear. Of the 500 features, 20 are real features, 480 are noise features. Data set from UCI repository, discretized using median cutoffs.
data(madelon)TrainXA matrix with 2000 rows and 500 columns.
TrainYA vector with 2000 rows.
TestXA matrix with 600 rows and 500 columns.
TestYA vector with 600 rows.