This is a two-class classification problem. The difficulty is that the problem is multivariate and highly non-linear. Of the 500 features, 20 are real features, 480 are noise features. Data set from UCI repository, discretized using median cutoffs.
data(madelon)
TrainX
A matrix with 2000 rows and 500 columns.
TrainY
A vector with 2000 rows.
TestX
A matrix with 600 rows and 500 columns.
TestY
A vector with 600 rows.