A subset of the UCI machine learning data set ‘covertype’, which describes forest cover type in seven different classes. This smaller subset contains 100,000 observations and 55 variables. The first 54 variables are explanatory (i.e. “features”), and the last provides the dependent variable (“labels”). The data is in the ‘wide’ 55 x 100,000 format used by mlpack, in which each column holds one observation. The dependent variable has been transformed to the range zero to six by subtracting one from the values found in the data file.
The original source of the data is the US Forest Service, and the complete file is part of the UC Irvine machine learning data repository.
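
The ‘wide’ layout and the label shift described above can be illustrated with a minimal sketch. It uses NumPy and a small synthetic array rather than the real data: the five observations, the random feature values, and the names `long_table`, `wide`, `X` and `y` are all assumptions made for illustration and are not part of the data set or of mlpack.

```python
import numpy as np

rng = np.random.default_rng(0)

n_obs = 5        # illustrative stand-in for the 100,000 observations
n_features = 54  # number of explanatory variables

# Hypothetical "long" table: one row per observation, with the
# original labels (1 to 7) in the last of the 55 columns.
features = rng.random((n_obs, n_features))
raw_labels = np.array([1, 7, 3, 2, 5])
long_table = np.column_stack([features, raw_labels])  # shape (5, 55)

# 'Wide' layout: one column per observation, one row per variable.
wide = long_table.T.copy()  # shape (55, 5)

# Shift the labels from 1..7 to 0..6 by subtracting one.
wide[-1, :] -= 1

# Split into the 54 feature rows and the single label row.
X = wide[:-1, :]
y = wide[-1, :].astype(int)

print(wide.shape, X.shape, y)
```

With the full subset, the same steps would give a 55 x 100,000 matrix whose last row holds the labels.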