RoughSetData: Data set of the package
Description
Several datasets are embedded in this package for use as
example decision tables. They can be loaded by typing
data(RoughSetData). Each dataset is described below.
Details
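A minimal sketch of loading the data and listing what it contains. It assumes that the package in question is RoughSets, that it is installed, and that RoughSetData is a list-like object holding one decision table per dataset. The element names are not stated here, so the sketch prints them rather than assuming them.

```r
# Load the bundled datasets without attaching the package.
# Assumption: the RoughSets package is installed; otherwise nothing is run.
if (requireNamespace("RoughSets", quietly = TRUE)) {
  data("RoughSetData", package = "RoughSets")

  # Assumption: RoughSetData is a list with one decision table per dataset.
  print(names(RoughSetData))

  # Show the number of objects (rows) and attributes (columns) of each table.
  for (nm in names(RoughSetData)) {
    cat(nm, ":", paste(dim(RoughSetData[[nm]]), collapse = " x "), "\n")
  }
}
```

The printed dimensions can be compared with the object and attribute counts given in the descriptions below.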
The hiring dataset
It is a simple dataset taken from (Komorowski et al., 1999) in
which all attributes have nominal values. It consists of
eight objects with four conditional attributes and one
decision attribute. The attributes are as follows:
- Diploma: it has the following values: {"MBA", "MSc", "MCE"}.
- Exprience: it has the following values: {"High", "Low",
  "Medium"}.
- French: it has the following values: {"Yes", "No"}.
- Reference: it has the following values: {"Excellent", "Good",
  "Neutral"}.
- Decision: it is the decision attribute and has the following
  values: {"Accept", "Reject"}.
The housing dataset
This dataset was taken from the Boston housing dataset at
the UCI Machine Learning Repository, available at
http://www.ics.uci.edu. It was first created by (Harrison
and Rubinfeld, 1978). It contains 506 objects with 13
conditional attributes and one decision attribute. Note
that it is a regression dataset, meaning that the decision
attribute has continuous values. The conditional attributes
include both continuous and nominal attributes. The
attributes are as follows:
- CRIM: it is a continuous attribute that expresses the per
  capita crime rate by town. It has values in: [0.0062,
  88.9762].
- ZN: it is a continuous attribute that represents the
  proportion of residential land zoned for lots over 25,000
  sq.ft. It has values in: [0, 100].
- INDUS: it is a continuous attribute that shows the
  proportion of non-retail business acres per town. It has
  values in: [0.46, 27.74].
- CHAS: it is a nominal attribute that represents the Charles
  River dummy variable. It has two values: 1 if the tract
  bounds the river and 0 otherwise.
- NOX: it is a continuous attribute that shows the nitric
  oxides concentration (parts per 10 million). It has values
  in: [0.385, 0.871].
- RM: it is a continuous attribute that gives the average
  number of rooms per dwelling. It has values in: [3.561,
  8.78].
- AGE: it is a continuous attribute that expresses the
  proportion of owner-occupied units built prior to 1940. It
  has values in: [2.9, 100].
- DIS: it is a continuous attribute that shows the weighted
  distances to five Boston employment centres. It has values
  in: [1.1296, 12.1265].
- RAD: it is a nominal attribute that shows the index of
  accessibility to radial highways. It has integer values
  from 1 to 24.
- TAX: it is a continuous attribute that shows the full-value
  property-tax rate per $10,000. It has values in: [187, 711].
- PTRATIO: it is a continuous attribute that shows the
  pupil-teacher ratio by town. It has values in: [12.6, 22].
- B: it is a continuous attribute that can be expressed by
  1000(Bk - 0.63)^2 where Bk is the proportion of blacks by
  town. It has values in: [0.32, 396.9].
- LSTAT: it is a continuous attribute that gives the
  percentage of lower status of the population. It has values
  in: [1.73, 37.97].
- MEDV: it is a continuous attribute that shows the median
  value of owner-occupied homes in $1000's. It has values in:
  [5, 50].
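The attribute descriptions above follow one pattern: a [min, max] range for continuous attributes and a value set for nominal ones. The sketch below shows how such a summary could be produced for any data frame. The helper describe_attributes and the toy table are hypothetical and not part of the package; the toy values are made up and only borrow two attribute names for illustration.

```r
# Hypothetical helper (not part of the package): summarise each attribute
# as a [min, max] range if it is numeric, or as a value set otherwise.
describe_attributes <- function(dt) {
  vapply(names(dt), function(col) {
    x <- dt[[col]]
    if (is.numeric(x)) {
      sprintf("[%s, %s]",
              format(min(x, na.rm = TRUE)),
              format(max(x, na.rm = TRUE)))
    } else {
      sprintf("{%s}", paste(sort(unique(as.character(x))), collapse = ", "))
    }
  }, character(1))
}

# Toy table with made-up values (NOT taken from the housing data).
toy_housing <- data.frame(
  RM = c(5.5, 6.2, 7.1),
  CHAS = factor(c("0", "1", "0"))
)
print(describe_attributes(toy_housing))
```

With the package data loaded, the same helper could be applied to one of the bundled tables, for example describe_attributes(RoughSetData[[1]]). A nominal attribute that is stored as a number, such as a 0/1 dummy variable, would be reported as a range rather than as a value set.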
The wine dataset
This is a classification dataset first introduced by
(Forina et al., 1988) and commonly used as a benchmark in
machine learning. It is available at the KEEL dataset
repository (Alcala-Fdez et al., 2009), at
http://www.keel.es/. It consists of 178 instances with 13
conditional attributes and one decision attribute, where
all conditional attributes have continuous values. The
attributes are as follows:
- alcohol: it has a range in: [11, 14.9].
- malid_acid: it has a range in: [0.7, 5.8].
- ash: it has a range in: [1.3, 3.3].
- alcalinity_of_ash: it has a range in: [10.6, 30.0].
- magnesium: it has a range in: [70, 162].
- total_phenols: it has a range in: [0.9, 3.9].
- flavanoids: it has a range in: [0.3, 5.1].
- nonflavanoid_phenols: it has a range in: [0.4, 3.6].
- proanthocyanins: it has a range in: [0.4, 3.6].
- color_intensity: it has a range in: [1.2, 13.0].
- hue: it has a range in: [0.4, 1.8].
- od: it has a range in: [1.2, 4.0].
- proline: it has a range in: [278, 1680].
- class: it is a nominal decision attribute that has values:
  {1, 2, 3}.
The pima dataset
It was taken from the Pima Indians diabetes dataset, which
is available at the KEEL dataset repository (Alcala-Fdez et
al., 2009), at http://www.keel.es/. It was first created by
the National Institute of Diabetes and Digestive and Kidney
Diseases. It contains 768 objects with 8 continuous
conditional attributes and one decision attribute. The
attributes are as follows:
- preg: it represents the number of times pregnant and has
  values in: [1, 17].
- plas: it represents the plasma glucose concentration at 2
  hours in an oral glucose tolerance test and has values in:
  [0.0, 199.0].
- pres: it represents diastolic blood pressure (mm Hg) and
  has values in: [0.0, 122.0].
- skin: it represents triceps skin fold thickness (mm) and
  has values in: [0.0, 99.0].
- insu: it represents 2-hour serum insulin (mu U/ml) and has
  values in: [0.0, 846.0].
- mass: it represents body mass index (weight in kg/(height
  in m)^2) and has values in: [0.0, 67.1].
- pedi: it represents the diabetes pedigree function and has
  values in: [0.078, 2.42].
- age: it represents age (years) and has values in: [21.0,
  81.0].
- class: it is the decision attribute and has values in:
  [1, 2].
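Each dataset above is described as a set of conditional attributes plus one decision attribute. The sketch below separates the two parts of such a table. It assumes the decision attribute is the last column, as in the order of the lists above; this is not confirmed for the stored tables. The helper split_decision_table and the toy values are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): split a decision table
# into its conditional attributes and its decision attribute.
# Assumption: decision_col is a column index, by default the last column.
split_decision_table <- function(dt, decision_col = ncol(dt)) {
  list(
    conditional = dt[, -decision_col, drop = FALSE],
    decision = dt[[decision_col]]
  )
}

# Toy table with made-up values (NOT taken from the pima data).
toy_pima <- data.frame(
  preg = c(1, 3, 6),
  age = c(22, 35, 50),
  class = c(1, 2, 2)
)

parts <- split_decision_table(toy_pima)
print(names(parts$conditional))  # names of the conditional attributes
print(table(parts$decision))     # number of objects per decision value
```

If a table stores its decision attribute in another position, the index can be passed explicitly through decision_col.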
References
M. Forina, E. Leardi, C. Armanino, and S. Lanteri, "PARVUS
- An Extendible Package for Data Exploration,
Classification and Correlation", Journal of Chemometrics,
vol. 4, no. 2, p. 191 - 193 (1988).
D. Harrison, and D. L. Rubinfeld, "Hedonic Prices and the
Demand for Clean Air", J. Environ. Economics & Management,
vol. 5, p. 81 - 102 (1978).
J. Alcala-Fdez, L. Sanchez, S. Garcia, M. J. del Jesus, S.
Ventura, J. M. Garrell, J. Otero, C. Romero, J. Bacardit,
V. M. Rivas, J. C. Fernandez, and F. Herrera, "KEEL: A
Software Tool to Assess Evolutionary Algorithms for Data
Mining Problems", Soft Computing vol. 13, no. 3, p. 307 -
318 (2009).
J. Komorowski, Z. Pawlak, L. Polkowski, and A. Skowron,
"Rough Sets: A Tutorial", In S. K. Pal and A. Skowron,
editors, Rough Fuzzy Hybridization, A New Trend in Decision
Making, pp. 3 - 98, Singapore, Springer (1999).