```
# discretize continuous data into factors.
discretize(data, method, breaks = 3, ordered = FALSE, ..., debug = FALSE)
# screen continuous data for highly correlated pairs of variables.
dedup(data, threshold, debug = FALSE)
```

data

a data frame containing numeric columns (for `dedup`) or a combination of
numeric and factor columns (for `discretize`).

threshold

a numeric value between zero and one, the absolute correlation used as a
threshold in screening highly correlated pairs.

method

a character string, either `interval` for *interval discretization*,
`quantile` for *quantile discretization* (the default) or `hartemink` for
*Hartemink's pairwise mutual information* method.

breaks

if `method` is set to `hartemink`, an integer number, the number of levels
the variables are to be discretized into. Otherwise, a vector of integer
numbers, one for each column of the data set, specifying the number of
levels for each variable.

ordered

a boolean value. If `TRUE` the discretized variables are returned as ordered
factors instead of unordered ones.

...

additional tuning parameters, see below.

debug

a boolean value. If `TRUE` a lot of debugging output is printed; otherwise
the function is completely silent.

`discretize` returns a data frame with the same structure (number of
columns, column names, etc.) as `data`, containing the discretized
variables. `dedup` returns a data frame with a subset of the columns of
`data`.

`discretize` takes a data frame of continuous variables as its first
argument and returns a second data frame of discrete variables, transformed
using one of three methods: `interval`, `quantile` or `hartemink`. `dedup`
screens the data for pairs of highly correlated variables, and discards one
in each pair.

```
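library(bnlearn)
data(gaussian.test)
# Hartemink's method: 20 initial levels (ibreaks), reduced to 4 final levels (breaks).
d = discretize(gaussian.test, method = 'hartemink', breaks = 4, ibreaks = 20)
# learn a network structure from the discretized data and plot it.
plot(hc(d))
# threshold is passed explicitly; 0.9 is an illustrative value.
d2 = dedup(gaussian.test, threshold = 0.9)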
```
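
The difference between the `interval` and `quantile` methods can be illustrated with base R alone. The sketch below is not bnlearn's implementation; it assumes that interval discretization means equal-width bins and that quantile discretization means cut points at the empirical quantiles, and it uses a made-up skewed sample `x` with `k = 3` levels.

```r
# base R sketch of interval vs quantile discretization (not bnlearn's code).
set.seed(42)
x <- rexp(1000)   # hypothetical skewed sample
k <- 3            # number of levels, playing the role of breaks = 3

# interval: k bins of equal width spanning the range of x.
by_interval <- cut(x, breaks = k, ordered_result = TRUE)

# quantile: cut points at the empirical quantiles, so that the bins
# contain roughly the same number of observations.
cuts <- quantile(x, probs = seq(0, 1, length.out = k + 1))
by_quantile <- cut(x, breaks = cuts, include.lowest = TRUE,
                   ordered_result = TRUE)

table(by_interval)   # very uneven counts for skewed data
table(by_quantile)   # near-equal counts
```

Here `ordered_result = TRUE` plays a similar role to `ordered = TRUE`: the resulting factor levels carry their natural ordering.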

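
The screening idea behind `dedup` can be sketched in base R as well. The helper `screen_correlated` below is hypothetical and is not bnlearn's code. This page does not say which variable of a correlated pair `dedup` discards, so the sketch simply assumes that the column appearing earlier in the data frame is kept.

```r
# base R sketch of correlation screening (hypothetical helper, not bnlearn's code).
screen_correlated <- function(data, threshold) {
  r <- abs(cor(data))                 # absolute pairwise correlations
  keep <- rep(TRUE, ncol(data))
  for (j in seq_len(ncol(data))[-1]) {
    earlier <- which(keep[seq_len(j - 1)])
    # drop column j if it is too correlated with a column already kept.
    if (any(r[earlier, j] > threshold)) keep[j] <- FALSE
  }
  data[, keep, drop = FALSE]
}

set.seed(1)
a <- rnorm(200)
toy <- data.frame(a = a,
                  b = a + rnorm(200, sd = 0.01),  # near-duplicate of a
                  c = rnorm(200))                 # unrelated column
names(screen_correlated(toy, threshold = 0.9))    # "a" "c"
```

As with `dedup`, the result is a data frame holding a subset of the original columns.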