MoTBFs (version 1.2)

dataMining: Functions to Manipulate a Dataset

Description

Collection of functions for discretizing, standardizing, converting factors to characters, and other useful methods for manipulating datasets.

Usage

whichDiscrete(dataset, discreteVariables)

discreteVariables_as.character(dataset, discreteVariables)

standardizeDataset(dataset)

discretizeVariablesEWdis(dataset, numIntervals, factor = FALSE, binary = FALSE)

discreteVariablesStates(namevariables, discreteData)

nstates(DiscreteVariablesStates)

quantileIntervals(X, numIntervals)

scaleData(dataset, scale)

Arguments

dataset

A dataset of class "data.frame". The dataset may contain both discrete and continuous variables.

discreteVariables

A "character" array with the names of the discrete variables.

numIntervals

Number of intervals used to split the domain.

factor

By default FALSE, i.e. the variables are taken as "character"; if TRUE, they are taken as "factor".

binary

By default FALSE, i.e. only binary entries are used for continuous variables; if TRUE, binary entries are used to discretize the full data, taking into account the states of the discrete variables.

namevariables

An array with the names of the variables.

discreteData

A discretized dataset of class "data.frame".

DiscreteVariablesStates

The output of the function discreteVariablesStates.

X

A "numeric" vector with the data values of a continuous variable.

scale

A "numeric" vector if there is a single variable; otherwise, a "list" containing the name of the variable and the scale value.

Details

whichDiscrete() selects the position of the discrete variables.

discreteVariables_as.character() transforms the values of the discrete variables to character values.

standardizeDataset() standardizes a dataset.

discretizeVariablesEWdis() discretizes a dataset using intervals with equal width.

discreteVariablesStates() extracts the states of the qualitative variables.

nstates() computes the number of states of the discrete variables.

quantileIntervals() selects the quantiles of a variable according to the number of intervals into which its domain is to be split.
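
The difference between equal-width intervals and quantile-based intervals can be illustrated with base R alone. The sketch below is not the MoTBFs implementation; it assumes that "equal width" means break points evenly spaced over the observed range and that quantile intervals use break points at evenly spaced probabilities. The variable names (x, k, ew_breaks, q_breaks) are illustrative only.

```r
## Base-R sketch; not the code used inside MoTBFs.
set.seed(1)
x <- rexp(100, rate = 1/2)  ## a skewed continuous variable
k <- 4                      ## number of intervals

## Equal width: k intervals of identical length over the observed range.
ew_breaks <- seq(min(x), max(x), length.out = k + 1)
ew_bins <- cut(x, breaks = ew_breaks, include.lowest = TRUE)

## Quantile based: break points at probabilities 0, 1/k, ..., 1.
q_breaks <- quantile(x, probs = seq(0, 1, length.out = k + 1), names = FALSE)
q_bins <- cut(x, breaks = q_breaks, include.lowest = TRUE)

table(ew_bins)  ## counts per interval are uneven for skewed data
table(q_bins)   ## counts per interval are balanced
```

For skewed data such as the exponential sample above, equal-width intervals tend to leave the upper intervals sparsely populated, whereas quantile break points keep the counts balanced.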

Examples

# NOT RUN {
## dataset: 2 continuous variables, 1 discrete variable.
data <- data.frame(X = rnorm(100), Y = rexp(100, 1/2), Z = as.factor(rep(c("s", "a"), 50)))
disVar <- "Z" ## Discrete variable
class(data[,disVar]) ## factor

data <- discreteVariables_as.character(dataset = data, discreteVariables = disVar)
class(data[,disVar]) ## character

whichDiscrete(dataset = data, discreteVariables = "Z")

standData <- standardizeDataset(dataset = data)

disData <- discretizeVariablesEWdis(dataset = data, numIntervals = 3)

l <- discreteVariablesStates(namevariables = names(data), discreteData = disData)

nstates(DiscreteVariablesStates = l)

## Continuous variables
quantileIntervals(X = data[,1], numIntervals = 4)
quantileIntervals(X = data[,2], numIntervals = 10)
# }
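
As a cross-check on the standardization step in the example, the following base-R sketch centers each numeric column to mean 0 and rescales it to standard deviation 1, leaving non-numeric columns untouched. This is an assumption about what standardizing means here, not the source of standardizeDataset(); the helper standardize_numeric() is hypothetical and not part of MoTBFs.

```r
## Base-R sketch; standardize_numeric() is a hypothetical helper.
standardize_numeric <- function(df) {
  ## Identify the numeric (continuous) columns.
  is_num <- vapply(df, is.numeric, logical(1))
  ## Subtract the mean and divide by the standard deviation, column by column.
  df[is_num] <- lapply(df[is_num], function(col) (col - mean(col)) / sd(col))
  df
}

set.seed(1)
df <- data.frame(X = rnorm(100), Y = rexp(100, 1/2),
                 Z = rep(c("s", "a"), 50), stringsAsFactors = FALSE)
std <- standardize_numeric(df)

colMeans(std[c("X", "Y")])    ## approximately 0
sapply(std[c("X", "Y")], sd)  ## approximately 1
```

Comparing these column summaries with those of standData from the example is one way to check whether the package output matches this convention.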
