crazyfy
Preprocess data for the anomaly detection routines of stranger: missing-value treatment, variable standardisation, optional log recoding, and treatment of character/factor variables.
crazyfy(data, do = c("factor", "log", "impute", "range"), id = NULL,
skewness.cutpoint = 2, NA.method = "mean", NA.value = 0,
verbose = FALSE)
data: data.frame or data.table - source data.
do: character vector - list of processing steps to apply; see details.
id: (optional) character - name of an existing variable to be used as ID.
skewness.cutpoint: numeric - skewness threshold above which a numeric variable is log-recoded (see the log step in details).
NA.method: character - method used to impute missing values; one of "mean" or "value" (the latter uses the value given in NA.value).
NA.value: numeric - value used to impute missing values when NA.method is "value".
verbose: logical - should the function print details about the processing.
Pre-processed data, returned as a data.table carrying the additional class crazy.data.table.
The possible pre-processing steps, selected through do, are:
* factor: factor and character variables are converted to numeric values using the term frequency-inverse document frequency (tf-idf) approach. We use the smoothed IDF weight, i.e. log(1 + N/n_t), where N is the number of observations and n_t the frequency of the term t.
* log: replace x by log(x - min(x)). Applied to every numeric variable whose distribution has a skewness greater than skewness.cutpoint.
* impute: impute missing values, using either the variable's mean (NA.method = "mean") or a fixed value given by NA.value (NA.method = "value").
* range: rescale each variable as (x - min(x))/max(x).
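The factor step can be illustrated with a minimal base-R sketch of the smoothed IDF weight log(1 + N/n_t). This is not stranger's implementation: the helpers idf_encode and encode_factors are hypothetical, and the sketch assumes that each observation holds a single term (its level), so that tf = 1 and the tf-idf value reduces to the IDF weight of that level.

```r
# Hypothetical helpers, NOT part of stranger: a sketch of the documented
# smoothed IDF weight log(1 + N / n_t).
# Assumption: each observation holds one term (its level), so tf = 1 and
# the encoded value is simply the IDF weight of the observed level.
idf_encode <- function(x) {
  x <- as.character(x)
  N <- length(x)                  # number of observations
  n_t <- as.numeric(table(x)[x])  # frequency of each observation's own level
  log(1 + N / n_t)                # rarer levels receive larger weights
}

# Apply idf_encode() to every factor or character column of a data.frame,
# leaving numeric columns untouched.
encode_factors <- function(df) {
  is_cat <- vapply(df, function(v) is.factor(v) || is.character(v), logical(1))
  if (any(is_cat)) df[is_cat] <- lapply(df[is_cat], idf_encode)
  df
}

encoded <- encode_factors(iris)
head(encoded$Species)
```

On iris every Species level occurs 50 times out of 150, so every observation is encoded as log(1 + 150/50) = log(4): with perfectly balanced levels this encoding is constant and carries no information.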
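The numeric steps (log, impute, range) can be sketched in base R as follows, following the formulas exactly as documented. The helper names are hypothetical. The skewness estimator (the moment estimator g1) and the extra offset argument of log_recode are assumptions that are not part of stranger. Two properties of the documented formulas are visible here. First, log(x - min(x)) maps the minimum to log(0) = -Inf. Second, (x - min(x))/max(x) differs from conventional min-max scaling (x - min(x))/(max(x) - min(x)), and the two agree only when min(x) = 0.

```r
# Hypothetical helpers, NOT part of stranger: sketches of the numeric steps.

# Sample skewness, moment estimator g1 (the estimator stranger uses is an assumption).
skew_g1 <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# log step: log(x - min(x)) when skewness exceeds the cutpoint.
# With offset = 0 (the documented formula) the minimum becomes -Inf;
# offset is a hypothetical extra argument, e.g. offset = 1 maps the minimum to 0.
log_recode <- function(x, skewness.cutpoint = 2, offset = 0) {
  s <- skew_g1(x)
  # A constant variable gives NaN skewness and is left unchanged.
  if (!is.na(s) && s > skewness.cutpoint) log(x - min(x, na.rm = TRUE) + offset) else x
}

# impute step: replace NAs by the variable mean or by a fixed value.
impute <- function(x, NA.method = "mean", NA.value = 0) {
  fill <- switch(NA.method,
                 mean  = mean(x, na.rm = TRUE),
                 value = NA.value,
                 stop("NA.method must be \"mean\" or \"value\""))
  x[is.na(x)] <- fill
  x
}

# range step, as documented: (x - min(x)) / max(x).
range_scale <- function(x) {
  (x - min(x, na.rm = TRUE)) / max(x, na.rm = TRUE)
}

x <- c(rep(1, 20), 100)           # strongly right-skewed (skewness about 4.25)
log_recode(x, offset = 1)         # recoded, because skewness > 2
impute(c(1, NA, 3))               # NA replaced by the mean, 2
range_scale(c(2, 4, 6))
```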
library(stranger)
data(iris)
# Apply the default processing steps to the four numeric columns of iris
crazy <- crazyfy(iris[, 1:4])