Cleans numeric columns by handling extreme values or imputing missing values. The function supports two main focuses: handling skewed distributions or imputing missing data.
impute(
  x,
  focus = c("skew", "missing"),
  method = c("winsorize", "iqr", "mean", "median"),
  percentile = NULL
)
Returns a numeric vector with cleaned or imputed values.
x: A numeric vector to be cleaned.
focus: A character string indicating the focus. Options are:
  "skew": Handle extreme values using percentile or IQR methods (default).
  "missing": Impute missing values.
method: A character string specifying the method:
  For focus = "skew":
    "winsorize": Replace values outside the specified percentiles (default).
    "iqr": Use the IQR to limit extreme values.
  For focus = "missing":
    "mean": Replace missing values with the mean.
    "median": Replace missing values with the median.
percentile: A numeric value (percentile > 0) for winsorization. If not provided, the lower and upper cut points default to 0.01 and 0.99.
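To make the two "skew" methods concrete, here is a minimal base R sketch of what winsorizing and IQR capping typically do. It is not the package's implementation: the helper names `winsorize_sketch` and `iqr_cap_sketch`, the explicit `lower`/`upper` arguments, and the 1.5 × IQR fence multiplier `k` are all assumptions made for illustration.

```r
# Hypothetical helper: clamp values to the given lower/upper quantiles.
# The defaults mirror the documented 0.01 and 0.99 cut points.
winsorize_sketch <- function(x, lower = 0.01, upper = 0.99) {
  # Cut points are computed on the non-missing values only.
  bounds <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  # Values below/above the cut points are pulled in; NA stays NA.
  pmin(pmax(x, bounds[1]), bounds[2])
}

# Hypothetical helper: clamp values to Tukey-style fences.
# k = 1.5 is an assumed convention, not taken from the package.
iqr_cap_sketch <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  pmin(pmax(x, q[1] - k * spread), q[2] + k * spread)
}

x <- c(1, 2, 3, 100, 200, NA)
winsorize_sketch(x)
iqr_cap_sketch(c(1, 2, 3, 4, 1000))
```

Both helpers leave missing values untouched, so they can be combined with a separate imputation step.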
x <- c(1, 2, 3, 100, 200, NA)

# Winsorize to 1% and 99%
impute(x, focus = "skew", method = "winsorize")

# Replace missing values with the mean
impute(x, focus = "missing", method = "mean")
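For comparison with the `focus = "missing"` call above, the following base R sketch shows one plausible way mean and median imputation can be written by hand. The helper name `fill_missing_sketch` is hypothetical, and the package's internals may differ.

```r
# Hypothetical helper: replace NA with the mean or median of the observed values.
fill_missing_sketch <- function(x, method = c("mean", "median")) {
  method <- match.arg(method)
  # The fill value is computed from the non-missing entries only.
  fill <- if (method == "mean") mean(x, na.rm = TRUE) else median(x, na.rm = TRUE)
  x[is.na(x)] <- fill
  x
}

x <- c(1, 2, 3, 100, 200, NA)
fill_missing_sketch(x, method = "mean")
fill_missing_sketch(x, method = "median")
```

With skewed data such as this vector, the mean is pulled toward the large values while the median is not, which is usually the reason to prefer "median" when outliers are present.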