Filters out variables from a dataset that exhibit a coefficient of variation below a specified threshold, ensuring the retention of variables with meaningful variability.
deleteNearZeroCoefficientOfVariation(X, LIMIT = 0.1)
Returns a list of three objects:
X
: The filtered data.frame X, containing only the variables that passed the filter.
variablesDeleted
: The variables that have been removed by the filter.
coeff_variation
: The coefficient of variation for each variable tested.
X
: Numeric matrix or data.frame. Explanatory variables. Qualitative variables must be transformed into binary variables.
LIMIT
: Numeric. Cutoff for minimum variation. Variables whose coefficient of variation is below this limit are removed because they do not vary enough (default: 0.1).
Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es
The deleteNearZeroCoefficientOfVariation function is a data preprocessing step that is
especially useful for high-dimensional datasets. The coefficient of variation (CoV) is a
normalized measure of data dispersion, calculated as the ratio of the standard deviation to the mean.
In many scientific investigations, variables with a low CoV might be considered as offering
limited discriminative information, potentially leading to noise in subsequent statistical analyses.
By setting a threshold through the LIMIT parameter, this function provides a systematic approach
to identify and exclude variables that do not meet the desired variability criteria. The underlying
rationale is that variables with a CoV below the set threshold might not contribute significantly
to the variability of the dataset and could be redundant or even detrimental for certain analyses.
The function returns the filtered dataset, the names of the deleted variables, and the computed
coefficient of variation for each variable, so the effect of the filter can be inspected before
any subsequent analysis.
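The filtering logic described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name cov_filter_sketch is hypothetical, the absolute value of the mean is used so that negative means do not produce a negative CoV, and variables with an undefined CoV (for example, a mean of zero) are kept rather than removed.

```r
# Hypothetical helper illustrating a CoV filter; NOT the package's own code.
cov_filter_sketch <- function(X, LIMIT = 0.1) {
  X <- as.data.frame(X)

  # CoV = standard deviation / |mean|, computed column by column.
  # Assumption: abs() is applied to the mean; NA values are ignored.
  cov <- vapply(
    X,
    function(v) sd(v, na.rm = TRUE) / abs(mean(v, na.rm = TRUE)),
    numeric(1)
  )

  # Variables strictly below the limit are flagged for removal.
  # Assumption: an undefined CoV (NaN) is never flagged.
  drop <- names(cov)[!is.na(cov) & cov < LIMIT]

  # Mirror the three-element output documented above.
  list(
    X = X[, !(names(X) %in% drop), drop = FALSE],
    variablesDeleted = drop,
    coeff_variation = cov
  )
}

# Toy data: "flat" barely varies around 10, "varied" spans a wide range.
toy <- data.frame(
  flat   = c(10, 10.1, 9.9, 10),
  varied = c(1, 5, 10, 20)
)

res <- cov_filter_sketch(toy, LIMIT = 0.1)
res$variablesDeleted   # "flat"
names(res$X)           # "varied"
```

With LIMIT = 0.1, the toy variable flat has a CoV of about 0.008 and is removed, while varied has a CoV of about 0.91 and is kept.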
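# Load the example proteomic data set
data("X_proteomic")
X <- X_proteomic
# Remove variables whose coefficient of variation is below 0.1
filter <- deleteNearZeroCoefficientOfVariation(X, LIMIT = 0.1)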