rfriend (version 1.0.0)

f_bestNormalize: Automated Data Normalization with bestNormalize

Description

Applies optimal normalization transformations using 'bestNormalize', provides diagnostic checks, and generates comprehensive reports.

Usage

f_bestNormalize(
  data,
  alpha = 0.05,
  plots = FALSE,
  data_name = NULL,
  output_type = "off",
  output_file = NULL,
  output_dir = NULL,
  save_in_wdir = FALSE,
  close_generated_files = FALSE,
  open_generated_files = TRUE,
  ...
)

Value

Returns an object of class `f_bestNormalize` containing:

  • transformed_data Normalized vector.

  • bestNormalize Full bestNormalize object from original package.

  • data_name Name of the analyzed dataset.

  • transformation_name Name of selected transformation.

  • shapiro_original Shapiro-Wilk test results for original data.

  • shapiro_transformed Shapiro-Wilk test results for transformed data.

  • norm_stats Data frame of normality statistics for all methods.

  • rmd Rmd code if output_type = "rmd".

The function also generates reports in the specified format. When output is sent to the console and plots = TRUE, it prints Q-Q plots, histograms, and a summary data transformation report.

In summary, the returned object of class 'f_bestNormalize' holds the results from 'bestNormalize', the input data, the transformed data, and the Shapiro-Wilk tests on the original and transformed data. Depending on "output_type", the function can also produce R Markdown code, 'Word' files, or 'pdf' files. Print and plot methods are available for objects of class 'f_bestNormalize'.
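
For orientation, the Shapiro-Wilk components described above are the kind of result that base R's shapiro.test() produces. The sketch below runs the test directly on skewed data and on its log transform, then compares each p-value with alpha = 0.05. Whether the package stores exactly this object is an assumption; the variable names are illustrative only.

```r
# Base R only; does not call f_bestNormalize().
set.seed(1)
x <- rlnorm(100)   # right-skewed sample
alpha <- 0.05      # same default as the alpha argument

sw_original <- shapiro.test(x)
sw_logged <- shapiro.test(log(x))  # the log of log-normal data is normal

# TRUE means normality is not rejected at level alpha.
sw_original$p.value > alpha
sw_logged$p.value > alpha
```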

Arguments

data

Numeric vector or single-column data frame.

alpha

Numeric. Significance level for normality tests (default = 0.05).

plots

Logical. If TRUE, draws Q-Q plots and histograms of the original and transformed data. Default is FALSE.

data_name

A character string to manually set the name of the data for plot axes and reporting. By default, the name is extracted from the input object data.

output_type

Character. Output format: "console", "pdf", "word", "rmd", or "off". The option "console" forces output to be printed. Default is "off".

output_file

Character. Custom output filename (optional).

output_dir

Character. Output directory (default = tempdir()).

save_in_wdir

Logical. Save in working directory (default = FALSE).

close_generated_files

Logical. If TRUE, closes open 'Word' files so that f_bestNormalize() can save the newly generated file. 'Pdf' files cannot be closed automatically and should be closed manually before calling the function. Default is FALSE.

open_generated_files

Logical. If TRUE, opens the generated output file so the results can be viewed directly after creation. Files are stored in tempdir(). Default is TRUE.

...

Additional arguments passed to bestNormalize.

Author

Sander H. van Delden plantmind@proton.me

Details

This is a wrapper around the 'bestNormalize' package that provides formatted output and tunes the settings of 'bestNormalize' to the sample size n. If n < 100: loo = TRUE and allow_orderNorm = FALSE (r is irrelevant because loo = TRUE). If 100 <= n < 200: loo = FALSE, allow_orderNorm = TRUE, and r = 50. If n >= 200: loo = FALSE, allow_orderNorm = TRUE, and r = 10. These settings can be overridden by user options.
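
The sample-size rule can be sketched as a small helper. This only illustrates the thresholds stated above and is not the package's internal code; the function name choose_bn_settings is hypothetical.

```r
# Hypothetical helper that mirrors the documented thresholds.
choose_bn_settings <- function(n) {
  if (n < 100) {
    # r is left as NA because it does not matter when loo = TRUE.
    list(loo = TRUE, allow_orderNorm = FALSE, r = NA_integer_)
  } else if (n < 200) {
    list(loo = FALSE, allow_orderNorm = TRUE, r = 50L)
  } else {
    list(loo = FALSE, allow_orderNorm = TRUE, r = 10L)
  }
}

# Settings that would apply to a vector of length 150.
choose_bn_settings(150)
```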

This function requires [Pandoc](https://github.com/jgm/pandoc/releases/tag) (version 1.12.3 or higher), a universal document converter.

  • Windows: Install Pandoc and ensure the installation folder
    (e.g., "C:/Users/your_username/AppData/Local/Pandoc") is added to your system PATH.

  • macOS: If using Homebrew, Pandoc is typically installed in "/usr/local/bin". Alternatively, download the .pkg installer and verify that the binary’s location is in your PATH.

  • Linux: Install Pandoc through your distribution’s package manager (commonly installed in "/usr/bin" or "/usr/local/bin") or manually, and ensure the directory containing Pandoc is in your PATH.

  • If Pandoc is not found, this function may not work as intended.
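
A quick way to check the PATH requirement from within R is sketched below, using base R only. It reports whether a pandoc executable is visible on the PATH; it does not verify the version, and it is not part of rfriend.

```r
# Look up the pandoc executable on the system PATH.
pandoc_path <- Sys.which("pandoc")

if (nzchar(pandoc_path)) {
  message("Pandoc found at: ", pandoc_path)
} else {
  message("Pandoc not found on PATH; 'pdf' and 'word' output may fail.")
}
```

If the rmarkdown package is installed, rmarkdown::pandoc_available("1.12.3") additionally checks the minimum version.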

References

Peterson, C. (2025). bestNormalize: Flexibly calculate the best normalizing transformation for a vector. Available at: https://cran.r-project.org/package=bestNormalize

Examples

# \donttest{
# Create some skewed data (e.g., using a log-normal distribution).
skewed_data <- rlnorm(100, meanlog = 0, sdlog = 1)

# Use set.seed to keep the outcome of bestNormalize stable.
set.seed(123)

# Transform the data and store all information in f_bestNormalize_out.
f_bestNormalize_out <- f_bestNormalize(skewed_data)

# Print the output.
print(f_bestNormalize_out)

# Show histograms and Q-Q plots.
plot(f_bestNormalize_out)

# Directly store the transformed_data from f_bestNormalize and force to show
# plots and transformation information.
transformed_data <- f_bestNormalize(skewed_data, output_type = "console")$transformed_data

# Any other transformation can be chosen by using:
boxcox_transformed_data <- f_bestNormalize(skewed_data)$bestNormalize$other_transforms$boxcox$x.t
# and substituting '$boxcox' with the transformation of choice.

# To print rmd output, set the chunk option to results = 'asis' and use:
f_bestNormalize_rmd_out <- f_bestNormalize(skewed_data, output_type = "rmd")
cat(f_bestNormalize_rmd_out$rmd)
# }
