LCPA (version 1.0.0)

normalize: Column-wise Z-Score Standardization

Description

Standardizes each column of a numeric matrix or data frame to have mean zero and standard deviation one. Many multivariate techniques assume standardized inputs, so this transformation is a common preprocessing step. The function preserves all dimension names and returns a numeric matrix whose attributes store the original column means and standard deviations.

Usage

normalize(response)

Value

A standardized numeric matrix of dimension \(N \times I\) with attributes:

  • scaled:center: Vector of original column means (\(\mu_i\))

  • scaled:scale: Vector of original column standard deviations (\(\sigma_i\))

  • Row names: Preserved from original input's row names or row indices

  • Column names: Preserved from original input's column names

  • Values: Z-scores calculated as \(z_{ni} = \frac{x_{ni} - \mu_i}{\sigma_i}\)

where:

  • \(x_{ni}\) = original value for observation \(n\) and variable \(i\)

  • \(\mu_i\) = sample mean of variable \(i\): \(\mu_i = \frac{1}{N}\sum_{n=1}^{N}x_{ni}\)

  • \(\sigma_i\) = sample standard deviation of variable \(i\): \(\sigma_i = \sqrt{\frac{1}{N-1}\sum_{n=1}^{N}(x_{ni} - \mu_i)^2}\)

Using the denominator \(N-1\) makes \(\sigma_i^2\) an unbiased estimator of the population variance.
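The formulas above can be sketched in base R. This is a minimal illustration, not the package's implementation: `zscore_sketch` is a hypothetical helper name, and the comparison against base R's `scale()` (which uses the same attribute names) is an assumption about equivalent behavior rather than something the LCPA documentation states.

```r
# Hypothetical helper, not part of LCPA: column-wise z-scores in base R.
zscore_sketch <- function(x) {
  x <- as.matrix(x)
  mu <- colMeans(x)               # column means
  sigma <- apply(x, 2, sd)        # sd() uses the N - 1 denominator
  z <- sweep(sweep(x, 2, mu, "-"), 2, sigma, "/")
  attr(z, "scaled:center") <- mu
  attr(z, "scaled:scale") <- sigma
  z
}

x <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2,
            dimnames = list(paste0("Obs", 1:4), c("A", "B")))
z <- zscore_sketch(x)

# Compare the values with base R's scale(), ignoring attributes
all.equal(z, scale(x), check.attributes = FALSE)
```

Because `sweep()` keeps the dimensions and dimension names of its input, the row and column names of `x` carry over to `z` without extra work.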

Arguments

response

A numeric matrix or data frame of dimension \(N \times I\), where:

  • \(N\) = number of observations (rows)

  • \(I\) = number of variables (columns)

Non-numeric columns will be coerced to numeric with a warning. Missing values are not allowed and will cause the function to fail. Constant columns (zero variance) will produce NaN values, because every z-score in such a column is zero divided by a standard deviation of zero.
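Since missing values and constant columns both cause trouble, it can help to screen the input before standardizing. The following is a sketch only: `check_response` is a hypothetical helper that is not part of LCPA, and it is deliberately stricter than the behavior described above, stopping on non-numeric input instead of coercing it.

```r
# Hypothetical pre-check, not part of LCPA.
check_response <- function(response) {
  x <- as.matrix(response)
  if (!is.numeric(x)) stop("response must contain only numeric columns")
  if (anyNA(x)) stop("response contains missing values")
  # A column is constant when its minimum equals its maximum
  is_const <- apply(x, 2, function(v) min(v) == max(v))
  if (any(is_const)) {
    warning("constant column(s): ", paste(which(is_const), collapse = ", "))
  }
  invisible(is_const)
}

ok  <- cbind(A = c(1, 2, 3), B = c(4, 6, 9))
bad <- cbind(ok, Constant = c(5, 5, 5))

check_response(ok)                            # passes silently
flags <- suppressWarnings(check_response(bad))
names(flags)[flags]                           # names of the constant columns
```

Comparing the minimum with the maximum avoids relying on a computed standard deviation being exactly zero, which floating-point rounding does not always guarantee.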

Mathematical Details

For each column \(i\) in the input matrix \(X\), the standardization is performed as: $$Z_{\cdot i} = \frac{X_{\cdot i} - \bar{X}_{\cdot i}}{S_{X_{\cdot i}}}$$ where:

  • \(X_{\cdot i}\) is the \(i\)-th column vector of \(X\)

  • \(\bar{X}_{\cdot i}\) is the sample mean of column \(i\)

  • \(S_{X_{\cdot i}}\) is the sample standard deviation of column \(i\)

The resulting matrix \(Z\) has the properties: $$\frac{1}{N}\sum_{n=1}^{N}z_{ni} = 0 \quad \text{and} \quad \sqrt{\frac{1}{N-1}\sum_{n=1}^{N}z_{ni}^2} = 1$$ for all \(i = 1, \ldots, I\).
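These two properties, and the inverse transformation \(x_{ni} = z_{ni}\sigma_i + \mu_i\) that follows from the definition, can be checked numerically. The sketch below uses base R's `scale()` as a stand-in for `normalize()`, on the assumption that both produce the same values and the same `scaled:center` and `scaled:scale` attributes; the data are made up for illustration.

```r
x <- matrix(c(2, 4, 6, 8, 1, 3, 5, 11), ncol = 2,
            dimnames = list(NULL, c("A", "B")))
z <- scale(x)  # assumption: stand-in for normalize(x)

# Column means are zero and column standard deviations are one,
# up to floating-point error
colMeans(z)
apply(z, 2, sd)

# Recover the original values from the stored attributes
mu    <- attr(z, "scaled:center")
sigma <- attr(z, "scaled:scale")
x_back <- sweep(sweep(z, 2, sigma, "*"), 2, mu, "+")
all.equal(x_back, x, check.attributes = FALSE)
```

Keeping the means and standard deviations as attributes is what makes this round trip possible without recomputing anything from the raw data.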

Examples

# Basic usage with matrix
set.seed(123)
# Each column gets its own mean (5, 6, 7) and sd (1, 2, 3)
mat <- matrix(rnorm(30, mean = rep(5:7, each = 10), sd = rep(1:3, each = 10)),
              ncol = 3,
              dimnames = list(paste0("Obs", 1:10), paste0("Var", 1:3)))
norm_mat <- normalize(mat)

# Verify attributes
attr(norm_mat, "scaled:center")  # Original column means
attr(norm_mat, "scaled:scale")   # Original column standard deviations

# Verify properties
apply(norm_mat, 2, mean)  # Should be zero up to floating-point error
apply(norm_mat, 2, sd)    # Should be 1 up to floating-point error

# With data frame input
df <- as.data.frame(mat)
norm_df <- normalize(df)
all.equal(norm_mat, norm_df, check.attributes = FALSE)  # Should return TRUE

# Handling constant columns (produces NaN)
const_mat <- cbind(mat, Constant = rep(4.2, 10))
normalize(const_mat)