blockNormalize: Normalize Blockwise Two Or Three Datasets

Description

This function normalizes two (or three) entire data-sets against each other while preserving each set's characteristics. Several methods are available: normalize blocks to a common range, to a common median, or to a common distribution per block (quantile-normalization).

Usage

blockNormalize(
  x,
  y,
  z = NULL,
  method = "quantile.block",
  range = c(0, 1),
  q = c(0, 1),
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Value

This function returns a list with the normalized sets for x and y (and z if given).

Arguments

x: (matrix or data.frame) first data-set (must be at least 2 columns and 3 lines)
y: (matrix or data.frame) second data-set (must be at least 2 columns and 3 lines)
z: (NULL or matrix or data.frame) optional 3rd data-set (if not NULL, at least 2 columns and 3 lines)
method: Character string or function specifying the normalization method:
  - `"quantile.block"` (default): Aligns the distributions of all input datasets to a **common target distribution** (the mean of sorted values at each quantile). After normalization, `sort(x)`, `sort(y)`, and `sort(z)` (if provided) are **identical**. This is useful for making datasets directly comparable.
  - `"rescale"`: Rescales each dataset **independently** to the range specified by `range`. Preserves the shape of each distribution but standardizes the scale (e.g., to `[0, 1]`).
  - `"median"`: This procedure works in a proportional manner. It centers each dataset by dividing by its group median and multiplying by the overall median. This removes location differences while preserving spread.
  - `"logMedian"`: This procedure is adapted for log-data. It centers each dataset by subtracting its group median and adding the overall median. This removes location differences while preserving spread.
  - `"none"`: No normalization.
range: (numeric vector, length=2) target values used for rescaling when method="rescale". With the default q = c(0, 1), the min and max values of the data are adjusted to the two values given in this argument. When argument q is other than 0 and 1, the respective quantiles are adjusted to the values of this argument, and all other data are treated in a linear fashion.
q: (numeric vector, length=2) quantiles used for rescaling when method="rescale". The default values 0 and 1 point to the min and max values, which are adjusted to the values specified in argument range.
silent: (logical) suppress messages
debug: (logical) additional messages for debugging
callFrom: (character) allow easier tracking of messages produced
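
The interplay of `range` and `q` can be illustrated with a minimal base-R sketch. The helper `rescaleSketch()` is hypothetical and not part of the package; it assumes that the quantiles are taken over all values of a block and that the mapping is a single linear transformation, which may differ from what `blockNormalize()` does internally.

```r
# Hypothetical helper (NOT the package's code): linear rescaling of one block
rescaleSketch <- function(dat, range = c(0, 1), q = c(0, 1)) {
  # Anchor points: quantiles of the whole block selected by 'q' (assumption)
  anchors <- stats::quantile(as.numeric(dat), probs = q, na.rm = TRUE, names = FALSE)
  # Linear map sending anchors[1] to range[1] and anchors[2] to range[2]
  slope <- (range[2] - range[1]) / (anchors[2] - anchors[1])
  range[1] + (dat - anchors[1]) * slope
}

m <- matrix(c(2, 4, 6, 8, 10, 12), nrow = 3)

# Default q = c(0, 1): min and max land on the values given in 'range'
r1 <- rescaleSketch(m, range = c(0.1, 10))
range(r1)

# q = c(0.25, 0.75): the quartiles land on 0 and 1, values beyond them fall outside
r2 <- rescaleSketch(m, range = c(0, 1), q = c(0.25, 0.75))
range(r2)
```

With `q` other than 0 and 1, extreme values are mapped outside of `range`, which can be useful when outliers should not drive the scaling.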

Details

Available methods:

- quantile.block: This method runs a quantile-normalization on each block (but NOT on each column). Thus, all datasets will get the same (overall) distributions.

- rescale: Apply a linear transformation to each dataset to fit within given `range`.

- median: Normalize each block to get the same overall median (individual columns may deviate, the original order is preserved).

- logMedian: This procedure is a median normalization adapted to log-data. Instead of dividing and multiplying, the per-group medians are subtracted and the target median is added.

- none: No normalization is performed; the output is identical to the input.
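
The difference between the proportional `median` method and the additive `logMedian` method can be sketched in base R. The helper `medianSketch()` is hypothetical and not taken from the package; in particular, using the median of all pooled values as the "overall median" is an assumption.

```r
# Hypothetical helper (NOT the package's code): block-wise median normalization
medianSketch <- function(blocks, logData = FALSE) {
  # One median per block, computed over all values of that block
  grpMed <- sapply(blocks, function(b) stats::median(as.numeric(b), na.rm = TRUE))
  # Assumption: the common target is the median of all pooled values
  target <- stats::median(unlist(lapply(blocks, as.numeric)), na.rm = TRUE)
  out <- lapply(seq_along(blocks), function(i) {
    if (logData) {
      blocks[[i]] - grpMed[i] + target    # additive version for log-data
    } else {
      blocks[[i]] / grpMed[i] * target    # proportional version for linear data
    }
  })
  names(out) <- names(blocks)
  out
}

a <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
b <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3)

prop <- medianSketch(list(x = a, y = b))
sapply(prop, median)      # both blocks now share one median

lg <- medianSketch(list(x = log2(a), y = log2(b)), logData = TRUE)
sapply(lg, median)        # same on log-scale, obtained by shifting only
```

On log-data the additive version leaves the spread of each block unchanged, since only a constant is added to every value.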

NA Handling: For all approaches NAs are ignored. With method='quantile.block', every position that holds an NA in any of the data-sets x, y or z is ignored in all of them, to maintain equal numbers of data. This is because regular quantile-normalization requires equal numbers of data per column (here one dataset is treated like a column in regular normalization). As a consequence, the dataset with the highest number of NAs can indirectly mask valid data in the other data-sets.
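
The masking effect described above can be made concrete with a minimal base-R sketch for two blocks. The helper `quantileBlockSketch()` is hypothetical and not the package's implementation; returning NA at all masked positions and breaking ties by order of appearance are assumptions made for illustration.

```r
# Hypothetical helper (NOT the package's code): quantile-normalize two blocks
quantileBlockSketch <- function(x, y) {
  stopifnot(length(x) == length(y))
  # A position is usable only if it is non-NA in BOTH blocks
  ok <- !is.na(x) & !is.na(y)
  # Common target distribution: mean of the sorted values at each rank
  target <- (sort(x[ok]) + sort(y[ok])) / 2
  out <- list(x = x, y = y)
  # Each value is replaced by the target value of the same rank
  out$x[ok] <- target[rank(x[ok], ties.method = "first")]
  out$y[ok] <- target[rank(y[ok], ties.method = "first")]
  # Assumption: masked positions are returned as NA
  out$x[!ok] <- NA
  out$y[!ok] <- NA
  out
}

x <- c(1, 5, NA, 7, 2)
y <- c(10, 20, 30, NA, 50)
res <- quantileBlockSketch(x, y)
res$x   # the valid 7 at position 4 is masked by the NA in y
res$y   # the valid 30 at position 3 is masked by the NA in x
```

In this sketch each block loses one valid value because of an NA in the other block, which is the masking effect to keep in mind when one dataset contains many NAs.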

Examples

## Basic usage with vectors 
x <- c(1, 5, 3, 7, 2)
y <- c(10, 20, 30, 40, 50)
## Align distributions (default: x and y will have identical distributions)
norm1 <- blockNormalize(x, y)
table(sort(norm1$x) == sort(norm1$y))  # all TRUE with 'quantile.block'

## matrix-example (like with omics-data)
set.seed(2026); mat1 <- matrix(rnorm(70), nrow=10) *5  # 10 lines x 7 samples
set.seed(2025); mat2 <- matrix(rnorm(70, mean=5), nrow=10)^2 -15
mat2[which(mat2 < 1.8)] <- mat2[which(mat2 < 1.8)] + 32
norm2 <- blockNormalize(mat1, mat2, method="rescale", range=c(0.1, 10))
sapply(norm2, range)
norm3 <- blockNormalize(mat1, mat2, method="median")
sapply(norm3, quantile, c(0.25,0.5,0.75), na.rm=TRUE)
norm4 <- blockNormalize(mat1, mat2, method="quantile.block")
sapply(norm4, quantile, c(0.25,0.5,0.75), na.rm=TRUE)

## the resulting distribution 
layout(matrix(1:4, ncol=2))
boxplot(cbind(mat1, NA, mat2), main="initial", las=1)
boxplot(cbind(norm2$x, NA, norm2$y), main="rescale block", las=1)
boxplot(cbind(norm3$x, NA, norm3$y), main="median block", las=1)
boxplot(cbind(norm4$x, NA, norm4$y), main="quantile.block", las=1)

## the overall distribution of blocks
layout(matrix(1:4, ncol=2))
boxplot(cbind(mat1=as.numeric(mat1), mat2=as.numeric(mat2)), main="initial (overall)",las=1)
boxplot(cbind(mat1=as.numeric(norm2$x), mat2=as.numeric(norm2$y)), 
  main="rescale block norm (overall)",las=1)
boxplot(cbind(mat1=as.numeric(norm3$x), mat2=as.numeric(norm3$y)), 
  main="median block norm (overall)",las=1)
boxplot(cbind(mat1=as.numeric(norm4$x), mat2=as.numeric(norm4$y)), 
  main="quantile.block norm (overall)",las=1)