wrMisc (version 2.1.0)

blockNormalize: Normalize Blockwise Two Or Three Datasets

Description

This function normalizes two (or three) entire data-sets against each other while preserving each set's characteristics. Several methods are available: normalize blocks to a common range, normalize blocks to a common median, or align blocks to a common distribution (quantile-normalization).

Usage

blockNormalize(
  x,
  y,
  z = NULL,
  method = "quantile.block",
  range = c(0, 1),
  q = c(0, 1),
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Value

This function returns a list of the normalized sets for x and y (and z if given)

Arguments

x

(matrix or data.frame) first data-set (must be at least 2 columns and 3 lines)

y

(matrix or data.frame) second data-set (must be at least 2 columns and 3 lines)

z

(NULL or matrix or data.frame) optional 3rd data-set (if not NULL at least 2 columns and 3 lines)

method

Character string or function specifying the normalization method:

- `"quantile.block"` (default): Aligns the distributions of all input datasets to a **common target distribution** (the mean of sorted values at each quantile). After normalization, `sort(x)`, `sort(y)`, and `sort(z)` (if provided) are **identical**. This is useful for making datasets directly comparable.

- `"rescale"`: Rescales each dataset **independently** to the range specified by `range`. Preserves the shape of each distribution but standardizes the scale (e.g., to `[0, 1]`).

- `"median"`: This procedure works in a proportional manner. It centers each dataset by dividing by its group median and multiplying by the overall median. This removes location differences while preserving spread.

- `"logMedian"`: This procedure is adapted for log-data. It centers each dataset by subtracting its group median and adding the overall median. This removes location differences while preserving spread.

- `"none"`: No normalization.

range

(numeric vector, length=2) target values used for rescaling when method="rescale". With the default q=c(0, 1), the min and max of each dataset are adjusted to the values given in this argument, i.e. the 'range'. When argument q is other than 0 and 1, the respective quantiles are adjusted to the values of this argument and all other data are treated in a linear fashion.

q

(numeric vector, length=2) quantiles used for rescaling when method="rescale". The default values 0 and 1 point to the min and max values, which are adjusted to the values specified in argument range.

silent

(logical) suppress messages

debug

(logical) additional messages for debugging

callFrom

(character) allow easier tracking of messages produced

Details

Main methods to choose from:

- quantile.block: This method runs a quantile-normalization on each block (but NOT on each column). Thus, all datasets will get the same (overall) distributions.

- rescale: Apply a linear transformation to each dataset to fit within given `range`.

- median: Normalize each block to get the same overall median (individual columns may deviate, the original order is preserved).

- logMedian: This procedure is a median normalization adapted to log-data. Instead of dividing and multiplying, the per-group medians are subtracted and the target median is added.

- none: It is also possible to skip normalization; the output will then be identical to the input.
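
The arithmetic behind the linear methods listed above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by wrMisc: the helper names `rescaleBlock`, `medianBlocks` and `logMedianBlocks` are hypothetical, the blocks are assumed to be numeric matrices, and the "overall median" is assumed to be the median of all pooled values.

```r
## Hypothetical helpers (not part of wrMisc) illustrating the arithmetic described above

## rescale: map the quantiles 'q' of one block linearly onto 'range'
rescaleBlock <- function(dat, range = c(0, 1), q = c(0, 1)) {
  anchors <- stats::quantile(dat, probs = q, na.rm = TRUE, names = FALSE)
  (dat - anchors[1]) / (anchors[2] - anchors[1]) * (range[2] - range[1]) + range[1]
}

## median: divide each block by its own median, multiply by the overall median
## (assumption: overall median = median of all pooled values)
medianBlocks <- function(blocks) {
  target <- stats::median(unlist(lapply(blocks, as.numeric)), na.rm = TRUE)
  lapply(blocks, function(b) b / stats::median(b, na.rm = TRUE) * target)
}

## logMedian: subtract each block's own median, add the overall median
logMedianBlocks <- function(blocks) {
  target <- stats::median(unlist(lapply(blocks, as.numeric)), na.rm = TRUE)
  lapply(blocks, function(b) b - stats::median(b, na.rm = TRUE) + target)
}

set.seed(1)
b1 <- matrix(rnorm(30, mean = 10), nrow = 10)
b2 <- matrix(rnorm(30, mean = 14), nrow = 10)
range(rescaleBlock(b1, range = c(0.1, 10)))            # block min and max are mapped to 0.1 and 10
sapply(medianBlocks(list(x = b1, y = b2)), median)     # both blocks end up with the same median
sapply(logMedianBlocks(list(x = b1, y = b2)), median)  # same here, but by shifting instead of scaling
```

The proportional variant only makes sense for data on a linear scale with positive medians, whereas the additive variant matches log-transformed data, where a ratio becomes a difference.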

NA Handling: For all approaches NAs will get ignored. With method='quantile.block' every position where there is an NA in any of the data x, y or z will be ignored to maintain equal numbers of data. This is because regular quantile-normalization requires equal numbers of data per column (here one dataset is treated like a column in regular normalization). As a consequence, the dataset with the highest number of NAs can indirectly mask valid data in the other data-sets.
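
The position-wise masking described above can be illustrated with a small base-R sketch of block-wise quantile normalization. It is not the wrMisc implementation: the function name `quantileBlocks` is hypothetical, all blocks are assumed to be numeric matrices of identical dimensions with at least two complete positions, and returning masked positions as NA is an assumption of this sketch.

```r
## Hypothetical sketch (not the wrMisc implementation): block-wise quantile
## normalization where each block is treated like one column
quantileBlocks <- function(blocks) {
  vals <- sapply(blocks, as.numeric)              # one column per block
  keep <- stats::complete.cases(vals)             # positions without NA in any block
  srt <- apply(vals[keep, , drop = FALSE], 2, sort)
  target <- rowMeans(srt)                         # mean of sorted values at each rank
  out <- lapply(seq_along(blocks), function(i) {
    res <- blocks[[i]]
    res[] <- NA_real_                             # assumption: masked positions stay NA
    res[keep] <- target[rank(vals[keep, i], ties.method = "first")]
    res
  })
  names(out) <- names(blocks)
  out
}

x <- matrix(c(1, 5, 3, 7, 2, NA), nrow = 3)
y <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3)
nq <- quantileBlocks(list(x = x, y = y))
identical(sort(nq$x), sort(nq$y))   # TRUE: both blocks share the target distribution
nq$y                                # position 6 is NA in x, so the valid 60 in y is masked too
```

The last line shows the side effect mentioned above: a single NA in one block removes the value at the same position from every other block.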

See Also

normalizeThis for normalizing all columns of single data-set, quantile, scale for standard scaling

Examples

## Basic usage with vectors 
x <- c(1, 5, 3, 7, 2)
y <- c(10, 20, 30, 40, 50)
## Align distributions (default: x and y will have identical distributions)
norm1 <- blockNormalize(x, y)
table(sort(norm1$x) == sort(norm1$y))  # all TRUE with 'quantile.block'

## matrix-example (like with omics-data)
set.seed(2026); mat1 <- matrix(rnorm(70), nrow=10) *5  # 10 lines x 7 samples
set.seed(2025); mat2 <- matrix(rnorm(70, mean=5), nrow=10)^2 -15
mat2[which(mat2 < 1.8)] <- mat2[which(mat2 < 1.8)] + 32
norm2 <- blockNormalize(mat1, mat2, method="rescale", range=c(0.1, 10))
sapply(norm2, range)
norm3 <- blockNormalize(mat1, mat2, method="median")
sapply(norm3, quantile, c(0.25,0.5,0.75), na.rm=TRUE)
norm4 <- blockNormalize(mat1, mat2, method="quantile.block")
sapply(norm4, quantile, c(0.25,0.5,0.75), na.rm=TRUE)

## the resulting distribution 
layout(matrix(1:4, ncol=2))
boxplot(cbind(mat1, NA, mat2), main="initial", las=1)
boxplot(cbind(norm2$x, NA, norm2$y), main="rescale block", las=1)
boxplot(cbind(norm3$x, NA, norm3$y), main="median block", las=1)
boxplot(cbind(norm4$x, NA, norm4$y), main="quantile.block", las=1)

## the overall distribution of blocks
layout(matrix(1:4, ncol=2))
boxplot(cbind(mat1=as.numeric(mat1), mat2=as.numeric(mat2)), main="initial (overall)",las=1)
boxplot(cbind(mat1=as.numeric(norm2$x), mat2=as.numeric(norm2$y)), 
  main="rescale block norm (overall)",las=1)
boxplot(cbind(mat1=as.numeric(norm3$x), mat2=as.numeric(norm3$y)), 
  main="median block norm (overall)",las=1)
boxplot(cbind(mat1=as.numeric(norm4$x), mat2=as.numeric(norm4$y)), 
  main="quantile.block norm (overall)",las=1)

