iprcomp: Improved Function for Obtaining Principal Components

Description

Calculate principal components when data contains missing values.

Usage

iprcomp(dat, center = TRUE, scale. = FALSE)

Arguments

dat

An n by p matrix. Rows are subjects and columns are variables.

center

logical. Indicates whether each column (variable) of dat should be mean-centered before the analysis, as in prcomp.

scale.

logical. Indicates whether each column (variable) of dat should be scaled to have unit variance before the analysis, as in prcomp.

Value

A list of 3 elements

sdev

square roots of the eigenvalues, i.e., the standard deviations of the principal components

rotation

a matrix whose columns are the eigenvectors, i.e., the projection directions

x

a matrix whose columns are the principal components
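The relationship between the three returned elements can be checked numerically. The sketch below uses stats::prcomp directly on complete data, under the assumption that iprcomp returns the same sdev, rotation, and x that prcomp produces (the Details section states that iprcomp calls prcomp after imputation). The object names X, fit, ev, and Xc are illustrative only.

```r
# Illustrative check on complete data; uses prcomp directly, assuming
# iprcomp passes its imputed matrix to prcomp unchanged.
set.seed(42)
X <- matrix(rnorm(60), nrow = 20, ncol = 3)
fit <- prcomp(X, center = TRUE, scale. = FALSE)

# sdev^2 should match the eigenvalues of the sample covariance matrix
ev <- eigen(cov(X))$values
sdev_matches <- isTRUE(all.equal(fit$sdev^2, ev))

# x should equal the column-centered data projected onto the rotation
Xc <- scale(X, center = TRUE, scale = FALSE)
proj_matches <- isTRUE(all.equal(Xc %*% fit$rotation, fit$x,
                                 check.attributes = FALSE))

print(c(sdev_matches = sdev_matches, proj_matches = proj_matches))
```

Both checks compare values up to numerical tolerance, since prcomp computes the decomposition via a singular value decomposition rather than eigen.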

Details

We first replace each missing value with the median of the corresponding variable (column), then call the function prcomp on the completed data. This is a very simple solution. Users who prefer a different imputation method can apply it themselves and then call prcomp directly.
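The two steps described above can be sketched as follows. This is a minimal illustration and not the package's source code: the helper name median_impute_prcomp is hypothetical, and it assumes that the median is taken per column over the observed (non-missing) values.

```r
# Hypothetical helper, not the iprcomp source: median-impute each column,
# then run stats::prcomp on the completed matrix.
median_impute_prcomp <- function(dat, center = TRUE, scale. = FALSE) {
  dat <- as.matrix(dat)
  for (j in seq_len(ncol(dat))) {
    miss <- is.na(dat[, j])
    if (any(miss)) {
      # assumption: median of the observed values in the same column
      dat[miss, j] <- median(dat[, j], na.rm = TRUE)
    }
  }
  prcomp(dat, center = center, scale. = scale.)
}

# small usage example with two artificial missing values
set.seed(1)
m <- matrix(rnorm(40), nrow = 10, ncol = 4)
m[2, 3] <- NA
m[7, 1] <- NA
fit <- median_impute_prcomp(m)
print(dim(fit$x))
```

To use another imputation method, replace the loop body with that method and keep the final prcomp call.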

Examples


# NOT RUN {
# generate simulated data
set.seed(1234567)
dat.x = matrix(rnorm(500), nrow = 100, ncol = 5)
dat.y = matrix(rnorm(500, mean = 2), nrow = 100, ncol = 5)
dat = rbind(dat.x, dat.y)
grp = c(rep(0, 100), rep(1, 100))
print(dim(dat))

res = iprcomp(dat, center = TRUE, scale. = FALSE)

# for each row, set one artificial missing value
dat.na = dat
nr = nrow(dat.na)
nc = ncol(dat.na)
for (i in seq_len(nr))
{
  posi = sample(x = seq_len(nc), size = 1)
  dat.na[i, posi] = NA
}

res.na = iprcomp(dat.na, center = TRUE, scale. = FALSE)

##
# pca plot
##
# two panels: one for each of the two plots below
par(mfrow = c(2, 1))
# original data without missing values
plot(x = res$x[, 1], y = res$x[, 2], xlab = "PC1", ylab = "PC2", main = "without missing values")
# perturbed data with one NA per row;
# the pattern of the original data is still captured
plot(x = res.na$x[, 1], y = res.na$x[, 2], xlab = "PC1", ylab = "PC2", main = "with missing values")
par(mfrow = c(1, 1))

# }
