Function to plot principal components analysis (PCA),
with options to show center and potential outliers for each of the groups (columns of data).
A distinctive feature of this implementation is the integration of bag-plots to better visualize different groups of points
(provided the points can be assigned to distinct groups beforehand):
The main body of the data is shown as 'bag-plots' (a bivariate boxplot, see Bagplot)
in different transparent colors to highlight the core part of each group (if a group contains more than 2 values).
Furthermore, group centers are shown as average or median (see 'nGrpForMedian') with stars & index-number (if <25 groups).
The layout is automatically set to 2 or 4 subplots (if plotting more than 2 principal components makes sense).
Note: The PCA is calculated with prcomp, whose own defaults are center=TRUE and scale.=FALSE; in the usage shown on this page, plotPCAw passes scale.=TRUE by default.
(princomp(), by comparison, works on the covariance matrix unless cor=TRUE is set.)
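To illustrate what the scale. argument of prcomp changes, here is a minimal base-R sketch. The toy matrix x and the variable names are hypothetical and only serve the illustration; nothing here is taken from the internals of plotPCAw.

```r
set.seed(1)
# hypothetical toy data: two independent variables on very different scales
x <- cbind(a = rnorm(50, sd = 1), b = rnorm(50, sd = 100))

pcaRaw <- prcomp(x, center = TRUE, scale. = FALSE)  # centered only
pcaStd <- prcomp(x, center = TRUE, scale. = TRUE)   # centered and scaled to unit variance

# proportion of total variance carried by the first principal component
propRaw <- pcaRaw$sdev[1]^2 / sum(pcaRaw$sdev^2)
propStd <- pcaStd$sdev[1]^2 / sum(pcaStd$sdev^2)
print(c(unscaled = propRaw, scaled = propStd))
```

Without scaling, the variable with the larger variance dominates the first component; with scaling, each variable contributes unit variance before the decomposition.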
Note: NA-values cannot (by definition) be processed by PCA; all lines with any non-finite values/content (e.g. NA) will be omitted!
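The kind of filtering described in this note can be sketched in base R as follows. This is an assumption about the general approach (dropping every row that contains a non-finite value), not the actual code of plotPCAw; the matrix m and the names keep and mClean are hypothetical.

```r
# hypothetical toy matrix with a few non-finite entries
m <- matrix(c( 1,   2, NA,  4,
               5,   6,  7,  8,
               9, Inf, 11, 12,
              13,  14, 15, 17,
               2,   1,  4,  3), ncol = 4, byrow = TRUE)

# TRUE for rows where every value is finite (NA, NaN, Inf and -Inf all count as non-finite)
keep <- rowSums(!is.finite(m)) == 0
mClean <- m[keep, , drop = FALSE]

pca <- prcomp(mClean)   # runs on the remaining complete rows only
print(dim(mClean))
```

Checking how many rows survive such a filter before plotting is a quick way to see how much data the PCA is actually based on.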
Note: Package RColorBrewer may be used if available.
Finally, note that several other packages dedicated to PCA exist, for example FactoMineR offers
a very wide spectrum of possibilities, in particular for combined numeric and categorical data.
plotPCAw(
  dat,
  sampleGrp,
  tit = NULL,
  useSymb = c(21:25, 9:12, 3:4),
  center = TRUE,
  scale. = TRUE,
  colBase = NULL,
  useSymb2 = NULL,
  displBagPl = TRUE,
  getOutL = FALSE,
  cexTxt = 1,
  showLegend = TRUE,
  nGrpForMedian = 6,
  pointLabelPar = NULL,
  rowTyName = "genes",
  rotatePC = NULL,
  suplFig = TRUE,
  callFrom = NULL,
  silent = FALSE
)
dat: (matrix, list or data.frame) data to plot. Note: NA-values cannot be processed; all lines with non-finite data (e.g. NA) will be omitted!
sampleGrp: (character or factor) should be a factor describing groups of replicates; NAs are not supported
tit: (character) custom title
useSymb: (integer) symbols to use (see also par)
center: (logical or numeric) decide if variables should be shifted to be zero centered, argument passed to prcomp
scale.: (logical or numeric) decide if variables should be scaled to unit variance, argument passed to prcomp. Alternatively, a vector of length equal to the number of columns of x can be supplied; the value is passed to scale.
colBase: (character or integer) use custom colors
useSymb2: (integer) symbol to mark group-center (no mark of group-center if default NULL) (equivalent to pch, see also par)
displBagPl: (logical) if TRUE, show bagPlot (group-center) if >3 points per group, otherwise the average-confidence-interval
getOutL: (logical) return outlier samples/values
cexTxt: (integer) expansion factor for text (see also par)
showLegend: (logical) toggle to display legend
nGrpForMedian: (integer) decide if group center should be displayed via its average or median value: if a group has fewer than 'nGrpForMedian' values, the average will be used, otherwise the median; if NULL, no group centers will be displayed
pointLabelPar: (character) define formatting for optional labels next to points in main figure (i.e. PC1 vs PC2); may be TRUE or a list containing elements 'textLabel', 'textCol', 'textCex', 'textOffSet', 'textAdj' for fine-tuning
rowTyName: (character) for subtitle: specify nature of rows (genes, proteins, probesets, ...)
rotatePC: (integer) optional rotation (by -1) for figure of the principal components specified by index
suplFig: (logical) to include plots vs 3rd principal component (PC) and Screeplot
callFrom: (character) allow easier tracking of message(s) produced
silent: (logical) suppress messages
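The rule described for 'nGrpForMedian' can be written out as a small base-R sketch. The helper grpCenter is hypothetical and not part of the package; it only restates the documented rule (average for groups with fewer than 'nGrpForMedian' values, median otherwise, nothing if NULL) for a single numeric vector.

```r
# hypothetical helper restating the documented rule; not the package's own code
grpCenter <- function(x, nGrpForMedian = 6) {
  if (is.null(nGrpForMedian)) return(NULL)            # NULL: no group center
  if (length(x) < nGrpForMedian) mean(x) else median(x)
}

small <- c(1, 2, 9)                 # 3 values, fewer than 6: average is used
large <- c(1, 2, 3, 4, 5, 6, 100)   # 7 values: median is used, robust to the value 100

print(grpCenter(small))
print(grpCenter(large))
```

The median is less sensitive to single extreme values, but it is only informative once a group has a reasonable number of members, which is presumably why the switch depends on group size.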
plot and optional matrix of outlier data
(used in this function for the underlying PCA:) prcomp, princomp, the package FactoMineR
set.seed(2019)
# simulate 100 rows x 20 samples and add a group-specific offset (1 to 5) per column
dat1 <- matrix(round(c(rnorm(1000), runif(1000, -0.9, 0.9)), 2), ncol = 20, byrow = TRUE) +
  matrix(rep(rep(1:5, 6:2), each = 100), ncol = 20)
biplot(prcomp(dat1))                      # traditional plot
(grp <- factor(rep(LETTERS[5:1], 6:2)))   # 5 groups with 6, 5, 4, 3 and 2 samples
plotPCAw(dat1, grp)
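For comparison, the quantities behind such a figure can be approximated with base R alone. This is only a sketch under stated assumptions: it assumes that samples are the columns of the data (hence the transpose), it uses a plain scatter plot instead of bag-plots, and the sign flip merely illustrates what a rotation "by -1" of one component means. The names pca, varExpl and scores are hypothetical.

```r
set.seed(2019)
dat1 <- matrix(round(c(rnorm(1000), runif(1000, -0.9, 0.9)), 2), ncol = 20, byrow = TRUE) +
  matrix(rep(rep(1:5, 6:2), each = 100), ncol = 20)
grp <- factor(rep(LETTERS[5:1], 6:2))

# assumption: samples are columns, so transpose to get one row per sample
pca <- prcomp(t(dat1), center = TRUE, scale. = TRUE)

# proportion of variance per principal component (the content of a screeplot)
varExpl <- pca$sdev^2 / sum(pca$sdev^2)

# scores of the first two components; the sign of a component is arbitrary,
# so multiplying PC1 by -1 mirrors the figure without changing its meaning
scores <- pca$x[, 1:2]
scores[, 1] <- -1 * scores[, 1]

plot(scores, col = as.integer(grp), pch = 16,
     xlab = sprintf("PC1 (%.1f%%)", 100 * varExpl[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * varExpl[2]))
legend("topright", legend = levels(grp), col = seq_along(levels(grp)), pch = 16)
```

Since the simulated groups differ by a constant offset, the first component should carry most of the variance and separate the groups along one axis.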