This function plots a principal component analysis (PCA), with options to show the center and potential outliers for each group of columns (samples) of the data. The main features of this implementation are bagplots to highlight groups of columns/samples and support for (object-oriented) output from limma and wrProteo.
plotPCAw(
dat,
sampleGrp,
tit = NULL,
useSymb = c(21:25, 9:12, 3:4),
center = TRUE,
scale. = TRUE,
colBase = NULL,
useSymb2 = NULL,
cexTxt = 1,
cexSub = 0.6,
displBagPl = TRUE,
outCoef = 2,
getOutL = FALSE,
showLegend = TRUE,
nGrpForMedian = 6,
pointLabelPar = NULL,
rowTyName = "genes",
rotatePC = NULL,
suplFig = TRUE,
callFrom = NULL,
silent = FALSE,
debug = FALSE
)
This function makes a plot and may return an optional matrix of outlier data (depending on argument getOutL).
dat
(matrix, data.frame, MArrayLM-object or list) data to plot. Note: NA-values cannot be processed; all lines with non-finite data (e.g. NA) will be omitted! In case of an MArrayLM-object or list, dat must contain a list-element named 'datImp', 'dat' or 'data'.
sampleGrp
(character or factor) should be a factor describing groups of replicates; NAs are not supported
tit
(character) custom title
useSymb
(integer) symbols to use (see also par)
center
(logical or numeric) decides if variables should be shifted to be zero-centered; argument passed to prcomp
scale.
(logical or numeric) decides if variables should be scaled to unit variance; argument passed to prcomp. Alternatively, a vector of length equal to the number of columns of x can be supplied. The value is passed to scale.
colBase
(character or integer) use custom colors
useSymb2
(integer) symbol to mark the group-center (no mark of group-center if default NULL); equivalent to pch, see also par
cexTxt
(numeric) expansion factor for text (see also par)
cexSub
(numeric) expansion factor for subtitle line text (see also par)
displBagPl
(logical) if TRUE, show bagPlot (group-center) if >3 points per group, otherwise the average-confidence-interval
outCoef
(numeric) parameter for defining outliers, see addBagPlot (equivalent to range in boxplot)
getOutL
(logical) return outlier samples/values
showLegend
(logical or character) toggle to display the legend; if character, it designates the location within the plot to display the legend ('bottomleft', 'topright', etc.)
nGrpForMedian
(integer) decides if the group center should be displayed via its average or median value: if a group has fewer than 'nGrpForMedian' values, the average will be used, otherwise the median; if NULL, no group centers will be displayed
pointLabelPar
(character) defines formatting for optional labels next to points in the main figure (i.e. PC1 vs PC2); may be TRUE or a list containing elements 'textLabel', 'textCol', 'textCex', 'textOffSet', 'textAdj' for fine-tuning
rowTyName
(character) for subtitle: specify the nature of rows (genes, proteins, probesets, ...)
rotatePC
(integer) optional rotation (by -1) in the figure of the principal components specified by index
suplFig
(logical) to include plots vs the 3rd principal component (PC) and a Screeplot
callFrom
(character) allows easier tracking of messages produced
silent
(logical) suppress messages
debug
(logical) display additional messages for debugging
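The rotation by -1 mentioned for rotatePC relies on the fact that the sign of a principal component is arbitrary. A minimal base R sketch of the idea follows; it is an illustration only, not the code inside plotPCAw, and the names scoresRot and loadRot are hypothetical.

```r
# Sketch: flipping the sign of one principal component (here PC2).
# Assumption: this mirrors the idea behind 'rotatePC'; it is not plotPCAw's own code.
set.seed(1)
x <- matrix(rnorm(60), ncol = 3)        # 20 observations, 3 variables
pca <- prcomp(x)

rotatePC <- 2                           # index of the component to flip
scoresRot <- pca$x                      # hypothetical name: flipped scores
loadRot <- pca$rotation                 # hypothetical name: flipped loadings
scoresRot[, rotatePC] <- -scoresRot[, rotatePC]
loadRot[, rotatePC] <- -loadRot[, rotatePC]

# The centered data reconstructed from scores and loadings is unchanged
recon0 <- pca$x %*% t(pca$rotation)
recon1 <- scoresRot %*% t(loadRot)
all.equal(recon0, recon1)
```

Flipping scores and loadings together only mirrors the plot along that axis; distances between points stay the same.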
One motivation for this implementation of plotting PCA was to provide a convenient way of doing so with MArrayLM-objects or lists as created by limma and wrProteo.
Another motivation for this implementation comes from integrating the idea of bag-plots to better visualize different groups of points (if they can be organized beforehand as distinct groups): the main body of data is shown as 'bag-plots' (a bivariate boxplot, see Bagplot) with different transparent colors to highlight the core part of different groups (if they contain more than 2 values per group). Furthermore, group centers are shown as average or median (see 'nGrpForMedian') with stars and index-number (if <25 groups).
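The choice between average and median for the group centers can be sketched in base R. This is an illustration under assumptions, not the code used inside plotPCAw: it assumes samples are in columns (so the matrix is transposed before prcomp), and scores and centers are hypothetical names.

```r
# Sketch: group centers in the PC1/PC2 plane, using mean or median per group.
# Assumption: samples are columns, so t(dat) is passed to prcomp().
set.seed(2019)
dat <- matrix(rnorm(2000), ncol = 20) +
  matrix(rep(rep(1:5, 6:2), each = 100), ncol = 20)   # group-wise offset per column
grp <- factor(rep(LETTERS[5:1], 6:2))                 # 5 groups with 6 to 2 replicates

pca <- prcomp(t(dat), center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]                                # one row per sample

nGrpForMedian <- 6
centers <- t(sapply(split(as.data.frame(scores), grp), function(s) {
  # fewer than nGrpForMedian values: average, otherwise median
  ctrFun <- if (nrow(s) < nGrpForMedian) mean else median
  apply(s, 2, ctrFun)
}))
centers
```

With these settings only group E (6 replicates) reaches the threshold and gets a median-based center; the other groups use the average.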
Layout is automatically set to 2 or 4 subplots (if plotting more than 2 principal components makes sense).
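The share of variance captured by each principal component (the information a scree plot shows) can be read from prcomp output. In the sketch below, the rule for choosing 2 or 4 panels is a hypothetical stand-in, since the actual criterion is not stated here, and nPanel is an illustrative name.

```r
# Sketch: percent variance per principal component, as shown in a scree plot.
# Assumption: the 2-vs-4 panel rule below is hypothetical, not plotPCAw's criterion.
set.seed(2019)
dat <- matrix(rnorm(2000), ncol = 20) +
  matrix(rep(rep(1:5, 6:2), each = 100), ncol = 20)

pca <- prcomp(t(dat), center = TRUE, scale. = TRUE)   # samples as observations
pctVar <- 100 * pca$sdev^2 / sum(pca$sdev^2)
round(head(pctVar, 5), 1)                             # first five components

# Hypothetical rule: use 4 panels only if a 3rd PC exists and carries > 1 % variance
nPanel <- if (length(pctVar) > 2 && pctVar[3] > 1) 4 else 2
nPanel
```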
Note: This function uses prcomp for calculating eigenvectors and principal components. prcomp itself defaults to center=TRUE and scale.=FALSE, whereas the usage of plotPCAw shown above sets center=TRUE and scale.=TRUE (princomp() likewise does not standardize unless cor=TRUE is set). The user thus has the option to intervene on the arguments center and scale.; however, this should be done with care.
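Why the scale. argument deserves care can be seen with a small base R comparison. The data below are made up for illustration: three independent variables on very different scales.

```r
# Sketch: effect of scale. in prcomp() on variables with very different spread.
set.seed(1)
x <- cbind(a = rnorm(50, sd = 1),
           b = rnorm(50, sd = 10),
           c = rnorm(50, sd = 100))

pcaRaw <- prcomp(x, center = TRUE, scale. = FALSE)   # covariance-based
pcaStd <- prcomp(x, center = TRUE, scale. = TRUE)    # unit variance per variable

# Fraction of total variance per component
varRaw <- pcaRaw$sdev^2 / sum(pcaRaw$sdev^2)
varStd <- pcaStd$sdev^2 / sum(pcaStd$sdev^2)
round(rbind(unscaled = varRaw, scaled = varStd), 3)
```

Without scaling, the first component is dominated by the variable with the largest spread; with scaling, each variable contributes equally to the total variance.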
Note: NA-values cannot (by definition) be processed by (any) PCA; all lines with any non-finite values/content (e.g. NA) will be omitted!
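Omitting lines with non-finite content can be reproduced by hand before running a PCA. The following is a minimal base R sketch with made-up values; keep and datOK are hypothetical names, and the filtering inside plotPCAw may differ in detail.

```r
# Sketch: drop all rows containing NA, NaN or Inf before a PCA.
dat <- matrix(c(1.2, 0.4,  NA, 2.1,
                0.7, 1.9, 1.1, 0.3,
                Inf, 0.5, 0.8, 1.4,
                2.2, 1.0, 0.6, 0.9,
                0.1, 1.7, 1.3, 2.0),
              ncol = 4, byrow = TRUE,
              dimnames = list(paste0("row", 1:5), paste0("s", 1:4)))

keep <- rowSums(!is.finite(dat)) == 0    # TRUE for rows with only finite values
datOK <- dat[keep, , drop = FALSE]
rownames(datOK)                          # rows that remain

pca <- prcomp(t(datOK))                  # samples (columns) as observations
dim(pca$x)
```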
Note : Package RColorBrewer may be used if available.
For more options with PCA (and related methods) you may also see the package FactoMineR, which provides a very wide spectrum of possibilities, in particular for combined numeric and categorical data.
See also: prcomp (used here for the underlying PCA), princomp; see the package FactoMineR for multiple plotting options or ways of combining categorical and numeric data.
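library(wrGraph)                       # plotPCAw() is provided by the package wrGraph
## simulated data: 100 rows, 20 samples in 5 groups of 6 to 2 replicates
set.seed(2019)
dat1 <- matrix(round(c(rnorm(1000), runif(1000, -0.9, 0.9)), 2), ncol=20, byrow=TRUE) +
  matrix(rep(rep(1:5, 6:2), each=100), ncol=20)
biplot(prcomp(dat1))                   # traditional plot
(grp <- factor(rep(LETTERS[5:1], 6:2)))  # sample groups, printed for inspection
plotPCAw(dat1, grp)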