PCADSC (version 0.8.0)

PCADSC: Compute the elements used for PCADSC

Description

Prepares a dataset for the diagnostic plots used in Principal Component Analysis-based Data Structure Comparison (PCADSC). More specifically, PCADSC performs PCA on two subsets of a dataset in order to compare their structures, e.g. to assess whether the two subsets can be pooled for analysis. The results of the PCAs are then processed and stored for easy plotting with the three PCADSC plotting tools: CEPlot, anglePlot and chromaPlot.

Usage

PCADSC(data, splitBy, vars = NULL, doCE = TRUE, doAngle = TRUE,
  doChroma = TRUE, B = 10000)

Arguments

data
A dataset, either a data.frame or a matrix with variables in columns and observations in rows. Note that tibbles and data.tables are accepted as input, but they are immediately converted to data.frames. Future releases might include specific implementations for these data representations.
splitBy
The name of a grouping variable with two levels defining the two groups within the dataset whose data structures we wish to compare.
vars
The variable names in data to include in the PCADSC. If NULL (the default), all variables except for splitBy are used.
doCE
Logical. Should the cumulative eigenvalue plot information be computed?
doAngle
Logical. Should the angle plot information be computed?
doChroma
Logical. Should the chroma plot information be computed?
B
A positive integer. The number of resampling steps performed in the cumulative eigenvalue step, if relevant.
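
The requirements on splitBy and vars described above can be checked before calling PCADSC. The following is a minimal base R sketch, not part of the package: the variable names nLevels and defaultVars are illustrative, and the check is an assumption about what a valid input looks like rather than the package's own validation code.

```r
# Build a two-level grouping variable on the iris data
data(iris)
iris$group <- ifelse(iris$Species == "setosa", "setosa", "non-setosa")
iris$Species <- NULL

# splitBy should name a variable with exactly two levels
nLevels <- length(unique(iris$group))
stopifnot(nLevels == 2)

# With vars = NULL, all variables except splitBy are used;
# this mirrors that default selection by hand
defaultVars <- setdiff(names(iris), "group")
print(defaultVars)

# Group sizes of the two subsets that would be compared
print(table(iris$group))
```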

Value

An object of class PCADSC, which is a named list with the following entries:
pcaRes
The results of the PCAs performed on the first subset, the second subset and the full dataset, together with information about the data splitting.
CEInfo
The information needed for making a cumulative eigenvalue plot (see CEPlot).
angleInfo
The information needed for making an angle plot (see anglePlot).
chromaInfo
The information needed for making a chroma plot (see chromaPlot).
data
The original (full) dataset.
splitBy
The name of the variable that splits the dataset in two.
vars
The names of the variables in the dataset that should be used for PCA.
B
The number of resamplings performed for the CEInfo.

Details

PCADSC presents a suite of non-parametric, visual tools for comparing the structures of two subsets of a dataset. These tools are all based on PCA (principal component analysis) and thus they can be interpreted as comparisons of the covariance matrices of the two (sub)datasets. PCADSC performs PCA using singular value decomposition for increased numerical precision. Before performing PCA on the full dataset and the two subsets, all variables within each such dataset are standardized.
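
To illustrate the idea of standardizing each (sub)dataset separately and then performing PCA by singular value decomposition, here is a minimal base R sketch. It is not the package's internal code: the helper pcaBySVD is hypothetical, and the split into setosa versus non-setosa is chosen only to match the example below.

```r
data(iris)
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
isSetosa <- iris$Species == "setosa"

# Hypothetical helper (not part of PCADSC): PCA via SVD on standardized data
pcaBySVD <- function(x) {
  z <- scale(as.matrix(x))   # center each column and scale to unit variance
  s <- svd(z)                # z = U D V'
  list(loadings = s$v,                         # principal axes
       eigenvalues = s$d^2 / (nrow(z) - 1))    # variances along the axes
}

# Each dataset is standardized on its own before the decomposition
pcaSetosa <- pcaBySVD(iris[isSetosa, vars])
pcaOther  <- pcaBySVD(iris[!isSetosa, vars])
pcaFull   <- pcaBySVD(iris[, vars])

# After standardization the eigenvalues sum to the number of variables
print(sum(pcaSetosa$eigenvalues))

# The result agrees with prcomp() on scaled data
print(all.equal(pcaOther$eigenvalues,
                prcomp(iris[!isSetosa, vars], scale. = TRUE)$sdev^2))
```

Because each subset is standardized separately, the eigenvalues are those of the subset's correlation matrix, so differences between the two sets of eigenvalues and loadings reflect differences in correlation structure rather than in scale.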

See Also

doCE, doAngle, doChroma, CEPlot, anglePlot, chromaPlot

Examples

#load iris data
data(iris)

#Define grouping variable, grouping the observations by whether their species is
#Setosa or not
iris$group <- "setosa"
iris$group[iris$Species != "setosa"] <- "non-setosa"
iris$Species <- NULL

## Not run: ------------------------------------
# #Make a full PCADSC object, splitting the data by "group"
# irisPCADSC <- PCADSC(iris, "group")
# 
# #The three plotting functions can now be called on irisPCADSC:
# CEPlot(irisPCADSC)
# anglePlot(irisPCADSC)
# chromaPlot(irisPCADSC)
# 
# #Make a partial PCADSC object with no angle plot information and add
# #angle plot information afterwards:
# irisPCADSC2 <- PCADSC(iris, "group", doAngle = FALSE)
# irisPCADSC2 <- doAngle(irisPCADSC2)
## ---------------------------------------------

#Make a partial PCADSC object with no plotting (angle/CE/chroma)
#information:
irisPCADSC_minimal <- PCADSC(iris, "group", doAngle = FALSE,
  doCE = FALSE, doChroma = FALSE)
