PCA_biplot()

Creates a PCA (Principal Component Analysis) biplot with loadings for the new rYWAASB index, used for the simultaneous selection of genotypes by trait and the WAASB index. It displays the rYWAASB, rWAASB and rWAASBY indices (r: ranked) together in one biplot, allowing better differentiation of genotypes. In the PCA biplots, the variables are colored according to their contributions (contrib) and their squared cosines (cos2).
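As an illustration of this kind of coloring, here is a minimal sketch only, not the internal code of PCA_biplot(); it assumes the factoextra package and the built-in iris data rather than the package's own data:

# Color variables by their contributions (contrib) and by their
# squared cosines (cos2) in a PCA variable plot with factoextra.
library(factoextra)
res.pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
# variables colored by contribution to the retained components
fviz_pca_var(res.pca, col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"))
# variables colored by cos2 (quality of representation)
fviz_pca_var(res.pca, col.var = "cos2")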
Usage:

PCA_biplot(datap)
Value:

Returns a list of data frames.

Arguments:

datap: The data set.
Author:

Ali Arminian <abeyran@gmail.com>
Details:

PCA is a machine-learning and dimension-reduction technique. It is used to simplify large data sets by extracting a smaller set of variables that preserves the significant patterns and trends (1). According to Johnson and Wichern (2007), PCA explains the variance-covariance structure of a set of variables X_1, X_2, ..., X_p through a few linear combinations of these variables. The general objectives of PCA are (1) data reduction and (2) interpretation.
Biplot and PCA: The biplot is a method for visually representing both the rows and the columns of a data table. The table is approximated by a rank-two matrix product, so that the rows and columns can be displayed together in a common plane. The techniques behind a biplot typically involve an eigen decomposition, like the one used in PCA, and the biplot is commonly computed from mean-centered and scaled data (2).
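A minimal base-R sketch of such a biplot (assuming the built-in iris data; prcomp() mean-centers the data and, with scale. = TRUE, also scales it before the decomposition):

# PCA biplot from mean-centered, scaled data using base R only.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
# rows (observations) are drawn as scores, columns (variables) as loadings
biplot(pca, scale = 0)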
Algebra of PCA: As Johnson and Wichern (2007) state (3), suppose the random vector X' = [X_1, X_2, ..., X_p] has covariance matrix Σ with eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_p ≥ 0. Consider the linear combinations

Y_1 = a'_1 X = a_11 X_1 + a_12 X_2 + ... + a_1p X_p
Y_2 = a'_2 X = a_21 X_1 + a_22 X_2 + ... + a_2p X_p
...
Y_p = a'_p X = a_p1 X_1 + a_p2 X_2 + ... + a_pp X_p

where

Var(Y_i) = a'_i Σ a_i, i = 1, 2, ..., p
Cov(Y_i, Y_k) = a'_i Σ a_k, i, k = 1, 2, ..., p.
The principal components are those uncorrelated linear combinations Y_1, Y_2, ..., Y_p whose variances are as large as possible.
If Σ, the covariance matrix associated with the random vector X' = [X_1, X_2, ..., X_p], has the eigenvalue-eigenvector pairs (λ_1, e_1), (λ_2, e_2), ..., (λ_p, e_p) with λ_1 ≥ λ_2 ≥ ... ≥ λ_p ≥ 0, then the ith principal component is

Y_i = e'_i X = e_i1 X_1 + e_i2 X_2 + ... + e_ip X_p, i = 1, 2, ..., p,

where

Var(Y_i) = e'_i Σ e_i = λ_i, i = 1, 2, ..., p
Cov(Y_i, Y_k) = e'_i Σ e_k = 0, i ≠ k,

and therefore the total population variance is preserved:

σ_11 + σ_22 + ... + σ_pp = Var(X_1) + ... + Var(X_p) = λ_1 + λ_2 + ... + λ_p = Var(Y_1) + ... + Var(Y_p).
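These identities are easy to check numerically; the following sketch (assuming the iris data purely for illustration) extracts the principal components with eigen():

# Principal components from the eigen decomposition of S = Cov(X).
X <- as.matrix(iris[, 1:4])
S <- cov(X)
ed <- eigen(S)                # eigenvalues lambda_i, eigenvectors e_i
# scores: Y = (X - mean) %*% E, i.e. Y_i = e'_i X
Y <- scale(X, center = TRUE, scale = FALSE) %*% ed$vectors
# Var(Y_i) = lambda_i, and the total variance is preserved:
all.equal(unname(apply(Y, 2, var)), ed$values)
all.equal(sum(diag(S)), sum(ed$values))  # sigma_11 + ... + sigma_pp = lambda_1 + ... + lambda_p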
Other quantities of interest in a PCA are:
- The proportion of the total variance due to (explained by) the kth principal component: λ_k / (λ_1 + λ_2 + ... + λ_p), k = 1, 2, ..., p.
- The correlation coefficient between a component Y_i and a variable X_k: ρ_{Y_i, X_k} = e_ik √λ_i / √σ_kk, i, k = 1, 2, ..., p.
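Both quantities follow directly from the eigen decomposition; a short sketch continuing the example above:

# Proportion of variance explained and variable-component correlations.
X <- as.matrix(iris[, 1:4])
S <- cov(X)
ed <- eigen(S)
# lambda_k / (lambda_1 + ... + lambda_p) for each component
prop_var <- ed$values / sum(ed$values)
# rho_{Y_i, X_k} = e_ik * sqrt(lambda_i) / sqrt(sigma_kk):
# scale column i by sqrt(lambda_i), then divide row k by sqrt(sigma_kk)
rho <- sweep(ed$vectors %*% diag(sqrt(ed$values)), 1, sqrt(diag(S)), "/")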
Note that PCA can be performed on either the covariance or the correlation matrix, and the data should generally be centered beforehand.
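In prcomp() this is the choice between scale. = FALSE (PCA on the covariance matrix) and scale. = TRUE (PCA on the correlation matrix); a quick sketch, again assuming the iris data:

# PCA on the covariance vs. the correlation matrix.
pca_cov <- prcomp(iris[, 1:4], center = TRUE, scale. = FALSE)
pca_cor <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
# results generally differ when variables are measured on different scales
summary(pca_cov)
summary(pca_cor)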
References:

(2) https://pca4ds.github.io/biplot-and-pca.html
(3) Johnson, R.A. and Wichern, D.W. (2007). Applied Multivariate Statistical Analysis. Pearson Prentice Hall. 773 p.
Examples:

# \donttest{
data(maize)
PCA_biplot(maize)
# }