PCA_biplot() creates a PCA (Principal Component Analysis) biplot with loadings for the new rYWAASB index, used for the simultaneous selection of genotypes by trait and by the WAASB index. It displays the rYWAASB, rWAASB and rWAASBY indices (r: ranked) together in one biplot, allowing better differentiation of genotypes. In the PCA biplots, variables are colored according to their contributions (contrib) and their cos2 (quality of representation).
Usage: PCA_biplot(datap)

Value: Returns a list of data frames.

Arguments: datap: the input data set.
Author: Ali Arminian <abeyran@gmail.com>
PCA is a machine learning and dimension-reduction technique. It is used to simplify large data sets by extracting a smaller set of components that preserves the significant patterns and trends (1). According to Johnson and Wichern (2007), PCA explains the variance-covariance structure of a set of variables X_1, X_2, ..., X_p through a few linear combinations of these variables. The two common objectives of PCA are (1) data reduction and (2) interpretation.
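As a minimal illustration of the data-reduction objective, base R's prcomp() can compress four correlated variables into two components with little loss of information (this uses the built-in iris measurements for illustration only, not the package's maize data):

```r
# PCA on the four numeric iris measurements (base R, illustrative only)
X <- iris[, 1:4]
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 3)

# Keep only the first two components: the 4-variable data set
# is reduced to 2 dimensions
scores2 <- pca$x[, 1:2]
dim(scores2)
```

Here the first two components account for the large majority of the total variance, which is what makes the two-dimensional biplot a reasonable summary of the full data.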
Biplot and PCA: The biplot is a method for visually representing both the rows and the columns of a data table. It approximates the table by a rank-two matrix product, with the aim of displaying rows and columns together in a single plane. The computations behind a biplot typically involve an eigen decomposition, like the one used in PCA, and the biplot is commonly constructed from mean-centered and scaled data (2).
Algebra of PCA: As Johnson and Wichern (2007) state (3), let the random vector X' = [X_1, X_2, ..., X_p] have covariance matrix \Sigma with eigenvalues

\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0.
Consider the linear combinations:

Y_1 = a_1' X = a_{11} X_1 + a_{12} X_2 + \dots + a_{1p} X_p
Y_2 = a_2' X = a_{21} X_1 + a_{22} X_2 + \dots + a_{2p} X_p
...
Y_p = a_p' X = a_{p1} X_1 + a_{p2} X_2 + \dots + a_{pp} X_p

where

Var(Y_i) = a_i' \Sigma a_i,  i = 1, 2, ..., p
Cov(Y_i, Y_k) = a_i' \Sigma a_k,  i, k = 1, 2, ..., p.
The principal components are the uncorrelated linear combinations Y_1, Y_2, ..., Y_p whose variances are as large as possible.
For the random vector X' = [X_1, X_2, ..., X_p] with associated covariance matrix \Sigma, let \Sigma have the eigenvalue-eigenvector pairs (\lambda_1, e_1), (\lambda_2, e_2), ..., (\lambda_p, e_p), where, as above, \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0.
Then the ith principal component is:

Y_i = e_i' X = e_{i1} X_1 + e_{i2} X_2 + \dots + e_{ip} X_p,  i = 1, 2, ..., p,

with

Var(Y_i) = e_i' \Sigma e_i = \lambda_i,  i = 1, 2, ..., p
Cov(Y_i, Y_k) = e_i' \Sigma e_k = 0,  i \ne k,

and

\sigma_{11} + \sigma_{22} + \dots + \sigma_{pp} = \sum_{i=1}^{p} Var(X_i) = \lambda_1 + \lambda_2 + \dots + \lambda_p = \sum_{i=1}^{p} Var(Y_i).

Interestingly, total population variance = \sigma_{11} + \sigma_{22} + \dots + \sigma_{pp} = \lambda_1 + \lambda_2 + \dots + \lambda_p.
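These identities can be checked numerically in base R: the eigenvalues of the covariance matrix equal the variances of the principal-component scores, and their sum equals the total variance. A sketch (using the iris data for illustration; not part of PCA_biplot()):

```r
X <- as.matrix(iris[, 1:4])
S <- cov(X)       # sample covariance matrix Sigma
eig <- eigen(S)   # eigenvalues lambda_i, eigenvectors e_i

# Scores Y_i = e_i' X, computed on centered data
Xc <- scale(X, center = TRUE, scale = FALSE)
Y <- Xc %*% eig$vectors

# Var(Y_i) = lambda_i for each component
all.equal(unname(apply(Y, 2, var)), eig$values)

# Total variance: sum of the sigma_ii equals sum of the lambda_i
all.equal(sum(diag(S)), sum(eig$values))
```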
Other quantities that are significant in PCA are:
The proportion of the total variance due to (explained by) the kth principal component:

\lambda_k / (\lambda_1 + \lambda_2 + \dots + \lambda_p),  k = 1, 2, ..., p
The correlation coefficient between component Y_i and variable X_k:

\rho_{Y_i, X_k} = e_{ik} \sqrt{\lambda_i} / \sqrt{\sigma_{kk}},  i, k = 1, 2, ..., p
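Both quantities follow directly from the eigen decomposition; a base R sketch (object names are illustrative, and iris again stands in for real data):

```r
X <- as.matrix(iris[, 1:4])
S <- cov(X)
eig <- eigen(S)

# Proportion of total variance explained by the kth component:
# lambda_k / (lambda_1 + ... + lambda_p)
prop_k <- eig$values / sum(eig$values)

# Correlation between component Y_i and variable X_k:
# rho_{Y_i, X_k} = e_ik * sqrt(lambda_i) / sqrt(sigma_kk)
# (rows index variables k, columns index components i)
rho <- (eig$vectors %*% diag(sqrt(eig$values))) / sqrt(diag(S))

# Cross-check: these match the sample correlations between the
# original variables and the component scores
Xc <- scale(X, center = TRUE, scale = FALSE)
all.equal(rho, cor(X, Xc %*% eig$vectors), check.attributes = FALSE)
```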
Please note that PCA can be performed on either the covariance matrix or the correlation matrix, and that the data should generally be centered before PCA.
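In base R terms, PCA on the correlation matrix corresponds to centering and scaling the data, i.e. scale. = TRUE in prcomp() (a sketch; not specific to PCA_biplot()):

```r
X <- as.matrix(iris[, 1:4])

# Covariance-based PCA: data centered only
p_cov <- prcomp(X, center = TRUE, scale. = FALSE)

# Correlation-based PCA: data centered and scaled to unit variance
p_cor <- prcomp(X, center = TRUE, scale. = TRUE)

# The correlation-based component variances match eigen(cor(X))
all.equal(p_cor$sdev^2, eigen(cor(X))$values)
```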
(2) https://pca4ds.github.io/biplot-and-pca.html.
(3) Johnson, R.A. and Wichern, D.W. 2007. Applied Multivariate Statistical Analysis. Pearson Prentice Hall. 773 p.
# \donttest{
data(maize)
PCA_biplot(maize)
# }