PCA_biplot()

Creates a PCA (Principal Component Analysis) biplot with loadings for the new rYWAASB index, used for the simultaneous selection of genotypes by trait and the WAASB index. It displays the rYWAASB, rWAASB and rWAASBY indices (r: ranked) together in one biplot, allowing better differentiation of genotypes. In the PCA biplots, the variables are colored according to their contributions (contrib) and their squared cosines (cos2).
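As an illustration of this kind of coloring, here is a minimal sketch only, not the internal code of PCA_biplot(); it assumes the factoextra package and the built-in iris data rather than the package's own data:

# Color variables by their contributions (contrib) and by their
# squared cosines (cos2) in a PCA variable plot with factoextra.
library(factoextra)
res.pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
# variables colored by contribution to the retained components
fviz_pca_var(res.pca, col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"))
# variables colored by cos2 (quality of representation)
fviz_pca_var(res.pca, col.var = "cos2")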
Usage:

PCA_biplot(datap)
Value:

Returns a list of data frames.

Arguments:

datap: The data set.
Author:

Ali Arminian <abeyran@gmail.com>
Details:

PCA is a machine-learning and dimension-reduction technique. It is used to simplify large data sets by extracting a smaller set of variables that preserves the significant patterns and trends (1). According to Johnson and Wichern (2007), PCA explains the variance-covariance structure of a set of variables X_1, X_2, ..., X_p through a few linear combinations of these variables. The general objectives of PCA are (1) data reduction and (2) interpretation.
Biplot and PCA: The biplot is a method for visually representing both the rows and the columns of a data table. The table is approximated by a rank-two matrix product, so that the rows and columns can be displayed together in a common plane. The techniques behind a biplot typically involve an eigen decomposition, like the one used in PCA, and the biplot is commonly computed from mean-centered and scaled data (2).
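A minimal base-R sketch of such a biplot (assuming the built-in iris data; prcomp() mean-centers the data and, with scale. = TRUE, also scales it before the decomposition):

# PCA biplot from mean-centered, scaled data using base R only.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
# rows (observations) are drawn as scores, columns (variables) as loadings
biplot(pca, scale = 0)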
Algebra of PCA: As Johnson and Wichern (2007) state (3), suppose the random vector X' = [X_1, X_2, ..., X_p] has covariance matrix Σ with eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_p ≥ 0. Consider the linear combinations

Y_1 = a'_1 X = a_11 X_1 + a_12 X_2 + ... + a_1p X_p
Y_2 = a'_2 X = a_21 X_1 + a_22 X_2 + ... + a_2p X_p
...
Y_p = a'_p X = a_p1 X_1 + a_p2 X_2 + ... + a_pp X_p

where

Var(Y_i) = a'_i Σ a_i, i = 1, 2, ..., p
Cov(Y_i, Y_k) = a'_i Σ a_k, i, k = 1, 2, ..., p.
The principal components are those uncorrelated linear combinations Y_1, Y_2, ..., Y_p whose variances are as large as possible.
If Σ, the covariance matrix associated with the random vector X' = [X_1, X_2, ..., X_p], has the eigenvalue-eigenvector pairs (λ_1, e_1), (λ_2, e_2), ..., (λ_p, e_p) with λ_1 ≥ λ_2 ≥ ... ≥ λ_p ≥ 0, then the ith principal component is

Y_i = e'_i X = e_i1 X_1 + e_i2 X_2 + ... + e_ip X_p, i = 1, 2, ..., p,

where

Var(Y_i) = e'_i Σ e_i = λ_i, i = 1, 2, ..., p
Cov(Y_i, Y_k) = e'_i Σ e_k = 0, i ≠ k,

and therefore the total population variance is preserved:

σ_11 + σ_22 + ... + σ_pp = Var(X_1) + ... + Var(X_p) = λ_1 + λ_2 + ... + λ_p = Var(Y_1) + ... + Var(Y_p).
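These identities are easy to check numerically; the following sketch (assuming the iris data purely for illustration) extracts the principal components with eigen():

# Principal components from the eigen decomposition of S = Cov(X).
X <- as.matrix(iris[, 1:4])
S <- cov(X)
ed <- eigen(S)                # eigenvalues lambda_i, eigenvectors e_i
# scores: Y = (X - mean) %*% E, i.e. Y_i = e'_i X
Y <- scale(X, center = TRUE, scale = FALSE) %*% ed$vectors
# Var(Y_i) = lambda_i, and the total variance is preserved:
all.equal(unname(apply(Y, 2, var)), ed$values)
all.equal(sum(diag(S)), sum(ed$values))  # sigma_11 + ... + sigma_pp = lambda_1 + ... + lambda_p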
Other quantities of interest in a PCA are:
- The proportion of the total variance due to (explained by) the kth principal component: λ_k / (λ_1 + λ_2 + ... + λ_p), k = 1, 2, ..., p.
- The correlation coefficient between a component Y_i and a variable X_k: ρ_{Y_i, X_k} = e_ik √λ_i / √σ_kk, i, k = 1, 2, ..., p.
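Both quantities follow directly from the eigen decomposition; a short sketch continuing the example above:

# Proportion of variance explained and variable-component correlations.
X <- as.matrix(iris[, 1:4])
S <- cov(X)
ed <- eigen(S)
# lambda_k / (lambda_1 + ... + lambda_p) for each component
prop_var <- ed$values / sum(ed$values)
# rho_{Y_i, X_k} = e_ik * sqrt(lambda_i) / sqrt(sigma_kk):
# scale column i by sqrt(lambda_i), then divide row k by sqrt(sigma_kk)
rho <- sweep(ed$vectors %*% diag(sqrt(ed$values)), 1, sqrt(diag(S)), "/")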
Note that PCA can be performed on either the covariance or the correlation matrix, and the data should generally be centered beforehand.
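In prcomp() this is the choice between scale. = FALSE (PCA on the covariance matrix) and scale. = TRUE (PCA on the correlation matrix); a quick sketch, again assuming the iris data:

# PCA on the covariance vs. the correlation matrix.
pca_cov <- prcomp(iris[, 1:4], center = TRUE, scale. = FALSE)
pca_cor <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
# results generally differ when variables are measured on different scales
summary(pca_cov)
summary(pca_cor)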
References:

(2) https://pca4ds.github.io/biplot-and-pca.html
(3) Johnson, R.A. and Wichern, D.W. (2007). Applied Multivariate Statistical Analysis. Pearson Prentice Hall. 773 p.
Examples:

# \donttest{
data(maize)
PCA_biplot(maize)
# }