Performs archetypal analysis using Principal Convex Hull Analysis (PCHA), with full control over all algorithmic parameters.
archetypal(df, kappas, initialrows = NULL,
method = "projected_convexhull", nprojected = 2, npartition = 10,
nfurthest = 10, maxiter = 2000, conv_crit = 1e-06,
var_crit = 0.9999, verbose = TRUE, rseed = NULL, aupdate1 = 25,
aupdate2 = 10, bupdate = 10, muAup = 1.2, muAdown = 0.5,
muBup = 1.2, muBdown = 0.5, SSE_A_conv = 1e-09,
SSE_B_conv = 1e-09, save_history = FALSE, nworkers = NULL)
The data frame with dimensions n x d
The number of archetypes
The initial set of rows of the data frame that will be used to start the algorithm
The method that will be used for computing initial approximation:
projected_convexhull, see find_outmost_projected_convexhull_points
convexhull, see find_outmost_convexhull_points
partitioned_convexhull, see find_outmost_partitioned_convexhull_points
furthestsum, see find_furthestsum_points
outmost, see find_outmost_points
random, a random set of kappas points will be used
The dimension of the projected subspace for find_outmost_projected_convexhull_points
The number of partitions for find_outmost_partitioned_convexhull_points
The number of times that the FurthestSum algorithm will be applied by find_furthestsum_points
The maximum number of iterations of the main algorithm
The SSE convergence criterion of termination: iterate until |dSSE|/SSE<conv_crit
The Variance Explained (VarExpl) convergence criterion of termination: iterate while VarExpl<var_crit, i.e. stop once VarExpl>=var_crit
If it is set to TRUE, then both initialization and iteration details are printed out
The random seed that will be used for setting the initial A matrix. Useful for reproducible results.
The number of initial applications of Aupdate for improving the initially randomly selected A matrix
The number of Aupdate applications in main iteration
The number of Bupdate applications in main iteration
The factor (>1) by which muA is multiplied when SSE<=SSE_old(1+SSE_A_conv) holds
The factor (<1) by which muA is multiplied when SSE>SSE_old(1+SSE_A_conv) holds
The factor (>1) by which muB is multiplied when SSE<=SSE_old(1+SSE_B_conv) holds
The factor (<1) by which muB is multiplied when SSE>SSE_old(1+SSE_B_conv) holds
The convergence value used in SSE<=SSE_old(1+SSE_A_conv). Warning: Matlab sometimes crashes after setting this to 1E-16 or lower
The convergence value used in SSE<=SSE_old(1+SSE_B_conv). Warning: Matlab sometimes crashes after setting this to 1E-16 or lower
If set to TRUE, the iteration history is saved for further use
The number of logical processors that will be used for parallel computing (usually twice the number of available physical cores). Parallel computation is applied when requested by the functions find_furthestsum_points, find_outmost_partitioned_convexhull_points and find_outmost_projected_convexhull_points.
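The step-size and termination arguments above can be read as two small rules. The following is a minimal sketch of those rules as documented here, not the package's internal code: the helper names adapt_mu and keep_iterating are hypothetical, and the choice of the current SSE as the reference value in the relative-change test is an assumption.

```r
# Hypothetical helper: one step-size update following the documented rule
# for muAup/muAdown (the same shape applies to muBup/muBdown).
adapt_mu <- function(mu, sse, sse_old,
                     mu_up = 1.2, mu_down = 0.5, sse_conv = 1e-09) {
  if (sse <= sse_old * (1 + sse_conv)) {
    # SSE did not get worse: enlarge the step size
    list(mu = mu * mu_up, improved = TRUE)
  } else {
    # SSE got worse: shrink the step size
    list(mu = mu * mu_down, improved = FALSE)
  }
}

# Hypothetical helper: TRUE while the main loop should keep iterating.
keep_iterating <- function(sse, sse_old, varexpl, iter,
                           maxiter = 2000, conv_crit = 1e-06,
                           var_crit = 0.9999) {
  # Assumption: |dSSE|/SSE is measured against the current SSE;
  # written without division so that SSE = 0 causes no problem.
  still_changing <- abs(sse_old - sse) >= conv_crit * abs(sse)
  iter < maxiter && still_changing && varexpl < var_crit
}

adapt_mu(mu = 1, sse = 9.5, sse_old = 10)$mu
adapt_mu(mu = 1, sse = 10.5, sse_old = 10)$mu
keep_iterating(sse = 9.5, sse_old = 10, varexpl = 0.95, iter = 5)
```

Under this reading, the counters nAup/nAdown and nBup/nBdown in the returned list would count how often each branch of adapt_mu was taken.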
A list with members:
BY, the \(kappas \times d\) matrix of archetypes found
A, the \(n \times kappas\) matrix such that Y ~ ABY, i.e. the Frobenius norm ||Y-ABY|| is minimized
B, the \(kappas \times n\) matrix such that Y ~ ABY, i.e. the Frobenius norm ||Y-ABY|| is minimized
SSE, the sum of squared errors SSE = ||Y-ABY||^2
varexpl, the Variance Explained = (SST-SSE)/SST, where SST is the total sum of squares of the data set matrix
initialsolution, the set of rows of the data frame initially used to start the algorithm
freqstable, the frequency table for all found rows, if it is available
iterations, the number of main iterations done by the algorithm
time, the time in seconds spent on the entire run
converges, TRUE if convergence was achieved before reaching the maximum number of allowed iterations
nAup, the total number of times that SSE<=SSE_old(1+SSE_A_conv) held in Aupdate processes. Useful for debugging purposes.
nAdown, the total number of times that SSE>SSE_old(1+SSE_A_conv) held in Aupdate processes. Useful for debugging purposes.
nBup, the total number of times that SSE<=SSE_old(1+SSE_B_conv) held in Bupdate processes. Useful for debugging purposes.
nBdown, the total number of times that SSE>SSE_old(1+SSE_B_conv) held in Bupdate processes. Useful for debugging purposes.
run_results, a list of iteration-related details: SSE, varexpl, time, B, BY for all iterations done.
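To make the relations between BY, A, B, SSE and varexpl concrete, here is a small self-contained sketch on made-up matrices. The matrices Y, A and B are hypothetical toy values, not output of archetypal(), and taking SST as the uncentered sum of squares sum(Y^2) is an assumption about how the total sum of squares is defined.

```r
# Toy data: n = 4 points in d = 2 dimensions (made-up values).
Y <- matrix(c(0, 0,
              2, 0,
              0, 2,
              1, 1), ncol = 2, byrow = TRUE)

# Hypothetical B (kappas x n): each archetype is a convex combination
# of data rows. Here the first three rows are picked as archetypes.
B <- matrix(c(1, 0, 0, 0,
              0, 1, 0, 0,
              0, 0, 1, 0), nrow = 3, byrow = TRUE)

# Hypothetical A (n x kappas): each data row is approximated by a
# convex combination of the archetypes. The last row is deliberately
# imperfect so that the error is not zero.
A <- matrix(c(1,   0,   0,
              0,   1,   0,
              0,   0,   1,
              0.2, 0.4, 0.4), ncol = 3, byrow = TRUE)

BY <- B %*% Y                  # kappas x d matrix of archetypes
residual <- Y - A %*% BY       # reconstruction error Y - ABY
SSE <- sum(residual^2)         # squared Frobenius norm
SST <- sum(Y^2)                # assumption: uncentered total sum of squares
varexpl <- (SST - SSE) / SST

SSE
varexpl
```

The same three lines (residual, SSE, varexpl) can be applied to the A and BY members of a fitted object together with as.matrix(df) to cross-check the reported values.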
[1] M Morup and LK Hansen, "Archetypal analysis for machine learning and data mining", Neurocomputing (Elsevier, 2012). https://doi.org/10.1016/j.neucom.2011.06.033.
[2] Source: http://www.mortenmorup.dk/index_files/Page327.htm , last accessed 2019-06-07
{
# Create a small 2D data set from 3 corner-points:
p1 = c(1,2);p2 = c(3,5);p3 = c(7,3)
dp = rbind(p1,p2,p3);dp
set.seed(916070)
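pts = t(sapply(1:20, function(i,dp){
  cc = runif(3)      # random positive weights
  cc = cc/sum(cc)    # normalize so that the weights sum to 1
  colSums(dp*cc)     # convex combination of the three corner-points
},dp))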
df = data.frame(pts)
colnames(df) = c("x","y")
# Run AA:
aa = archetypal(df = df, kappas = 3, verbose = FALSE, save_history = TRUE)
# Archetypes:
archs = data.frame(aa$BY)
archs
# See main results:
names(aa)
aa[c("SSE","varexpl","iterations","time")]
# See history of iterations:
names(aa$run_results)
}
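As a further sketch, the run can be made reproducible with rseed and the result compared against the known corner-points. The block rebuilds the example data so that it runs on its own, and it only calls archetypal() if the package is installed; the seed value 2024 is an arbitrary choice, and no particular output is claimed.

```r
# Rebuild the example data (3 corner-points, 20 random convex combinations).
dp <- rbind(p1 = c(1, 2), p2 = c(3, 5), p3 = c(7, 3))
set.seed(916070)
pts <- t(sapply(1:20, function(i) {
  cc <- runif(3)
  cc <- cc / sum(cc)       # convex weights
  colSums(dp * cc)
}))
df <- data.frame(x = pts[, 1], y = pts[, 2])

# Only run the analysis when the archetypal package is available.
if (requireNamespace("archetypal", quietly = TRUE)) {
  aa <- archetypal::archetypal(df = df, kappas = 3, rseed = 2024,
                               verbose = FALSE)
  # Archetypes found versus the corner-points used to generate the data
  print(aa$BY)
  print(dp)
  # Recompute the SSE from the returned A and BY matrices
  Y <- as.matrix(df)
  print(c(reported = aa$SSE, recomputed = sum((Y - aa$A %*% aa$BY)^2)))
}
```

Because the sample points are strict convex combinations of the corner-points, the archetypes found lie inside the triangle and will generally approximate, not reproduce, p1, p2 and p3.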