
Anthropometry (version 1.1)

archetypoids: Finding archetypoids

Description

Archetypoid algorithm. It is based on the PAM clustering algorithm and consists of two phases: a BUILD phase and a SWAP phase. In the BUILD phase, an initial set of archetypoids is determined. Unlike in PAM, this set is not built stepwise. Instead, we suggest choosing the set of individuals nearest to the archetypes returned by the archetypes function of the archetypes R package (Eugster and Leisch (2009)). This set can be defined in two different ways; see the nearest argument below. The SWAP phase has the same goal as the SWAP step of PAM, but with a different objective function: it attempts to improve the initial vector of archetypoids by exchanging selected individuals for unselected individuals and checking whether each replacement reduces the objective function of the archetypoid analysis problem.

More details are given in Vinue et al. (2014) (submitted).
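The two phases can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the Anthropometry implementation: the helper names (alpha_one, rss_of, build_nearest, swap_phase) are hypothetical; the alpha coefficients come from the huge-penalized nonnegative least-squares problem solved with stats::optim (L-BFGS-B) instead of a dedicated NNLS solver; the objective is taken to be the plain sum of squared residuals (the package may scale it differently); and convex-hull vertices stand in for the archetypes that the archetypes package would normally supply. The SWAP loop accepts an improving exchange as soon as it finds one, a first-improvement variant of PAM's SWAP.

```r
# Minimal sketch of archetypoid BUILD ("nearest") + SWAP; NOT the package code.
# Assumptions: Euclidean distances, plain sum-of-squares objective,
# alphas from a 'huge'-penalized NNLS solved with optim(L-BFGS-B).

# Hypothetical helper: alpha >= 0 (summing approximately to 1) that best
# reconstructs observation x from the k rows of Z (k x p).
alpha_one <- function(Z, x, huge = 200) {
  A <- rbind(t(Z), rep(huge, nrow(Z)))   # (p + 1) x k, penalty row appended
  b <- c(x, huge)
  f <- function(a) sum((A %*% a - b)^2)
  g <- function(a) as.vector(2 * crossprod(A, A %*% a - b))
  start <- rep(1 / nrow(Z), nrow(Z))
  optim(start, f, g, method = "L-BFGS-B", lower = 0)$par
}

# Hypothetical helper: residual sum of squares when every observation is
# approximated by a convex combination of the candidate archetypoids idx.
rss_of <- function(X, idx, huge = 200) {
  Z <- X[idx, , drop = FALSE]
  total <- 0
  for (j in seq_len(nrow(X))) {
    a <- alpha_one(Z, X[j, ], huge)
    total <- total + sum((X[j, ] - as.vector(a %*% Z))^2)
  }
  total
}

# BUILD ("nearest" variant): for each archetype (a row of `archetypes`),
# take the index of the closest observation.
build_nearest <- function(X, archetypes) {
  idx <- apply(archetypes, 1, function(z) which.min(colSums((t(X) - z)^2)))
  stopifnot(!anyDuplicated(idx))   # this sketch does not resolve ties
  unname(idx)
}

# SWAP: replace a selected index by an unselected one whenever this lowers
# the RSS; repeat full passes until no exchange helps.
swap_phase <- function(X, idx, huge = 200) {
  best <- rss_of(X, idx, huge)
  repeat {
    improved <- FALSE
    for (pos in seq_along(idx)) {
      for (cand in setdiff(seq_len(nrow(X)), idx)) {
        trial <- idx
        trial[pos] <- cand
        r <- rss_of(X, trial, huge)
        if (r < best) {
          best <- r
          idx <- trial
          improved <- TRUE
        }
      }
    }
    if (!improved) break
  }
  list(archet = idx, rss = best)
}

# Toy usage: 20 random 2-D points; convex-hull vertices are a hypothetical
# stand-in for the archetypes computed by the archetypes package.
set.seed(1)
X <- matrix(rnorm(40), ncol = 2)
init <- build_nearest(X, X[chull(X)[1:3], , drop = FALSE])
out <- swap_phase(X, init)
```

Since a swap is accepted only when it strictly lowers the RSS, out$rss can never exceed the RSS of the initial vector, and the loop stops after finitely many passes.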

Usage

archetypoids(i,data,huge=200,step,init,ArchObj,nearest,sequ,aux)

Arguments

i
Number of archetypoids.
data
Data matrix. Each row corresponds to an observation and each column corresponds to an anthropometric variable. All variables are numeric.
huge
Penalization added to solve the convex least squares problems that arise when estimating the archetypoids (it enforces the constraint that the alpha coefficients sum to one); see Eugster and Leisch (2009). Default value is 200.
step
Logical value. If TRUE, the archetypoid algorithm is executed repeatedly within stepArchetypoids. Therefore, this function requires the next argument init (but neither the ArchOb
init
Initial vector of archetypoids for the BUILD phase of the archetypoid algorithm. It is computed within stepArchetypoids. See nearest argument below for an explanation of how this vecto
ArchObj
The list returned by the stepArchetypesMod function. This function is a slight modification of the original stepArchetypes function of
nearest
Initial vector of archetypoids for the BUILD phase of the archetypoid algorithm. Required when step=FALSE. This argument is a logical value: if TRUE (FALSE), the nearest (which) vector is calculated. Both vectors contain the
sequ
Logical value. It indicates whether a sequence of archetypoids (TRUE) or only a single number of them (FALSE) is computed. It is determined by the number of archetypes computed by means of stepArchetypesMod
aux
If sequ=FALSE, this value is equal to i-1 since for a single number of archetypoids, the list associated with the archetype object only has one element.
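The penalty controlled by the huge argument can be written out explicitly. As we understand the formulation of Eugster and Leisch (2009), the sum-to-one constraint on the alphas is not imposed exactly; instead, one row of huge values is appended to the nonnegative least-squares system. For an observation x in R^p and a matrix Z in R^{p x k} whose columns are the current archetypoids (notation ours, with lambda = huge):

```latex
\min_{\alpha \ge 0}
\left\| \begin{pmatrix} Z \\ \lambda \mathbf{1}_k^{\top} \end{pmatrix} \alpha
      - \begin{pmatrix} x \\ \lambda \end{pmatrix} \right\|_2^2
\;=\;
\min_{\alpha \ge 0} \; \| Z\alpha - x \|_2^2
      + \lambda^2 \left( \mathbf{1}_k^{\top} \alpha - 1 \right)^2 ,
\qquad \lambda = \texttt{huge}.
```

Larger values of huge enforce the sum-to-one constraint more strictly, at the cost of a worse-conditioned least-squares problem.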

Value

  • A list with the following elements:

    archet: Final vector of k archetypoids.

    rss: Residual sum of squares corresponding to the final vector of k archetypoids.

    archet_ini: Vector of initial archetypoids (nearest or which).

    alphas: Alpha coefficients for the optimal vector of archetypoids.

Details

As mentioned, this algorithm is based on PAM. Algorithms of this type aim to find good solutions in a short period of time, although not necessarily the best solution. The global minimum could always be found by an exhaustive search, given as much time as necessary, but this would be computationally very inefficient.
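A rough count illustrates the trade-off. With hypothetical sizes of n = 400 individuals and k = 3 archetypoids, an exhaustive search must evaluate the objective for every subset of k individuals, whereas one SWAP pass evaluates only the k(n - k) single exchanges (and each evaluation in turn solves one constrained least-squares problem per individual):

```r
# Illustrative arithmetic; n and k are hypothetical values.
n <- 400
k <- 3
exhaustive <- choose(n, k)   # candidate sets for an exhaustive search
per_pass   <- k * (n - k)    # exchanges evaluated in one SWAP pass
exhaustive                   # [1] 10586800
per_pass                     # [1] 1191
```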

References

Vinue, G., Epifanio, I., and Alemany, S., (2014). Archetypoids: a new approach to define representative archetypal data. Submitted for publication.

Cutler, A., and Breiman, L., (1994). Archetypal Analysis, Technometrics 36, 338-347.

Epifanio, I., Vinue, G., and Alemany, S., (2013). Archetypal analysis: contributions for estimating boundary cases in multivariate accommodation problem, Computers & Industrial Engineering 64, 757-765.

Eugster, M. J. A., and Leisch, F., (2009). From Spider-Man to Hero - Archetypal Analysis in R, Journal of Statistical Software 30, 1-23, http://www.jstatsoft.org/.

Eugster, M. J. A., (2012). Performance profiles based on archetypal athletes, International Journal of Performance Analysis in Sport 12, 166-187.

See Also

stepArchetypesMod, archetypes, stepArchetypoids

Examples

library(Anthropometry)

#SPORTS EXAMPLE:
#Database:
if (!requireNamespace("SportsAnalytics", quietly = TRUE)) {
  stop("This example requires the SportsAnalytics package.")
}
data("NBAPlayerStatistics0910", package = "SportsAnalytics")
mat <- NBAPlayerStatistics0910[,c("TotalMinutesPlayed","FieldGoalsMade")]
rownames(mat) <- NULL

#Calculating archetypes by using the archetype algorithm:
#Data preprocessing:
preproc <- accommodation(mat,stand=TRUE,percAccomm=1)

#For reproducing results, seed for randomness:
set.seed(4321)
#Run archetype algorithm repeatedly from 1 to 15 archetypes:
lass15 <- stepArchetypesMod(data=preproc,k=1:15,verbose=FALSE,nrep=20)
screeplot(lass15) 

#Calculating real archetypes:
i <- 3 #number of archetypoids.
res <- archetypoids(i,preproc,huge=200,step=FALSE,ArchObj=lass15,nearest=TRUE,sequ=TRUE)
arquets <- NBAPlayerStatistics0910[res[[1]],c("Name","TotalMinutesPlayed","FieldGoalsMade")]
res_which <- archetypoids(i,preproc,huge=200,step=FALSE,ArchObj=lass15,nearest=FALSE,sequ=TRUE)
arquets_eug <- NBAPlayerStatistics0910[res_which[[1]],
                  c("Name","TotalMinutesPlayed","FieldGoalsMade")]

col_pal <- RColorBrewer::brewer.pal(7, "Set1")
col_black <- rgb(0, 0, 0, 0.2)

plot(mat, pch = 1, col = col_black, xlim = c(0,3500),
     main = "NBA archetypal basketball players \n obtained in Eugster (2012) \n and with our proposal",
     xlab = "Total minutes played", ylab = "Field goals made")
points(mat[as.numeric(rownames(arquets)),], pch = 4, col = col_pal[1]) 
points(mat[as.numeric(rownames(arquets_eug)),], pch = 4, col = col_pal[1]) 
text(mat[as.numeric(rownames(arquets_eug)),][2,1], 
     mat[as.numeric(rownames(arquets_eug)),][2,2], 
     labels = arquets_eug[2,"Name"], pos = 4, col = "blue")
plotrix::textbox(c(50,800), 50, "Travis Diener") 
plotrix::textbox(c(2800,3500), 780, "Kevin Durant", col = "blue")
plotrix::textbox(c(2800,3500), 270, "Jason Kidd", col = "blue")
legend("topleft",c("archetypes of Eugster","archetypes of our proposal"), 
       lty= c(1,NA), pch = c(NA,22), col = c("blue","black"))


#If a specific number of archetypes is computed only:
i <- 3
set.seed(4321)
lass3 <- stepArchetypesMod(data=preproc,k=i,verbose=FALSE,nrep=3)
res3 <- archetypoids(i,preproc,huge=200,step=FALSE,ArchObj=lass3,nearest=TRUE,sequ=FALSE,aux=2)
arquets3 <- NBAPlayerStatistics0910[res3[[1]],c("Name","TotalMinutesPlayed","FieldGoalsMade")]


#COCKPIT DESIGN PROBLEM:
m <- dataUSAF
#Variable selection:
sel <- c(48,40,39,33,34,36)
#Changing to inches: 
mpulg <- m[,sel] / (10 * 2.54)

#Data preprocessing:
preproc <- accommodation(mpulg,TRUE,0.95,TRUE)

#For reproducing results, seed for randomness:
set.seed(2010) 
#Run archetype algorithm repeatedly from 1 to numArch archetypes:
numArch <- 10 ; nrep <- 20
lass <- stepArchetypesMod(data=preproc$data,k=1:numArch,verbose=FALSE,nrep=nrep)  
screeplot(lass)

i <- 3 #number of archetypoids.
res <- archetypoids(i,preproc$data,huge=200,step=FALSE,ArchObj=lass,nearest=TRUE,sequ=TRUE)
res_which <- archetypoids(i,preproc$data,huge=200,step=FALSE,ArchObj=lass,nearest=FALSE,sequ=TRUE)
