find.clusters to infer genetic clusters. See the 'details' section for a succinct description of the method, and vignette("adegenet-dapc") for a tutorial. Graphical methods for DAPC are documented in scatter.dapc (see ?scatter.dapc).
dapc is a generic function performing the DAPC on the following types of objects:
- data.frame (only numeric data)
- matrix (only numeric data)
- genind objects (genetic markers)
- genlight objects (genome-wide SNPs)
These methods all return an object with class dapc.
Functions that can be applied to these objects are (the ".dapc" can be omitted):
- print.dapc: prints the content of a dapc object.
- summary.dapc: extracts useful information from a dapc object.
- predict.dapc: predicts group memberships based on DAPC results.
DAPC implementation calls upon dudi.pca from the ade4 package (except for genlight objects) and lda from the MASS package. The predict procedure uses predict.lda from the MASS package.
as.lda is a generic with a method for dapc objects, which converts these objects into outputs similar to that of lda.default.
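The two-step structure described above (a PCA followed by a discriminant analysis on the retained components) can be sketched without adegenet. This is only an illustration under stated assumptions, not the package's implementation: prcomp from base R stands in for dudi.pca, the iris data are a placeholder for real marker data, and the values chosen for n.pca and n.da are arbitrary.

```r
library(MASS)  # provides lda() and predict.lda()

# Placeholder data: four numeric variables and a known grouping factor
x   <- iris[, 1:4]
grp <- iris$Species

# Step 1: PCA on centred, unscaled variables
# (assumption: prcomp is used here in place of ade4's dudi.pca)
pca   <- prcomp(x, center = TRUE, scale. = FALSE)
n.pca <- 2                                   # arbitrary number of retained axes
pcs   <- pca$x[, seq_len(n.pca), drop = FALSE]

# Step 2: discriminant analysis on the retained principal components
da   <- lda(pcs, grouping = grp)
n.da <- 2                                    # arbitrary number of discriminant axes
ld   <- predict(da, newdata = pcs, dimen = n.da)

head(ld$x)              # coordinates of individuals on the discriminant axes
mean(ld$class == grp)   # proportion of individuals reassigned to their own group
```

Running the PCA first makes the variables passed to lda uncorrelated, which is why a discriminant analysis remains usable even when the original variables are numerous or collinear.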
## S3 method for class 'data.frame':
dapc(x, grp, n.pca=NULL, n.da=NULL, center=TRUE,
scale=FALSE, var.contrib=TRUE, pca.info=TRUE, pca.select=c("nbEig","percVar"),
perc.pca=NULL, ..., dudi=NULL)
## S3 method for class 'matrix':
dapc(x, ...)
## S3 method for class 'genind':
dapc(x, pop=NULL, n.pca=NULL, n.da=NULL, scale=FALSE,
scale.method=c("sigma", "binom"), truenames=TRUE, var.contrib=TRUE,
pca.info=TRUE, pca.select=c("nbEig","percVar"), perc.pca=NULL, ...)
## S3 method for class 'genlight':
dapc(x, pop=NULL, n.pca=NULL, n.da=NULL, scale=FALSE,
var.contrib=TRUE, pca.info=TRUE, pca.select=c("nbEig","percVar"),
perc.pca=NULL, glPca=NULL, ...)
## S3 method for class 'dudi':
dapc(x, grp, ...)
## S3 method for class 'dapc':
print(x, ...)
## S3 method for class 'dapc':
summary(object, ...)
## S3 method for class 'dapc':
predict(object, newdata, prior = object$prior, dimen,
method = c("plug-in", "predictive", "debiased"), ...)
- x: a data.frame, matrix, or genind object. For the data.frame and matrix arguments, only quantitative variables should be provided.
- grp, pop: a factor indicating the group membership of individuals; for scatter, an optional grouping of individuals.
- n.pca: an integer indicating the number of axes retained in the Principal Component Analysis (PCA) step. If NULL, interactive selection is triggered.
- n.da: an integer indicating the number of axes retained in the Discriminant Analysis step. If NULL, interactive selection is triggered.
- center: a logical indicating whether variables should be centred to mean 0 (TRUE, default) or not (FALSE). Always TRUE for [...]
- scale: a logical indicating whether variables should be scaled (TRUE) or not (FALSE, default). Scaling consists in dividing variables by their (estimated) standard deviation to account for trivial differences in variances. Further scalin[...]
- var.contrib: a logical indicating whether the contribution of original variables (alleles, for [...]
- pca.info: a logical indicating whether information about the prior PCA should be stored (TRUE, default) or not (FALSE). This information is required to predict group membership of new individuals using predict, but makes the ob[...]
- pca.select: a character indicating the mode of selection of PCA axes, matching either "nbEig" or "percVar". For "nbEig", the user has to specify the number of axes retained (interactively, or via n.pca). For "percVar", the user has to specify the minimal percentage of the total variance to be expressed by the retained axes (see perc.pca).
- perc.pca: a numeric value between 0 and 100 indicating the minimal percentage of the total variance of the data to be expressed by the retained axes of PCA.
- ...: for dapc.matrix, arguments are to match those of dapc.data.frame; for dapc.genlight, arguments passed to glPca.
- glPca: a glPca object; if provided, dimension reduction is not performed (saving computational time) but taken directly from this object.
- object: a dapc object.
- scale.method: a character specifying the scaling method to be used for allele frequencies, which must match "sigma" (usual estimate of standard deviation) or "binom" (based on binomial distribution). See scale[...]
- truenames: a logical indicating whether true (i.e., user-specified) labels should be used in object outputs (TRUE, default) or not (FALSE).
- dudi: a dudi object (from the ade4 package). If provided, prior PCA will be ignored, and this object will be used as a prior step for variable orthogonalisation.
- prior, dimen, method: see ?predict.lda.
The class dapc is a list with the following components: [...]
summary.dapc
returns a list with 6 components: n.dim (number of retained DAPC axes), n.pop (number of groups/populations), assign.prop (proportion of overall correct assignment), assign.per.pop (proportion of correct assignment per group), prior.grp.size (prior group sizes), and post.grp.size (posterior group sizes).
DAPC does not infer genetic clusters ex nihilo; for this, see the find.clusters function.
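To make the assignment quantities listed for summary.dapc concrete, the sketch below computes them by hand in base R. It is an illustration of what the names mean rather than the package's own code; the vectors grp (prior groups) and assigned (posterior assignments) are hypothetical toy data for eight individuals.

```r
# Hypothetical toy data: prior groups and posterior assignments
grp      <- factor(c("A", "A", "A", "B", "B", "B", "B", "C"))
assigned <- factor(c("A", "A", "B", "B", "B", "B", "A", "C"),
                   levels = levels(grp))

# TRUE where an individual is assigned back to its prior group
correct <- as.character(assigned) == as.character(grp)

assign.prop    <- mean(correct)               # overall proportion of correct assignment
assign.per.pop <- tapply(correct, grp, mean)  # proportion of correct assignment per group
prior.grp.size <- table(grp)                  # prior group sizes
post.grp.size  <- table(assigned)             # posterior group sizes

assign.prop      # 6 of 8 individuals are correctly assigned: 0.75
assign.per.pop
```

A low per-group proportion alongside a high overall proportion usually points to one poorly separated group, which is the kind of pattern assignplot and compoplot display graphically.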
- scatter.dapc, assignplot, compoplot: graphics for DAPC.
- find.clusters: to identify clusters without prior.
- dapcIllus: a set of simulated data illustrating the DAPC
## data(dapcIllus), data(eHGDP), and data(H3N2) illustrate the dapc
## see ?dapcIllus, ?eHGDP, ?H3N2
##
example(dapcIllus)
example(eHGDP)
example(H3N2)
## H3N2 EXAMPLE ##
data(H3N2)
pop(H3N2) <- factor(H3N2$other$epid)
dapc1 <- dapc(H3N2, var.contrib=FALSE, scale=FALSE, n.pca=150, n.da=5)
## remove internal segments and ellipses, different pch, add MStree
scatter(dapc1, cell=0, pch=18:23, cstar=0, mstree=TRUE, lwd=2, lty=2)
## only ellipse, custom labels
scatter(dapc1, cell=2, pch="", cstar=0, posi.da="top",
lab=paste("year\n",2001:2006), axesel=FALSE, col=terrain.colors(10))
## SHOW COMPOPLOT ON MICROBOV DATA ##
data(microbov)
dapc1 <- dapc(microbov, n.pca=20, n.da=15)
compoplot(dapc1, lab="")
## EXAMPLE USING GENLIGHT OBJECTS ##
## simulate data
x <- glSim(50,4e3-50, 50, ploidy=2)
x
plot(x)
## perform DAPC
dapc1 <- dapc(x, n.pca=10, n.da=1)
dapc1
## plot results
scatter(dapc1, scree.da=FALSE)
## SNP contributions
loadingplot(dapc1$var.contr)
loadingplot(tail(dapc1$var.contr, 100), main="Loading plot - last 100 SNPs")
## USE "PREDICT" TO PREDICT GROUPS OF NEW INDIVIDUALS ##
## load data
data(sim2pop)
## we make a dataset of:
## 30 individuals from pop A
## 30 individuals from pop B
## 30 hybrids
## separate populations and make F1
temp <- seppop(sim2pop)
temp <- lapply(temp, function(e) hybridize(e,e,n=30)) # force equal popsizes
## make hybrids
hyb <- hybridize(temp[[1]], temp[[2]], n=30)
## repool data - needed to ensure allele matching
newdat <- repool(temp[[1]], temp[[2]], hyb)
pop(newdat) <- rep(c("pop A", "popB", "hyb AB"), c(30,30,30))
## perform the DAPC on the first 2 pop (60 first indiv)
dapc1 <- dapc(newdat[1:60],n.pca=5,n.da=1)
## plot results
scatter(dapc1, scree.da=FALSE)
## make prediction for the 30 hybrids
hyb.pred <- predict(dapc1, newdat[61:90])
hyb.pred
## plot the inferred coordinates (circles are hybrids)
points(hyb.pred$ind.scores, rep(.1, 30))
## look at assignment using assignplot
assignplot(dapc1, new.pred=hyb.pred)
title("30 indiv popA, 30 indiv pop B, 30 hybrids")
## image using compoplot
compoplot(dapc1, new.pred=hyb.pred, ncol=2)
title("30 indiv popA, 30 indiv pop B, 30 hybrids")