This is the main function for fitting latent class models. It first checks the pedigrees (it exits if an individual has only one parent in the pedigree, if the pedigree contains no children, or if there are not enough individuals for parameter estimation) and the initial values (probabilities must be positive and sum to one). In models with familial dependence, a child's latent class depends on the classes of his parents via triplet-transition probabilities. In models without familial dependence, it performs the classical Latent Class Analysis (LCA), where all individuals are assumed independent and the pedigree structure is ignored. The EM algorithm stops when the difference between the log-likelihoods of two successive iterations is smaller than tol, which is fixed by the user.
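The stopping rule described above can be illustrated on a toy problem. The following is a minimal sketch in base R, assuming a two-component normal mixture with unit variances as a stand-in model; the function em.toy and all of its arguments are hypothetical and are not part of the package. Only the tol-based stopping rule mirrors the description.

```r
# Toy EM for a two-component normal mixture with unit variances.
# The model is a stand-in; only the stopping rule mirrors the text above.
set.seed(1)
y <- c(rnorm(100, mean = 0), rnorm(100, mean = 4))

em.toy <- function(y, mu = c(-1, 1), pi1 = 0.5, tol = 0.001, max.iter = 500) {
  # Observed-data log-likelihood of the mixture
  loglik <- function(mu, pi1) {
    sum(log(pi1 * dnorm(y, mu[1]) + (1 - pi1) * dnorm(y, mu[2])))
  }
  ll.old <- loglik(mu, pi1)
  for (iter in seq_len(max.iter)) {
    # E step: posterior probability of component 1 for each observation
    w <- pi1 * dnorm(y, mu[1])
    w <- w / (w + (1 - pi1) * dnorm(y, mu[2]))
    # M step: weighted updates of the mixing proportion and the means
    pi1 <- mean(w)
    mu <- c(sum(w * y) / sum(w), sum((1 - w) * y) / sum(1 - w))
    ll.new <- loglik(mu, pi1)
    # Stopping rule: log-likelihood change between iterations below tol
    if (abs(ll.new - ll.old) < tol) break
    ll.old <- ll.new
  }
  list(mu = mu, pi1 = pi1, ll = ll.new, iter = iter)
}

res <- em.toy(y)
```

A smaller tol makes the loop run longer before it stops, at the cost of more iterations; max.iter is an extra safeguard added for this sketch only.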
lca.model(ped, probs, param, optim.param, fit = TRUE,
optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), tol = 0.001,
x = NULL, var.list = NULL, famdep = TRUE, modify.init = NULL)
ped
a matrix or data frame representing pedigrees and measurements: ped[,1] family ID, ped[,2] subject ID, ped[,3] dad ID, ped[,4] mom ID, ped[,5] sex, ped[,6] symptom status (2: symptomatic, 1: without symptoms, 0: missing), ped[,7:ncol(ped)] measurements, each column corresponding to a phenotypic measurement. If the measurement distribution specified with optim.param is multinomial, these columns must be of type integer or factor,
probs
a list of initial probability parameters (see below for more details). The function init.p.trans can be used to compute an initial value of the component p.trans of probs,
param
a list of initial measurement distribution parameters (see below for more details). The function init.ordi can be used to compute an initial value of param in the case of discrete or ordinal data (product multinomial distribution), and init.norm in the case of continuous data (multivariate normal distribution),
optim.param
a variable indicating how measurement distribution parameter optimization is performed (see below for more details),
fit
a logical variable: if TRUE, the EM algorithm is performed; if FALSE, only the weights and the log-likelihood are computed with the initial parameter values, without log-likelihood maximization,
optim.probs.indic
a vector of logical values indicating which probability parameters to estimate,
tol
a small number governing the stopping rule of the EM algorithm. Default is 0.001,
x
a matrix of covariates (optional), default is NULL,
var.list
a list of integers indicating the columns of x containing the covariates to use for a given phenotypic measurement, default is NULL,
famdep
a logical variable indicating whether the familial dependence model is used. Default is TRUE. In models without familial dependence, individuals are treated as independent and the pedigree structure is ignored. In models with familial dependence, a child's class depends on the classes of his parents via a triplet-transition probability,
modify.init
a function to modify the initial values of the EM algorithm, or NULL, default is NULL.
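To make the column layout of ped concrete, here is a minimal sketch in base R that builds a hypothetical nuclear family and reproduces two of the pedigree checks mentioned in the description. The column names, the use of 0 for a parent absent from the pedigree, the sex coding and the measurement values are all assumptions made for illustration; they are not taken from the package.

```r
# Hypothetical nuclear family: two founders (ids 1, 2) and two children (ids 3, 4).
ped.toy <- data.frame(
  fam    = c(1, 1, 1, 1),        # column 1: family ID
  id     = c(1, 2, 3, 4),        # column 2: subject ID
  dad    = c(0, 0, 1, 1),        # column 3: dad ID (0 assumed to mean "not in pedigree")
  mom    = c(0, 0, 2, 2),        # column 4: mom ID (same assumption)
  sex    = c(1, 2, 1, 2),        # column 5: sex (assumed coding: 1 male, 2 female)
  status = c(2, 1, 2, 0),        # column 6: 2 symptomatic, 1 without symptoms, 0 missing
  y1     = c(3L, NA, 2L, NA),    # columns 7 onward: integer-valued measurements
  y2     = c(1L, NA, 4L, NA)
)

# Check 1: nobody should have exactly one parent in the pedigree.
one.parent <- xor(ped.toy$dad == 0, ped.toy$mom == 0)

# Check 2: the pedigree should contain at least one child.
has.children <- any(ped.toy$dad != 0 & ped.toy$mom != 0)

# Multinomial measurements must be integer (or factor) columns.
meas.ok <- all(sapply(ped.toy[, 7:ncol(ped.toy)], is.integer))
```

With this layout, any(one.parent) being TRUE or has.children being FALSE would correspond to the situations in which the function is described as exiting.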
The function returns a list of 4 elements:
param
the Maximum Likelihood Estimator (MLE) of the measurement distribution parameters if fit=TRUE, or the input param if fit=FALSE,
probs
the MLE of the probability parameters if fit=TRUE, or the input probs if fit=FALSE,
weight
an array of dimension n (the number of individuals) times 2 times K+1 (K being the number of latent classes in the selected model and the K+1th class being the unaffected class) giving the individual posterior probabilities. weight[i,s,c] is the posterior probability that individual i belongs to class c when his symptom status is s, where s takes two values: 1 for symptomatic and 2 for without symptoms. In particular, all weight[i,2,] are 0 for symptomatic individuals and all weight[i,1,] are 0 for individuals without symptoms. For individuals with missing (unknown) symptom status, both weight[i,1,] and weight[i,2,] may be greater than 0,
ll
the maximum log-likelihood value (log-ML) if fit=TRUE, or the log-likelihood computed with the input values of param and probs if fit=FALSE.
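The layout of the weight array can be sketched in base R as follows. All numerical values are made up for illustration, and the assumption that the entries of each individual sum to one over status and class is mine rather than something the text states.

```r
n <- 3  # number of individuals (hypothetical)
K <- 2  # number of latent classes (hypothetical)

# n x 2 x (K+1) array; the last class index (K+1) is the unaffected class.
weight <- array(0, dim = c(n, 2, K + 1))

# Individual 1 is symptomatic: all mass in weight[1, 1, ], weight[1, 2, ] stays 0.
weight[1, 1, ] <- c(0.7, 0.3, 0)

# Individual 2 has no symptoms: all mass in weight[2, 2, ], weight[2, 1, ] stays 0.
weight[2, 2, ] <- c(0.1, 0.1, 0.8)

# Individual 3 has unknown status: both slices may be positive.
weight[3, 1, ] <- c(0.2, 0.1, 0)
weight[3, 2, ] <- c(0.05, 0.05, 0.6)

# Total posterior mass per individual (assumed to be 1 in this sketch).
totals <- apply(weight, 1, sum)

# Class membership probabilities per individual, summed over symptom status.
class.post <- apply(weight, c(1, 3), sum)
```

Summing over the second dimension, as in class.post, gives one row of class probabilities per individual, which is often the quantity of interest when assigning individuals to classes.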
The symptom status vector (column 6 of ped) takes value 1 for subjects that have been examined and show no symptoms (i.e. completely unaffected subjects). When applying the LCA to measurements available on all subjects, the status vector must take the value 2 for every individual with measurements.
probs is a list of initial probability parameters:
For models with familial dependence:
p
a probability vector, where p[c] is the probability that a symptomatic founder is in class c, for c>=1,
p0
the probability that a founder without symptoms is in class 0,
p.trans
an array of dimension K times K+1 times K+1, where K is the number of latent classes of the model, such that p.trans[c_i,c_1,c_2] is the conditional probability that a symptomatic individual i is in class c_i given that his parents are in classes c_1 and c_2,
p0connect
a vector of length K, where p0connect[c] is the probability that a connector without symptoms is in class 0, given that one of his parents is in class c>=1 and the other in class 0,
p.found
the probability that a founder is symptomatic,
p.child
the probability that a child is symptomatic,
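The shape of p.trans can be illustrated with a uniform array. This is only a sketch in base R: it does not reproduce what init.p.trans computes, and the convention that the parental index K+1 stands for class 0 is an assumption made by analogy with the weight array.

```r
K <- 3  # number of latent classes (hypothetical)

# Uniform start: whatever the parents' classes (c_1, c_2), the child is
# equally likely to be in any of the K classes.
p.trans <- array(1 / K, dim = c(K, K + 1, K + 1))

# For every pair of parental classes, the child-class probabilities
# form a conditional distribution and should sum to one.
cond.sums <- apply(p.trans, c(2, 3), sum)
stopifnot(all(abs(cond.sums - 1) < 1e-12))
```

The check on cond.sums reflects the requirement that initial probabilities be positive and sum to one, applied here to each conditional distribution over the child's class.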
For models without familial dependence, all individuals are independent:
p
a probability vector, where p[c] is the probability that a symptomatic individual is in class c, for c>=1,
p0
the probability that an individual without symptoms is in class 0,
p.aff
the probability that an individual is symptomatic,
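For the model without familial dependence, a probs list and the checks on initial values (positivity and summation to one) might look as follows. This is a minimal sketch in base R; the helper check.probs and the numerical values are hypothetical, and the actual checks performed by the function may differ.

```r
K <- 3  # number of latent classes (hypothetical)

# Initial probability parameters for a model without familial dependence.
probs.indep <- list(
  p     = rep(1 / K, K),  # P(class c | symptomatic), c = 1..K
  p0    = 0.9,            # P(class 0 | without symptoms)
  p.aff = 0.3             # P(symptomatic)
)

# Hypothetical validity check: p must be positive and sum to one,
# the scalar components must be probabilities in (0, 1].
check.probs <- function(probs, eps = 1e-8) {
  scalars <- unlist(probs[names(probs) != "p"])
  all(probs$p > 0) &&
    abs(sum(probs$p) - 1) < eps &&
    all(scalars > 0 & scalars <= 1)
}

ok <- check.probs(probs.indep)

# A vector p that does not sum to one fails the check.
bad <- probs.indep
bad$p <- c(0.5, 0.5, 0.5)
ok.bad <- check.probs(bad)
```

Running such a check before fitting makes it easier to see why a call is rejected when the initial values are invalid.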
param is a list of measurement distribution parameters: the coefficients alpha (cumulative logistic coefficients, see alpha.compute) in the case of discrete or ordinal data, and the means mu and variance-covariance matrices sigma in the case of continuous data.
optim.param is a variable indicating how the measurement distribution parameter estimation of the M step is performed. Two possibilities, optim.noconst.ordi and optim.const.ordi, are currently available in the case of discrete or ordinal measurements. Four possibilities are currently available in the case of continuous measurements: optim.indep.norm (measurements are independent, diagonal variance-covariance matrix), optim.diff.norm (general variance-covariance matrix but equal for all classes), optim.equal.norm (variance-covariance matrices are different for each class but equal variance and equal covariance for a class) and optim.gene.norm (general variance-covariance matrices for all classes). One of the allowed values of optim.param must be entered without quotes.
optim.probs.indic is a vector of logical values of length 4 for models with familial dependence and of length 2 for models without familial dependence.
For models with familial dependence:
optim.probs.indic[1] indicates whether p0 will be estimated or not,
optim.probs.indic[2] indicates whether p0connect will be estimated or not,
optim.probs.indic[3] indicates whether p.found will be estimated or not,
optim.probs.indic[4] indicates whether p.connect will be estimated or not.
For models without familial dependence:
optim.probs.indic[1] indicates whether p0 will be estimated or not,
optim.probs.indic[2] indicates whether p.aff will be estimated or not.
All defaults are TRUE. If the dataset contains only nuclear families, there is no information to estimate p0connect and p.connect, and these parameters will not be estimated, irrespective of the indicator value.
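The two layouts of the indicator vector can be summarised with a small helper. This is a sketch in base R only: make.indic and its nuclear.only argument are hypothetical, and the element names are added for readability. Whether the function accepts a named vector is not something the text confirms, so unname() is applied before use.

```r
# Hypothetical helper returning an indicator vector of the documented length:
# 4 elements with familial dependence, 2 without.
make.indic <- function(famdep = TRUE, nuclear.only = FALSE) {
  if (!famdep) {
    return(c(p0 = TRUE, p.aff = TRUE))
  }
  indic <- c(p0 = TRUE, p0connect = TRUE, p.found = TRUE, p.connect = TRUE)
  # With nuclear families only, these two parameters carry no information,
  # so switching them off makes that explicit.
  if (nuclear.only) {
    indic[c("p0connect", "p.connect")] <- FALSE
  }
  indic
}

indic.famdep  <- unname(make.indic(famdep = TRUE))
indic.nuclear <- unname(make.indic(famdep = TRUE, nuclear.only = TRUE))
indic.indep   <- unname(make.indic(famdep = FALSE))
```

Setting the two connector-related indicators to FALSE for nuclear families does not change the documented behaviour; it only records in the call what the function is described as doing anyway.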
TAYEB, A., LABBE, A., BUREAU, A. and MERETTE, C. (2011) Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 26(3): 539-560. DOI: 10.1007/s00180-010-0224-2.
LABBE, A., BUREAU, A. and MERETTE, C. (2009) Integration of Genetic Familial Dependence Structure in Latent Class Models. The International Journal of Biostatistics, 5(1): Article 6.
# NOT RUN {
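# load the package that provides lca.model and the example data sets
library(LCAextend)
# data: the first column of ped.ordi holds the family IDs
data(ped.ordi)
fam <- ped.ordi[, 1]
# initial values of probs and param
data(param.ordi)
data(probs)
# fit the model on the first two families of ped.ordi only
lca.model(ped.ordi[fam %in% 1:2, ], probs, param.ordi, optim.noconst.ordi,
          fit = TRUE, optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), tol = 0.001,
          x = NULL, var.list = NULL, famdep = TRUE, modify.init = NULL)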
# }