multinomial: Analyzing multinomial data

Description

These functions facilite the conversion and analysis of multinomial data as as series of nested binomial data. The main function is multi, to be used in the family argument of the fitting functions. It calls binomialize, which can be called directly to check how the data are converted to nested binomial data. The fitted.HLfitlist method of the fitted generic function returns a matrix of fitted multinomial probabilities. The logLik.HLfitlist method of the logLik generic function returns a log-likelihood for the joint fits.

Usage

multi(binResponse=c("npos","nneg"),binfamily=binomial(),input="types",...)
binomialize(data,responses,sortedTypes=NULL,binResponse=c("npos","nneg"), depth=Inf,input="types")
"fitted"(object,...)
"logLik"(object,which,...)

Arguments

data

The data frame to be analyzed.

object

A list of binomial fits returned by a multinomial analysis

responses

column names of the data, such that [,] contain the multinomial response data, as levels of factor variables.

sortedTypes

Names of multinomial types, i.e. levels of the multinomial response factors. Their order determines which types are taken first to define the nested binomial samples. By default, the most common types are considered first.

binResponse

The names to be given to the number of “success” and “failures” in the binomial response.

depth

The maximum number of nested binomial responses to be generated from the multinomial data.

binfamily

The family applied to each binomial response.

input

If input="types", then the responses columns must contain factor levels of the binomial response. If input="counts", then the responses columns must contain counts of different factor levels, and the column names are the types.

which

Which element of the APHLs list to return. The default depends on the fitting method.In particular, if it was REML or one of its variants, the function returns the log restricted likelihood (exact or approximated).

...

Other arguments passed from or to other functions.

Value

binomialize returns a list of data frames appropriate for analysis as binomial response. Each data frame contains the original one plus Two columns named according to binResponse. multi returns a list.

Details

A multinomial response, say counts 17, 13, 25, 8, 3, 1 for types type1 to type6 can be represented as a series of nested binomials e.g. type1 against others (17 vs 50) then among these 50 others, type2 versus others (13 vs 37), etc. The binomialize function generates such a representation. By default the representation considers types in decreasing order of the number of positives, i.e. first type3 against others (25 vs 42), then type1 against others within these 42, etc. It stops if it has reached depth nested binomial responses. This can be modified by the sortedTypes argument, e.g. sortedTypes=c("type6","type4","type2"). binomialize returns a list of data frames which can be directly provided as a data argument for the fitting functions, with binomial response.

Alternatively, one can provide the multinomial response data frame, which will be internally converted to nested binomial data if the family argument is a call to multinomial (see examples).

For mixed models, the multinomial data can be fitted to a model with the same correlation parameters, and either the same or different variances of random effects, for all binomial responses. Which analysis is performed depends on the init.corrHLfit argument (see corrHLfit and the Examples).

Examples

Run this code

## An example considering pseudo-data at one diploid locus for 50 individuals 
set.seed(123)
genecopy1 <- sample(4,size=50,prob=c(1/2,1/4,1/8,1/8),replace=TRUE)
genecopy2 <- sample(4,size=50,prob=c(1/2,1/4,1/8,1/8),replace=TRUE)
alleles <- c("122","124","126","128")
genotypes <- data.frame(type1=alleles[genecopy1],type2=alleles[genecopy2])
## Columns "type1","type2" each contains an allele type => input is "types" (the default)
datalist <- binomialize(genotypes,responses=c("type1","type2"))

## two equivalent fits:
f1 <- HLfit(cbind(npos,nneg)~1,data=datalist, family=binomial())
f2 <- HLfit(cbind(npos,nneg)~1,data=genotypes, family=multi(responses=c("type1","type2")))
fitted(f2)

## distinct fits for spatial data
## Not run: 
# genoInSpace <- data.frame(type1=alleles[genecopy1],type2=alleles[genecopy2],x=runif(50),y=runif(50))
# ## Fitting distinct variances of random effects for each binomial response
# corrHLfit(cbind(npos,nneg)~1+Matern(1|x+y),data=genoInSpace, 
#           family=multi(responses=c("type1","type2")),
#           ranFix=list(rho=1,nu=0.5))
# ## Fitting the same variance for all binomial responses           
# corrHLfit(cbind(npos,nneg)~1+Matern(1|x+y),data=genoInSpace, 
#           family=multi(responses=c("type1","type2")),
#           ranFix=list(rho=1,nu=0.5),init.corrHLfit=list(lambda=1))
# ## End(Not run)

Run the code above in your browser using DataLab