multinomial.test(observed, prob, useChisq = FALSE, MonteCarlo = FALSE, ntrial = 100000, atOnce = 1000000)
TRUE, Pearson's chisquare is used as a distance measure between observed and expected frequencies.
TRUE, the Monte Carlo approach is used.
observed, indicates how many items have been observed in each category. The model, described by the vector
prob, assigns to each category the hypothetical probability that an item falls into it. Now, if the observation is unlikely to have occurred under the assumption of the model, it is advisible to regard the model as not valid. The p-value estimates how likely the observation is, given the model. In particular, low p-values suggest that the model is not valid. The default approach used by
multinomial.testobtains the p-values by calculating the exact probabilities of all possible outcomes given
k, using the multinomial probability distribution function
dmultinomprovided by R. Then, by default, the p-value is obtained by summing the probabilities of all outcomes which are less likely than the observed outcome (or equally likely as the observed outcome), i.e. by summing all $p(i) <= p(observed)$="" (distance="" measure="" based="" on="" probabilities).="" alternatively,="" the="" p-value="" can="" be="" obtained="" by="" summing="" probabilities="" of="" all="" outcomes="" connected="" with="" a="" chisquare="" no="" smaller="" than="" actual="" observation="" chisquare).="" latter="" is="" triggered="" setting=""
useChisq = TRUE. Having a sample of size n in an experiment with k categories, the number of distinct possible outcomes is the binomial coefficient
choose(n+k-1,k-1). This number grows rapidly with increasing parameters n and k. If the parameters grow too big, numerical calculation might fail because of time or memory limitations. In this case, usage of the Monte Carlo approach provided by
multinomial.testis suggested. The Monte Carlo approach, activated by setting
MonteCarlo = TRUE, simulates withdrawal of ntrial samples of size n from the hypothetical distribution specified by the vector
prob. The default value for ntrial is
100000but might be incremented for big n and k. The advantage of the Monte Carlo approach is that memory requirements and running time are essentially determined by ntrial but not by n or k. By default, the p-value is then obtained by summing the relative frequencies of occurrence of unusual outcomes, i.e. of outcomes occurring less frequently than the observed one (or equally frequent as the observed one). Alternatively, as above, Pearson's chisquare can be used as a distance measure by setting
useChisq = TRUE. The parameter atOnce is of more technical nature, with a default value of $1000000$. This value should be decremented for computers with low memory to avoid overflow, and can be incremented for large-CPU computers to speed up calculations. The parameter is only effective for Monte Carlo calculations. =>
Read, T. R. C. and Cressie, N. A. C. (1988). Goodness-of-fit statistics for discrete multivariate data. Springer, New York.
## Load the EMT package: library(EMT) ## Input data for a three-dimensional case: observed <- c(5,2,1) # observed data: 5 items in category one, 2 items in category two, 1 item in category three prob <- c(0.25, 0.5, 0.25) # model: hypothetical probability that an item falls into category one, two, or three ## Calculate p-value using default options: out <- multinomial.test(observed, prob) # p.value = 0.0767 ## Plot the probabilities for each event: plotMultinom(out) ## Calculate p-value for the same input using Pearson's chisquare as a distance measure: out <- multinomial.test(observed, prob, useChisq = TRUE) # p.value = 0.0596 ; not the same! ## Test the hypothesis that all sides of a dice pop up with the same probability (a 6-dimesional problem): pdice = 1/6 prob <- c(pdice, pdice, pdice, pdice, pdice, pdice) # the model, determined by the hypothetical probabilities observed <- c(4, 5, 2, 7, 0, 1) # the observation consisting of 19 throws ( = sample size) out <- multinomial.test(observed, prob) # p.value = 0.0357 ; better get another dice, this one seems to be biased plotMultinom(out, showmax = 10000) # the same problem using a Monte Carlo approach: # we have about 40.000 outcomes and choose at least 400000 trials (probably to be increased): out <- multinomial.test(observed, prob, MonteCarlo = TRUE, ntrial = 400000) # p.value = 0.0343 ; takes a few minutes on a laptop with 2 GB memory, 1.5 GHz speed plotMultinom(out, showmax = 5000)