mph.fit: Fitting MPH and HLP Models

Description

Computes maximum likelihood estimates and fit statistics for multinomial-Poisson homogeneous (MPH) and homogeneous linear predictor (HLP) models for contingency tables.

More detailed DOCUMENTATION and EXAMPLES of mph.fit are online.

Usage

mph.fit(y, h.fct = constraint, constraint = NULL, h.mean = FALSE,
        L.fct = link, link = NULL, L.mean = FALSE, X = NULL,
        strata = rep(1, length(y)), fixed.strata = "all",
        check.homog.tol = 1e-9, check.HLP.tol = 1e-9, maxiter = 100,
        step = 1, change.step.after = 0.25 * maxiter, y.eps = 0,
        iter.orig = 5, m.initial = y, norm.diff.conv = 1e-6,
        norm.score.conv = 1e-6, max.score.diff.iter = 10,
        derht.fct = NULL, derLt.fct = NULL, pdlambda = 2/3,
        verbose = FALSE)

Arguments

Vector (not matrix) of table counts.

h.fct

Function object that defines the constraint function $h(\cdot)$. It must return a column vector. h.fct can also be set to 0, in which case $h(\cdot)$ is viewed as the $0$ function, so no constraints are imposed.

By default, $h(\cdot)$ is viewed as a function of the table probabilities $p$. If h.mean is set to h.mean = TRUE, then $h(\cdot)$ is viewed as a function of the expected counts $m$.

Default: h.fct = NULL. If both h.fct = NULL and L.fct = NULL, then h.fct is set to 0 and no constraints are imposed. If h.fct = NULL and L.fct is not NULL, then h.fct will be computed as t(U) %*% L.fct.

constraint

Alias for the argument h.fct. Argument constraint is secondary to the primary argument h.fct in the following senses: If constraint and h.fct are not equal, h.fct is used.

h.mean

Logical argument. If h.mean = FALSE (the default), h.fct is treated as a function of $p$. If h.mean = TRUE, then h.fct is treated as a function of $m$.

L.fct

Function object that defines the link $L(\cdot)$ in the HLP model; it must return a column vector. Or ... L.fct = keyword, where candidate keywords include "logp" and "logm".

Entering L.fct = "logp" tells the program to create the function object as L.fct <- function(p) {log(p)}. L.fct = "logm" tells the program to (i) create the function object as L.fct <- function(m) {log(m)} and (ii) set L.mean = TRUE.

By default, L.fct is treated as a function of the table probabilities $p$ (even if the argument in the L.fct function object is m ). If L.mean is set to L.mean = TRUE, then L.fct is treated as a function of the expected counts $m$. Default: L.fct = NULL means no constraints of the form $L(p) = X\beta$ are imposed.

link

Alias for the argument L.fct. Argument link is secondary to the primary argument L.fct in the following senses: If link and L.fct are not equal, L.fct is used.

L.mean

Logical argument. If L.mean = FALSE (the default), L.fct is treated as a function of $p$. If L.mean = TRUE, L.fct is treated as a function of $m$.

HLP model design matrix, assumed to be full rank. Default: X = NULL; i.e., it is left unspecified and unused.

strata

Vector of the same length as y that gives the stratum membership identifier. Default: strata = rep(1, length(y)); i.e. the default is the single stratum (non-stratified) setting. Examples: strata = A, or strata = c(1,1,1,2,2,2,3,3,3), or strata = paste(sep = "", "A=", A, ", B=", B).

fixed.strata

The object that gives information on which stratum have fixed sample sizes. It can equal one of the keywords, fixed.strata = "all" or fixed.strata = "none", or it can be a vector of stratum membership identifiers, e.g. fixed.strata = c(1,3) or fixed.strata = c("pop1", "pop5"). Default: fixed.strata = "all".

check.homog.tol

Round-off tolerance for $Z$ homogeneity check. Default: check.homog.tol = 1e-9.

check.HLP.tol

Round-off tolerance for HLP link status check. Default: check.HLP.tol = 1e-9.

maxiter

Maximum number of iterations. Default: maxiter = 100.

step

Step-size value. Default: step = 1.

change.step.after

If the score value increases for more than change.step.after iterations in a row, then the initial step size is halved. Default: change.step.after = 0.25 * maxiter.

y.eps

Non-negative constant to be temporarily added to the original counts in y. Default: y.eps = 0.

iter.orig

Iteration at which the original counts will be used. Default: iter.orig = 5.

m.initial

Initial estimate of $m$. Default: m.initial = y. See Input Note 6 below.

norm.diff.conv

Convergence criteria value; see norm.diff in the Value section. Default: norm.diff.conv = 1e-6.

norm.score.conv

Convergence criteria value; see norm.score in the Value section. Default: norm.score.conv = 1e-6.

max.score.diff.iter

The variable score.diff.iter keeps track of how long norm.score is smaller than norm.score.conv, but norm.diff is greater than norm.diff.conv. If this is the case too long (i.e. score.diff.iter >= max.score.diff.iter), then stop the iterations because the solution likely includes at least one ML fitted value of $0$. Default: max.score.diff.iter = 10.

derht.fct

Function object that computes analytic derivative of the transpose of the constraint function $h(\cdot)$ with respect to $m$. If $h(\cdot)$ maps from $R^p$ to $R^q$ (i.e. there are $q$ constraints), then derht.fct returns a $p$-by-$q$ matrix of partial derivatives. Default: derht.fct = NULL. This means that the derivative is calculated numerically.

derLt.fct

Function object that computes analytic derivative of the transpose of the link function $L(\cdot)$ with respect to $m$. If $L(\cdot)$ maps from $R^p$ to $R^q$ (i.e. there are $q$ link components), then derLt.fct returns a $p$-by-$q$ matrix of partial derivatives. Default: derLt.fct = NULL, i.e. by default this derivative is calculated numerically.

pdlambda

The index parameter $\lambda$ in the power-divergence statistic.

verbose

Logical argument. If verbose = FALSE, do not print out iteration information. If verbose = TRUE, then iteration information is printed out. Default: verbose = FALSE.

Value

mph.fit returns a list, which includes the following objects:

Vector of counts used in the algorithm for ML estimation. Usually, this vector is identical to the observed table counts.

Vector of ML fitted values.

covm

Approximate covariance matrix of fitted values.

Vector of cell probability ML estimates.

covp

Approximate covariance matrix of cell probability estimators.

lambda

Vector of Lagrange multiplier ML estimates.

covlambda

Approximate covariance matrix of multiplier estimators.

resid

Vector of raw residuals (i.e. observed minus fitted counts).

presid

Vector of Pearson residuals.

adjresid

Vector of adjusted residuals.

covresid

Approximate covariance matrix of raw residuals.

Gsq

Likelihood ratio statistic for testing $H_{0}: h(m) = 0$ vs. $H_{1}:$ not $H_{0}$.

Xsq

Pearson's score statistic (same as Lagrange multiplier statistic) for testing $H_{0}: h(m) = 0$ vs. $H_{1}:$ not $H_{0}$.

Wsq

Generalized Wald statistic for testing $H_{0}: h(m) = 0$ vs. $H_{1}:$ not $H_{0}$.

PD.stat

Power-divergence statistic (with index parameter pdlambda) for testing $H_{0}: h(m) = 0$ vs. $H_{1}:$ not $H_{0}$.

Degrees of freedom associated with Gsq, Xsq, and PD.stat. $df = \dim(h)$.

beta

Vector of HLP model parameter ML estimates.

covbeta

Approximate covariance matrix of HLP model parameter estimators.

Vector of HLP model link ML estimates.

Lobs

Vector of HLP model observed link values, $L(y)$.

covL

Approximate covariance matrix of HLP model link estimators.

Lresid

Vector of adjusted link residuals of the form $$(L(\texttt{obs}) - L(\texttt{fitted})) / ase(L(\texttt{obs}) - L(\texttt{fitted})).$$

iter

Number of update iterations performed.

norm.diff

Distance between last and second last theta iterates (theta is the vector of log fitted values and Lagrange multipliers).

norm.score

Distance between the score vector and zero.

h.fct

Constraint function used in algorithm.

h.input.fct

Constraint function as originally input.

h.mean

Logical variable. If h.mean = FALSE, h.fct is treated as a function of $p$. If h.mean = TRUE, h.fct is treated as a function of $m$.

derht.fct

Analytic function used in algorithm that computes derivative of transpose of constraint function $h$.

L.fct

Link function used in algorithm.

L.input.fct

Link function as originally input.

L.mean

Logical variable. If L.mean = FALSE, L.fct is treated as a function of $p$. If L.mean = TRUE, L.fct is treated as a function of $m$.

derLt.fct

Analytic function used in algorithm that computes derivative of transpose of link function $L$.

HLP model design matrix used in algorithm.

Orthogonal complement of design matrix $X$.

triple

A list object containing the sampling plan triple $(Z, Z_{F}, n)$, where $Z$ is the population (or strata) matrix, $Z_{F}$ is the sampling constraint matrix, and $n$ is the collection of fixed sample sizes.

strata

strata variable used as input.

fixed.strata

The strata corresponding to fixed sample sizes.

warn.message

Message stating whether or not the original counts were used.

warn.message.score

Message stating whether or not the norm score convergence criterion was met.

warn.message.diff

Message stating whether or not the norm diff convergence criterion was met.

Details

In the following, let $y$ be the vector of contingency table counts, $p$ be the unknown vector of contingency table probabilities, $s$ be a vector of strata identifiers, and $F$ be the set of strata with a priori fixed sample sizes.

Although mph.fit can fit more general models (see below), two important special cases include:

MPH (Special-Case): $y$ is a realization of random vector $Y$, where $Y \sim MP(\gamma, p | \texttt{strata} = s, \texttt{fixed} = F)$, $h(p) = 0$.
HLP (Special-Case): $y$ is a realization of random vector $Y$, where $Y \sim MP(\gamma, p | \texttt{strata} = s, \texttt{fixed} = F)$, $L(p) = X \beta$.

Here, $h(\cdot)$ is a smooth constraint function and $L(\cdot)$ is a smooth link function. It is assumed that the constraints in $h(p) = 0$ are non-redundant so that the Jacobian, $\partial h'(p) / \partial p$, is full column rank.

The link $L(\cdot)$ is allowed to be many-to-one and row-rank deficient, so this HLP model is quite general. It is only required that the implied constraints, $U'L(p) = 0$, where $null(U') = span(X)$, are non-redundant.

Here, MP stands for the multinomial-Poisson distribution. The parameters are $\gamma$, the vector of expected sample sizes, and $p$, the vector of table probabilities.

The notation $$Y \sim MP(\gamma, p | \texttt{strata} = s, \texttt{fixed} = F)$$ means that the random vector $Y$ is composed of independent blocks of multinomial and/or Poisson random variables. The strata vector $s$ determines the blocks and $F$ determines which blocks are multinomial and which blocks comprise independent Poisson random variables. More specifically, suppose there are $K$ strata, so $s$ contains $K$ distinct strata identifiers. The components in $Y$ corresponding to $s = \texttt{identifier[k]}$ make up a block. If identifier[k] is included in $F$, then this block has a multinomial distribution and $\gamma_{k}$ is the a priori known, i.e. fixed, sample size. If identifier[k] is not in $F$, then this block comprises independent Poisson random variables and $\gamma_{k}$ is an unknown expected sample size.

Note: Given the observed counts $y$, the pair $\texttt{(strata, fixed)} = (s, F)$ contains the same information as the sampling plan triple $(Z, Z_{F}, n_{F})$ described in Lang (2004, 2005). Specifically, $Z = Z(s)$, the strata/population matrix, is determined by $s$. $Z_{F} = Z_{F}(s, F)$, the sampling constraint matrix, is determined by $s$ and $F$. $n_{F} = Z_{F}'y$ is the vector of a priori fixed sample sizes.

Special case MP distributions include...

Full Multinomial: $MP(\gamma, p | \texttt{strata = 1, fixed = "all"})$. A simple random sample of fixed size $\gamma$ is taken from a single strata (population).
Product Multinomial: $MP(\gamma, p | \texttt{strata = s, fixed = "all"})$. A stratified random sample of fixed sample sizes $\gamma = (\gamma_{1}, \ldots, \gamma_{K})'$ is taken from the $K$ strata determined by $s$.
Full Poisson: $MP(\gamma, p | \texttt{strata = 1, fixed = "none"})$. A simple random sample is taken from a single strata (population). The sample size is random and follows a Poisson distribution with unknown mean $\gamma$.
Product Poisson: $MP(\gamma, p | \texttt{strata = s, fixed = "none"})$. A stratified random sample is taken from $K$ strata. The sample sizes are all random and distributed as Poissons with unknown means in $\gamma = (\gamma_{1}, \ldots, \gamma_{K})'$.

Specifying the MP distribution in mph.fit...

The user need only enter (strata, fixed.strata), the input variables corresponding to $(s, F)$. Keywords, fixed.strata = "all" ["none"] means that all [none] of the strata have a priori fixed sample sizes.

To fit MPH (Special Case), the user must enter the counts y, the constraint function h.fct (alias constraint), and the sampling plan variables, strata and fixed.strata. Note: The user can omit the sampling plan variables if the default, multinomial sampling (strata = 1, fixed = "all"), can be assumed.

To fit HLP (Special Case), the user must enter the counts y, the link function L.fct (alias link), the model matrix X, and the sampling plan variables, strata and fixed.strata. Note: The user can omit the sampling plan variables if the default, multinomial sampling, can be assumed.

IMPORTANT: When specifying the model and creating the input objects for mph.fit, keep in mind that the interpretation of the table probabilities $p$ depends on the sampling plan!

Specifically, if the $i^{th}$ count $y_{i}$ is in block $k$ (i.e. corresponds with strata identifier[k]), then the $i^{th}$ table probability $p_{i}$ is the conditional probability defined as $p_{i}$ = probability of a Type $i$ outcome GIVEN that the outcome is one of the types in stratum $k$.

For example, in an $I$-by-$J$ table with row variable $A$ and column variable $B$, if row-stratified sampling is used, the table probabilities have the interpretation, $p_{ij} =$ prob of a Type $(i, j)$ outcome GIVEN that the outcome is one of the types in stratum $i$ (i.e. one of $(i, 1), \ldots, (i, J)$) $ = P(A = i, B = j | A = i)$ $ = P(B = j | A = i)$. For column-stratified sampling, $p_{ij} = P(A = i | B = j)$. And for non-stratified sampling, $p_{ij} = P(A = i, B = j)$.

Log-Linear Models: Log-linear models specified as $\log(p) = X\beta$, are HLP models.

As with any HLP model, $\log(p) = X\beta$ can be restated as a collection of constraints; specifically, $\log(p) = X\beta$ is equivalent to $h(p) = U'\log(p) = 0$, where $null(U') = span(X)$. Noting that $Z'p = 1$, we see that to avoid redundant constraints, $span(X)$ should contain $span(Z)$. Loosely, fixed-by-sampling-design parameters should be included.

Log-linear models of the form $\log(p) = X\beta$ are simple to fit using mph.fit. For example, > mph.fit(y, link = "logp", X = model.matrix(~ A + B)), or, equivalently, > mph.fit(y, link = function(p) {log(p)}, X = model.matrix(~ A + B)).

MORE GENERAL MPH and HLP MODELS...

Instead of $(\gamma, p)$, the MP distribution can alternatively be parameterized in terms of the vector of expected table counts, $m = E(Y)$. Formally, $(\gamma, p)$ and $m$ are in one-to-one correspondence and satisfy: $$m = Diag(Z\gamma)p,$$ and $$\gamma = Z'm, p = Diag^{-1}(ZZ'm)m.$$ Here, $Z = Z(s)$ is the $c$-by-$K$ strata/population matrix determined by strata vector $s$. Specifically, $Z_{ik} = I\{s_{i} = \texttt{identifier[k]}\}$.

The MPH (Special-Case) Model given above is a special case because it constrains the expected counts $m$ only through the table probabilities $p$. Similarly, the HLP (Special-Case) Model given above is a special case because it uses a link function that depends on $m$ only through the table probabilities $p$.

More generally, mph.fit computes maximum likelihood estimates and fit statistics for MPH and HLP models of the form...

MPH: $y$ is a realization of random vector $Y$, where $Y \sim MP(\gamma, p | \texttt{strata} = s, \texttt{fixed} = F), h(m) = 0$.
HLP: $y$ is a realization of random vector $Y$, where $Y \sim MP(\gamma, p | \texttt{strata} = s, \texttt{fixed} = F), L(m) = X\beta$.

Here, $h(\cdot)$ is a smooth constraint function that must also be $Z$ (i.e. strata $s$) homogeneous. $L(\cdot)$ is a smooth link function that must also satisfy the HLP conditions with respect to $Z$ (i.e. strata $s$) and $X$. That is,

(1) $L(\cdot)$ has HLP link status with respect to $Z$, and
(2) The implied constraint function $h(m) = U'L(m)$ is $Z$ homogeneous. Here, $null(U') = span(X)$.

Here, (1) $L(\cdot)$ has HLP link status with respect to $Z$ if, for $m = Diag(Z\gamma) p$,

(1)(a) $L(m) = a(\gamma) + L(p)$, where $a(\gamma_{1}/\gamma_{2}) - a(1) = a(\gamma_{1}) - a(\gamma_{2})$, i.e. $a(\gamma)$ has the form $C \log \gamma + \texttt{constant}$; or
(1)(b) $L(m) = G(\gamma) L(p)$, where $G(\gamma)$ is a diagonal matrix with diagonal elements that are powers of the $\gamma$ elements, i.e. $L(\cdot)$ is $Z$ homogeneous (see Lang (2004)); or
(1)(c) The components of $L(\cdot)$ are a mixture of types (a) and (b): $L_{j}(m) = a_{j}(\gamma) + L_{j}(p)$ or $L_{j}(m) = G_{j}(\gamma) L_{j}(p)$, $j = 1, \ldots, l$.

Lang (2005) described HLP models that satisfied (1)(a) and (2), but the definition of HLP models can be broadened to include those models satisfying (1) and (2). That is, HLP models can be defined so they also include models that satisfy (1)(b) and (2) or (1)(c) and (2). mph.fit uses this broader definition of HLP Model.

Note: The input variable h.mean must be set to TRUE to fit this more general MPH model. Similarly, the input variable L.mean must be set to TRUE to fit this more general HLP model. (An exception: If the link function is specified using the keyword "logm" then L.mean is automatically set to TRUE.)

Note: mph.fit carries out "necessary-condition" checks of $Z$ homogeneity of $h(\cdot)$ and HLP link status of $L(\cdot)$ for these general models.

Log-Linear Models: Log-linear models of the form $\log(m) = X\beta$ are HLP models provided the $span(X)$ contains the $span(Z)$. Loosely, provided fixed-by-design parameters are included, the log-linear model is a special case HLP model.

Log-linear models of the form $\log(m) = X\beta$ are simple to fit using mph.fit. For example, > mph.fit(y, link = "logm", X = model.matrix(~ A + B)), or, equivalently, > mph.fit(y, link = function(m) {log(m)}, L.mean = TRUE, X = model.matrix(~ A + B)).

Note: Most reasonable generalized log-linear models, which have the form $L(m) = C \log Mm = X\beta$, are also HLP models. See Lang (2005).

References

Lang, J. B. (2004) Multinomial-Poisson homogeneous models for contingency tables, Annals of Statistics, 32, 340--383.

Lang, J. B. (2005) Homogeneous linear predictor models for contingency tables, Journal of the American Statistical Association, 100, 121--134.

Examples

Run this code

# NOT RUN {
# Listed below is a collection of Basic Examples:
# https://homepage.divms.uiowa.edu/~jblang/mph.fitting/mph.basic.numerical.examples.htm

# Another collection of Less Basic Examples is online:
# https://homepage.divms.uiowa.edu/~jblang/mph.fitting/mph.numerical.examples.htm


# EXAMPLE 1. Test whether a binomial probability equals 0.5.
#
# y = (15, 22) <- Y ~ MP(gamma, p | strata = 1, fixed = "all");
# i.e. Y ~ multinomial.
#
# In other symbols,
#
# y = (15, 22) <- Y = (Y[1], Y[2]) ~ multinomial(37, p = (p[1], p[2])).
#
# GOAL: Test H0: p[1] = 0.5 vs. H1: not H0.

a1 <- mph.fit(y = c(15, 22), constraint = function(p) {p[1] - 0.5})

# Alternative specifications...
a2 <- mph.fit(y = c(15, 22), constraint = function(p) {p[1] - p[2]})
a3 <- mph.fit(y = c(15, 22), constraint = function(p) {log(p[1] / p[2])})
a4 <- mph.fit(y = c(15, 22), constraint = function(m) {m[1] - m[2]},
              h.mean = TRUE)
a5 <- mph.fit(y = c(15, 22), link = function(p) {p}, X = matrix(1, 2, 1))
a6 <- mph.fit(y = c(15, 22), link = "logm", X = matrix(1, 2, 1))

# Alternatively, assume that
#
# y = (15, 22) <- Y ~ MP(gamma, p | strata = 1, fixed = "none");
# i.e. Y ~ indep Poisson.
#
# In other symbols,
#
# y = (15, 22) <- Y = (Y[1], Y[2]), where
# Y[i] indep ~ Poisson(gamma * p[i]), i = 1, 2.
#
# GOAL: Test H0: p[1] = 0.5 vs. H1: not H0.

b1 <- mph.fit(y = c(15, 22), constraint = function(p) {p[1] - 0.5},
              fixed.strata = "none")

mph.summary(a1, TRUE)
mph.summary(b1, TRUE)


# EXAMPLE 2. Test whether a multinomial probability vector is uniform.
#            Test whether a multinomial probability vector equals a
#            specific value.
#
# y <- Y = (Y[1], ..., Y[6]) ~ MP(gamma, p | strata = 1, fixed = "all");
# i.e. Y ~ multinomial.
#
# In other symbols,
#
# y <- Y ~ multinomial(15, p = (p[1], ..., p[6]))
#
# GOAL: Test H0: p[1] = p[2] = ... = p[6] vs. H1: not H0.

y <- rmultinom(1, 15, rep(1, 6))
a1 <- mph.fit(y, L.fct = function(p) {p}, X = matrix(1, 6, 1),
              y.eps = 0.1)

# Alternative specification...
a2 <- mph.fit(y, h.fct = function(p) {as.matrix(p[-6] - 1/6)},
              y.eps = 0.1)

mph.summary(a1, TRUE)
mph.summary(a2, TRUE)

# Test whether p = (1, 2, 3, 1, 2, 3) / 12 .

p0 <- c(1, 2, 3, 1, 2, 3) / 12
b <- mph.fit(y, h.fct = function(p) {as.matrix(p[-6] - p0[-6])},
             y.eps = 0.1)
mph.summary(b, TRUE)


# EXAMPLE 3. Test whether a multinomial probability vector satisfies a
#            particular constraint.
#
# Data Source: Agresti 25:2002.
#
# y = (30, 63, 63) <- Y ~ MP(gamma, p | strata = 1, fixed = "all");
# i.e. Y ~ multinomial.
#
# In other symbols,
#
# y = (30, 63, 63) <- Y ~ multinomial(156, p = (p[1], p[2], p[3]))
#
# GOAL: Test H0: p[1] + p[2] = p[1] / (p[1] + p[2]) vs. H1: not H0.

y <- c(30, 63, 63)
h.fct <- function(p) {
    (p[1] + p[2]) - p[1] / (p[1] + p[2])
}
a <- mph.fit(y, h.fct = h.fct)
mph.summary(a, TRUE)


# EXAMPLE 4. Test of Independence in a 2-by-2 Table.
#
# y = (y[1, 1], y[1, 2], y[2, 1], y[2, 2]) = (25, 18, 13, 21)
#   <- Y ~ MP(gamma, p | strata = 1, fixed = "all");
# i.e. Y ~ multinomial.
#
# In other symbols,
# y = (y[1, 1], y[1, 2], y[2, 1], y[2, 2])
#   <- Y ~ multinomial(77, p = (p[1, 1], p[1, 2], p[2, 1], p[2, 2]))
#
# GOAL: Test H0: p[1, 1] * p[2, 2] / p[1, 2] / p[2, 1] = 1
#        vs. H1: not H0.

d <- data.frame(A = c(1, 1, 2, 2), B = c(1, 2, 1, 2),
                count = c(25, 18, 13, 21))

a1 <- mph.fit(y = d$count, h.fct = function(p)
              {log(p[1] * p[4] / p[2] / p[3])})

# Alternative specifications of independence....
a2 <- mph.fit(y = d$count, h.fct = function(p)
              {p <- matrix(p, 2, 2, byrow = TRUE);
               log(p[1, 1] * p[2, 2] / p[1, 2] / p[2, 1])})
a3 <- mph.fit(y = d$count, h.fct = function(p)
              {p[1] * p[4] / p[2] / p[3] - 1})
a4 <- mph.fit(y = d$count, h.fct = function(p)
              {p[1] / (p[1] + p[2]) - p[3] / (p[3] + p[4])})
a5 <- mph.fit(y = d$count, L.fct = "logm",
              X = model.matrix(~ A + B, data = d))

# Suppose we wished to output observed and fitted values of
# log OR, OR, and P(B = 1 | A = 1) - P(B = 1 | A = 2)...

L.fct <- function(p) {
  L <- as.matrix(c(
    log(p[1] * p[4] / p[2] / p[3]),
    p[1] * p[4] / p[2] / p[3],
    p[1] / (p[1] + p[2]) - p[3] / (p[3] + p[4])
  ))
  rownames(L) <- c("log OR", "OR",
                   "P(B = 1 | A = 1) - P(B = 1 | A = 2)")
  L
}

a6 <- mph.fit(y = d$count, h.fct = function(p)
              {log(p[1] * p[4] / p[2] / p[3])},
              L.fct = L.fct, X = diag(3))

# Unrestricted Model...
b <- mph.fit(y = d$count, L.fct = L.fct, X = diag(3))

mph.summary(a6, TRUE)
mph.summary(b, TRUE)


# EXAMPLE 5. Test of Independence in a 4-by-4 Table.
#            (Using Log-Linear Model.)
#
# Data Source: Table 2.8, Agresti, 57:2002.
#
# y <- Y ~ MP(gamma, p | strata = 1, fixed = "all");
# i.e. Y ~ multinomial.
#
# In other symbols,
# y <- Y ~ multinomial(96, p = (p[1, 1], p[1, 2], p[2, 1], p[2, 2]))
#
# GOAL: Test H0: p[1, 1] * p[2, 2] / p[1, 2] / p[2, 1] = 1 vs. H1: not H0.

d <- data.frame(Income = c("<15", "<15", "<15", "<15", "15-25", "15-25",
                           "15-25", "15-25", "25-40", "25-40", "25-40",
                           "25-40", ">40", ">40", ">40", ">40"),
                JobSatisf = c("VD", "LD", "MS", "VS", "VD", "LD", "MS", "VS",
                              "VD", "LD", "MS", "VS", "VD", "LD", "MS", "VS"),
                count = c(1, 3, 10, 6, 2, 3, 10, 7, 1, 6, 14, 12, 0, 1, 9, 11))

a <- mph.fit(y = d$count, link = "logp",
             X = model.matrix(~ Income + JobSatisf, data = d))
mph.summary(a)

# Alternatively,
b <- mph.fit(y = d$count, link = "logm",
             X = model.matrix(~ Income + JobSatisf, data = d))
mph.summary(b)


# EXAMPLE 6. Test Marginal Homogeneity in a 3-by-3 Table.
#
# Data Source: Table 10.16, Agresti, 445:2002.
#
# y <- Y ~ MP(gamma, p | strata = 1, fixed = "all");
# i.e. Y ~ multinomial.
#
# Specifically,
# y <- Y ~ multinomial(160, p = (p[1, 1], ..., p[3, 3]))
#
# GOAL: Test H0: p[1, +] = p[+, 1], p[2, +] = p[+, 2], p[3, +] = p[+, 3]
#        vs. H1: not H0.

d <- data.frame(Siskel = c("Pro", "Pro", "Pro", "Mixed", "Mixed",
                           "Mixed", "Con", "Con", "Con"),
                Ebert = c("Pro", "Mixed", "Con", "Pro", "Mixed",
                          "Con", "Pro", "Mixed", "Con"),
                count = c(64, 9, 10, 11, 13, 8, 13, 8, 24))

h.fct <- function(p){
    p.Siskel <- M.fct(d$Siskel) %*% p
    p.Ebert  <- M.fct(d$Ebert) %*% p
    as.matrix(c(p.Siskel[-3] - p.Ebert[-3]))
}
a1 <- mph.fit(y = d$count, h.fct = h.fct)
mph.summary(a1, TRUE)

# Suppose that we wish to report on the observed and fitted
# marginal probabilities.

L.fct <- function(p) {
    p.Siskel <- M.fct(d$Siskel) %*% p
    p.Ebert <- M.fct(d$Ebert) %*% p
    L <- as.matrix(c(p.Siskel, p.Ebert))
    rownames(L) <- c(paste(sep = "", "P(Siskel=", levels(as.factor(d$Siskel)), ")"),
                     paste(sep = "", "P(Ebert=", levels(as.factor(d$Ebert)), ")"))
    L
}
a2 <- mph.fit(y = d$count, h.fct = h.fct, L.fct = L.fct, X = diag(6))
mph.summary(a2, TRUE)

# M.fct(factor) %*% p gives the marginal probabilities corresponding to
# the levels of 'factor'. The marginal probabilities are ordered by the
# levels of 'factor'.
#
# Alternatively, in this rectangular table setting, we can find the
# marginal probabilities using the apply(...) function. In this case,
# the marginal probabilities are ordered as they are entered in the
# data set.

h.fct <- function(p) {
    p <- matrix(p, 3, 3, byrow = TRUE)
    p.Siskel <- apply(p, 1, sum)
    p.Ebert <- apply(p, 2, sum)
    as.matrix(c(p.Siskel[-3] - p.Ebert[-3]))
}

L.fct <- function(p) {
    p <- matrix(p, 3, 3, byrow = TRUE)
    p.Siskel <- apply(p, 1, sum)
    p.Ebert <- apply(p, 2, sum)
    L <- as.matrix(c(p.Siskel, p.Ebert))
    rownames(L) <- c("P(Siskel=Pro)", "P(Siskel=Mixed)",
                     "P(Siskel=Con)", "P(Ebert=Pro)",
                     "P(Ebert=Mixed)", "P(Ebert=Con)")
    L
}
b <- mph.fit(y = d$count, h.fct = h.fct, L.fct = L.fct, X = diag(6))


# EXAMPLE 7. Log-Linear Model for 2-by-2-by-2 Table.
#
# Data Source: Table 8.16, Agresti 347:2002
#
# y <- Y ~ MP(gamma, p | strata = 1, fixed = "all");
# i.e. Y ~ multinomial.
#
# Specifically,
#
# y <- Y ~ multinomial(621, p).
#
# The counts in y are cross-classification counts for variables
# G = Gender, I = Information Opinion, H = Health Opinion.
#
# GOAL: Fit the loglinear models [GI, GH, IH] and [G, IH].

d <- data.frame(G = c("Male", "Male", "Male", "Male",
                      "Female", "Female", "Female", "Female"),
                I = c("Support", "Support", "Oppose", "Oppose",
                      "Support", "Support", "Oppose", "Oppose"),
                H = c("Support", "Oppose", "Support", "Oppose",
                      "Support", "Oppose", "Support", "Oppose"),
                count = c(76, 160, 6, 25, 114, 181, 11, 48))

# Fit loglinear model [GI, GH, IH]...

a1 <- mph.fit(y = d$count, link = "logm",
              X = model.matrix(~ G + I + H + G:I + G:H + I:H, data = d))

# Fit loglinear model [G, IH]...

a2 <- mph.fit(y = d$count, link = "logm",
              X = model.matrix(~ G + I + H + I:H, data = d))

# Different Sampling Distribution Assumptions:
#
# Alternatively, assume
# y <- Y ~ MP(gamma, p | strata = 1, fixed = "none");
# that is, Y ~ indep Poisson.
#
# In other symbols,
# y <- Y, where Y[i] indep ~ Poisson(m[i] = gamma * p[i]).
# Here, gamma is the unknown expected sample size.

b2 <- mph.fit(y = d$count, link = "logm",
              X = model.matrix(~ G + I + H + I:H, data = d),
              fixed = "none")

# Alternatively, assume
# y <- Y ~ MP(gamma, p | strata = Gender, fixed = "all");
# that is, Y ~ prod multinomial.
#
# In other symbols,
# y <- Y = (Y[1, 1, 1], Y[1, 1, 2], ..., Y[2, 2, 2]),
# where (Y[i, 1, 1], ..., Y[i, 2, 2]) indep ~ multinomial(n[i], p[i, , ]).
# Here, p[i, j, k] = P(I = j, H = k | G = i) and n[1] = 267 and
# n[2] = 354 are the a priori fixed sample sizes for males and females.

c2 <- mph.fit(y = d$count, link = "logm",
              X = model.matrix(~ G + I + H + I:H, data = d),
              strata = d$G)

# Alternatively, assume
# y <- Y ~ MP(gamma, p | strata = Gender, fixed = "none");
# that is, Y ~ prod Poisson.
#
# In other symbols,
# y <- Y = (Y[1, 1, 1], Y[1, 1, 2], ..., Y[2, 2, 2]),
# where Y[i, j, k] indep ~ Poisson(m[i, j, k] = gamma[i] * p[i, j, k]).
# Here, p[i, j, k] = P(I = j, H = k | G = i) and gamma[1] and gamma[2] are the
# unknown expected sample sizes for males and for females.

d2 <- mph.fit(y = d$count, link = "logm",
              X = model.matrix(~ G + I + H + I:H, data = d),
              strata = d$G, fixed = "none")

cbind(a2$m, b2$m, c2$m, d2$m, sqrt(diag(a2$covm)), sqrt(diag(b2$covm)),
      sqrt(diag(c2$covm)), sqrt(diag(d2$covm)))
cbind(a2$p, b2$p, c2$p, d2$p, sqrt(diag(a2$covp)), sqrt(diag(b2$covp)),
      sqrt(diag(c2$covp)), sqrt(diag(d2$covp)))


# EXAMPLE 8. Fit Linear-by-Linear Log-Linear Model
#
# Data Source: Table 8.15, Agresti, 345:2002
#
# y <- Y ~ MP(gamma, p | strata = 1, fixed = "all");
# i.e. Y ~ multinomial.
#
# Specifically,
# y <- Y ~ multinomial(1425, p)
#
# GOAL: Assess the fit of the linear-by-linear log-linear model.

d <- list(Schooling = c("<HS", "<HS", "<HS", "HS", "HS", "HS", ">HS", ">HS", ">HS"),
          Abortion = c("Disapprove", "Middle", "Approve", "Disapprove", "Middle",
                       "Approve", "Disapprove", "Middle", "Approve"),
          count = c(209, 101, 237, 151, 126, 426, 16, 21, 138))

Schooling.score <- -1 * (d$Schooling == "<HS") +
                    0 * (d$Schooling == "HS") +
                    1 * (d$Schooling == ">HS")
Abortion.score  <- -1 * (d$Abortion == "Disapprove") +
                    0 * (d$Abortion == "Middle") +
                    1 * (d$Abortion == "Approve")

d <- data.frame(d, Schooling.score, Abortion.score)

a <- mph.fit(y = d$count, link = "logm",
             X = model.matrix(~ Schooling + Abortion +
             Schooling.score : Abortion.score, data = d))
mph.summary(a, TRUE)


# EXAMPLE 9. Marginal Standardization of a Contingency Table.
#
# Data Source: Table 8.15, Agresti 345:2002.
#
# GOAL: For a two-way table, find the standardized values of y, say y*,
# that satisfy (i) y* has the same odds ratios as y, and
#             (ii) y* has row and column totals equal to 100.
#
# Note: This is equivalent to the problem of finding the fitted values
# for the following model...
# x <- Y ~ multinomial(n, p = (p[1, 1], ..., p[3, 3]))
#      p[1, +] = p[2, +] = p[3, +] = p[+, 1] = p[+, 2] = p[+, 3] = 1/3
#      p[1, 1] * p[2, 2] / p[2, 1] / p[1, 2] = or[1, 1]
#      p[1, 2] * p[2, 3] / p[2, 2] / p[1, 3] = or[1, 2]
#      p[2, 1] * p[3, 2] / p[3, 1] / p[2, 2] = or[2, 1]
#      p[2, 2] * p[3, 3] / p[3, 2] / p[2, 3] = or[2, 2],
# where or[i, j] = y[i, j] * y[i + 1, j + 1] / y[i + 1, j] / y[i, j + 1]
# are the observed (y) odds ratios.
# If m is the vector of fitted values, then y* = m * 300 / sum(m)
# are the standardized values of y.
# Here x can be any vector of 9 counts.
# Choosing x so that the sum is 300 leads to sum(m) = 300, so that
# y* = m in this case.

d <- data.frame(Schooling = c("<HS", "<HS", "<HS", "HS", "HS", "HS", ">HS", ">HS", ">HS"),
                Abortion = c("Disapprove", "Middle", "Approve", "Disapprove", "Middle",
                             "Approve", "Disapprove", "Middle", "Approve"),
                count = c(209, 101, 237, 151, 126, 426, 16, 21, 138))

h.fct <- function(p) {
   p.Schooling <- M.fct(d$Schooling) %*% p
   p.Abortion  <- M.fct(d$Abortion) %*% p
   p <- matrix(p, 3, 3, byrow = TRUE)
   as.matrix(c(
     p.Schooling[-3] - 1/3, p.Abortion[-3] - 1/3,
     p[1, 1] * p[2, 2] / p[2, 1] / p[1, 2] - 209 * 126 / 151 / 101,
     p[1, 2] * p[2, 3] / p[2, 2] / p[1, 3] - 101 * 426 / 126 / 237,
     p[2, 1] * p[3, 2] / p[3, 1] / p[2, 2] - 151 * 21 / 16 / 126,
     p[2, 2] * p[3, 3] / p[3, 2] / p[2, 3] - 126 * 138 / 21 / 426
   ))
}

b <- mph.fit(y = d$count, h.fct = h.fct)
ystar <- b$m * 300 / sum(b$m)
matrix(round(ystar, 1), 3, 3, byrow = TRUE)

x <- c(rep(33, 8), 36)
b <- mph.fit(y = x, h.fct = h.fct)
ystar <- b$m
matrix(round(ystar, 1), 3, 3, byrow = TRUE)


# EXAMPLE 10. Cumulative Logit Model.
#
# Data Source: Table 7.19, Agresti, 306:2002.
#
# y <- Y ~ MP(gamma, p | strata = Therapy * Gender, fixed = "all");
# i.e. Y ~ prod multinomial.
#
# Here, y[i, j, k] is the cross-classification count corresponding to
# Therapy = i, Gender = j, Response = k.
#
# The table probabilities are defined as
# p[i, j, k] = P(Response = k | Therapy = i, Gender = j).
#
# Goal: Fit the cumulative logit proportional odds model that includes
# the main effect of Therapy and Gender.

d <- data.frame(Therapy = c("Sequential", "Sequential", "Sequential", "Sequential",
                            "Sequential", "Sequential", "Sequential", "Sequential",
                            "Alternating", "Alternating", "Alternating", "Alternating",
                            "Alternating", "Alternating", "Alternating", "Alternating"),
                Gender = c("Male", "Male", "Male", "Male", "Female", "Female",
                           "Female", "Female", "Male", "Male", "Male", "Male",
                           "Female", "Female", "Female", "Female"),
                Response = c("Progressive", "NoChange", "Partial", "Complete",
                             "Progressive", "NoChange", "Partial", "Complete",
                             "Progressive", "NoChange", "Partial", "Complete",
                             "Progressive", "NoChange", "Partial", "Complete"),
                count = c(28, 45, 29, 26, 4, 12, 5, 2, 41, 44, 20, 20, 12, 7, 3, 1))

strata <- paste(sep = "", d$Therapy, ".", d$Gender)
d <- data.frame(d, strata)

d3 <- subset(d, Response != "Complete")
levels(d3$Response) <- c(NA, "NoChange", "Partial", "Progressive")

L.fct <- function(p) {
   p <- matrix(p, 4, 4, byrow = TRUE)
   clogit <- c()
   for (s in 1:4) {
     clogit <- c(clogit,
                 log(sum(p[s, 1])   / sum(p[s, 2:4])),
                 log(sum(p[s, 1:2]) / sum(p[s, 3:4])),
                 log(sum(p[s, 1:3]) / sum(p[s, 4]))
     )
   }
   L <- as.matrix(clogit)
   rownames(L) <- c(paste(sep = "", "log odds(R < ", 2:4, "|",
                          d3$strata, ")"))
   L
}

a <- mph.fit(d$count, link = L.fct,
             X = model.matrix(~ -1 + Response + Therapy + Gender,
                              data = d3),
             strata = strata)

# Fit the related non-proportional odds cumulative logit model
b <- mph.fit(d$count, link = L.fct,
             X = model.matrix(~ Response + Response * Therapy +
                                Response * Gender - 1 - Therapy - Gender,
                              data = d3),
             strata = strata)

mph.summary(a, TRUE)
mph.summary(b, TRUE)

# }

Run the code above in your browser using DataLab