SSLfmm (version 0.1.0)

neg_loglik: Negative Log-Likelihood for Semi-Supervised FMM with a Mixed-Missingness Mechanism

Description

Computes the negative log-likelihood for a semi-supervised Gaussian mixture model under a mixed missingness mechanism (MCAR + entropy-based MAR). Assumes a single covariance matrix \(\Sigma\) shared across all mixture components.

Usage

neg_loglik(theta, Y, m_j, Z, d2_yj, xi, alpha_k, unpack_fn)

Value

A single numeric value: the negative log-likelihood.

Arguments

theta

Numeric vector of packed model parameters to be unpacked by unpack_fn.

Y

Numeric matrix of observations (n x p).

m_j

Integer or logical vector of length n indicating label missingness: 0 for rows with an observed label (labeled block), 1 for rows with a missing label (unlabeled block).

Z

Integer vector of length n with class labels for labeled samples (1..g); use NA for unlabeled rows.

d2_yj

Numeric vector of length n with the entropy-like score used in the MAR mechanism (e.g., posterior entropy or any scalar proxy).

xi

Numeric length-2 vector c(xi0, xi1) for the logistic MAR model \(q_j = \text{logit}^{-1}(\xi_0 + \xi_1 d2_{yj})\).

alpha_k

Numeric scalar in (0,1), the MCAR mixing proportion in the missingness mechanism.

unpack_fn

Function that takes theta and returns a list with elements:

pi

Numeric vector of length g with mixture weights.

mu

List of length g; each element is a numeric mean vector (length p).

sigma

Shared covariance matrix (p x p).
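
The packing of theta is left to the caller, so the layout below is only one possible choice and not the package's documented convention. This sketch assumes theta holds g - 1 unconstrained logits for the weights, then the g mean vectors, then the lower-triangular Cholesky entries of the shared covariance (diagonal on the log scale). The helper name make_unpack_fn is hypothetical.

```r
# Hypothetical packing (an assumption, not the package's documented layout):
#   theta = c(logits[1:(g-1)], mu_1, ..., mu_g, cholesky_entries)
make_unpack_fn <- function(g, p) {
  function(theta) {
    # Mixture weights via softmax, with the last logit fixed at 0
    eta <- c(theta[seq_len(g - 1)], 0)
    w <- exp(eta - max(eta))
    pi <- w / sum(w)

    # Component means: p consecutive entries per component
    off <- g - 1
    mu <- lapply(seq_len(g), function(i) theta[off + (i - 1) * p + seq_len(p)])

    # Shared covariance built from a lower-triangular Cholesky factor;
    # the diagonal is stored on the log scale so it stays positive
    off <- off + g * p
    L <- matrix(0, p, p)
    L[lower.tri(L, diag = TRUE)] <- theta[off + seq_len(p * (p + 1) / 2)]
    diag(L) <- exp(diag(L))

    list(pi = pi, mu = mu, sigma = L %*% t(L))
  }
}

unpack_fn <- make_unpack_fn(g = 2, p = 2)
theta <- c(0, 0, 0, 1, 1, 0, 0, 0)  # 1 logit, 4 mean entries, 3 Cholesky entries
out <- unpack_fn(theta)
str(out)
```

With this parameterization every real-valued theta of the right length maps to valid weights and a positive-definite covariance, which is convenient when theta is passed to an unconstrained optimizer.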

Details

The total log-likelihood is composed of three parts:

  1. Labeled samples (\(m_j=0\)) with observed class labels \(Z_j\).

  2. Unlabeled samples attributed to MCAR with probability mass \(m_{1j}\).

  3. Unlabeled samples attributed to MAR with probability mass \(m_{2j}\).

The MAR probability for each sample is \(q_j = \text{logit}^{-1}(\xi_0 + \xi_1 d2_{yj})\). Internally, the function uses a numerically stable log-sum-exp (logSumExp) computation.
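
The two building blocks mentioned above can be sketched in base R. This is an illustration only and not the package's internal code: the helper name log_sum_exp and the toy values are assumptions.

```r
# MAR probability q_j = logit^{-1}(xi0 + xi1 * d2_yj).
# plogis() is base R's inverse logit.
xi <- c(-1, 2)
d2_yj <- c(0, 0.5, 1)
q_j <- plogis(xi[1] + xi[2] * d2_yj)

# Numerically stable log(sum(exp(x))): subtract the maximum before
# exponentiating so that very negative log-terms do not all underflow to 0.
log_sum_exp <- function(x) {
  m <- max(x)
  if (!is.finite(m)) return(m)  # handles the all -Inf case
  m + log(sum(exp(x - m)))
}

# Toy log-terms of the kind log(pi_i) + log f_i(y) for two components
log_terms <- c(-1000, -1001)
naive  <- log(sum(exp(log_terms)))  # exp() underflows, giving -Inf
stable <- log_sum_exp(log_terms)    # finite, close to -999.69
```

The naive version fails because exp(-1000) is below the smallest representable double, while the shifted version only ever exponentiates values less than or equal to 0.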

Examples

# Minimal example (illustrative only):
library(SSLfmm)  # provides neg_loglik()
library(mvtnorm)
set.seed(1)
n <- 20; p <- 2; g <- 2
Y <- matrix(rnorm(n*p), n, p)
Z <- sample(c(1:g, rep(NA, n - g)), n, replace = TRUE)
m_j <- ifelse(is.na(Z), 1L, 0L)
d2_yj <- runif(n)
xi <- c(-1, 2)
alpha_k <- 0.4
unpack_fn <- function(theta) {
  list(pi = c(0.6, 0.4),
       mu = list(c(0,0), c(1,1)),
       sigma = diag(p))
}
theta <- numeric(1) # not used in this toy unpack_fn
neg_loglik(theta, Y, m_j, Z, d2_yj, xi, alpha_k, unpack_fn)