Fit a generalized matrix factorization (GMF) model for non-Gaussian data using either deterministic or stochastic optimization methods. It is an alternative to PCA when the observed data are binary, counts, or positive scores or, more generally, when the conditional distribution of the observations can be appropriately described using a dispersion exponential family or a quasi-likelihood model. Some examples are Gaussian, Gamma, Binomial and Poisson probability laws.
The dependence among the observations and the variables in the sample can be taken into account through appropriate row- and column-specific regression effects. The residual variability is then modeled through a low-rank matrix factorization.
For the estimation, the package implements two deterministic optimization methods (AIRWLS and Newton) and two stochastic optimization algorithms (adaptive SGD with coordinate-wise or block-wise sub-sampling).
sgdgmf.fit(
Y,
X = NULL,
Z = NULL,
family = gaussian(),
ncomp = 2,
weights = NULL,
offset = NULL,
method = c("airwls", "newton", "sgd"),
sampling = c("block", "coord", "rnd-block"),
penalty = list(),
control.init = list(),
control.alg = list()
)
An sgdgmf object, namely a list, containing the estimated parameters of the GMF model.
In particular, the returned object collects the following information:
method
: the selected estimation method
family
: the model family
ncomp
: rank of the latent matrix factorization
npar
: number of unknown parameters to be estimated
control.init
: list of control parameters used for the initialization
control.alg
: list of control parameters used for the optimization
control.cv
: list of control parameters used for the cross-validation
Y
: response matrix
X
: row-specific covariate matrix
Z
: column-specific covariate matrix
B
: the estimated column-specific coefficient matrix
A
: the estimated row-specific coefficient matrix
U
: the estimated factor matrix
V
: the estimated loading matrix
weights
: weighting matrix
offset
: offset matrix
eta
: the estimated linear predictor
mu
: the estimated mean matrix
var
: the estimated variance matrix
phi
: the estimated dispersion parameter
penalty
: the penalty value at the end of the optimization
deviance
: the deviance value at the end of the optimization
objective
: the penalized objective function at the end of the optimization
aic
: Akaike information criterion
bic
: Bayesian information criterion
names
: list of row and column names for all the output matrices
exe.time
: the total execution time in seconds
trace
: a trace matrix recording the optimization history
summary.cv
: summary of the cross-validation results (if cross-validation was performed)
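For illustration, the components listed above can be accessed as ordinary list elements of the returned object. The following minimal sketch assumes a count matrix Y and a fit obtained as in the Examples section:
# Fit a rank-3 Poisson GMF and inspect some of the returned components
fit = sgdgmf.fit(Y, ncomp = 3, family = poisson(), method = "airwls")
dim(fit$U)           # n x 3 matrix of estimated latent factors
dim(fit$V)           # m x 3 matrix of estimated loadings
head(fit$mu)         # estimated mean matrix (first rows)
fit$deviance         # deviance at the end of the optimization
c(fit$aic, fit$bic)  # information criteria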
Arguments
Y
: matrix of responses (\(n \times m\))
X
: matrix of row fixed effects (\(n \times p\))
Z
: matrix of column fixed effects (\(m \times q\))
family
: a glm family (see family for more details)
ncomp
: rank of the latent matrix factorization (default 2)
weights
: an optional matrix of weights (\(n \times m\))
offset
: an optional matrix of offset values (\(n \times m\)), specifying a known component to be included in the linear predictor
method
: estimation method to minimize the negative penalized log-likelihood
sampling
: sub-sampling strategy to use if method = "sgd"
penalty
: list of penalty parameters (see set.penalty for more details)
control.init
: list of control parameters for the initialization (see set.control.init for more details)
control.alg
: list of control parameters for the optimization (see set.control.alg for more details)
Model specification
The model we consider is defined as follows. Let \(Y = (y_{ij})\) be a matrix of observed data of dimension \(n \times m\). We assume for the \((i,j)\)th observation in the matrix a dispersion exponential family law \((y_{ij} \mid \theta_{ij}) \sim EF(\theta_{ij}, \phi)\), where \(\theta_{ij}\) is the natural parameter and \(\phi\) is the dispersion parameter. Recall that the conditional probability density function of \(y_{ij}\) is given by $$f (y_{ij}; \psi) = \exp \big[ w_{ij} \{y_{ij} \theta_{ij} - b(\theta_{ij})\} / \phi - c(y_{ij}, \phi / w_{ij}) \big],$$ where \(\psi\) is the vector of unknown parameters to be estimated, \(b(\cdot)\) is a convex, twice differentiable log-partition function, and \(c(\cdot,\cdot)\) is a normalizing function specific to the family.
The conditional mean of \(y_{ij}\), say \(\mu_{ij}\), is then modeled as $$g(\mu_{ij}) = \eta_{ij} = x_i^\top \beta_j + \alpha_i^\top z_j + u_i^\top v_j,$$ where \(g(\cdot)\) is a bijective, twice differentiable link function, \(\eta_{ij}\) is the linear predictor, \(x_i \in \mathbb{R}^p\) and \(z_j \in \mathbb{R}^q\) are observed covariate vectors, \(\beta_j \in \mathbb{R}^p\) and \(\alpha_i \in \mathbb{R}^q\) are unknown regression parameters and, finally, \(u_i \in \mathbb{R}^d\) and \(v_j \in \mathbb{R}^d\) are latent vectors explaining the residual variability not captured by the regression effects. Equivalently, in matrix form, we have \(g(\mu) = \eta = X B^\top + A Z^\top + U V^\top.\)
The natural parameter \(\theta_{ij}\) is linked to the conditional mean of \(y_{ij}\) through the equation \(E(y_{ij}) = \mu_{ij} = b'(\theta_{ij})\). Similarly, the variance of \(y_{ij}\) is given by \(\text{Var}(y_{ij}) = (\phi / w_{ij}) \,\nu(\mu_{ij}) = (\phi / w_{ij}) \,b''(\theta_{ij})\), where \(\nu(\cdot)\) is the so-called variance function of the family. Finally, we denote by \(D_\phi(y,\mu)\) the deviance function of the family, which is defined as \(D_\phi(y,\mu) = - 2 \log\{ f(y; \psi) / f_0 (y) \}\), where \(f_0(y)\) is the likelihood of the saturated model.
The estimation of the model parameters is performed by minimizing the penalized deviance function $$\displaystyle \ell_\lambda (\psi; y) = \sum_{i = 1}^{n} \sum_{j = 1}^{m} D_\phi(y_{ij}, \mu_{ij}) + \frac{\lambda_{\scriptscriptstyle U}}{2} \| U \|_F^2 + \frac{\lambda_{\scriptscriptstyle V}}{2} \| V \|_F^2,$$ where \(\lambda_{\scriptscriptstyle U} > 0\) and \(\lambda_{\scriptscriptstyle V} > 0\) are regularization parameters and \(\|\cdot\|_F\) is the Frobenius norm. Additional \(\ell_2\) penalization terms can be introduced to regularize \(B\) and \(A\). Quasi-likelihood models can be considered as well, where \(D_\phi(y, \mu)\) is substituted by \(Q_\phi(y, \mu) = - \log (\phi/w) - (w / \phi) \int_y^\mu \{(y - t) / \nu(t) \} \,dt,\) under an appropriate specification of mean, variance and link functions.
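To make the notation concrete, the following sketch (not part of the package API) assembles the linear predictor and the penalized objective for a Poisson model with log link, using randomly generated matrices:
# Illustrative dimensions and randomly generated parameter matrices
n = 50; m = 10; p = 2; q = 3; d = 2
X = matrix(rnorm(n * p), n, p); B = matrix(rnorm(m * p), m, p)
Z = matrix(rnorm(m * q), m, q); A = matrix(rnorm(n * q), n, q)
U = matrix(rnorm(n * d), n, d); V = matrix(rnorm(m * d), m, d)
# Linear predictor, mean and simulated responses under the log link
eta = X %*% t(B) + A %*% t(Z) + U %*% t(V)
mu = exp(eta)
Y = matrix(rpois(n * m, lambda = mu), n, m)
# Penalized objective: total Poisson deviance plus ridge penalties on U and V
lambda = 1
dev = sum(poisson()$dev.resids(Y, mu, wt = 1))
obj = dev + 0.5 * lambda * (sum(U^2) + sum(V^2))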
Identifiability constraints
The GMF model is not identifiable, being invariant with respect to rotation, scaling and sign-flip transformations of \(U\) and \(V\). To enforce the uniqueness of the solution, we impose the following identifiability constraints:
\(\text{Cov}(U) = U^\top (I_n - 1_n 1_n^\top / n) U / n = I_d\),
\(V\) is lower triangular, with positive diagonal entries,
where \(I_n\) and \(1_n\) denote, respectively, the \(n \times n\) identity matrix and the \(n\)-dimensional vector of ones.
Alternative identifiability constraints on \(U\) and \(V\) can be easily obtained by post-processing. For instance, a PCA-like solution, where \(U^\top U\) is diagonal and \(V^\top V = I_d\), can be obtained by computing the truncated SVD \(U V^\top = \tilde{U} \tilde{D} \tilde{V}^\top\) and setting \(U = \tilde{U} \tilde{D}\) and \(V = \tilde{V}\).
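As an illustration, this PCA-like normalization can be recovered from the fitted matrices with a few lines of post-processing (a minimal sketch, assuming a fitted object gmf as in the Examples section):
# Recover a PCA-like parametrization of the latent factorization
udv = svd(gmf$U %*% t(gmf$V), nu = gmf$ncomp, nv = gmf$ncomp)
U_pca = udv$u %*% diag(udv$d[seq_len(gmf$ncomp)], nrow = gmf$ncomp)  # U'U diagonal
V_pca = udv$v                                                        # V'V = I_d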
Estimation algorithms
To obtain the penalized maximum likelihood estimate, the package implements the following algorithms (a brief usage sketch follows the list):
AIRWLS: alternating iteratively re-weighted least squares (method = "airwls");
Newton: quasi-Newton algorithm with diagonal Hessian (method = "newton");
C-SGD: adaptive stochastic gradient descent with coordinate-wise sub-sampling (method = "sgd", sampling = "coord");
B-SGD: adaptive stochastic gradient descent with block-wise sub-sampling (method = "sgd", sampling = "block");
RB-SGD: as B-SGD, but scanning the minibatch blocks in a random order (method = "sgd", sampling = "rnd-block").
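The algorithm is selected through the method and sampling arguments. For instance, assuming a count matrix Y, the deterministic and stochastic solvers can be called as follows (a minimal sketch):
# Deterministic optimization: AIRWLS or diagonal quasi-Newton
fit_airwls = sgdgmf.fit(Y, ncomp = 3, family = poisson(), method = "airwls")
fit_newton = sgdgmf.fit(Y, ncomp = 3, family = poisson(), method = "newton")
# Stochastic optimization: adaptive SGD with block-wise or coordinate-wise sub-sampling
fit_bsgd = sgdgmf.fit(Y, ncomp = 3, family = poisson(), method = "sgd", sampling = "block")
fit_csgd = sgdgmf.fit(Y, ncomp = 3, family = poisson(), method = "sgd", sampling = "coord")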
Likelihood families
Currently, all standard glm families are supported, as well as the neg.bin and negative.binomial families from the MASS package. For the Negative Binomial families, the deviance function we consider takes the form \(D_\phi(y, \mu) = 2 w \big[ y \log(y / \mu) - (y + \phi) \log\{ (y + \phi) / (\mu + \phi) \} \big]\). This corresponds to a Negative Binomial model with variance function \(\nu(\mu) = \mu + \mu^2 / \phi\). For \(\phi \rightarrow \infty\), the Negative Binomial likelihood converges to the Poisson likelihood, which has linear variance function \(\nu(\mu) = \mu\). Notice that the over-dispersion parameter, here denoted by \(\phi\), is referred to as \(\theta\) in the MASS package. If the Negative Binomial family is selected, a global over-dispersion parameter \(\phi\) is estimated from the data using the method of moments.
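For example, a Negative Binomial fit can be requested through the corresponding MASS family constructor; the value of theta supplied below is only illustrative, since the global over-dispersion parameter is re-estimated from the data as described above (a minimal sketch, assuming an over-dispersed count matrix Y):
# Negative Binomial GMF: phi is re-estimated via the method of moments
library(MASS)
fit_nb = sgdgmf.fit(Y, ncomp = 3, family = negative.binomial(theta = 1))
fit_nb$phi  # estimated global over-dispersion parameter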
Parallelization
Parallel execution is implemented in C++ using OpenMP. When the sgdGMF package is installed and compiled, the compiler checks whether OpenMP is available on the system. If it is not, the package is compiled without the OpenMP functionalities and no parallel execution is allowed at the C++ level.
Notice that OpenMP is not compatible with R-level parallel computing packages, such as parallel and foreach. Therefore, when parallel = TRUE, it is not possible to run the sgdgmf.fit function within R-level parallel constructs, e.g., a foreach loop.
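Parallel execution at the C++ level is requested through the control lists. The sketch below assumes that the optimization control list accepts parallel and nthreads entries; the exact option names should be verified in set.control.init and set.control.alg:
# Request parallel OpenMP execution with 2 threads (option names assumed,
# see set.control.alg for the documented interface)
fit_par = sgdgmf.fit(Y, ncomp = 3, family = poisson(), method = "airwls",
                     control.alg = list(parallel = TRUE, nthreads = 2))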
Kidzinnski, L., Hui, F.K.C., Warton, D.I. and Hastie, J.H. (2022). Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays. Journal of Machine Learning Research, 23: 1-29.
Wang, L. and Carvalho, L. (2023). Deviance matrix factorization. Electronic Journal of Statistics, 17(2): 3762-3810.
Castiglione, C., Segers, A., Clement, L. and Risso, D. (2024). Stochastic gradient descent estimation of generalized matrix factorization models with application to single-cell RNA sequencing data. arXiv preprint: arXiv:2412.20509.
set.control.init, set.control.alg, sgdgmf.init, sgdgmf.rank, refit.sgdgmf, coef.sgdgmf, resid.sgdgmf, fitted.sgdgmf, predict.sgdgmf, plot.sgdgmf, screeplot.sgdgmf, biplot.sgdgmf, image.sgdgmf
# Load the sgdGMF package
library(sgdGMF)
# Set the data dimensions
n = 100; m = 20; d = 5
# Generate Poisson data with d latent factors
data = sim.gmf.data(n = n, m = m, ncomp = d, family = poisson())
# Estimate the GMF parameters assuming 3 latent factors
gmf = sgdgmf.fit(data$Y, ncomp = 3, family = poisson(), method = "airwls")
# Get the fitted values in the response scale
mu_hat = fitted(gmf, type = "response")
# Compare the results
oldpar = par(no.readonly = TRUE)
par(mfrow = c(1,3), mar = c(1,1,3,1))
image(data$Y, axes = FALSE, main = expression(Y))
image(data$mu, axes = FALSE, main = expression(mu))
image(mu_hat, axes = FALSE, main = expression(hat(mu)))
par(oldpar)