missoNet
Multi-task regression and network estimation with missing responses — no imputation required!
missoNet jointly estimates regression coefficients and the response network (precision matrix) from multi-response data where some responses are missing (MCAR/MAR/MNAR). Estimation is based on unbiased estimating equations with separate L1 regularization for coefficients and the precision matrix, enabling robust multi-trait analysis under incomplete outcomes.
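As a rough sketch of the setup described above, the model and penalized objective can be written as follows. The notation is assumed here for illustration and is not taken from the package documentation: $Y$ is the $n \times q$ response matrix, $X$ the $n \times p$ predictor matrix, $B$ (Beta) the $p \times q$ coefficient matrix, $\Theta$ (Theta) the $q \times q$ precision matrix, and $\mathcal{L}$ a placeholder for the missingness-corrected loss built from the unbiased estimating equations, whose exact form is given in the cited paper. The penalty on $\Theta$ is shown schematically as an elementwise L1 norm.

```latex
% Multi-response linear model with correlated errors (notation assumed)
Y = X B + E, \qquad E_{i\cdot} \sim \mathcal{N}_q\!\left(0,\ \Theta^{-1}\right)

% Schematic penalized objective with separate L1 penalties
(\hat{B}, \hat{\Theta}) \;=\; \arg\min_{B,\ \Theta \succ 0}\;
  \mathcal{L}(B, \Theta)
  \;+\; \lambda_{B}\, \lVert B \rVert_{1}
  \;+\; \lambda_{\Theta}\, \lVert \Theta \rVert_{1}

% Standard Gaussian graphical model identity: partial correlation
% between responses j and k given all other responses
\rho_{jk \mid \text{rest}} \;=\; -\,\frac{\Theta_{jk}}{\sqrt{\Theta_{jj}\,\Theta_{kk}}}
```

Under this reading, a zero entry $\Theta_{jk}$ means responses $j$ and $k$ are conditionally independent given the others, which is why the sparsity pattern of $\Theta$ is interpreted as a network.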
Why missoNet?
- Native handling of missing responses without ad‑hoc imputation.
- Joint learning of effects (Beta) and conditional dependency structure (Theta).
- Two regularization paths with glmnet-like ergonomics.
- Reliable model selection via cross‑validation (with the 1‑SE rule) or information criteria (e.g., BIC).
- Built for scale: warm starts, parallel processing, and adaptive lambda grids.
If you only have a single response, classical lasso/elastic net (e.g., glmnet) is simpler and likely faster.
Installation
CRAN (stable)
install.packages("missoNet")
GitHub (development)
# install.packages("devtools")
devtools::install_github("yixiao-zeng/missoNet", build_vignettes = TRUE)
Quick start
library(missoNet)
# Example data with ~15% missing responses (MCAR)
sim <- generateData(n = 300, p = 50, q = 10, rho = 0.15, missing.type = "MCAR")
# Fit along two lambda paths; choose via BIC (no CV)
fit <- missoNet(X = sim$X, Y = sim$Z, GoF = "BIC")
# Extract estimates at the selected solution
Beta <- fit$est.min$Beta # p x q regression coefficients
Theta <- fit$est.min$Theta # q x q precision (conditional network)
# Visualize selection path
plot(fit, type = "scatter")
Cross‑validation & prediction
# 5-fold CV over (lambda.beta, lambda.theta)
cvfit <- cv.missoNet(X = sim$X, Y = sim$Z, kfold = 5)
# Inspect CV heatmap and selected models (min and 1-SE variants)
plot(cvfit, type = "heatmap")
# Predict responses on new data
Y_hat <- predict(cvfit, newx = sim$X, s = "lambda.min")
Tip: Try s = "lambda.1se.beta" or s = "lambda.1se.theta" for more conservative sparsity when available.
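The 1-SE idea behind those options can be illustrated on a single regularization path. This is a minimal base R sketch with made-up CV numbers. The function `one_se_lambda` and the vectors `lambda`, `cv_mean`, and `cv_se` are hypothetical and do not come from missoNet, which selects over a two-dimensional grid and may differ in its details.

```r
# One-standard-error rule on a 1-D path (illustrative values, not missoNet output)
one_se_lambda <- function(lambda, cv_mean, cv_se) {
  i_min <- which.min(cv_mean)
  # Error level considered statistically indistinguishable from the minimum
  threshold <- cv_mean[i_min] + cv_se[i_min]
  # Largest lambda (strongest penalty) whose mean CV error stays within it
  max(lambda[cv_mean <= threshold])
}

lambda  <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cv_mean <- c(2.00, 1.30, 1.05, 1.00, 1.02)
cv_se   <- c(0.10, 0.08, 0.07, 0.06, 0.06)

lambda_min <- lambda[which.min(cv_mean)]              # minimizes CV error
lambda_1se <- one_se_lambda(lambda, cv_mean, cv_se)   # sparser alternative
```

The 1-SE choice trades a small, statistically negligible increase in CV error for a stronger penalty, which usually gives a sparser and more stable model.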
Parallel processing
library(parallel)
cl <- makeCluster(max(1, detectCores() - 1))
cvfit <- cv.missoNet(X = sim$X, Y = sim$Z, kfold = 5,
                     parallel = TRUE, cl = cl)
stopCluster(cl)
Advanced usage
Custom penalty factors
# Reduce the penalty for predictors considered important a priori
p <- ncol(sim$X); q <- ncol(sim$Z)
beta.pen.factor <- matrix(1, p, q)
beta.pen.factor[c(1, 2), ] <- 0.1
fit <- missoNet(X = sim$X, Y = sim$Z,
                beta.pen.factor = beta.pen.factor)
Adaptive search (faster large runs)
fit <- missoNet(X = sim$X, Y = sim$Z,
                adaptive.search = TRUE,
                n.lambda.beta = 50,
                n.lambda.theta = 50)
Documentation
vignette("missoNet-introduction")
vignette("missoNet-cross-validation")
vignette("missoNet-case-study")
If vignettes are not available from CRAN binaries on your platform, install from source using the GitHub command above with build_vignettes = TRUE.
Performance notes
- Handles substantial missingness in responses, without imputation.
- Warm starts and adaptive grids often yield 5–10× speedups in large problems.
- Scales to p > 1,000 predictors and q > 100 responses with reasonable settings.
Actual performance will depend on sparsity, signal-to-noise, and missingness mechanisms.
When to use (and not)
Great for
- Multi-trait genomic studies (eQTL, meQTL, pQTL)
- High-dimensional omics with partially observed outcomes
- Longitudinal studies with dropout
- Network inference under incomplete responses
Not ideal for
- Single-response regression (use glmnet or similar)
- Extremely sparse information (e.g., >50% missing responses across most traits)
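The >50% guideline above can be checked before fitting. Here is a minimal base R sketch using a small made-up response matrix. The names `Y`, `miss_rate`, `flagged`, and `n_empty_rows` are illustrative and not part of missoNet.

```r
# Toy response matrix: 200 samples, 5 traits (made-up data for illustration)
set.seed(1)
n <- 200; q <- 5
Y <- matrix(rnorm(n * q), nrow = n, ncol = q,
            dimnames = list(NULL, paste0("trait", seq_len(q))))

# Knock out entries deterministically: 10% in every trait, 60% in trait3
Y[1:20, ] <- NA
Y[1:120, "trait3"] <- NA

# Per-trait fraction of missing responses
miss_rate <- colMeans(is.na(Y))

# Traits above the 50% guideline
flagged <- names(miss_rate)[miss_rate > 0.5]

# Samples with no observed response at all
n_empty_rows <- sum(rowSums(!is.na(Y)) == 0)
```

If most traits end up in `flagged`, or many rows have no observed response, the data may carry too little information for joint estimation to be reliable.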
Citation
If you use missoNet in your research, please cite:
@article{zeng2025missonet,
  title = {Multivariate regression with missing response data for modelling regional DNA methylation QTLs},
  author = {Zeng, Yixiao and Alam, Shomoita and Bernatsky, Sasha and Hudson, Marie and Colmegna, In{\'e}s and Stephens, David A and Greenwood, Celia MT and Yang, Archer Y},
  journal = {arXiv preprint arXiv:2507.05990},
  year = {2025},
  url = {https://arxiv.org/abs/2507.05990}
}
Contributing
Contributions and issues are welcome! Please open a discussion or pull request on the GitHub repository.
License
GPL-2. See the LICENSE file.