discom: DISCOM: Optimal Sparse Linear Prediction for Block-missing Multi-modality Data Without Imputation

Description

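Fits the DISCOM sparse linear prediction model for block-missing multi-modality data without imputation. Tuning parameters are selected on a separate tuning set, and the final model is evaluated on a test set.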

Usage

discom(
  beta,
  x,
  y,
  x.tuning,
  y.tuning,
  x.test,
  y.test,
  nlambda,
  nalpha,
  pp,
  robust = 0,
  standardize = TRUE,
  itcp = TRUE,
  lambda.min.ratio = NULL,
  k.value = 1.5
)

Value

The function returns a list containing the following components:

err: A multi-dimensional array storing the mean squared error (MSE) for all combinations of tuning parameters alpha and lambda.
est.error: The estimation error, calculated as the Euclidean distance between the estimated beta coefficients and the true beta (if provided).
lambda: The optimal lambda value, selected on the tuning set.
alpha: A vector of the optimal alpha values, also selected on the tuning set.
train.error: The mean squared error on the tuning set for the optimal parameter combination.
test.error: The mean squared error on the test set for the final, optimal model.
y.pred: The predicted values for the observations in the test set.
R2: The R-squared value, which measures the proportion of variance explained by the model on the test set.
a0: The intercept of the final model.
a1: The vector of estimated beta coefficients for the final model.
select: The number of non-zero coefficients, representing the number of selected variables.
xtx: The final regularized covariance matrix used to fit the optimal model.
fpr: The False Positive Rate (FPR) if the true beta is provided. It measures the proportion of irrelevant variables incorrectly selected.
fnr: The False Negative Rate (FNR) if the true beta is provided. It measures the proportion of relevant variables incorrectly excluded.
lambda.all: The complete vector of all lambda values evaluated on the tuning set.
beta.cov.lambda.max: The estimated beta coefficients using the maximum lambda value.
time: The total execution time of the function in seconds.
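
Several of the components above are simple summaries of a coefficient estimate and a set of predictions. The following base R sketch shows one way such quantities can be computed from the descriptions given here. All inputs are made-up toy values, and the exact definitions used inside discom() (for example, how a coefficient is counted as non-zero) are an assumption and may differ.

```r
# Hypothetical inputs: a true coefficient vector and an estimate of it.
beta_true <- c(1.5, 0, -1, 0, 0)
beta_hat  <- c(1.2, 0.3, 0, 0, 0)

# Estimation error: Euclidean distance between estimate and truth.
est_error <- sqrt(sum((beta_hat - beta_true)^2))

# Selection summaries based on which coefficients are non-zero.
selected <- beta_hat != 0
relevant <- beta_true != 0
n_select <- sum(selected)
fpr <- sum(selected & !relevant) / sum(!relevant)  # irrelevant but selected
fnr <- sum(!selected & relevant) / sum(relevant)   # relevant but excluded

# Test-set MSE and R-squared from hypothetical predictions.
y_test <- c(1.0, 2.0, 3.0, 4.0)
y_pred <- c(1.1, 1.9, 3.2, 3.8)
test_error <- mean((y_test - y_pred)^2)
r2 <- 1 - sum((y_test - y_pred)^2) / sum((y_test - mean(y_test))^2)
```

In this toy case one of the three irrelevant variables is selected and one of the two relevant variables is dropped, so fpr is 1/3 and fnr is 1/2.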

Arguments

beta: Vector, true beta coefficients (optional)
x: Matrix, training data
y: Vector, training response
x.tuning: Matrix, tuning data
y.tuning: Vector, tuning response
x.test: Matrix, test data
y.test: Vector, test response
nlambda: Integer, number of lambda values
nalpha: Integer, number of alpha values
pp: Vector, block sizes, i.e. the number of variables in each block (modality), in column order. DISCOM supports 2, 3, or 4 blocks.
robust: Integer, 0 for classical, 1 for robust estimation
standardize: Logical, whether to standardize covariates. When TRUE, the training data mean and standard deviation are used to standardize the tuning and test sets. When robust = 1, Huber-robust standard deviation estimates are used.
itcp: Logical, whether to include intercept
lambda.min.ratio: Numeric, the smallest lambda value in the grid, expressed as a fraction of `lambda.max`, the smallest lambda for which all coefficients are zero. By default (`NULL`), it is `0.0001` when the number of observations (`nobs`) exceeds the number of variables (`nvars`), and `0.01` when `nobs < nvars`. Using a very small value in the latter case can lead to overfitting.
k.value: Numeric, tuning parameter for robust estimation
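
Two of the arguments above, standardize and lambda.min.ratio, can be illustrated with a short base R sketch. It is not the package's internal code. The data, the value of lambda_max, and the log-spaced grid (a convention borrowed from other penalized regression software) are all assumptions made for illustration.

```r
set.seed(1)
# Hypothetical small data sets, only for illustration.
x_train <- matrix(rnorm(20 * 3, mean = 5, sd = 2), 20, 3)
x_test  <- matrix(rnorm(10 * 3, mean = 5, sd = 2), 10, 3)

# standardize = TRUE: centre and scale every set with the TRAINING statistics.
mu    <- colMeans(x_train)
sigma <- apply(x_train, 2, sd)
x_train_std <- scale(x_train, center = mu, scale = sigma)
x_test_std  <- scale(x_test,  center = mu, scale = sigma)

# lambda.min.ratio: grid from lambda_max down to lambda_max * ratio.
# lambda_max = 2 is a made-up value; discom() derives its own internally.
lambda_max       <- 2
lambda_min_ratio <- 0.01
nlambda          <- 25
lambda_grid <- exp(seq(log(lambda_max),
                       log(lambda_max * lambda_min_ratio),
                       length.out = nlambda))
```

Reusing the training mean and standard deviation means the test columns are generally not exactly centred at zero, which is expected: no information from the test set is used when the data are transformed.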

Examples

# \donttest{
# Simple example with synthetic multimodal data
n <- 100
p <- 24

# Generate synthetic data with 3 blocks
set.seed(456)
x_train <- matrix(rnorm(n * p), n, p)
x_tuning <- matrix(rnorm(50 * p), 50, p)
x_test <- matrix(rnorm(30 * p), 30, p)

# True coefficients with sparse structure
beta_true <- c(rep(1.5, 4), rep(0, 4), rep(-1, 4), rep(0, 12))

# Response variables (drop() turns the n x 1 matrix product into a plain vector)
y_train <- drop(x_train %*% beta_true) + rnorm(n, sd = 0.5)
y_tuning <- drop(x_tuning %*% beta_true) + rnorm(50, sd = 0.5)
y_test <- drop(x_test %*% beta_true) + rnorm(30, sd = 0.5)

# Block sizes (3 blocks of 8 variables each)
pp <- c(8, 8, 8)

# Run DISCOM
result <- discom(beta = beta_true,
                 x = x_train, y = y_train,
                 x.tuning = x_tuning, y.tuning = y_tuning,
                 x.test = x_test, y.test = y_test,
                 nlambda = 25, nalpha = 15, pp = pp)

# View results
print(paste("Test error:", round(result$test.error, 4)))
print(paste("R-squared:", round(result$R2, 3)))
print(paste("Variables selected:", result$select))
# }
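
The example above uses fully observed matrices. The following sketch shows how a block-missing pattern might be constructed for data of the same shape. It assumes that missing entries are coded as NA, which this page does not state, and the row ranges are invented for illustration. It only builds the data and does not call discom().

```r
set.seed(456)
n  <- 100
pp <- c(8, 8, 8)
x_train <- matrix(rnorm(n * sum(pp)), n, sum(pp))

# Column indices of each block, derived from the block sizes.
block_id   <- rep(seq_along(pp), times = pp)
block_cols <- split(seq_len(sum(pp)), block_id)

# Hypothetical pattern: rows 1-40 are complete, rows 41-70 lack block 2,
# and rows 71-100 lack block 3.
x_miss <- x_train
x_miss[41:70,  block_cols[[2]]] <- NA
x_miss[71:100, block_cols[[3]]] <- NA

# Fraction of observed entries in each block.
observed <- sapply(block_cols, function(j) mean(!is.na(x_miss[, j])))
print(observed)
```

With this pattern, block 1 is fully observed while blocks 2 and 3 are each observed for 70 of the 100 training rows.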