AdapDiscom (version 1.0.0)

discom: DISCOM: Optimal Sparse Linear Prediction for Block-missing Multi-modality Data Without Imputation

Description

DISCOM: Optimal Sparse Linear Prediction for Block-missing Multi-modality Data Without Imputation

Usage

discom(
  beta,
  x,
  y,
  x.tuning,
  y.tuning,
  x.test,
  y.test,
  nlambda,
  nalpha,
  pp,
  robust = 0,
  standardize = TRUE,
  itcp = TRUE,
  lambda.min.ratio = NULL,
  k.value = 1.5
)

Value

The function returns a list containing the following components:

err

A multi-dimensional array storing the mean squared error (MSE) for all combinations of tuning parameters alpha and lambda.

est.error

The estimation error, calculated as the Euclidean distance between the estimated beta coefficients and the true beta (if provided).

lambda

The optimal lambda value, chosen by minimizing the prediction error on the tuning set.

alpha

A vector of the optimal alpha values, also selected on the tuning set.

train.error

The mean squared error on the tuning set for the optimal parameter combination.

test.error

The mean squared error on the test set for the final, optimal model.

y.pred

The predicted values for the observations in the test set.

R2

The R-squared value, which measures the proportion of variance explained by the model on the test set.

a0

The intercept of the final model.

a1

The vector of estimated beta coefficients for the final model.

select

The number of non-zero coefficients, representing the number of selected variables.

xtx

The final regularized covariance matrix used to fit the optimal model.

fpr

The False Positive Rate (FPR) if the true beta is provided. It measures the proportion of irrelevant variables incorrectly selected.

fnr

The False Negative Rate (FNR) if the true beta is provided. It measures the proportion of relevant variables incorrectly excluded.

lambda.all

The complete vector of lambda values evaluated during tuning.

beta.cov.lambda.max

The estimated beta coefficients using the maximum lambda value.

time

The total execution time of the function in seconds.
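
The accuracy and selection measures listed above (est.error, select, fpr, fnr, test.error, R2) follow standard definitions. The sketch below shows one way such quantities can be computed in base R. All inputs are made-up values rather than output from discom, and the exact formulas used inside the package may differ in detail (for instance, how R2 is normalized).

```r
# Hypothetical inputs: none of these come from a real discom() fit.
beta_true <- c(1.5, 1.5, 0, 0, -1, 0)
beta_hat  <- c(1.4, 1.6, 0, 0.2, 0, 0)   # made-up estimate
y_test    <- c(2.0, -1.0, 0.5, 3.0)
y_pred    <- c(1.8, -0.7, 0.4, 2.5)

# Estimation error: Euclidean (L2) distance between estimate and truth
est_error <- sqrt(sum((beta_hat - beta_true)^2))

# Number of selected variables: non-zero entries of the estimate
n_select <- sum(beta_hat != 0)

# False positive rate: share of truly zero coefficients that were selected
fpr <- sum(beta_hat != 0 & beta_true == 0) / sum(beta_true == 0)

# False negative rate: share of truly non-zero coefficients that were dropped
fnr <- sum(beta_hat == 0 & beta_true != 0) / sum(beta_true != 0)

# Test-set mean squared error and R-squared (assumed here to be 1 - SSE/SST)
test_mse <- mean((y_test - y_pred)^2)
r2 <- 1 - sum((y_test - y_pred)^2) / sum((y_test - mean(y_test))^2)
```

The fpr and fnr components are only meaningful when the true beta is supplied, since both rates are defined relative to the true support.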

Arguments

beta

Vector, true beta coefficients (optional)

x

Matrix, training data

y

Vector, training response

x.tuning

Matrix, tuning data

y.tuning

Vector, tuning response

x.test

Matrix, test data

y.test

Vector, test response

nlambda

Integer, number of lambda values

nalpha

Integer, number of alpha values

pp

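Vector of block sizes, giving the number of variables in each block (for example, c(8, 8, 8) for 24 variables in three blocks). discom supports 2, 3, or 4 blocks.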

robust

Integer, 0 for classical, 1 for robust estimation

standardize

Logical, whether to standardize covariates. When TRUE, the tuning and test sets are standardized using the training-data mean and standard deviation. When robust = 1, Huber-robust standard deviation estimates are used instead.

itcp

Logical, whether to include intercept

lambda.min.ratio

Numeric, the smallest lambda value in the grid, expressed as a fraction of `lambda.max` (the smallest lambda for which all coefficients are zero). When left as NULL, the default is `0.0001` if the number of observations (`nobs`) exceeds the number of variables (`nvars`), and `0.01` if `nobs < nvars`. Using a very small value in the latter case can lead to overfitting.

k.value

Numeric, tuning parameter for robust estimation
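
To make the pp and standardize arguments more concrete, the following base R sketch maps block sizes to column indices, creates an illustrative block-missing pattern, and standardizes a test matrix with training-set statistics. It is an illustration only: coding missing blocks as NA is an assumption here, the helper names (block_id, block_cols, ctr, scl) are hypothetical, and the package's internal standardization (including the Huber-robust variant) is not reproduced.

```r
set.seed(1)
pp <- c(8, 8, 8)                 # three blocks of 8 columns each
p  <- sum(pp)

# Map each column to its block: 1,...,1, 2,...,2, 3,...,3
block_id   <- rep(seq_along(pp), times = pp)
block_cols <- split(seq_len(p), block_id)

x_train <- matrix(rnorm(20 * p), 20, p)
x_test  <- matrix(rnorm(5 * p), 5, p)

# Illustrative block-missing pattern: rows 1-5 lack the whole second block
# (assumption: missing entries are coded as NA)
x_train[1:5, block_cols[[2]]] <- NA

# Column means and standard deviations from observed training entries only
ctr <- colMeans(x_train, na.rm = TRUE)
scl <- apply(x_train, 2, sd, na.rm = TRUE)

# Apply the training statistics to both the training and the test matrix
x_train_std <- scale(x_train, center = ctr, scale = scl)
x_test_std  <- scale(x_test,  center = ctr, scale = scl)
```

Reusing the training statistics for the other sets keeps all three data sets on the same scale, so coefficients estimated on the training data apply directly to the tuning and test covariates.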

Examples

# \donttest{
# Load the package that provides discom()
library(AdapDiscom)

# Simple example with synthetic multimodal data
n <- 100
p <- 24

# Generate synthetic data with 3 blocks
set.seed(456)
x_train <- matrix(rnorm(n * p), n, p)
x_tuning <- matrix(rnorm(50 * p), 50, p)
x_test <- matrix(rnorm(30 * p), 30, p)

# True coefficients with sparse structure
beta_true <- c(rep(1.5, 4), rep(0, 4), rep(-1, 4), rep(0, 12))

# Response variables
y_train <- x_train %*% beta_true + rnorm(n, sd = 0.5)
y_tuning <- x_tuning %*% beta_true + rnorm(50, sd = 0.5)
y_test <- x_test %*% beta_true + rnorm(30, sd = 0.5)

# Block sizes (3 blocks of 8 variables each)
pp <- c(8, 8, 8)

# Run DISCOM
result <- discom(beta = beta_true,
                 x = x_train, y = y_train,
                 x.tuning = x_tuning, y.tuning = y_tuning,
                 x.test = x_test, y.test = y_test,
                 nlambda = 25, nalpha = 15, pp = pp)

# View results
print(paste("Test error:", round(result$test.error, 4)))
print(paste("R-squared:", round(result$R2, 3)))
print(paste("Variables selected:", result$select))
# }
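
The tuning step controlled by nlambda, nalpha, and lambda.min.ratio can be pictured as a grid search over the err array. The sketch below builds a log-spaced lambda grid and locates the smallest entry of a toy error array. It makes several assumptions that are not confirmed by this page: that the grid is log-spaced, that lambda.max equals 2, and that err is indexed as [alpha, alpha, lambda]. The error values are random placeholders, not results from discom.

```r
# Hypothetical grid settings; lambda_max is an arbitrary placeholder
nlambda <- 25
nalpha  <- 15
lambda_max <- 2
lambda_min_ratio <- 0.01

# Log-spaced grid from lambda_max down to lambda_max * lambda_min_ratio
lambda_all <- exp(seq(log(lambda_max),
                      log(lambda_max * lambda_min_ratio),
                      length.out = nlambda))

# Toy error array; the [alpha, alpha, lambda] layout is an assumption
set.seed(2)
err <- array(runif(nalpha * nalpha * nlambda),
             dim = c(nalpha, nalpha, nlambda))

# Array position of the smallest tuning error (first one if there are ties)
best <- which(err == min(err), arr.ind = TRUE)[1, ]

# Under the assumed layout, the last index selects the lambda value
best_lambda <- lambda_all[best[[length(best)]]]
```

On a real fit, the same which(..., arr.ind = TRUE) call can be applied to result$err once its dimension layout has been checked with dim(result$err).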