metafuse.l: fit a GLM with fusion penalty for data integraion, with a fixed lambda

Description

Fit a GLM with fusion penalty on coefficients within each covariate at given lambda.

Usage

metafuse.l(X = X, y = y, sid = sid, fuse.which = c(0:ncol(X)), family = "gaussian", intercept = TRUE, alpha = 0, lambda = lambda)

Arguments

a matrix (or vector) of predictor(s), with dimensions of N*(p-1), where N is the total sample size of the integrated dataset

a vector of response, with length N; when family="cox", y is a data frame with cloumns time and status

sid

data source ID of length N, must contain integers numbered from 1 to K

fuse.which

a vector of integers from 0 to p-1, indicating which covariates are considered for fusion; 0 corresponds to the intercept; coefficients of covariates not in this vector are homogeneously estimated across all datasets

family

response vector type, "gaussian" if y is a continuous vector, "binomial" if y is binary vector, "poisson" if y is a count vector, "cox" if y is a data frame with cloumns time and status

intercept

if TRUE, intercept will be included, default is TRUE

alpha

the ratio of sparsity penalty to fusion penalty, default is 0 (i.e., no variable selection, only fusion)

lambda

tuning parameter for fusion penalty

Value

A list containing the following items will be returned: A list containing the following items will be returned:

Details

Adaptive lasso penalty is used. See Zou (2006) for detail.

References

Lu Tang, and Peter X.K. Song. Fused Lasso Approach in Regression Coefficients Clustering - Learning Parameter Heterogeneity in Data Integration. Journal of Machine Learning Research, 17(113):1-23, 2016.

Fei Wang, Lu Wang, and Peter X.K. Song. Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements. Biometrics, DOI:10.1111/biom.12496, 2016.

Examples

Run this code

########### generate data ###########
n <- 200    # sample size in each dataset (can also be a K-element vector)
K <- 10     # number of datasets for data integration
p <- 3      # number of covariates in X (including the intercept)

# the coefficient matrix of dimension K * p, used to specify the heterogeneous pattern
beta0 <- matrix(c(0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,   # beta_0 of intercept
                  0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,   # beta_1 of X_1
                  0.0,0.0,0.0,0.0,0.5,0.5,0.5,1.0,1.0,1.0),  # beta_2 of X_2
                K, p)

# generate a data set, family=c("gaussian", "binomial", "poisson", "cox")
data <- datagenerator(n=n, beta0=beta0, family="gaussian", seed=123)

# prepare the input for metafuse
y       <- data$y
sid     <- data$group
X       <- data[,-c(1,ncol(data))]

########### run metafuse ###########
# fit metafuse at a given lambda
metafuse.l(X=X, y=y, sid=sid, fuse.which=c(0,1,2), family="gaussian", intercept=TRUE,
          alpha=1, lambda=0.5)

Run the code above in your browser using DataLab