pgee.fit: Penalized Generalized Estimating Equations

Description

Estimate regression coefficients using Penalized Generalized Estimating Equations (PGEEs). Linear and binary logistic models are currently supported. In particular, can handle the case of bivariate correlated mixed outcomes, in which each cluster consists of one continuous outcome and one binary outcome.

Usage

pgee.fit(N, m, X, Z = NULL, y = NULL, yc = NULL, yb = NULL, wctype = "Ind", family = "Gaussian", lambda = 0, eps = 1e-06, maxiter = 1000, tol.coef = 0.001, tol.score = 0.001, init = NULL, standardize = TRUE, penalty = "SCAD", weights = rep(1, N), FDR = FALSE, fdr.corr = NULL, fdr.type = "all")

Arguments

N

Number of clusters.

m

Cluster size, assumed equal across all clusters. Should be set to 2 for family=="Mixed".

X

Design matrix. If family=="Mixed", this is the design matrix for the continuous responses. For family!="Mixed", X should have N*m rows; for family=="Mixed", it should have N rows. If standardize=TRUE, the first column must be a column of ones, corresponding to the intercept.

Z

Design matrix for the binary responses when family=="Mixed". Should not be provided for other family types. If not provided for family=="Mixed", it is set equal to X. Z should have N rows, and if standardize=TRUE, its first column must be a column of ones, corresponding to the intercept.

y

Response vector. Do not use this argument for family=="Mixed"; use the arguments yc and yb instead. Because the cluster size is assumed equal across clusters, the vector is assumed to have the form c(y_1, y_2, ..., y_N), with y_i = c(y_i1, ..., y_im).
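If the responses are stored as an N x m matrix with one row per cluster, the required layout is obtained by transposing before flattening, because R's as.vector() reads a matrix column by column. A minimal sketch with made-up toy values (Ymat and cluster are illustrative names); the rows of X presumably need to follow the same cluster-by-cluster order.

```r
# Toy responses stored as an N x m matrix: one row per cluster.
N <- 3
m <- 2
Ymat <- matrix(1:6, nrow = N, ncol = m, byrow = TRUE)
# Cluster 1 = (1, 2), cluster 2 = (3, 4), cluster 3 = (5, 6).

# as.vector() is column-major, so transpose first to stack whole clusters
# one after another: c(y_1, y_2, ..., y_N).
y <- as.vector(t(Ymat))
print(y)  # 1 2 3 4 5 6

# Cluster index of each element of y, handy for checking the layout.
cluster <- rep(seq_len(N), each = m)
print(cluster)
```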

yc

Continuous response vector. Use only for family=="Mixed".

yb

Binary (0/1) response vector. Use only for family=="Mixed".

wctype

Working correlation type; one of "Ind", "CS", or "AR1". For family=="Mixed", "CS" and "AR1" are equivalent.
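The three working correlation structures can be written down directly. The sketch below builds the m x m matrix for each wctype value using a hypothetical helper working_corr() and an illustrative association parameter alpha; it is not the package's internal code. It also shows why "CS" and "AR1" coincide when m = 2.

```r
# Hypothetical helper: m x m working correlation matrix for a given type.
# alpha is the association parameter (illustrative value below).
working_corr <- function(m, alpha, type = c("Ind", "CS", "AR1")) {
  type <- match.arg(type)
  switch(type,
    # Independence: identity matrix.
    Ind = diag(m),
    # Compound symmetry (exchangeable): alpha off the diagonal.
    CS = {
      R <- matrix(alpha, m, m)
      diag(R) <- 1
      R
    },
    # AR(1): correlation decays as alpha^|j - k|.
    AR1 = alpha^abs(outer(seq_len(m), seq_len(m), "-"))
  )
}

working_corr(3, 0.5, "CS")
working_corr(3, 0.5, "AR1")

# With m = 2 there is a single off-diagonal entry, equal to alpha in both
# cases, so the two structures are identical.
all.equal(working_corr(2, 0.5, "CS"), working_corr(2, 0.5, "AR1"))  # TRUE
```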

family

"Gaussian", "Binomial", or "Mixed" (use the last for bivariate mixed outcomes). Note that for "Binomial", currently only binary outcomes are supported.

lambda

Tuning parameter(s). For family=="Mixed", provide a vector of two tuning parameters: one for the continuous outcome coefficients and one for the binary outcome coefficients. Otherwise, provide a single tuning parameter.
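As an illustration only, a pair of tuning parameters can be mapped onto individual coefficients by repeating each value over its block. The helper expand_lambda() below is hypothetical and assumes the coefficients are ordered c(beta_c, beta_b), as for init; how pgee.fit applies the pair internally (for example, whether intercepts are penalized) is not documented here.

```r
# Hypothetical helper: one tuning value per coefficient, ordered c(beta_c, beta_b).
# pc, pb: numbers of continuous- and binary-outcome coefficients.
expand_lambda <- function(lambda, pc, pb) {
  stopifnot(length(lambda) == 2)
  c(rep(lambda[1], pc), rep(lambda[2], pb))
}

expand_lambda(c(0.1, 0.05), pc = 3, pb = 2)
```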

eps

Small perturbation (disturbance) used in the local quadratic approximation algorithm.
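In the standard perturbed local quadratic approximation, each penalty term is replaced by a quadratic whose weight is p'_lambda(|beta_j|) / (eps + |beta_j|); eps keeps this weight finite when a coefficient reaches zero. Assuming pgee.fit follows this standard form, the sketch below computes the weights with the SCAD derivative and the conventional constant a = 3.7. The function names scad_deriv() and lqa_weights() are hypothetical, and the package's internals may differ.

```r
# SCAD penalty derivative p'_lambda(|theta|) with the conventional a = 3.7.
scad_deriv <- function(theta, lambda, a = 3.7) {
  theta <- abs(theta)
  lambda * ((theta <= lambda) +
              pmax(a * lambda - theta, 0) / ((a - 1) * lambda) * (theta > lambda))
}

# Diagonal weights of the approximating quadratic penalty.
# Without eps, the weight would be infinite (0/0 or x/0) at beta_j = 0.
lqa_weights <- function(beta, lambda, eps = 1e-6) {
  scad_deriv(beta, lambda) / (eps + abs(beta))
}

beta <- c(0, 0.05, 0.3, 2)
w <- lqa_weights(beta, lambda = 0.5)
print(w)  # huge weight at 0, zero weight for the large coefficient
```

Coefficients near zero receive very large weights, which pushes them toward zero, while SCAD leaves large coefficients (here 2 > a * lambda) unpenalized.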

maxiter

The maximum number of iterations the Newton algorithm tries before declaring failure to converge.

tol.coef

Convergence of the Newton algorithm is declared when two conditions both hold: the L1-norm of the difference between successive iterates is less than tol.coef, AND the L1-norm of the penalized score is less than tol.score.
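The two-part stopping rule can be sketched as a small predicate. converged() is a hypothetical name, and pen_score stands for the penalized score evaluated at the new iterate; how that score is computed is not shown.

```r
# Both L1-norm conditions must hold for convergence to be declared.
converged <- function(beta_old, beta_new, pen_score,
                      tol.coef = 0.001, tol.score = 0.001) {
  sum(abs(beta_new - beta_old)) < tol.coef &&
    sum(abs(pen_score)) < tol.score
}

# Small step and small score: converged.
converged(c(1, 2), c(1.0002, 2.0003), pen_score = c(1e-4, -2e-4))  # TRUE
# Small step but the score is still too large: not converged.
converged(c(1, 2), c(1.0002, 2.0003), pen_score = c(1e-2, 0))      # FALSE
```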

tol.score

See tol.coef.

init

Vector of initial values for the regression coefficients. For family=="Mixed", should be c(init_c, init_b). Defaults to estimates from glm.
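Assuming the default corresponds to ordinary GLM fits that ignore within-cluster correlation, initial values for the mixed case could be built as below. The data are simulated toy values; X and Z already contain an intercept column, so the formulas drop R's automatic intercept with - 1.

```r
set.seed(1)
N <- 50
# Design matrices with N rows (mixed case), intercept column first.
X <- cbind(1, matrix(rnorm(N * 2), nrow = N, ncol = 2))
Z <- X
yc <- drop(X %*% c(1, 2, 0)) + rnorm(N)
yb <- rbinom(N, 1, plogis(drop(Z %*% c(0.5, -1, 0))))

# Independence GLM fits for each outcome.
init_c <- coef(glm(yc ~ X - 1, family = gaussian()))
init_b <- coef(glm(yb ~ Z - 1, family = binomial()))

# Concatenate in the documented order c(init_c, init_b).
init <- c(init_c, init_b)
print(init)
```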

standardize

Standardize the design matrices prior to estimation?
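One common form of standardization is sketched below, assuming the leading intercept column is left untouched while every other column is centered and scaled to unit standard deviation; this is why the intercept column of ones must come first. The package's exact scaling, and how it reports coefficients afterwards, are not documented here; standardize_design() is a hypothetical name.

```r
# Center and scale all columns except the leading intercept column.
standardize_design <- function(X) {
  stopifnot(all(X[, 1] == 1))  # first column must be the intercept
  Xs <- X
  Xs[, -1] <- scale(X[, -1, drop = FALSE])
  Xs
}

set.seed(2)
X <- cbind(1, matrix(rnorm(40), nrow = 20, ncol = 2))
Xs <- standardize_design(X)
colMeans(Xs)          # intercept stays 1, other columns are (numerically) 0
apply(Xs[, -1], 2, sd) # unit standard deviations
```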

penalty

"SCAD", "MCP", or "LASSO".

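For reference, the three penalties in their usual forms, with tuning parameter lambda. The SCAD constant a = 3.7 and MCP constant gamma = 3 are common defaults assumed here; the values pgee.fit uses are not stated on this page, and penalty_value() is a hypothetical helper.

```r
penalty_value <- function(theta, lambda, type = c("SCAD", "MCP", "LASSO"),
                          a = 3.7, gamma = 3) {
  type <- match.arg(type)
  th <- abs(theta)
  switch(type,
    # LASSO: linear in |theta| everywhere.
    LASSO = lambda * th,
    # SCAD: LASSO near zero, quadratic transition, then constant.
    SCAD = ifelse(th <= lambda, lambda * th,
             ifelse(th <= a * lambda,
                    (2 * a * lambda * th - th^2 - lambda^2) / (2 * (a - 1)),
                    lambda^2 * (a + 1) / 2)),
    # MCP: concave quadratic up to gamma * lambda, then constant.
    MCP = ifelse(th <= gamma * lambda, lambda * th - th^2 / (2 * gamma),
                 gamma * lambda^2 / 2)
  )
}

theta <- c(0, 0.5, 1, 5)
sapply(c("LASSO", "SCAD", "MCP"), function(p) penalty_value(theta, 1, p))
```

Unlike LASSO, SCAD and MCP level off for large coefficients, so large effects are not shrunk as heavily.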
weights

Vector of cluster weights. All observations in a cluster are assumed to have the same weight.
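Because the responses are stacked cluster by cluster, a length-N vector of cluster weights corresponds to the N*m observations by repeating each weight m times. A toy sketch (obs_weights is an illustrative name):

```r
N <- 3
m <- 2
weights <- c(1, 2, 0.5)               # one weight per cluster
obs_weights <- rep(weights, each = m) # same weight for every observation in a cluster
print(obs_weights)                    # 1.0 1.0 2.0 2.0 0.5 0.5
```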

FDR

Should the false discovery rate be estimated for family=="Mixed"? Currently, FDR cannot be estimated for other family types.

fdr.corr

Association parameter to use in FDR estimation. The default is to use the association parameter estimated from the PGEEs.

fdr.type

Estimate the FDR for only the coefficients corresponding to the continuous outcomes ("continuous"), for only the coefficients corresponding to the binary outcomes ("binary"), or for all coefficients ("all", the default).

Value

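A list of fitted results. The Examples below access its coefficients and vcov components; use str() on the result to see all components.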

Details

pgee.fit estimates the regression coefficients for a single value of the tuning parameter (or a single pair of tuning parameters in the mixed outcomes case). To select optimal tuning parameter(s) via k-fold cross-validation, see cv.pgee.

For bivariate mixed outcomes, the false discovery rate can be estimated.

Examples


library(pgee.mixed)
set.seed(100)
# Gaussian
N <- 100
m <- 10
p <- 10
y <- rnorm(N * m)
# If you want standardize = TRUE, you must provide an intercept.
X <- cbind(1, matrix(rnorm(N * m * (p - 1)), N * m, p - 1))
fit <- pgee.fit(X = X, y = y, N = N, m = m, lambda = 0.5, wctype = "CS",
            family = "Gaussian")
str(fit)
fit$coefficients
fit$vcov

# Binary
y <- sample(0:1, N*m, replace = TRUE)
fit <- pgee.fit(X = X, y = y, N = N, m = m, lambda = 0.1, wctype = "CS",
            family = "Binomial")
str(fit)
fit$coefficients
fit$vcov

# Bivariate mixed outcomes
# Generate some data
Bc <- c(2.0, 3.0, 1.5, 2.0, rep(0, times = p - 4))
Bb <- c(0.7, -0.7, -0.4, rep(0, times = p - 3))
dat <- gen_mixed_data(Bc, Bb, N, 0.5)
# Estimate regression coefficients and false discovery rate
fit <- pgee.fit(X = dat$X, yc = dat$yc, yb = dat$yb, N = N, m = 2,
            wctype = "CS", family = "Mixed", lambda = c(0.1, 0.05),
            FDR = TRUE)
str(fit)
fit$coefficients
fit$vcov
