Description

Assumes a normal linear model for exposure given covariates, and additive normal processing errors and measurement errors acting on the poolwise mean exposure. A manuscript fully describing the approach is under review.
Usage

p_logreg_xerrors(g, y, xtilde, c = NULL, errors = "processing",
  nondiff_pe = TRUE, nondiff_me = TRUE, constant_pe = TRUE,
  prev = NULL, samp_y1y0 = NULL, approx_integral = TRUE,
  estimate_var = TRUE, start_nonvar_var = c(0.01, 1),
  lower_nonvar_var = c(-Inf, 1e-04), upper_nonvar_var = c(Inf, Inf),
  jitter_start = 0.01, hcubature_list = list(tol = 1e-08),
  nlminb_list = list(control = list(trace = 1, eval.max = 500,
    iter.max = 500)),
  hessian_list = list(method.args = list(r = 4)),
  nlminb_object = NULL)
Arguments

g: Numeric vector of pool sizes, i.e. the number of members in each pool.
y: Numeric vector of poolwise Y values, coded 0 if all members are controls and 1 if all members are cases.
xtilde: Numeric vector (or list of numeric vectors, if some pools have replicates) of Xtilde values.
c: Numeric matrix of poolwise C values (if any), with one row for each pool. Can be a vector if there is only one covariate.
errors: Character string specifying the errors that X is subject to. Choices are "neither", "processing" for processing error only, "measurement" for measurement error only, and "both".
nondiff_pe: Logical value for whether to assume the processing error variance is non-differential, i.e. the same in case pools and control pools.
nondiff_me: Logical value for whether to assume the measurement error variance is non-differential, i.e. the same in case pools and control pools.
constant_pe: Logical value for whether to assume the processing error variance is constant with pool size. If FALSE, the processing error variance is assumed to increase with pool size such that, for example, the processing error affecting a pool twice as large as another has twice the variance.
prev: Numeric value specifying the disease prevalence, allowing for valid estimation of the intercept with case-control sampling. Can specify samp_y1y0 instead if sampling rates are known.
samp_y1y0: Numeric vector of length 2 specifying the sampling probabilities for cases and controls, allowing for valid estimation of the intercept with case-control sampling. Can specify prev instead if it is easier.
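The intercept correction that prev and samp_y1y0 enable can be sketched outside the package. The snippet below is a minimal illustration in the spirit of Weinberg and Umbach (1999), with made-up numbers; it is not necessarily the package's exact internal computation.

```r
# Under case-control sampling with known sampling probabilities s1 (cases)
# and s0 (controls), the intercept fit to the sampled data is shifted by
# log(s1 / s0) relative to the population-scale intercept.
# Hypothetical numbers:
samp_y1y0 <- c(0.80, 0.10)                 # sampling probabilities: cases, controls
offset <- log(samp_y1y0[1] / samp_y1y0[2]) # log(8), about 2.08

naive_intercept <- 0.40                    # hypothetical estimate from sampled data
corrected_intercept <- naive_intercept - offset
corrected_intercept                        # about -1.68
```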
approx_integral: Logical value for whether to use the probit approximation for the logistic-normal integral, to avoid numerically integrating the X's out of the likelihood function.
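The probit approximation referred to here can be illustrated with base R alone. The sketch below uses the standard expit(z) ~ pnorm(z / 1.702) device, under which the logistic-normal integral collapses to a single pnorm() call; the parameter values are illustrative and the code is not taken from the package internals.

```r
# Logistic-normal integral: E[expit(b0 + b1 * X)] with X ~ N(mu, sigma^2).
# Illustrative parameter values:
b0 <- -1; b1 <- 0.5; mu <- 1; sigma <- 1.5

# Numerical integration (what approx_integral = TRUE avoids having to do):
exact <- integrate(
  function(x) plogis(b0 + b1 * x) * dnorm(x, mu, sigma),
  lower = -Inf, upper = Inf
)$value

# Probit approximation: expit(z) ~ pnorm(z / 1.702), and E[pnorm(a + b * X)]
# has a closed form, so the integral reduces to one pnorm() call:
approx <- pnorm((b0 + b1 * mu) / sqrt(1.702^2 + b1^2 * sigma^2))

c(exact = exact, approx = approx)  # the two agree closely here
```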
estimate_var: Logical value for whether to return the variance-covariance matrix for the parameter estimates.
start_nonvar_var: Numeric vector of length 2 specifying starting values for non-variance terms and variance terms, respectively.
lower_nonvar_var: Numeric vector of length 2 specifying lower bounds for non-variance terms and variance terms, respectively.
upper_nonvar_var: Numeric vector of length 2 specifying upper bounds for non-variance terms and variance terms, respectively.
jitter_start: Numeric value specifying the standard deviation for mean-0 normal jitters to add to the starting values for a second attempt at maximizing the log-likelihood, should the initial call to nlminb result in non-convergence. Set to NULL for no second attempt.
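The retry behavior can be sketched with a toy objective function. The quadratic below is a hypothetical stand-in for the actual negative log-likelihood; only the jitter-and-retry pattern is the point.

```r
set.seed(123)

# Toy stand-in for the negative log-likelihood (minimized at c(2, -1)):
negloglik <- function(theta) (theta[1] - 2)^2 + (theta[2] + 1)^2

start <- c(0.01, 0.01)
jitter_start <- 0.01

fit <- nlminb(start = start, objective = negloglik)

# On non-convergence (convergence code != 0), jitter the starting values
# with mean-0 normal noise of SD jitter_start and try once more:
if (fit$convergence != 0 && !is.null(jitter_start)) {
  fit <- nlminb(
    start = start + rnorm(length(start), mean = 0, sd = jitter_start),
    objective = negloglik
  )
}
fit$par  # near c(2, -1)
```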
hcubature_list: List of arguments to pass to hcubature for numerical integration. Only used if approx_integral = FALSE.
nlminb_list: List of arguments to pass to nlminb for log-likelihood maximization.
hessian_list: List of arguments to pass to hessian for approximating the Hessian matrix. Only used if estimate_var = TRUE.
nlminb_object: Object returned from nlminb in a prior call. Useful for bypassing log-likelihood maximization if you just want to re-estimate the Hessian matrix with different options.
Value

List containing:
Numeric vector of parameter estimates (theta.hat in the examples below).
Variance-covariance matrix (if estimate_var = TRUE).
nlminb object returned from maximizing the log-likelihood function.
Akaike information criterion (AIC).
References

Schisterman, E.F., Vexler, A., Mumford, S.L. and Perkins, N.J. (2010) "Hybrid pooled-unpooled design for cost-efficient measurement of biomarkers." Stat. Med. 29(5): 597--613.
Weinberg, C.R. and Umbach, D.M. (1999) "Using pooled exposure assessment to improve efficiency in case-control studies." Biometrics 55: 718--726.
Weinberg, C.R. and Umbach, D.M. (2014) "Correction to 'Using pooled exposure assessment to improve efficiency in case-control studies' by Clarice R. Weinberg and David M. Umbach; 55, 718--726, September 1999." Biometrics 70: 1061.
Examples

## Not run:
# Load dataset containing (Y, Xtilde, C) values for pools of size 1, 2, and
# 3. Xtilde values are affected by processing error.
data(pdat1)
# Estimate log-OR for X and Y adjusted for C, ignoring processing error
fit1 <- p_logreg_xerrors(
g = pdat1$g,
y = pdat1$allcases,
xtilde = pdat1$xtilde,
c = pdat1$c,
errors = "neither"
)
fit1$theta.hat
# Repeat, but accounting for processing error. Closer to true log-OR of 0.5.
fit2 <- p_logreg_xerrors(
g = pdat1$g,
y = pdat1$allcases,
xtilde = pdat1$xtilde,
c = pdat1$c,
errors = "processing"
)
fit2$theta.hat
## End(Not run)