A dichotomous variable, measured once or more per person by a randomized response method, serves as the dependent variable; one or more continuous and/or categorical variables serve as predictors.
RRlog(
formula,
data,
model,
p,
group,
n.response = 1,
LR.test = TRUE,
fit.n = 3,
EM.max = 1000,
optim.max = 500,
...
)
Returns an object of class RRlog, which can be analysed by the generic method summary. In the table of coefficients, the column Wald refers to the \(\chi^2\) test statistic, computed as \(\chi^2 = z^2 = \text{Estimate}^2 / \text{StdErr}^2\). If LR.test = TRUE, the test statistic deltaG2 is the likelihood-ratio test statistic, which is computed by fitting a nested logistic model without the corresponding predictor.
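For illustration, the Wald statistic can be reproduced by hand from any coefficient estimate and its standard error (the numbers below are made up and are not RRlog output):

estimate <- 0.85                            # hypothetical regression coefficient
std.err  <- 0.32                            # hypothetical standard error
wald <- estimate^2 / std.err^2              # Chi^2 = z^2 = Estimate^2 / StdErr^2
pchisq(wald, df = 1, lower.tail = FALSE)    # corresponding p-value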
formula: a formula specifying the regression model; see formula.
data: an optional data.frame in which the variables of formula can be found.
model: available RR models: "Warner", "UQTknown", "UQTunknown", "Mangat", "Kuk", "FR", "Crosswise", "Triangular", "CDM", "CDMsym", "SLD", "custom". See vignette("RRreg") for details.
p: randomization probability/probabilities (depending on the model; see RRuni for details).
group: a vector specifying group membership. Can be omitted for single-group RR designs (e.g., Warner). For two-group RR designs (e.g., CDM or SLD), use 1 and 2 to indicate group membership, matching the respective randomization probabilities p[1] and p[2]. If an RR design and a direct question (DQ) were both used in the study, the group indices are set to 0 (DQ) and 1 (RR; 1 or 2 for two-group RR designs). This can be used to test whether the RR design leads to a different prevalence estimate, by including a dummy variable for the question format (RR vs. DQ) as a predictor. If the corresponding regression coefficient is significant, the prevalence estimates differ between RR and DQ. Similarly, interaction hypotheses can be tested (e.g., that the correlation between a sensitive attribute and a predictor is found only with the RR but not with the DQ design) by including the interaction of the DQ-RR dummy variable and the predictor in formula (e.g., RR ~ dummy * predictor); see the sketch following the argument descriptions.
n.response: number of responses per participant, e.g., if a participant responds to 5 RR questions with the same randomization probability p (either a single number, if all participants give the same number of responses, or a vector).
LR.test: test regression coefficients by a likelihood-ratio test, i.e., by fitting the model repeatedly while excluding one parameter at a time (each nested model is fitted only once, which can result in local maxima). The likelihood-ratio test statistic \(G^2(df = 1)\) is reported in the table of coefficients as deltaG2 (an explicit model comparison via anova.RRlog is sketched after the examples below).
fit.n: number of fitting replications using random starting values to avoid local maxima.
EM.max: maximum number of iterations of the EM algorithm. If EM.max = 0, the EM algorithm is skipped.
optim.max: maximum number of iterations within each run of optim.
...: ignored.
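A hedged sketch of the RR-vs.-DQ comparison described for group above, assuming that RRgen returns the columns true and response (as in the examples below) and simulating honest direct answers for the DQ group:

library(RRreg)
set.seed(123)
rr <- RRgen(500, pi = .3, "Warner", p = .9)      # RR group (Warner design)
dq <- data.frame(true = rbinom(500, 1, .3))      # DQ group: direct answers,
dq$response <- dq$true                           #   simulated as honest here
dat2 <- rbind(rr[, c("true", "response")], dq)
dat2$format <- rep(c("RR", "DQ"), each = 500)    # question-format dummy
grp <- rep(c(1, 0), each = 500)                  # 1 = RR (Warner), 0 = DQ
fit <- RRlog(response ~ format, data = dat2, model = "Warner",
             p = .9, group = grp, fit.n = 1)
summary(fit)   # a significant 'format' coefficient indicates that the
               # prevalence estimates differ between RR and DQ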
Author: Daniel W. Heck
The logistic regression model is fitted first by an EM algorithm, in which the dependent RR variable is treated as a misclassified binary variable (Magder & Hughes, 1997). The results are used as starting values for a Newton-Raphson-based optimization by optim.
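As a purely conceptual sketch (not the package's internal code), the E-step under Warner's design computes the posterior probability of the true state from the observed response, the randomization probability p, and the current logistic prediction:

# conceptual E-step for Warner's design (illustration only, not RRreg internals)
e_step_warner <- function(response, pred, p) {
  # misclassification probabilities: P(yes | true = 1) = p, P(yes | true = 0) = 1 - p
  lik1 <- ifelse(response == 1, p, 1 - p)
  lik0 <- ifelse(response == 1, 1 - p, p)
  lik1 * pred / (lik1 * pred + lik0 * (1 - pred))   # posterior P(true = 1 | response)
}
# the M-step refits the logistic regression using these posterior probabilities
# as fractional outcomes and iterates until convergence (cf. Magder & Hughes, 1997)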
van den Hout, A., van der Heijden, P. G., & Gilchrist, R. (2007). The logistic regression model with response variables subject to randomized response. Computational Statistics & Data Analysis, 51, 6060-6069.
See also anova.RRlog for model comparisons, plot.RRlog for plotting predicted regression curves, and vignette('RRreg') or https://www.dwheck.de/vignettes/RRreg.html for a detailed description of the RR models and the appropriate definition of p.
library(RRreg)

# generate data set without biases
dat <- RRgen(1000, pi = .3, "Warner", p = .9)
dat$covariate <- rnorm(1000)
dat$covariate[dat$true == 1] <- rnorm(sum(dat$true == 1), .4, 1)
# analyse
ana <- RRlog(response ~ covariate, dat, "Warner", p = .9, fit.n = 1)
summary(ana)
# check with true, latent states:
glm(true ~ covariate, dat, family = binomial(link = "logit"))
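# As a follow-up (not part of the original example): regression coefficients can
# also be tested by an explicit model comparison, assuming that anova.RRlog
# (see above) compares two nested RRlog fits:
dat$cov2 <- rnorm(1000)                      # irrelevant predictor, for illustration
full <- RRlog(response ~ covariate + cov2, dat, "Warner", p = .9, fit.n = 1)
reduced <- RRlog(response ~ covariate, dat, "Warner", p = .9, fit.n = 1)
anova(full, reduced)                         # likelihood-ratio test (deltaG2) for 'cov2'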