A dichotomous variable, measured once or more per person by a randomized response method, serves as the dependent variable; one or more continuous and/or categorical variables serve as predictors.
RRlog(
formula,
data,
model,
p,
group,
n.response = 1,
LR.test = TRUE,
fit.n = 3,
EM.max = 1000,
optim.max = 500,
...
)
Returns an object of class RRlog, which can be analysed by the generic method summary. In the table of coefficients, the column Wald refers to the \(\chi^2\) test statistic, computed as \(\chi^2 = z^2 = \text{Estimate}^2 / \text{StdErr}^2\). If LR.test = TRUE, the test statistic deltaG2 is the likelihood-ratio test statistic, which is computed by fitting a nested logistic model without the corresponding predictor.
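For illustration, the Wald statistic can be reproduced by hand from any coefficient estimate and its standard error (the numbers below are made up and are not RRlog output):

estimate <- 0.85                            # hypothetical regression coefficient
std.err  <- 0.32                            # hypothetical standard error
wald <- estimate^2 / std.err^2              # Chi^2 = z^2 = Estimate^2 / StdErr^2
pchisq(wald, df = 1, lower.tail = FALSE)    # corresponding p-value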
formula: a formula specifying the regression model; see formula.
data: an optional data.frame in which the variables of formula can be found.
model: available RR models: "Warner", "UQTknown", "UQTunknown", "Mangat", "Kuk", "FR", "Crosswise", "Triangular", "CDM", "CDMsym", "SLD", "custom". See vignette("RRreg") for details.
p: randomization probability/probabilities (depending on the model; see RRuni for details).
group: a vector specifying group membership. Can be omitted for single-group RR designs (e.g., Warner). For two-group RR designs (e.g., CDM or SLD), use 1 and 2 to indicate group membership, matching the respective randomization probabilities p[1] and p[2]. If an RR design and a direct question (DQ) were both used in the study, the group indices are set to 0 (DQ) and 1 (RR; 1 or 2 for two-group RR designs). This can be used to test whether the RR design leads to a different prevalence estimate, by including a dummy variable for the question format (RR vs. DQ) as a predictor. If the corresponding regression coefficient is significant, the prevalence estimates differ between RR and DQ. Similarly, interaction hypotheses can be tested (e.g., that the correlation between a sensitive attribute and a predictor is found only with the RR but not with the DQ design) by including the interaction of the DQ-RR dummy variable and the predictor in formula (e.g., RR ~ dummy * predictor); see the sketch following the argument descriptions.
n.response: number of responses per participant, e.g., if a participant responds to 5 RR questions with the same randomization probability p (either a single number, if all participants give the same number of responses, or a vector).
LR.test: test regression coefficients by a likelihood-ratio test, i.e., by fitting the model repeatedly while excluding one parameter at a time (each nested model is fitted only once, which can result in local maxima). The likelihood-ratio test statistic \(G^2(df = 1)\) is reported in the table of coefficients as deltaG2 (an explicit model comparison via anova.RRlog is sketched after the examples below).
fit.n: number of fitting replications using random starting values to avoid local maxima.
EM.max: maximum number of iterations of the EM algorithm. If EM.max = 0, the EM algorithm is skipped.
optim.max: maximum number of iterations within each run of optim.
...: ignored.
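A hedged sketch of the RR-vs.-DQ comparison described for group above, assuming that RRgen returns the columns true and response (as in the examples below) and simulating honest direct answers for the DQ group:

library(RRreg)
set.seed(123)
rr <- RRgen(500, pi = .3, "Warner", p = .9)      # RR group (Warner design)
dq <- data.frame(true = rbinom(500, 1, .3))      # DQ group: direct answers,
dq$response <- dq$true                           #   simulated as honest here
dat2 <- rbind(rr[, c("true", "response")], dq)
dat2$format <- rep(c("RR", "DQ"), each = 500)    # question-format dummy
grp <- rep(c(1, 0), each = 500)                  # 1 = RR (Warner), 0 = DQ
fit <- RRlog(response ~ format, data = dat2, model = "Warner",
             p = .9, group = grp, fit.n = 1)
summary(fit)   # a significant 'format' coefficient indicates that the
               # prevalence estimates differ between RR and DQ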
Author: Daniel W. Heck
The logistic regression model is fitted first by an EM algorithm, in which the dependent RR variable is treated as a misclassified binary variable (Magder & Hughes, 1997). The results are used as starting values for a Newton-Raphson-based optimization by optim.
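As a purely conceptual sketch (not the package's internal code), the E-step under Warner's design computes the posterior probability of the true state from the observed response, the randomization probability p, and the current logistic prediction:

# conceptual E-step for Warner's design (illustration only, not RRreg internals)
e_step_warner <- function(response, pred, p) {
  # misclassification probabilities: P(yes | true = 1) = p, P(yes | true = 0) = 1 - p
  lik1 <- ifelse(response == 1, p, 1 - p)
  lik0 <- ifelse(response == 1, 1 - p, p)
  lik1 * pred / (lik1 * pred + lik0 * (1 - pred))   # posterior P(true = 1 | response)
}
# the M-step refits the logistic regression using these posterior probabilities
# as fractional outcomes and iterates until convergence (cf. Magder & Hughes, 1997)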
van den Hout, A., van der Heijden, P. G., & Gilchrist, R. (2007). The logistic regression model with response variables subject to randomized response. Computational Statistics & Data Analysis, 51, 6060-6069.
See also anova.RRlog for model comparisons, plot.RRlog for plotting predicted regression curves, and vignette('RRreg') or https://www.dwheck.de/vignettes/RRreg.html for a detailed description of the RR models and the appropriate definition of p.
library(RRreg)

# generate data set without biases
dat <- RRgen(1000, pi = .3, "Warner", p = .9)
dat$covariate <- rnorm(1000)
dat$covariate[dat$true == 1] <- rnorm(sum(dat$true == 1), .4, 1)
# analyse
ana <- RRlog(response ~ covariate, dat, "Warner", p = .9, fit.n = 1)
summary(ana)
# check with true, latent states:
glm(true ~ covariate, dat, family = binomial(link = "logit"))
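# As a follow-up (not part of the original example): regression coefficients can
# also be tested by an explicit model comparison, assuming that anova.RRlog
# (see above) compares two nested RRlog fits:
dat$cov2 <- rnorm(1000)                      # irrelevant predictor, for illustration
full <- RRlog(response ~ covariate + cov2, dat, "Warner", p = .9, fit.n = 1)
reduced <- RRlog(response ~ covariate, dat, "Warner", p = .9, fit.n = 1)
anova(full, reduced)                         # likelihood-ratio test (deltaG2) for 'cov2'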