empirical_bayes: Estimate proportions of liars in multiple samples using empirical Bayes

Description

This function creates a prior by fitting a Beta distribution to the heads/N vector, using MASS::fitdistr(). The prior is then updated using data from each individual sample to give the posterior distributions.
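
As a rough illustration of the prior-fitting step described above, the sketch below fits a Beta distribution to a vector of observed proportions with MASS::fitdistr(). It is not the package's internal code: the counts, the method-of-moments starting values, and the lower bound are all assumptions chosen for illustration.

```r
# Illustrative sketch only; not the package's actual implementation.
library(MASS)

# Hypothetical counts of "good" reports and sample sizes for five samples.
heads <- c(30, 38, 45, 33, 41)
N     <- c(50, 52, 57, 55, 60)

# Observed proportions; a Beta fit needs values strictly between 0 and 1.
props <- heads / N

# Method-of-moments starting values (an assumption made here for stability).
m <- mean(props)
v <- var(props)
k <- m * (1 - m) / v - 1
start <- list(shape1 = m * k, shape2 = (1 - m) * k)

# Maximum-likelihood Beta fit; the lower bound keeps both shapes positive.
fit <- MASS::fitdistr(props, densfun = "beta", start = start, lower = 0.01)
fit$estimate
```

The fitted shape1 and shape2 values would then parameterize the prior that each sample's data updates.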

Usage

empirical_bayes(heads, ...)
# S3 method for default
empirical_bayes(heads, N, P, ...)
# S3 method for formula
empirical_bayes(formula, data, P, subset, ...)

Arguments

heads

A vector giving the number of "good" outcomes reported in each sample

...

Ignored

N

A vector of sample sizes

P

Probability of the bad outcome

formula

A two-sided formula of the form heads ~ group. heads is a logical vector specifying whether the "good" outcome was reported. group specifies the sample.

data

A data frame or matrix. Each row represents one individual.

subset

A logical or numeric vector specifying the subset of data to use

Value

A list with two components:

prior, the calculated empirical prior (an object of class densityFunction).
posteriors, a list of posterior distributions (objects of class densityFunction). If heads was named, the list will have the same names.

Details

The formula interface allows calling the function directly on experimental data.
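
A minimal sketch of how individual-level data relates to the heads and N vectors of the default method. It assumes, without confirmation from the package source, that the formula interface simply counts TRUE values and rows per group. The data frame and the variable reported_heads are hypothetical.

```r
# Illustrative sketch only; assumes per-group counting of TRUE values.
raw_data <- data.frame(
  report = c("heads", "tails", "heads", "heads", "heads", "heads", "tails"),
  group  = c("A", "A", "A", "B", "B", "B", "B")
)

# Logical vector: TRUE where the "good" outcome was reported.
reported_heads <- raw_data$report == "heads"

# Number of "good" reports and number of individuals in each group.
heads <- tapply(reported_heads, raw_data$group, sum)
N     <- tapply(reported_heads, raw_data$group, length)
```

Under that assumption, passing these named vectors to the default method would be comparable to calling the formula method on raw_data.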

Examples


# NOT RUN {
# starting from counts: named vector of "good" reports, plus sample sizes
heads <- c(Baseline = 30, Treatment1 = 38, Treatment2 = 45)
N <- c(50, 52, 57)
res <- empirical_bayes(heads, N, P = 0.5)

# compare the posteriors of two samples
compare_dists(res$posteriors$Baseline, res$posteriors$Treatment1)

# plot the prior, then overlay the posterior of each sample
plot(res$prior, ylim = c(0, 4), col = "grey", lty = 2)
plot(res$posteriors$Baseline, add = TRUE, col = "blue")
plot(res$posteriors$Treatment1, add = TRUE, col = "orange")
plot(res$posteriors$Treatment2, add = TRUE, col = "red")


# starting from raw data (one row per individual):
set.seed(1)  # make the simulated reports reproducible
raw_data <- data.frame(
  report = sample(c("heads", "tails"),
    size = 300,
    replace = TRUE,
    prob = c(.8, .2)
  ),
  group = rep(LETTERS[1:10], each = 30)
)
empirical_bayes(I(report == "heads") ~ group, data = raw_data, P = 0.5)
# }
