rstanarm (version 2.15.3)

rstanarm-datasets: Datasets for rstanarm examples

Description

Small datasets for use in rstanarm examples and vignettes.

Arguments

Format

bball1970

Data on hits and at-bats from the 1970 Major League Baseball season for 18 players.

Source: Efron and Morris (1975).

18 obs. of 5 variables

  • Player Player's last name

  • Hits Number of hits in the first 45 at-bats of the season

  • AB Number of at-bats (45 for all players)

  • RemainingAB Number of remaining at-bats (different for most players)

  • RemainingHits Number of remaining hits

bball2006

Hits and at-bats for the entire 2006 American League season of Major League Baseball.

Source: Carpenter (2009)

302 obs. of 2 variables

  • y Number of hits

  • K Number of at-bats

kidiq

Data from a survey of adult American women and their children (a subsample from the National Longitudinal Survey of Youth).

Source: Gelman and Hill (2007)

434 obs. of 4 variables

  • kid_score Child's IQ score

  • mom_hs Indicator for whether the mother has a high school degree

  • mom_iq Mother's IQ score

  • mom_age Mother's age

mortality

Surgical mortality rates in 12 hospitals performing cardiac surgery in babies.

Source: Spiegelhalter et al. (1996).

12 obs. of 2 variables

  • y Number of deaths

  • K Number of surgeries

radon

Data on radon levels in houses in the state of Minnesota.

Source: Gelman and Hill (2007)

919 obs. of 4 variables

  • log_radon Radon measurement from the house (log scale)

  • log_uranium Uranium level in the county (log scale)

  • floor Indicator for radon measurement made on the first floor of the house (0 = basement, 1 = first floor)

  • county County name (factor)

roaches

Data on the efficacy of a pest management system at reducing the number of roaches in urban apartments.

Source: Gelman and Hill (2007)

262 obs. of 6 variables

  • y Number of roaches caught

  • roach1 Pretreatment number of roaches

  • treatment Treatment indicator

  • senior Indicator for only eldery residents in building

  • exposure2 Number of days for which the roach traps were used

tumors

Tarone (1982) provides a data set of tumor incidence in historical control groups of rats; specifically endometrial stromal polyps in female lab rats of type F344.

Source: Gelman and Hill (2007)

71 obs. of 2 variables

  • y Number of rats with tumors

  • K Number of rats

wells

A survey of 3200 residents in a small area of Bangladesh suffering from arsenic contamination of groundwater. Respondents with elevated arsenic levels in their wells had been encouraged to switch their water source to a safe public or private well in the nearby area and the survey was conducted several years later to learn which of the affected residents had switched wells.

Souce: Gelman and Hill (2007)

3020 obs. of 5 variables

  • switch Indicator for well-switching

  • arsenic Arsenic level in respondent's well

  • dist Distance (meters) from the respondent's house to the nearest well with safe drinking water.

  • association Indicator for member(s) of household participate in community organizations

  • educ Years of education (head of household)

References

Carpenter, B. (2009) Bayesian estimators for the beta-binomial model of batting ability. http://lingpipe-blog.com/2009/09/23/

Efron, B. and Morris, C. (1975) Data analysis using Stein's estimator and its generalizations. Journal of the American Statistical Association 70(350), 311--319.

Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge, UK. http://stat.columbia.edu/~gelman/arm/

Spiegelhalter, D., Thomas, A., Best, N., & Gilks, W. (1996) BUGS 0.5 Examples. MRC Biostatistics Unit, Institute of Public health, Cambridge, UK.

Tarone, R. E. (1982) The use of historical control information in testing for a trend in proportions. Biometrics 38(1):215--220.

Examples

Run this code
# NOT RUN {
# Using 'kidiq' dataset 
fit <- stan_lm(kid_score ~ mom_hs * mom_iq, data = kidiq, 
               prior = R2(location = 0.30, what = "mean"),
               # the next line is only to make the example go fast enough
               chains = 1, iter = 500, seed = 12345)
pp_check(fit, nreps = 20)
# }
# NOT RUN {
bayesplot::color_scheme_set("brightblue")
pp_check(fit, plotfun = "stat_grouped", stat = "median", 
         group = factor(kidiq$mom_hs, labels = c("No HS", "HS")))
# }
# NOT RUN {
# }

Run the code above in your browser using DataCamp Workspace