Data Sets and Supplemental Functions from 'OpenIntro' Textbooks and Labs

Supplemental functions and data for 'OpenIntro' resources, which include open-source textbooks and resources for introductory statistics. The package contains data sets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples use color transparency; some plotting elements may not show up properly (or at all) on some versions of the Windows operating system.




The package also contains the data sets used in OpenIntro labs.


You can install the released version of openintro from CRAN with:
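A minimal sketch of this step, assuming a reachable CRAN mirror. The requireNamespace() guard is a convenience added here so the install only runs when the package is missing; it is not part of the package's own instructions.

```r
# Install openintro from CRAN only when it is not already available.
if (!requireNamespace("openintro", quietly = TRUE)) {
  install.packages("openintro")
}

# Attach the package so its data sets and functions are on the search path.
library(openintro)
```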


You can install the development version of openintro from GitHub with:

# install.packages("devtools")
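The repository path is not given in this text, so the sketch below treats it as an assumption: it supposes the development sources live at OpenIntroStat/openintro on GitHub. Verify the path before running.

```r
# install.packages("devtools")

# ASSUMPTION: the GitHub repository path below is not stated in this README
# text; confirm it against the project's GitHub page before installing.
repo <- "OpenIntroStat/openintro"

# Build and install the development version from GitHub.
devtools::install_github(repo)
```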

This package was produced as part of the OpenIntro project. A PDF of the accompanying textbook is free, and paperbacks can be purchased online (royalty-free).

Questions, bugs, feature requests

You can file an issue to get help, report a bug, or make a feature request.

When filing an issue to get help or report a bug, please make a minimal reproducible example using the reprex package. If you haven’t heard of or used reprex before, you’re in for a treat! Seriously, reprex will make all of your R-question-asking endeavors easier (which is a pretty insane ROI for the five to ten minutes it’ll take you to learn what it’s all about). For additional reprex pointers, check out the Get help! section of the tidyverse site.
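As a rough illustration of what a reprex looks like in practice, the sketch below wraps a two-line snippet in reprex(). The choice of the cherry data set is arbitrary and only stands in for whatever code triggers your problem; venue = "gh" formats the result for a GitHub issue.

```r
# install.packages("reprex")
library(reprex)

# Wrap the smallest snippet that shows the problem. reprex runs it in a
# fresh R session and renders the code together with its output.
out <- reprex({
  library(openintro)
  head(cherry)
}, venue = "gh")

# 'out' holds the rendered lines; in an interactive session they are also
# placed on the clipboard when one is available, ready to paste into an issue.
```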

Before opening a new issue, be sure to search issues and pull requests to make sure the bug hasn’t been reported and/or already fixed in the development version. By default, the search will be pre-populated with is:issue is:open. You can edit the qualifiers (e.g. is:pr, is:closed) as needed. For example, you’d simply remove is:open to search all issues in the repo, open or closed.

Code of Conduct

Please note that the openintro project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Functions in openintro

Name Description
BG Add background color to a plot
AxisInPercent Build Better Looking Axis Labels for Percentages
MosaicPlot Custom Mosaic Plot
ChiSquareTail Plot upper tail in chi-square distribution
absenteeism Absenteeism from school in New South Wales
CT2DF Contingency Table to Data Frame
PlotWLine Plot data and add a regression line
COL OpenIntro Statistics colors
ami_occurrences Acute Myocardial Infarction (Heart Attack) Events
ames Housing prices in Ames, Iowa
bdims Body measurements of 507 physically active individuals.
birds Aircraft-Wildlife Collisions
babies_crawl Crawling age
age_at_mar Age at first marriage of 5,534 US women.
cherry Summary information for 31 cherry trees
babies The Child Health and Development Studies
census Random sample of 2000 U.S. Census Data
Braces Plot a Braces Symbol
ball_bearing Lifespan of ball bearings
cancer_in_dogs Cancer in dogs
bac Beer and blood alcohol content
acs12 American Community Survey, 2012
boxPlot Box plot
CCP Plot a Cartesian Coordinate Plane
densityPlot Density plot
buildAxis Axis function substitute
china Child care hours
children_gender_stereo Gender Stereotypes in 5-7 year old Children
diabetes2 Type 2 Diabetes Clinical Trial for Patients 10-17 Years Old
contTable Generate Contingency Tables for LaTeX
coast_starlight Coast Starlight Amtrak train
cards Deck of cards
dotPlot Dot plot
dlsegments Create a Double Line Segment Plot
cle_sac Cleveland and Sacramento
drone_blades Quadcopter Drone Blades
elmhurst Elmhurst College gift aid
climate70 Temperature Summary Data, Geography Limited
drug_use Drug use of students and parents
corr_match Sample data sets for correlation problems
country_iso Country ISO information
arbuthnot Male and female births in London
antibiotics Pre-existing conditions in 92 children
email Data frame representing information about a collection of emails
env_regulation American Adults on Regulation and Renewable Energy
email50 Sample of 50 emails
cpr CPR data set
credits College credits.
fadeColor Fade colors
ebola_survey Survey on Ebola quarantine
ask How important is it to ask pointed questions?
assortive_mating Eye color of couples
association Simulated data for association plots
evals Professor evaluations and beauty
ethanol Ethanol Treatment for Tumors Experiment
family_college Simulated sample of parent / teen college attendance
cchousing Community college housing (simulated data)
friday Friday the 13th
edaPlot Exploratory data analysis plot
helium Helium football
global_warming_pew Pew survey on global warming
fish_oil_18 Findings on n-3 Fatty Acid Supplement Health Benefits
fheights Female college student heights, in inches
avandia Cardiovascular problems for two types of Diabetes medicines
cars93 cars93
goog Google stock data
house United States House of Representatives historical make-up
infmortrate Infant Mortality Rates, 2012
gov_poll Pew Research poll on government approval ratings
books Sample of books on a shelf
births North Carolina births
mammals Sleep in Mammals
housing Simulated data set on student housing
full_body_scan Poll about use of full-body airport scanners
immigration Poll on illegal workers in the US
gpa Survey of Duke students on GPA, studying, and more
helmet Socioeconomic status and reduced-fee school lunches
mammogram Experiment with Mammogram Randomized
midterms_house President's party performance and unemployment rate
lab_report lab_report
law_resume Gender, Socioeconomic Class, and Interview Invites
male_heights Sample of 100 male heights
gear_company Fake data for a gear company example
exams Exam scores
gender_discrimination Bank manager recommendations based on gender
hfi Absenteeism
normTail Normal distribution tails
migraine Migraines and acupuncture
nuclear_survey Nuclear Arms Reduction Survey
photo_classify Photo classifications: fashion or not
piracy Piracy and PIPA/SOPA
histPlot Histogram or hollow histogram
military US Military Demographics
playing_cards Table of Playing Cards in 52-Card Deck
mlb Salary data for Major League Baseball (2010)
male_heights_fcid Random sample of adult male heights
calc_streak Calculate hit streaks
exclusive_relationship Number of Exclusive Relationships
gpa_iq Sample of students and their GPA and IQ
burger Burger preferences
ppp_201503 US Poll on who it is better to raise taxes on
ncbirths North Carolina births
present Birth counts
nba_players_19 NBA Players for the 2018-2019 season
gradestv Simulated data for analyzing the relationship between watching TV and grades
cia_factbook CIA Factbook Details on Countries
classdata Simulated class data
pm25_2011_durham Air quality for Durham, NC
simulated_dist Simulated data sets, not necessarily drawn from a normal distribution.
simulated_normal Simulated data sets, drawn from a normal distribution.
president United States Presidential History
gsearch Simulated Google search experiment
husbands_wives Great Britain: husband and wife pairs
sp500_1950_2018 Daily observations for the S&P 500
sp500_seq S&P 500 stock data
hsb2 High School and Beyond survey
lsegments Create a Line Segment Plot
epa2012 Vehicle info from the EPA
gss2010 2010 General Social Survey
gpa_study_hours gpa_study_hours
loop Output a message while inside a loop
prison Prison isolation experiment
dotPlotStack Add a Stacked Dot Plot to an Existing Plot
dream Survey on views of the DREAM Act
london_boroughs London Borough Boundaries
esi Environmental Sustainability Index 2005
health_coverage Health Coverage and Health Status
marathon New York City Marathon Times
lmPlot Linear regression plot with residual plot
ipo Facebook, Google, and LinkedIn IPO filings
fastfood Nutrition in fast food
mail_me Influence of a Good Mood on Helpfulness
mariokart Wii Mario Kart auctions from eBay
ipod Length of songs on an iPod
loans_full_schema Loan data from Lending Club
openintro-package openintro: Data Sets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
mtl Medial temporal lobe (MTL) and other data for 26 participants
murders Data for 20 metropolitan areas.
fcid Summary of male heights from USDA Food Commodity Intake Database
major_survey Survey of Duke students and the area of their major
smoking UK Smoking Data
london_murders London Murders, 2006-2011
poker Poker winnings during 50 sessions
get_it_dunn_run Get it Dunn Run, Race Times
orings 1986 Challenger disaster and O-rings
oscars Oscar winners, 1929 to 2018
stent30 Stents for the treatment of stroke
socialexp Social experiment
mlb_players_18 Batter Statistics for 2018 Major League Baseball (MLB) Season
outliers Simulated data sets for different types of outliers
res_demo_2 Simulated data for regression
possum possum
offshore_drilling California poll on drilling off the California coast
stocks_18 Monthly Returns for a few stocks
mlbbat10 Major League Baseball Player Hitting Statistics for 2010
gifted Analytical skills of young gifted children
prius_mpg User reported fuel efficiency for 2017 Toyota Prius Prime
nycflights Flights data
penny_ages Penny Ages
pew_energy_2018 Pew Survey on Energy Sources in 2018
qqnormsim Generate simulated QQ plots
healthcare_law_survey Pew Research Center poll on health care, including question variants
resume Which resume attributes drive job callbacks? (Race and gender under study.)
thanksgiving_spend Thanksgiving spending, simulated based on Gallup poll.
satgpa SAT and GPA data
tips Tip data
res_demo_1 Simulated data for regression
sat_improve Simulated data for SAT score improvement
heart_transplant Heart Transplant Data
jury Simulated juror data set
prof_evals Professor evaluations and beauty
russian_influence_on_us_election_2016 Russians' Opinions on US Election Influence in 2016
rosling_responses Sample Responses to Two Public Health Questions
speed_gender_height Speed, gender, and height of 1325 students
scotus_healthcare Public Opinion with SCOTUS ruling on American Healthcare Act
student_sleep Sleep for 110 students (simulated)
student_housing Community college housing (simulated data, 2015)
starbucks Starbucks nutrition
stats_scores Final exam scores for twenty students
seattlepets Names of pets in Seattle
kobe_basket Kobe Bryant basketball performance
treeDiag Construct tree diagrams
leg_mari Legalization of Marijuana Support in 2010 California Survey
smallpox Smallpox vaccine results
stem_cell Embryonic stem cells to treat heart attack (in sheep)
sleep_deprivation Survey on sleep deprivation and transportation workers
textbooks Textbook data for UCLA Bookstore and Amazon
teacher Teacher Salaries in St. Louis, Michigan
ukdemo United Kingdom Demographic Data
unempl Annual unemployment since 1890
winery_cars Time Between Gondola Cars at Sterling Winery
ucla_f18 UCLA courses in Fall 2018
ucla_textbooks_f18 Sample of UCLA course textbooks for Fall 2018
linResPlot Create simple regression plot with residual plot
xom Exxon Mobil stock data
toohey Simulated polling data set
tourism Turkey tourism
myPDF Custom PDF function
yrbss_samp Sample of Youth Risk Behavior Surveillance System (YRBSS)
unemploy_pres President's party performance and unemployment rate
nba_heights NBA Player heights from 2008-9
penelope Guesses at the weight of Penelope (a cow)
makeTube Regression tube
malaria Malaria Vaccine Trial
penetrating_oil What's the best way to loosen a rusty bolt?
simulated_scatter Simulated data for sample scatterplots
sinusitis Sinusitis and antibiotic experiment
supreme_court Supreme Court approval rating
sulphinpyrazone Treating heart attacks
solar Energy Output From Two Solar Arrays in San Francisco
toy_anova Simulated data set for ANOVA
sp500 Financial information for 50 S&P 500 companies
yawn Contagiousness of yawning
transplant Transplant consultant success rate (fake data)
yrbss Youth Risk Behavior Surveillance System (YRBSS)
ArrowLines Create a Line That may have Arrows on the Ends
AxisInDollars Build Better Looking Axis Labels for US Dollars
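The index above can also be browsed from within R. The sketch below uses only base R helpers and assumes the package is already installed; cherry and normTail are simply two entries picked from the list.

```r
library(openintro)

# List every data set shipped with the package (name and title).
datasets <- data(package = "openintro")$results[, c("Item", "Title")]
head(datasets)

# List everything the package exports: functions and lazily loaded data sets.
exports <- ls("package:openintro")
length(exports)

# Open the help page for one entry from the index, then peek at its rows.
help("cherry", package = "openintro")
head(cherry)
```

The same pattern works for any name in the index, for example help("normTail", package = "openintro") for a plotting function.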

