openintro

Supplemental functions and data for OpenIntro resources, which include open-source textbooks and other materials for introductory statistics at openintro.org. The package contains the data sets used in our open-source textbooks, custom plotting functions for reproducing book figures, and the datasets used in OpenIntro labs. Note that many functions and examples use color transparency; some plotting elements may not show up properly (or at all) on some versions of the Windows operating system.

Installation

You can install the released version of openintro from CRAN with:

install.packages("openintro")

You can install the development version of openintro from GitHub with:

# install.packages("devtools")
library(devtools)
install_github("OpenIntroStat/openintro")
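
After installation, a quick way to confirm that the package works is to load one of its datasets and look at it. The sketch below is illustrative only: it uses ames (listed in the function index further down) and guards the calls, so the script still runs on a machine where openintro is not installed.

```r
# Check whether openintro is installed before trying to use it.
has_openintro <- requireNamespace("openintro", quietly = TRUE)

if (has_openintro) {
  # Load a single dataset by name from the package.
  data("ames", package = "openintro")
  print(dim(ames))   # number of rows and columns
  print(head(ames))  # first few observations
} else {
  message("openintro is not installed; run install.packages(\"openintro\") first.")
}
```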

This package was produced as part of the OpenIntro project. For the accompanying textbook, visit openintro.org. A PDF of the textbook is free and paperbacks can be purchased online (royalty-free).

Questions, bugs, feature requests

You can file an issue to get help, report a bug, or make a feature request.

When filing an issue to get help or report a bug, please make a minimal reproducible example using the reprex package. If you haven’t heard of or used reprex before, you’re in for a treat! Seriously, reprex will make all of your R-question-asking endeavors easier (which is a pretty insane ROI for the five to ten minutes it’ll take you to learn what it’s all about). For additional reprex pointers, check out the Get help! section of the tidyverse site.
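
As a rough illustration of what a minimal reproducible example can look like, the sketch below builds a tiny data frame inline (so nobody needs your files) and runs the smallest piece of code that shows the behaviour in question. The data and the "problem" are made up for illustration; the reprex call is guarded so the script also runs where reprex is not installed.

```r
# Made-up data, created inline rather than read from disk.
scores <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1.5, 2.5, 3.0, NA)
)

# The smallest piece of code that shows the surprising result:
# the mean for group "b" is NA because of the missing value.
group_means <- tapply(scores$value, scores$group, mean)
print(group_means)

# In an interactive session with reprex installed, render the same code
# together with its output, ready to paste into an issue.
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex({
    scores <- data.frame(
      group = c("a", "a", "b", "b"),
      value = c(1.5, 2.5, 3.0, NA)
    )
    tapply(scores$value, scores$group, mean)
  })
}
```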

Before opening a new issue, be sure to search issues and pull requests to make sure the bug hasn’t been reported and/or already fixed in the development version. By default, the search will be pre-populated with is:issue is:open. You can edit the qualifiers (e.g. is:pr, is:closed) as needed. For example, you’d simply remove is:open to search all issues in the repo, open or closed.

Contributing

Process for adding new data to the package

The steps below rely on the devtools and usethis packages. We recommend following this process when proposing a new dataset for the package. If the dataset is large (>500MB) or you’d like to add a function, please open an issue for discussion before making the pull request.

  1. Fork and clone the repo with usethis::create_from_github("OpenIntroStat/openintro")
    • Note: If you have write access to the repo, you can skip this step.
  2. Start a new pull request with usethis::pr_init("BRANCH-NAME"), where BRANCH-NAME is an informative branch name.
  3. If adding a file that is not an .rda file to begin with (Excel, csv, etc.), create a folder in the data-raw folder with the name of the dataset (how you’d like it to show up in the package). Please use snake_case for naming, e.g. name_of_dataset.
  4. Place your dataset in its raw form in the folder.
  5. Also in the data-raw folder, create a new R script called name_of_dataset-dataprep.R and write the code needed to read in the file, make any modifications to the data that are needed (if any), and end with usethis::use_data() to save the data in the package as an .rda file with the ideal compression. See examples from other folders in data-raw for sample code. The contents of this folder do not end up in the package (the entire folder is ignored in the .Rbuildignore) so you don’t need to worry about adding package dependencies etc.
  6. In the R folder, create an R script called data-name_of_dataset.R and add documentation using Roxygen style. See other documentation files for help with style. In the examples, use tidyverse syntax, but do not use library(tidyverse); load only the relevant packages, e.g. library(dplyr), library(ggplot2).
  7. Restart R and run devtools::load_all() to make sure the data loads and run your examples to confirm they all work.
  8. Run devtools::document(), restart R, and then devtools::load_all(). Then, check out ?name_of_dataset to make sure the documentation looks as expected.
  9. Run devtools::check(). The only NOTE you should see as a result of the check should be about the package size. If any other ERRORs, NOTEs, or WARNINGs are generated, resolve them or open an issue for help.
  10. In the pkgdown.yml file, add the name of the dataset under reference, in the correct alphabetical order.
  11. Add a note in the NEWS.md with the new dataset you’ve added with a link to your GitHub username so we can acknowledge your contribution, e.g. “added by @mine-cetinkaya-rundel”.
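
As a rough sketch of steps 5 and 6, the two scripts might look something like the following. The dataset, its columns, its title, and the raw CSV are all invented for illustration, and the raw file is written to a temporary folder only so that the sketch runs anywhere. The existing folders in data-raw and the files in R remain the authoritative examples of the project's style.

```r
# ---- name_of_dataset-dataprep.R (step 5) ----------------------------------
# Illustration only: write a tiny raw CSV to a temporary folder. In the
# package you would instead read the raw file placed in data-raw in step 4.
raw_dir <- file.path(tempdir(), "name_of_dataset")
dir.create(raw_dir, showWarnings = FALSE, recursive = TRUE)
raw_path <- file.path(raw_dir, "name_of_dataset.csv")
writeLines(c("Student ID,Exam Score", "1,88", "2,92", "3,"), raw_path)

# Read the raw file; the blank score in the last row is read as NA.
name_of_dataset <- read.csv(raw_path, check.names = FALSE)

# Modify as needed: here, rename the columns to snake_case.
names(name_of_dataset) <- c("student_id", "exam_score")

# Save as .rda. use_data() only works inside a package project, so the call
# is guarded to keep this sketch runnable elsewhere.
if (requireNamespace("usethis", quietly = TRUE) && file.exists("DESCRIPTION")) {
  usethis::use_data(name_of_dataset, overwrite = TRUE)
}

# ---- data-name_of_dataset documentation script in the R folder (step 6) ---
#' Exam scores for three students (made-up title)
#'
#' A hypothetical dataset used only to illustrate the documentation layout.
#'
#' @format A data frame with 3 rows and 2 variables:
#' \describe{
#'   \item{student_id}{Student identifier.}
#'   \item{exam_score}{Exam score, in points.}
#' }
#' @source Made up for this sketch.
#' @examples
#' library(dplyr)
#'
#' name_of_dataset %>%
#'   filter(!is.na(exam_score))
"name_of_dataset"
```

With both files in place, steps 7 to 9 come down to running devtools::load_all(), devtools::document(), and devtools::check() in that order, restarting R where the steps say to.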

Code of Conduct

Please note that the openintro project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Monthly Downloads

6,156

Version

2.3.0

License

GPL-3

Maintainer

Mine Çetinkaya-Rundel

Last Published

February 23rd, 2022

Functions in openintro (2.3.0)

ChiSquareTail

Plot upper tail in chi-square distribution
COL

OpenIntro Statistics colors
Braces

Plot a Braces Symbol
CT2DF

Contingency Table to Data Frame
ArrowLines

Create a Line That may have Arrows on the Ends
IMSCOL

Introduction to Modern Statistics (IMS) Colors
acs12

American Community Survey, 2012
absenteeism

Absenteeism from school in New South Wales
MosaicPlot

Custom Mosaic Plot
CCP

Plot a Cartesian Coordinate Plane
AxisInDollars

Build Better Looking Axis Labels for US Dollars
ask

How important is it to ask pointed questions?
arbuthnot

Male and female births in London
babies_crawl

Crawling age
cancer_in_dogs

Cancer in dogs
biontech_adolescents

Efficacy of Pfizer-BioNTech COVID-19 vaccine on adolescents
age_at_mar

Age at first marriage of 5,534 US women.
AxisInPercent

Build Better Looking Axis Labels for Percentages
BG

Add background color to a plot
birds

Aircraft-Wildlife Collisions
PlotWLine

Plot data and add a regression line
births14

US births
bac

Beer and blood alcohol content
births

North Carolina births, 100 cases
ami_occurrences

Acute Myocardial Infarction (Heart Attack) Events
ames

Housing prices in Ames, Iowa
association

Simulated data for association plots
cle_sac

Cleveland and Sacramento
cards

Deck of cards
climate70

Temperature Summary Data, Geography Limited
avandia

Cardiovascular problems for two types of Diabetes medicines
drug_use

Drug use of students and parents
cia_factbook

CIA Factbook Details on Countries
classdata

Simulated class data
antibiotics

Pre-existing conditions in 92 children
duke_forest

Sale prices of houses in Duke Forest, Durham, NC
babies

The Child Health and Development Studies
burger

Burger preferences
calc_streak

Calculate hit streaks
ball_bearing

Lifespan of ball bearings
assortive_mating

Eye color of couples
census

Random sample of 2000 U.S. Census Data
cherry

Summary information for 31 cherry trees
email

Data frame representing information about a collection of emails
climber_drugs

Climber Drugs Data.
coast_starlight

Coast Starlight Amtrak train
ebola_survey

Survey on Ebola quarantine
earthquakes

Earthquakes
cchousing

Community college housing (simulated data)
bdims

Body measurements of 507 physically active individuals.
contTable

Generate Contingency Tables for LaTeX
corr_match

Sample data sets for correlation problems
cars93

cars93
country_iso

Country ISO information
blizzard_salary

Blizzard Employee Voluntary Salary Info.
email50

Sample of 50 emails
ethanol

Ethanol Treatment for Tumors Experiment
boxPlot

Box plot
evals

Professor evaluations and beauty
fastfood

Nutrition in fast food
buildAxis

Axis function substitute
gdp_countries

GDP Countries Data.
full_body_scan

Poll about use of full-body airport scanners
diabetes2

Type 2 Diabetes Clinical Trial for Patients 10-17 Years Old
exclusive_relationship

Number of Exclusive Relationships
fcid

Summary of male heights from USDA Food Commodity Intake Database
books

Sample of books on a shelf
immigration

Poll on illegal workers in the US
infmortrate

Infant Mortality Rates, 2012
daycare_fines

Daycare fines
densityPlot

Density plot
dlsegments

Create a Double Line Segment Plot
dotPlot

Dot plot
kobe_basket

Kobe Bryant basketball performance
male_heights

Sample of 100 male heights
jury

Simulated juror data set
malaria

Malaria Vaccine Trial
fadeColor

Fade colors
dotPlotStack

Add a Stacked Dot Plot to an Existing Plot
cpr

CPR data set
myPDF

Custom PDF function
openintro_pal

Return function to interpolate an OpenIntro IMS color palette
openintro_cols

Function to extract OpenIntro IMS colors as hex codes
openintro_colors

OpenIntro colors
epa2021

Vehicle info from the EPA for 2021
fact_opinion

Can Americans categorize facts and opinions?
flow_rates

River flow data
openintro_palettes

OpenIntro palettes
nba_heights

NBA Player heights from 2008-9
family_college

Simulated sample of parent / teen college attendance
edaPlot

Exploratory data analysis plot
children_gender_stereo

Gender Stereotypes in 5-7 year old Children
gov_poll

Pew Research poll on government approval ratings
gear_company

Fake data for a gear company example
gender_discrimination

Bank manager recommendations based on gender
elmhurst

Elmhurst College gift aid
friday

Friday the 13th
health_coverage

Health Coverage and Health Status
cpu

CPUs Released between 2010 and 2020.
gss2010

2010 General Social Survey
china

Child care hours
esi

Environmental Sustainability Index 2005
credits

College credits.
dream

Survey on views of the DREAM Act
exam_grades

Exam and course grades for statistics students
president

United States Presidential History
gpa

Survey of Duke students on GPA, studying, and more
drone_blades

Quadcopter Drone Blades
hsb2

High School and Beyond survey
husbands_wives

Great Britain: husband and wife pairs
prison

Prison isolation experiment
scale_fill_openintro

Fill scale constructor for OpenIntro IMS colors
exams

Exam scores
histPlot

Histogram or hollow histogram
get_it_dunn_run

Get it Dunn Run, Race Times
gpa_iq

Sample of students and their GPA and IQ
hfi

Human Freedom Index
lsegments

Create a Line Segment Plot
linResPlot

Create simple regression plot with residual plot
epa2012

Vehicle info from the EPA for 2012
env_regulation

American Adults on Regulation and Renewable Energy
fheights

Female college student heights, in inches
gpa_study_hours

gpa_study_hours
lab_report

lab_report
helium

Helium football
scotus_healthcare

Public Opinion with SCOTUS ruling on American Healthcare Act
helmet

Socioeconomic status and reduced-fee school lunches
labor_market_discriminiation

Are Emily and Greg More Employable Than Lakisha and Jamal?
london_murders

London Murders, 2006-2011
migraine

Migraines and acupuncture
midterms_house

President's party performance and unemployment rate
mail_me

Influence of a Good Mood on Helpfulness
loop

Output a message while inside a loop
lizard_habitat

Field data on lizards observed in their natural habitat
gradestv

Simulated data for analyzing the relationship between watching TV and grades
gsearch

Simulated Google search experiment
mariokart

Wii Mario Kart auctions from Ebay
mcu_films

Marvel Cinematic Universe films
gifted

Analytical skills of young gifted children
global_warming_pew

Pew survey on global warming
house

United States House of Representatives historical make-up
goog

Google stock data
fish_oil_18

Findings on n-3 Fatty Acid Supplement Health Benefits
normTail

Normal distribution tails
ppp_201503

US Poll on who it is better to raise taxes on
housing

Simulated data set on student housing
nuclear_survey

Nuclear Arms Reduction Survey
mlb_players_18

Batter Statistics for 2018 Major League Baseball (MLB) Season
lizard_run

Lizard speeds
ipo

Facebook, Google, and LinkedIn IPO filings
present

Birth counts
russian_influence_on_us_election_2016

Russians' Opinions on US Election Influence in 2016
sa_gdp_elec

Sustainability and Economic Indicators for South Africa.
major_survey

Survey of Duke students and the area of their major
mlbbat10

Major League Baseball Player Hitting Statistics for 2010
ipod

Length of songs on an iPod
makeTube

Regression tube
smoking

UK Smoking Data
smallpox

Smallpox vaccine results
mlb_teams

Major League Baseball Teams Data.
tips

Tip data
thanksgiving_spend

Thanksgiving spending, simulated based on Gallup poll.
socialexp

Social experiment
snowfall

Snowfall at Paradise, Mt. Rainier National Park
mn_police_use_of_force

Minneapolis police use of force data.
heart_transplant

Heart Transplant Data
healthcare_law_survey

Pew Research Center poll on health care, including question variants
law_resume

Gender, Socioeconomic Class, and Interview Invites
toy_anova

Simulated data set for ANOVA
outliers

Simulated data sets for different types of outliers
oscars

Oscar winners, 1929 to 2018
prius_mpg

User reported fuel efficiency for 2017 Toyota Prius Prime
qqnormsim

Generate simulated QQ plots
lmPlot

Linear regression plot with residual plot
transplant

Transplant consultant success rate (fake data)
stats_scores

Final exam scores for twenty students
mlb

Salary data for Major League Baseball (2010)
mammals

Sleep in Mammals
male_heights_fcid

Random sample of adult male heights
offshore_drilling

California poll on drilling off the California coast
military

US Military Demographics
stem_cell

Embryonic stem cells to treat heart attack (in sheep)
student_housing

Community college housing (simulated data, 2015)
student_sleep

Sleep for 110 students (simulated)
penny_ages

Penny Ages
photo_classify

Photo classifications: fashion or not
pew_energy_2018

Pew Survey on Energy Sources in 2018
openintro-package

openintro: Data Sets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
penelope

Guesses at the weight of Penelope (a cow)
treeDiag

Construct tree diagrams
piracy

Piracy and PIPA/SOPA
nba_players_19

NBA Players for the 2018-2019 season
ucla_f18

UCLA courses in Fall 2018
penetrating_oil

What's the best way to loosen a rusty bolt?
ncbirths

North Carolina births, 1000 cases
scale_color_openintro

Color scale constructor for OpenIntro IMS colors
satgpa

SAT and GPA data
playing_cards

Table of Playing Cards in 52-Card Deck
ssd_speed

SSD read and write speeds
starbucks

Starbucks nutrition
resume

Which resume attributes drive job callbacks?
unempl

Annual unemployment since 1890
sulphinpyrazone

Treating heart attacks
yrbss_samp

Sample of Youth Risk Behavior Surveillance System (YRBSS)
supreme_court

Supreme Court approval rating
unemploy_pres

President's party performance and unemployment rate
leg_mari

Legalization of Marijuana Support in 2010 California Survey
loans_full_schema

Loan data from Lending Club
london_boroughs

London Borough Boundaries
marathon

New York City Marathon Times (outdated)
mtl

Medial temporal lobe (MTL) and other data for 26 participants
mammogram

Experiment with Mammogram Randomized
pm25_2011_durham

Air quality for Durham, NC
nyc_marathon

New York City Marathon Times
murders

Data for 20 metropolitan areas
nycflights

Flights data
opportunity_cost

Opportunity cost of purchases
rosling_responses

Sample Responses to Two Public Health Questions
res_demo_1

Simulated data for regression
poker

Poker winnings during 50 sessions
simpsons_paradox_covid

Simpson's Paradox: Covid
res_demo_2

Simulated data for regression
orings

1986 Challenger disaster and O-rings
simulated_dist

Simulated data sets, not necessarily drawn from a normal distribution.
solar

Energy Output From Two Solar Arrays in San Francisco
salinity

Salinity in Bimini Lagoon, Bahamas
sowc_child_mortality

SOWC Child Mortality Data.
sp500

Financial information for 50 S&P 500 companies
sp500_1950_2018

Daily observations for the S&P 500
stent30

Stents for the treatment of stroke
possum

Possums in Australia and New Guinea
sat_improve

Simulated data for SAT score improvement
stocks_18

Monthly Returns for a few stocks
race_justice

Yahoo! News Race and Justice poll results
simulated_normal

Simulated data sets, drawn from a normal distribution.
simulated_scatter

Simulated data for sample scatterplots
yawn

Contagiousness of yawning
yrbss

Youth Risk Behavior Surveillance System (YRBSS)
sex_discrimination

Bank manager recommendations based on sex
seattlepets

Names of pets in Seattle
reddit_finance

Reddit Survey on Financial Independence.
sowc_maternal_newborn

SOWC Maternal and Newborn Health Data.
sowc_demographics

SOWC Demographics Data.
sleep_deprivation

Survey on sleep deprivation and transportation workers
sinusitis

Sinusitis and antibiotic experiment
sp500_seq

S&P 500 stock data
ucla_textbooks_f18

Sample of UCLA course textbooks for Fall 2018
tourism

Turkey tourism
speed_gender_height

Speed, gender, and height of 1325 students
textbooks

Textbook data for UCLA Bookstore and Amazon
write_pkg_data

Create a CSV variant of .rda files
toohey

Simulated polling data set
teacher

Teacher Salaries in St. Louis, Michigan
xom

Exxon Mobil stock data
ukdemo

United Kingdom Demographic Data
winery_cars

Time Between Gondola Cars at Sterling Winery
world_pop

World Population Data.