baseballr

baseballr is a package written for R focused on baseball analysis. It includes functions for scraping various data from websites, such as FanGraphs.com, Baseball-Reference.com, and baseballsavant.mlb.com. It also includes functions for calculating metrics, such as wOBA, FIP, and team-level consistency over custom time frames.

You can read more about some of the functions and how to use them at its official site as well as this Hardball Times article.

Installation

You can install the CRAN version of baseballr with:

install.packages("baseballr")

You can install the released version of baseballr from GitHub with:

# Install with the pacman package:
if (!requireNamespace('pacman', quietly = TRUE)){
  install.packages('pacman')
}
pacman::p_load_current_gh("BillPetti/baseballr")
# Alternatively, using the devtools package:
if (!requireNamespace('devtools', quietly = TRUE)){
  install.packages('devtools')
}
devtools::install_github(repo = "BillPetti/baseballr")

For experimental functions in development, you can install the development branch:

# install.packages("devtools")
devtools::install_github("BillPetti/baseballr", ref = "development_branch")
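After installing, it can help to confirm which version R will actually load, since the example output in this document was generated with baseballr 1.3.0. The following is a minimal sketch in base R; the helper name is_at_least() is hypothetical and is not part of baseballr.

```r
# Hypothetical helper (not part of baseballr): TRUE when `installed`
# is the same as or newer than `required`.
is_at_least <- function(installed, required) {
  package_version(installed) >= package_version(required)
}

# Only query the installed version if the package is actually available.
if (requireNamespace("baseballr", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("baseballr"))
  message("baseballr ", installed, " is installed; at least 1.3.0: ",
          is_at_least(installed, "1.3.0"))
} else {
  message("baseballr is not installed yet")
}
```

Comparing with package_version() rather than comparing strings avoids mistakes such as treating "1.10.0" as older than "1.9.0".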

Functionality

The package consists of two main sets of functions: data acquisition and metric calculation.

For example, if you want to see the standings for a specific MLB division on a given date, you can use the bref_standings_on_date() function. Just pass the date (as a "YYYY-MM-DD" string) and the division you want:

library(baseballr)
library(dplyr)
bref_standings_on_date("2015-08-01", "NL East", from = FALSE)
## ── MLB Standings on Date data from baseball-reference.com ─── baseballr 1.3.0 ──

## ℹ Data updated: 2022-09-08 20:13:31 EDT

## # A tibble: 5 × 8
##   Tm        W     L `W-L%` GB       RS    RA `pythW-L%`
##   <chr> <int> <int>  <dbl> <chr> <int> <int>      <dbl>
## 1 WSN      54    48  0.529 --      422   391      0.535
## 2 NYM      54    50  0.519 1.0     368   373      0.494
## 3 ATL      46    58  0.442 9.0     379   449      0.423
## 4 MIA      42    62  0.404 13.0    370   408      0.455
## 5 PHI      41    64  0.39  14.5    386   511      0.374

Right now the function works as far back as 1994, which is when both leagues split into three divisions.
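The W-L% and pythW-L% columns in the standings output above can be recomputed from the other columns. The sketch below uses base R on rows typed in by hand from that output, so it runs without the package or a network connection. The Pythagorean exponent of 1.83 is an assumption (it is the value commonly attributed to Baseball-Reference, not something this README states); it happens to reproduce all five printed values.

```r
# Rows typed in by hand from the standings output above.
standings <- data.frame(
  Tm = c("WSN", "NYM", "ATL", "MIA", "PHI"),
  W  = c(54, 54, 46, 42, 41),
  L  = c(48, 50, 58, 62, 64),
  RS = c(422, 368, 379, 370, 386),
  RA = c(391, 373, 449, 408, 511)
)

# Assumption: Pythagorean exponent of 1.83 (not stated in this README).
pyth_exp <- 1.83

# Actual winning percentage: wins divided by games played.
standings$wl_pct <- standings$W / (standings$W + standings$L)

# Pythagorean winning percentage: expected share of wins given runs
# scored (RS) and runs allowed (RA).
standings$pyth_pct <- standings$RS^pyth_exp /
  (standings$RS^pyth_exp + standings$RA^pyth_exp)

print(cbind(standings["Tm"], round(standings[, c("wl_pct", "pyth_pct")], 3)))
```

A team whose W-L% sits well above its pythW-L% (the Mets here) has won more games than its run differential alone would suggest.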

You can also pull data for all hitters over a specific date range. Here are the results for all hitters from August 1st through October 3rd during the 2015 season:

data <- bref_daily_batter("2015-08-01", "2015-10-03") 
data %>%
  dplyr::glimpse()
## Rows: 764
## Columns: 30
## $ bbref_id <chr> "547989", "554429", "542436", "571431", "501303", "346793", "…
## $ season   <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2…
## $ Name     <chr> "Manny Machado", "Matt Duffy", "Jose Altuve", "Adam Eaton", "…
## $ Age      <dbl> 22, 24, 25, 26, 32, 21, 27, 28, 36, 28, 29, 29, 27, 29, 27, 2…
## $ Level    <chr> "Maj-AL", "Maj-NL", "Maj-AL", "Maj-AL", "Maj-AL", "Maj-AL", "…
## $ Team     <chr> "Baltimore", "San Francisco", "Houston", "Chicago", "Texas", …
## $ G        <dbl> 59, 59, 57, 58, 58, 58, 59, 58, 59, 57, 55, 57, 57, 58, 56, 5…
## $ PA       <dbl> 266, 264, 262, 262, 260, 259, 259, 258, 257, 257, 255, 255, 2…
## $ AB       <dbl> 237, 248, 244, 230, 211, 224, 239, 235, 231, 233, 213, 218, 2…
## $ R        <dbl> 36, 33, 30, 37, 48, 35, 32, 29, 37, 27, 50, 37, 36, 25, 38, 4…
## $ H        <dbl> 66, 71, 81, 74, 71, 79, 54, 66, 75, 48, 65, 56, 61, 51, 78, 5…
## $ X1B      <dbl> 43, 54, 53, 56, 47, 51, 34, 37, 48, 30, 34, 32, 35, 33, 66, 2…
## $ X2B      <dbl> 10, 12, 19, 12, 14, 17, 6, 17, 16, 11, 13, 13, 15, 10, 7, 13,…
## $ X3B      <dbl> 0, 2, 3, 1, 1, 4, 1, 0, 2, 1, 2, 4, 0, 1, 3, 0, 4, 0, 1, 1, 0…
## $ HR       <dbl> 13, 3, 6, 5, 9, 7, 13, 12, 9, 6, 16, 7, 11, 7, 2, 20, 9, 8, 8…
## $ RBI      <dbl> 32, 30, 18, 31, 34, 32, 27, 40, 53, 21, 50, 19, 31, 39, 23, 4…
## $ BB       <dbl> 26, 15, 10, 23, 39, 18, 16, 17, 21, 21, 34, 33, 21, 39, 12, 3…
## $ IBB      <dbl> 1, 0, 1, 1, 1, 0, 0, 6, 1, 1, 0, 1, 1, 5, 0, 4, 3, 3, 7, 2, 2…
## $ uBB      <dbl> 25, 15, 9, 22, 38, 18, 16, 11, 20, 20, 34, 32, 20, 34, 12, 35…
## $ SO       <dbl> 42, 35, 28, 55, 51, 38, 68, 56, 29, 53, 46, 62, 41, 48, 27, 7…
## $ HBP      <dbl> 2, 0, 4, 5, 8, 1, 3, 5, 1, 1, 2, 3, 3, 1, 1, 6, 1, 3, 4, 1, 0…
## $ SH       <dbl> 0, 0, 1, 2, 1, 11, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, …
## $ SF       <dbl> 1, 1, 3, 2, 1, 5, 1, 1, 4, 2, 5, 1, 2, 2, 3, 0, 3, 2, 3, 4, 3…
## $ GDP      <dbl> 5, 9, 6, 1, 1, 4, 2, 2, 9, 7, 5, 1, 4, 8, 1, 2, 3, 10, 5, 4, …
## $ SB       <dbl> 6, 8, 11, 9, 2, 10, 0, 0, 0, 3, 3, 4, 5, 4, 24, 2, 1, 0, 6, 0…
## $ CS       <dbl> 4, 0, 4, 4, 0, 2, 0, 0, 0, 1, 0, 1, 3, 2, 7, 2, 3, 0, 2, 0, 0…
## $ BA       <dbl> 0.279, 0.286, 0.332, 0.322, 0.337, 0.353, 0.226, 0.281, 0.325…
## $ OBP      <dbl> 0.353, 0.326, 0.364, 0.392, 0.456, 0.395, 0.282, 0.341, 0.377…
## $ SLG      <dbl> 0.485, 0.387, 0.508, 0.448, 0.540, 0.558, 0.423, 0.506, 0.528…
## $ OPS      <dbl> 0.839, 0.713, 0.872, 0.840, 0.996, 0.953, 0.705, 0.848, 0.906…
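The rate columns at the end of this output follow from the counting columns. As a quick sanity check, the base R sketch below recomputes BA, OBP, SLG, and OPS for two rows typed in by hand from the output above (Matt Duffy and Jose Altuve), using the standard definitions of those stats. It does not call baseballr, so it is only an illustration of the arithmetic, not of how the package builds these columns.

```r
# Two rows typed in by hand from the glimpse() output above.
batters <- data.frame(
  Name = c("Matt Duffy", "Jose Altuve"),
  AB  = c(248, 244),
  H   = c(71, 81),
  X1B = c(54, 53),
  X2B = c(12, 19),
  X3B = c(2, 3),
  HR  = c(3, 6),
  BB  = c(15, 10),
  HBP = c(0, 4),
  SF  = c(1, 3)
)

# Batting average: hits per at-bat.
batters$BA <- batters$H / batters$AB

# On-base percentage: times on base per (AB + BB + HBP + SF).
batters$OBP <- (batters$H + batters$BB + batters$HBP) /
  (batters$AB + batters$BB + batters$HBP + batters$SF)

# Slugging percentage: total bases per at-bat.
total_bases <- batters$X1B + 2 * batters$X2B + 3 * batters$X3B + 4 * batters$HR
batters$SLG <- total_bases / batters$AB

# OPS is simply on-base plus slugging.
batters$OPS <- batters$OBP + batters$SLG

print(cbind(batters["Name"], round(batters[, c("BA", "OBP", "SLG", "OPS")], 3)))
```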

In terms of metric calculation, the package allows the user to calculate the consistency of team scoring and run prevention for any year using team_consistency():

team_consistency(2015)
## # A tibble: 30 × 5
##    Team  Con_R Con_RA Con_R_Ptile Con_RA_Ptile
##    <chr> <dbl>  <dbl>       <dbl>        <dbl>
##  1 ARI    0.37   0.36          17           15
##  2 ATL    0.41   0.4           88           63
##  3 BAL    0.4    0.38          70           42
##  4 BOS    0.39   0.4           52           63
##  5 CHC    0.38   0.41          30           85
##  6 CHW    0.39   0.4           52           63
##  7 CIN    0.41   0.36          88           15
##  8 CLE    0.41   0.4           88           63
##  9 COL    0.35   0.34           7            3
## 10 DET    0.39   0.38          52           42
## # … with 20 more rows

You can also calculate wOBA per plate appearance and wOBA on contact for any set of data over any date range, provided you have the data available.

Simply pass the proper data frame to woba_plus():

data %>%
  dplyr::filter(PA > 200) %>%
  woba_plus() %>%
  dplyr::arrange(desc(wOBA)) %>%
  dplyr::select(Name, Team, season, PA, wOBA, wOBA_CON) %>%
  dplyr::glimpse()
## Rows: 117
## Columns: 6
## $ Name     <chr> "Edwin Encarnacion", "Bryce Harper", "David Ortiz", "Joey Vot…
## $ Team     <chr> "Toronto", "Washington", "Boston", "Cincinnati", "Baltimore",…
## $ season   <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2…
## $ PA       <dbl> 216, 248, 213, 251, 253, 260, 245, 255, 223, 241, 223, 259, 2…
## $ wOBA     <dbl> 0.490, 0.450, 0.449, 0.445, 0.434, 0.430, 0.430, 0.422, 0.410…
## $ wOBA_CON <dbl> 0.555, 0.529, 0.541, 0.543, 0.617, 0.495, 0.481, 0.494, 0.459…
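To make the arithmetic behind these two columns concrete, here is a minimal base R sketch of the usual wOBA formula applied to one row typed in by hand from the bref_daily_batter() output above (Manny Machado). The linear weights are illustrative values close to the 2015 FanGraphs constants and are an assumption, not something this README states; fg_guts() is the function listed for scraping the real ones. The wOBA-on-contact denominator (AB minus SO) is likewise an assumed definition, so the results may not match woba_plus() exactly.

```r
# Assumed, illustrative linear weights (close to 2015 FanGraphs values).
w <- c(uBB = 0.687, HBP = 0.718, X1B = 0.881,
       X2B = 1.256, X3B = 1.594, HR = 2.065)

# One row typed in by hand from the bref_daily_batter() output above.
p <- c(AB = 237, BB = 26, IBB = 1, uBB = 25, HBP = 2, SF = 1, SO = 42,
       X1B = 43, X2B = 10, X3B = 0, HR = 13)

# Weighted value of every event that earns credit in wOBA.
events <- names(w)
weighted_events <- sum(w[events] * p[events])

# wOBA: weighted events per opportunity, leaving out intentional walks.
woba <- weighted_events /
  (p[["AB"]] + p[["BB"]] - p[["IBB"]] + p[["SF"]] + p[["HBP"]])

# wOBA on contact: hits only, per at-bat that did not end in a strikeout.
# Assumption: other definitions also add sacrifice flies to the denominator.
hits <- c("X1B", "X2B", "X3B", "HR")
woba_con <- sum(w[hits] * p[hits]) / (p[["AB"]] - p[["SO"]])

print(round(c(wOBA = woba, wOBA_CON = woba_con), 3))
```

Because the weights change every season, a calculation over a custom date range needs the weights for the matching season rather than a fixed set like the one above.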

You can also generate these wOBA-based stats, as well as FIP, for pitchers using the fip_plus() function:

bref_daily_pitcher("2015-04-05", "2015-04-30") %>% 
  fip_plus() %>% 
  dplyr::select(season, Name, IP, ERA, SO, uBB, HBP, HR, FIP, wOBA_against, wOBA_CON_against) %>%
  dplyr::arrange(dplyr::desc(IP)) %>% 
  head(10)
## ── MLB Daily Pitcher data from baseball-reference.com ─────── baseballr 1.3.0 ──

## ℹ Data updated: 2022-09-08 20:13:44 EDT

## # A tibble: 10 × 11
##    season Name            IP   ERA    SO   uBB   HBP    HR   FIP wOBA_…¹ wOBA_…²
##     <int> <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>
##  1   2015 Johnny Cueto  37    1.95    38     4     2     3  2.62   0.21    0.276
##  2   2015 Dallas Keuc…  37    0.73    22    11     0     0  2.84   0.169   0.151
##  3   2015 Sonny Gray    36.1  1.98    25     6     1     1  2.69   0.218   0.239
##  4   2015 Mike Leake    35.2  3.03    25     7     0     5  4.16   0.24    0.281
##  5   2015 Felix Herna…  34.2  1.82    36     6     3     1  2.2    0.225   0.272
##  6   2015 Corey Kluber  34    4.24    36     5     2     2  2.4    0.295   0.391
##  7   2015 Jake Odoriz…  33.2  2.41    26     8     1     0  2.38   0.213   0.228
##  8   2015 Josh Collme…  32.2  2.76    16     3     0     1  2.82   0.29    0.33 
##  9   2015 Bartolo Col…  32.2  3.31    25     1     0     4  3.29   0.28    0.357
## 10   2015 Zack Greinke  32.2  1.93    27     7     1     2  3.01   0.24    0.274
## # … with abbreviated variable names ¹​wOBA_against, ²​wOBA_CON_against
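The FIP column can be checked by hand with the standard formula, FIP = (13*HR + 3*(BB + HBP) - 2*SO) / IP + constant. The base R sketch below applies it to four rows typed in from the output above. The constant of 3.134 is an assumption (roughly the 2015 league value; this README does not state it), and ip_to_innings() is a hypothetical helper, not a baseballr function. With IP used exactly as printed, the sketch reproduces the FIP values shown above; this is an observation about these rows, not a statement about how fip_plus() is implemented.

```r
# Four rows typed in by hand from the fip_plus() output above.
pitchers <- data.frame(
  Name = c("Johnny Cueto", "Sonny Gray", "Mike Leake", "Corey Kluber"),
  IP   = c(37, 36.1, 35.2, 34),
  SO   = c(38, 25, 25, 36),
  uBB  = c(4, 6, 7, 5),
  HBP  = c(2, 1, 0, 2),
  HR   = c(3, 1, 5, 2)
)

# Assumption: 2015 FIP constant of about 3.134 (not stated in this README).
fip_constant <- 3.134

# Numerator shared by both variants below.
fip_numerator <- 13 * pitchers$HR +
  3 * (pitchers$uBB + pitchers$HBP) -
  2 * pitchers$SO

# Variant 1: innings used exactly as printed (35.2 treated as 35.2).
pitchers$FIP <- fip_numerator / pitchers$IP + fip_constant

# Hypothetical helper: convert box-score notation (35.2 = 35 innings
# plus 2 outs) into true innings (35 + 2/3).
ip_to_innings <- function(ip) {
  whole <- floor(ip)
  outs  <- round((ip - whole) * 10)
  whole + outs / 3
}

# Variant 2: innings converted to thirds before dividing.
pitchers$FIP_thirds <- fip_numerator / ip_to_innings(pitchers$IP) + fip_constant

print(cbind(pitchers["Name"], round(pitchers[, c("FIP", "FIP_thirds")], 2)))
```

The two variants differ only for pitchers with partial innings, and usually by a few hundredths of a run.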

Issues

Please report bugs or leave suggestions in the Issues section.

Pull Requests

Pull requests are welcome, but I cannot guarantee that they will be accepted or accepted quickly. Please make all pull requests to the development branch for review.

Citations

To cite the baseballr R package in publications, use:

BibTeX Citation

@misc{petti_gilani_2021,
  author = {Bill Petti and Saiem Gilani},
  title = {baseballr: The SportsDataverse's R Package for Baseball Data.},
  url = {https://billpetti.github.io/baseballr/},
  year = {2021}
}

Version: 1.3.0
License: MIT + file LICENSE
Monthly Downloads: 1,502
Last Published: September 9th, 2022

Functions in baseballr (1.3.0)

fangraphs: FanGraphs Functions Overview
csv_from_url: Load .csv / .csv.gz file from a remote connection
edge_code: Edge Code
code_barrel: Helper for determining whether a batted ball is a "barrel"
daily_batter_bref: (legacy) Scrape Batter Performance Data Over a Custom Time Frame
playername_lookup: Look up Baseball Player Name by ID
edge_frequency: Edge Percentage Frequency
fg_bat_leaders: (legacy) Scrape Batter Leaderboards from FanGraphs
daily_pitcher_bref: (legacy) Scrape Pitcher Performance Data Over a Custom Time Frame
column_structure_draft_mlb: Column structure of the MLB Draft data
fg_guts: Scrape FanGraphs.com Guts!
fg_pitcher_leaders: Scrape Pitcher Leaderboards from FanGraphs
fg_pitch_leaders: (legacy) Scrape Pitcher Leaderboards from FanGraphs
fg_batter_leaders: Scrape Batter Leaderboards from FanGraphs
fg_team_batter: Scrape Team Batter Leaderboards from FanGraphs
fg_milb_pitcher_game_logs: Scrape MiLB game logs for pitchers from FanGraphs
fg_pitcher_game_logs: Scrape Pitcher Game Logs from FanGraphs
fg_milb_batter_game_logs: Scrape MiLB game logs for batters from FanGraphs
fg_batter_game_logs: Scrape Batter Game Logs from FanGraphs
fg_park: Scrape Park Factors from FanGraphs
get_game_pks_mlb: (legacy) Get MLB Game Info by Date and Level
fg_team_pitcher: Scrape Team Pitcher Leaderboards from FanGraphs
fip_plus: Calculate FIP and related metrics for any set of data
get_ncaa_baseball_roster: (legacy) Get NCAA Baseball Rosters
get_game_info_sup_petti: (legacy) Download a data frame of supplemental data about MLB games since 2008.
get_game_info_mlb: (legacy) Retrieve additional game information for major and minor league games
get_ncaa_game_logs: (legacy) Get NCAA Baseball Game Logs
get_batting_orders: (legacy) Retrieve batting orders for a given MLB game
get_ncaa_baseball_pbp: (legacy) Get Play-By-Play Data for NCAA Baseball Games
get_draft_mlb: (legacy) Retrieve draft pick information by year
linear_weights_savant: Generate linear weight values for events using Baseball Savant data
get_retrosheet_data: (legacy) Get, Parse, and Format Retrosheet Event and Roster Files
label_statcast_imputed_data: Label Statcast data as imputed
ggspraychart: Generate spray charts with ggplot2
get_ncaa_park_factor: (legacy) Get Park Effects for NCAA Baseball Teams
get_umpire_ids_petti: (legacy) Download a data frame of all umpires and their MLBAM IDs for games since 2008
get_ncaa_lineups: (legacy) Retrieve lineups for a given NCAA game via its game_info_url
get_probables_mlb: (legacy) Retrieve probable starters for a given MLB game
get_pbp_mlb: (legacy) Acquire pitch-by-pitch data for Major and Minor League games
get_ncaa_schedule_info: (legacy) Get Schedule and Results for NCAA Baseball Teams
mlb_all_star_write_ins: Find MLB All-Star Write-ins
mlb_all_star_ballots: Find MLB All-Star Ballots
mlb: MLB Functions Overview
mlb_attendance: MLB Attendance
load_game_info_sup: Download a data frame of supplemental data about MLB games since 2008.
milb_batter_game_logs_fg: (legacy) Scrape MiLB game logs for batters from FanGraphs
metrics: Metrics Functions Overview
load_umpire_ids: Download a data frame of all umpires and their mlbamids for games since 2008
mlb_all_star_final_vote: Find MLB All-Star Final Vote
milb_pitcher_game_logs_fg: (legacy) Scrape MiLB game logs for pitchers from FanGraphs
mlb_batting_orders: Retrieve batting orders for a given MLB game
mlb_divisions: MLB Divisions
mlb_draft: Retrieve draft pick information by year
mlb_awards_recipient: MLB Award Recipients
mlb_conferences: View all PCL conferences
mlb_draft_prospects: Retrieve draft prospect information by year
mlb_draft_latest: Retrieve latest draft information by year
mlb_award: MLB All-Star, Awards, Home Run Derby Functions
mlb_baseball_stats: MLB Baseball Stats
mlb_awards: MLB Awards
mlb_game_changes: Acquire time codes for Major and Minor League games
mlb_fielder_detail_types: MLB Fielder Detail Types
mlb_game_content: Retrieve additional game content for major and minor league games
mlb_game_linescore: Retrieve game linescores for major and minor league games
mlb_game_status_codes: MLB Game Status Codes
mlb_game_pace: Retrieve game pace metrics for major and minor league
mlb_game_pks: Get MLB Game Info by Date and Level
mlb_game_info: Retrieve additional game information for major and minor league games
mlb_event_types: MLB Event Types
mlb_game_context_metrics: Acquire game context metrics for Major and Minor League games
mlb_homerun_derby_bracket: Retrieve Homerun Derby Bracket
mlb_high_low_types: MLB Stat High/Low Types
mlb_homerun_derby: Retrieve Homerun Derby data
mlb_hit_trajectories: MLB Hit Trajectories
mlb_homerun_derby_players: Retrieve Homerun Derby Players
mlb_game_types: MLB Game Types
mlb_job_types: MLB Job Types
mlb_game_timecodes: Acquire time codes for Major and Minor League games
mlb_high_low_stats: Acquire high/low stats for Major and Minor Leagues
mlb_game_wp: Acquire win probability for Major and Minor League games
mlb_logical_events: MLB Logical Events
mlb_league_leader_types: MLB League Leader Types
mlb_languages: MLB API Language Options
mlb_jobs_datacasters: MLB Jobs Datacasters
mlb_jobs: MLB Jobs
mlb_jobs_umpires: MLB Jobs Umpires
mlb_metrics: MLB Metrics
mlb_jobs_official_scorers: MLB Jobs Official Scorers
mlb_league: MLB Leagues
mlb_pbp: Acquire pitch-by-pitch data for Major and Minor League games
mlb_positions: MLB Positions
mlb_people_free_agents: Find Information About MLB Free Agents
mlb_people: Find Biographical Information for MLB Players
mlb_player_game_stats: Find MLB Player Game Stats
mlb_probables: Retrieve probable starters for a given MLB game
mlb_player_game_stats_current: Find MLB Player Game Stats - Current Game
mlb_pbp_diff: Acquire pitch-by-pitch data between two timecodes for Major and Minor League games
mlb_player_status_codes: MLB Player Status Codes
mlb_pitch_codes: MLB Pitch Codes
mlb_pitch_types: MLB Pitch Types
mlb_rosters: Find MLB Rosters by Roster Type
mlb_roster_types: MLB Roster Types
mlb_schedule: Find game_pk values for professional baseball games (major and minor leagues)
mlb_runner_detail_types: MLB Runner Detail Types
mlb_seasons: Find MLB Seasons
mlb_schedule_event_types: MLB Schedule Event Types
mlb_review_reasons: MLB Review Reasons
mlb_schedule_postseason: Find game_pk values for professional baseball postseason games (major and minor leagues)
mlb_schedule_games_tied: Find game_pk values for professional baseball games (major and minor leagues) that are tied
mlb_schedule_postseason_series: Find game_pk values for professional baseball postseason series games (major and minor leagues)
mlb_sports: MLB Sport IDs
mlb_sky: MLB Sky (Weather) Codes
mlb_standings_types: MLB Standings Types
mlb_stat_types: MLB Stat Types
mlb_sports_players: MLB Sport Players
mlb_stat_groups: MLB Stat Groups
mlb_situation_codes: MLB Situation Codes
mlb_standings: MLB Standings
mlb_seasons_all: Find MLB Seasons all
mlb_sports_info: MLB Sport IDs Information
mlb_team_history: MLB Teams History
mlb_stats: MLB Stats
mlb_team_coaches: MLB Team Coaches
mlb_team_affiliates: MLB Team Affiliates
mlb_team_stats: MLB Team Individual Stats
mlb_stats_leaders: MLB Stats Leaders
mlb_team_info: MLB Team Info
mlb_team_leaders: MLB Team Leaders
mlb_team_personnel: MLB Team Personnel
mlb_team_alumni: MLB Team Alumni
ncaa_baseball_pbp: Get Play-By-Play Data for NCAA Baseball Games
mlb_venues: Find MLB Venues
mlb_teams_stats_leaders: MLB Teams Stats Leaders
ncaa_baseball_roster: Get NCAA Baseball Rosters
most_recent_mlb_season: Most Recent MLB Season
mlb_teams_stats: MLB Teams Stats
most_recent_ncaa_baseball_season: Most Recent NCAA Baseball Season
ncaa: NCAA Functions Overview
mlb_wind_direction_codes: MLB Wind Direction Codes
mlb_teams: MLB Teams
ncaa_team_lu: A data set of colleges and their athletic conferences and divisions
pitcher_game_logs_fg: (legacy) Scrape Pitcher Game Logs from FanGraphs
ncaa_lineups: Retrieve lineups for a given NCAA game via its game_info_url
ncaa_scrape: Scrape NCAA baseball data (Division I, II, and III)
ncaa_school_id_lu: Lookup NCAA baseball school IDs (Division I, II, and III)
process_statcast_payload: Process Baseball Savant CSV payload
ncaa_schedule_info: Get Schedule and Results for NCAA Baseball Teams
ncaa_park_factor: Get Park Effects for NCAA Baseball Teams
ncaa_game_logs: Get NCAA Baseball Game Logs
ncaa_season_id_lu: A data set of college baseball seasons
standings_on_date_bref: (legacy) Scrape MLB Standings on a Given Date
sptrc_league_payrolls: Scrape League Payroll Breakdowns from Spotrac
run_expectancy_code: Generate run expectancy and related measures from Baseball Savant data
retrosheet_data: Get, Parse, and Format Retrosheet Event and Roster Files
sptrc_team_active_payroll: Scrape Team Active Payroll Breakdown from Spotrac
scrape_statcast_savant: (legacy) Query Statcast by Date Range and Players
school_id_lu: (legacy) Lookup NCAA baseball school IDs (Division I, II, and III)
scrape_savant_leaderboards: (legacy) Query Baseball Savant Leaderboards
rds_from_url: Load .rds file from a remote connection
progressively: Progressively
statcast_leaderboards: Query Baseball Savant Leaderboards
statcast: Statcast Functions Overview
stats_api_live_empty_df: Column structure of MLB Stats Live Game API data frame
team_consistency: Calculate Team-level Consistency
woba_plus: Calculate wOBA and related metrics for any set of data
statline_from_statcast: Create stat lines from Statcast data
statcast_search: Query Statcast by Date Range and Players
teams_lu_table: A Team Lookup Table
statcast_impute: Statcast Label Imputation
team_results_bref: (legacy) Scrape Team Results
bref_team_results: Scrape Team Results
chadwick: Chadwick Bureau Register Player Lookup
playerid_lookup: Look up Baseball Player IDs by Player Name
bref_standings_on_date: Scrape MLB Standings on a Given Date
bref_daily_batter: Scrape Batter Performance Data Over a Custom Time Frame
batter_game_logs_fg: (legacy) Scrape Batter Game Logs from FanGraphs
bref_daily_pitcher: Scrape Pitcher Performance Data Over a Custom Time Frame
bref: Baseball Reference Functions Overview
chadwick_player_lu: Download the Chadwick Bureau's public register of baseball players
baseballr-package: baseballr: Acquiring and Analyzing Baseball Data