nflscrapR v1.4.0

Compiling the NFL Play-by-Play API for easy use in R

This package allows data-driven sports enthusiasts to use the NFL JSON API data to perform detailed analysis at the game, season, and player levels. The functions within this package help parse and clean the data for R users. This package was built to enhance advanced sports analytics research, specifically for American football, in the hope of developing metrics and insights that could be useful to professional NFL teams as well as the public.


Introducing the nflscrapR Package

This package was built to allow R users to utilize and analyze data from the National Football League (NFL) API. The functions in this package allow users to perform analysis at the play and game levels on single games and entire seasons. By parsing the play-by-play data recorded by the NFL, this package allows NFL data enthusiasts to examine each facet of the game at a more insightful level. The creation of this package puts granular data into the hands of any R user with an interest in performing analysis and digging up insights about the game of American football. With open-source data, the development of reproducible advanced NFL metrics can occur at a more rapid pace and help grow the football analytics community.

Note: Data is only available from the 2009 season onward.

Downloading and Loading the Package

# Must install the devtools package using the below commented out code
# install.packages('devtools')

devtools::install_github(repo = "maksimhorowitz/nflscrapR")
#> Skipping install for github remote, the SHA1 (05815ef8) has not changed since last install.
#>   Use `force = TRUE` to force installation

# Load the package
library(nflscrapR)

Simple Example of Package Usage

The following example compares the distribution of expected points added (EPA) per pass attempt, for passers with at least 50 attempts, across the 2009-2016 NFL seasons. The code for this example is below:

# Loading the data with season_play_by_play function: (Note the
# season_play_by_play function takes a few minutes to run)

pbp_2009 <- season_play_by_play(2009)
pbp_2010 <- season_play_by_play(2010)
pbp_2011 <- season_play_by_play(2011)
pbp_2012 <- season_play_by_play(2012)
pbp_2013 <- season_play_by_play(2013)
pbp_2014 <- season_play_by_play(2014)
pbp_2015 <- season_play_by_play(2015)
pbp_2016 <- season_play_by_play(2016)

# Stack the datasets together: (Load the tidyverse first - as if you didn't
# already...)

library(tidyverse)
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> complete(): tidyr, RCurl
#> filter():   dplyr, stats
#> lag():      dplyr, stats

pbp_data <- bind_rows(pbp_2009, pbp_2010, pbp_2011, pbp_2012, pbp_2013, pbp_2014, 
    pbp_2015, pbp_2016)
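
The per-season calls and the manual stacking can also be written as a loop over seasons, which avoids copy-paste slips such as listing one season twice. The sketch below is not part of nflscrapR: `load_seasons` is a hypothetical helper, and `fake_loader` is a stand-in that returns placeholder rows so the example runs without network access. With the package loaded, `season_play_by_play` would be passed in as the loader instead.

```r
library(dplyr)

# Hypothetical helper (not part of nflscrapR): call `loader` once per season
# and stack the results into a single data frame.
load_seasons <- function(seasons, loader) {
  bind_rows(lapply(seasons, loader))
}

# Stand-in loader for illustration only; it returns two placeholder rows
# per season instead of scraping anything.
fake_loader <- function(season) {
  data.frame(Season = season, play_id = 1:2)
}

pbp_demo <- load_seasons(2009:2016, fake_loader)

# Each season should appear exactly once in the stacked data
count(pbp_demo, Season)

# With nflscrapR loaded (slow; one download per season):
# pbp_data <- load_seasons(2009:2016, season_play_by_play)
```

Passing the loader as an argument keeps the helper easy to try out on small inputs before committing to the full multi-minute download.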

# Now filter down to only passing attempts, group by the season and passer,
# then calculate the number of passing attempts, total expected points added
# (EPA), EPA per attempt, then finally filter to only those with at least 50
# pass attempts:

passing_stats <- pbp_data %>% filter(PassAttempt == 1 & PlayType != "No Play" & 
    !is.na(Passer)) %>% group_by(Season, Passer) %>% summarise(Attempts = n(), 
    Total_EPA = sum(EPA, na.rm = TRUE), EPA_per_Att = Total_EPA/Attempts) %>% 
    filter(Attempts >= 50)
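
To make the summary step concrete, the sketch below runs the same filter, group, and summarise pattern on a small hand-made data frame. The column names follow the play-by-play columns used above; the rows themselves, and the threshold of 2 attempts in place of 50, are invented purely for illustration.

```r
library(dplyr)

# Toy play-by-play rows; all values are made up for illustration
toy_pbp <- data.frame(
  Season      = c(2009, 2009, 2009, 2009, 2010, 2010),
  Passer      = c("A.Smith", "A.Smith", "B.Jones", NA, "A.Smith", "A.Smith"),
  PassAttempt = c(1, 1, 1, 1, 1, 0),
  PlayType    = c("Pass", "Pass", "Pass", "Pass", "No Play", "Run"),
  EPA         = c(0.5, -0.25, 1.0, 0.75, 2.0, 0.25),
  stringsAsFactors = FALSE
)

toy_stats <- toy_pbp %>%
  # keep real pass attempts that have a recorded passer
  filter(PassAttempt == 1 & PlayType != "No Play" & !is.na(Passer)) %>%
  group_by(Season, Passer) %>%
  summarise(Attempts = n(),
            Total_EPA = sum(EPA, na.rm = TRUE),
            EPA_per_Att = Total_EPA / Attempts) %>%
  ungroup() %>%
  # toy threshold; the real example uses 50
  filter(Attempts >= 2)

toy_stats
```

Only the 2009 rows for A.Smith survive both filters here: the row with a missing passer, the "No Play" row, and the non-pass row are dropped before grouping, and B.Jones falls below the attempt threshold.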

# The ggjoy package (install with the commented out code below) can be used
# to compare the EPA per pass attempt for each NFL season:
# install.packages('ggjoy')
library(ggjoy)

ggplot(passing_stats, aes(x = EPA_per_Att, y = as.factor(Season))) +
    geom_joy(scale = 3, rel_min_height = 0.01) + theme_joy() +
    ylab("Season") + xlab("EPA per Pass Attempt") +
    scale_y_discrete(expand = c(0.01, 0)) +
    scale_x_continuous(expand = c(0.01, 0)) +
    ggtitle("The Shifting Distribution of EPA per Pass Attempt") +
    theme(plot.title = element_text(hjust = 0.5, size = 16),
          axis.title = element_text(size = 16),
          axis.text = element_text(size = 16))
#> Picking joint bandwidth of 0.0603

Functions in nflscrapR

Name                       Description
drive_summary              Drive Summary and Results
nflteams                   Dataset of NFL team names, abbreviations, and colors
game_play_by_play          Parsed Descriptive Play-by-Play Dataset for a Single Game
playerstats11              NFL Team Names and Abbreviations
playerstats15              NFL Team Names and Abbreviations
proper_jsonurl_formatting  Formatting URL for location of NFL Game JSON Data
playerstats13              NFL Team Names and Abbreviations
playerstats14              NFL Team Names and Abbreviations
season_rosters             Season Rosters for Teams
season_player_game         Boxscore for Each Game in the Season - One line per player per game
season_play_by_play        Parsed Descriptive Play-by-Play Function for a Full Season
season_games               Game Information for All Games in a Season
simple_boxscore            Simple Game Boxscore
playerstats09              NFL Team Names and Abbreviations
agg_player_season          Detailed Player Aggregate Season Statistics
playerstats10              NFL Team Names and Abbreviations
player_game                Detailed Boxscore for Single NFL Game
extracting_gameids         Extract GameIDs for each game in a given NFL season
playerstats12              NFL Team Names and Abbreviations
expected_points            Expected point function to calculate expected points for each play in the play by play, and the expected points added in three ways, basic EPA, air yards EPA, and yards after catch EPA
win_probability            Win probability function to add win probability columns for the home and away teams for each play in the game
buildURL                   Building URL to scrape player season stat pages
getPageNumbers             Get Number of Player Position Pages
getPlayers                 Scrape Player Names and Positions
buildNameAbbr              Build formatted player name from full player name
getGSISID                  For a player's href, get their GSIS ID from their personal url.
findPagePlayerID           Find the GSIS ID for each player on the provided page.