Clean Play by Play Data
clean_pbp(pbp, ...)
pbp: A data frame of play-by-play data scraped using fast_scraper.
...: Additional arguments passed to a message function (for internal use).
The input data frame of the parameter 'pbp' with the following columns added:
Binary indicator of whether epa > 0 on the given play.
Name of the dropback player (scrambles included) including plays with penalties.
Jersey number of the passer.
Name of the rusher (no scrambles) including plays with penalties.
Jersey number of the rusher.
Name of the receiver including plays with penalties.
Jersey number of the receiver.
Binary indicator if the play was a pass play (sacks and scrambles included).
Binary indicator if the play was a rushing play.
Binary indicator if the play was a special teams play.
Binary indicator if the play ended in a first down.
Binary indicator if the play description indicates "Aborted".
Binary indicator: 1 if the play was a 'normal' play (including penalties), 0 otherwise.
ID of the player in the 'passer' column. (NOTE: IDs differ before and after 2011 but are consistent for each player. Please see details for further information.)
ID of the player in the 'rusher' column. (NOTE: IDs differ before and after 2011 but are consistent for each player. Please see details for further information.)
ID of the player in the 'receiver' column. (NOTE: IDs differ before and after 2011 but are consistent for each player. Please see details for further information.)
Name of the 'passer' if it is not 'NA', or name of the 'rusher' otherwise.
Jersey number of the player listed in the 'name' column.
ID of the player in the 'name' column. (NOTE: IDs differ before and after 2011 but are consistent for each player. Please see details for further information.)
Gives the QB credit for EPA up to the point where a receiver lost a fumble after a completed catch, which makes EPA work more like passing yards on plays with fumbles.
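Two of the derived columns described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data are made up, and the output column name `success` is an assumption used only for illustration (the 'passer', 'rusher', and 'name' columns are the ones referred to in the descriptions).

```r
# Toy play-by-play rows; the values are invented for illustration only.
pbp <- data.frame(
  epa    = c(0.8, -0.3, 0, NA),
  passer = c("T.Brady", NA, NA, NA),
  rusher = c(NA, "M.Lynch", NA, "D.Henry"),
  stringsAsFactors = FALSE
)

# Binary indicator: 1 when epa > 0, 0 otherwise; a missing epa stays NA.
# (`success` is an assumed column name.)
pbp$success <- as.integer(pbp$epa > 0)

# Name of the passer if it is not NA, or name of the rusher otherwise.
pbp$name <- ifelse(!is.na(pbp$passer), pbp$passer, pbp$rusher)

print(pbp)
```

A play with neither a passer nor a rusher keeps NA in 'name', which is the behavior the fallback rule implies.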
Builds columns that capture what happens on all plays, including penalties, using string extraction from the play description. Loosely based on Ben's nflfastR guide (https://www.nflfastr.com/articles/beginners_guide.html) but updated to work with the RS data, which has a different player format in the play description, e.g. 24-M.Lynch instead of M.Lynch. The function also standardizes team abbreviations so that, for example, the Chargers are always represented by 'LAC' regardless of the year, and it standardizes player IDs for players appearing in both the older era (1999-2010) and the new era (2011+).
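The string extraction and abbreviation standardization described above might look roughly like the following base R sketch. It is a simplified illustration under stated assumptions, not the function's actual code: the helper names `extract_first` and `standardize_team`, the regular expression, the sample descriptions, and the one-entry lookup table are all hypothetical. The pattern only handles names of the form initial, dot, surname, with an optional jersey-number prefix.

```r
# Made-up play descriptions covering both player formats and a non-play row.
desc <- c(
  "(12:05) 24-M.Lynch left tackle to SEA 35 for 4 yards.",
  "(3:10) M.Lynch up the middle for 2 yards.",
  "Timeout #1 by SEA."
)

# Hypothetical helper: first regex match per element, NA where nothing matches.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[!is.na(m) & m != -1] <- regmatches(x, m)
  out
}

# Optional "NN-" jersey prefix followed by "X.Surname" (assumed pattern).
token <- extract_first(desc, "([0-9]{1,2}-)?[A-Z]\\.[A-Z][A-Za-z']+")

# Split the token into jersey number (if present) and player name.
has_jersey <- grepl("^[0-9]{1,2}-", token)
jersey <- as.integer(ifelse(has_jersey, sub("-.*$", "", token), NA))
player <- sub("^[0-9]{1,2}-", "", token)

# Illustrative lookup with a single entry; a real table would cover more teams.
team_map <- c(SD = "LAC")

# Hypothetical helper: map old abbreviations, leave everything else unchanged.
standardize_team <- function(x) {
  mapped <- unname(team_map[x])
  ifelse(is.na(mapped), x, mapped)
}
teams <- standardize_team(c("SD", "LAC", "SEA"))

print(data.frame(player, jersey))
print(teams)
```

Stripping the jersey prefix before storing the name is what lets both description formats resolve to the same player string.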