These vignettes serve a dual purpose:
- to introduce users of the Lahman package to the breadth and depth of the data, and to the range of analysis and statistical methods that can be applied using the data in the package;
- to introduce users to the statistical software R, and particularly to the modern practice of statistics and data science embodied in the tidyverse, a collection of R packages designed to facilitate data input, data manipulation, and graphics.
Vignettes completed to date:
- Relationship Between Strikeouts and Home Runs -- This vignette looks at the relationship between the strikeout rate and home runs from 1950 onward. The question was inspired by Marchi and Albert (2014), Analyzing Baseball Data with R.
  - R packages demonstrated: car (Companion to Applied Regression)
- Run Scoring Trends -- Average runs scored per game in Major League Baseball for each season since 1901.
- Team Payroll and the World Series -- This vignette examines whether there is a relationship between total team salaries (payroll) and World Series success.
A number of books and online resources use the Lahman package as a source of examples. These include:
Books
Michael Friendly and David Meyer (2016) Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data (CRC Press). See also the DDAR website.
Max Marchi and Jim Albert (2014) Analyzing Baseball Data with R (CRC Press)
  - The book makes frequent reference to the raw Lahman database files in CSV form. The Lahman package was relatively new when the book was published, and the authors mention it only briefly.
David Robinson (2017) Introduction to Empirical Bayes (published via gumroad.com)
  - The book makes extensive use of the package to explain "the empirical Bayesian approach to estimation, credible intervals, A/B testing, mixture models, and other methods, all through the example of baseball batting averages."
Hadley Wickham and Garrett Grolemund (2017) R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (O'Reilly)
Articles, blog entries, and course materials
Steven Buechler (2014-2015) Analysis of career performance in top home run hitters
  - Lecture 16 of the Computing with Data Seminar
Kris Eberwein (2015-09-30) "Hacking The New Lahman Package 4.0-1 with R-Studio" (via r-bloggers.com)
Michael Lopez (2016) Lab materials for Skidmore College MA 276, “Sports and Statistics”
Bill Petti (2015-09-21) A Short(-ish) Introduction to Using R Packages for Baseball Research
From the Exploring Baseball Data with R blog:
Jim Albert (2018-12-24) The Vanishing 300 Batting Average
Jim Albert (2015-01-05) A Graph of a Batting Average
Brian Mills (2014-09-30) Using ggmap and Lahman to Find the Hometown College Rosters