Vignette introduction


These vignettes serve a dual purpose:

  • to introduce users of the Lahman package to the breadth and depth of the data, and to a range of analyses and statistical methods that can be carried out using the data in the package,

  • to introduce users to the statistical software R, and particularly to the modern approach to statistics and data science embodied in the tidyverse, a collection of R packages designed to facilitate data import, data manipulation, and graphics.


Vignettes completed to date:

  1. Relationship Between Strikeouts and Home Runs -- This vignette looks at the relationship between the rate of strikeouts and home runs from 1950 onward. This question was inspired by Marchi and Albert (2014), Analyzing Baseball Data with R.

  2. Run Scoring Trends -- This vignette explores Major League Baseball's average per-game run scoring for each season since 1901.

  3. Team Payroll and the World Series -- This vignette examines whether there is a relationship between total team salaries (payroll) and World Series success.

Further reading

A number of books and online resources use the Lahman package as the data source for their examples. These include:


Michael Friendly and David Meyer (2016) Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data (CRC Press); see also the DDAR web site

Max Marchi and Jim Albert (2014) Analyzing Baseball Data with R (CRC Press)

  • the book refers frequently to the raw Lahman database files in CSV form; the Lahman package was relatively new when the book was published, and the authors mention it only briefly.

David Robinson (2017) Introduction to Empirical Bayes (published at [])

  • the book makes extensive use of the package to explain "the empirical Bayesian approach to estimation, credible intervals, A/B testing, mixture models, and other methods, all through the example of baseball batting averages."

  • see also the blog post introducing the book

Hadley Wickham and Garrett Grolemund (2017) R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (O'Reilly)

Articles, blog entries, and course materials

Steven Buechler (2014-2015) Analysis of career performance in top home run hitters

Kris Eberwein (2015-09-30) "Hacking The New Lahman Package 4.0-1 with R-Studio" (via [])

Michael Lopez (2016) Lab materials for Skidmore College MA 276, “Sports and Statistics”

Bill Petti (2015-09-21) A Short(-ish) Introduction to Using R Packages for Baseball Research

Exploring Baseball Data with R blog

Jim Albert (2018-12-24) The Vanishing 300 Batting Average

Jim Albert (2015-01-05) A Graph of a Batting Average

Brian Mills (2014-09-30) Using ggmap and Lahman to Find the Hometown College Rosters