# Vignette introduction

## Introduction

These vignettes serve a dual purpose:

- to introduce users of the `Lahman` package to the breadth and depth of the data, and to the range of analyses and statistical methods that can be undertaken using the data in the package,
- to introduce users to the statistical software **R**, and particularly to its modern use in statistics and data science as encapsulated in the `tidyverse` collection of R packages, designed to facilitate data input, manipulation, and graphics.
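As a minimal sketch of how the two aims fit together, the snippet below loads `Lahman` alongside `dplyr` (one of the `tidyverse` packages) and pulls a few columns from the `Teams` table. The object name `team_seasons` and the choice of columns are illustrative assumptions, not something the vignettes prescribe.

```r
library(Lahman)   # baseball data tables such as Teams, Batting, Salaries
library(dplyr)    # tidyverse data-manipulation verbs

# Teams holds one row per team-season; keep a handful of familiar columns.
# `team_seasons` is a hypothetical name used only for this illustration.
team_seasons <- Teams %>%
  filter(yearID >= 1950) %>%
  select(yearID, teamID, G, R, HR, SO)

# Peek at the first few rows
head(team_seasons)
```

The same pattern of `filter()` followed by `select()` applies to the other tables in the package.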

## Contents

Vignettes completed to date:

- Relationship Between Strikeouts and Home Runs -- This vignette looks at the relationship between the rate of strikeouts and home runs from 1950 onward. This question was inspired by Marchi and Albert (2014), *Analyzing Baseball Data with R*.
  - R packages demonstrated: `car` (Companion to Applied Regression)

- Run Scoring Trends -- Major League Baseball average per-game run scoring for each season since 1901.

- Team Payroll and the World Series -- This vignette examines whether there is a relationship between total team salaries (payroll) and World Series success.
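To give a flavor of the analyses listed above, here is a rough sketch of a per-season run-scoring summary of the kind the Run Scoring Trends vignette describes. It is not the vignette's own code: the object name `runs_per_game` and the decision to include every league recorded in `Teams` are assumptions made for illustration.

```r
library(Lahman)
library(dplyr)

# Average runs scored per team per game, by season, from 1901 onward.
# Assumption: all leagues present in Teams for those years are included.
runs_per_game <- Teams %>%
  filter(yearID >= 1901) %>%
  group_by(yearID) %>%
  summarise(R = sum(R), G = sum(G)) %>%
  mutate(rpg = R / G)   # total runs divided by total team-games

head(runs_per_game)
```

Summing runs and games before dividing weights each team by the number of games it played, rather than averaging the per-team rates.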

## Further reading

A number of books and on-line resources use the `Lahman` package as material for the examples. These include:

### Books

Michael Friendly and David Meyer (2016) *Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data* (CRC Press). DDAR Web Site

Max Marchi and Jim Albert (2014) *Analyzing Baseball Data with R* (CRC Press)

- the book makes frequent reference to the raw Lahman database files in CSV form; the `Lahman` package was relatively recent when the book was published, and the authors make only a brief mention of the package.

David Robinson (2017) *Introduction to Empirical Bayes* (published at [gumroad.com])

- the book makes extensive use of the package to explain "the empirical Bayesian approach to estimation, credible intervals, A/B testing, mixture models, and other methods, all through the example of baseball batting averages."

Hadley Wickham and Garrett Grolemund (2017) *R for Data Science: Import, Tidy, Transform, Visualize, and Model Data* (O'Reilly)

### Articles, blog entries, and course materials

Steven Buechler (2014-2015) Analysis of career performance in top home run hitters

- This is lecture 16 from Computing with Data Seminar

Kris Eberwein (2015-09-30) "Hacking The New Lahman Package 4.0-1 with R-Studio" (via [r-bloggers.com])

Michael Lopez (2016) Lab materials for Skidmore College MA 276, “Sports and Statistics”

Bill Petti (2015-09-21) A Short(-ish) Introduction to Using R Packages for Baseball Research

*Exploring Baseball Data with R* blog

- Jim Albert (2018-12-24) The Vanishing 300 Batting Average
- Jim Albert (2015-01-05) A Graph of a Batting Average
- Brian Mills (2014-09-30) Using ggmap and Lahman to Find the Hometown College Rosters

-30-