This database was created by Sean Lahman, who pioneered the effort to make baseball statistics freely available to the general public. What started as a one-man effort in 1994 has grown tremendously, and a team of researchers has since pooled its efforts to make this the largest and most accurate source for baseball statistics available anywhere.
This database, in the form of an R package, offers a variety of interesting challenges and opportunities for data processing and visualization in R.
fieldingLabels
TeamsHalf split season data for teams
TeamsFranchises franchise information
PitchingPost post-season pitching statistics
FieldingPost post-season fielding data
SeriesPost post-season series information
AwardsPlayers awards won by players
AwardsShareManagers award voting for manager awards
AwardsSharePlayers award voting for player awards

Lahman, S. (2012) Lahman's Baseball Database, 1871-2012, v. 2012, Comma-delimited version,
Lahman, S. (2012) Lahman's Baseball Database, 1871-2012, MS Access version,
The main form of this database is a relational database in Microsoft Access format.
The design follows these general principles. Each player is assigned a
unique code (playerID). All of the information in different tables relating to that player
is tagged with his playerID. The playerIDs are linked to names and
birthdates in the Master table. Similar links exist among other tables
via analogous *ID variables.
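
The linking principle described above can be illustrated with a small sketch in base R. It does not load the package: the two data frames below are made-up stand-ins for the Master and Batting tables, and the column names other than playerID (nameFirst, nameLast, birthYear, yearID, HR) are assumptions chosen for illustration, as are all of the values.

```r
# Toy stand-in for the Master table: one row per player (values are made up).
master <- data.frame(
  playerID  = c("doej01", "roeja01"),
  nameFirst = c("John", "Jane"),
  nameLast  = c("Doe", "Roe"),
  birthYear = c(1900L, 1905L),
  stringsAsFactors = FALSE
)

# Toy stand-in for a statistics table: one row per player-season.
batting <- data.frame(
  playerID = c("doej01", "doej01", "roeja01"),
  yearID   = c(1925L, 1926L, 1930L),
  HR       = c(10L, 12L, 7L),
  stringsAsFactors = FALSE
)

# Attach names and birth years to every batting row via the shared key.
batting_named <- merge(batting, master, by = "playerID")

# Sum home runs per playerID, then look up the names for the totals.
career_hr    <- aggregate(HR ~ playerID, data = batting, FUN = sum)
career_named <- merge(career_hr, master, by = "playerID")

print(career_named[, c("nameFirst", "nameLast", "HR")])
```

Because every statistics row carries only the playerID, names and birthdates are stored once in the Master table and joined in when needed. The same pattern should apply to the other *ID variables.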
The database is composed of the following main tables:
Master player names and birthdates
Batting batting statistics
Pitching pitching statistics
Fielding fielding statistics