Building a Retrosheet Database for the 2016 Season, Part 1

Baseball season is almost upon us. Soon, people will flock to ballparks in cities all over our great nation in search of entertainment and meaning, while baseball bloggers will continue their search for relevance and the mysterious Full Time Gig. If you fall into the latter camp (or if you just like having this kind of data handy), then it’s time to get your Retrosheet database installed/updated.

For those not in the know, Retrosheet is a magnificent project that essentially looks to turn box scores into computer records. And they’ve done a great job of it. They have all box scores from games since 1914, and play-by-play data since around 1940. What we’ll want to do is convert their records into an easily searchable database that we can query for fun and profit.

Below is a video walking you through how to get your machine set up. We won’t actually be loading the data yet (that will come in Part 2), but we’ll make sure your computer is prepped and has all the files and utilities it needs.

If you already installed a Retrosheet database using our instructions from last year, most of this won’t apply to you, but feel free to follow along. You’ll certainly need the links to the new packages that are now up on our GitHub page, but most of what you’ll need is in Part 2.

(Mac people: as I mentioned in the video, your instructions are coming)

Links mentioned in the video:

TechGraphs GitHub: https://github.com/techgraphs/2016Ret…

MySQL Server: https://dev.mysql.com/downloads/mysql/

Wget: http://gnuwin32.sourceforge.net/packa…

7-Zip: http://www.7-zip.org/

SQLyog: https://github.com/webyog/sqlyog-comm…
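Once everything is installed, a quick sanity check can confirm that the command-line utilities are reachable from your PATH. The snippet below is only a minimal sketch and is not part of the video or the TechGraphs packages: the executable names (mysql, wget, 7z) are assumptions about how these tools are typically installed, and the helper names find_tools and report are made up for illustration. SQLyog is a GUI application, so it is not checked here.

```python
import shutil

# Assumed executable names; adjust them if your installs use different ones.
REQUIRED_TOOLS = ["mysql", "wget", "7z"]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def report(found):
    """Turn the lookup results into one human-readable line per tool."""
    lines = []
    for name, path in found.items():
        status = path if path else "NOT FOUND on PATH"
        lines.append(f"{name}: {status}")
    return lines


if __name__ == "__main__":
    for line in report(find_tools(REQUIRED_TOOLS)):
        print(line)
```

If a tool shows up as not found, it may still be installed; its folder may simply need to be added to your PATH.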





David G. Temple is the Managing Editor of TechGraphs and a contributor to FanGraphs, NotGraphs and The Hardball Times. He hosts the award-eligible podcast Stealing Home. Dayn Perry once called him a "Bible Made of Lasers." Follow him on Twitter @davidgtemple.

4 Comments
Shane Tourtellotte
8 years ago

Retrosheet’s box scores actually go back to 1913 now. They are that much more amazing, David, and only getting amazinger.

Gotta run now. Have an instructional video to watch.

Matt
8 years ago

I followed your series last year and set this up. After that I found it a bit difficult to understand the schema, and how to extract data. Any chance there’d be a followup video this year giving some quick examples on querying and merging tables to find useful information?

Cory
8 years ago

@Matt – I think that’s a great idea, perhaps David & Co. could set up some sort of forum where users could post code / ideas / findings so that others could replicate and/or add color to it as well?