Building a Retrosheet Database – Part 1

That magical time of year has come again. Yes, the very fine folks over at Retrosheet have updated their game files once again, and the new batch includes (among other updates) the play-by-play files for every baseball game in 2014. There are a lot of things you can do with the Retrosheet files, but one of the most powerful options is to create a Retrosheet MySQL database.

Certainly, there are a lot of things you can find out at FanGraphs and Baseball Reference without the need for your own database. But every now and then, a question pops up that Googling, FanGraphs leaderboards, or even the Almighty Play Index at Baseball Reference cannot answer. It’s a luxury to have your own database, to be sure, but any baseball nerd worth their salt should at least have one to tinker with.

This tutorial is going to be starting from scratch. This means that this first installment of the series will simply be about acquiring the right tools to build out the database. We’ll get to actually inputting the baseball data in the next chapter. This may seem like the boring part, because it is. But it’s important to have the proper foundation before we get to the fun stuff.

DISCLAIMER: For right now, I’m going to be working strictly in a Windows environment. The tools that we need — which will be revealed later — were meant to work with the Windows command line. There are options for you Mac/Linux users out there, and I will discuss them down the road. But for right now, these initial steps will be for the Windows folks.

MORE DISCLAIMER: I’m trying my hardest to keep this tutorial accessible for everyone. You’ll need a basic understanding of Windows, but you shouldn’t have to be a computer wiz to follow these steps. Conversely, those that are very familiar with Windows will see this as pretty basic stuff. I’m not trying to be insulting. I’m just trying to make sure everyone can follow along.

Step 1: Installing MySQL

The term “database” might sound like you need some sort of special hardware or server rack in your house. While this is true in the corporate world, the truth is you can have your own database right on your regular PC. No need for special hardware or hosting plans or anything of the sort. This is not to say that having a hosted database doesn’t have it’s advantages, but for the time being, we’ll be dealing with what are called local databases — those that live right on your regular machine.

However, a complete Retrosheet database will require a good deal of hard drive space. From my experience, a Retrosheet database that spans all the available data requires a little over 8 GB of hard drive space. However, as Retrosheet releases more and more updates, that need will expand. To be safe, I wouldn’t work with any less than 15 GB of available space. You’ll be surprised how fast that fills up.

Personally, I have a separate machine that hosts my personal database. It’s a headless (just  a tower stowed away in my attic, no monitor or keyboard attached) unit that performs a lot of functions like storing my ripped DVDs and music. But my Retrosheet info also lives there, and I connect to it to do my querying. If you are totally baffled by this notion, don’t worry. I’m still teaching you how to install it locally. But for those with a little knowledge of how to handle some basic home networking, know that this is an option.

OK. Let’s get to the action. The type of database that we’ll be using is called MySQL. It’s a very common database language. There are lots of resources in both book and online form to learn more about it, and pretty much any problem is easily Googlable. It’s one of the industry standards, so that’s what we’ll deal with.

To keep things nice and free, and since we’ll be only using this for personal reasons, we’ll deal with the Community Edition of MySQL. You can find the download link here. You’ll want to choose the second (larger) download link toward the middle of the page. The next page will give you some buttons to login or signup, but you can bypass that with the “No thanks…” link a little further down the page. Once the .msi file has downloaded, go ahead and run it.

pt1screengrab1If you want to go the quick and dirty way, go ahead and choose the Developer Default option. It will install everything you need, and probably some stuff you don’t. You might want that stuff later on, but for the sake of this article, I’m going to choose the Custom option and pick what I want installed. You should install the Server and Workbench at minimum, though it’s a good idea to include the notifier and the documentation as well. You’ll have to click through some category trees to get what you want. Use the green arrow to add your options to the list of things to be installed.

pt1screengrab3The installer will check to make sure you have all the required utilities to do what you want. In my case, I didn’t have the C++ 2013 distribution on my machine — you may have more things that need installing. Just click Execute and the necessary files will be downloaded and the respective installers will run. Once everything is installed, click Next and your MySQL install will begin.

pt1screengrab5For configuration purposes, go ahead and keep the defaults, unless you have specific reasons not to (and you know what you’re doing).

pt1screengrab6You can add users to the database if you want, but for simplicity, we’re just going to define a password for the “root” user. REMEMBER THIS PASSWORD. It’s your key to doing anything with the database — adding data, querying, etc. If you lose it, you’re pretty much horked. Use a familiar password or write it down. If you’re just having this on your own machine, there’s really no need for a super-complicated password. It’s holding freely-available data, after all.

pt1screengrab7For the Windows Service configuration, you can keep the defaults as well. You can choose to not have the MySQL service to run when you start the machine, but if you do you’ll have to manually start the service each time you want to use the database. If you don’t know what that means, have it run on default.

pt1screengrab8Go ahead and click Execute, and the options will be configured. After that, a few clicks of Next should do it. You’ve just installed MySQL on your machine. Congratulations!

Step 2: Install Wget

Wget is a great utility for downloading mass amounts of data from a server without having to click a thousand links. It’s a script-based tool, and it’s what you’ll need to download all the Retrosheet data with in one big swoop. Don’t worry, we’ll provide the scripts you’ll need.

Download the installer here. Run the installer, keeping all the defaults.

Step 3: Install 7-Zip

7-Zip is a .zip file utility. It’s one of the best free tools for handling .zip files, and it’s what we’ll be using. You can download the installers here. Choose one of the first two options, taking care to choose the right version in regards to 32- or 64-bit. If you don’t know what version you have, go to the Control Panel of your machine, click System and Security, then System. You’ll get a similar screen, which should show you your version. Remember which one you have, as this will be important later on.

pt1screengrab9
Yes, I name all of my devices after baseball players. Don’t judge.

After you’ve downloaded the right version, go ahead and install, using all the defaults.

You’ve done it! You’ve done all the pre-steps for installing your Retrosheet database. If you have any questions, sound off in the comments. Stay tuned, as next week, we’ll get our hands dirty with installing the actual Retrosheet data. I’ve made it as painless as possible, I promise. Don’t get worried. If you can handle what we did today, you can handle the rest. Good luck, and don’t feel bad about asking for help.





David G. Temple is the Managing Editor of TechGraphs and a contributor to FanGraphs, NotGraphs and The Hardball Times. He hosts the award-eligible podcast Stealing Home. Dayn Perry once called him a "Bible Made of Lasers." Follow him on Twitter @davidgtemple.

6 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Jeff in T.O.
9 years ago

I am going to nerd out on this so hard over the Holidays…

Daniel
9 years ago

Can’t wait for you to do a tutorial like this for Mac!

Mike
9 years ago

Fantastic! I’ll probably use a Mac too, but this is great. Looking forward to the next installment.

Leon
9 years ago

I followed these directions, but I only see data up to 2009 when I query it.

Leon
9 years ago
Reply to  Leon

It looks like it’s a problem with the loader file. It only has commands up to 2009.