The difficulties with baseball data
While baseball is certainly not an insanely complicated system like meteorology or the financial markets, it's not as simple as it may first appear. I've been parsing play-by-play data provided by the good folks over at Retrosheet and have run into quite a few weird plays which have only occurred once or twice.
One example is the event noted by the following bit of Retrosheet code.
S6/L6D.2-H;BXH(TH)(E2/TH)(8E2)(NR)(UR)
Basically, it says that the batter hit a single to short, the runner on second scored, and the BATTER is safe at HOME: he advanced to second on the throw, then to third on a throwing error by the catcher, and then scored when the catcher was charged with another error on a throw from center field. This is the kind of thing my code had to be able to parse, and it's stronger for it. It handles weird runner's interference outs, fielder's interference, players batting out of turn, pitchers throwing with their alternate hands, and a myriad of double and triple play combinations.
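To give a feel for what parsing one of these strings involves, here is a minimal Ruby sketch. It is not our actual parser: `parse_event` and the `Advance` struct are hypothetical names made up for this post. It assumes a simplified shape of the notation, where the play comes before the first `.`, modifiers are separated by `/`, runner advances are separated by `;`, and an `X` (out) advance is treated as safe whenever one of its parenthesized notes contains an error such as `E2`. Real Retrosheet data has many more cases than this covers.

```ruby
# Hypothetical, simplified sketch of splitting a Retrosheet event string.
# Advance: starting base, whether the runner was safe, destination, raw notes.
Advance = Struct.new(:from, :safe, :to, :notes)

def parse_event(event)
  # Everything before the first "." is the play; the rest is runner advances.
  play, advances = event.split(".", 2)
  basic, *modifiers = play.split("/")

  runners = (advances || "").split(";").map do |adv|
    # e.g. "2-H" or "BXH(TH)(E2/TH)": base, "-" or "X", base, then (notes).
    m = adv.match(/\A([B123])([-X])([123H])((?:\([^)]*\))*)\z/)
    raise ArgumentError, "unrecognized advance: #{adv}" unless m

    notes = m[4].scan(/\(([^)]*)\)/).flatten
    # Assumption: an "X" advance is an out unless a note records an error.
    safe = m[2] == "-" || notes.any? { |n| n.match?(/E\d/) }
    Advance.new(m[1], safe, m[3], notes)
  end

  { basic: basic, modifiers: modifiers, advances: runners }
end

event = parse_event("S6/L6D.2-H;BXH(TH)(E2/TH)(8E2)(NR)(UR)")
event[:advances].each do |a|
  puts "#{a.from} -> #{a.to} (#{a.safe ? 'safe' : 'out'}) #{a.notes.join(', ')}"
end
```

Even this toy version has to know that an error can turn an out into a run, which is exactly the kind of rule that makes the real thing grow.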
To highlight just how odd the rules can get, the good folks at the Hardball Times have compiled a list of rule quirks in our national pastime.
Fun stats
How far along are we with benchcoach? Well, to give you an idea, we've got around 300 megs of compressed data. Game data going back to 1871. Play by play data for every game back to 1953. 63 tables. If you were to print out our schema.rb it would take 36 slices of dead trees. We have over 140 model classes -- and they have over 8K lines of code by themselves.
And that's not to mention any of the business functionality.
We're getting there.
Announcement List
If you're subscribed to the blog this is probably redundant, but we've created a mailing list:
We’ve hired!
It's been a busy week or so here at the new Benchcoach HQ overlooking the wooden water towers of Are-We-In-Chelsea-or-Gramercy. (I guess we are in "Silicon Alley" which is a name so lame I hesitate to type it in.) We've got offices, machines, deployments-to-staging, and most importantly, two additional developers!
Say hi to Dave and Matt everyone! Dave is ssh'ing in from Buffalo, and Matt hails from Madison, and I for one am excited that they are on board. This also firmly establishes me as the least knowledgeable person about fantasy baseball here. Sweet. A title which I'll happily take because it means that we've made some good hiring choices.
New Office
So we finally got some office space. It was a surprisingly painless effort, although I think Will would disagree.
In any case, since this desk was privately funded and not at the taxpayer's expense, the naming rights for my desk are now up for bids. I was thinking of the following names:
- Citi-desk
- Cellular One Desk
- The Desk that Ikea Built
- The workspace in Chelsea
We’re hiring!
Sang & I have been cranking away for a few months now, and it's time to start hiring! Lots of stuff going on. Most of our work has been on the database, analytics and league simulations, and we're looking for people to help out on the front end.
(This is also posted on craigslist.)
We're looking for a Rails front-end developer to take over ownership of the Rails front end to a new analytical engine. The goal is to raise the bar of sports statistics and analytics, and therefore familiarity with sabermetrics and a strong statistical background are a plus. Ideally you are in a fantasy league right now. Looking for a motivated individual who can work independently to help us launch the site by the beginning of March. Must have strong HTML/CSS and JavaScript abilities. (And Greasemonkey experience would be a huge plus.)
Please send the following things:
- Resume in text or PDF
- an explanation or test plan on how to check whether a trade between two players is valid. There are no wrong answers, only poorly thought-out ones. We’re looking for people who can code defensively and understand the domain. Pseudocode or English is fine.
- an explanation of what the mean, median and confidence intervals are.
- code which hides and shows a row in a table without reloading the page.
Requirements
- Able to work in USA.
- Rails. HTML/CSS/JavaScript.
- Baseball (fantasy or otherwise)
- Willingness to write integration tests.
Wouldn’t it be nice
- Capistrano
- Amazon EC2
- Hadoop
- Greasemonkey
- Strong opinions about Lisp or Smalltalk












