Stat Czar Vince Gennaro Harnesses Baseball’s Mountain Of Data
The amount of data captured, stored, and analyzed during a Major League Baseball (MLB) game has exploded, much like players' salaries and the net worth of big league franchises. From the late 1860s to 2003, the entire statistical record of the game amounted to about 2 gigabytes. With sensor and tracking technologies installed in big league ballparks beginning in 2003, the amount of "game-level data" is expected to reach about 1 terabyte per game in the next year or so.
Making sense of all those pitches (an estimated 720,000 per season), hits, stolen bases, and double plays has evolved from stat sheets on clipboards to number-crunching algorithms running on computer clusters. The result has come to be known as "baseball analytics," and team owners and general managers use it to make multimillion-dollar decisions about whether to sign a high-priced free agent or re-sign a rising star to a contract that reflects skyrocketing market values.
Working to make sense of all this data, which underpins the business of baseball, is Vince Gennaro, a successful executive, entrepreneur, and author of Diamond Dollars: The Economics of Winning in Baseball. Gennaro has carved out a niche applying big data to sports management, and it all starts with a thorough understanding of baseball analytics.
Gennaro, who currently serves as president of the respected Society for American Baseball Research, also directs the sports management program at Columbia University. Gennaro will deliver the keynote address at the EnterpriseHPC 2014 conference, which is being held September 7 through 9 at the Park Hyatt Aviara Resort in Carlsbad, California.
EnterpriseTech sat down with Gennaro to discuss how big data, analytics, high-performance number crunching, and ubiquitous cloud computing and storage are transforming the business of baseball.
ET: Baseball has always been about statistics. When did Major League Baseball start to embrace what you call "baseball analytics"?
VG: The inflection point was in 2007, when MLB Advanced Media, MLB.com, and Sportvision collaborated to put high-speed cameras into all 30 ballparks to capture 20 different metrics for each of the 720,000 pitches thrown in the course of a season. We also have five or six data points for every batted ball, including its vertical launch angle and its speed coming off the bat. We know the pitcher's release point and the angle of [the pitched ball's] break, and we know the velocity at release and the velocity where the ball crosses home plate.
ET: So we have found ways to deploy tracking technologies to capture all this data?
VG: Yeah, that's right. The technology has in fact done two things: one is data capture, and on the back end of that, data analytics. Those are the technologies that have really affected the game. Now we've added motion capture [without the markers used for sports video games] that can measure… at the point in the game when the pitcher is releasing the ball, the technology can measure the load on his elbow. So we get the biomechanics of the pitcher in live game action.
We did this for a number of years and sent the data to a laboratory…. Now we are getting to the point where we can capture everything from the torque on the knee when the pitcher plants his foot to the load on his elbow and shoulder.
ET: And do this in real time, right?
VG: Yes, in real time. Where supercomputers come in is that they can process this information on a much more timely basis. Sending data to the laboratory overnight and getting results back the next morning is one thing, but supercomputers give us the opportunity to have something much more timely.
For example, in a game situation, a manager may go out and ask the pitcher how he feels; with this data, that starts to become a psychological question rather than a physical one. It's like an EKG for the shoulder. That's the kind of data that exists.
ET: What's on the horizon for baseball analytics?
VG: Over the next year or two, MLB Advanced Media is rolling out an expansion of video and radar motion capture that will track every movement on the field. So you can now measure a center fielder's reaction time and his first step, synchronized to the crack of the bat. You can literally score the route he takes to the ball.
You can now coach and develop players more effectively using this data, and when you are paying players $150 million or whatever, these are critical decisions. This has implications for how you set up lineups and how you acquire players.
ET: Much of what we are talking about goes back to the pre-big data days, when teams like the Oakland A's used statistical analysis to find players who put the ball in play and didn't make outs. Do you agree with the Moneyball approach of using statistics to gauge a player's worth and win baseball games?
VG: Yes, in the sense that I buy into using all available information that's relevant. When you couple the data explosion with processing capabilities and inexpensive storage, why would you leave information on the table?
I don't think the game will ever change to the point where intelligent decision making rules the day. That would defeat the whole essence of what the sport is trying to accomplish.
It's one thing to be smarter, but it's another thing to have the financial resources to compete. It's all about big markets. The Moneyball approach tries to narrow the gap, the financial disadvantage, by using all this information.