I've recently uncovered a source of data that will allow me to write a series of articles I've been thinking about for a long time. What I plan to do is look at sabermetrics through an economist's eyes. Sabermetrics is a term everyone here is familiar with: in short, it seeks to better understand baseball performance through advanced statistical analysis. FIP replaces ERA; OPS takes the place of batting average. Economics you may be less familiar with. Modern economics has developed a comprehensive set of tools for analyzing large data sets. Economists who write for large audiences, such as Paul Krugman or Steven Landsburg, tend to speak in broad terms so that their work interests more readers. However, detailed statistical analysis underlies all (well, most) of their claims.
In the simplest terms, then, I plan to use economic tools to answer sabermetric questions. This marriage has been tried before, in the now-defunct Sabernomics blog. I plan to expand on that work and take it in slightly new directions, hence the new name: "Econoball."
The first question I'd like to explore is something that has come up in the comments section lately: old-school pitching stats like ERA and WHIP vs. sabermetric stats like strikeouts. The argument goes that ERA and WHIP are better indicators than sabermetric stats because, at the end of the day, the goal of the game is to keep runners off base and limit the runs the opponent scores. Sabermetricians respond that strikeouts, walks, and home runs are the only outcomes a pitcher directly controls and, moreover, that they give a better picture of what the pitcher has done independent of luck, such as poor fielding behind him.
The key to the debate is really whether there is a direct relationship between strikeouts and ERA and WHIP. If there is an absolute distinction between pitchers and "brain-dead heavers," as Maddux put it, we should see no connection between strikeouts and ERA or WHIP. That is, some high-strikeout pitchers will have a low ERA and WHIP and others a high ERA and WHIP, and the same will hold for low-strikeout pitchers. However, if power pitchers are systematically more likely to succeed, they should systematically give up fewer hits and runs and, as a result, have lower ERAs and WHIPs.
The tool I'll use to investigate this is a linear regression, the workhorse of economics. A linear regression fits the data to an equation of the following form:
Y = α + βX + γZ + ε
Here, Y is our variable of interest (the dependent variable). We're trying to understand what impact one or more X variables (the independent variables) have on Y. (That impact is β.) Often, control variables (Z in the equation above, with coefficient γ) will also be included; these are factors that might affect the dependent variable and that we want to hold constant when we look at the impact of X on Y. For example, if we were interested in the impact of a father's height on his child's height, we'd want to control for the mother's height and the child's diet. α is the intercept, a constant. ε is an error term that captures statistical noise and omitted variables. In the height example, we know siblings differ in height even though they have the same parents and diets; the error term catches this. Mechanically, the error term is the difference between the predicted value of Y and its actual value. The goal of regression is to make these errors as small as possible: ordinary least squares, the standard method, chooses α, β, and γ to minimize the sum of the squared errors.
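To make the equation concrete, here is a minimal sketch of ordinary least squares on the father/mother/child height example, assuming NumPy is available. The heights and the "true" values of α, β, and γ are made up purely for illustration; the point is that `np.linalg.lstsq` recovers β and γ from noisy data, and that the leftover residuals play the role of ε.

```python
import numpy as np

# Synthetic illustration only: these "heights" are simulated, not real data.
rng = np.random.default_rng(seed=0)
n = 1_000
father = rng.normal(70.0, 3.0, n)  # X: father's height (inches)
mother = rng.normal(64.0, 2.5, n)  # Z: mother's height, the control variable
noise = rng.normal(0.0, 2.0, n)    # epsilon: everything the model leaves out

# Assumed "true" parameters used only to generate the fake data.
alpha_true, beta_true, gamma_true = 5.0, 0.45, 0.40
child = alpha_true + beta_true * father + gamma_true * mother + noise  # Y

# Design matrix: a column of ones for alpha, then X and Z.
design = np.column_stack([np.ones(n), father, mother])

# OLS: pick (alpha, beta, gamma) that minimize the sum of squared errors.
coef, *_ = np.linalg.lstsq(design, child, rcond=None)
alpha_hat, beta_hat, gamma_hat = coef

# Estimated errors: actual Y minus predicted Y.
residuals = child - design @ coef
print(f"alpha={alpha_hat:.2f} beta={beta_hat:.3f} gamma={gamma_hat:.3f}")
```

The estimates of β and γ land close to the values used to simulate the data; α is estimated less precisely because the heights are far from zero. Because the model includes an intercept, the residuals average out to essentially zero.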
To test the hypothesis that strikeouts, walks, and home runs affect ERA and WHIP, I started by pulling data on every pitcher who threw more than 10 innings last season. From the 2013 season I collected each pitcher's K/9, BB/9, HR/9, ERA, WHIP, starter status (started more than 90% of the games he appeared in), and team. I then regressed both ERA and WHIP on K/9, BB/9, and HR/9, controlling for starter status and team. The team variable should absorb two different effects: team defense and home ballpark. For simplicity, I dropped the 42 pitchers who pitched for more than one team. (Including them as a 31st team does not change the results.) This leaves 524 pitchers.
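The sample restrictions and controls above can be sketched as building a design matrix: filter the pitchers, code starters as a 0/1 dummy, and add one dummy per team (team fixed effects), leaving one team out as the baseline so the dummies aren't collinear with the intercept. This is a hypothetical sketch, not the code behind these results: the `Pitcher` record, its field names, and the demo pitchers are all invented for illustration, and NumPy is assumed.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Pitcher:
    # Hypothetical record layout; the post does not describe its data format.
    name: str
    teams: tuple  # every team the pitcher appeared for that season
    innings: float
    games: int
    starts: int
    k9: float
    bb9: float
    hr9: float
    era: float


def build_design(pitchers):
    """Return (X, y, dummy_teams) for regressing ERA on K/9, BB/9, HR/9,
    a starter dummy, and team dummies (team fixed effects)."""
    # Sample restrictions from the text: more than 10 IP, one team only.
    sample = [p for p in pitchers if p.innings > 10 and len(p.teams) == 1]
    teams = sorted({p.teams[0] for p in sample})
    # Omit one baseline team so the dummies are not collinear with the intercept.
    dummy_teams = teams[1:]
    rows, y = [], []
    for p in sample:
        # Starter: started more than 90% of the games he appeared in.
        starter = 1.0 if p.starts > 0.9 * p.games else 0.0
        team_dummies = [1.0 if p.teams[0] == t else 0.0 for t in dummy_teams]
        rows.append([1.0, p.k9, p.bb9, p.hr9, starter, *team_dummies])
        y.append(p.era)
    return np.array(rows), np.array(y), dummy_teams


# Made-up pitchers, for illustration only.
demo = [
    Pitcher("A", ("BOS",), 180.0, 30, 30, 8.5, 2.7, 0.9, 3.60),
    Pitcher("B", ("NYY",), 65.0, 70, 0, 10.1, 3.5, 0.8, 2.90),
    Pitcher("C", ("BOS", "NYY"), 120.0, 25, 20, 7.0, 3.0, 1.1, 4.20),  # two teams: dropped
    Pitcher("D", ("TB",), 8.0, 9, 0, 6.0, 5.0, 2.0, 7.88),  # 10 IP or fewer: dropped
]
X, y, dummy_teams = build_design(demo)
print(X.shape, dummy_teams)  # (2, 6) ['NYY']
```

With a full season of pitchers, the coefficients would come from `np.linalg.lstsq(X, y, rcond=None)`, as in the height example; the WHIP regression just swaps `era` for WHIP as the target. Note that `lstsq` alone does not report standard errors, so the significance statements below would need a statistics package on top of this.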
The results for ERA are as follows:
- ERA decreases by .18 points per 1 point increase in K/9.
- ERA increases by .34 points per 1 point increase in BB/9.
- ERA increases by 1.56 points per 1 point increase in HR/9.
All three of these effects are highly significant: each is statistically different from zero at the 1% level. Interpreting them, it makes sense that a one-point increase in HR/9 raises ERA (earned runs per nine innings) by more than one point, since every home run automatically scores at least one run, usually earned, plus any runners already on base. Walks emerge as killers. Going from 4.86 BB/9 (the average of the worst quartile) to 1.95 BB/9 (the average of the best quartile) is associated with a .99-point decrease in ERA. The same exercise with K/9 (5.46 to 10.18) yields a .85-point decrease in ERA. Simply put: high-strikeout pitchers tend to give up fewer runs, all else equal.
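The quartile comparisons above are just the coefficient times the change in the stat, holding everything else fixed. A small sketch reproducing that arithmetic with the reported coefficients and quartile averages (the `era_change` helper is hypothetical, written only for this illustration):

```python
# Coefficients and quartile averages as reported in the text.
ERA_COEF = {"K/9": -0.18, "BB/9": 0.34}


def era_change(stat, start, end):
    """Predicted ERA change when `stat` moves from `start` to `end`, all else equal."""
    return ERA_COEF[stat] * (end - start)


walks = era_change("BB/9", 4.86, 1.95)        # worst-quartile to best-quartile walk rate
strikeouts = era_change("K/9", 5.46, 10.18)   # low- to high-strikeout quartile
print(f"BB/9 4.86 -> 1.95: ERA change {walks:+.2f}")       # ERA change -0.99
print(f"K/9 5.46 -> 10.18: ERA change {strikeouts:+.2f}")  # ERA change -0.85
```

Because the model is linear, the predicted change depends only on the size of the move, not on where it starts.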
The results for WHIP are as follows:
- WHIP decreases by .046 points per 1 point increase in K/9.
- WHIP increases by .12 points per 1 point increase in BB/9.
- WHIP increases by .14 points per 1 point increase in HR/9.
Again, all three are statistically different from zero at the 1% level. Mechanically, because WHIP counts walks and hits per inning while BB/9 counts per nine innings, a one-point increase in BB/9 should raise WHIP by 1/9, or about .11. Interestingly, the .12 I find in the regression is not statistically different from .11. A home run is also just one hit, so a one-point increase in HR/9 should likewise raise WHIP by .11; instead, it raises WHIP by .14. This may be because pitchers who give up more home runs also make more mistakes that are hit for singles than pitchers who give up fewer home runs. Alternatively, these pitchers may have more control issues and, hence, walk more batters. Most likely, it's a combination of the two.
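The .11 benchmark follows directly from WHIP's definition, (walks + hits) / innings pitched. A quick check with an invented season line (180 IP, 50 walks, 170 hits; illustrative numbers, not a real pitcher): one extra walk every nine innings is 20 extra walks over 180 innings, and a home run, being one hit, adds exactly the same amount.

```python
def whip(walks, hits, innings):
    """WHIP: walks plus hits allowed per inning pitched."""
    return (walks + hits) / innings


# Hypothetical season line, for illustration only.
innings, walks, hits = 180.0, 50, 170
base = whip(walks, hits, innings)

# One extra walk per nine innings over 180 IP is 20 extra walks.
extra = innings / 9
more_walks = whip(walks + extra, hits, innings)
print(f"WHIP {base:.3f} -> {more_walks:.3f} (change {more_walks - base:.3f})")
# WHIP 1.222 -> 1.333 (change 0.111)

# A home run counts as one hit, so one extra HR per nine innings adds the same 1/9.
more_homers = whip(walks, hits + extra, innings)
```

The change is 1/9 regardless of the pitcher's innings total, which is why the regression's .12 for walks lines up with the mechanical .11 and the .14 for home runs stands out.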
The strikeout result, however, is the interesting one. What explains the connection between strikeouts and WHIP? Pitchers who pitch to contact open themselves up to bad luck. A "ground ball with eyes" or a "Texas League single" can't happen to a hitter who strikes out. Converting units as with walks (WHIP is per inning, K/9 per nine innings), the coefficient implies that each additional strikeout is associated with about .41 fewer baserunners (.046 × 9). In other words, had that batter put the ball in play instead, it would have found its way through the defense roughly 41% of the time. A strikeout removes that chance. (This figure is very rough; a much better experiment would be required to pin down the exact percentage.)
So the sabermetricians do stand on solid ground in using strikeouts, walks, and home runs in their analysis. The next question, which I hope to address next week, is whether power pitchers are more likely to repeat strong performances year over year.