It seems that I have upset a few people with one of my interview answers at Chop-n-Change about a paper by Jahn Hakes and Skip Sauer. Here’s a brief response to the criticism the paper has received (cross-posted in the comments).
1) The goal of Hakes and Sauer was to test the Moneyball hypothesis that OBP was undervalued relative to SLG, hence the title of the paper.
2) This test must include OBP and SLG in the model. The concept can be broken down and tested further, which they did, but the interesting question is whether this central tenet of Moneyball is true. The exercise is not about designing the perfect model for predicting salaries. I vividly recall discussing this with the authors while the paper was being written, when I asked them about alternate specifications of the model. They responded that they had done this and that the analysis would be part of another paper (which it was), but that this exercise was focused on Moneyball itself. Because OBP and SLG are rate statistics, building the model around them creates the problem of adjusting for playing time. This could be controlled for in ways other than plate appearances (e.g., interaction terms), but the authors ultimately decided that the parsimony of their specification made it the right choice. Accounting for all sectors of the labor market is another tough issue. Ideally, you would separate the labor classifications, but they are trying to estimate the market price for the entire labor market, and reserved and arbitration-eligible players are part of that market. So they include dummy variables as controls. Again, interaction terms or some other correction could have been used, but they felt their final specification was best. And they convinced many other economists (colleagues, editors, and referees) at different levels of review that what they produced was the best choice.
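To make the specification debate concrete, here is a schematic log-salary equation of the sort described above. It is a sketch, not the authors’ exact model: the log form, the variable names, and the choice of reserved players as the omitted labor-market category are assumptions for illustration.

$$
\ln(\text{Salary}_i) = \beta_0 + \beta_1\,\text{OBP}_i + \beta_2\,\text{SLG}_i + \beta_3\,\text{PA}_i + \beta_4\,\text{Arb}_i + \beta_5\,\text{FA}_i + \varepsilon_i
$$

Here PA is plate appearances (the playing-time control), and Arb and FA are dummies for arbitration-eligible players and free agents. The Moneyball question turns on the relative size of $\beta_1$ and $\beta_2$, not on how well the equation predicts salaries overall. The interaction-term alternative would add a term such as $\beta_6\,(\text{OBP}_i \times \text{PA}_i)$, letting the salary payoff to OBP vary with playing time.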
3) The goal of the study was to identify whether the market was out of whack at the time the book was written. The findings indicate that the pre-Moneyball models do not match what we would expect as well as the post-Moneyball sample does. That is a point in favor of the paper, not an objection. Furthermore, the labor market was especially out of whack in 2001, and I find it odd that the 2001 specification was the one chosen for close examination. The regression equation was designed to pick up information from real-world data; the values are not something the authors presupposed. The coefficient on OBP is negative: higher OBP lowers your salary. You don’t need to plug in any values to see that this is counter-intuitive. Part of the reason the salaries remain so stable when Tangotiger adjusts the inputs is that the higher value for OBP cuts into the impact of SLG. As Hakes and Sauer acknowledge in the text, the coefficients on OBP are not even statistically significant; the market appeared to be ignoring the relevance of OBP at the time. That’s their argument.
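The offsetting effect can be written out. Assuming a log-salary regression that is linear in OBP and SLG (an assumption here, not a quote of the paper), and holding the other controls fixed, the change in predicted log salary when both inputs move is:

$$
\Delta \ln\big(\widehat{\text{Salary}}\big) = \hat\beta_{\text{OBP}}\,\Delta\text{OBP} + \hat\beta_{\text{SLG}}\,\Delta\text{SLG}
$$

When $\hat\beta_{\text{OBP}} < 0$, the first term is negative for any increase in OBP, so raising a player’s OBP alongside his SLG partly cancels the SLG gain. That cancellation is one reason such adjustments leave predicted salaries fairly stable.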
4) So the Hakes and Sauer papers may be imperfect, joining the ranks of every other empirical study ever written. If you think you can do better, here is a solution: take the freely available data and run alternate specifications. As it stands, the critique lets the perfect be the enemy of the good. If further testing reveals that the labor market was not out of whack, then we have an argument.
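As a rough sketch of what running alternate specifications involves, the Python below fits a baseline model (log salary on OBP, SLG, plate appearances, and labor-status dummies) and then an alternative that adds an OBP × playing-time interaction. It uses simulated players with made-up "true" coefficients, not the Hakes and Sauer data; the log-salary form, the function names, and all coefficient values are hypothetical and chosen only for illustration.

```python
import random


def ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gaussian elimination
    with partial pivoting. X is a list of rows, y a list of outcomes."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


def simulate_players(n, seed=0):
    """HYPOTHETICAL data generator: returns (obp, slg, pa, arb, fa, ln_salary)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        obp = rng.gauss(0.33, 0.03)
        # SLG is made mildly correlated with OBP, as real hitting stats are
        slg = 0.42 + 1.2 * (obp - 0.33) + rng.gauss(0, 0.05)
        pa = rng.uniform(1.3, 7.0)  # plate appearances, in hundreds
        arb = 1.0 if rng.random() < 0.3 else 0.0
        fa = 0.0 if arb else (1.0 if rng.random() < 0.5 else 0.0)
        # Made-up "true" coefficients; reserved players are the omitted group
        ln_salary = (11.0 + 2.0 * obp + 3.0 * slg + 0.2 * pa
                     + 0.8 * arb + 1.5 * fa + rng.gauss(0, 0.3))
        rows.append((obp, slg, pa, arb, fa, ln_salary))
    return rows


def design_row(obp, slg, pa, arb, fa, interact=False):
    """One row of the design matrix, with an optional OBP x PA interaction."""
    row = [1.0, obp, slg, pa, arb, fa]
    if interact:
        row.append(obp * pa)
    return row


def fit_spec(rows, interact=False):
    """Fit one specification and return its coefficient list."""
    X = [design_row(*r[:5], interact=interact) for r in rows]
    y = [r[5] for r in rows]
    return ols(X, y)


if __name__ == "__main__":
    data = simulate_players(5000)
    names = ["const", "OBP", "SLG", "PA", "Arb", "FA", "OBPxPA"]
    for label, flag in [("baseline", False), ("with interaction", True)]:
        coefs = fit_spec(data, interact=flag)
        print(label, {n: round(c, 3) for n, c in zip(names, coefs)})
```

With real data, the interesting comparison is how the OBP and SLG coefficients move across specifications. The pure-Python solver only keeps the sketch dependency-free; a statistics package with standard errors would be the practical choice.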