Sabermetric Hubris

I’m pissed, and I cannot contain my anger any longer. Because remaining silent about this ingrained attitude of what constitutes a good argument within the sabermetric community sets a broad precedent for how we as an online community are going to determine what is truth in the game of baseball, this issue must be addressed before we go any further. While I noticed this bizarre clubbish backlash against reasonable yet unpopular arguments in the response to Steven Levitt’s critique of sabermetric dogma, I really didn’t realize the full extent of the problem until I became the target. I am being asked to defend an indefensible position, and it is indefensible not because my argument is wrong. I’m being asked to produce deductive truth where the only logical tools we have are induction or abduction. And this standard is wholly unacceptable and must not be tolerated if the quest for knowledge about baseball is to continue.

I must say that the most recent Baseball Primer thread discussing my THT article on DIPS has been the low-light of my association with the online sabermetric community. The general approach of the posters, involving sheer ignorance of statistics and of the role of the scientific method in determining truth, is embarrassing. I cannot prove that there are no flaws with my study any more than I can prove that God did not create the world in 6 days a little over 5,000 years ago. (And just so I don’t offend anyone, I am a Christian and serious about my faith. I don’t believe the several conflicting accounts of God’s creation of the world should be read literally.) All I can do is offer up the best argument from the data that exists, as I wrote in the Primer thread.

Here are some limitations any study of DIPS must face. Take your best shot at working around them. I will gladly applaud your efforts.

Predictions involve a BEFORE and an AFTER. You have to gauge the efficiency of predictions from some unit of measurement in the past and in the future. It could be a season, half-season, a moving 3-year average, batter-to-batter, etc. We also have to have a cutoff to allow entry into the study to minimize noise. One batter faced is certainly too small a sample. Requiring 1000 innings is too high a bar. There is a tradeoff that must be made when choosing the cutoff. If you don’t like my choice, which is the lowest cutoff of any study on DIPS that I am aware of, pick a new one and go to it. The data is available at www.baseball1.com. Have a field day.
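As a rough illustration of the BEFORE/AFTER setup described above, here is a minimal Python sketch that pairs each pitcher's rate in one season with his rate in the next, keeps only seasons that meet an innings cutoff, and computes the correlation between the two. The field names (`pitcher`, `year`, `ip`, `k9`), the helper names, and the sample rows are all made up for illustration; this is not the code or the data behind the study.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def season_pairs(records, stat, min_ip):
    """Pair each pitcher's stat in year t (BEFORE) with year t+1 (AFTER).

    A pair enters the sample only if both seasons meet the innings cutoff.
    """
    by_pitcher = defaultdict(dict)
    for rec in records:
        if rec["ip"] >= min_ip:
            by_pitcher[rec["pitcher"]][rec["year"]] = rec[stat]
    before, after = [], []
    for seasons in by_pitcher.values():
        for year, value in seasons.items():
            if year + 1 in seasons:
                before.append(value)
                after.append(seasons[year + 1])
    return before, after


# Hypothetical rows for illustration only; not real pitcher data.
records = [
    {"pitcher": "A", "year": 2001, "ip": 180.0, "k9": 8.1},
    {"pitcher": "A", "year": 2002, "ip": 150.0, "k9": 7.8},
    {"pitcher": "B", "year": 2001, "ip": 120.0, "k9": 5.2},
    {"pitcher": "B", "year": 2002, "ip": 40.0, "k9": 5.9},
    {"pitcher": "C", "year": 2001, "ip": 200.0, "k9": 6.0},
    {"pitcher": "C", "year": 2002, "ip": 210.0, "k9": 6.4},
    {"pitcher": "D", "year": 2001, "ip": 105.0, "k9": 9.5},
    {"pitcher": "D", "year": 2002, "ip": 110.0, "k9": 9.0},
]

before, after = season_pairs(records, "k9", min_ip=100)
print(len(before), "season pairs; r =", round(pearson(before, after), 3))
```

Changing `min_ip` and rerunning is the whole experiment: a lower cutoff admits more pairs (here pitcher B's second season) at the price of noisier rates.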

It’s time for my critics to put up or shut up. I have made my case. Put some alternative empirical evidence on the table if you wish to claim that there is something biased in my findings and the findings that many others have made. I am trained as an empirical researcher, and I am of an open mind. I only wish to find truth. In fact, one of the reasons I set out to do this study was to verify or reject some previous work that I thought was not statistically rigorous. For example, I have no knowledge that anyone previously studying this issue had identified serial correlation in the data. Well, it exists and must be addressed. This is something I did in the study, and it turns out that the past conclusions of those studying DIPS still stand up to this. It would have been wrong for me to simply state “there could be autocorrelation” and end my critique there. Had the results of my reevaluation of DIPS gone differently, I would have been the first person to say, DIPS has to go.
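For readers who want to check for serial correlation themselves, one common first-pass diagnostic is the Durbin-Watson statistic computed on regression residuals ordered in time. The sketch below is only an assumption about how one might look; the article does not say which test or correction was used in the study, and the residual values are invented.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a time-ordered list of residuals.

    d = sum((e_t - e_{t-1})^2) / sum(e_t^2)
    Values near 2 suggest no first-order autocorrelation, values well
    below 2 suggest positive autocorrelation, and values well above 2
    suggest negative autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residuals for one pitcher's consecutive seasons.
example_residuals = [0.4, 0.3, 0.5, -0.1, -0.3, -0.2]
print(round(durbin_watson(example_residuals), 3))
```

With panel data the statistic would be computed within each pitcher's run of seasons rather than across pitchers, since residuals from different pitchers have no time ordering relative to one another.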

If you find fault with what I have done, demonstrate the errors of my study with empirical data. I will be the first person to link to and praise any study that does so. As Bill James once said about new offensive metrics, “What we really need… is for the amateurs to clear the floor.” I’m throwing down the gauntlet. Get serious or get a life. No more nitpicking, no more pretending to understand statistics by merely tossing around relevant statistical terms. I want real hard-core statistical evidence if you would care for me to change my mind. And maybe changing my mind isn’t anything you’re interested in. If not, then don’t criticize my work.

And if you are offended and think I’m being rude? Good! Now I’ve got your attention. I asked politely several times only to be ignored.

22 Responses to “Sabermetric Hubris”

  1. Chuck Oliveros says:

    JC, I read parts of the thread on Baseball Primer and scanned the rest of it. I assume that you are referring to those posters who maintained that any pitcher who threw 100 innings was, by the very fact that he was allowed to throw that many innings, going to have a lower H/BIP. After reading it, I found myself thinking about that. A study occurred to me that I thought I might suggest. Why not choose a past season and track AA pitchers who threw 100 innings or more in a season? It might be illuminating to see how much variation there is in H/BIP between pitchers at that level as well as how the pitchers who eventually pitched in the majors compared to those who didn’t. There could be some sources of distortion, such as pitchers who failed to reach the majors because of injury, but they could be removed from consideration. I would attempt such a study myself but I frankly don’t have the inclination, but I thought I would bring it up should my suggestion be of some value to you.

  2. Jeff says:

    I think you need to stop reading online forums. I wouldn’t worry about the unwashed masses carrying pitchforks and torches and calling for your head. Unless they’re banging on your door right now.

    When/if someone writes a reasonable, logical, statistically-accurate response, you can worry about the criticisms. But if they’re just throwing mud at you, resist the urge to throw mud back, as frustrating as it is.

  3. J. Cross says:

    I think Levitt got a fair amount of mud, some reasonable critiques and perhaps some nitpicking. I don’t think that should have caused him to refer to everyone as Beane worshippers. His study was (intentionally, I’m sure) quite simplistic and didn’t require statistical expertise to follow.

    JCB, your study is obviously more complicated but not beyond Emeigh and Ruane (Tangotiger) who have both studied DiPS and done good sabermetric work of their own (especially in Tangotiger’s case). They are your peers as far as sabermetrics are concerned and the ones who should be critiquing/reviewing your work. I think Emeigh HAS done work that demonstrates (or at least that he thinks demonstrates) that your sample is biased so I don’t think the “put up or shut up” really applies to him.

    You’ve done excellent work on this site and people have recognized it. I don’t think you’re an outsider being criticized by the club members.

  4. Frank says:

    It’s easy to be frustrated with garbage like “…but all these things are less important than actually observing the pitcher.” Observation by whom? Suppose two people both observe a pitcher but come away with different observations–what then?

    Re the complaint about 100 IP as a threshold for the sample:

    1. You (or even better, your critics) could redo the study using a lower cutoff.
    2. I think there are some other econometric techniques–2 stage models in which the first stage is used to estimate who will still be around in the second year–that your critics could put themselves to work using since they think your study is so deficient.
    3. To your critics–why concern yourselves only with pitchers who are in the bigs one year and are not there the next? What about the bias in even reaching the big leagues at all–lots of high school players get weeded out before playing pro. Then there are the thousands of little leaguers who get weeded out before high school …

    Bottom line–don’t let the bums get you down. Consider the constructive comments and ignore the rest.

  5. JC says:

    Chuck,

    I think that would be an excellent study. I, too, don’t have much time to do it and minor league data is a bit hard to come by. But, my beef is that we should start with the assumption that things we can observe do provide information about things we can’t. If you see a shark fin, get out of the ocean. It might be some kid playing a prank, but that’s a pretty dangerous reason to stay in the water.

    Jeff,

    You are right. I would normally just ignore such arguments. But, the problem is that this is where the debates take place. And when logic takes a back seat, people start arguing off false premises until the whole argument is lost and in the wrong direction. I guarantee you that I’m going to see 10 references to that thread over the year as “proof” that DIPS is flawed. Guarantee it. However, as a remedy I am considering starting up a refereed sabermetric journal.

    J.,

    Thanks for the lecture. I’ll go to my shame corner now.

    Frank,

    You are right, of course. In fact, I have done some studies with lower cutoffs (all the way down to excluding no one). But I think the best method would be to use a heckit-type model to get at the problem. On the lower cutoff, the results still hold, as I argued they would. I’ve done the studies, but it’s not my job to report them. In fact, maybe that’s one reason no one is reporting other studies. I like the 100 cutoff because it cuts out much of the noise. And since I know there is no bias until someone proves me wrong, I see no reason to grace this argument with a response. The problem isn’t that a study couldn’t be done. In fact you are right to suggest a remedy rather than saying “all right, nothing to see here, move along!” I think that’s why I’m so mad. Many are willing to criticize, but no one has yet been willing to pick up the data to see if the criticism matters.

  6. Aaron says:

    I love your study. The inning cutoff seems entirely reasonable; I was just wondering why you cut it off before 1980. Did something happen in baseball then, was there some data not collected or corrected inaccurately before then, is it a computing power issue, or is it just a noise issue? Did the decrease in major league knuckleballers make the correlations stronger for recent years?

    Did you mention this somewhere else, and I just missed it?

  7. Steven says:

    Chuck,

    Clay Davenport at Baseball Prospectus has done a study to examine if the DIPS theory applies in the minor league levels.

    Minor League Batting Averages on Balls in Play

  8. Chuck Oliveros says:

    JC, I agree with your analogy of the shark fin. I was just looking for a way to get a handle on additional evidence.

  9. JC says:

    Aaron, 1980 is somewhat arbitrary. I wanted to have a lot of observations but still stay quite recent, and 1980 just seemed to be a good place to start. I did fool around with the starting point some (going more recent), and it didn’t have much impact.

    Steven, I’m aware of the BPro stuff, but when I requested a copy of one recent study my request was ignored. I would like to comment on it, but I’m not a subscriber nor do I plan to be in the near future.

    Chuck, I understand what you’re saying and didn’t mean to imply otherwise. Sorry if it came across that way. I thought your suggestion was very good, so I used it as a springboard for further ranting.

  10. Frank says:

    Here’s a thought that no one seems to have come up with. You’re looking at the season-to-season correlation of various statistics in an effort to determine which ones are good predictors of year-to-year performance. Some traditional ones (ERA) seem to perform worse than some DIPS. Your critics claim that this might be an artifact of the 100 IP cutoff (a reasonable threshold in my opinion). They argue that pitchers with lousy performance in one year get culled from the sample without getting a second year. Fair enough, but nothing in that argument necessarily changes your correlation coefficients or the slope of your regression lines. It’s entirely possible that lousy pitchers in one year would be lousy in the next and that the correlations you find would still be present if bad pitchers were allowed to pitch 100 innings in consecutive years. Your cutoff then has the effect of removing some observations that would appear in the upper righthand portion of plots such as BB rate or HR rate but doesn’t necessarily alter your results.

  11. JC says:

    Frank,

    Right, once again! Not only is it possible, it’s more likely than a biased outcome. To assume by default that a part of the excluded sample is likely different from the sample we observe is a pretty poor place to start.

  12. A.West says:

    Interesting study. You’re a better statistician than I am, because I’m just a dabbler. I applaud you for looking for autocorrelation and correlated factors, stuff that can illegitimately boost/distort results. I do some analysis on companies within industries with a data structure similar to baseball players – a longitudinal analysis. I do wonder sometimes about how dropouts/acquisitions/bankruptcies (similar to injuries/demotions to AAA ball) affect the data. It’s amazing how some less sophisticated people will resist adjustments for autocorrelation in analysis – as if ignoring a fact of reality could help them understand reality.

    My main tool for analyzing panel data is the NLME package to do mixed effects modeling in the open source R statistical language. Are you doing a similar type of modeling for pitchers, do you think that would be valid? The software you’re using looks pretty powerful – what is it?

  13. JC says:

    Thanks Adam,

    I don’t know much about R, but I’m sure you can do what I did with it. And your methods sound just fine. I use Stata, which is great for panel data.

    To future commenters,

    I’m about to hit the road for a few days. I’ll check back in with material from time to time, but I might not get right back to you.

  14. anon says:

    J., I think Tom Ruane and Tangotiger are two different people.

  15. Aaron says:

    I haven’t had the time, but I’m pretty sure you could recreate his results with tools less powerful than R, like gretl (http://gretl.sourceforge.net/). Gretl is also open source, and it’s a good bit easier to use.

  16. J. Cross says:

    Anon, you’re absolutely right, of course. JCB pointed that out to me as well. Don’t know where I got that from.

    Also, I can’t find any work Emeigh has done that demonstrates (or claims) that there’s bias based on an IP cut-off (he did some work to show GB/RB differences). The closest thing I can find is this study done by tangotiger, which he doesn’t claim demonstrates bias. It could easily just show that pitchers who start out with bad BIPr’s get less of a chance. Although if you buy that claim you’d also have to accept that managers aren’t making perfect decisions and that DIPS does have some “utility”, and some people are reluctant to acknowledge that. Tippett’s study also shows that pitchers with long careers outperform their DIPS (slightly) but I don’t see how that’s directly relevant.

    I should point out that I don’t disagree with JCB’s conclusions (without any evidence that the sample IS biased I don’t see any reason why we would expect it to be), I just thought he was overreacting to the criticisms. Besides, I owed him a lecture after the Levitt thread where my manners were drawn into question.

    As far as what “sample bias” even means since I don’t want to throw around terms… sure, if you widen the pool of pitchers you’ll see a somewhat larger spread of “true” BIP rate just as you’ll see a larger spread in K rate, BB rate and HR rate b/c there is SOME skill there. The question, as I understand it, is whether the range in true BIPr is inordinately narrowed by the selection of pitchers JCB used. Am I understanding that correctly?

    Also, Backlasher’s criticism, as you may well know, goes way back to disputes with Voros that are before my time. Oh, and I read Freakonomics and enjoyed it. I’m assuming that any criticisms I have of that book are simply displays of hubris so I’ll keep them to myself.

  17. F. James Mohl says:

    Other studies have shown that homerun rate can be viewed as simply a proxy for flyball rate (or, to be more precise, Outfield Flyball Rate. Some pitchers do seem to have the ability to induce Infield Flies, which are nearly as good as strikeouts.) So, if you redo the model, try substituting Outfield Fly Rate (% of BIP) for HR Rate. That should eliminate some of the noise. Also try substituting GB Rate and LD Rate for Hit Rate.

  18. Voros McCracken says:

    JC,

    I wouldn’t get too wound up about it, imagine how I feel.

    G.W.O.: “But you can’t teach Voros to do actual statistics, he likes his own (wrong) methods too much.”

    BWC: “I understand Voros made grandiose claims that were not supported by his data; I understand that he (apparently) used to irritate people with arrogance and condescension.”

    G.W.O.: “Whatever his research was, it wasn’t good statistics.” (Apparently not knowing what it was I did, isn’t necessary to know it had nothing to do with statistics).

    Backlasher: “I certainly missed that, especially with all the bluster from Voros and the Disciples.”

    Backlasher: “If Voros was really this big a muse for you guys, then I’ve underestimated the little fellow.”


    The funny thing is…

    …nobody hated me when I was a paralegal, and I made more money too.

    At this point all I can do is not take too much of anything someone says to me too seriously, unless they are actually communicating directly with me face to face or over the phone. When people who have never exchanged word one in conversation with you can dislike you so much, you have to develop a bit thicker skin and move on.

    None of these people know me, have ever spoken to me or asked me to clarify any of my points. I tend to give them the level of consideration they’ve shown me.

    Oh and yes, this really is me.

  19. Tom G says:

    First off, I have read neither the study you did, JC, nor the comments at Primer (but I can guess what they were like). Having said that, I think when people make such comments, they forget that the author put a ton of work into something because he/she loves loves loves baseball. These folks need to keep that in mind before they spout off (which isn’t to say that criticism is in bad form, but personal criticism is).

    It’s the efforts of people that do these studies that have made my enjoyment of the game much much richer. Thanks JC (and Voros, and everyone else)

    Now, I’ll go read the article….

  20. tangotiger says:

    Interesting how JC took my quote about “nothing to see here, let’s move on”. My context for that quote was the neverending DIPS discussion, and its purpose was to put an exclamation mark to that discussion. JC seems to have interpreted my comment as directed towards his article, which certainly was not the case (though I can see how one would think it was).

    (For those who never read Primer, most threads end up not from where they start, and the lead-in article serves only as a starting point for the discussion, and not the center of the discussion.)

    It’s nice to see Voros posting something, finally! I thought backlasher was complimenting Voros by calling him a muse (something he rarely does towards Voros), but again, Voros’ comment above seems to contradict that. Voros is of course an excellent sabermetrician, who in addition to that, is also a muse.

    In any case, DIPS discussions are more productive, though less entertaining, when only facts are presented.

  21. josh says:

    You took this as a compliment?

    “If Voros was really this big a muse for you guys, then I’ve underestimated the little fellow.”

    I read this in the thread; it was clearly condescending.

  22. tangotiger says:

    Josh, yes, that’s how I took it. He said that if Voros was such a source of inspiration, then he underestimated him (the “little fellow” was bl’s need for a jab I guess). Anyway, I took it as praise. If someone wants to claim me as a muse, I’ll promise to thank him for it!