Rob, when we were at Fenway you posed the question, Who was the most likely pitcher to have thrown a no-hitter not to have thrown one. On the plane home I thought of what I assumed had to be the correct answer, which is Roger Clemens. Clemens has never thrown a no-hitter at any level: majors, minors, college, high school, amateur, little league.
But actually, Clemens is not the answer to the question, amazingly enough.
To find the answer to his question, James looks at the likelihood that a pitcher will throw a no-hitter based on his career out percentage, (3 * IP) / ((3 * IP) + hits), and his number of starts. The simple logic behind this is that the higher a pitcher’s out percentage and the more starts he makes, the more no-hitters he should throw. Using this method, James finds that Don Sutton, not Roger Clemens, is the answer to the question. Clemens is number three, behind Pedro Martinez.
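As a rough illustration of that logic, here is a minimal sketch that turns an out percentage and a number of starts into an expected count of no-hitters. It assumes every out is independent and that every start needs 27 outs; those are my simplifying assumptions, not necessarily James’s exact calculation, and the career line used is hypothetical. Innings pitched must be a true fraction (200 and one third innings is 200.333, not the box-score 200.1).

```python
def out_percentage(innings_pitched: float, hits: int) -> float:
    """Share of outs among outs plus hits: (3 * IP) / ((3 * IP) + hits)."""
    outs = 3 * innings_pitched
    return outs / (outs + hits)


def expected_no_hitters(innings_pitched: float, hits: int, starts: int,
                        outs_needed: int = 27) -> float:
    """Starts times the chance of 27 outs without a hit (independence assumed)."""
    p = out_percentage(innings_pitched, hits)
    return starts * p ** outs_needed


# Hypothetical career line, not a real pitcher.
example = expected_no_hitters(innings_pitched=3000, hits=2500, starts=400)
print(round(example, 2))
```

Because the out percentage is raised to the 27th power, small differences in it move the expected count a lot, while starts only scale it linearly.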
But I thought of another method to predict no-hitters. In James’s approach, no-hit games are simply a product of the out percentage; however, the likelihood of a pitcher throwing a no-hitter may also depend on how he tends to get those outs. Pitchers who strike out lots of batters (like Nolan Ryan, Randy Johnson, and Roger Clemens) are less dependent on their fielders to get outs. Might not these guys have a better chance of throwing no-hitters than pitchers with identical out percentages but lower K-rates? Well, I decided to check it out using a Poisson regression procedure. Using DIPS/FIP as my guide to pitchers’ abilities to prevent runs, I estimated the number of no-hitters as a function of strikeouts, walks, home runs, and games started. Ks and HRs are obvious indicators of a pitcher’s success in generating outs, with the former generating outs and the latter preventing them. Walks are a bit iffy. I included them largely because they are one of the “big 3” stats in FIP, though pitchers who walk more batters also pitch more out of the stretch and open up infield holes. In the end, walks did not seem to have much of a relationship with no-hitters.
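For readers who want to see what a Poisson regression of this kind involves, here is a minimal pure-Python sketch. It is not the code or data behind the estimates in this post: the ten careers are invented, the scaling of each column is my assumption, and the fitting routine (Newton’s method with step halving) is one standard way to maximize the Poisson likelihood. In practice a statistics package would do this in one call.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def linear(row, beta):
    """Linear predictor x . beta."""
    return sum(b * v for b, v in zip(beta, row))


def predict(row, beta):
    """Expected count exp(x . beta); the exponent is capped to avoid overflow."""
    return math.exp(min(linear(row, beta), 700.0))


def log_likelihood(X, y, beta):
    """Poisson log-likelihood, dropping the constant log(y!) term."""
    return sum(yi * linear(row, beta) - predict(row, beta)
               for row, yi in zip(X, y))


def fit_poisson(X, y, max_iter=100, tol=1e-8):
    """Maximize the Poisson log-likelihood by Newton's method with step halving."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        mu = [predict(row, beta) for row in X]
        # Gradient X^T (y - mu) and negative Hessian X^T diag(mu) X.
        grad = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
                for j in range(k)]
        hess = [[sum(row[i] * row[j] * mi for row, mi in zip(X, mu))
                 for j in range(k)] for i in range(k)]
        step = solve(hess, grad)
        if max(abs(s) for s in step) < tol:
            break
        current, t = log_likelihood(X, y, beta), 1.0
        trial = [b + s for b, s in zip(beta, step)]
        # Halve the step until the likelihood no longer gets worse.
        while log_likelihood(X, y, trial) < current and t > 1e-8:
            t /= 2
            trial = [b + t * s for b, s in zip(beta, step)]
        beta = trial
    return beta


# Hypothetical careers, invented for illustration only.
# Columns: intercept, strikeouts (000s), walks (000s), home runs (00s), starts (00s)
X = [
    [1, 4.5, 1.5, 3.0, 6.0], [1, 3.0, 1.2, 3.5, 6.5], [1, 2.0, 0.9, 2.0, 4.0],
    [1, 1.2, 0.8, 1.8, 3.0], [1, 3.8, 1.0, 2.2, 5.0], [1, 0.9, 0.5, 1.0, 2.0],
    [1, 2.5, 1.4, 2.8, 5.5], [1, 1.8, 0.6, 1.5, 3.5], [1, 3.2, 1.8, 2.5, 4.5],
    [1, 1.5, 1.0, 2.4, 4.2],
]
no_hitters = [3, 1, 1, 0, 2, 0, 0, 1, 2, 0]

beta = fit_poisson(X, no_hitters)
predicted = [predict(row, beta) for row in X]  # predicted no-hitters per career
```

With only ten made-up rows the coefficients mean nothing; the point is the shape of the model, in which the log of the expected number of no-hitters is a linear function of the pitcher’s totals.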
Here is the list of the Top-25 predicted no-hitters:
And here are the Top-27 pitchers without no-hitters (Why 27? To get Tom Glavine on the list, of course.):
| No No-Hitters Rank | Overall Rank | First | Last | Debut | Predicted |
This method has Clemens as not just the most likely no-hit pitcher never to throw a no-hitter, but also third on the all-time list of predicted no-hitters. This is a little more supportive of James’s intuition. And I suspect this intuition is nurtured by a belief that Rocket’s pitching style is conducive to no-hit games.
One thing I like about the model is how well it predicts Ryan. Even when I throw Ryan out of the sample when estimating the regression, it still predicts that Ryan should throw about 7 no-hitters. My model misses Koufax badly, but so does James’s. What that tells me is not that either model is bad, but that Koufax was really lucky to throw 4 no-hitters.
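To put a number on "lucky," a predicted count can be treated as the mean of a Poisson distribution, which gives the chance of any particular total. The sketch below does that arithmetic; the predicted values of 1.0 and 0.5 are hypothetical placeholders, not the model’s estimates for any pitcher.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_at_least(k: int, lam: float) -> float:
    """Probability of k or more events when the expected count is lam."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


# Hypothetical predictions, chosen only to show the calculation.
prob_none = poisson_pmf(0, 1.0)       # predicted for 1.0, throws zero
prob_four_plus = prob_at_least(4, 0.5)  # predicted for 0.5, throws four or more
print(round(prob_none, 3), round(prob_four_plus, 4))
```

Under these placeholder numbers, a pitcher predicted for one no-hitter still comes up empty more than a third of the time, while four or more from a prediction of one half is a far rarer outcome.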
On a final note, I estimated the regressions based on all pitchers with at least 100 games started, but that ended up kicking out a few guys with no-hitters. That may have affected the results. Also, I have not double-checked my numbers as much as I would have liked; I just don’t have the time right now. I would be happy to share my data with anyone who wants to check what I did. Finally, I did run some of the standard tests used for count regressions and, generally, a straight-up Poisson model seemed the right way to go. The results with a negative binomial regression were not much different.
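One common check of this kind is the Pearson dispersion statistic, which compares the squared errors to what a Poisson model expects. I am showing it as an example of such a test, not as a record of the specific tests run here, and the observed and fitted values below are made up.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom."""
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / (len(observed) - n_params)


# Made-up counts and fitted means from a one-parameter (intercept-only) model.
dispersion = pearson_dispersion([0, 1, 2, 1], [1.0, 1.0, 1.0, 1.0], n_params=1)
```

A value near 1 is consistent with a plain Poisson model; a value well above 1 points to overdispersion, which is the usual reason to move to a negative binomial regression.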
As always I welcome thoughts and suggestions.