## A Little Clutch Hitting Study

Last week’s post on clutch ability got me thinking about another way to identify clutch hitting. Instead of comparing performance in aggregate data, I wanted to look at the probability that a hitter would perform in an individual plate appearance using past performance metrics as predictors. The degree to which past clutch performance predicted actual performance would tell us something about clutch ability, while controlling for other factors.

So, as I watched Nick Punto surpass Lonnie Smith for the most memorable baserunning error in Metrodome history, I pulled up an old data file (via Retrosheet) that Doug Drinen and I had used to study protection. I had a four-year sample of individual plate appearances from 1989–1992. I estimated each player’s performance with runners in scoring position (RISP) from 1989–1991 to see how it predicted 1992 performance in RISP plate appearances. The idea is that if players have any ability to perform with higher stakes, then past performance in this area should affect the probability of success during individual plate appearances. The nice thing about such granular data is that it is possible to control for factors such as pitcher quality and the platoon advantage—effects that are difficult to tease out of aggregate data.

I used probit models to estimate the likelihood that a player would get a hit (1 = hit; 0 = otherwise) or get on base (1 = hit, walk, or HBP; 0 = otherwise), controlling for the player’s seasonal performance in that area (AVG or OBP), his RISP performance in that area from 1989–91, whether the platoon advantage was in effect (1 = platoon; 0 = otherwise), and the pitcher’s ability in that area. To test hitting for power, I used a negative binomial count regression to estimate the expected number of total bases during the plate appearance, using the player’s RISP SLG from 1989–1991 as a proxy for clutch skill in this area.
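For readers curious about the mechanics, here is a minimal sketch of a probit fit on simulated data. This is not the study’s data or code: the single predictor, the coefficient values, and the crude grid-search optimizer are all invented for illustration.

```python
import math
import random

# Illustrative only: simulate plate appearances where the probability of
# success follows a probit in one standardized predictor (think of it as
# overall hitter quality), then recover the coefficients by maximizing
# the log-likelihood over a coarse grid.

random.seed(0)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

# True (made-up) coefficients: intercept -0.5, slope 0.3.
n = 1000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [1 if random.gauss(0, 1) < -0.5 + 0.3 * x else 0 for x in xs]

def loglik(b0, b1):
    total = 0.0
    for x, y in zip(xs, ys):
        p = norm_cdf(b0 + b1 * x)
        total += math.log(p) if y else math.log(1.0 - p)
    return total

# Coarse grid search in place of a real optimizer (statsmodels, etc.).
b0_grid = [i * 0.05 for i in range(-20, 1)]   # -1.0 .. 0.0
b1_grid = [i * 0.05 for i in range(0, 21)]    #  0.0 .. 1.0
b0_hat, b1_hat = max(
    ((b0, b1) for b0 in b0_grid for b1 in b1_grid),
    key=lambda b: loglik(*b),
)

# A probit coefficient is not a probability; the marginal effect at the
# mean of x (zero here) is pdf(b0) * b1.
marginal_effect = norm_pdf(b0_hat) * b1_hat
```

The last line is the key point for reading the table below: probit coefficients have to be converted into marginal effects before they say anything about changes in probability.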

The table below lists the marginal effect (X) of a change in the explanatory variable on the dependent variable. For example, a one-unit change in the explanatory variable is associated with an X-unit change in the dependent variable. For the probit estimates, this represents a change in probability. For the negative binomial estimates, this represents the expected change in total bases.

```
Variable        Hit         On Base     Total Bases
Overall         1.04        0.98        0.93
                [9.58]      [11.84]     [10.8]

RISP            -0.06162    0.00018     0.00012
                [1.02]      [3.65]      [1.32]

Pitcher         1.152       1.031       0.983
                [12.94]     [12.51]     [12.83]

Platoon         0.014       0.040       0.039
                [2.41]      [6.74]      [3.82]

Observations    23,197      26,820      23,197
Method          Probit      Probit      Neg. Binomial

Absolute robust z-statistics in brackets.
```

The bracketed figures below the estimates are z-statistics, where a statistic of 2 or above generally indicates a statistically meaningful relationship. In samples of this size, statistical significance isn’t difficult to achieve; therefore, it isn’t surprising that in all but two instances the variables are significant. The two insignificant estimates are past RISP performance in batting average and in slugging average. Thus, clutch ability doesn’t appear to be strong here.

However, the estimate of a clutch effect is statistically significant for getting on base. Is this evidence for clutch ability? Well, let’s interpret the coefficient. Every one-unit increase in RISP OBP is associated with a 0.00018 increase in the likelihood of getting on base; thus, a player increasing his RISP OBP by 0.010 (10 OBP points) increases his on-base probability by 0.0000018. For practical purposes, there is no effect.
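The back-of-the-envelope arithmetic, in code, using the figures from the table above (the 600-PA season at the end is my own illustrative assumption, not a number from the study):

```python
# Marginal effects from the on-base column of the table.
risp_obp_effect = 0.00018   # per one-unit change in past RISP OBP
platoon_effect = 0.040      # the platoon dummy, for scale

# A 10-point (0.010) improvement in past RISP OBP:
clutch_gain = 0.010 * risp_obp_effect   # about 1.8e-06 -- effectively zero

# Assuming a 600-PA season (my assumption), that works out to far less
# than one extra time on base per season, versus ~24 for the platoon edge.
extra_times_on_base = clutch_gain * 600
```

The comparison with the platoon coefficient is the tell: a real, well-documented situational edge moves the probability four orders of magnitude more than past RISP performance does.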

This study is by no means perfect, but the striking difference in magnitude between overall ability and clutch ability (just look at the gap between the Overall and RISP coefficients) in such a large sample shows why it’s best to remain skeptical regarding clutch ability. If players did have clutch skill, I believe it would show up in this test.

## Overestimating the Fog

This morning, I ran across an article by Allen Barra in the WSJ that reminded me of a blog post that I have been meaning to write for several years. Barra discusses the ability of players to perform in “clutch” moments. In closing, he cites Bill James as an agnostic regarding clutch ability, quoting the last line of James’s article “Underestimating the Fog”: “Let’s not be too sure that we haven’t been missing something important.”

James’s article caused a bit of a stir when it was first published. Here was James arguing that several common notions among sabermetricians, including the notion that clutch ability is a myth, were not necessarily so. As James stated the problem metaphorically, clutch hitting lies in a fog before a sentry on the lookout for approaching forces. In a thick fog, the enemy may be invisible despite existing in strong numbers. The fog that obscures the sentry’s view is like the randomness in baseball that makes it difficult to separate ability from chance. While we have methods for disentangling luck from ability, there remains the possibility that clutch ability is real and we just haven’t found a way to see through the fog properly. Therefore, we shouldn’t be too quick to accept a conclusion, even when the bulk of the evidence we have supports it. Maybe the truth is just lost in the fog.

No one can deny this. Of course it is possible that clutch ability exists and we just haven’t found a way to measure it properly. But we dismiss lots of other possible events as unlikely every day, with good reason. And the cost of acting on too little evidence must be balanced against the cost of failing to act on sufficient evidence. It’s a dilemma familiar to all scientists, captured by the distinction between type I and type II errors.

Let’s begin with the null hypothesis that player performance in clutch situations is identical to performance in non-clutch situations. A type I error occurs when we reject a correct null hypothesis. Studies of clutch hitting find that performance differences in these situations are small and often not statistically meaningful. The null stands, and clutch-hitting skill is seen as a myth. A type II error occurs from not rejecting an incorrect null hypothesis. When James advocates agnosticism toward clutch hitting as a skill, it is because, despite the studies showing little evidence of clutch hitting, he wants to avoid committing a type II error. The problem is that this choice between type I and type II errors isn’t free. By relaxing the decision criterion to avoid type II error, you necessarily increase the chance of committing type I error.
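The tradeoff is easy to see in a small simulation. This is a stylized two-sided z-test on invented data, not a clutch-hitting analysis: tighten the rejection criterion and type I errors fall while type II errors rise.

```python
import random
import statistics

random.seed(1)

def rejects(true_mu, n=50, sigma=1.0, mu0=0.0, z_crit=1.96):
    """Draw one sample and test H0: mu == mu0 with a two-sided z-test."""
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    z = (statistics.mean(sample) - mu0) / (sigma / n ** 0.5)
    return abs(z) > z_crit

def rejection_rate(true_mu, z_crit, trials=2000):
    return sum(rejects(true_mu, z_crit=z_crit) for _ in range(trials)) / trials

# Null true (no real effect): any rejection is a type I error.
type1_loose = rejection_rate(0.0, z_crit=1.96)   # alpha ~ 0.05
type1_strict = rejection_rate(0.0, z_crit=2.58)  # alpha ~ 0.01

# Null false (a small real effect of 0.2 sd): failing to reject is a
# type II error.
type2_loose = 1 - rejection_rate(0.2, z_crit=1.96)
type2_strict = 1 - rejection_rate(0.2, z_crit=2.58)

# Demanding stronger evidence cuts type I errors but inflates type II
# errors; relaxing the criterion does the reverse. The choice isn't free.
```

Run it and the strict criterion produces fewer false alarms but misses the small real effect far more often, which is exactly the bargain at issue in the clutch-hitting debate.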

Identifying clutch hitting is a practical problem that requires decisions involving real costs. Should a team factor in clutch ability when choosing between free agents? Should it matter for a manager choosing among pinch hitters? Should a historically big-game pitcher start a playoff series over your regular-season ace? Based on the available evidence, if I had to decide between Jeter and A-Rod it’s not even close: Alex Rodriguez is a far superior player to Derek Jeter, and that’s what is relevant. And in cases where the players’ performances are more similar, I wouldn’t consider clutch performance for even a moment. If clutch ability existed, it would show up in bunches using the empirical methods researchers have already employed to study the question.

In my view, the fog is a distraction: something to bring up to keep the argument going. But arguing takes time, which is valuable. Let’s stop it with the fog, already. Of course it’s possible that something exists that just hasn’t been discovered yet (e.g., the Loch Ness Monster, Sasquatch, ergogenic effects of HGH); but the evidence we have says these things don’t exist, and hanging hopes on the merely possible isn’t a very persuasive argument.

## More on Chicago’s Olympic Chances

Economist Michael Davis has some interesting thoughts on the recent Olympic vote (my thoughts on the vote here). He demonstrates that, with some fairly general assumptions about voters’ preferences, Chicago may have been only barely defeated. In fact, just making it to the second round might have propelled it to victory in a head-to-head contest with Rio.

The key to all of the above possibilities is that they are consistent with the revealed preferences of the voters (with the exception of the three voters whose votes are changed from Tokyo to Chicago). They are also consistent with the geographical solidarity that the voters seem to exhibit.

The scenarios in which Chicago ultimately succeeds are probably not the most likely, as there seems to have been a lot of sentimental support for Rio. Scenario 4 seems the most likely to me, but a variation of that scenario in which Chicago barely beats Rio in the final round certainly seems plausible as well.

The problems with voting and cycling of outcomes in democracy have been studied for some time by economists. If you are interested in some of the analysis, I suggest starting with the work of Kenneth Arrow and Duncan Black.
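The cycling problem Arrow and Black studied shows up even in a three-voter toy example. The city names here are used only as labels; this is not a model of the actual IOC vote:

```python
# Classic Condorcet cycle: three voter blocs with rotated rankings.
# Pairwise majorities cycle, so no alternative beats every other
# head-to-head, and the outcome can hinge on the order of the votes.
ballots = [
    ("Rio", "Madrid", "Tokyo"),
    ("Madrid", "Tokyo", "Rio"),
    ("Tokyo", "Rio", "Madrid"),
]

def prefers(ballot, a, b):
    """True if this ballot ranks a above b."""
    return ballot.index(a) < ballot.index(b)

def majority(a, b):
    """Winner of a head-to-head majority vote between a and b."""
    wins_for_a = sum(prefers(ballot, a, b) for ballot in ballots)
    return a if wins_for_a > len(ballots) / 2 else b

# Rio beats Madrid, Madrid beats Tokyo, and yet Tokyo beats Rio.
```

With preferences like these, which city survives a sequence of runoff rounds depends entirely on the order in which the pairings occur, which is why elimination-style voting can knock out a candidate that would win some head-to-head matchups.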

## Nate McLouth Is a 4th Outfielder?

“Nate McLouth is still a fourth OF masquerading as a starting CF.”

That’s Joe Sheehan of Baseball Prospectus. It’s not the first time I’ve heard this comment, and I don’t get it. Take a look at McLouth’s numbers for the past three seasons.

```season	OPS+	+/-	SB/CS
2007	110	-9	22/1
2008	126	-37	23/3
2009	109	+3	19/6
```

There is no arguing that McLouth is an above-average hitter, and when you add in his contributions on the basepaths it’s clear that he is a valuable offensive player. His lone deficiency is on defense, where he drew the ire of many saber-minded commentators for winning a Gold Glove while posting the worst Plus/Minus in the league. He also was the Pirates’ lone All-Star representative in 2008, because someone had to go. But the justifiable backlash against his mainstream overrating doesn’t justify relegating him to part-time status.

So, let’s tackle the defense. In 2007, he had a poor defensive season with a -9 Plus/Minus that when translated to a full season of work would have been a -16. Not good, but not in -37 territory. In 2009, he seems to have corrected the problem, becoming league average. Maybe it’s a blip, and he hasn’t improved. After watching him for half a season, I don’t really understand how anyone could have awarded him a Gold Glove. Yet, I thought he was adequate and a defensive upgrade over the supposedly solid Jordan Schafer, who posted a -5 Plus/Minus for one-third of the season (Yes, I get it: small sample and he’s young. Just pointing out that the metric that damns McLouth says he was better defensively than Schafer in 2009).

But that -37 may not adequately capture his defensive ability, and the Plus/Minus creator John Dewan seems to agree: “All in all, I no longer think of McLouth as the worst center fielder in baseball. It means something that at least some of the managers and coaches think highly of him.” In addition, Dewan examined McLouth’s performance at a more granular level in The Fielding Bible: Volume II and found McLouth’s biggest weakness: deep balls, especially those near the wall. Does this have something to do with defensive positioning, the park in Pittsburgh, or McLouth’s ability? This is difficult to answer, but the Plus/Minus of McLouth’s replacement in Pittsburgh, Andrew McCutchen, reveals something interesting. In two-thirds of a season he posted a Plus/Minus of -17. I think BIS and Pittsburgh need to get together and see if there is a measurement or coaching problem that needs to be addressed. McLouth and McCutchen might both have been poor fielders in Pittsburgh, but I think there was also something else going on.

Even if you take the Plus/Minus values at face value, his bat more than makes up for his glove. His Adjusted Batting Runs per 162 games for the past three seasons is 15 runs above average. His average Plus/Minus for the past three years (stretching 2007 out to his 2008 playing time) is -17, which you can multiply by 0.56 to get about -10 runs. So, he’s still a player who is five runs above average.
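The arithmetic, spelled out with the same figures as above (the 0.56 runs-per-play factor is the rough conversion used here, not an official constant):

```python
# Back-of-envelope net value for McLouth, per 162 games.
abr_per_162 = 15        # Adjusted Batting Runs above average (from the post)
plus_minus_avg = -17    # average Plus/Minus plays, 2007-2009 (from the post)
runs_per_play = 0.56    # rough runs-per-play conversion used in the post

fielding_runs = plus_minus_avg * runs_per_play   # roughly -10 runs
net_runs = abr_per_162 + fielding_runs           # roughly +5 runs
```

A player five runs above average is a legitimate starter, not a bench piece, which is the whole point of the exercise.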

In conclusion, I think there is very little evidence to support the claim that McLouth is a fourth outfielder. He may not be an All-Star or a Gold-Glover, but he’s a starting center fielder for most major-league teams.

## Did Prediction Markets Get Chicago Wrong?

Alex Tabarrok points to a critique of prediction markets, regarding the fact that betting markets had Chicago as the favorite to win the 2016 Summer Olympics over Madrid, Rio, and Tokyo.

Why were the odds so awry on the 2016 hosting city? I had assumed Rio would win in a walk, and yet, as shown in the following figure, Chicago was the favorite among oddsmakers.

I normally wouldn’t have much to say here, but I just happened to be following the prediction market quite closely on Friday morning. I was scheduled to go on CNBC to talk about the economic impact of the Olympics if Chicago was selected. It takes me about an hour to get to the studio, so I wanted to know the likelihood of making the trip, and I stayed fixated on the market all morning.

Around 9 am, a friend sent me this picture of the InTrade betting market on who would get the games.

The odds showed Chicago to be the favorite with a 53% chance of winning, closely followed by Rio at 46%, with Tokyo at 3% and Madrid at 2%. As all the pundits following the selection were saying, it was a race between Chicago and Rio, but it was too close to call. These odds also showed something else: Chicago was trending down and Rio was trending up. The trend would continue for the next few hours, and I happened to record these changes on my Twitter/Facebook pages.

10:50am: “Intrade Olympics futures market has Rio rising and Chicago falling to a dead heat.”

11:17 am: “Intrade futures market: Chicago shares selling for 30, Rio 53.”

About 10 minutes later, Chicago was out. It looks like useful information was leaking out from knowledgeable parties just before the vote. This is evidence for, not against, the strong form of the efficient-markets hypothesis.

But, then we have the question: if Chicago had such high odds, how did it get knocked out of the running first?

[Update 8/3/2009] A few people have asked why I think the odds were all that awry, when they implied that Chicago was only a slight favorite over Rio. True enough, but Chicago went out in the first round, Tokyo in the second, and Madrid in the third, implying that Chicago was the fourth choice of the IOC among the finalist cities, not a slight favorite for first.
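For what it’s worth, converting the quoted contract prices into a proper probability distribution is a one-liner. The prices sum past 100% (a standard bookmaker-style overround), and this normalization is my own illustration, not something InTrade publishes:

```python
# Quoted InTrade prices from that morning (they sum to 104).
prices = {"Chicago": 53, "Rio": 46, "Tokyo": 3, "Madrid": 2}

# Normalize away the overround to get implied probabilities.
total = sum(prices.values())
implied = {city: p / total for city, p in prices.items()}

# Chicago comes out just over 50% -- a slight favorite, which makes a
# first-round exit genuinely surprising under any reading of the market.
```

Even after normalization, the market read Chicago as the single most likely winner, so the first-round elimination is a real anomaly to explain, not an artifact of how the prices are quoted.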

This can be answered by the voting mechanism used and the coalitions favoring each region. Here are the vote tallies by round.

```City	Rd. 1	Rd. 2	Rd. 3
Rio	26	46	66