Archive for Sabermetrics

I’m Not a Sabermetrician

Rob Neyer has a nice post in which he uses my valuation of Raul Ibanez as an example where the sabermetric community is not in agreement. Rob has a great point: there is a lot of disagreement among baseball analysts.

The sabermetric community is quite broad, and I’m not sure how to properly define it. But, I will say that I consider myself to operate outside of what most people consider to be the sabermetric community, but that I often study similar questions. It wouldn’t be incorrect to say some of what I do is sabermetrics, but my approach and methods are different than the community standard. This doesn’t bother me.

I’m an economist. I didn’t grow up reading Bill James, playing Strat-O-Matic, or participating in old Usenet groups. I wish I had been aware of these things, but I think that my outside perspective has been an advantage when analyzing baseball.

In the comments on Rob’s site, there are two comments that I want to post here. The first is from Mitchel Lichtman (MGL).

JC, with all due respect, is an economist with some knowledge of sabermetrics and not a sabermetrician. As well, I don’t think he is an expert on projections by any means. There are plenty of experts with respect to projections, and I don’t think – in fact I know – that any of them will project his value at anything close to 14mm per year.

And this is my response.

I agree with MGL: I shouldn’t be considered a sabermetrician. I have never claimed to be a member of this community. What I do is apply my knowledge from my economics training and experience with analyzing data to issues in baseball. While MGL believes I do not understand certain sabermetric principles, I believe many sabermetricians (MGL included) underestimate the difficulty in analyzing baseball phenomena, especially when it comes to valuing players. Sabermetricians have made important discoveries (Voros McCracken’s work on DIPS was an important and correct finding); however, much of what passes for research within this community is not sufficiently rigorous to reach the conclusions often claimed. There are many academic researchers from a variety of fields who have significantly advanced the understanding of baseball yet receive scant mention in the sabermetric community. For example, Michael Schell’s Baseball’s All-Time Best Sluggers is the most thorough treatise on hitting ever written; yet, few individuals mention his work or attempt to replicate his methods. You rarely see economists Gerald Scully or Tony Krautmann mentioned in attempts to value players, despite the fact that their methods were published in reputable peer-reviewed economics journals, where established experts vetted their work. Academics are not always right, but I believe the checks ensure they are more likely to reach correct conclusions than informal online discussions.

Just to clear things up.

Wasting Money on K-Rod

I continue to be amazed by the over-valuing of closers in the baseball labor market. Yesterday, the New York Mets and Francisco Rodriguez agreed to a 3-year, $37 million contract. The deal also includes an option for a fourth year for $13-$14 million based on easily-attainable criteria. What an absolute waste of money. I have K-Rod valued at $6 million per season over the next three years.

I’ve been saying for a while that closers are overpaid. Rodriguez has been a very good closer, but the problem is that closers don’t pitch much. Over the past three seasons, K-Rod has faced 4.7% of the team’s opposing batters; a decent starter will face three times as many batters. While we see K-Rod pitch at the end of games, often when games are on the line, he’s not pitching much. The Mets would have been better off spending that kind of money on a good starter who would prevent run scoring over many more batters. A few more million a year could have brought in A.J. Burnett or Derek Lowe, whose superior pitching would prevent situations that closers can rectify.
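The playing-time argument can be put in rough numbers. The sketch below uses only figures quoted in this post (the 4.7% share of opposing batters, the "three times as many" rule of thumb for a decent starter, the 3-year, $37 million contract, and the $6 million per-season valuation); the variable names are purely illustrative.

```python
# Figures quoted in the post; variable names are illustrative only.
closer_share = 0.047        # share of the team's opposing batters K-Rod faced
starter_multiple = 3        # "a decent starter will face three times as many"
contract_total = 37.0       # $ millions over the guaranteed years
contract_years = 3
valuation_per_year = 6.0    # $ millions per season, the post's estimate

starter_share = closer_share * starter_multiple
salary_per_year = contract_total / contract_years
salary_to_valuation = salary_per_year / valuation_per_year

print(f"starter share of batters faced: {starter_share:.1%}")
print(f"average salary: ${salary_per_year:.2f}M per year")
print(f"salary relative to valuation: {salary_to_valuation:.2f}x")
```

On these figures the guaranteed salary runs at roughly twice the estimated value, for a pitcher who faces about a third as many batters as a decent starter.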

Addendum: I received a question about the role of leverage—the difference in the importance of when a pitcher typically appears within a game—in determining values. I’ve been asked it before, and my answers have been scattered over several different locations. So, here is my e-mail reply explaining why I value all innings pitched the same.

I have considered the impact of leverage, but I don’t think leverage can explain the vast differences between my estimates and what is happening in the market. Leverage is a product of outside factors; the pitcher faces the same rules at all times of the game. The quality of his pitching is the same in the 5th inning as it is in the 9th. (There is the argument about pressure, but I don’t buy this explanation at this level of competition.) Now, the fact that he is good enough to pitch in a high-leverage situation is worth something; however, I don’t believe the value is twice the average. And the fact that a pitcher has pitched in high or low leverage situations doesn’t mean he ought to get all the credit for it.

For example, take Scott Linebrink and Francisco Cordero. Last year, the two pitchers signed four-year deals for $19 million and $46 million, respectively. I estimated that Cordero was worth about $2 million more than Linebrink, yet he was paid more than twice what Linebrink got. The only difference in their pitching histories is that one is considered to be a middle reliever and the other a closer. It’s the performance that matters and ought to determine their salaries, not when they pitch. If Cordero is worth $46 million because he pitches in high-leverage situations, then Linebrink should have received a similar salary to reflect his opportunity cost: he could have pitched in high-leverage situations, but he didn’t. I think the market is putting too much value on the “Closer” label.

Another factor is that better pitchers in earlier innings affect the leverage in later innings. So, a good starter preventing runs has an impact on reducing leverage later in the game by creating bigger leads. I’m not sure exactly how to value that. So, I believe that the proper method is to treat all pitcher innings the same, while acknowledging that some elite relievers have some extra value in that they could be used in more valuable spots. But this value doesn’t necessarily come from when they pitched in the past.

I’m also a believer in patchwork bullpens. Take a bunch of bad castoff starters, platoon them, and tell them to pitch as hard as they can.

The Best Statement I Read Today

Tom Boswell, as quoted in The Hidden Game of Baseball:

the more ambitious the stat, the more complex and arbitrary it almost always becomes. What it gains in sophistication and the intuitive wisdom of its creator, it loses in simplicity and objectivity. How can you love a stat, or use it in arguments, if you can’t really explain it?

More Evidence that Protection Doesn’t Exist

From Will Carroll and Eric Seidman.

Andre Ethier recently said that he felt he was seeing better pitches with Manny Ramirez batting behind him….To test this, I looked at the Pitch F/X data for Ethier from 3/31 to 8/27, when he was not hitting ahead of Manny, and compared it to the data from 8/28 until the end of the season, when Manny was protecting him….He saw virtually the same amount of fastballs and same percentage of pitches in a pretty generous strike zone before hitting in front of Manny and after. It might seem like he is seeing better pitches but it could be some type of placebo effect.

This fits with what I’ve found. Thanks to studes for the pointer.

Extra Base Hits on Balls in Play and Pitcher Skill

A few weeks ago, John Dewan focused on a stat that I hadn’t thought about in a while as his Stat of the Week: Pitcher OPS Allowed (OPSA).

For hitters, for years and years, it was batting average that was thought to be the best single statistic to look at to evaluate a hitter. In the last couple of decades, the weaknesses of batting average have been exposed and the value of getting on base and hitting for power have become better recognized. The stat that is becoming the new standard for hitters is OPS—On-base percentage Plus Slugging percentage.

For pitchers, the standard is ERA. Compared to batting average, it provides a much better representation of effectiveness. It measures the most important quality of a pitcher’s job, preventing runs. However, it too has its flaws. The biggest flaw is that a pitcher’s ERA can be greatly affected by the pitchers that immediately follow him in a game, both positively and negatively.

Enter Opponent OPS. This is a stat that you hardly ever see. It makes just as much sense to look at Opponent OPS for pitchers as it does to look at a hitter’s own OPS. We just recently added this as a leaderboard titled “Opponent OPS” to Bill James Online and I wanted to share it with you.

ERA is going to continue to be the standard, and I will personally look at ERA for every pitcher, but I think Opponent OPS may be a better indicator of a pitcher’s overall effectiveness.

I used OPS Allowed to proxy pitcher quality in a study of hit batters with Doug Drinen. I’m not sure why we settled on the metric, but I haven’t used it in some time. There are two reasons for this. OPSA is heavily influenced by balls in play, and it is difficult to compute with available data. You basically have to reconstruct it from play-by-play data. But, OPSA holds some potentially useful information not contained in traditional DIPS metrics. So, when I read Dewan’s post I decided to look into the stat a little deeper.

This summer I’ve made a commitment to become more familiar with Perl so that I can better dig through the play-by-play data that is becoming increasingly available. So, I viewed this as an opportunity. Armed with Learning Perl and Doug’s old Perl scripts, I was able to gather some very specific pitcher-allowed stats from Retrosheet event files from 2000–2007.

First, I want to look at the correlation of pitcher performance from year to year. Here are the correlations for pitchers who pitched more than 100 innings in back-to-back seasons.

Metric    R²
AVGA      0.208
OBPA      0.206
SLGA      0.194
OPSA      0.179
ERA       0.120

Pitchers’ performance in OPSA is more consistent from year to year than in ERA. This is an indicator that OPSA captures more skill than ERA, as skill ought to be repeatable from season to season. Performance in the other metrics is also more consistent than ERA, but not by much. Even compared to ERA, OPSA’s correlation isn’t that much stronger. About 18% of a pitcher’s OPSA is explainable by his previous season’s OPSA, while 12% of a pitcher’s ERA is explainable by his previous season’s ERA.
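For readers who want to try this kind of year-to-year comparison themselves, here is a minimal sketch. It is not the Perl code used for this post; the row layout, the field names (`pitcher`, `year`, `ip`, `opsa`), and the toy numbers are all hypothetical. It pairs each pitcher-season with the same pitcher's next season, keeps only pairs where both seasons exceed the innings threshold, and squares the Pearson correlation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def year_to_year_r2(rows, metric, min_ip=100.0):
    """R-squared between a metric in season t and season t+1.

    Only seasons with more than min_ip innings qualify, so a pair is kept
    only when both back-to-back seasons clear the threshold.
    """
    qualified = {(r["pitcher"], r["year"]): r for r in rows if r["ip"] > min_ip}
    previous, current = [], []
    for (pitcher, year), row in qualified.items():
        following = qualified.get((pitcher, year + 1))
        if following is not None:
            previous.append(row[metric])
            current.append(following[metric])
    return pearson_r(previous, current) ** 2, len(previous)

# Hypothetical toy data, for illustration only (not real pitchers).
rows = [
    {"pitcher": "A", "year": 2000, "ip": 200, "opsa": 0.700},
    {"pitcher": "A", "year": 2001, "ip": 180, "opsa": 0.720},
    {"pitcher": "B", "year": 2000, "ip": 150, "opsa": 0.800},
    {"pitcher": "B", "year": 2001, "ip": 160, "opsa": 0.770},
    {"pitcher": "C", "year": 2000, "ip": 120, "opsa": 0.650},
    {"pitcher": "C", "year": 2001, "ip": 110, "opsa": 0.690},
    {"pitcher": "D", "year": 2000, "ip": 60, "opsa": 0.900},   # under 100 IP, dropped
    {"pitcher": "D", "year": 2001, "ip": 190, "opsa": 0.600},
]
r2, n_pairs = year_to_year_r2(rows, "opsa")
print(f"pairs used: {n_pairs}, R-squared: {r2:.3f}")
```

The same function works for any column (ERA, AVGA, and so on) as long as each row carries that field.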

That OPSA is more highly correlated from season to season than ERA is not surprising. ERA suffers from two deficiencies. First, scoring rules assign culpability to individual pitchers for runs that were jointly allowed. Second, ERA includes performance on balls in play, which is heavily polluted by luck and defense. OPSA, though it also includes balls in play, includes walks and weights home runs more than other hits, both of which are defense-independent outcomes.

Next, I break down pitcher performance into components: strikeout rate (per batter-faced), walk rate, home run rate, batting average on balls in play, and doubles-and-triples-allowed average on balls in play (XBABIP).

Metric    R²
K         0.603
BB        0.492
HR        0.153
XBABIP    0.042
BABIP     0.038

Pitchers do appear to have more control over the type of hits allowed on balls in play than they do over hits in general; however, the difference is small. Furthermore, pitchers have far more control over defense-independent metrics than balls in play.
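As a rough sketch of how these component rates could be built from a pitcher's season totals: the post only specifies that the strikeout rate is per batter faced, so the per-batter-faced walk and home run rates and the balls-in-play denominator below are assumptions, as are the function name and the sample totals.

```python
def component_rates(bf, k, bb, hbp, hr, hits, doubles, triples):
    """Component rates from season totals.

    ASSUMPTION: balls in play = batters faced minus strikeouts, walks,
    hit batters, and home runs. The post does not spell out its exact
    denominator, so treat this as one plausible definition.
    """
    balls_in_play = bf - k - bb - hbp - hr
    return {
        "K": k / bf,                                   # strikeouts per batter faced
        "BB": bb / bf,                                 # walks per batter faced (assumed)
        "HR": hr / bf,                                 # home runs per batter faced (assumed)
        "BABIP": (hits - hr) / balls_in_play,          # non-HR hits on balls in play
        "XBABIP": (doubles + triples) / balls_in_play  # doubles and triples on balls in play
    }

# Hypothetical season line, for illustration only.
rates = component_rates(bf=800, k=160, bb=60, hbp=10, hr=20,
                        hits=180, doubles=35, triples=5)
for name, value in rates.items():
    print(f"{name:7s}{value:.3f}")
```

Note that XBABIP excludes home runs, which are already counted as their own defense-independent component.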

But, even if pitcher control in this area is small, is there any additional information to be gained by knowing a pitcher’s XBABIP? The next table reports the marginal impact of the previous variables on predicting future ERA.

Variable (one-year lag)   OPSA          DIPS Only     DIPS & OPSA   DIPS & XBABIP
K                                       -7.02521      -6.68551      -7.03101
                                        [11.27]**     [8.81]**      [11.31]**
BB                                      3.43275       3.05492       3.27137
                                        [3.27]**      [2.65]**      [3.12]**
HR                                      12.05949      9.88836       11.4197
                                        [4.27]**      [2.50]*       [4.04]**
OPSA                      3.09165                     0.41422
                          [9.67]**                    [0.78]
XBABIP                                                              3.99536
Constant                  2.05883       4.93117       4.65531       4.66884
                          [8.50]**      [29.48]**     [11.95]**     [23.63]**
Observations              932           932           932           932
Adjusted R-squared        0.09          0.16          0.16          0.16
Absolute value of t statistics in brackets
* significant at 5%; ** significant at 1%

First, look at the R² (actually, it’s the “adjusted R²,” which corrects raw R² for the bias induced by adding variables) and how it changes across the models’ estimates of ERA. Neither the addition of OPSA nor XBABIP adds much explanatory power over the DIPS-only model. BABIP is excluded because it is never statistically significant.
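For reference, the usual textbook adjustment can be sketched in a few lines. The function name is mine, and the inputs in the usage lines are illustrative; the sample size of 932 comes from the table, but the raw R² value plugged in is a placeholder, not a reported result.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Standard adjusted R-squared: penalizes raw R-squared for each added regressor."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

# Small-sample illustration: the penalty is visible.
small_sample = adjusted_r2(0.5, n_obs=11, n_regressors=2)

# With 932 observations and four regressors the penalty is tiny.
# The 0.16 here is a placeholder raw R-squared, not a figure from the table.
large_sample = adjusted_r2(0.16, n_obs=932, n_regressors=4)

print(small_sample, large_sample)
```

With samples this large, the adjusted and raw figures will be nearly identical, so the comparison across models is driven by the variables themselves rather than by the adjustment.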

Next, I use the same independent variables from the previous season to estimate OPSA in the present season.

Variable (one-year lag)   OPSA          DIPS Only     DIPS & OPSA   DIPS & XBABIP
K                                       -0.74367      -0.59477      -0.73719
                                        [14.87]**     [9.79]**      [14.94]**
BB                                      0.22356       0.05451       0.18811
                                        [2.65]**      [0.59]        [2.25]*
HR                                      0.54351       -0.28027      0.46318
                                        [2.39]*       [0.94]        [2.06]*
OPSA                      0.3062                      0.16671
                          [12.12]**                   [4.24]**
XBABIP                                                              0.65966
Constant                  0.52537       0.84578       0.73255       0.80116
                          [27.40]**     [62.61]**     [24.51]**     [50.67]**
Observations              934           934           934           934
Adjusted R-squared        0.14          0.21          0.23          0.23
Absolute value of t statistics in brackets
* significant at 5%; ** significant at 1%

In terms of predicting the future OPSA of a pitcher, knowing his OPSA or his past propensity for allowing extra-base hits on balls in play–recall that this excludes home runs–improves the explanatory power of the model. This is evidence that pitchers do have some ability at preventing extra-base hits on balls in play.

Finally, let’s look at the magnitude of the impacts. The following table lists the changes in ERA and OPSA from a one-standard-deviation change in the variables, based on the coefficients in the DIPS & XBABIP models.

Variable  ERA        OPSA
K         -0.32087   -0.03364
BB        0.07658    0.00440
HR        0.09103    0.00369
XBABIP    0.05431    0.00897

In terms of predicting ERA, all of the DIPS metrics have a larger impact than XBABIP; however, for OPSA, a one standard deviation change in XBABIP has a larger impact than a pitcher’s walk and homer rates.
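The arithmetic behind that table is simply coefficient times standard deviation. The standard deviations themselves are not listed in this post, but they can be backed out from the reported numbers, which also serves as a consistency check: the ERA and OPSA columns should imply the same standard deviation for each variable. A minimal sketch, using only figures from the tables (the dictionary and function names are mine):

```python
# Coefficients from the DIPS & XBABIP columns of the two regression tables.
era_coef = {"K": -7.03101, "BB": 3.27137, "HR": 11.4197, "XBABIP": 3.99536}
opsa_coef = {"K": -0.73719, "BB": 0.18811, "HR": 0.46318, "XBABIP": 0.65966}

# Reported one-standard-deviation changes (magnitudes) from the impact table.
era_impact = {"K": 0.32087, "BB": 0.07658, "HR": 0.09103, "XBABIP": 0.05431}
opsa_impact = {"K": 0.03364, "BB": 0.00440, "HR": 0.00369, "XBABIP": 0.00897}

def implied_sd(impact, coef):
    """Back out the standard deviation from impact = |coefficient| * sd."""
    return {name: impact[name] / abs(coef[name]) for name in coef}

sd_from_era = implied_sd(era_impact, era_coef)
sd_from_opsa = implied_sd(opsa_impact, opsa_coef)

for name in era_coef:
    print(f"{name:7s} sd via ERA: {sd_from_era[name]:.4f}   "
          f"sd via OPSA: {sd_from_opsa[name]:.4f}")
```

The two columns agree closely for every variable, which is what you would expect if both were computed from the same sample standard deviations.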

What does this tell us? Again, if you want to know something about a pitcher’s skill at preventing runs, you can learn a lot from his defense-independent performance. These metrics will tell you more than his ERA or OPSA alone. However, knowing how a pitcher prevents different types of hits does add some useful information about a pitcher’s skill, unlike BABIP. If you happen to have XBABIP handy, feel free to use it to evaluate a pitcher’s talent. But if you don’t have it, you don’t lose much by ignoring it.

PS — Sorry about the spacing issues. I used HTML tables, which causes WordPress to insert extra spaces. I usually use pre tags, but I have not been able to get them to work since upgrading. I don’t think it’s necessarily a WordPress problem; instead, it’s probably something that I am doing that WordPress is interpreting differently than I intend. If you have any suggestions, please pass them along to me.

Problem fixed. I had a plug-in turned on that didn’t play well with others.

VORP Shmorp

David Sheinin wrote the following in yesterday’s Washington Post.

VORP can open your mind. It can bring your world into crystal-clear sharpness. Go ahead — try some. Did you know, for example, that if you exclude the resurgent Cristian Guzmán (VORP: 21.5), the Washington Nationals’ offense has a negative VORP — which, in essence, means if you released every last one of their position players (except Guzmán) and replaced them with cheap, waiver-wire scrubs, the team would be better off?

Or, to be accurate, the Nationals would be expected to be better off — because there is a theoretical aspect to VORP, which stands for value over replacement player.

What is it, you ask? It measures the number of runs a player contributes (or, in the case of pitchers, prevents) beyond what would be expected from a “replacement-level” player — which is to say, one that could be had as a cheap fill-in and who would be expected to produce at around 80 percent of the league average at his particular position during a particular year.

As for a team — that is, a theoretical team — made up entirely of replacement-level players? According to Keith Woolner, the sabermetrics pioneer who invented VORP in the late 1990s, “They would be expected to win between 45 and 50 games, which is comparable to the worst teams we see.” Well, not all of them: The 43-win Detroit Tigers of 2003 own the worst offensive VORP (-50.8) of the past 50 years.

I don’t use VORP, and it’s not that I don’t understand the concept: I just cannot figure out what I gain by using it over other metrics. Sheinin gives the following reasons that I should change my mind.

Why should you care about VORP? Because it presents the most complete picture of a hitter’s or pitcher’s true value. Unlike most other statistics, for example, VORP accounts for why a catcher — for whom it is difficult to find a replacement, because not as many players are capable of playing there — is more valuable than a left fielder with similar offensive numbers.

I don’t understand how VORP helps me here, and I don’t believe I’ve ever heard this explanation. Are catchers really more valuable than equally-talented batters who play left field because of scarcity? There are plenty of catchers in the minor leagues and major-league teams often carry three catchers. Teams don’t normally carry nine outfielders, do they? Imagine how bad a ninth outfielder must be–see the 2008 Braves if you are having trouble imagining this. Teams can draft and develop more catchers if they believe there is a shortage. I suspect that catchers are more expensive because they offer a greater defensive contribution. After all, they are the only other player besides the pitcher involved in every pitch. I admit that valuing catchers is difficult, but if VORP has a breakthrough at the catcher position by better-valuing the on-field contribution, then I’d like to see this spelled out.

It accounts for the fact that a run prevented is more valuable in 2008 than during a low-scoring year such as 1968.

You can make the same corrections to any baseball metric. The simplest is OPS+, but you can make more precise adjustments for any offensive statistic as Michael Schell has done. To me, that is superior to VORP, because I know what each of those statistics is telling me.
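As a rough illustration of that kind of era correction, OPS+ is commonly computed by scaling a hitter's OBP and SLG against the league rates for the same season. The sketch below leaves out the park adjustment that published OPS+ figures include, and the function name and sample inputs are hypothetical.

```python
def ops_plus(obp, slg, league_obp, league_slg):
    """League-relative OPS on a scale where 100 is league average.

    Simplified: published OPS+ values also adjust for the hitter's home
    park, which this sketch omits.
    """
    return 100 * (obp / league_obp + slg / league_slg - 1)

# Hypothetical hitter and league rates, for illustration only.
example = ops_plus(obp=0.360, slg=0.480, league_obp=0.300, league_slg=0.400)
league_average = ops_plus(obp=0.300, slg=0.400, league_obp=0.300, league_slg=0.400)
print(round(example), round(league_average))
```

Because the league rates move with the scoring environment, the same raw line grades out differently in a high-offense year than in a low-offense one.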

And it also accounts for the fact that, say, a .600 slugging percentage for someone who plays home games at Coors Field isn’t as impressive as the same percentage for someone playing at Petco Park.

Again, you can adjust any statistic for home-park bias. Park-effect corrections are hardly a novel contribution.

There are also the potential gains of valuing a player relative to “replacement level” as opposed to the average. I still don’t get the advantage of this. First, you have the task of defining replacement level. What is the point of this exercise? It is just an alternative benchmark to the average. I can explain to any baseball fan: “this player is above/below average.” To explain a player relative to “replacement level” requires a long, boring, and unnecessary conversation. Below-average players are valuable, and this isn’t difficult to understand.
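To make the "alternative benchmark" point concrete, here is a deliberately simplified sketch. It is not the actual VORP calculation; it only takes Sheinin's description (a replacement player producing at around 80 percent of the league average at the position) at face value. The function name and the run totals are hypothetical.

```python
def runs_over_baseline(player_runs, league_avg_runs, baseline_fraction):
    """Runs above a benchmark defined as a fraction of league-average production.

    baseline_fraction=1.0 compares against the average player;
    baseline_fraction=0.8 follows the '80 percent' description quoted above.
    """
    return player_runs - baseline_fraction * league_avg_runs

# Hypothetical player who produces 70 runs where the league average is 80.
vs_average = runs_over_baseline(70, 80, baseline_fraction=1.0)
vs_replacement = runs_over_baseline(70, 80, baseline_fraction=0.8)
print(vs_average, vs_replacement)
```

The player is the same in both lines; only the zero point moves. He is below average and still worth something, which can be said without the new vocabulary.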

What about using VORP to judge salaries? For example, if the league-minimum salary is $390,000, and a team signs a replacement level outfielder for $1 million, hasn’t the team overpaid? Not at all. Player value is determined by opportunity cost as determined by marginal revenue product (MRP). If a player generates many millions of dollars, his value is determined by this, not by how much he makes. Teams pay players with less than four years of service (approximately) less than their MRPs because the collective bargaining agreement allows them to do so. A team that plays a young and reserved player forgoes the potential return from trading the player to another team or from keeping down his service time. Signing a veteran for $1 million can be cheaper than promoting a young player who would provide equal value now by holding down service time.

My point isn’t that VORP is an awful or useless stat. To the contrary, there is clearly useful information contained in it. And those who prefer to hold discussions based on this metric should continue to do so. But there is no need for someone who does not speak the language to learn the ins and outs of a new metric, as Sheinin suggests. I can talk about all its components without dropping the V-bomb. If you want to talk hitting, we can use OBP and SLG. Then you can bring in stolen bases and defense to capture other effects. For pitching, we can use strikeouts, walks, and homers. The big advantage of these is that I can have these conversations with people other than die-hard stat-heads. I can also explain the advantages of these metrics over traditional triple-crown stats, and that is a huge benefit.

I view VORP as an insider language, and by using it you can signal that you are an insider. It’s like speaking Klingon at a Star Trek convention. I can signal to others who speak the language that I am one of you. But the danger of VORP is that once you bring it up the discussion goes down the wrong path, as the uninitiated have reason to feel they are being told they are not as smart as the person making the argument. It’s like constantly bringing up the fact that you only listen to NPR or watch the BBC news at dinner parties. The response is likely going to be the same, “well fuck you too, you pretentious asshole!”

Last year, Murray Chass wrote the following.

I receive a daily e-mail message from Baseball Prospectus, an electronic publication filled with articles and information about statistics, mostly statistics that only stats mongers can love.

To me, VORP epitomized the new-age nonsense. For the longest time, I had no idea what VORP meant and didn’t care enough to go to any great lengths to find out. I asked some colleagues whose work I respect, and they didn’t know what it meant either.

Finally, not long ago, I came across VORP spelled out. It stands for value over replacement player. How thrilling. How absurd. Value over replacement player. Don’t ask what it means. I don’t know.

The thing is: I can actually sympathize with Chass here, though for different reasons. I too get the occasional VORP e-mail, and my normal first reaction is to roll my eyes. I don’t speak VORP, and I shouldn’t be expected to do so. If you want to talk about why a player may or may not be valuable, we can have that discussion in a language that I speak.

Rep. Waxman, Please Do Something Productive

Representative Henry Waxman is upset about Bud Selig’s Congressional testimony that positive steroid tests declined from five percent in 2003 to one percent in 2004.

But the accuracy of the picture provided by Commissioner Bud Selig, his deputy Rob Manfred and the players union’s executive director, Donald Fehr, about how the testing was conducted has come into question. The committee’s chairman, Henry A. Waxman, Democrat of California, has said he is troubled, and the committee’s staff is planning to send letters to Selig and Fehr seeking answers to what Waxman has called “misinformation.”

At the heart of the issue is the fact that the committee was not told that the 2004 testing, with its significantly lower positive test results, had been partly shut down for much of that season, what Selig’s office later called an emergency response to an unforeseen situation. Specifically, the shutdown arose from the federal investigation of the Bay Area Laboratory Co-operative steroid ring.

As a result, players who apparently tested positive in 2003 were not retested in 2004 until the final weeks of the season, and might have been notified beforehand, perhaps skewing the overall test numbers for that year.

“It’s clear that some of the information Major League Baseball and the players union gave the committee in 2005 was inaccurate,” Waxman said in a written statement. “It isn’t clear whether this was intentional or just reflects confusion over the testing program for 2003 and 2004. In any case, the misinformation is unacceptable.”

First, there is nothing “inaccurate” about this claim. It could have been misleading, in that the lower positive-test rate had a cause other than decreased steroid use, but baseball appears to have presented correct numbers.

Second, is this news to Waxman? The shutting down of testing in 2004 is such common knowledge that I cannot even recall where I learned of it many months ago. I believe it was included in the Mitchell Report.

UPDATE: Here is the relevant text from the Mitchell Report (pp. 281–282).

In April 2004 federal agents executed search warrants on two private firms involved in the 2003 survey testing, Comprehensive Drug Testing, Inc. and Quest Diagnostics, Inc.; the warrants sought drug testing records and samples for ten major league players connected with the BALCO investigation. In the course of those searches, the agents seized data from which they believed they could determine the identities of the major league players who had tested positive during the anonymous survey testing.

Shortly after these events, the Players Association initiated discussions with the Commissioner’s Office regarding a possible suspension of drug testing while the federal investigation proceeded. Manfred said the parties were concerned at the time that test results that they believed until then raised only employment issues had now become an issue in a pending criminal investigation. Ultimately, the Commissioner’s Office and the Players Association agreed to a moratorium on 2004 drug testing. While the exact date and length of this moratorium is uncertain, and the relevant 2004 testing records have been destroyed, Manfred stated that the moratorium commenced very early in the season, prior to the testing of any significant number of players. Manfred stated that the Players Association was not authorized to advise its members of the existence of the moratorium.

According to Manfred, the moratorium lasted for a short period. For most players, drug tests then resumed. With respect to the players who the federal agents believed had tested positive during 2003 survey testing, however, the Commissioner’s Office and the Players Association agreed that: (1) the Players Association would be permitted to advise those players of this fact, since that information was now in the hands of the government; (2) the testing moratorium would continue with respect to those players until the Players Association had an opportunity to notify them; and (3) the Players Association would not advise any of the players of the limited moratorium.

Sometime between mid-August and early September 2004, Manfred contacted Orza because the Players Association had not yet notified the players involved. The 2004 season was drawing to a close without those players having been tested because they remained under the moratorium. Manfred said that he pressed Orza to notify the players as soon as possible so that they could be tested. All of the players were notified by early September 2004.

The problem is that the owners and players agreed to suspend testing for a portion of the 2004 season after the I.R.S. seized previous test results that were supposed to be anonymous. It was the seizure by a government organization that impeded the testing, not MLB or the MLBPA.

But what really annoys me is that if Waxman really cared about getting steroids out of baseball, he would have used his powers to suppress the seized evidence. Baseball entered into its drug testing program with good intentions, despite the fact there are incentives for individual players and owners to skirt the system. It took major concessions for both sides to begin testing, and anonymity of the early tests was important. The raid threw all of that good will out the window, and players were once again suspicious of what would happen to the personal health information contained in the blood samples.

Instead of blaming baseball, Rep. Waxman should have helped baseball negate these seizures, so that all the parties involved could have used their resources to protect player confidentiality while trying to rid the sport of doping. That is the goal of all of this, isn’t it?

Is Andruw Jones Overrated?

I first became aware that Andruw Jones was going to be the lead story in Jayson Stark’s new book The Stark Truth: The Most Overrated and Underrated Players in Baseball History during a radio interview several months ago. In the discussion, Stark mentioned that Andruw Jones is not as good as people think he is and that he would be explaining why in his upcoming book. I was anticipating what he had to say when, lucky for me, an excerpt from the book on Jones was published.

What does Stark mean by overrated? Well, that is a tough one, which he admits. It is certainly subjective, and while some people might think Andruw Jones is the greatest center fielder in history this is not the consensus. Stark clearly doesn’t think the public sees Andruw in this class, just in a class higher than he should be.

What does the baseball public think of Jones? Well, in his 10+ year career he has made five All-Star teams, all during excellent offensive seasons (OPS+ > 120). His career OPS+ is 117, which is good, but not outstanding, offensive production for a center fielder. On defense, Jones is considered to be one of the best in the game, winning nine straight Gold Gloves. I’d say the baseball-watching public considers Jones to be a good hitter and an excellent defender. So, how does Jones stack up against Stark’s case?

Luckily, Stark gives us a reference point for judging excellence in center field.

[Center field is] the position of Mays, Mantle, and DiMaggio. Of Cobb, Puckett, and Griffey. Those aren’t just names on a lineup card. Those are names that conjure up magic. This is the glamour position in baseball. Nothing else is close.

Did you catch that? Read over the list of names again. Kirby Puckett? Are you kidding me? Don’t get me wrong. Kirby Puckett was a very good player, but is nowhere close to the class of the other players on the list. In fact, Andruw Jones’s career OPS+ of 117 is quite similar to Puckett’s 124, and don’t forget that Puckett was forced to retire near the top of his game. Hey, I’ll grant that Puckett was the better player, but I’m a bit uneasy saying that Jones falls well short of Stark’s own standard. Puckett is more similar to Jones than he is to the other players on the list. Maybe Puckett would have been a better choice for an overrated center fielder if people really do consider him to be as good as Mays, Mantle, DiMaggio, and Cobb.

Let’s look at offense first. Stark quotes a scout who describes Jones’s offense as “not very good.” Now, I’m not sure how to interpret the quote. Taken literally, Jones is not a very good hitter; but, when I’m at a family reunion and someone says, “this congealed salad isn’t very good” I skip it. It’s not like Jones is known for his bat: he’s garnered only one Silver Slugger award, and three are handed out every year. While his offense wouldn’t be anything special for a corner outfielder, he’s more than adequate for his position. For the previous three seasons he’s finished second in OPS among center fielders (2004, 2005, 2006), and I have little doubt that he will finish this season near the top.

Now let’s move to defense. Here is where Stark makes most of his case. He acknowledges that Jones was once one of the best defenders at his position, and he believes he is living off a reputation that is no longer deserved. As he did for offense, he cites the opinion of a few scouts that Jones’s defense has declined.

“I first noticed it two or three years ago,” he said. “Just from sitting there, scouting, watching balls dropping in that should have been caught. I’m not talking about balls that needed to be dived for. I’m talking about balls that should be caught.”

I surveyed other scouts. They’d begun to see the same things. Not getting the same jumps. Not reacting. Not putting in the defensive effort he used to. His body getting thicker. A sudden obsession with home-run hitting over everything else.

Stark doesn’t just believe these words; he goes to some numbers. There is no denying Andruw’s putouts are down from the mid-400s to the 370s, though Jones is on pace for around 450 putouts in 2007. Stark says this can’t be because the composition of his pitching staff has changed, because his zone rating has fallen. Here is where Stark’s argument falls apart. There is no denying Jones isn’t a zone rating wonder, but zone rating doesn’t tell us much about defensive prowess.

Zone rating is a seductive statistic because it seems like a batting average for hitters. How many outs did you generate from chances within a somewhat objective zone? What a nice idea! The problem is that zone rating is very sensitive to balls that players catch outside of their assigned zone. It’s one of the reasons that the inventor of zone rating, John Dewan, abandoned his creation and developed an entirely new method for evaluating defense (more on that in a moment).

Three years ago I wrote a post, Thoughts on Zone Rating, using Jones as an example of why zone rating is flawed (it comes up number one in the Google search for “zone rating problems”). The basic problem is that defensive shifts allow fielders to catch more balls outside of their zones, but also cause them to give up balls hit in their zones. Fielders are asymmetrically punished and rewarded for plays made and not made in and outside of the zone. I’m not going to rehash the argument, but the quick summary is that the way outfield defense is played today, zone rating has some problems evaluating players, especially when they are catching balls outside of their assigned zones.
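To make the asymmetry concrete, here is a minimal sketch in Python. It assumes traditional zone rating is tallied by adding out-of-zone catches to both the outs and the chances; that formula is my assumption about how the stat is counted and is not spelled out in this post. The fielder and every number below are hypothetical.

```python
def traditional_zr(in_zone_balls, in_zone_outs, out_of_zone_outs):
    """Traditional zone rating under the ASSUMED tally:
    out-of-zone catches count as both an out and a chance."""
    outs = in_zone_outs + out_of_zone_outs
    chances = in_zone_balls + out_of_zone_outs
    return outs / chances


# Hypothetical fielder playing straight up: 100 balls hit into his zone.
standard = traditional_zr(in_zone_balls=100, in_zone_outs=90, out_of_zone_outs=10)

# Same hypothetical fielder, shifted: he catches 5 more balls outside the
# zone, but 5 more balls hit inside the zone drop in.
shifted = traditional_zr(in_zone_balls=100, in_zone_outs=85, out_of_zone_outs=15)

total_outs_standard = 90 + 10
total_outs_shifted = 85 + 15

print(f"standard: {standard:.3f}, outs made: {total_outs_standard}")
print(f"shifted:  {shifted:.3f}, outs made: {total_outs_shifted}")
```

Under this assumed tally, both versions of the fielder record the same number of outs, yet the shifted one gets the lower rating. A missed in-zone ball costs a full chance, while an extra out-of-zone catch is diluted because it raises the denominator too.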

The problems with ZR extend beyond my critique, and its flaws became so obvious that its creator John Dewan developed two new defensive measures: Revised Zone Rating and the Plus/Minus System. Both are presented in Dewan’s amazing book The Fielding Bible. (I’m also excited to learn that a new volume is scheduled for 2008…Yes!) While the latter measure is superior, I want to focus on Dewan’s revised ZR, because of some information presented in the book that sheds light on traditional ZR.

Rather than include balls out of a fielding zone in the traditional ZR metric, the revised system credits balls caught out of the zone separately. On page 234 we see that from 2003–2005 Jones made 218 out-of-zone plays: 40 more than Juan Pierre (in 53 more innings played), 49 more than Johnny Damon (in 10 fewer innings played), 63 more than Vernon Wells (in 92 more innings played), and 64 more than Carlos Beltran (in 53 more innings played). Long story short: Andruw Jones is good at getting to balls outside of his zone, and because one of the weaknesses of traditional ZR is handling balls out of the zone, we ought to be wary when using it to judge Jones.
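For readers who want the raw totals instead of the margins, a small Python sketch can back them out. It uses only the figures quoted above (Jones’s 218 plays and each margin); the implied totals are derived by subtraction, not copied from the book, and the variable names are mine.

```python
# Out-of-zone plays for Jones, 2003-2005, as quoted from The Fielding Bible.
JONES_OOZ_PLAYS = 218

# How many MORE out-of-zone plays Jones made than each player (from the text).
jones_margin = {
    "Juan Pierre": 40,
    "Johnny Damon": 49,
    "Vernon Wells": 63,
    "Carlos Beltran": 64,
}

# Implied totals, derived by subtraction (not read directly from the book).
implied_ooz_plays = {
    name: JONES_OOZ_PLAYS - margin for name, margin in jones_margin.items()
}

for name, plays in implied_ooz_plays.items():
    print(f"{name}: {plays} out-of-zone plays")
```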

Next, let’s go to the Plus/Minus System. This is Dewan’s masterpiece: a system based on objective video analysis of how players field balls according to the speed, trajectory, and location of batted balls relative to other fielders playing the same position. It’s frickin’ awesome. To use zone rating to evaluate fielders when this system is available is like using a wooden tennis racket at Wimbledon today. How does Jones do in the Plus/Minus system? Using the original Plus/Minus metric presented in The Fielding Bible, from 2003–2005 Jones made 26 more plays than the average center fielder, putting him behind only Torii Hunter (+44) and Aaron Rowand (+34). Furthermore, Dewan awards Jones Gold Gloves in all three seasons.

Dewan published a few results from an updated system that more precisely measures fielding in The Bill James Handbook 2007. Jones performs even better in this system. From 2004–2006 Jones made more plays than any other center fielder—48 more than the average center fielder and three more than the next closest player (Corey Patterson). In 2006, Jones finished second only to Patterson (+34) by making 30 more plays than average. He’s still got it!

The funny thing about this is that before the Plus/Minus system came into being I thought Andruw was underrated as a defender. Rumors of Jones’s defensive decline have been discussed openly for years, but I never saw it. I believe that the main reason for this is that Jones isn’t as skinny as he used to be. Hey, who isn’t? And though his speed may have declined some, that was never what made Andruw Jones so good. I have never seen any player take routes to balls as well as he does. His defensive gift is less about his legs and more about his ability to know where any ball is going faster than anyone else. It is almost as if he folds space as he runs, because he consistently gets to balls that I expect to be hits.

I was happy that Dewan’s system confirmed my thinking, and I would have been prepared to admit that my eyes had been deceiving me if it had shown otherwise. Quantifying defense is difficult and only now are we coming close to understanding how to evaluate fielders. Zone rating has its heart in the right place, but it has little value. I would rather judge a hitter solely by his batting average than judge a fielder by his zone rating.

So, is Jones overrated? Well, I think it’s pretty clear that he is a good-hitting center fielder who is one of the top defenders in the league. That is how I have him pegged, and I suspect the perception of the public is not much different.

Aging in Baseball

Fellow economist Steve Walters pinch hits for Dave Berri at Wages of Wins Journal, where he discusses aging in baseball. You may remember Steve from his Statscape column in Sporting News.

Steve discusses how our perceptions about aging in baseball may be incorrect. In the early-1980s, Bill James found that players peak around age 27. But statistician Jim Albert, co-author of the excellent Curve Ball, finds that players tend to peak closer to 29. This reminded me of some work I have done.

A few years back, I ran a series of posts measuring aging in baseball for hitters (part 1, part 2, part 3) and pitchers using slightly different methods. What did I find? Like Albert, I find players peaking at around age 29; worse players tend to peak earlier, and better players peak later.

Now, this was three years ago, and I need to follow up on my findings (and I plan to this summer), but I think it is interesting that both Albert and I reach similar conclusions.

I’m not sure if Steve will be writing more for WoW this week, but I will. Look for a post from me there tomorrow or Wednesday.

PrOPS Questions

Over the weekend, Rich Lederer of The Baseball Analysts pointed me to an ESPN story by David Srinivasan (Insider $) about a statistic I developed a few years ago, PrOPS. This led to a few comments on the site that I wanted to address. I’ve had numerous conversations about PrOPS since its invention, so I wanted to write a post to bring people up to speed on its development.

First, let me offer a brief introduction. PrOPS (which stands for predicted OPS) is a measure that generates an OPS, that is, on-base percentage (OBP) plus slugging percentage (SLG), for a player based on a few things that the player does. Rather than focus on outcomes on balls in play (hits, outs, etc.) that generate OBP and SLG, PrOPS uses batted-ball types (line drive rate, groundball-to-flyball ratio) and a few other things to generate the typical outcome for a player who hits the ball in this manner.
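Since PrOPS is a prediction of OPS, it may help to see the outcome-based number it is aiming at. The Python sketch below computes OBP, SLG, and OPS from a counting-stat line using the standard definitions. The stat line is hypothetical, and the sketch does not attempt the PrOPS formula itself, which is not given in this post.

```python
def obp(hits, walks, hbp, at_bats, sac_flies):
    """On-base percentage: times on base divided by qualifying plate appearances."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def slg(hits, doubles, triples, home_runs, at_bats):
    """Slugging percentage: total bases divided by at-bats."""
    singles = hits - doubles - triples - home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical season line, for illustration only.
line = dict(at_bats=500, hits=150, doubles=30, triples=5, home_runs=20,
            walks=60, hbp=5, sac_flies=5)

player_obp = obp(line["hits"], line["walks"], line["hbp"],
                 line["at_bats"], line["sac_flies"])
player_slg = slg(line["hits"], line["doubles"], line["triples"],
                 line["home_runs"], line["at_bats"])
player_ops = player_obp + player_slg

print(f"OBP {player_obp:.3f} + SLG {player_slg:.3f} = OPS {player_ops:.3f}")
```

Everything in this calculation depends on where the batted balls happened to land. PrOPS instead asks what OPS a hitter with the same batted-ball profile would typically produce.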

Now, PrOPS has its origins in my wanting to use batted ball types recently made available by The Hardball Times. In the introductory article on the subject, I used PrOPS to predict which slumping and hot hitters were due for a rise and fall in the 2005 season. The initial numbers were based on one season of data. A few people responded that many of the under-performers were speedy while the over-performers tended to be big and slow. So, I made a minor adjustment to the formula to account for speed. However, the adjustment did very little.

At the end of the season, I wrote a chapter for The Hardball Times Baseball Annual 2006, refining PrOPS using several seasons of data. When including several seasons, I found no relationship between any existing measure of speed and over/under-performance. This doesn’t mean that speed has no impact, but it doesn’t seem to be very important. A few months ago, I posted a summary of the findings.


There is a highly statistically significant relationship…between a player’s over/under performance and his decline/improvement. And the greater the deviation between PrOPS and OPS, the larger the reversion is the following season. For every 0.01 increase/decrease in a player’s over/under performance, his OPS is likely to fall/rise by 0.008 the following season. For example, a player with an OPS 10 “points” above his PrOPS can expect his OPS to fall by eight points in the following season. That is quite a reversion.

I also generated lists of the top-25 over and under performing seasons from 2002-2004. And what happened to them?

Of the top 25 over performers, 20 players had lower OPS in the following season.

Of the top 25 under performers, 21 improved their OPS in the following season.

The article also lists the top-25 over and under performers for 2005. What happened to those players in 2006?

Of the over performers, 12 players declined, 7 improved, and 6 did not deviate more than 20 OPS-points from the previous season. Of the under performers, 11 players improved, 7 declined, 3 had no change, and 5 didn’t garner serious playing time. It’s not an air-tight projection system, but there seems to be some information there.

OPS explained approximately 43% of the variance in OPS in the following year, while PrOPS explained about 46%.
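The reversion figure quoted in the summary above (0.008 of OPS for every 0.01 of over/under performance) can be turned into a back-of-the-envelope calculation. This Python sketch applies only that slope; it assumes no intercept or other regression terms, which the summary does not report, and the player’s numbers are hypothetical.

```python
# Slope from the summary: 0.008 OPS change per 0.01 gap between OPS and PrOPS.
REVERSION_SLOPE = 0.008 / 0.01


def expected_ops_change(ops, props):
    """Expected next-season change in OPS from the quoted slope alone.
    ASSUMPTION: no intercept or other terms; this is not the full regression."""
    over_performance = ops - props  # positive means OPS outran PrOPS
    return -REVERSION_SLOPE * over_performance


# Hypothetical hitter whose OPS sits 10 points above his PrOPS.
ops, props = 0.850, 0.840
change = expected_ops_change(ops, props)

print(f"expected change: {change:+.3f}")
print(f"naive next-season OPS: {ops + change:.3f}")
```

This matches the worked example in the summary: a 10-point over-performance implies roughly an eight-point drop. As the post says below, it is one input to weigh alongside history, injuries, and age, not a projection by itself.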

PrOPS is not a stand-alone projection tool. You should not look only at a player’s PrOPS and assume it’s exactly what the player should be doing. When I look at it, I also consider the player’s recent hitting history, injuries, aging, and all that other stuff we sometimes use to evaluate hitters. But when I see a player have a career year, and his PrOPS don’t show it, I start to get suspicious.

If you’re curious about the over/under performers of 2006, see The Hardball Times.

NL over performers
NL under performers
AL over performers
AL under performers

Over the weekend I also ran across a new system for measuring luck by Protrade (also see here). It looks promising.