Comments on: Agreeing and Disagreeing with Bill James http://www.sabernomics.com/sabernomics/index.php/2010/12/agreeing-and-disagreeing-with-bill-james/ Economic Thinking about Baseball Sun, 09 Jan 2011 17:16:18 +0000

By: Pete http://www.sabernomics.com/sabernomics/index.php/2010/12/agreeing-and-disagreeing-with-bill-james/comment-page-1/#comment-110967 Tue, 21 Dec 2010 14:05:28 +0000 http://www.sabernomics.com/sabernomics/?p=3600#comment-110967 JC,

Looking at only the above-average portion of your updated graph (where there should be little to no selection bias because these are the very best players available), we see that for each lower level of performance there is an increasing number of players. As you get below average, the efficiency of the league minimizes below-average *performances.*

However, there is still below-average *talent* available in increasing numbers. These players just don’t receive major league playing time.

Surely you agree that there are more players clustered around “22nd-25th man on the roster” talent than “18th-21st man on the roster” talent.

Of course, this won’t show up when you look at players who receive 100 MLB PA in a season, because most of these players are in AAA.

By: Donald A Coffin http://www.sabernomics.com/sabernomics/index.php/2010/12/agreeing-and-disagreeing-with-bill-james/comment-page-1/#comment-110964 Tue, 21 Dec 2010 04:14:33 +0000 http://www.sabernomics.com/sabernomics/?p=3600#comment-110964 The question of the distribution of *talent* is, I think, somewhat different from the distribution of *outcomes,* in a situation in which the objective of one participant (say, a hitter) and the objective of another participant (say, the pitcher/defense) are opposed. The question of the distribution of talent can, I think, only be truly answered in one of two circumstances:

1) All hitters hit against only a set of pitchers backed by a defense of given quality. Then, the only variable is the talent of the hitter. (Transpose this for pitcher/defense–hold the hitter talent constant and vary the pitcher/defense.)

2) Participants, while competing with each other, do not oppose each other’s efforts. Take, for example, the distribution of driving distances on the PGA Tour in 2010. I can’t reproduce the graph here, but it is clearly non-normal. And it is positively skewed.

I can’t find data right now for (for example) scores on the Professional Bowlers’ Tour, but I suspect they are also non-normal.

But I’m not surprised that the distribution of outcomes in MLB is approximately normal, just as I would not be surprised that the distribution of outcomes in the NFL is roughly normal. The interaction of two non-normal distributions can easily be normal.
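
A minimal sketch of that last point, under assumed distributions that do not come from any real data: suppose hitter and pitcher strength are each drawn from the same positively skewed distribution (an exponential, chosen purely for illustration) and the outcome is modeled as their difference. The skew then cancels. The result is symmetric rather than exactly normal, so this shows the direction of the claim, not a proof of it.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: mean cubed deviation over the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)

def simulate(n=50_000, seed=0):
    """Return (skew of one side's talent, skew of head-to-head outcomes)."""
    rng = random.Random(seed)
    # Assumption: both sides share the same skewed talent distribution.
    hitters = [rng.expovariate(1.0) for _ in range(n)]
    pitchers = [rng.expovariate(1.0) for _ in range(n)]
    # Assumption: an outcome is simply hitter strength minus pitcher strength.
    outcomes = [h - p for h, p in zip(hitters, pitchers)]
    return skewness(hitters), skewness(outcomes)

talent_skew, outcome_skew = simulate()
print(f"skew of talent: {talent_skew:.2f}, skew of outcomes: {outcome_skew:.2f}")
```

The talent skew should come out strongly positive while the outcome skew sits near zero, which is consistent with skewed talent producing roughly symmetric results once two opposed sides interact.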

By: JC http://www.sabernomics.com/sabernomics/index.php/2010/12/agreeing-and-disagreeing-with-bill-james/comment-page-1/#comment-110961 Mon, 20 Dec 2010 23:08:33 +0000 http://www.sabernomics.com/sabernomics/?p=3600#comment-110961 Mike,

“So if you remove a very large number of players who are below average because they aren’t good enough to stick around, then you can get the distribution to approximate a normal distribution? That’s a tautology, and surely you can see that.”

If I had argued that, it would be a tautology, but that was not my argument. I stated, “it is not correct to sum the frequency of bad players over time to tell us something about the availability of bottom level talent in the league in any given year (emphasis added for clarification). There are going to be a few good players who stick around and the players who do not stick around will obviously be worse. Thus, when looking for how much talent might be around when a team is looking to fill its roster, the fact that there were some bad players in the league several years ago is irrelevant.”

“You state, ‘the reason I use the 100-PAs cutoff is that I need a sufficient sample size to measure talent level.’ That’s irrelevant to my point, and I don’t know why you state it. I also used a 100-PA cutoff.”

This was in response to your claim that “So you don’t think that the players who got 100 PA distributed over two or three years are a good representation of replacement level? Why not? They would seem to be a perfect representation. Yet you purposefully exclude them without giving a good reason, other than an ad hominem allegation of silliness.” I was restating my good reason that you had previously ignored.

“You state, ‘I looked at samples below this cutoff and it did not have a big effect on the distribution.’ That may be true for the single season level, but it is decidedly untrue at the multi-season level. Thus, it is either irrelevant or misleading to restate that point in addressing my question.”

And then I argued why it was incorrect to use a cumulative sample, because the issue is the availability of “replacement-level” talent at any given time.

“You state, ‘When you look at cumulative PAs over more than a year, you are getting lots of small samples that are not akin to the typical groupings of at-bats that players take regularly. They are also spread over time so it’s not clear what level of talent is being represented.’ They are mostly spread over two consecutive years. I’m doubtful that talent level changes so radically in the span of one year that this data becomes unusable. In fact, your aging curves suggest that talent level changes very slowly. If you accept that finding, these samples should be very useful to you. To discard a large portion of the data set that radically changes your conclusion, you ought to have a better reason than simply thinking the data might have problems.”

Yes, aging changes are minor over time. Age was not mentioned in my argument. I do not think that a few scattered PAs over several years are the same as PAs accrued in a single season when measuring talent. You are free to disagree. But that point was an aside, which I initially stated and restated in a follow-up comment. I took your argument at face value (even though I do think it is silly) and explained why I felt these observations were not relevant to the question at hand. The main reason for not including these observations is that they come from a group of players who are not readily available to serve as replacements. That someone took 150 PAs from 1994–1996 is irrelevant to the availability of marginal players today.

“If you include this data, it demonstrates quite clearly that your conclusion is wrong. If you want to exclude this data, you ought to demonstrate with evidence-based tests why you can throw it out and maintain any sort of integrity in your conclusion.”

Again, my argument is an economic one. Are replacement players cheap and abundant? I say no, because in any given year there does not appear to be a glut of talent at the bottom of the talent spectrum. Many players have come and gone in the past, but they are gone and not part of the available talent pool.

I have entertained your critiques and explained why I do not agree with them multiple times. This exchange is going nowhere, and I see no reason to continue it.

By: Mike Fast http://www.sabernomics.com/sabernomics/index.php/2010/12/agreeing-and-disagreeing-with-bill-james/comment-page-1/#comment-110960 Mon, 20 Dec 2010 22:04:55 +0000 http://www.sabernomics.com/sabernomics/?p=3600#comment-110960 So if you remove a very large number of players who are below average because they aren’t good enough to stick around, then you can get the distribution to approximate a normal distribution? That’s a tautology, and surely you can see that.

You state, “the reason I use the 100-PAs cutoff is that I need a sufficient sample size to measure talent level.” That’s irrelevant to my point, and I don’t know why you state it. I also used a 100-PA cutoff.

You state, “I looked at samples below this cutoff and it did not have a big effect on the distribution.” That may be true for the single season level, but it is decidedly untrue at the multi-season level. Thus, it is either irrelevant or misleading to restate that point in addressing my question.

You state, “When you look at cumulative PAs over more than a year, you are getting lots of small samples that are not akin to the typical groupings of at-bats that players take regularly. They are also spread over time so it’s not clear what level of talent is being represented.” They are mostly spread over two consecutive years. I’m doubtful that talent level changes so radically in the span of one year that this data becomes unusable. In fact, your aging curves suggest that talent level changes very slowly. If you accept that finding, these samples should be very useful to you. To discard a large portion of the data set that radically changes your conclusion, you ought to have a better reason than simply thinking the data might have problems.

If you include this data, it demonstrates quite clearly that your conclusion is wrong. If you want to exclude this data, you ought to demonstrate with evidence-based tests why you can throw it out and maintain any sort of integrity in your conclusion.

By: JC http://www.sabernomics.com/sabernomics/index.php/2010/12/agreeing-and-disagreeing-with-bill-james/comment-page-1/#comment-110959 Mon, 20 Dec 2010 21:39:40 +0000 http://www.sabernomics.com/sabernomics/?p=3600#comment-110959 Mike,

As I previously stated (in the comments to this post and in my book), the reason I use the 100-PAs cutoff is that I need a sufficient sample size to measure talent level. There is a tradeoff here, because I lose observations from players below that level. I looked at samples below this cutoff and it did not have a big effect on the distribution.

When you look at cumulative PAs over more than a year, you are getting lots of small samples that are not akin to the typical groupings of at-bats that players take regularly. They are also spread over time so it’s not clear what level of talent is being represented. That is why I think it would be silly to include them in the sample; however, as I said, “aside from the silliness,” it is not correct to sum the frequency of bad players over time to tell us something about the availability of bottom level talent in the league in any given year. There are going to be a few good players who stick around and the players who do not stick around will obviously be worse. Thus, when looking for how much talent might be around when a team is looking to fill its roster, the fact that there were some bad players in the league several years ago is irrelevant. Those guys are out of the talent-well and gone from baseball. There are lots of players who get to play baseball who are not very good and don’t get to play long. I’m not disputing this. I also responded that it is incorrect to look at cumulative totals above and below benchmarks (e.g., 10% and SD) because there is an upper bound to how good a player can be. The inferior tail of the talent distribution includes more observations, not because it is fatter than the superior tail, but because it is longer.

The initial James statement that I was responding to claimed that baseball talent was not normally distributed, and that “For every player who is 10 percent above the average player, there are probably twenty players who are 10 percent below average.” This implies that the frequency of observations below average should be greater than the frequency above in reverse proportions. This is clearly not the case.

By: Mike Fast http://www.sabernomics.com/sabernomics/index.php/2010/12/agreeing-and-disagreeing-with-bill-james/comment-page-1/#comment-110957 Mon, 20 Dec 2010 19:30:09 +0000 http://www.sabernomics.com/sabernomics/?p=3600#comment-110957 So you don’t think that the players who got 100 PA distributed over two or three years are a good representation of replacement level? Why not? They would seem to be a perfect representation. Yet you purposefully exclude them without giving a good reason, other than an ad hominem allegation of silliness.

If it radically changes your distribution, which it does, you might want to consider why.

(My previous post giving details about the 100-PA multiple-season players was cross-posted with your preceding post, in case that isn’t clear from the context.)

By: Mike Fast http://www.sabernomics.com/sabernomics/index.php/2010/12/agreeing-and-disagreeing-with-bill-james/comment-page-1/#comment-110956 Mon, 20 Dec 2010 19:23:58 +0000 http://www.sabernomics.com/sabernomics/?p=3600#comment-110956 First, I had included pitchers hitting in my original numbers. I should have removed those. I also realized that my data only ran through 2007 instead of 2009. Neither of those changes the conclusion substantially.

From 1994-2007, there were 348 hitters with at least 100 PA who were above average in OPS, and 958 such hitters who were below average.

I also incorrectly counted the standard deviations. There were 75 hitters more than one standard deviation above average in OPS and 422 hitters more than one standard deviation below average (not including pitchers hitting).
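
As a rough check on what these counts imply, the sketch below turns them into below-per-above ratios. The four counts are the ones quoted in this comment; the function name is only illustrative. A symmetric distribution would put both ratios near 1, and the James statement quoted elsewhere in this thread describes something closer to 20 to 1.

```python
# Counts quoted in the comment above (hitters with at least 100 PA, 1994-2007).
above_avg, below_avg = 348, 958
above_1sd, below_1sd = 75, 422

def below_per_above(below, above):
    """Number of below-average hitters for each above-average hitter."""
    return below / above

ratio_avg = below_per_above(below_avg, above_avg)
ratio_1sd = below_per_above(below_1sd, above_1sd)

print(f"below/above average: {ratio_avg:.2f}")
print(f"below/above one SD:  {ratio_1sd:.2f}")
```

On these counts the imbalance grows further from the mean (roughly 2.75 to 1 overall, roughly 5.63 to 1 beyond one standard deviation), though both figures remain well short of 20 to 1.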

It appears to me that you simply repeated your previous study and expanded the years rather than considering the issue of turnover that Pete and I have raised. You should look at players that had at least 100 plate appearances over the whole time period 1994-2009 and not restrict yourself to players who had at least 100 PA in a given single year.

To give a few specific examples, are you including players like Greg LaRocca, Mike Rouse, Fausto Cruz, Les Norman, and Jim Tatum in your sample? They all had 100 PA or more during the period 1994-2009, but spread across more than one year.
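
The difference between the two sample definitions can be sketched as follows. The rows are invented for illustration and are not real player data; the function names and the 100-PA default are assumptions made to mirror the cutoff discussed here.

```python
from collections import defaultdict

# Hypothetical (player, season, plate appearances) rows, invented for illustration.
rows = [
    ("regular", 2001, 550),
    ("regular", 2002, 610),
    ("callup", 2001, 60),
    ("callup", 2002, 55),
    ("cup_of_coffee", 2003, 20),
]

def single_season_sample(rows, min_pa=100):
    """Players with at least min_pa in some single season."""
    return {player for player, _season, pa in rows if pa >= min_pa}

def cumulative_sample(rows, min_pa=100):
    """Players with at least min_pa summed over the whole period."""
    totals = defaultdict(int)
    for player, _season, pa in rows:
        totals[player] += pa
    return {player for player, total in totals.items() if total >= min_pa}

# Players the single-season cutoff drops but the cumulative cutoff keeps.
only_cumulative = cumulative_sample(rows) - single_season_sample(rows)
print(sorted(only_cumulative))
```

In this toy data only the hypothetical "callup" player, who clears 100 PA across two seasons but in neither one alone, moves between the samples. That is the kind of player at issue in the disagreement above.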

By: cliff http://www.sabernomics.com/sabernomics/index.php/2010/12/agreeing-and-disagreeing-with-bill-james/comment-page-1/#comment-110955 Mon, 20 Dec 2010 19:23:08 +0000 http://www.sabernomics.com/sabernomics/?p=3600#comment-110955 Let’s try a different athletic endeavor.

Sprinting.

Just for discussion, let’s say that roughly the number of people who have ever run a verified sub 10.00 100 meters is 50. Then, the number sub 10.2 is probably 500. The number sub 10.5 is probably 10000. Etc.

I don’t understand why that argument is not applicable to baseball.

I think a problem is that players that we want to judge in terms of “replacement level” (#5 starting pitchers and “worst starting position player”) are actually usually above replacement level.

For most teams there are about 2 to 3 slots on their roster that call for a replacement player: #4 righthanded reliever, #3 lefthanded reliever, #5 outfielder, #3 catcher, #2 backup middle infielder. And there are lots and lots of those.

By: JC http://www.sabernomics.com/sabernomics/index.php/2010/12/agreeing-and-disagreeing-with-bill-james/comment-page-1/#comment-110954 Mon, 20 Dec 2010 19:14:10 +0000 http://www.sabernomics.com/sabernomics/?p=3600#comment-110954 Aside from the silliness of looking at players who gather 100 PAs over many years, I am not arguing that there are just as many good players as bad players. In fact, I am arguing that there are many more bad players than good players, which is why a certain level of talent is not just lying freely available for teams to pick up. I stated this in my response to Pete. In any given season, when you have to dip down for a replacement player, you are going to pick up an inferior player. There is not a glut of talent clustered at the hypothesized replacement level.

]]>
By: Mike Fast http://www.sabernomics.com/sabernomics/index.php/2010/12/agreeing-and-disagreeing-with-bill-james/comment-page-1/#comment-110952 Mon, 20 Dec 2010 18:50:00 +0000 http://www.sabernomics.com/sabernomics/?p=3600#comment-110952 JC, you should look at any hitter who had at least 100 plate appearances in total during the whole time period, not just those who had 100 plate appearances in any given single year. You just repeated your same study with the same error on a larger scale and did not consider the point about turnover that Pete and I have raised.
