I owe you a blog on Joey Galloway and his place among the all-time greats.
Also, Football Outsiders posted a link to
the post that
necessitated an explanation of Joey Galloway’s place among the all-time greats.
Interesting discussion ensued. The folks over
there brought up some points that I need to address. I’ll do that here, and try to integrate the Galloway discussion as I go.
First, let me elaborate a bit on my closing words.
Before I end today’s entry with the list, keep in mind that a methodology stands or falls on its merits,
independent of the results it generates. If you like the methodology, you are not allowed to complain about the
list. If you don’t like the methodology, you shouldn’t even be looking at the list. But no fair changing your
mind after peeking.
I do believe this in theory, but I’m also aware, as I betrayed in the first paragraph of the same post, that it’s
not a feasible rule to live by in practice. I am a football fan and am therefore interested in who the greatest
wide receivers of all time are. A big part of putting together lists like these is to learn something I didn’t
already know about football history. Equally interesting is the exercise of thinking through precisely what
criteria we want to use and how they fit together. To me, this is a fascinating puzzle where you have mountains
of data that is only very loosely connected and you have to make sense of it. It’s an academic exercise, but what
do you expect? I’m an academic.
I’ve been around the internet block enough to realize that if you post an all-time list with Joey Galloway at #9, any thoughtful
critique of the method will at best be crowded out by discussions about who really ought to be
at #9 or, much more likely, by allegations of crack use. That closing paragraph was my best attempt at a lightning rod.
Bill James — I think it was him, but I have a habit of assuming every good idea I’ve ever read is his — used
to have an 80/20 rule (note that this is not the same as
Pareto’s apparently more well-known
80/20 rule). A methodology is good if it gives you about 80% what you’d expect and 20% surprises. If you’re
getting more than 80% “right” answers, that means that the method was probably rigged to match your preconceived
notions. You can’t learn anything from a method that simply confirms what you already thought you knew. On the
other hand, if you get more than 20% surprises it means that the method, while possibly sound in theory, has too
many loopholes. Or maybe it’s just garbage.
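James’s rule of thumb is easy to operationalize. Here’s a toy sketch in Python — the function names, the pass band, and the example rankings are my own inventions for illustration, not anything from James or from this method:

```python
# Toy version of the 80/20 sanity check: a good method's output should
# contain roughly 20% entries you didn't expect. The tolerance band below
# is a hypothetical convenience, not part of the original rule of thumb.

def surprise_rate(method_ranking, expected):
    """Fraction of ranked entries that were not in your expected set."""
    surprises = [name for name in method_ranking if name not in expected]
    return len(surprises) / len(method_ranking)

def passes_80_20(method_ranking, expected, low=0.10, high=0.30):
    """Loose pass band around the 20% surprise target."""
    return low <= surprise_rate(method_ranking, expected) <= high
```

A ranking with two surprises out of ten scores 0.2 and passes; one with five surprises out of ten flags the method as leaky (or, per the above, as garbage).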
I will concede that the method, as currently constituted, has a surprise rate north of 20%.
Joey Galloway can’t possibly be the 9th best receiver of all-time. He just can’t. But, if you read through
the first part of the post saying, “this sounds OK,” then the proper response to seeing his name in that spot
is not to say, “well, so much for that.” The proper response is to take a long hard look at Joey Galloway’s
career. Maybe you’ll learn that he’s better than you thought he was. Maybe he’ll lead you to a loophole in your
method. Probably some of both. At this point, you’ve learned something about the topic you were trying to learn
about (ranking the all-time great wide receivers), and you set to work on closing the Galloway loophole without
opening any others.
So what exactly is the Galloway loophole?
Well, first let me describe the method in slightly different terms, still using the college football analogy.
How do you objectively rank college football teams? Whether you’re using a fancy computer scheme or having a
casual water-cooler conversation, the method is essentially the same: you start with the team’s record and you
adjust it to account for strength of schedule.
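For readers curious what that adjustment looks like mechanically, here is a minimal sketch of the Colley rating system (the algorithm named later in this post), which bakes strength of schedule into a linear system: each team’s rating depends on its own record and on the ratings of everyone it played. The matrix construction follows Colley’s published scheme; the dependency-free Gaussian solve and the team labels are my own choices for a self-contained example:

```python
# Sketch of the Colley rating system: ratings r solve C r = b, where the
# matrix couples every team to its opponents, so strength of schedule is
# built in rather than bolted on afterward.

def colley_ratings(games):
    """games: list of (winner, loser) pairs. Returns {team: rating}."""
    teams = sorted({t for g in games for t in g})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    # Colley matrix C and right-hand side b:
    #   C[i][i] = 2 + games played by i
    #   C[i][j] = -(number of games between i and j)
    #   b[i]    = 1 + (wins_i - losses_i) / 2
    C = [[0.0] * n for _ in range(n)]
    b = [1.0] * n
    for i in range(n):
        C[i][i] = 2.0
    for w, l in games:
        wi, li = idx[w], idx[l]
        C[wi][wi] += 1.0
        C[li][li] += 1.0
        C[wi][li] -= 1.0
        C[li][wi] -= 1.0
        b[wi] += 0.5
        b[li] -= 0.5
    # Solve C r = b by Gaussian elimination with partial pivoting
    # (C is symmetric and diagonally dominant, so this is safe).
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(C[r][col]))
        C[col], C[piv] = C[piv], C[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            f = C[row][col] / C[col][col]
            for k in range(col, n):
                C[row][k] -= f * C[col][k]
            b[row] -= f * b[col]
    r = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = b[row] - sum(C[row][k] * r[k] for k in range(row + 1, n))
        r[row] = s / C[row][row]
    return dict(zip(teams, r))
```

With three teams in a round robin where A beats B and C, and B beats C, the ratings come out A = 0.7, B = 0.5, C = 0.3 — they center on 0.5 and respect the head-to-head results, which is all the method promises.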
Galloway has played games against Brian Blades, Mike Pritchard, James McKnight, Ricky Proehl, Tim Brown, Terry
Glenn, Darnay Scott, Sean Dawkins, and Rocket Ismail, and he has won most of those games. So he has a good
record. What about strength of schedule? One argument made by those who champion computer ranking systems for
college football is that, in order to know how Auburn compares to USC, you have to know how Oregon compares to
Arkansas, how Arizona compares to Ole Miss, and so on. Intuition only gets you so far. Our brains are only
capable of processing so much information, so we lazily lump Oregon and Arkansas together as “mediocre” and we
call Arizona and Ole Miss “bad.” Many people won’t even go that far, and lump all four into a category called
“unranked,” even though there are significant differences among them. Same thing here. You’ve probably spent lots of time thinking about how Tim Brown and Cris Carter
compare. You probably haven’t spent any time thinking about how Mike Pritchard and Ike Hilliard compare.
Well, it turns out that maybe Brian Blades and Ricky Proehl and Mike Pritchard and Darnay Scott and Rocket Ismail
and Terry Glenn aren’t half bad. I won’t bore you with the particulars, and I won’t try to convince you
that any of these guys is Steve Largent, but all of them had a fair amount of success at various points in their
careers. And in almost every case, they had more success competing against other receivers than they did
competing against Galloway.
Two years ago, Miami (Ohio) started appearing in the top 5 of some of the computer polls. Critics thought it was
ridiculous and mocked it accordingly. But when it comes down to it, why were they ranked so darned high?
Because they were 13-1. “Yeah, but who did they beat?” ask the mockers, “Bowling Green and Marshall. Pfft.”
But if you take a close look at it, Bowling Green and Marshall weren’t so bad. Ultimately, Miami was ranked so
high because they beat almost every team they played, and because the teams they played were beating other
teams as often as not. I shouldn’t stretch this parallel too far, but you’ve probably figured out that Galloway
is Miami (Ohio), Brian Blades is Marshall, and so on.
Galloway has very frequently led his team in receiving, and the other receivers on his teams — while admittedly
seeming more Bowling Green-ish than Florida State-like — have at several different times led other teams in
receiving. If you think my initial premise is reasonable, then isn’t this evidence that Galloway is pretty good?
To press the point, what’s the evidence that Galloway isn’t good? His stats? As we are all aware, stats
are, in part, the products of context. To take one example of that, the quarterbacks on Galloway’s teams have
been: Mirer, Mirer, Moon (age
41), Moon (age 42), Kitna, Aikman (his final season), Quincy Carter, Chad Hutchinson, Quincy Carter, Brian Griese.
What wide receiver is going to put up numbers with those guys throwing to him? Apparently, none. But Galloway
has done the best job of making something of a bad situation.
OK, fine. Maybe you’re willing to consider that Galloway is not as dreadful as you thought, but number 9??!
That’s a bit much. I agree. Unfortunately, there’s no single loophole here that can be fixed. What there is is a
collection of idiosyncratic issues that all happen to be working in Galloway’s favor. The situation is as if Miami
happened to play Bowling Green on four days rest, Marshall when their star running back was hurt, some other team
in severe weather conditions that favored Miami, etc.
One issue with the system is that it punishes players pretty heavily for really terrible
seasons. It was pointed out in the Football Outsiders discussion that Mark Duper ranked #28 and Mark Clayton
was unranked (he’s #58), which is curious because the two played on the same team and posted very similar
receiving yards totals for most of their careers. The difference is that Clayton had the 6/114/0 rookie year and
the 32/331/3 swan song in Green Bay, whereas Duper quit while he was ahead (Anthony Miller is another player who
rates surprisingly high in large part because he, like Galloway, never had a total dog of a season). If Clayton’s
rookie year had
been 500 yards instead of 100 and he had retired a year earlier, he would leap ahead of Duper to #25.
Likewise, if Tim Brown had retired instead of playing for the Bucs last year, it would have dropped Galloway a
couple of spots. Tim Brown’s 2004 numbers have almost no bearing on an assessment of Tim Brown’s career,
much less Joey Galloway’s. This is a problem.
Tampa rookie Michael Clayton does not yet qualify as relevant and so was lumped in as a pseudoreceiver. This
dilutes the strength of Clayton’s victory over Galloway last year. The same is true of Antonio Bryant,
who beat Galloway twice with Dallas. If you tinker with the definition of relevance to include guys like Clayton
and Bryant, you can shove Galloway almost out of the top 20 without changing the top of the list very much at all.
But this has unintended consequences, of course. To name a couple, Randy Moss and Marvin Harrison drop even
further down than they already were. It’s not clear which list is more “right.”
Ricky Proehl’s career is just plain funny. The only really bad years he ever had were the years he was
Galloway’s teammate. I doubt much of this is attributable to Galloway, but Galloway gets credit for it
nonetheless. Similar deal with Darnay Scott.
The common theme here is that too much emphasis is placed on distinctions that aren’t meaningful. It’s not
relevant in any real sense whether Tim Brown had 200 yards or 500 yards this year. It’s not relevant whether
Darnay Scott had 200 yards or zero yards for the 2001 Cowboys.
A human can see that Brown’s 2004 and Scott’s 2001
are irrelevant, but it’s not clear how to tell the computer exactly which seasons to throw out.
Probably the most serious issue is that Colley’s algorithm was designed to work in a situation where about 100
teams are playing about 10 games each. I am forcing it into a situation where there are over a thousand teams,
some of whom are playing one game and some of whom are playing nearly 50. Colley’s algorithm inherently assumes
some level of Kevin Bacon-style connectivity, and this is achieved in a college football season. But it’s not
achieved here. And that’s a real problem. Terrell Owens, for example, has games only against pseudoreceivers,
against Jerry Rice, and one against Todd Pinkston. This is the equivalent of the Texas Longhorns having a schedule
consisting of five games against OU (losing 4), seven games against Sam Houston St., and one against Baylor. The
human pollsters would have no idea where to rank Texas, and neither would the computers.
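The connectivity assumption is easy to check mechanically: treat the schedule as an undirected graph and count its connected components with a breadth-first search. Ratings in separate components never exchange any head-to-head information, so comparisons across them are weakly grounded at best. The sketch below reuses names from this post, but the game list itself is invented for illustration:

```python
# Check the "Kevin Bacon" connectivity that Colley's method leans on:
# build the schedule graph and count connected components via BFS.
from collections import defaultdict, deque

def schedule_components(games):
    """games: list of (team_a, team_b) pairs. Returns a list of component sets."""
    adj = defaultdict(set)
    for a, b in games:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            t = queue.popleft()
            if t in comp:
                continue
            comp.add(t)
            queue.extend(adj[t] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

If this returns more than one component — as it would for a schedule where Owens only ever faces Rice and Pinkston while Galloway only ever faces Blades — then no amount of matrix algebra can tell you how the two islands compare.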
There are a handful of parameters in my algorithm — like how many guys are deemed relevant, what the age
adjustments are, etc. — that I can tweak without violating the spirit of the idea. The resulting lists would be more or less reasonable, but
ultimately I don’t think this method is capable of producing an all-time list that will pass the 80/20 test. That’s
a shame, because I really liked the idea.
But, as always, it was still worth the time spent. I’ve gained a new appreciation for Joey Galloway and Anthony Miller, and I’ve given J.C. a break from the grind known as Sabernomics. I’ll see you at next year’s Super Bowl Extravaganza.