Edit: This version includes updated images, as they weren't loading properly before.
Over the past several decades, and especially recently, advanced metrics in sports have become more widely known. The revolution started in baseball, which offered the convenience of isolated one-on-one matchups, dozens of easy-to-measure outcomes, and the luxury of a 162-game sample each season. It has since spread to other sports: advanced football metrics such as DVOA (link) are now cited by ESPN writers, and the specialty tracking cameras installed in every NBA arena offer a ton of promise (link). These are exciting times for fans of sports and numbers (but especially both).
Hockey has yet to experience the kind of full-scale analytic revolution those sports have undergone, and given the flow-based nature of the game it is unclear whether it ever will. Goals in hockey are few and far between, and game-deciding plays frequently don't show up on the score sheet at all (especially on defense). Research such as the shot quality project (link) shows intriguing potential, but in general the "advanced stats" community in hockey leans heavily on two very similar concepts: Corsi and Fenwick. The idea behind both is that they give a more accurate picture of a team's "true" underlying talent level, and we have good reason to believe that a team's true talent level forecasts future performance better than its past record does. Future performance is what we really care about here. (Those intimately familiar with these stats can skip the next two paragraphs.)
The logic behind both is simple. A 7-1 shellacking in which you chase the other team's goalie before the end of the first period counts as a win, but so does a 2-1 nail-biter in which you score on two fluke goals and spend what seems like the entire night playing in your own end. Over a full season, then, two teams with the same number of wins can (and often do) have very different talent levels, with luck making up the difference. Goal differential offers a more granular view and, to that extent, a better gauge of a team's "true" talent level, because it strips away much of the influence of lucky bounces, phantom penalties, and once-in-a-lifetime performances from career AHLers.
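To make that concrete, here is a minimal Python sketch using made-up scores for two hypothetical teams (not real data). Both finish 2-1, but their goal differentials are far apart.

```python
# Made-up (goals for, goals against) results for two hypothetical teams.
team_a = [(7, 1), (5, 2), (1, 2)]
team_b = [(2, 1), (3, 2), (0, 5)]


def record_and_differential(games):
    """Return (wins, losses, goal differential) for a list of (GF, GA) scores.

    Every game in these examples has a winner, so losses = games - wins.
    """
    wins = sum(1 for gf, ga in games if gf > ga)
    losses = len(games) - wins
    differential = sum(gf - ga for gf, ga in games)
    return wins, losses, differential


print(record_and_differential(team_a))  # (2, 1, 8)
print(record_and_differential(team_b))  # (2, 1, -3)
```

Same record, but a +8 versus a -3 differential: the second team's wins look a lot more like the 2-1 nail-biter than the shellacking.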
But individual goals are still hugely influenced by luck. There aren't that many more goals than there are games, so chance still has plenty of opportunity to dominate skill. To dig deeper, you can look at shots. That is essentially what Corsi and Fenwick do: they start with shot differential but also count missed shots, and Corsi counts blocked shots as well (Fenwick excludes them). As this excellent blog post from The Score points out, these numbers really aren't that advanced; they simply rely on counting. They have predictive power not because all shots are created equal, but because the same underlying mechanisms that create good shots also create more shots. Shot attempts are the quantifiable residue of the "good hockey processes" the old-school types love to talk about, so they can help us figure out how good a team really is. These measures are simple, fairly intuitive, and about the best we can do with publicly available data right now.
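Since both stats are just counts, they fit in a few lines. This is a minimal Python sketch with hypothetical counts; the class and function names are made up for illustration, and published Corsi/Fenwick numbers are usually filtered further (for example, to 5-on-5 play with the score close).

```python
from dataclasses import dataclass


@dataclass
class ShotAttempts:
    """One team's shot attempts over some stretch of play (hypothetical counts)."""
    on_goal: int  # shots on goal, goals included
    missed: int   # attempts that missed the net
    blocked: int  # attempts blocked by a defender

    def corsi(self) -> int:
        # Corsi counts every attempt: on goal + missed + blocked.
        return self.on_goal + self.missed + self.blocked

    def fenwick(self) -> int:
        # Fenwick is the same count with blocked attempts left out.
        return self.on_goal + self.missed


def share(for_count: int, against_count: int) -> float:
    """Team's share of the attempts, e.g. Corsi For % = CF / (CF + CA)."""
    return for_count / (for_count + against_count)


us = ShotAttempts(on_goal=30, missed=12, blocked=14)
them = ShotAttempts(on_goal=25, missed=10, blocked=9)
print(share(us.corsi(), them.corsi()))      # 0.56
print(share(us.fenwick(), them.fenwick()))  # 42 / 77, about 0.545
```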
But if this is really the best we can do, what are we missing out on? Put another way, how do these hockey metrics compare to metrics from sports where we know the numbers are better? Baseball in particular offers a vast spectrum of stats for describing and predicting performance, and to some extent so do basketball and football. I set out to find analogous predictive measures in baseball to get an idea of how useful the hockey indicators really are. For now I have only compared the 2013 baseball and hockey seasons, but if this generates interest, the plan is to expand the time horizon and look next at football and basketball (as well as individual player performance).
In hockey, goal percentage is pretty well correlated with point percentage. The graph below plots point % [(wins + 0.5*OTL) / (games played)], where OTL counts overtime and shootout losses, against GF% [(goals for) / (goals for + goals against)] at 5-on-5 with the score close. Restricting GF% to these situations eliminates special teams and score effects, such as teams playing more defensively after building a big lead. The data come from the excellent site Extra Skater (link), which, as far as I can tell, only has full-season data for the 2013 campaign. The lockout reduces our sample size, but overall the relationship shows pretty much what you'd expect.
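The two bracketed formulas translate directly into code. This is a minimal Python sketch, assuming the OT in the point % formula means overtime/shootout losses (the games that earn one standings point); the season numbers are made up, not Extra Skater data.

```python
def point_pct(wins: int, ot_losses: int, games_played: int) -> float:
    """(wins + 0.5 * OTL) / GP: standings points earned out of points available."""
    return (wins + 0.5 * ot_losses) / games_played


def gf_pct(goals_for: int, goals_against: int) -> float:
    """Share of goals scored by the team: GF / (GF + GA)."""
    return goals_for / (goals_for + goals_against)


# Hypothetical 48-game season: 26 wins, 8 overtime/shootout losses, 14 regulation losses.
print(point_pct(26, 8, 48))  # 0.625
# Hypothetical close-score 5-on-5 goal totals for the same team.
print(gf_pct(60, 40))        # 0.6
```

Halving the OT losses is the same as dividing standings points (2 per win, 1 per OT loss) by the 2 points available per game.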
Not too bad! An r-squared of about 0.8 shows that outscoring your opponent is in fact correlated with winning (who knew?!). Let's see what baseball can tell us. Over the 2013 season, runs were even more closely correlated with winning percentage (r-squared of 0.91).
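For reference, r-squared is the usual goodness-of-fit measure for a straight trend line; with a single predictor it equals the square of Pearson's correlation. The post doesn't say which tool produced the charts, so treat this as a minimal Python sketch of the standard calculation rather than the exact method used. The toy inputs below are not real standings.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line of ys on xs.

    With a single predictor this equals the square of Pearson's correlation r.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Toy numbers: a perfect line gives 1.0, scatter around the line lowers it.
print(r_squared([1, 2, 3], [2, 4, 6]))        # 1.0
print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.64
```

Feeding it each team's GF% and point % (one pair per team) would give the kind of number quoted above.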
That looks like a better fit than in hockey, but that's not particularly surprising given the much larger sample size (162 games vs. 48 in our lockout-shortened hockey season). So far, so good. Now let's look at Corsi and Fenwick:
The explanatory power dips significantly compared to goals, but that's fine: these stats are meant to be predictive rather than descriptive (see the blog post I cited earlier for a more thorough discussion of the evidence), so we care more about how well they estimate underlying skill than about how well they explain past wins.
Now let's try to "Corsify" baseball (all data courtesy of FanGraphs). I decided that pitches would make an interesting analogue for shots. Good teams' hitters will see more pitches, since they get more hits and draw more walks. Their pitchers will also give up fewer hits and walks, so they throw fewer pitches. This won't hold in every game, but badly outshot hockey teams win every once in a while too. Let's see how pitch % compared with win % in the 2013 season.
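The exact pitch % formula isn't spelled out, so this minimal Python sketch assumes it is built the same way as a Corsi share: pitches seen by a team's hitters divided by all pitches in its games (seen plus thrown by its own staff). The function name and per-game numbers are hypothetical.

```python
def season_pitch_pct(games):
    """Assumed pitch %: pitches seen / (pitches seen + pitches thrown).

    `games` is a list of (pitches_seen_by_hitters, pitches_thrown_by_staff)
    tuples, one per game. Counts are summed over the season before dividing,
    so long games weigh more than short ones.
    """
    seen = sum(s for s, _ in games)
    thrown = sum(t for _, t in games)
    return seen / (seen + thrown)


# Three made-up games: the team sees 440 pitches and throws 420.
print(round(season_pitch_pct([(150, 140), (130, 160), (160, 120)]), 3))  # 0.512
```

Summing counts first (rather than averaging per-game percentages) mirrors how season Corsi % is normally computed from total attempts for and against.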
It looks like the relationship between pitch % and winning in baseball is stronger than the relationship between either Corsi or Fenwick and point % in hockey. I'll be the first to point out that this is not a perfect apples-to-apples comparison. We don't know how much predictive power, if any, this made-up pitch % number has, and it is drawn from a MUCH larger sample (around 45,000 total pitches thrown + faced per team per season vs. about 1,500 shots taken + faced). All those caveats aside, I don't think the comparison is crazy: good hockey teams tend to have more shot attempts than their opponents, and good baseball teams tend to face more pitches than they throw. Now let's see what else those spoiled baseball fans have to work with.
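One rough way to see why the larger sample matters: if every pitch or shot attempt were an independent coin flip between the two teams (a big simplifying assumption, since real events are not independent), the noise in a share would shrink with the square root of the number of events. A minimal Python sketch using the approximate per-team totals quoted above:

```python
from math import sqrt


def share_standard_error(n_events: int, p: float = 0.5) -> float:
    """Binomial standard error of a share estimated from n independent events.

    Assumes each event independently goes to the team with probability p,
    which is only a rough approximation for shots or pitches.
    """
    return sqrt(p * (1 - p) / n_events)


shots = share_standard_error(1_500)     # ~0.0129, about 1.3 percentage points
pitches = share_standard_error(45_000)  # ~0.0024, about 0.24 percentage points
print(round(shots / pitches, 2))        # 5.48, i.e. sqrt(30)
```

Under the same assumption, the function gives about 0.072 for 48 games versus 0.039 for 162 games (for a true .500 team), which is one way to see why the lockout-shortened season is noisier too.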
I chose FIP and walk rate for this example. FIP, or Fielding Independent Pitching, looks only at walks, strikeouts, and home runs (see here for more detail). It is generally accepted as a better indicator of underlying skill than ERA, the standard measure by which the "old school" is said to judge pitchers. The general idea is that if you strike out more batters and walk fewer, you're a better pitcher, and that it isn't really a pitcher's fault if his defense sucks. (Like ERA, lower FIP is better, so we expect the line to slope down.) For offense, I simply picked BB%, the rate at which a team's batters drew walks. Like Corsi and Fenwick, both of these indicators have been shown to have meaningful predictive power.
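Both stats are simple formulas. This minimal Python sketch uses the FIP formula as FanGraphs publishes it, which also counts hit batters alongside walks. The league constant changes every season, so the 3.10 below is only a placeholder, not the real 2013 value, and the pitching line is made up.

```python
def fip(hr: int, bb: int, hbp: int, k: int, outs: int, constant: float) -> float:
    """Fielding Independent Pitching: (13*HR + 3*(BB+HBP) - 2*K) / IP + constant.

    Innings are passed as outs recorded (IP = outs / 3) to avoid the
    "200.1 innings" notation. `constant` is the season's league FIP constant.
    """
    innings = outs / 3
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


def bb_pct(walks: int, plate_appearances: int) -> float:
    """Walk rate: walks drawn per plate appearance."""
    return walks / plate_appearances


# Made-up pitching line: 20 HR, 50 BB, 5 HBP, 200 K over 200 innings (600 outs).
print(round(fip(20, 50, 5, 200, 600, constant=3.10), 3))  # 3.225 (placeholder constant)
print(round(bb_pct(550, 6200), 3))                         # 0.089
```

For a team-level number, you would feed in the whole staff's totals (for FIP) or the whole lineup's totals (for BB%).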
Wow. Each of these completely ignores an entire half of the game, yet both of their r-squared values are higher than those for Corsi and Fenwick. So what does all this mean? Basically, saying "they shouldn't have won that hockey game; their Corsi was way too low!" is very much like saying "they shouldn't have won that baseball game; they didn't draw enough walks!" Even baseball's middling statistics seem to be at least as useful for explaining current performance as the best ones we have in hockey. Baseball fans are spoiled.
There are, of course, some important caveats to all of this that will sound familiar to the stat-heads out there but bear repeating. The biggest is that we are judging predictive metrics on their descriptive abilities, since we are comparing them to wins in the same season. Comparing against present rather than future wins raises the correlation/causation problem. To illustrate, imagine plotting the time each team spent with its goalie pulled at the end of games against its point % for the season. Since teams that frequently get outscored pull their goalie more often, the trend line would very likely slope downward: the teams that pulled their goalies most often would be the ones that lost the most. Teams that constantly play from behind may not be very good, but that doesn't mean leaving your goalie in when you're down a goal with 30 seconds to go is a good way to increase your chances of winning! And time spent with a pulled goalie almost certainly predicts future wins worse than a team's current record does, and the current record is itself a worse predictor than other metrics we have.

There's also the issue of sample size. Having at least a few more seasons of data would be hugely useful, but I could only find one season on Extra Skater, and the thought of compiling it manually makes my stomach hurt. So if any of you know of a good place to get these historical numbers, let me know.