# A Commentary on Advanced Statistics: Corsilogical or Corsillogical?

My summer of discontent resulted in a lot of work, drinking, and thinking about hockey that yielded some interesting thoughts and conclusions.

### Corsilogical or Corsillogical?

#### Part 1: Background

It will be of little mystery to most of you that I am not the largest proponent of advanced stats and their current place in hockey.  My reasoning, or at least its root cause, has always been a bit of a mystery to me, however.  Over the past year and during this summer of discontent, I may have discovered what has been nagging me.

To start, I cannot do math. I tried to add three double-digit numbers and got 45,000.  Twice.  With a calculator.  Let’s just get that out of the way. Math and statistics are nevertheless very important, and especially vital in analyzing situations such as sports, because they bring a level of explication de texte that adds context and deeper levels of understanding.  My annoyance with the so-called "advanced stats crowd" in hockey, however, actually started a few years ago.

The Avs had a very bad season and I believe it was the year (2010-2011) that they landed Landeskog.  I ventured over to Hockey Wilderness and they were bashing the Avs as being a terrible team and, in some ways, rightly so.  I explained to them that the Avs had a lot of man-games lost, but I was rejected because man-games lost didn’t matter as the Avs sucked anyway.  That was the year they had some absurd number of injury-lost games, something like 480.

However, at that time, the editor of Hockey Wilderness, and the leader of this argument, was the despicable BRey who was unfortunately still plaguing the Internet with his asinine and reprehensible thoughts.  Let’s fast-forward to the next year when the Wild had a bad, injury-plagued season.  I, of course, honorably went over to their blog to see their thoughts.  I read as BRey explained their team was good but injuries got them in the end.  I referenced our former argument and was loudly rejected. He explained that "his numbers" showed differences between the teams.  The NHL "official" stats showed the Avs lost more man-games the season before than the Wild did, but he said he had other data.  He said he had numbers that supported his argument.  However, he refused to show me, all the while proclaiming his correctness. He held his holier-than-thou, almost pharisaical, attitude that he was right because he had data, even though he would not produce it. (You can read about this in my fan post titled "Down Goes BRey".)

Regardless of the numbers and regardless of who was right, the way he used stats and the way they were proverbially held over my head bothered me.

#### Part 2: Prediction

As the years progressed, the way in which advanced stats were used made me more splenetic.  At first, I thought I hated the use of advanced stats because they were used and paraded around as a predictive, prophetic crystal ball.  I understand that not everyone does this now, or even did then, but I think we would all be lying to ourselves if we didn’t admit that when these stats first came on the scene they were used inappropriately by a lot of people.

In their nascency, the arguments manifested in this back and forth of "oh you are using your eyes, but I have stats.  My stats say that if you get outshot, you’ll lose over the course of an appropriate sampling of time."  At first, it was wildly annoying how advanced statistics were used as some sort of heroic, statistical savior for all hockey analysis that could provide omniscient foresight.  The lines blurred between analyses and predictions.  However, as the understanding and use of the stats advanced, pun intended, I realized that this prediction-ridden use was not the issue I had, at all.

Coming closer to the present day, advanced stats progressed deeper into the general hockey world and became more prevalent in discussions.  The advocates for advanced stats learned to better frame their argument in a much more persuasive way and made it clear that most of them want to use stats like Corsi, Fenwick, or PDO as their way of analyzing games, instead of using just their eyes and visual interpretation; a fair and sensible approach.

I think there was still a sense of the oracular to it, but more in the sense that if you believe that the numbers make sense then they should logically reach a certain outcome.  However, I still would argue my points and the response now became a sense of "well, I have data and you don’t."  Instead of predictive, Nostradamus-like tones it became academic condescension.  My arguments about hockey, using my experience playing and my visual analysis, were lesser, and essentially meaningless, compared to those of people analyzing the game based on numbers, statistics, and hard, factual, scientific, empirical, and analytical data as opposed to intangible, theoretical, and archaic ideas such as "compete" or "grit" or "leadership."  Akin, if you will, to the professor of Shakespeare telling the student that unless he can reference all the male antagonists throughout the First Folio, he cannot draw conclusions as to the actual intentions of Iago.

#### Part 3: Now

In the present day, the advanced stats crowd has gained recognition and I congratulate those who have been hired by NHL teams to bring analytics into the mainstream and in actuality use their skills to help teams succeed (or try with futility in Toronto).  However, there still remains disagreement.  For example, Moser was lambasted and mocked on Twitter by people because he rejected advanced stats and said he would prefer to trust hard work and grit.  He’s old school; Twitter users are predominantly not.  The analytical battle still exists and will likely exist for a while, but for me, personally, my disagreement or at least discontent with advanced stats was still unclear or at least uneasily grounded, until now, I think.

Over the summer, I was reading an article about a political/scientific argument and it dawned on me that I might know why I am still dismissive or at least have a constant level of aporia regarding advanced stats.

Currently, there are a variety of websites (less so now, thanks Toronto!) that display a myriad of different types of analytical data: Corsi, Fenwick, PDO, Relative Corsi, etc.  You all know what they are.  However, all of these are based on and drawn from stats that have existed for a while: shots against, shots for, shooting percentage, save percentage and, more recently, missed shots and blocked shots.  These are all shot-based. (DISCLAIMER: I understand that time on ice is also used but going forward I will refer to these as shot-based statistics because in the end shot totals are predominantly used and necessary.)

Basically, and not to diminish them at all, advanced stats are advanced, in-depth analysis of existing statistics.  To be fair, this analysis is sophisticated, clever, intelligent, and impressive, but it is still based in stats that have existed for years and even decades.  The analysis is new but the underlying statistics are not.
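For anyone who wants to see how directly these numbers fall out of the old ones, here is a minimal sketch using the commonly cited definitions: Corsi counts all shot attempts (on goal, missed, and blocked), Fenwick drops the blocked ones, and PDO adds shooting percentage to save percentage. The `ShotCounts` container, the function names, and the game totals are all invented for illustration; they come from no real game or stats site, and sites differ on details such as game state and whether PDO is scaled to 1, 100, or 1000.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """Raw counts for one team (hypothetical container for this sketch)."""
    goals: int
    shots_on_goal: int  # goals are included in shots on goal
    missed_shots: int
    blocked_shots: int  # this team's attempts that the opponent blocked


def pct(numerator: float, denominator: float) -> float:
    """Return numerator / denominator as a percentage."""
    if denominator == 0:
        raise ValueError("cannot compute a percentage from zero events")
    return 100.0 * numerator / denominator


def corsi(c: ShotCounts) -> int:
    """All shot attempts: on goal + missed + blocked."""
    return c.shots_on_goal + c.missed_shots + c.blocked_shots


def fenwick(c: ShotCounts) -> int:
    """Unblocked shot attempts: on goal + missed."""
    return c.shots_on_goal + c.missed_shots


def pdo(team: ShotCounts, opp: ShotCounts) -> float:
    """Shooting percentage plus save percentage, on a 100-point scale."""
    shooting_pct = pct(team.goals, team.shots_on_goal)
    save_pct = 100.0 - pct(opp.goals, opp.shots_on_goal)
    return shooting_pct + save_pct


# Invented totals: a team that gets outshot but wins 3-2.
team = ShotCounts(goals=3, shots_on_goal=25, missed_shots=10, blocked_shots=5)
opp = ShotCounts(goals=2, shots_on_goal=35, missed_shots=15, blocked_shots=10)

corsi_for_pct = pct(corsi(team), corsi(team) + corsi(opp))
fenwick_for_pct = pct(fenwick(team), fenwick(team) + fenwick(opp))

print(f"Corsi For %:   {corsi_for_pct:.1f}")
print(f"Fenwick For %: {fenwick_for_pct:.1f}")
print(f"PDO:           {pdo(team, opp):.1f}")
```

Every input here is a goal or a shot attempt of some kind; nothing in the calculation knows anything about zone entries, time with the puck, or anything else that happens before the shot.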

That is, until now.  Corey Sznajder over at www.shutdownline.com began tracking zone entry; this stat has not existed previously, at least not publicly.  We know that Roy, for example, calculates his own types of data such as offensive zone possession.  Once I began to read Corey’s articles, I realized that my issue with advanced stats is that they are under-developed.  Simply put, advanced stats as they currently exist are incomplete and that is what bothers me.

Let’s take the Avs as an example:

Last year they were a playoff team and won the division.  However, they were grossly outshot and had horrible possession numbers.  Yet, their zone entry was great.  If you look at only possession they are a lottery team but if you look at this other stat they are in the top of the league.  There’s a disconnect here, at least prima facie, and to my legal brain there is a flaw.  We could argue how this all fits in the end but the point is that if you assume, as a posit, that stats predict the future, you have two different types of hard data that show conflicting outcomes.

Is one right?  I have no bloody idea at all. My point is that advanced stats as they exist are patently incomplete.  I believe they are accurate for what they have to use but grossly inaccurate in the larger scope of hockey without more data and statistics that hitherto do not exist.  There are a few debates in our current world, outside of hockey, that I could relate this to but I don’t want to derail this article.

I could find 600 new stats and the Avs may still look like they should suck and the Kings look like they should be world-beaters.  Or the opposite could occur.  Or something in the middle.

To give you an example of this, I work with relatively sophisticated trade contracts.  Now, as we all know I am a drunk, well-dressed idiot, so understand these contracts are sophisticated only in the sense that the solutions are complicated; so I am not smart.  However, in my world there is a lot of data out there when it comes to derivatives but no one can really use it.  The government tries but there is so much that there really isn’t a good way of analyzing it.  People try and do fairly well, but no one has the ability to use it or, better said, no one has a way to document or calculate it. High-frequency trading is a hot-button topic now, courtesy of Michael Lewis and his Flash Boys book.  This had been going on for years but no one could catch it because there was really no way of using the data that was floating around.  The regulators and government could do nothing with it.  They used what they had, akin to using only shot statistics, but couldn’t use everything else.  They tried to find criminal actions and drew faulty or at least incomplete conclusions based on incomplete data. As such, this high-frequency trading went on for years, caused market manipulation, and it took years until technology advanced enough for the authorities to fully interpret the situation and discover the truth.

Understand I am drawing an imperfect analogy to show that this type of incompleteness exists everywhere.  My point is that what we, as hockey fans, have now, in terms of analytical data, at our disposal to analyze this wonderful game is simply incomplete.

#### Part 4: Conclusion

I do not want to show disrespect to the advanced stats community because I do believe they serve an instructive purpose. However, most, if not all, of their stats rely on shots.  Either on net, missed, blocked, or otherwise.  That’s too narrow for me to trust; it focuses on a single, though important, aspect of the game.

To be fair, there are two tenets in hockey: 1) Don’t get outshot and 2) Never ask the coach for playing time.  Ask any player, from the NHL to Squirts, and they will tell you this.  If you get outshot you lose, full stop.

However, that is not enough for me to trust, use, and believe in the current state of advanced stats.  I think they serve a purpose, as part of the analysis, but there is so much more that goes into this glorious game.  Baseball, I think, is more easily quantifiable whereas hockey is more nuanced.

I believe Corsi, Fenwick, and PDO are important but wildly and completely incapable of predicting hockey because they rely on one part of the game.  Off the top of my head the below are other types of game data that I would love to have. If we could quantify, calculate, and analyze these, along with shot-based stats, I think advanced stats would be so very valuable.  But sadly these are not yet measurable or calculable; these stats are things that I watch, but cannot quantify:

Zone entry chances, time with puck in each zone, time with puck upon entry into zone, time below the goal line, time in the shooting zone, shot quality (whatever that means), defensive control without the puck, time the puck is on the boards in each zone, time taken to enter the zone, time taken to regain possession after each loss of possession, time from entry into zone to time a shot is taken, distance of each shot cross-referenced to time in zone, faceoffs taken in the defensive zone that end with the puck outside the zone, offensive zone faceoffs lost resulting in opposing team shots, time delay between zone entry and shots from the middle of the ice compared to the boards, average rebound distance, average shot distance upon zone entry both offensively and defensively, average pass distance, possession metrics of defensemen compared to offense based on time on ice/shift totals cross-referenced to shots on net, totals of different types of (1) breakouts, (2) neutral zone entries, (3) offensive zone entries, total number of "mishaps" such as (1) broken sticks, (2) accidental falls, etc. These are just a few.

You can see what I am going for here.  These are all things I notice that are not calculated into the current advanced stat world and to be fair they won’t be for a while.  But without these other types of information, I believe advanced stats are incomplete; as such, incomplete data cannot be used in any predictive sense and cannot be used to refute what I am able to see, analyze, and interpret when I watch a game.

In sum, we have stats and data (e.g. the Avs' horrible possession stats) that clearly and evidently show certain logical and maybe probable outcomes (e.g. more losses) but that data is contradicted by other data (e.g. the Avs' great zone entry).  There is a disconnect for me and until that is somehow remedied by more complete and exhaustive data, I will still have disagreements with conclusions drawn solely from the use of shot-based statistics.  In the end, Corsi and any similar derivative analysis may be 100% or nearly 100% accurate, but as of now I do not believe it is, nor has it proved to be so.  Until then, I do not think I am in the wrong, and it runs counter to my method of logic, analysis, and general thinking to put too much faith and reliance in something that I deem incomplete.