Bear with me, this got long, but I have lots of what I think is quite interesting info below:
I just finished my first year of graduate school, working toward an MA and PhD in Psychology. This quarter I took a class on multiple regression (MR). We had to do a project at the end of the quarter using MR, which was due earlier today. Since the research I do doesn't lend itself to MR analysis (we primarily use ANOVA, if anyone cares), I had to find something else to analyze. I decided to use MR on NHL player statistics and salaries to see how players performed relative to what they were paid this past season.
Without going too far into the nitty-gritty details (frankly they're boring and not the interesting part), I took what I called "performance statistics" (Games Played, Goals, Assists, +/-, Penalty Minutes, Shots, and Time On Ice) and regressed player cap hits on them, so the stats are the predictors and cap hit is the outcome. All stats and salaries are from the 2008-2009 regular season. I excluded all players who played fewer than 20 games during the regular season (sorry Sakic). I ran the analysis separately for defensemen and forwards, because I suspected that different stats would predict salary at each position, and that turned out to be true. For forwards, all of the stats were significant predictors of salary. All but +/- were positively related to salary, meaning higher values on those stats (e.g. more goals) predicted a higher cap hit. I'm not sure why +/- was negative, but whatever. For defensemen, only Assists and TOI were significant predictors.
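For anyone who wants to see the mechanics, here is a minimal sketch of that kind of fit in plain Python. It is not my actual analysis: the rows are invented toy numbers (not real NHL data), only three predictors are used, the "salaries" are generated from a made-up formula so the fit can be checked, and the names fit_ols, predict, and solve are just ones I picked for illustration.

```python
# Sketch of ordinary least squares with several predictors, standard library only.
# All numbers below are invented toy values, NOT real NHL stats or salaries.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    """Predicted outcome for one player's stats."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))

# Toy predictors: (goals, assists, average TOI in minutes).
rows = [(10, 20, 15.0), (25, 30, 18.5), (5, 8, 11.0),
        (30, 45, 20.0), (15, 12, 14.5), (2, 10, 17.0)]
# Toy outcome in $ millions, built from a made-up formula with no noise.
salaries = [-1.0 + 0.05 * g + 0.03 * a + 0.10 * t for g, a, t in rows]

coefs = fit_ols(rows, salaries)
print([round(c, 3) for c in coefs])
print(round(predict(coefs, (0, 1, 5.5)), 3))  # low-stat player
```

In the real thing you would run the fit once for forwards and once for defensemen, each with its own rows. Note that nothing in a linear fit like this stops a player with very low stats from getting a predicted salary below zero, which is what the last line illustrates.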
This let me plug each player's stats into the regression equation to calculate a predicted salary based on their season performance. I then divided each player's predicted salary by their actual salary. I'm calling the result a percent difference score (%diff), though it's really a ratio: a player who made 2 million with a predicted salary of 4 million has a %diff of 2.0, and 1.0 means a player was paid exactly what the model predicts. The higher a player's %diff, the more they outperformed their salary.
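The score itself is just one division. Here is a quick sketch using the 2 million / 4 million example above plus three cap hit and predicted salary pairs taken from this post; the function name value_ratio is only for illustration.

```python
# Sketch of the %diff score: predicted salary divided by actual cap hit.
# Salary figures are in $ millions.

def value_ratio(predicted, cap_hit):
    """Ratio above 1.0 means the model thinks the player outperformed his cap hit."""
    return predicted / cap_hit

players = [
    # (name, cap hit, predicted salary)
    ("Example player", 2.000, 4.000),
    ("Kyle Quincey", 0.525, 3.715),
    ("Alexander Edler", 0.550, 3.523),
    ("Paul Stastny", 0.850, 3.078),
]

# Rank from best value to worst value.
ranked = sorted(players, key=lambda p: value_ratio(p[2], p[1]), reverse=True)
for name, cap_hit, predicted in ranked:
    print(f"{name}: %diff = {value_ratio(predicted, cap_hit):.2f}")
```

Because the salaries listed here are already rounded to three decimals, recomputing a score from them can be off by 0.01 from the value I report.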
So, the results:
The highest rated players:
1. Kyle Quincey - D - LAK - Cap Hit = $0.525 million, Predicted = $3.715 million, %diff = 7.08
72 GP, 4 G, 34 A, -5, 63 PIM, 150 S, 20:58 TOI
2. Alexander Edler - D - VAN - Cap Hit = $0.550 million, Predicted = $3.523 million, %diff = 6.41
80 GP, 10 G, 27 A, +11, 54 PIM, 145 S, 21:07 TOI
3. Alexandre Burrows - LW - VAN - Cap Hit = $0.483 million, Predicted = $2.522 million, %diff = 5.22
82 GP, 28 G, 23 A, +23, 150 PIM, 175 S, 16:50 TOI
The lowest rated players:
667. Mike Weaver - D - STL - Cap Hit = $0.700 million, Predicted = $-1.624 million, %diff = -2.32
58 GP, 0 G, 7 A, -3, 12 PIM, 36 S, 17:15 TOI
666. Darcy Hordichuk - LW - VAN - Cap Hit = $0.775 million, Predicted = $-0.594 million, %diff = -1.37
73 GP, 4 G, 1 A, +1, 109 PIM, 26 S, 5:31 TOI
665. Raitis Ivanans - LW - LAK - Cap Hit = $0.600 million, Predicted = $-0.429 million, %diff = -1.03
76 GP, 2 G, 0 A, -8, 145 PIM, 25 S, 6:22 TOI
And the highest/lowest Avs players:
26. Paul Stastny - C - COL - Cap Hit = $0.850 million, Predicted = $3.078 million, %diff = 3.62
45 GP, 11 G, 25 A, -9, 22 PIM, 118 S, 21:14 TOI
567. Adam Foote - D - COL - Cap Hit = $3.000 million, Predicted = $1.814 million, %diff = 0.61
42 GP, 1 G, 6 A, -12, 30 PIM, 19 S, 19:40 TOI
Some other players' % difference scores and ranks that I think people would be curious about:
392. Marek Svatos - %diff = 0.94
400. Ian Laperriere - %diff = 0.94
557. Darcy Tucker - %diff = 0.62
478. Jeff Finger - %diff = 0.77
517. Sidney Crosby - %diff = 0.70
459. Alex Ovechkin - %diff = 0.82
181. Evgeni Malkin - %diff = 1.68
31. Johan Franzen - %diff = 3.50
The model is by no means perfect. It is certainly flawed and highly influenced by young players' entry-level contracts. For example, Stastny's new contract is going to pay him $6.0 million/year. So, these results only apply to how effective a player was relative to their contract during the 2008-2009 season. The 37 players I found with new contracts starting next season had an average rank of 199. Their average salary this season was $1.542 million, which almost doubles to $2.997 million next season. Many of these were good young players at the end of their low-paying entry-level contracts, the kind of players clubs want to lock up ASAP. Some of these players: Anze Kopitar, Paul Stastny, David Krejčí, Alexandre Burrows, Alexander Edler. So, part of the reason they score so well on this metric is not just their performance, but the restrictions on entry-level contracts in the collective bargaining agreement.
Negative predicted salaries are clearly not realistic, so that's a problem.
The model doesn't take into account all of the statistics or other player characteristics that lead to higher salaries. Captaincy, what line a player is on, awards, and other things are not quantified in this model. The league's 10 highest paid players had an average rank of 557.5. This does not mean that these star players are way overpaid; rather, it means that the model is once again not perfect. Players simply cannot rack up enough goals, assists, etc. to push their predicted salary that high. Additionally, being a highly paid player means you are playing against the best players on the other team game in and game out, which is going to limit performance. The highest predicted salary was, appropriately, Ovechkin's ($7.825 million). Overall the correlation between predicted salaries and actual salaries was strong, R = .738. I take the fact that +/- was negatively related to salary as a sign that it may truly be a useless statistic. I want to look into this more.
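In case the R value is unfamiliar: for a regression with an intercept, it is the ordinary Pearson correlation between the predicted and actual salaries. Here is a rough sketch of that calculation, with pearson_r being my own helper name. It only uses the eight players whose cap hits and predicted salaries are listed in this post, and those are deliberately extreme cases, so it will not reproduce the .738 from the full set of players.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Cap hits and predicted salaries ($ millions) for the eight players listed above:
# Quincey, Edler, Burrows, Weaver, Hordichuk, the LAK winger ranked 665, Stastny, Foote.
actual = [0.525, 0.550, 0.483, 0.700, 0.775, 0.600, 0.850, 3.000]
predicted = [3.715, 3.523, 2.522, -1.624, -0.594, -0.429, 3.078, 1.814]

print(f"r = {pearson_r(actual, predicted):.3f}")
```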
I don't have any classes or TAing over the summer, just research and trying to get published, so I think I will update my model in the coming weeks/months to try to improve it, especially by somehow taking into account things such as captaincy and what line a player primarily plays on.
Hopefully people find this interesting. I also have team data I am working on. If anyone is interested in specific players or wants to know more of the statistical details (regression equation coefficients, effect sizes, p-values, etc.), just let me know.