Expected wins and statistics
ESPN's John Hollinger uses a basketball team's average points scored and average points allowed (its opponents' points) to calculate the team's expected wins over any given number of games:
To determine a team's expected winning percentage, take the points scored and raise them to the 16.5 power (i.e., multiply it by itself 16.5 times). That's your numerator. Then, in the denominator, raise the team's points scored to the 16.5 power, raise the points allowed to the 16.5 power, then add those two products. Finally, to convert from an expected winning percentage to Expected Wins, multiply by the team's games played.
$$\text{Expected Wins} = \text{Games played} \times \frac{\text{Points}^{16.5}}{\text{Points}^{16.5} + \text{Points allowed}^{16.5}}$$
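Here is a minimal sketch of the formula as a Python function. The name `expected_wins` and the sample team are hypothetical. The code divides the numerator and denominator by Points^16.5, which gives the algebraically identical form 1 / (1 + (Points allowed / Points)^16.5) and avoids very large intermediate powers. Because only the ratio of the two numbers matters, per-game averages and season totals give the same answer.

```python
def expected_wins(points, points_allowed, games, exponent=16.5):
    """Expected wins: games * PTS^k / (PTS^k + PA^k).

    points and points_allowed may be per-game averages or season totals;
    only their ratio affects the result.
    """
    if points <= 0 or points_allowed <= 0:
        raise ValueError("points and points_allowed must be positive")
    # PTS^k / (PTS^k + PA^k) == 1 / (1 + (PA / PTS)^k)
    win_pct = 1.0 / (1.0 + (points_allowed / points) ** exponent)
    return games * win_pct

# Hypothetical team: 105.0 points scored and 100.0 allowed per game, 82 games.
print(round(expected_wins(105.0, 100.0, 82), 1))               # exponent 16.5
print(round(expected_wins(105.0, 100.0, 82, exponent=14), 1))  # exponent 14
```

For this made-up team, the 16.5 exponent projects roughly 56.7 wins and an exponent of 14 roughly 54.5. A team that scores exactly as many points as it allows always projects to half its games.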
Basketballreference.com has a similar formula using a power of 14:
EW: Expected wins; the formula is $G \times \text{Tm PTS}^{14} / (\text{Tm PTS}^{14} + \text{Opp PTS}^{14})$. The formula was obtained by fitting a logistic regression model with log(Tm PTS / Opp PTS) as the explanatory variable. Using this formula for all BAA, NBA, and ABA seasons, the root-mean-square error (rmse) is 3.14 wins. Using an exponent of 16.5 (a common choice), the rmse is 3.48 wins.
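A short derivation shows why a logistic regression on log(Tm PTS / Opp PTS) produces a power formula. It assumes the model has no intercept term. That is our assumption, but a natural one, since a team that scores exactly as many points as it allows should project to .500. Here $p$ is the win probability and $k$ is the fitted slope:

```latex
\begin{aligned}
\log\frac{p}{1-p} &= k \, \log\frac{\text{Tm PTS}}{\text{Opp PTS}} \\
\frac{p}{1-p} &= \left(\frac{\text{Tm PTS}}{\text{Opp PTS}}\right)^{k} \\
p &= \frac{(\text{Tm PTS}/\text{Opp PTS})^{k}}{1 + (\text{Tm PTS}/\text{Opp PTS})^{k}}
   = \frac{\text{Tm PTS}^{k}}{\text{Tm PTS}^{k} + \text{Opp PTS}^{k}}
\end{aligned}
```

Under this reading, the exponent is simply the slope on the log-odds scale. Basketballreference.com's fit puts it at 14, and multiplying $p$ by games played $G$ gives EW. A larger $k$ means the same scoring ratio produces a more lopsided win probability.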
I should have taken a stat class or two in school. Can someone versed in statistics tell me how this formula works? What does taking the points scored and scored-against to the 16.5th or 14th power do exactly? Am I correct in assuming that this formula favors "powerhouse overdog" teams that blow out their opponents over "clutch underdog" teams that win close games?
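To see what the exponent does, it helps to tabulate the formula for a few average margins. The sketch below is a hypothetical illustration: the helper `win_pct` and the 100-points-allowed baseline are our own choices. It uses the ratio form 1 / (1 + (PA/PTS)^k), which is equivalent to PTS^k / (PTS^k + PA^k).

```python
def win_pct(points, points_allowed, exponent):
    # Equivalent to points**k / (points**k + points_allowed**k)
    return 1.0 / (1.0 + (points_allowed / points) ** exponent)

# Hypothetical baseline: a team allowing 100 points per game.
ALLOWED = 100.0
print("margin  k=14   k=16.5")
for margin in (0, 1, 3, 5, 10):
    scored = ALLOWED + margin
    print(f"{margin:>6}  {win_pct(scored, ALLOWED, 14):.3f}  "
          f"{win_pct(scored, ALLOWED, 16.5):.3f}")
```

Only the ratio of the two averages enters the formula, and the exponent stretches small ratios into large win-percentage gaps. Near .500 the curve's slope is about k/4 per unit of log(PTS/PA). With averages near 100, each extra point of average margin is therefore worth roughly 0.04 in win percentage at k = 16.5, or about 3.4 wins over 82 games. The formula never sees individual game margins. It does not reward blowouts or close wins as such, but a team that keeps winning close games will tend to finish above its expected wins.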

4 Comments:
Jeff, if you had taken statistics, you would also know that there is no such thing as a "clutch" team or performer. You always were more of a scat man than a stat man.
It's called a Pythagorean projection, and it's actually significantly more valuable in baseball than in any other sport because, with no clock, baseball has the lowest correlation between the two teams' scoring within a game. In basketball or football, scores tend to correlate, because as the clock winds down teams stop pushing the ball upcourt or start running the ball into the line.
Anyway, it's supposed to determine the amount of luck (or good managing) in a team's win-loss record. For example, two teams could each score 800 runs and give up 800 runs, making them exactly as "good" as each other from a statistical standpoint. One could finish over .500 by clustering more of its runs allowed in blowout losses and eking out a bunch of one-run wins. Conversely, the other could finish well under .500 by winning a bunch of big blowouts and blowing the close ones (see the 2002 Red Sox, whose 859 runs scored and 665 runs allowed "should" have produced a 100-62 record; instead they went 93-69 and missed the playoffs).
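The luck argument can be checked on a toy example. The sketch uses two made-up three-game logs (purely hypothetical) with identical run totals, 8 scored and 8 allowed. The Pythagorean formula gives both teams the same 1.5 expected wins, even though one goes 2-1 and the other 1-2. It then reruns the 2002 Red Sox totals quoted above. The baseball exponent isn't stated here, so the code tries two values as assumptions: 2, Bill James's original choice and the source of the "Pythagorean" name, and 1.83, a commonly used refinement.

```python
def pythag_wins(runs_scored, runs_allowed, games, exponent):
    # games * RS^k / (RS^k + RA^k), written in ratio form
    return games / (1.0 + (runs_allowed / runs_scored) ** exponent)

def actual_and_expected(game_log, exponent=2.0):
    """game_log is a list of (runs_scored, runs_allowed) pairs, one per game."""
    scored = sum(s for s, _ in game_log)
    allowed = sum(a for _, a in game_log)
    actual = sum(1 for s, a in game_log if s > a)
    return actual, pythag_wins(scored, allowed, len(game_log), exponent)

# Hypothetical logs: both teams score 8 and allow 8 runs in total.
close_winner = [(3, 2), (3, 2), (2, 4)]  # two one-run wins, one two-run loss
close_loser = [(2, 3), (2, 3), (4, 2)]   # two one-run losses, one two-run win
print(actual_and_expected(close_winner))  # (2, 1.5)
print(actual_and_expected(close_loser))   # (1, 1.5)

# 2002 Red Sox totals from the comment above; both exponents are assumptions.
for k in (2.0, 1.83):
    print(k, round(pythag_wins(859, 665, 162, k), 1))
```

With these totals, exponent 2 projects about 101 wins and exponent 1.83 about 99.6, so the 100-62 figure matches the 1.83 version more closely. Either way, the 93-69 finish falls well short.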
In the baseball example, which was developed by Bill James and likely served as the inspiration for this application to basketball, the exponents were derived from a quadratic regression model. That is, they took a massive amount of team runs scored/allowed data, entered the associated W/L records, and ran a regression to determine which exponent best explained the data they had and would best predict data going forward.
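The regression step can be sketched as follows. This is not necessarily the procedure James used, and Basketballreference.com reports a logistic fit instead. The sketch swaps in a common shortcut: the formula implies W/L = (RS/RA)^k, so taking logs gives log(W/L) = k·log(RS/RA), and k can be estimated as a least-squares slope through the origin. An rmse helper, in the spirit of the 3.14 vs. 3.48 comparison quoted in the post, scores a chosen exponent. The season data are synthetic, generated from the formula itself with an assumed k = 1.83, only to check that the fit recovers the exponent. Real use needs actual team records. Seasons with zero wins or zero losses must be dropped, since log(0) is undefined.

```python
import math

def fit_exponent(seasons):
    """Least-squares slope through the origin of log(W/L) on log(RS/RA).

    seasons: list of (runs_scored, runs_allowed, wins, losses) tuples.
    """
    num = den = 0.0
    for scored, allowed, wins, losses in seasons:
        x = math.log(scored / allowed)
        y = math.log(wins / losses)
        num += x * y
        den += x * x
    return num / den

def rmse_wins(seasons, exponent):
    """Root-mean-square error, in wins, of the Pythagorean projection."""
    total = 0.0
    for scored, allowed, wins, losses in seasons:
        games = wins + losses
        predicted = games / (1.0 + (allowed / scored) ** exponent)
        total += (predicted - wins) ** 2
    return math.sqrt(total / len(seasons))

def synthetic_season(scored, allowed, exponent, games=162):
    # Hypothetical season whose record exactly follows the formula.
    wins = games / (1.0 + (allowed / scored) ** exponent)
    return (scored, allowed, wins, games - wins)

seasons = [synthetic_season(rs, ra, 1.83)
           for rs, ra in [(859, 665), (700, 760), (780, 720), (650, 800)]]
print(round(fit_exponent(seasons), 4))
print(rmse_wins(seasons, 1.83), rmse_wins(seasons, 2.0))
```

On noise-free synthetic data the fit returns the generating exponent, and the rmse is essentially zero at k = 1.83 and positive at k = 2. With real seasons, luck in close games shows up as residual error that no single exponent can remove.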
2002 fucking sucked.
http://www.baseball-reference.com/about/faq.shtml#pyth
Thanks Ev. No thanks Dave.