Recently, there's been a lot of chatter on Twitter about the value of projections. They're certainly far from perfect -- the creators would admit as much -- but they're among the best we've got.
Many anti-projection arguments put more weight on past stats than they should. Go dig up Kyle Freeland argument threads on Twitter if you don't believe me.
So what should we be using? Just projections? A combination of both?
To find out, let's start by appreciating how wildly stats can fluctuate year-to-year, particularly those that many fantasy gamers play for in traditional 5x5 leagues.
To quantify this, here are the r-squared figures between each metric in one season ("season Y") and the same metric in the following season ("season Y+1"). The sample is 1,114 player seasons from 2007 to 2018 (min. 150 innings pitched in each season), covering the four traditional starting pitching categories (saves excluded) plus innings pitched:
Metric | R^2 |
K | 0.526 |
WHIP | 0.208 |
IP | 0.155 |
ERA | 0.108 |
W | 0.045 |
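For anyone who wants to reproduce a year-to-year figure like the ones above, here's a minimal sketch in Python. It assumes r-squared means the squared Pearson correlation and that each row is one pitcher season. The field names (`player`, `year`, `ip`, `k`), the helper names and the toy rows are all made up for illustration; this is not the data set or code behind the table.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined when a series is constant")
    return (sxy * sxy) / (sxx * syy)


def pair_seasons(rows, metric, min_ip=150):
    """Pair each qualifying season with the same player's next season.

    A pair is kept only if BOTH seasons meet the innings minimum.
    """
    by_key = {(r["player"], r["year"]): r for r in rows if r["ip"] >= min_ip}
    xs, ys = [], []
    for (player, year), row in by_key.items():
        nxt = by_key.get((player, year + 1))
        if nxt is not None:
            xs.append(row[metric])   # season Y
            ys.append(nxt[metric])   # season Y+1
    return xs, ys


# Toy rows (invented numbers, hypothetical players A-D).
rows = [
    {"player": "A", "year": 2017, "ip": 180, "k": 200},
    {"player": "A", "year": 2018, "ip": 190, "k": 210},
    {"player": "B", "year": 2017, "ip": 160, "k": 150},
    {"player": "B", "year": 2018, "ip": 170, "k": 160},
    {"player": "C", "year": 2017, "ip": 200, "k": 180},
    {"player": "C", "year": 2018, "ip": 200, "k": 170},
    {"player": "D", "year": 2017, "ip": 175, "k": 140},
    {"player": "D", "year": 2018, "ip": 120, "k": 100},  # under 150 IP, so D is dropped
]

xs, ys = pair_seasons(rows, "k")
print(xs, ys, round(r_squared(xs, ys), 3))
```

Requiring the innings minimum in both seasons mirrors the "in each season" filter described above, which also means the sample only includes pitchers who stayed healthy and effective enough to qualify twice.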
Metric | Proj. R^2 | Diff vs. Season Y |
K | 0.282 | -46% |
WHIP | 0.238 | 14% |
ERA | 0.223 | 107% |
W | 0.188 | 316% |
IP | 0.113 | -27% |
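A quick sketch of how the Diff column can be checked, assuming it is the percent change from the season-Y r-squared to the projection r-squared (the numbers are consistent with that reading, but the formula isn't stated anywhere). The function name is made up. The inputs are the rounded values from the two tables, so other rows can land a point or two off the published figures.

```python
def pct_diff(new, baseline):
    """Percent change of `new` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100


# Rounded r-squared values copied from the two tables: (season Y, projection).
r2 = {"K": (0.526, 0.282), "WHIP": (0.208, 0.238), "IP": (0.155, 0.113)}

for metric, (season_y, proj) in r2.items():
    print(f"{metric}: {pct_diff(proj, season_y):+.0f}%")
# K: -46%
# WHIP: +14%
# IP: -27%
```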
Metric | Season Y | Proj. |
K/9 | 0.615 | 0.470 |
W/IP | 0.017 | 0.134 |
W/GS | 0.042 | 0.169 |
Hey, not bad! Our ability to project strikeouts per inning (or K/9 in this case) is anywhere from roughly 17-66% better than our ability to project raw strikeouts. Wins didn't fare the same -- somehow wins per inning and wins per game started are less sticky than raw wins. Oddly, when excluding ZiPS, projections actually predict W/GS roughly 5% better than raw wins. I'm not sure what's going on with ZiPS, but I'd rather not waste too much time with wins.
Season Y Metric | R^2 vs. Season Y+1 ERA |
Proj. | 0.223 |
SIERA | 0.193 |
xFIP | 0.183 |
K-BB% | 0.176 |
FIP | 0.174 |
K% | 0.167 |
ACES | 0.164 |
Season Y Metric | R^2 vs. Season Y+1 WHIP |
K-BB% | 0.294 |
SIERA | 0.251 |
Proj. | 0.238 |
K% | 0.218 |
xFIP | 0.213 |
WHIP | 0.208 |
ACES | 0.205 |
FIP | 0.200 |
Season Y Metric | R^2 vs. Season Y+1 K/9 |
K% | 0.615 |
K-BB% | 0.514 |
Proj. | 0.470 |
SwStr% | 0.464 |
Contact% | 0.456 |
SIERA | 0.360 |
Z-Contact% | 0.343 |
ACES | 0.337 |
Season Y Metric | R^2 vs. Season Y+1 W |
Proj. | 0.188 |
FIP | 0.104 |
SIERA | 0.100 |
xFIP | 0.094 |
K | 0.093 |
K-BB% | 0.088 |
K% | 0.074 |
Season Y Metric | R^2 vs. Season Y+1 IP |
IP | 0.155 |
TBF | 0.121 |
Proj. | 0.113 |
Pitches | 0.103 |
- Use projections rather than the previous season's actual stat, particularly for ERA, WHIP and wins
- When evaluating pitchers, bet on strikeouts -- among traditional 5x5 categories, that's the category that we're far-and-away best equipped to predict
- Use an array of metrics when evaluating and projecting pitchers: projections, ERA estimators (SIERA, DRA - not tested here, xFIP, FIP), K-BB%, K% and ACES.
- Here's what I'll be looking at to assess and predict a pitcher's performance across the various categories:
  - ERA: Nearly all of the above -- projected ERA, ERA estimators, K-BB%, K% and ACES
  - WHIP: K-BB%, projected WHIP, SIERA
  - Strikeouts: K%, K-BB%, projected K% or K/9
  - Wins: Projected wins
  - Innings: Previous season's IP/TBF and projected IP
A NOTE ON SECOND HALF SPLITS
Read enough fantasy analysis and you're sure to come across someone citing second half splits. Maybe there's good reason for it -- injury, change in talent, etc. But more often than not, it's a case of recency bias.
I tested this using data from FanGraphs for starting pitchers who threw at least 30 second-half innings and then 150 innings the next season. I looked at the r-squared between their second-half numbers in season one and the same stats over the full season two.
In essentially every case, you're significantly better off using the full-season numbers over the cherry-picked second-half numbers from when the pitcher supposedly "figured it out." Outside of relatively obvious cases like injuries, I'd rather bet on the averages (i.e., full-season numbers) while others try to find the outliers.
R-Squared of Full Season vs. 2nd Half Numbers
Metric | Full | 2nd Half | Diff |
IP | 0.155 | 0.129 | 20% |
TBF | 0.129 | 0.121 | 6% |
HR/9 | 0.163 | 0.086 | 90% |
K% | 0.615 | 0.553 | 11% |
BB% | 0.455 | 0.346 | 31% |
K-BB% | 0.553 | 0.469 | 18% |
WHIP | 0.208 | 0.135 | 54% |
BABIP | 0.039 | 0.031 | 27% |
LOB% | 0.025 | 0.018 | 38% |
FIP | 0.312 | 0.213 | 46% |
xFIP | 0.444 | 0.389 | 14% |
LD% | 0.027 | 0.021 | 27% |
GB% | 0.627 | 0.585 | 7% |
FB% | 0.617 | 0.581 | 6% |
Soft% | 0.006 | 0.010 | -42% |
Med% | 0.065 | 0.023 | 183% |
Hard% | 0.067 | 0.034 | 97% |
ERA | 0.108 | 0.063 | 70% |
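The comparison behind this table can be sketched as below: score the full-season number and the second-half number as predictors of the same next-season stat, then see how much better one does. This assumes r-squared is the squared Pearson correlation and that Diff is the full-season figure's percent edge over the second-half figure. The function names and the toy inputs are invented for illustration.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def compare_predictors(full, second_half, next_season):
    """Return (full-season R^2, second-half R^2, percent edge of full over half)."""
    r2_full = r_squared(full, next_season)
    r2_half = r_squared(second_half, next_season)
    edge = (r2_full - r2_half) / r2_half * 100
    return r2_full, r2_half, edge


# Toy inputs (made up): one value per pitcher, same pitcher order in each list.
full_season = [1, 2, 3, 4]
second_half = [1, 3, 2, 4]
next_season = [1, 2, 3, 4]

r2_full, r2_half, edge = compare_predictors(full_season, second_half, next_season)
print(f"{r2_full:.2f} {r2_half:.2f} {edge:.0f}%")
# 1.00 0.64 56%
```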
Metric | Year-to-Year R^2 |
ACES | 0.764 |
GB% | 0.627 |
SwStr% | 0.622 |
FB% | 0.617 |
K% | 0.615 |
K/9 | 0.615 |
Contact% | 0.608 |
O-Contact% | 0.556 |
Z-Contact% | 0.554 |
K-BB% | 0.553 |
Ks | 0.526 |
O-Swing% | 0.501 |
BB% | 0.455 |
SIERA | 0.455 |
xFIP | 0.444 |
Z-Swing% | 0.433 |
FIP | 0.312 |
WHIP | 0.208 |
HR/9 | 0.163 |
IP | 0.155 |
Pitches | 0.148 |
TBF | 0.129 |
HR/FB | 0.111 |
ERA | 0.108 |
Hard% | 0.067 |
Med% | 0.065 |
W | 0.045 |
BABIP | 0.039 |
LD% | 0.027 |
LOB% | 0.025 |
GS | 0.018 |
Soft% | 0.006 |
G | 0.003 |