This thread is meant as a scientific analysis of poster Hughes2.50’s projection system. He has not disclosed his full methodology, but he has provided most of the information here in the infamous “post #93.” Further information comes from subsequent posts in that thread, and the quotes here are taken from that thread.
The point of this thread is not to attack a member of this community or to belittle his opinions. My goal is to better understand how he has come to his end results and to examine how his methods compare to the standard methods of projecting pitchers. It is meant as a critique of his method, so that everyone here might understand it and where applicable it might be improved.
It should be noted that a “critique” is neither inherently positive nor negative; I hope everyone who participates in this thread will be able to approach it with the attitude of scientific neutrality with which it was intended. From here I will attempt to transcribe Hughes2.50’s (from now on referred to as H2.5) method from the original post #93. The original post is somewhat disjointed, so it is my hope to provide some clarity by presenting the steps in order where possible. I have also edited some of his quotes as they appear here for spelling and punctuation. The section headings provided below are meant to help organize the entire enterprise (responses can be directed to or parsed by the appropriate section).
I apologize for the length, but I think it is necessary to be both thorough and complete so as to be fair to all concerned.
The start of H2.5’s method is to combine Clay Dreslough’s Defense-Independent Component ERA (DICE) with the MLEs provided by MinorLeagueSplits.com.
I have some comments on this so far:
1. The formula for DICE given in the post is incorrect; the actual formula is DICE = 3.00 + (13*HR + 3*(BB + HBP) − 2*K) / IP. I think this is merely a typo, but if it is not, the formula as listed would lead to very different results, since the 3.00 runs would be included in the numerator rather than added at the end to make the result look like an ERA.
2. I am curious about the choice of Dreslough’s DIPs formula as opposed to FIP (either TangoTiger’s version or The Hardball Times’ version). There are subtle differences between the formulas, but the largest is that the latter two use a modifier of 3.20 instead of 3.00 to make the result of the base equation look like an ERA. Obviously the use of Dreslough’s formula will give a lower number than FIP (depending on how many HBP the pitcher has, but since this looks at top pitching prospects, I think it’s safe to assume that they will not have a great number of hit batsmen).
3. At this point I think it’s important to point out that MLEs are not predictive of future performance. They are intended to translate a specific performance at the minor league level into equivalent numbers at the ML level.
4. The most important point is that H2.5 has not specified in which order he combines the above statistics. I am assuming that he takes the MLE translation of the raw minor league counting stats (HR, BB, Ks) and then plugs those into the DIPs formula, but he hasn’t stated so explicitly. Any other combination of translations should produce skewed results.
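The assumed pipeline described in point 4 can be sketched as follows. This is a hypothetical illustration only: the function names and the sample counting stats are invented, and H2.5 has not confirmed that this is the order in which he combines the statistics.

```python
# Sketch of the assumed order of operations: MLE-translated counting
# stats are plugged into Dreslough's DICE formula. All numbers below
# are invented for illustration.

def dice(hr, bb, hbp, k, ip):
    """Dreslough's Defense-Independent Component ERA."""
    return 3.00 + (13 * hr + 3 * (bb + hbp) - 2 * k) / ip

# Hypothetical MLE-translated season line for a prospect:
mle_hr, mle_bb, mle_hbp, mle_k, mle_ip = 12, 45, 5, 140, 150.0
print(round(dice(mle_hr, mle_bb, mle_hbp, mle_k, mle_ip), 2))
```

Note that translating the counting stats first and then computing DICE (as above) will not give the same answer as computing DICE on the raw minor league line and translating the result, which is why the order matters.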
With these numbers in hand, H2.5 generated a list of the top minor league pitchers (see the original post for the list, starting with Adenhart and ending with Ohlendorf). The formula for ERA+ is 100 * (league average ERA / pitcher’s ERA). H2.5 sets the league average ERA at 4.50.

Originally Posted by Hughes2.50: [quote omitted here; see the original post]
This is the first major problem in his method. He is arbitrarily setting the league average ERA at 4.50. ERA+ is meant to set a pitcher’s individual performance against the rest of the pitchers in his league. His source data at minor league splits does not provide league ERAs, or even park adjustments. Since the various minor leagues vary widely as to whether they favor offense or pitching, it is impossible to produce a meaningful statistic by setting the league ERA at what is not quite a random number (see below).
To give an example of how this adjustment works let’s consider the single best year by ERA+ of the modern era (actually the second best season of all time, behind Tim Keefe in 1880), Pedro Martinez’s 2000 season. He had a 1.74 ERA vs. a 4.97 league average ERA for a 285 ERA+. If the league ERA is arbitrarily set at 4.50 he winds up with a 259 ERA+, which is still good, but drops him into a tie with Bob Gibson in 7th place among single season leaders, so the difference is significant.
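The Pedro Martinez example above can be reproduced directly from the ERA+ formula. A minimal sketch (the small differences from the quoted integers, 285 and 259, are just rounding):

```python
# ERA+ as defined above: 100 * (league average ERA / pitcher's ERA).
# Reproducing the Pedro Martinez 2000 example from the text.

def era_plus(era, league_era):
    return 100 * league_era / era

print(era_plus(1.74, 4.97))  # actual 2000 context: about 285.6
print(era_plus(1.74, 4.50))  # arbitrary 4.50 context: about 258.6
```

The roughly 27-point swing comes entirely from the choice of denominator, which is the crux of the objection: the league-average ERA is an input that must be measured, not assumed.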
Now 4.50 is not a bad guess at which to set the league average ERA; the league average ERA (laERA) that Pedro pitched against over his career is 4.49. However, the pitchers on this list are pitching in very different environments, and so far H2.5 has not accounted for league and park adjustments in his method. Using the actual minor league average ERAs, no matter how tedious to compile, would go a long way toward accounting for this. Again using Pedro Martinez as an example, the laERA has had a decent amount of variation over the course of his career. During just his time in the AL, it varied between 4.42 and 5.07. Is there any reason not to expect the sum total of minor league performances to have just as wide (if not wider) a variation?
I must confess that here is where H2.5 loses me. I had been able to follow what he was doing up until now, if not exactly why he was doing it. But here at the end he has numbers jumping in and out of various parts of his formula, so I’ll provide his text from the original post in two sections:
This is where things get very problematic. H2.5 is extrapolating career numbers based on one season’s worth of minor league data. I can only assume that he’s using the player’s “raw walk and hit batter rates” in the DIPs formula shown in section 1 (along with his HR and K rates?).

Originally Posted by Hughes2.50: [quote omitted here; see the original post]
I cannot find any reference on any sabermetric (or other) site suggesting that these rates can be used to project an entire career at the MLB level. The stats as presented indicate what skill set the player has (or had that particular year), but the rates themselves cannot be expected to simply continue along the same progression. This seems to be an extremely ill-advised use of a small sample of data.
From there he goes on:
I think I speak for a great many people when I say, “Yes! We would very much like to have this explained in more detail!”

Originally Posted by Hughes2.50: [quote omitted here; see the original post]
What is the purpose in calculating the different ERA+ values, and what exactly is the value in determining the difference between the single season MLE and the career number?
In all the reading and research I have done on baseball, I have never seen a reference to a “cubic transformation.” It does not appear to be the basis of any statistical model on any sabermetric site. My background in mathematics is admittedly limited (although that has not hindered any other statistical reading or baseball research I have done), but I can find the term nowhere: Wikipedia does not even have an entry on it (which is more than a little suspicious), and the Google results are mostly about obscure molecules or transitions in N-dimensional space.
Which leads me to question H2.5’s use of them. His goal is to “normalize the distribution of the scores,” but I can find no evidence 1. that they actually do that, or 2. that it is relevant, either statistically or on a performance basis, to “normalize” the scores in the first place.
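For contrast, the transformations conventionally used in statistics to normalize a skewed distribution are power transforms (log, square root, cube root, or the Box-Cox family). A minimal sketch with invented ERA+-like scores, showing what such a transform actually does to skew; this is not H2.5’s undefined “cubic transformation,” only an illustration of what “normalizing a distribution” usually means:

```python
# A cube-root (power) transform reduces right skew in a distribution
# with a long upper tail. Scores below are invented for illustration.

def skewness(xs):
    """Population skewness: third central moment over m2^1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

scores = [95, 100, 102, 105, 110, 115, 120, 160, 210, 285]  # right-skewed
transformed = [x ** (1 / 3) for x in scores]
print(skewness(scores), skewness(transformed))  # skew shrinks after transform
```

Even here the transform only reduces the skew; nothing about cubing or cube-rooting a score guarantees a normal distribution, which is why the claim needs to be demonstrated rather than asserted.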
The above sections apparently end the mathematical part of the process. H2.5 continues, but it is not clear exactly how this continued commentary impacts his projection methodology. I hope he will provide those answers. I will quote him here and offer my responses. Note: the list is not quoted, but it can be found in the original post. The last part of the quoted text comes from the next sequential post in the thread (#94).
I feel that when presenting a model it is correct to only reference the model within its own context. Adjustments made outside the model should be made after the fact and noted as such. A good example would be Nate Silver’s use of PECOTA, where he presents the original data and then points out what his system might be misinterpreting and why.

Originally Posted by Hughes2.50: [quote omitted here; see the original post]
Note: this passage also seems to indicate that there is data for more than one year. It would be very helpful if that data were added to what we have.
This suggests that there is another component to H2.5’s method that does not use any stats whatsoever. The “scouts vs. stats” debate is often overplayed in my mind, but I feel it stems from the fact that the middle ground between the two sides is compromised: one side looks at what did happen, while the other looks for what has the potential to happen.

Originally Posted by Hughes2.50: [quote omitted here; see the original post]
In any event, we do not know how H2.5 is altering his math to take scouting into account (if indeed he is at all). I do not think projection systems should ever do this. It is better to predict an anomaly (the best example I can think of is PECOTA suggesting that Dustin Pedroia is similar to Gary Sheffield) and then address it and learn from it than to fudge the data or the math in an attempt to make it go away. (Witness Einstein and his use of the cosmological constant. Most theories produce anomalies, and they are often the window to a deeper understanding of the theory and/or of nature.)
This is logically inconsistent and does not address the issue at hand. The point is not that it is impossible for a player with less experience to have more talent and do better than one with more experience. The point is that statistics measure what has been experienced, and that mathematical models are unlikely to make that projection (without extra help). It is not the fault of the projection system; it is a tool with its own strengths and weaknesses. But that does not mean that this (or any) particular tool should be compromised in order to make it do everything. (I wouldn’t try to turn a table saw into a can opener.)

Originally Posted by Hughes2.50: [quote omitted here; see the original post]
This also seems to indicate that scouting opinions are being shoehorned into the statistical model. Scouting is essentially an opinion. It can be an informed opinion, and a professional opinion, but it is still an opinion. I think I can speak for most of the sabermetrically inclined when I say that opinions have no place modifying the numbers within a mathematical system.

Originally Posted by Hughes2.50: [quote omitted here; see the original post]
By way of example, consider Betances. Despite having “a once in a generation set of tools” he was not taken first overall, and he fell out of the first round due to a “slow start” in the spring. Why are the opinions of other scouts who prefer other players (who were picked earlier in the draft) not reflected in the listed results? Is it even possible?
Players often do not reach the peaks scouts see in them. This is not a fault of scouting (after all, the job of the scout is to find the possibility “where the return on investment is extraordinary”) but that does not mean that a projection system should be adjusted for such “wish casting.”
A look at how scouting is being used by sabermetricians can be found in TangoTiger’s “Wisdom of Crowds” fan database experiment. He combines a broad range of opinions to get a reading on something that is very hard to quantify (defense). The idea is that a broad range of opinions will reduce the chance of outliers and small samples corrupting the final data line. I am extremely doubtful that H2.5 has anything close to this level of input from professional scouts or other “non-public sources.” The most important point is that TangoTiger does not use these results to directly modify the defensive system he works with (Mitchel Lichtman’s UZR).
In order to judge how H2.5’s numbers stand up as a projection system, I would like to set my analysis of his system against what I think are axiomatic standards for any projection system.
1. The system must have a clearly defined methodology.
Hopefully I have helped somewhat with this, but obviously there are still questions that are up to H2.5 to answer. It should be noted that this does not mean that a system is necessarily “open source.” PECOTA’s inner workings are not available, but Nate Silver can at least explain in clear and concise language how it works. Problems that may arise from the methodology should be acknowledged as such. For example, using a simple weighted 3-2-1 system does not really work for rookies who don’t have any previous ML playing time, but that is understood.
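The “simple weighted 3-2-1 system” mentioned above can be sketched in a few lines. This is the generic Marcel-style idea (most recent season weighted most heavily), not H2.5’s method, and the sample ERAs are invented:

```python
# Minimal sketch of a 3-2-1 weighted average over three seasons,
# most recent season first. Sample ERAs are invented.

def weighted_321(y1, y2, y3):
    """y1 = most recent season, y3 = oldest."""
    return (3 * y1 + 2 * y2 + 1 * y3) / 6

print(round(weighted_321(3.20, 3.80, 4.10), 2))
```

As the text notes, a scheme like this breaks down immediately for a rookie with no prior ML seasons to weight, which is exactly the kind of limitation a methodology should acknowledge up front.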
I feel that it is important to note that he has said that his numbers require league and park adjustments (see here), but this is not described or accounted for in the original.
Obviously, there is more work to do.
2. The system must be complete and uniform.
By this I mean that the system has complete data on all the players within its purview and that the data is handled in the same manner for every projection. We should be able to compare any two (or more) like players (i.e. pitchers with pitchers, position players with position players). This does not rule out making judgments on the raw data set for reasons outside the system (see the PECOTA example above), but the data set itself should not be compromised.
The major issue with regards to this is what appears to be the influence of H2.5’s scouting opinions on his numbers. I would hope he can provide an explanation, as without one his results must be considered flawed and suspect.
3. The system must be testable.
Clearly, this is a (I daresay the) major problem with H2.5’s system. We currently have only a small list of player projections, and we do not have any way to check against other players outside of the data set he has provided.
Testability does not mean that the system is expected to be 100% accurate; no system comes anywhere close to that standard. It does, however, mean that we should be able to track how players perform against the system in order to see exactly how well it does. Without a complete roster of projections, it stands to reason that the sample size is too small to declare any sort of success based on the few projections that happen to be accurate.
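Tracking a system against results is mechanically simple once a full set of projections exists. A minimal sketch, with invented projected and actual ERAs, using root-mean-square error as the accuracy measure:

```python
# Minimal sketch of testing a projection set against actual results
# with RMSE. All ERA values below are invented for illustration.

def rmse(projected, actual):
    n = len(projected)
    return (sum((p - a) ** 2 for p, a in zip(projected, actual)) / n) ** 0.5

projected_era = [3.50, 4.20, 3.90, 4.60]
actual_era = [3.80, 4.00, 4.50, 4.40]
print(round(rmse(projected_era, actual_era), 3))
```

The point is not the particular metric; it is that an error measure over the complete roster of projections, rather than a handful of hits, is what separates a testable system from cherry-picking.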
This also means that the system should be testable against nature. In the case of H2.5’s system we can compare his methods and projections against known mathematical processes and baseball populations.
The most basic mathematical truism in baseball is to beware of drawing large conclusions on a small sample size. Unfortunately this is exactly what he has done. He has taken pitchers with only a few professional (and no major league) seasons and extrapolated their entire careers. He has not provided a mathematical basis for making that leap of faith.
He has also projected a pair of 21 year olds to have career ERA+s greater than any other pitcher in history. While it is possible that we are about to bear witness to a new golden age of pitching, common sense would seem to caution against it, and he has not provided a suitable explanation as to why we should forgo that natural impulse.
I hope that this can set the groundwork for a more complete understanding of the numbers and methods discussed, and I sincerely hope that if I have made any errors they will be pointed out and corrected here.
Edited to correct a few typos.