Any statisticians around? Warning: Math!

Guest
I love statistics. One of the main reasons I enjoy baseball is because statistics play a large role in the history of the game. Ever since Henry Chadwick began using them to compare players in the 1870s, we have had a way of measuring a player's performance.

Thanks to pioneers like Bill James, a wave of new statistics such as the Hall of Fame Monitor, Wins Above Replacement (WAR), and others have been developed to further quantify a player's contribution to the game. WAR has become my statistic of choice in determining a player's skill. However, one problem with using career WAR is that it rewards players who have long, injury-free careers and compile high career numbers. The end result is that players who had long, good careers easily outrank players who had great, short careers.

Sandy Koufax is widely regarded as one of the best pitchers in the history of the game. He is a member of the baseball Hall of Fame and All-Century Team. Yet his career numbers do not stand out, and are in fact mediocre in most regards. His career WAR places him 61st all time behind such notables as Bret Saberhagen and Frank Tanana. But when you talk to experts and casual fans alike, many name Koufax among the greatest pitchers in the history of the game.

My goal was to quantify Sandy Koufax in such a way that the statistics come to support public opinion. There is a consensus that ten All-Star type seasons should merit inclusion in the Hall of Fame. Koufax had five (1961, 1963, 1964, 1965, 1966). However, three of those seasons (1963, 1965, 1966) were MVP type seasons.

I therefore decided to create a new statistic to rate a player solely by his five best seasons, as decided by single-season WAR. The goal of this statistic was to measure a player in his prime, rather than on the whole of his career. I created a spreadsheet, wrote a formula, and then began entering player statistics for the entire roster of the baseball Hall of Fame. To eliminate inflation from nineteenth-century pitching statistics, only statistics after 1900 were included.
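
For anyone who wants to try the idea outside a spreadsheet, here is a minimal Python sketch. It is not the actual spreadsheet formula, which isn't spelled out in this post. It assumes the rating is simply the sum of the five highest single-season WAR totals, and it reads "after 1900" as 1901 onward. The function name `prime_value` and the sample WAR figures are made up for illustration.

```python
from typing import Dict


def prime_value(war_by_season: Dict[int, float], n: int = 5,
                first_year: int = 1901) -> float:
    """Sum of a player's n best single-season WAR totals from first_year on.

    Summing (rather than averaging) the best seasons is an assumption.
    """
    # Drop seasons before the cutoff, then keep the n largest values.
    eligible = [war for year, war in war_by_season.items() if year >= first_year]
    best = sorted(eligible, reverse=True)[:n]
    return round(sum(best), 1)


# Hypothetical season-by-season WAR, not real data.
sample = {1899: 9.0, 1901: 4.1, 1902: 7.3, 1903: 8.0,
          1904: 2.2, 1905: 6.5, 1906: 5.9, 1907: 1.0}

print(prime_value(sample))        # five best seasons since 1901
print(prime_value(sample, n=3))   # a shorter peak window
```

Changing `n` to 10 gives the ten-year version of the same rating, so the two windows can be compared on identical data.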

The top 52 players of all time...

52. Arky Vaughan - SS - Pittsburgh Pirates
51. Home Run Baker - 3B - Philadelphia Athletics
T-49. Ron Santo - 3B - Chicago Cubs
T-49. Charlie Gehringer - 2B - Detroit Tigers
48. Robin Roberts - P - Philadelphia Phillies
47. Steve Carlton - P - Philadelphia Phillies
46. Pedro Martinez - P - Boston Red Sox
45. Mel Ott - OF - New York Giants
T-43. Randy Johnson - P - Seattle Mariners
T-43. Shoeless Joe Jackson - OF - Cleveland Naps
42. Bob Feller - P - Cleveland Indians
41. Eddie Mathews - 3B - Milwaukee Braves
40. Juan Marichal - P - San Francisco Giants
T-38. Ken Griffey - OF - Seattle Mariners
T-38. Tom Seaver - P - New York Mets
37. George Brett - 3B - Kansas City Royals
T-35. Gaylord Perry - P - San Francisco Giants
T-35. Joe DiMaggio - OF - New York Yankees
34. Cal Ripken - SS - Baltimore Orioles
33. Carl Yastrzemski - OF - Boston Red Sox
32. Ernie Banks - 1B - Chicago Cubs
31. Cy Young - P - Boston Americans
30. Wade Boggs - 3B - Boston Red Sox
29. Sandy Koufax - P - Los Angeles Dodgers
28. Ed Walsh - P - Chicago White Sox
27. Rickey Henderson - OF - Oakland Athletics
26. Jackie Robinson - 2B - Brooklyn Dodgers
25. Lefty Grove - P - Philadelphia Athletics
24. Roger Clemens - P - Boston Red Sox
23. Christy Mathewson - P - New York Giants
22. Jimmie Foxx - 1B - Philadelphia Athletics
21. Hank Aaron - OF - Milwaukee Braves
T-19. Bob Gibson - P - St. Louis Cardinals
T-19. Mike Schmidt - 3B - Philadelphia Phillies
18. Pete Alexander - P - Philadelphia Phillies
17. Nap Lajoie - 2B - Cleveland Naps
16. Alex Rodriguez - SS - New York Yankees
15. Albert Pujols - 1B - St. Louis Cardinals
14. Stan Musial - 1B - St. Louis Cardinals
13. Tris Speaker - OF - Cleveland Indians
T-11. Eddie Collins - 2B - Chicago White Sox
T-11. Joe Morgan - 2B - Cincinnati Reds
10. Honus Wagner - SS - Pittsburgh Pirates
9. Willie Mays - OF - San Francisco Giants
8. Lou Gehrig - 1B - New York Yankees
7. Walter Johnson - P - Washington Senators
6. Ted Williams - OF - Boston Red Sox
5. Ty Cobb - OF - Detroit Tigers
4. Mickey Mantle - OF - New York Yankees
3. Rogers Hornsby - 2B - St. Louis Cardinals
2. Barry Bonds - OF - San Francisco Giants
1. Babe Ruth - OF - New York Yankees

A comparative analysis of the top five players on this list:

5. Ty Cobb (1909, 1910, 1911, 1915, 1917)
5yA: 127 R, 224 H, 39 2B, 17 3B, 7 HR, 108 RBI, 77 SB, 69 BB, .387 BA, .458 OBP, .550 SLG, 1.007 OPS

4. Mickey Mantle (1955, 1956, 1957, 1958, 1961)
5yA: 131 R, 174 H, 23 2B, 6 3B, 45 HR, 113 RBI, 13 SB, 130 BB, .329 BA, .462 OBP, .652 SLG, 1.114 OPS

3. Rogers Hornsby (1921, 1922, 1924, 1927, 1929)
5yA: 138 R, 232 H, 43 2B, 13 3B, 31 HR, 131 RBI, 9 SB, 78 BB, .392 BA, .466 OBP, .665 SLG, 1.131 OPS

2. Barry Bonds (1993, 1996, 2001, 2002, 2004)
5yA: 133 R, 166 H, 33 2B, 2 3B, 54 HR, 126 RBI, 21 SB, 188 BB, .338 BA, .526 OBP, .745 SLG, 1.271 OPS

1. Babe Ruth (1920, 1921, 1923, 1926, 1927)
5yA: 161 R, 197 H, 38 2B, 10 3B, 54 HR, 154 RBI, 14 SB, 153 BB, .375 BA, .518 OBP, .793 SLG, 1.311 OPS

Well. There you have it. I'm open to any thoughts you have on my attempt to quantify a player's prime value (PV).
 

bowmanchromeandorr

New member
May 23, 2010
836
0
Race City USA
very good list. however i see one glaring omission. your math left off nolan ryan. one thing to possibly consider is teams played on. he never played on teams worth a crap. i mean how can you have the most K's and lowest ERA in the same season and lose twice as many games as you win. hitting stats can be based solely on the individual but not necessarily pitching stats. yes he had right around 300 losses but playing for over 20 years on crap teams will do that. not many pitchers who play on crap teams can get to 300 wins and 5200 K's... overall, good list though and you will get people who will say to take off bonds and people like that for the steroid issues.
 

hofautos

New member
Aug 29, 2008
6,678
0
I read the post before I read the author. After reading it, I had to go check the author.
I don't know where you have been, but, I have only read 3 of your posts now, and have enjoyed all 3.
Welcome aboard, and don't be so shy :D

Back to your thread.
I love stats too! And I never have really looked closely at Sandy's stats...i just knew from what I have heard that he was one of the best, but that it was just for a short period...
So do you have all your data in a database or spreadsheet and using macros for your calculations?
It's interesting to see who was best over a short period, but IMHO longevity should play a VERY IMPORTANT role in "who is great". (I would prefer to see the same breakdown over 10 years vs 5 years)

I have always liked Win Shares, but I tried Bill James's pay site, and IMHO it was terrible. I just want a sortable database or spreadsheet, and I couldn't find what I was looking for.
Any chance you can put together a Win Shares spreadsheet or database in the same manner you did these statistics?
 
Guest
bowmanchromeandorr said:
very good list. however i see one glaring omission. your math left off nolan ryan. one thing to possibly consider is teams played on. he never played on teams worth a crap. i mean how can you have the most K's and lowest ERA in the same season and lose twice as many games as you win. hitting stats can be based solely on the individual but not necessarily pitching stats. yes he had right around 300 losses but playing for over 20 years on crap teams will do that. not many pitchers who play on crap teams can get to 300 wins and 5200 K's... overall, good list though and you will get people who will say to take off bonds and people like that for the steroid issues.

My system has Nolan Ryan rated as the 29th best pitcher of the modern era (post-1900). Ryan is one of the ultimate compilers: he had an extremely long career and benefits from high counting statistics (such as career wins and strikeouts).

However, this system strictly measures players by their five best seasons. Here is a breakdown on Nolan Ryan.

Ryan was a wild pitcher. On some days he was the best, on other days he was just downright bad. His strikeout numbers are great, but what kills Ryan is the number of walks he allowed. His WHIP (walks plus hits per inning pitched) is high. If you're putting that many runners on base, it greatly devalues the benefit of a strikeout.

When you compare Ryan at his peak against others at their peak, the difference is clear.

Nolan Ryan (1972, 1973, 1974, 1977, 1987)
5yA: 16 W, 14 L, 2.73 ERA, 17 CG, 4 SHO, 258 IP, 173 H, 90 R, 78 ER, 13 HR, 144 BB, 300 SO, 1.232 WHIP

Steve Carlton (1969, 1972, 1977, 1980, 1982)
5yA: 21 W, 9 L, 2.44 ERA, 17 CG, 4 SHO, 271 IP, 216 H, 83 R, 73 ER, 16 HR, 82 BB, 238 SO, 1.101 WHIP

I'll take Carlton and the 62 fewer walks, even if it means giving up 62 strikeouts. Carlton's also letting fewer runs get across.
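
To check the rate stats in those two lines, here is a rough Python sketch using the standard definitions: WHIP is walks plus hits divided by innings pitched, and ERA is earned runs times nine divided by innings pitched. The inputs are the rounded five-year averages listed above, so the results land close to, but not exactly on, the listed figures, which were presumably computed from unrounded totals. The function names are my own, and innings are treated as plain decimal numbers.

```python
def whip(hits: float, walks: float, innings: float) -> float:
    """Walks plus hits per inning pitched."""
    return (hits + walks) / innings


def era(earned_runs: float, innings: float) -> float:
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / innings


# Rounded five-year averages copied from the two lines above.
ryan = {"H": 173, "BB": 144, "ER": 78, "IP": 258}
carlton = {"H": 216, "BB": 82, "ER": 73, "IP": 271}

for name, p in (("Ryan", ryan), ("Carlton", carlton)):
    print(name,
          round(whip(p["H"], p["BB"], p["IP"]), 3),
          round(era(p["ER"], p["IP"]), 2))
```

Ryan allows fewer hits per inning than Carlton, but the extra walks more than cancel that out in WHIP.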
 

uniquebaseballcards

New member
Nov 12, 2008
6,783
0
Great stuff! It's interesting to see that Ryan has a better career WAR than Carlton. It's also surprising to note that Ryan turned it around and actually led his league in WHIP - twice - although most people remember how wild he was early in his career.
 
Guest
hofautos said:
I read the post before I read the author. After reading it, I had to go check the author.
I don't know where you have been, but, I have only read 3 of your posts now, and have enjoyed all 3.
Welcome aboard, and don't be so shy :D

Back to your thread.
I love stats too! And I never have really looked closely at Sandy's stats...i just knew from what I have heard that he was one of the best, but that it was just for a short period...
So do you have all your data in a database or spreadsheet and using macros for your calculations?
It's interesting to see who was best over a short period, but IMHO longevity should play a VERY IMPORTANT role in "who is great". (I would prefer to see the same breakdown over 10 years vs 5 years)

I have always liked Win Shares, but I tried Bill James's pay site, and IMHO it was terrible. I just want a sortable database or spreadsheet, and I couldn't find what I was looking for.
Any chance you can put together a Win Shares spreadsheet or database in the same manner you did these statistics?

I believe in posting only when I feel I have something new or noteworthy to contribute to a discussion. As a matter of preference I try not to post anything that basically agrees with or restates what someone has already said. It just mucks up a conversation and clouds the original point(s) being made.

I originally considered a ten-year system, but found that it did not agree with contemporary accounts of a player's abilities. This may be an unpopular opinion, but I think five seasons averaging MVP-type numbers is greater than ten seasons of averaging All-Star-type numbers. Every player on this list averaged MVP-type numbers for the five seasons included in my data.

To me 'longevity' is not something that can be quantified. When games played is a variable it becomes increasingly difficult to measure one player against another. By making the games played a constant I was able to compare them more effectively. Milestones like 500 home runs, 3,000 hits, 300 wins, and 3,000 strikeouts got thrown out the window, so it's important not to factor them into your opinion of a player when evaluating this list. I think of players like DiMaggio, Williams, and Feller, who lost time to the war; Jackie Robinson, who burst into the league with amazing talent and then had a sharp decline, but lost several seasons to segregation; Gehrig, who retired due to illness just short of 500 home runs; and Hornsby, who could have tacked on 3,000 hits had anyone in his era valued those numbers.

Again, my thesis: 5 MVP type seasons is greater than 10 All-Star type seasons.

Edit: Yes. I do have a spreadsheet database. It currently includes the members of the baseball Hall of Fame, the nominees to the All-Century Team who are not in the HOF, and members of the New York Yankees, New York Giants, and Brooklyn Dodgers.
 

hofautos

I guess it depends on what you are really trying to get out of it... e.g. who was the best in their prime or who was the best over time. If I had my choice of having one player but I had to start him for 10 years, I would prefer one that was consistently great for 10 years over one that was a little better but slid downhill fast, for whatever reason. I appreciate your effort though and they are interesting numbers.
 

nosterbor

Well-known member
Jun 20, 2010
6,383
669
Sunny Florida
bowmanchromeandorr said:
very good list. however i see one glaring omission. your math left off nolan ryan. one thing to possibly consider is teams played on. he never played on teams worth a crap. i mean how can you have the most K's and lowest ERA in the same season and lose twice as many games as you win. hitting stats can be based solely on the individual but not necessarily pitching stats. yes he had right around 300 losses but playing for over 20 years on crap teams will do that. not many pitchers who play on crap teams can get to 300 wins and 5200 K's... overall, good list though and you will get people who will say to take off bonds and people like that for the steroid issues.
that is the reason i HATE the WAR stats!
Example: a player with these stats as a 162-game average (note: this is the average of all hitting stats over the player's 16-year career):
AB=629 RUNS=102 HITS=186 2B=37 3B=2 HOME RUNS=42 RBI=135 BATTING AVG=.295 SLUG=.561 OPS=.904
This player ranks 18th all time in SLUG,
15th all time in HR per AB,
and the most impressive stat? 3rd all time in RBI per game. Ruth is 1st, Gehrig is 2nd.

this player's WAR rank is 239th all time!!!! you have got to be freaking kidding me!!!!!!
 
Guest
hofautos said:
I guess it depends on what you are really trying to get out of it... e.g. who was the best in their prime or who was the best over time. If I had my choice of having one player but I had to start him for 10 years, I would prefer one that was consistently great for 10 years over one that was a little better but slid downhill fast, for whatever reason. I appreciate your effort though and they are interesting numbers.

Here is Koufax vs. Koufax.

Sandy Koufax (1961, 1963, 1964, 1965, 1966)
5yA: 21 W, 7 L, 2.15 ERA, 19 CG, 6 SHO, 259 IP, 186 H, 71 R, 62 ER, 18 HR, 64 BB, 268 SO, 0.962 WHIP

Sandy Koufax (1957-1966)
10yA: 17 W, 9 L, 2.70 ERA, 14 CG, 4 SHO, 227 IP, 169 H, 77 R, 69 ER, 20 HR, 78 BB, 239 SO, 1.086 WHIP

Comparing the two lines, you'll notice the gap in WHIP. The extra walks really hurt. Over the ten-year span he allows more runs to score, doesn't go as deep into games, strikes out fewer, wins less, and loses more. This is because 1957-1960 were not standout seasons. The numbers are still good, but it's not the crazy 2.15 ERA and 0.962 WHIP worthy of the way people worship him.
 

AndruwHRJones

New member
Aug 9, 2008
1,187
0
bowmanchromeandorr said:
very good list. however i see one glaring omission. your math left off nolan ryan. one thing to possibly consider is teams played on. he never played on teams worth a crap. i mean how can you have the most K's and lowest ERA in the same season and lose twice as many games as you win. hitting stats can be based solely on the individual but not necessarily pitching stats. yes he had right around 300 losses but playing for over 20 years on crap teams will do that. not many pitchers who play on crap teams can get to 300 wins and 5200 K's... overall, good list though and you will get people who will say to take off bonds and people like that for the steroid issues.

I actually think the fact that Ryan played for 20 plus years gives him a better shot at being higher on the list? If the 5 best years out of 20 plus aren't good enough to be in the top 50, then I agree, he shouldn't be in the top 50.
 

pigskincardboard

New member
Nov 4, 2009
5,444
0
Toronto
For the person looking for a database, you cannot beat the FREE! Lahman database. It's in SQL format, so you can do whatever you please; you just have to understand JOINs.
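
As a rough idea of what a JOIN looks like here, this is a small self-contained Python sketch using the built-in `sqlite3` module. The table and column names (`People`, `Pitching`, `playerID`, `yearID`, and so on) follow my recollection of the Lahman layout and should be treated as assumptions, and the rows are invented placeholders rather than real players.

```python
import sqlite3

# In-memory toy database; schema is an assumed, cut-down imitation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (playerID TEXT PRIMARY KEY, nameFirst TEXT, nameLast TEXT);
CREATE TABLE Pitching (playerID TEXT, yearID INTEGER,
                       W INTEGER, L INTEGER, BB INTEGER, SO INTEGER);
INSERT INTO People VALUES
  ('examp01', 'Example', 'Pitcher'),
  ('sampl01', 'Sample', 'Hurler');
INSERT INTO Pitching VALUES
  ('examp01', 1971, 20, 10, 60, 250),
  ('examp01', 1972, 18, 12, 70, 230),
  ('sampl01', 1971, 15, 15, 120, 300);
""")

# Join season lines to names, then total each pitcher's numbers.
query = """
SELECT p.nameFirst || ' ' || p.nameLast AS name,
       SUM(pi.W)  AS wins,
       SUM(pi.SO) AS strikeouts,
       SUM(pi.BB) AS walks
FROM Pitching AS pi
JOIN People AS p ON p.playerID = pi.playerID
GROUP BY pi.playerID
ORDER BY strikeouts DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The stat tables only carry a player ID, so the JOIN is what attaches a readable name to each season line.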

With that said,

If you want to find the best players in modern history according to the fan, which i think would be awesome, you should:

weight production against market size and overall baseball popularity. I'm sure you would come up with a pretty accurate list. The kind of list the MLB network would put together polling experts.
 

hofautos

I would be curious how well Pedro Martinez fared in his best 5 seasons... I think it would amaze a few people.
 
Guest
hofautos said:
I would be curious how well Pedro Martinez fared in his best 5 seasons... I think it would amaze a few people.

Pedro Martinez (1997, 1998, 1999, 2000, 2003)
5yA: 20 W, 6 L, 2.17 ERA, 7 CG, 2 SHO, 244 IP, 175 H, 67 R, 59 ER, 17 HR, 63 BB, 304 SO, 0.973 WHIP
 

hofautos

pigskincardboard said:
For the person looking for a database, you cannot beat the FREE! Lahman database. It's in SQL format, so you can do whatever you please; you just have to understand JOINs.

With that said,

If you want to find the best players in modern history according to the fan, which i think would be awesome, you should:

weight production against market size and overall baseball popularity. I'm sure you would come up with a pretty accurate list. The kind of list the MLB network would put together polling experts.

I would love to be able to get Win Shares in a sortable database. I wouldn't even mind paying for those stats... I just couldn't find it in a format I liked on Bill James's website.
I know SQL, but don't have the time to read up and work the necessary calculations into an updatable database. Bill James needs to hire new programmers for his website... maybe I will take a look again and see if they improved it.

Basically I would want to be able to see a sortable database of total Win Shares for maybe the top 200 guys over any time period I chose.
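
Something like that could be sketched in a few lines of Python, assuming the season-by-season numbers were available as (player, year, win shares) rows. The function name `top_by_window`, the row layout, and the sample figures are all hypothetical; no real Win Shares data is used.

```python
from collections import defaultdict


def top_by_window(seasons, start, end, limit=200):
    """Total win shares per player for start..end (inclusive), best first.

    seasons: iterable of (player, year, win_shares) tuples.
    """
    totals = defaultdict(int)
    for player, year, win_shares in seasons:
        if start <= year <= end:
            totals[player] += win_shares
    # Highest total first; ties are broken alphabetically by name.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical rows, not real Win Shares figures.
data = [
    ("Player A", 1950, 30), ("Player A", 1951, 28),
    ("Player B", 1950, 35), ("Player B", 1952, 10),
    ("Player C", 1951, 40),
]

print(top_by_window(data, 1950, 1951))
print(top_by_window(data, 1952, 1952))
```

Because the window is just two arguments, the same data can be re-ranked for any span of years without rebuilding anything.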
 

hofautos

Chris Levy said:
hofautos said:
I would be curious how well Pedro Martinez fared in his best 5 seasons... I think it would amaze a few people.

Pedro Martinez (1997, 1998, 1999, 2000, 2003)
5yA: 20 W, 6 L, 2.17 ERA, 7 CG, 2 SHO, 244 IP, 175 H, 67 R, 59 ER, 17 HR, 63 BB, 304 SO, 0.973 WHIP

wow, that's a 5 year average, and with those numbers he is still like 15th on the list of pitchers?
 
Guest
schmidtfan20 said:
So Joe Morgan>Pujols??

and how the heck is Warren Spahn not on this list?

what a crock.

At first glance, Pujols should outpace Morgan. However, in Morgan's five best seasons he drew 592 walks, whereas Pujols drew 474. That is an astonishing difference of 118 over the span, or about 24 each season.

In baseball's "new math" spearheaded by the aforementioned Bill James, walks have become very important in measuring a player's offensive ability. In the old math we look at home runs, hits, RBI, and average. No one ever gets excited about walks because they aren't 'sexy.' Well, math isn't sexy either. Walks are important. Very important.
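
To put a rough number on that, here is a small Python sketch using the standard on-base percentage formula, (H + BB + HBP) / (AB + BB + HBP + SF). The walk figures are the per-season averages implied by the totals above (592 / 5 and 474 / 5). The hits and at-bats are hypothetical and held equal for both hitters so that only the walks differ, and hit-by-pitch and sacrifice flies are set to zero for simplicity.

```python
def obp(hits, walks, at_bats, hit_by_pitch=0, sac_flies=0):
    """On-base percentage: times on base divided by plate appearances counted."""
    reached = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sac_flies
    return reached / chances


# Hypothetical identical hitting line; only the walks differ.
HITS, AT_BATS = 180, 550

more_walks = obp(HITS, 592 / 5, AT_BATS)    # about 118 walks a season
fewer_walks = obp(HITS, 474 / 5, AT_BATS)   # about 95 walks a season

print(round(more_walks, 3), round(fewer_walks, 3))
print("difference:", round(more_walks - fewer_walks, 3))
```

With the same hits and at-bats, the extra two dozen walks a season show up directly as a higher on-base percentage.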
 
