For those unfamiliar with Joe Peta’s groundbreaking 2013 book Trading Bases, the author is a successful financial analyst and former Wall Street trader. After he was seriously injured in a traffic accident, Peta spent part of his long and painful recovery employing his professional skills to develop a baseball wagering methodology. His book is about more than that, though, including observations about the 2008 economic meltdown and sports wagering writ large. Peta’s anecdotes alone make it worth the read: imagine being hit by a NYC ambulance and then being billed by the city for the ride to the hospital.
At its highest level, the Peta methodology uses a team’s previous-season performance, adjusted for cluster luck (a regression of OBP/SLG/ISO to arrive at “hits per run”) and WAR, together with its upcoming-season projected WAR. The resulting estimate of a team’s season win total is then used to identify and capitalize on inefficiencies between the model’s estimates and wagering lines.
Peta’s work produces two products: a season-long projection of wins (the long game) and the ability to handicap individual games through adjustments to each team’s lineup, starting pitcher, and home field. While conceptually straightforward, it is time-consuming to operate, requiring familiarity with Excel (particularly the ability to link sheets). In lieu of Peta’s regression calculation of cluster luck, I utilized FanGraphs’ calculation of BaseRuns, convinced of its utility as a proxy after reading a 2019 article at samkonmodels.com arguing it was one of a number of comparable and readily available such calculations.
The Model’s 2020 Season
When MLB management and labor finally got their collective act together, I compared the model’s projection of team wins to the Vegas line just prior to Opening Day. The table below shows that comparison, along with each team’s strength of schedule (SOS):
2020 MLB Team Lines
|Team (SOS)||Model Wins||FanDuel Line||Difference|
|Blue Jays (8.5)||29.0||27.5||1.5|
|Red Sox (14.5)||30.0||30.5||-0.5|
|White Sox (28)||30.3||31.5||-1.2|
Opening Day Concerns
In writing this article, it was instructive to look back at my notes from the previous July. At the time, there were several concerns:
- The methodology is built around WAR. Projected WARs had to be adjusted to a 60-game season, essentially cutting the original WARs down to 37% of their full-season values (60/162). How would they hold up?
- COVID impacted more than the total number of games. Articles from both Sports Illustrated and The Athletic highlighted the variance in each division’s strength of schedule. For example, the SI piece noted the Orioles as having the league’s most difficult schedule, as seen in the table above.
- Finally, would there be serious performance impacts for players who contracted COVID? Recall that Nick Markakis initially decided to opt out after hearing just how seriously sick his friend Freddie Freeman had been.
Ultimately, I decided the last thing I needed to do was start “thinking for myself.” It is one thing to use another identified calculation like BaseRuns to account for cluster luck, but it’s quite another to start tinkering to account for strength of schedule or performance impacts following contracting COVID. Instead, I addressed these concerns through conservative application of the model.
Applying the Numbers
In a 162-game season, Peta suggests that season win totals deviating from the line by more than four games represent “unrepeatable results” and are therefore worth a wager. In a 60-game season, that would be a deviation of 1.48 games (4 × 60/162). The concerns outlined above made my wagering approach even more conservative, so I only looked at teams with deviations greater than two games. That left eight clubs:
2020 Win Total Projection Outliers
Two of the outliers, the Tigers and Yankees, are good examples of the interplay between WAR and BaseRuns, arriving at win projections starkly different from those in Vegas:
- The 2020 FanGraphs Opening Day WAR suggested that the Detroit Tigers would hit better than in 2019. On top of that, the team had suffered what seemed to be a terrible case of offensive cluster luck that had cost them 50 runs.
- Conversely, the New York Yankees’ 2020 FanGraphs Opening Day WAR suggested only a modest increase from 2019 (improved pitching offset by a loss in offense), with the team’s overall 2019 cluster luck having benefitted them to the tune of 70 runs.
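The screening arithmetic in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the proration is a simple 60/162 scaling, and the three sample rows are the ones shown in the 2020 MLB Team Lines table above rather than the full 30-team list.

```python
FULL_SEASON_GAMES = 162
SHORT_SEASON_GAMES = 60


def scale_to_short_season(full_season_value, games=SHORT_SEASON_GAMES):
    """Prorate a 162-game quantity (WAR, win threshold) to a shorter season."""
    return full_season_value * games / FULL_SEASON_GAMES


def find_outliers(lines, threshold):
    """Return teams whose model wins differ from the book line by more than threshold."""
    return {
        team: round(model - book, 1)
        for team, (model, book) in lines.items()
        if abs(model - book) > threshold
    }


# Sample rows copied from the table above: (model wins, FanDuel line).
sample_lines = {
    "Blue Jays": (29.0, 27.5),
    "Red Sox": (30.0, 30.5),
    "White Sox": (30.3, 31.5),
}

peta_threshold = scale_to_short_season(4)  # four games over 162, prorated to 60
conservative_threshold = 2.0               # the stricter cutoff used in this article

print(round(peta_threshold, 2))
print(find_outliers(sample_lines, peta_threshold))
print(find_outliers(sample_lines, conservative_threshold))
```

With only these three rows, the Blue Jays clear the prorated 1.48-game bar, but no team clears the two-game cutoff.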
Long-Term Model Performance
Money management is a key part of Peta’s methodology. Drawing from the work of thoroughbred handicapper Andrew Beyer, Peta suggests reserving 10% of a bankroll for the long plays. Using a total bankroll of $2,000, that reserve is $200, which works out to eight bets on the outliers at $25 each. As seen in the table below, those eight wagers would have resulted in seven wins and a profit of 65% on the year.
Applying the Model to 2020 Outliers
The Daily Grind
While we have identified deviations in sportsbooks, those investments would take months to come to fruition. As the season unfolded, I applied those projections to individual games with the goal of uncovering daily opportunities created by the difference between the model’s projections and the oddsmakers’ lines.
This is a capital accrual methodology, the complete antithesis of what Danny Ocean said in Ocean’s Eleven:
“Play long enough, you never change the stakes, the house takes you. Unless, when that perfect hand comes along, you bet big, and then you take the house.”
If you can never wager against your favorite team, this methodology may not be for you. You must be prepared to bet on the team that does not necessarily have the better chance to win. This money management follows the tenets of the Kelly Criterion, used with great success by thoroughbred handicappers such as Beyer and Mark Cramer. The basic concept is that the wager amount increases only in relation to the perceived advantage.
The Trading Bases Methodology (Individual Game Version)
Recall that the Peta methodology is based on the utilization of a team’s previous season performance adjusted for cluster luck, WAR, and the team’s upcoming season projected WAR. Peta’s work allows for a projection of total season wins as well as the capability to handicap individual games through adjustments to each team’s lineup, starting pitcher, and home field.
For example, currently FanGraphs projects (Depth Charts) Clayton Kershaw to make 29 starts with a WAR of 3.7. But what if Kershaw could start all 162 games? He’d be worth more than 20 WAR! A similar calculation can also be made for hitters. Corey Seager is currently projected to play 151 games with a WAR of 5.4. Projecting him to play every game would increase his WAR to 5.8 on the year.
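The Kershaw and Seager figures above are straight prorations, which can be checked with a short sketch. The helper name `war_per_162` is hypothetical, and simple linear scaling is an assumption on my part; Peta’s own spreadsheet may handle this differently.

```python
def war_per_162(projected_war, projected_appearances, season_games=162):
    """Extrapolate a player's projected WAR as if he appeared in every game."""
    return projected_war * season_games / projected_appearances


kershaw = war_per_162(3.7, 29)   # 29 projected starts
seager = war_per_162(5.4, 151)   # 151 projected games

print(round(kershaw, 1), round(seager, 1))
```

This prints roughly 20.7 for Kershaw and 5.8 for Seager, in line with the numbers quoted above.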
This calculation, plus a 4% advantage to the home club, produces a win probability for each team. Converting the money line (ML) to its percentage equivalent and comparing it with the model’s win probability will uncover potential opportunities. Additionally, when utilizing a strategy based on the Kelly Criterion, the wager amount increases in relationship with the perceived advantage.
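To make the money line conversion and the Kelly idea concrete, here is a minimal sketch. The function names and the example game are hypothetical, and the staking formula is the textbook full-Kelly fraction. The 40-basis-point wagers described later in this article are far smaller than full Kelly would suggest, so the sizing actually used is clearly a more conservative variant that this sketch does not reproduce.

```python
def implied_probability(moneyline):
    """Convert an American money line to its break-even win probability."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def kelly_fraction(win_prob, moneyline):
    """Textbook full-Kelly fraction of bankroll; 0 when there is no edge."""
    # b is the net profit per unit staked if the bet wins.
    b = 100 / -moneyline if moneyline < 0 else moneyline / 100
    fraction = (b * win_prob - (1 - win_prob)) / b
    return max(fraction, 0.0)


# Hypothetical game: the model gives a team 55% at an even-money (+100) line.
model_prob = 0.55
line = 100

edge = model_prob - implied_probability(line)
stake = kelly_fraction(model_prob, line)

print(f"edge: {edge:.1%}, full-Kelly stake: {stake:.1%} of bankroll")
```

The key property is the one described above: the stake is zero when the model sees no advantage and grows as the gap between the model’s probability and the line’s implied probability widens.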
Applying the Numbers
This is a somewhat simplified description. There are also in-season adjustments accounting for what actually transpires as the season progresses (as adjusted for BaseRuns). Let’s look at two examples from the 2020 season.
On September 5th, the Reds (Anthony DeSclafani) were set to face the Pirates (Trevor Williams) at PNC Park. With over half the season in the books, the model’s projection of each team’s total runs scored and runs allowed for the year had changed twice to account for reality and changes in team WAR.
On Opening Day, the Reds were projected to score 772 runs and allow 706 on the year, good for a win percentage of 54% (normalized to 31.5 wins). By September, that had changed to 724 runs scored and 633 runs allowed, a win percentage of 56%, or 33.6 wins. Meanwhile, the Pirates were projected for 745 runs scored and 853 allowed on Opening Day (a 44% win percentage, normalized to 25.6 wins). Come September, that projection had moved to 611 runs scored, 881 runs allowed, 34% of games won, and just 20 victories on the year.
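The step from projected runs to a win percentage is not spelled out above. One common way to do it is a Pythagorean-style expectation. The sketch below assumes an exponent of 1.83, which is my assumption rather than something stated here; it does happen to reproduce the rounded percentages quoted above. The normalization to win totals (31.5, 25.6) is a separate step that is not reproduced.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Expected win percentage from runs scored and allowed (exponent is an assumption)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# (runs scored, runs allowed) projections quoted in the text above.
projections = {
    "Reds, Opening Day": (772, 706),
    "Reds, September": (724, 633),
    "Pirates, Opening Day": (745, 853),
    "Pirates, September": (611, 881),
}

for label, (scored, allowed) in projections.items():
    print(f"{label}: {pythagorean_win_pct(scored, allowed):.0%}")
```

Under that assumption, the four projections round to 54%, 56%, 44%, and 34%.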
If you made no other adjustments, a team with a 56% win percentage would be expected to defeat a team with a 34% win percentage more than 70% of the time they face each other. However, as discussed above, the Peta model adjusts those numbers further to account for the lineup and home team. The lineup rolled out by Cincinnati on this day would, if played every day, have scored seven fewer runs over the course of the season while allowing 45 more. The Pirates’ lineup, meanwhile, would have scored an additional 22 runs while allowing just an additional three. This reduces the Reds’ projected winning chance from 70% to 67%. Folding in the fact that the Pirates are the home team, the model’s final assessment gives the Reds a 64% probability of winning.
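The unadjusted “more than 70%” figure is consistent with Bill James’s log5 formula for head-to-head matchups. Treat that as an assumption, since the formula is not named here; the sketch below covers only this first step and does not attempt the lineup or home-field adjustments.

```python
def log5(p_a, p_b):
    """Probability that team A beats team B, given each team's overall win percentage."""
    a_wins = p_a * (1 - p_b)
    b_wins = p_b * (1 - p_a)
    return a_wins / (a_wins + b_wins)


# September projections quoted above: Reds 56%, Pirates 34%.
reds_over_pirates = log5(0.56, 0.34)
print(f"{reds_over_pirates:.1%}")
```

For a 56% team against a 34% team this comes out to a little over 71%, before any lineup or home-field adjustment.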
The money line also made the Reds the favorite, just less of one than the model. Given these projected odds and the ML, applying the money management principles discussed above results in a 40-basis point wager (four-tenths of 1% of bankroll) on the Reds, who did win the game, 6-2. Take note in the table that the model’s line adds to 100%, while the ML adds up to 104%, the difference being the book’s take.
Pirates vs. Reds, 9/5/20
|Teams||Model Win %||ML||Difference|
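The book’s take can be read directly off a pair of money lines: convert each side to its implied probability and see how far the sum exceeds 100%. In the sketch below, the function names and the -200/+170 prices are hypothetical; they are not the actual 9/5/20 lines.

```python
def implied_probability(moneyline):
    """Convert an American money line to its break-even win probability."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def book_take(favorite_line, underdog_line):
    """Amount by which the two implied probabilities exceed 100%."""
    return implied_probability(favorite_line) + implied_probability(underdog_line) - 1


def no_vig_probabilities(favorite_line, underdog_line):
    """Rescale the implied probabilities so that they sum to exactly 100%."""
    favorite = implied_probability(favorite_line)
    underdog = implied_probability(underdog_line)
    total = favorite + underdog
    return favorite / total, underdog / total


# Hypothetical prices for illustration only.
take = book_take(-200, 170)
fair_favorite, fair_underdog = no_vig_probabilities(-200, 170)

print(f"book's take: {take:.1%}")
print(f"no-vig probabilities: {fair_favorite:.1%} / {fair_underdog:.1%}")
```

The larger the take, the bigger the model’s edge has to be before a game becomes playable.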
The model allows the handicapper to hunt for advantage, regardless of which team has the better chance to win. Let’s consider this game on September 8th between the Rays (Ryan Yarbrough) and the Nationals (Aníbal Sánchez) in Washington:
Nationals vs. Rays, 9/8/20
|Teams||Model Win %||ML||Difference|
The ML made the Rays a heavy favorite, and the model favored them to win as well — just less so. However, while applying the money management principles discussed above resulted in another 40-basis point wager, this time it saw advantage in the underdog Nationals, who indeed won the game, 5-3.
I ended up modeling 508 games (including playoffs) from the 2020 season. Approximately three-quarters of those games were playable, meaning there was a perceived advantage between the model’s estimate and the money line. The overall win percentage came out to be 49%, with a profit of 9%, or $156 on a $1,800 bankroll.
Applying Peta Model to 2020 Games
|Bankroll||End Amt||# Games||# Plays||Wins||Losses||No bet||Profit||Profit %||Win %|
Candidly, the model was probably capable of even better performance; it just needed a smarter human. As discussed above, the model needs to be adjusted throughout the season to account for actual results. I flat out failed to adjust correctly at the end of the first quarter, and as a result the second quarter was the only one in which the model experienced a negative return. Backing out 2Q, the model’s performance increased to a win percentage of 52% and profit of 13%.
Here are some things I learned from this maiden effort:
- I thought I was ready – I wasn’t. Prior to the start of the 2020 season, I had not done nearly enough individual game practice utilizing 2019 contests. This made the opening weeks of the season a struggle. I also had not developed the results tracking sheet, which turned out to be an enormous undertaking, especially when you are struggling to find a rhythm to modeling as many as 15 games a day.
- Even when I found a groove in terms of churning out projections, it was still a grind. It turns out major league managers have an annoying habit of working to their schedule and not releasing their lineups in what I considered a timely manner. A secondary issue was each manager’s lineup tendencies. David Ross of the Cubs was like clockwork, as at least seven of his nine starters could be penciled in even before his lineup was posted. On the flip side, Kevin Cash of the Rays was a nightmare, offering a lineup mix-and-match adventure every day. I rooted for the Dodgers in the World Series in part because Cash’s daily lineup changes drove me nuts during the season.
- Projections were only part of the battle, as staying on top of the changing money line was its own challenge. It was important to be aware of each sportsbook’s take, which can vary widely, as this January 2020 New York Post article illustrates. A 60-game season results in 900 wins, yet add up the FanDuel line and you get 907 wins. The worst example is the online system in the District of Columbia, GamebetDC, which has a whopping 8% take on individual baseball games. It’s impossible to find wagering opportunities under those circumstances.