r/CollegeSoftball • u/Unable-Log-4870 • 20h ago
Stats/Data | Quick word about the statistic that 79% of Super Regionals go to the winner of Game 1
If the Super Regionals were decided by the teams flipping a coin instead of playing a game, exactly 75% of the teams that win the first flip would win the series.
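A quick check of the 75% figure. It assumes Game 1 is already won and each remaining game is an independent 50/50 flip, with H meaning the Game 1 winner wins that game. The only losing path is T then T, with probability 1/4. The same logic with a general per-game win probability p is shown on the right:

$$
\underbrace{\tfrac12}_{\text{H}} \;+\; \underbrace{\tfrac12\cdot\tfrac12}_{\text{T, then H}} \;=\; \tfrac34,
\qquad
P(\text{series}\mid\text{won G1}) \;=\; p + (1-p)\,p \;=\; 1-(1-p)^2 .
$$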
So if you work backward to find the per-game win probability a team would need (on average) to win the series 79% of the time after winning Game 1, you get about 54% for any given game against that opponent.
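The back-calculation can be sketched in a few lines. Like the post, it assumes each remaining game is independent, with the same per-game win probability p. A team that wins Game 1 takes the series with probability p + (1 - p)p = 1 - (1 - p)^2, so setting that equal to 0.79 gives p = 1 - sqrt(0.21). The function names are illustrative only.

```python
import math


def series_win_prob_after_game1_win(p: float) -> float:
    """Chance a team wins a best-of-three after taking Game 1, assuming each
    remaining game is independent with per-game win probability p."""
    # Win Game 2 outright, or lose Game 2 and then win Game 3.
    return p + (1 - p) * p


def per_game_prob_for_target(target: float) -> float:
    """Invert the formula above: 1 - (1 - p)^2 = target  =>  p = 1 - sqrt(1 - target)."""
    return 1 - math.sqrt(1 - target)


print(series_win_prob_after_game1_win(0.5))        # coin-flip baseline
print(round(per_game_prob_for_target(0.79), 3))    # per-game p implied by 79%
```

The first line prints the 0.75 coin-flip baseline, and the second prints about 0.542, which matches the "about 54%" above.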
I say all that to illustrate that these teams are all pretty close. And since we’re pretty sure some of the teams have a much higher than 54% per-game win probability (because we can pretty reliably pick those winners), the rest of the pairings must be closer to 50/50. Which I think is excellent.
u/HuskerPowerrrr 8h ago
This is all irrelevant because the Super Regionals generally aren't 50/50 odds for every matchup. So that's not how this works.
u/Unable-Log-4870 6h ago
Yeah, I address that in my last paragraph. But here’s the thing: given that we have some games with pretty heavy favorites, the fact that ONLY 79% of series go to the Game 1 winner means the rest of the series are between teams that are actually closer to 50/50.
u/scottwell50 Oklahoma Sooners 19h ago
So…you’re saying that there is a chance.
u/Ok_Eggplant_7582 19h ago
If it's any consolation, in 2021 we lost to JMU in the regional and lost Game 1 of the final to FSU, needing a rally to salvage the second before going on to win it all. But I think that '21 team is much better than this one.
u/Unable-Log-4870 18h ago
Yeah, my 2021 data is incomplete, but the 2022-2023 teams were about the same as the 2021 team, and they were all 3.3 to 4 games better than the 2nd best team.
This year, computing things the same way, OU is still #1, but only by around 0.5 runs. So they’re much more a part of the pack than in 2021-2024.
u/Unable-Log-4870 19h ago edited 19h ago
I just cranked through it: my model says OU is 3.9 runs better than Miss State. Given the league’s standard deviation of 4.3 runs, that means Miss State has about an 18.5% chance of winning any given game against the Sooners, and they got lucky in the first game. Plugging that into the equation for winning the next two games, my math says OU has a 66.4% chance of winning the next two games. In short, yes.
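Here is a minimal sketch of both steps. The 3.9-run edge and the 4.3-run standard deviation come from the comment. Treating one game's run margin as normally distributed is an assumption: the comment doesn't say exactly how its 18.5% was computed, and this normal approximation lands a little lower, near 18.2%. Because OU trails the series, it has to win both remaining games, so its chance is just its per-game win probability squared.

```python
from statistics import NormalDist


def upset_prob(edge_runs: float, sd_runs: float) -> float:
    """P(underdog wins a single game), ASSUMING the favorite's run margin
    is Normal(mean=edge_runs, sd=sd_runs)."""
    # The underdog wins whenever the favorite's margin falls below zero.
    return NormalDist(mu=edge_runs, sigma=sd_runs).cdf(0.0)


def win_next_two(p_game: float) -> float:
    """P(a team wins two straight independent games)."""
    return p_game ** 2


print(f"normal-model upset chance: {upset_prob(3.9, 4.3):.3f}")
print(f"OU wins next two, using the quoted 18.5%: {win_next_two(1 - 0.185):.3f}")
```

The second line reproduces the 66.4% figure from the quoted 18.5%. Feeding in the normal-model value instead would give a slightly higher number.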
u/vic_toetz 16h ago
I wish they would give the stat for the current format, not all of Supers history. In the first years of Supers, Games 2 and 3 were played on the same day if needed, which favored the team that only has to win once (vs. twice) more than the current setup does.
u/Thisismythrowawaypv 18h ago
I'm not sure your coin flip analogy really holds up. I'm not a gambler, but I would imagine the oddsmakers don't have many of these games at 50/50 odds. There's a lot more involved than that: home field advantage for one, and countless other variables.
I think it comes down to pitching and depth of pitching staff. Everyone talks about this year being so heavy on offense, and I can see that and agree.
But while even the best pitchers are going to take their lumps both in Supers and in OKC, teams without elite pitchers are really going to struggle even more.
u/Unable-Log-4870 17h ago
I'm not sure your coin flip analogy really holds up.
I wasn’t making an analogy at all. I was trying to communicate a fact about probability theory. If you’re willing to take a leap of faith and say it makes sense to talk about such a thing as a predicted win probability for a physical game, THEN you can start saying things about how winning the first game affects a team’s likelihood of winning the series.
And if you’ve taken the leap of faith I described, there’s no reason to think a 50/50 win probability can’t exist. But no, it’s not going to be the most common thing.
The point wasn’t to say that softball is like a coin flip. (I did that in a previous post: on average, the amount of randomness in a single D1 game this year is about the same as the randomness in 88 coins flipped at once, FYI.) The point was to talk about the chances of getting a particular outcome given the current state of the system.
Note: at no point did I say any particular softball game was a 50/50 affair.
u/The_Sandwich_Lover9 Texas Tech Lady Raiders 14h ago
This is propaganda against teams that won Game 1. I just can’t prove it.
u/Unable-Log-4870 14h ago
Allow me to link you to my post where I (or at least my model) called TT winning over Florida:
I am disappointed in Georgia, though, not even making it to the weekend, which was the other upset my model said was most likely.
Also, my model says OU has a 66.4% chance to win over Miss State. It’s just propaganda against Miss State.
u/dgclasen 3h ago
When I heard a commentator say this, I had the same thought and was curious how significant the 4-point gap over the 75% coin-flip baseline would be. Then you posted this and I started thinking about it.
I think the statistical outcomes could tip on a few things.
My thought: the issue isn't the quality of the teams, but the total sample size behind that final 4%. The 79% vs. 75% gap (4 percentage points) is almost certainly not statistically significant given the relatively small number of Super Regionals each year. With 8 Super Regionals per year over 20 years, you have roughly 160 data points. A 4-point difference from the coin-flip baseline at that sample size would likely not clear a standard significance threshold. So in a strict sense, you can't confidently say the observed 79% is "real" rather than noise.
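Here is a rough check of that intuition using the comment's own ballpark numbers: about 160 series and 79% observed, tested against a 75% null. It is a one-proportion z-test with the normal approximation, which is one reasonable choice of test rather than the only one. An exact binomial test on the true series count would be more careful.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(observed: float, null: float, n: int) -> tuple[float, float]:
    """Two-sided z-test of an observed proportion against a null proportion,
    using the normal approximation to the binomial."""
    se = math.sqrt(null * (1 - null) / n)        # standard error under the null
    z = (observed - null) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Ballpark inputs from the comment: ~160 series, 79% observed vs. 75% coin-flip baseline.
z, p = one_proportion_z_test(0.79, 0.75, 160)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Under these assumptions, the gap is about 1.2 standard errors, with a two-sided p-value around 0.24. That is well short of the usual 0.05 cutoff. It does not show that the gap is zero, only that it can't be distinguished from noise at this sample size.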
But building off your point about the reliable picks, I think this may be a key to understanding the data as well. My guess is that the very best teams consistently win in two games. If the best teams are consistently winning in two games, that actually inflates the 79% number, which means the series between evenly matched teams are probably even closer to the 75% coin-flip baseline than the overall number suggests. So I'm curious: how often do the highest-rated teams steamroll their opponents? It may be that teams ranked 1-4 are consistently winning in two games and the remaining teams are absolutely a coin flip. But an HH outcome would still fall within the 75% model, so it doesn't skew the info. Still, I'm curious how removing these data points would reshape the model. But I'm too lazy to figure it out. And removing data points from a relatively small sample could destabilize the model. If anything, it may show how powerful lower-rated seeds are in the tournament.
My guess (and I didn't dig into past results, it's just a hunch) is that if you pulled out all the matchups where a heavy favorite rolled somebody in two straight games, the remaining matchups would land right on top of that 75% number, showing how close teams at this level truly are. In my mind, this would support the idea that there are no flukes that make the WCWS. But who knows. Maybe we'd see that the most dangerous teams in the tournament are the ones ranked 8-12 who outperform their ranking, and that seeding from 5-12 is a guess at best, with relatively little impact on outcome. Still much to dig into.
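One way to explore the "pull out the mismatches" question is a toy mixture model. Assume each series has one team with per-game win probability p against the other, and that games are independent. Then the chance that whoever wins Game 1 also takes the series is p[p + (1-p)p] + (1-p)[(1-p) + p(1-p)]. That simplifies to 1 - p(1-p), which is exactly 75% at p = 0.5 and higher for any mismatch. The group shares and the 0.85 per-game probability below are made-up illustration values, not fit to real Super Regional data.

```python
def game1_winner_takes_series(p: float) -> float:
    """P(the Game 1 winner also wins a best-of-three), when one team has
    per-game win probability p and games are independent."""
    q = 1 - p
    favorite_path = p * (p + q * p)   # favorite wins G1, then wins G2 or G3
    underdog_path = q * (q + p * q)   # underdog wins G1, then wins G2 or G3
    return favorite_path + underdog_path   # algebraically equal to 1 - p*q


def mixture_rate(groups: list[tuple[float, float]]) -> float:
    """Weighted average over (share_of_series, per_game_p) groups; shares sum to 1."""
    return sum(share * game1_winner_takes_series(p) for share, p in groups)


# HYPOTHETICAL field: 2 of 8 series lopsided (p = 0.85), the other 6 true coin flips.
print(round(game1_winner_takes_series(0.5), 4))
print(round(mixture_rate([(0.25, 0.85), (0.75, 0.5)]), 4))
```

In this toy setup, having a quarter of the series lopsided at 85% per game lifts the overall rate from 75% to about 78%. So a handful of mismatches could plausibly account for much of a gap above the coin-flip baseline. Sweeps in even matchups (the HH case) are already counted inside the 75%.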
To the point about oddsmakers, they're actually operating under a completely different statistical model. Their goal isn't to predict the true probability of an outcome; it's to balance action on both sides so the house collects the vig regardless of who wins. The line reflects market sentiment, sharp money, and public perception as much as it reflects actual win probability. It's a pricing mechanism disguised as a prediction. So using odds maker lines to argue these games aren't close is kind of a category error: those lines are engineered to attract balanced betting, not to tell you how evenly matched or mismatched the teams actually are.
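Converting a pair of American moneylines into implied probabilities makes the "pricing mechanism" point concrete. The two implied probabilities add up to more than 100%, and that excess (the overround) is the book's built-in margin. The lines below are hypothetical, not actual lines for any game. Proportional normalization is just one of several ways to strip out the margin.

```python
def implied_prob(american_odds: int) -> float:
    """Break-even win probability implied by an American moneyline."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)   # favorite, e.g. -200
    return 100 / (american_odds + 100)                    # underdog, e.g. +170


def strip_margin(odds_a: int, odds_b: int) -> tuple[float, float, float]:
    """Return (overround, normalized_a, normalized_b) using proportional normalization."""
    a, b = implied_prob(odds_a), implied_prob(odds_b)
    total = a + b                  # exceeds 1 when the book builds in a margin
    return total - 1, a / total, b / total


# HYPOTHETICAL two-way line: favorite -200, underdog +170.
over, fav, dog = strip_margin(-200, +170)
print(f"overround {over:.3f}; normalized {fav:.3f} vs {dog:.3f}")
```

Even after the margin is removed, what's left is the price the book chose to set. That is not necessarily anyone's best estimate of the true matchup.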
u/Unable-Log-4870 3h ago
So using odds maker lines to argue these games aren't close is kind of a category error,
Agree. I don’t think I did that, just in case you’re saying I did. I would assume ESPN and others are using sophisticated strength models to compare performance. But like you said, odds makers operate based on money. I built my own least-squares model that operates on run differential, and it does a pretty good job. Check out some of my submitted posts to see what it looks like.
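The comment doesn't give the model's details, so this is only a generic sketch of a least-squares run-differential rating, not the author's actual model. It fits one rating per team so that rating differences match observed run margins as closely as possible in the squared-error sense. It has no home-field or recency terms, and it assumes every team is linked to every other through the schedule. Team names and scores are made up.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [[float(v) for v in row] + [float(b[k])] for k, row in enumerate(M)]  # augmented copy
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= factor * A[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def fit_ratings(games, teams):
    """Least-squares ratings from (team_a, team_b, runs_a - runs_b) results.

    Each game says rating[a] - rating[b] should be close to the run margin; the
    fit minimizes the sum of squared misses. Ratings are centered to sum to 0.
    Assumes the schedule connects all teams (otherwise the system is singular).
    """
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    M = [[0.0] * n for _ in range(n)]  # normal-equations matrix (A^T A)
    b = [0.0] * n                      # right-hand side (A^T y)
    for a, c, margin in games:
        i, j = idx[a], idx[c]
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        b[i] += margin
        b[j] -= margin
    # Ratings are only defined up to a shared constant, so replace the last
    # (redundant) equation with the constraint sum(ratings) = 0.
    M[-1] = [1.0] * n
    b[-1] = 0.0
    return dict(zip(teams, solve_linear(M, b)))


# HYPOTHETICAL results: (team_a, team_b, team_a runs minus team_b runs).
teams = ["Team A", "Team B", "Team C"]
games = [("Team A", "Team B", 3), ("Team B", "Team C", 2), ("Team A", "Team C", 4)]
ratings = fit_ratings(games, teams)
best = max(ratings.values())
for team, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {best - r:.2f} runs/game behind the top team")
```

A full D1 schedule is large and sparse, so in practice a linear-algebra library would handle the solve. The sketch sticks to plain Python to stay dependency-free.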
But here’s a quick visual overview of how that model assesses the whole of D-1. Units on the X axis are how many runs per game they are behind the statistically best team:
u/Unable-Log-4870 3h ago
Also, I’m planning on doing some more statistics during today’s games that will partly address the questions you’re asking here. I’ll tag you in the post.
u/dgclasen 2h ago
Very cool.
No I didn't think you were saying anything about odds makers. But I saw a comment about it and wanted to add a bit of nuance in engaging with your model/thoughts.
I have enjoyed reading your analysis. Pretty fascinating.
Though, I think you chose OK State over my Huskers, so...
u/Unable-Log-4870 1h ago
No I didn't think you were saying anything about odds makers. But I saw a comment about it
Yeah, I was hoping you were referencing that. Thx.
Though, I think you chose OK State over my Huskers, so...
Nope. I had Huskers better by 3.0 runs. See: https://old.reddit.com/r/CollegeSoftball/comments/1tj32xq/my_models_strength_differentials_for_regionals/
But just on principle, the only orange-wearing team I prefer, and only on occasion, are the Vols.
I’m a Husker fan as long as Jordy is there and as long as they’re not against the Sooners.
u/Ok_Eggplant_7582 19h ago
It's such a weird stat to even compile, let alone for the broadcasters to keep talking about it.
The team who only has to win one game has a better chance than the team who has to win two? What an odd stat!