Two years ago, I wrote a blog about the importance of the toss in determining the result of County Championship games. It turned out that it wasn't important: in fact, it had no significant effect. Nor did batting first, or whether you were at home or away. Basically, none of the things we obsess about at the beginning of the match make any discernible difference.
In the light of the ECB’s decision to scrap the toss in LVCC games next season, this has become quite a hot topic. Harry Gurney (@gurneyhf), Nottinghamshire and England fast bowler, tweeted his opinion:
IMO scrapping the toss and allowing the away team to have the option is a great idea. Not only would it encourage good cricket wickets. But also take luck out of the equation. In a county season it is perfectly possible that a captain could lose more than 75% of tosses.
He's right: in a season of 16 games, the probability of a captain losing the toss 12 or more times is about 4% (from basic binomial distribution theory, which I have learnt many times and yet still have to work out from first principles every time; call myself a statistician?). Given that there are 18 counties, the probability that none of them loses the toss 12 or more times in a season is 49%, just under half. In other words, it is slightly more likely than not that at least one captain will.
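For anyone who, like me, has to rederive this every time, here is a minimal Python sketch of the calculation. It assumes every toss is an independent 50:50 event and treats the 18 counties as independent of one another, which is only approximately true, since every toss has a winner and a loser from two different counties.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): sum the binomial probabilities from k to n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

GAMES = 16     # championship matches per county per season
COUNTIES = 18  # first-class counties

# Chance that one captain loses 12 or more of their 16 tosses
p_one = prob_at_least(12, GAMES)

# Chance that no county does so, assuming (approximately) independent counties
p_none = (1 - p_one) ** COUNTIES

print(f"one captain: {p_one:.3f}")  # one captain: 0.038
print(f"no captain:  {p_none:.3f}")  # no captain:  0.494
```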
But I thought we established in my last blog that the result of the toss doesn’t affect the outcome of the game? I sent a link to this to Harry Gurney, and he replied,
hmm not a big enough sample for me, wonder if there are any studies with more data?
It's a very good point. My analysis was attempting to reject the null hypothesis that the toss has no effect (i.e. to show that if you know the result of the toss, you have a better idea of the result of the game than if you don't), and I couldn't do this. That might be because the toss doesn't have an effect (there is no "signal"), or because there is not enough "power" to detect the signal from the "noise" (random variation). The way to increase power is to increase the number of samples you take. This makes sense intuitively: if you want to know how good a batsman is, and you see his score in one innings, he might have made a duck or a hundred, but either way it doesn't give you a lot to go on. On the other hand, if you see all the scores he's made in his career, you've got a much better chance of judging how good a batsman he is. That's because having a lot of samples allows you to better discern a signal from all the noise.
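To see how power grows with sample size, here is a minimal Monte Carlo sketch in Python. The 45% and 55% probabilities are purely hypothetical, chosen only to stand in for a modest "signal", and the test is a standard two-proportion z-test rather than the model I actually used. It simulates many imaginary studies of a given size and counts how often the difference is detected at p < 0.05.

```python
import math
import random

def two_proportion_z_pvalue(wins_a, n_a, wins_b, n_b):
    """Two-sided p-value for a difference in proportions (pooled normal approximation)."""
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # every outcome identical: no evidence of a difference
        return 1.0
    z = (wins_a / n_a - wins_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulated_power(p_a, p_b, n_per_group, alpha=0.05, trials=1000, seed=1):
    """Fraction of simulated studies in which the true difference is detected."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        wins_a = sum(rng.random() < p_a for _ in range(n_per_group))
        wins_b = sum(rng.random() < p_b for _ in range(n_per_group))
        if two_proportion_z_pvalue(wins_a, n_per_group, wins_b, n_per_group) < alpha:
            detected += 1
    return detected / trials

# Hypothetical true probabilities of 45% vs 55%; only the sample size changes
for n in (20, 100, 500):
    print(n, simulated_power(0.45, 0.55, n))
```

The true difference is the same in every run; only the number of games changes, and the detection rate should climb steeply as the sample grows, just as a whole career tells you more than one innings.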
So my wonderful friend MJ (@SubtleKnife00) went to the Cricinfo archive and dug out all the results from both divisions of the County Championship in the last two years: 2014 and 2015, which I added to my existing data for 2013. That gave us 431 games (one was rained off without a toss), of which 259 had a positive outcome (i.e. not a draw – this will be important later).
Last time I just did a simple chi-square analysis of probability of winning compared to losing. This time I decided to do a slightly more complicated Generalised Linear Model, because it means I don’t have to worry so much about repeatedly sampling the same population, and I can also look for interactions between variables. I made two models, one testing what factors are correlated with whether or not the game has a positive result, and the other looking at what affects whether the home team wins or loses, given that it isn’t a draw.
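The core idea of comparing models can be sketched in plain Python for the simplest case: a binomial GLM (logistic regression) with a single two-level factor, such as whether the home team won the toss. This is an illustrative sketch, not the code I actually ran. It uses a likelihood-ratio (deviance) test, a standard way to compare nested binomial GLMs, rather than the F-tests reported below. With only one two-level factor, the fitted probabilities are simply the two group proportions, so no iterative fitting is needed.

```python
import math

def _max_loglik(k, n):
    """Binomial log-likelihood at the fitted proportion k/n.

    The binomial coefficient is omitted because it cancels when two
    nested models are compared on the same data.
    """
    p = k / n
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if k < n:
        total += (n - k) * math.log(1 - p)
    return total

def lr_test_one_factor(k_a, n_a, k_b, n_b):
    """Compare a binomial GLM with and without one two-level factor.

    Null model: both groups share one probability.
    Full model: each group has its own probability (the group proportions).
    Returns the drop in deviance and its chi-square (1 df) p-value.
    """
    ll_null = _max_loglik(k_a + k_b, n_a + n_b)
    ll_full = _max_loglik(k_a, n_a) + _max_loglik(k_b, n_b)
    deviance_drop = max(2 * (ll_full - ll_null), 0.0)  # guard against tiny negative rounding
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(deviance_drop / 2))
    return deviance_drop, p_value

# Made-up counts, purely for illustration (NOT the real data):
# 30 draws in 100 games in one group versus 45 draws in 100 games in the other
deviance, p = lr_test_one_factor(30, 100, 45, 100)
print(f"deviance drop = {deviance:.2f}, p = {p:.3f}")
```

A small p-value would mean that knowing the factor genuinely improves the prediction; a large one means the simpler model, with a single shared probability, does just as well.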
Question 1: If I know the result of the toss, whether the home or away team batted first, the year, and the division, do any of these help to improve my prediction of whether there will be a positive result in the game?
Answer 1: 172 of the 431 games ended in draws (40%), so my best guess with no other information is that any given game has a 40% chance of being drawn. Knowing the result of the toss doesn’t improve my prediction (all Generalised Linear Models with binomial error structure; F(1,429)=1.48, p=0.224), nor does knowing which team batted first (F(1,429)=1.13, p=0.289), the year (F(1,429)=2.83, p=0.0933), or the division (F(1,429)=2.35, p=0.126) – so the probability of drawing the game is not significantly different whether you are in the first or second division, and doesn’t differ between years. I also tested all the possible interactions between these four factors – for example, is the probability of drawing higher in division 2 in 2013 but lower in 2014? – and none of these were significant either (using stepwise model simplification; stats on demand). So who wins the toss doesn’t make a difference to the likelihood of drawing – that’s not very surprising.
Question 2: If I know the result of the toss, whether the home or away team batted first, the year, and the division, do any of these help to improve my prediction of who will win the game, assuming there is a positive result?
Answer 2: For this analysis, I only used the 259 games that were not draws. 138 of these (53%) were won by the home team. Once again, knowing the result of the toss (Generalised Linear Models with binomial error structure again; F(1,257)=0.397, p=0.529), who batted first (F(1,257)=2.85, p=0.0925), the year (F(1,257)=0.0552, p=0.814) or the division (F(1,257)=0.557, p=0.456) doesn't improve our guess of who is going to win, nor do any of the interactions (stats on demand). So we can conclude that there is no evidence that winning the toss changes your likelihood of winning the game. Finally, the home team's 53% win rate is not significantly different from the 50% you'd expect by chance (t-test, t(258)=1.06, p=0.292).
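Because each game here is coded as a 1 (home win) or 0 (away win), the one-sample t statistic can be recovered from the two counts alone. Here is a minimal Python sketch that uses only the 138 and 259 quoted above. The standard library has no t distribution, so as an assumption of convenience it approximates the two-sided p-value with the normal distribution, which is very close at 258 degrees of freedom.

```python
import math

def one_sample_t_from_counts(successes, n, mu0=0.5):
    """One-sample t-test of a 0/1 outcome against a hypothesised mean mu0."""
    mean = successes / n
    # Sample variance of 0/1 data with the n - 1 denominator: n * p * (1 - p) / (n - 1)
    var = n * mean * (1 - mean) / (n - 1)
    se = math.sqrt(var / n)
    t = (mean - mu0) / se
    # Normal approximation to the two-sided p-value (adequate for large df)
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, n - 1, p_approx

t, df, p = one_sample_t_from_counts(138, 259)
print(f"t({df}) = {t:.2f}, p = {p:.3f}")
```

This reproduces t(258) = 1.06, and the normal approximation gives a p-value within a few thousandths of the reported 0.292.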
But what about the power of this test? We were worrying about that earlier: is our three-year sample big enough to detect a signal amongst the noise? Well, I tried to do a basic power analysis on the size of this dataset, failed, and got the amazing Xav (@XavHarrison) to fix my code. He calculated that, with 431 data points and a significance threshold of p < 0.05, I had an 80% chance of detecting a difference of 15 percentage points in the probability of a draw (so, for example, if the probability of a draw was 40% when the home team lost the toss and 55% when they won it, I would have an 80% chance of detecting that with this sample size). Likewise, with 259 data points, I had an 80% chance of detecting a difference of 17 percentage points in the probability of a win, again at p < 0.05. This means that if winning the toss has a fairly big effect, we would probably have spotted it, but the power is still low enough that I'd like to go back and increase my sample size, so I may well revisit this in the future!
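As a rough guide to how much more data a smaller effect would need, here is a minimal Python sketch of the textbook normal-approximation sample-size formula for comparing two independent proportions. This is not the calculation Xav ran: it assumes two equal-sized groups and a simple two-proportion test rather than the full model, so its numbers need not match the figures above. The draw rates in the loop are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate games needed in each group to detect p1 vs p2 with a two-sided test.

    Textbook normal-approximation formula for two independent proportions
    (unpooled variance), rounded up to a whole number of games.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical effects: a 40% draw rate in one group versus a higher rate in the other
for p2 in (0.55, 0.50, 0.45):
    n = n_per_group(0.40, p2)
    print(f"40% vs {p2:.0%}: about {n} games per group, {2 * n} in total")
```

Because the difference appears squared in the denominator, halving the effect you want to detect roughly quadruples the number of games required, which is why small effects of the toss would take many more seasons of data to pin down.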