Oct 20, 2018

Most Promising Batsman - Find Out Mathematically

Cricket enthusiasts often rely on batting averages to gauge a batsman's skill, but this widely used metric has its drawbacks. This blog introduces a game-changing statistic, the batting index, which combines the batting average with the standard deviation to measure both the volume and the consistency of runs scored.

When comparing different batsmen, the statistic that is invariably brought out is the batting average. It is a fair enough indicator of a batsman's ability too, for it suggests the number of runs he scores per dismissal - Brian Lara makes 53 runs per dismissal to Ramnaresh Sarwan's 40, hence Lara is clearly a superior batsman.

While the usefulness of the average is inarguable, it has its limitations. For instance, it says nothing about a player's consistency: a batsman who scores 0, 200, 25 has exactly the same average (75) as one who makes 70, 80, 75, though it's obvious which of the two has been more consistent.

Enter a statistical tool called the standard deviation. As the name suggests, this measure indicates how far the numbers in a sequence typically deviate from their average.
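For readers who want the definition spelled out, here is a sketch of the formula. It assumes the population form, which divides by the number of innings n; the sample form divides by n - 1 instead. The post does not say which form was used for the career figures, but the worked figures for the short sequences in this post agree with the population form. The symbols x_i (the score in innings i) and n are notation introduced here for illustration.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}

% Worked example for the scores 70, 80, 75 (mean 75):
\sigma = \sqrt{\frac{(70-75)^2 + (80-75)^2 + (75-75)^2}{3}}
       = \sqrt{\frac{50}{3}} \approx 4.08
```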

The problem with an average is that if one of your legs is in an oven and the other in a freezer, on average you are comfortable. This is an important lesson in statistics: an average means little when the standard deviation is very high. If a batsman scores 200 in one innings and is out for a duck in each of the next three, he still has an excellent average of 50, but he is not consistent; his standard deviation is very high.

You'd obviously want greater consistency from a batsman, but check this sequence out: 16, 15, 17, 20, 22, 14, 18. Mr X is obviously extremely consistent - the standard deviation is only 2.61 - but at an average of 17.43, he isn't doing much to help the cause of his team.

In the two three-innings sequences given earlier (0, 200, 25 and 70, 80, 75), the second has a standard deviation of just 4.08, while for the first it's a whopping 88.98.
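The figures quoted so far can be checked with a few lines of Python. This is a minimal sketch that assumes every innings ends in a dismissal (so the batting average is just the plain mean) and uses the population standard deviation, `statistics.pstdev`, which is the form that reproduces the numbers in this post. The helper name `summarise` and the sequence labels are made up for illustration.

```python
from statistics import fmean, pstdev


def summarise(scores):
    """Return (mean, population SD), each rounded to two decimals."""
    return round(fmean(scores), 2), round(pstdev(scores), 2)


# Run sequences taken from the examples in this post.
sequences = {
    "erratic": [0, 200, 25],
    "steady": [70, 80, 75],
    "one big score": [200, 0, 0, 0],
    "Mr X": [16, 15, 17, 20, 22, 14, 18],
}

for label, scores in sequences.items():
    mean, sd = summarise(scores)
    print(f"{label:<14} mean={mean:6.2f}  sd={sd:6.2f}")
```

The erratic and steady sequences share a mean of 75 but have standard deviations of 88.98 and 4.08, while Mr X comes out at 17.43 and 2.61.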

A meaningful stat, then, is one that combines the batting average, which indicates the sheer volume of runs a batsman scores each time he bats, with a consistency measure that captures how much he deviates from his average score. For the purpose of this exercise, the batting average has been divided by the standard deviation to arrive at an index. The batting index is the reciprocal of another statistical term, the coefficient of variation (CV), which is defined as the ratio of the standard deviation to the mean.
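The index and its relationship to the CV can be sketched as below. The function names `batting_index` and `coefficient_of_variation` are hypothetical, and the sketch again assumes the plain mean of innings scores and the population standard deviation; how not-out innings were treated in the career figures is not stated in this post.

```python
from statistics import fmean, pstdev


def batting_index(scores):
    """Mean score divided by population SD (hypothetical helper)."""
    sd = pstdev(scores)
    if sd == 0:
        # Every score identical: the ratio is undefined.
        raise ValueError("index is undefined when all scores are equal")
    return fmean(scores) / sd


def coefficient_of_variation(scores):
    """Population SD divided by the mean; assumes a non-zero mean."""
    return pstdev(scores) / fmean(scores)


steady = [70, 80, 75]
erratic = [0, 200, 25]

print(round(batting_index(steady), 2))   # 18.37
print(round(batting_index(erratic), 2))  # 0.84

# The index is the reciprocal of the CV, so their product is 1.
print(round(batting_index(steady) * coefficient_of_variation(steady), 6))
```

With equal averages, the steadier sequence gets the far higher index, which is the behaviour the index is meant to reward.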

| Batsman | Runs | Average | SD | Batting index (Average / SD) |
|---|---|---|---|---|
| Jacques Kallis | 7940 | 56.31 | 44.54 | 1.26 |
| Allan Border | 11174 | 50.56 | 40.49 | 1.25 |
| Ken Barrington | 6806 | 58.67 | 47.36 | 1.24 |
| Jack Hobbs | 5410 | 56.95 | 46.68 | 1.22 |
| Arjuna Ranatunga | 5105 | 35.70 | 29.44 | 1.21 |
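As a quick arithmetic check, the index column can be recomputed from the Average and SD columns. The sketch below copies those two columns from the table; the helper name `index_column` is made up for illustration.

```python
# (name, average, SD) copied from the table above.
rows = [
    ("Jacques Kallis", 56.31, 44.54),
    ("Allan Border", 50.56, 40.49),
    ("Ken Barrington", 58.67, 47.36),
    ("Jack Hobbs", 56.95, 46.68),
    ("Arjuna Ranatunga", 35.70, 29.44),
]


def index_column(rows):
    """Return (name, average / SD rounded to two decimals) per row."""
    return [(name, round(avg / sd, 2)) for name, avg, sd in rows]


for name, idx in index_column(rows):
    print(f"{name:<18}{idx:.2f}")
```

The recomputed values (1.26, 1.25, 1.24, 1.22, 1.21) match the table.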

Several star players do not make the top 10: Ricky Ponting (1.13), Rahul Dravid (1.12), and Adam Gilchrist and Sourav Ganguly (both 1.10). Inzamam-ul-Haq manages an index of 1.07 and Sachin Tendulkar 1.03, both slightly better than two stalwarts from the 1980s, Sunil Gavaskar and Viv Richards (both 1.02, rounded to the second decimal).

Let's now lower the bar to 3000 runs and look at consistency alone. How many would have guessed that Shaun Pollock has the lowest standard deviation in this group? In fact, the top six are all lower middle-order batsmen who have consistently bailed their teams out in crises. Their averages aren't so impressive, but the standard deviations indicate just how consistently they have performed.

| Batsman | Runs | Average | SD |
|---|---|---|---|
| Shaun Pollock | 3406 | 31.25 | 23.44 |
| Rodney Marsh | 3633 | 26.52 | 25.91 |
| Richard Hadlee | 3124 | 27.17 | 26.31 |
| Mark Boucher | 3357 | 29.97 | 26.65 |
| Ian Healy | 4356 | 27.40 | 26.69 |
| Jeff Dujon | 3322 | 31.94 | 29.01 |

[All stats are as of 06.10.2018 and are for Test cricket only]

Quick Fact - Don Bradman's staggering average of 99.94 is widely known. However, his standard deviation of nearly 87 is also the highest (that is, the least consistent) among all batsmen with at least 3000 runs.