I came across this post from the official handle of the Indian National Congress party (~12M followers). The post highlights that as many as 8 NDA candidates got votes close to 100,000, and claims that this “unusual” pattern indicates a “Khel” or some kind of voting fraud.

A cluster of results inside a 1000 vote range sounds suspicious, and political parties often frame such patterns as fraud. But statistics tells us whether such clustering is actually unusual or simply what we should expect given the underlying data and normal variations.
The real question is: What is the probability that in 243 constituencies, several winning candidates fall inside the same 1000-vote band (e.g., 100K–101K)?
Intuition: We know that in state elections in a state like Bihar, the average number of votes polled in a constituency is ~200K. In any one constituency, what are the chances of the winning candidate getting between 100K and 101K votes? Imagine we know this probability somehow and call it p. The problem then becomes a simple binomial distribution problem, which is enough to get a sense of the answer.
Binomial Distribution Refresher:
For the uninitiated, the binomial distribution is used to calculate the probability of achieving a specific number of successes (k) in a fixed number of independent trials (n), where each trial has the same probability of success (p).
P(X = k) = nCk . p^k . (1-p)^(n-k)
(Ignore the formula if this looks unfamiliar, and follow the argument)
So, imagine a tennis player has a 10% chance (p=0.1) of hitting an ace on any given serve. What are the chances he wins the game immediately by serving 4 consecutive aces (i.e., k=4 successes in n=4 trials)?
Answer = 4C4 . (0.1)^4 . (1-0.1)^(4-4) = 0.0001 (a 1 in 10,000 chance in any given game)
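For readers who prefer code to formulas, here is a minimal sketch of the same calculation in Python. The function name binomial_pmf is an illustrative choice; math.comb is the standard library's "n choose k".

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Tennis example from the text: 4 aces in 4 serves, 10% chance per serve.
four_aces = binomial_pmf(k=4, n=4, p=0.1)
print(f"P(4 aces in 4 serves) = {four_aces:.4f}")  # prints 0.0001
```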
How do we estimate p in this case:
Back to the question at hand: How likely was it for 8 candidates to have winning votes in such a tight range (within 1000 votes of each other)?
To answer, let’s analyze the actual data for the total number of votes polled in each constituency, as well as the % of votes garnered by the winning candidate in each constituency. As expected, both are broadly normal, bell-shaped distributions.


Nothing abnormal till here.
Below are the stats for both distributions (rounded to sensible precision):
Number of Votes:
Mean = 206500
Std Dev = 23000
Winning Vote %:
Mean = 0.48
Std Dev = 0.05
Now, we will use these theoretical normal curves. Assume that the total votes polled and the % of votes polled for the winning candidate are two independent random variables (a reasonable assumption), and use them to calculate the theoretical distribution of total votes for the winning candidate.
I used a Monte Carlo simulation with 1 million runs to calculate this distribution. Each run draws a random number from the 1st distribution to represent the total votes polled in a constituency, and a random winning vote % from the 2nd distribution. The product of these two numbers is the number of votes received by the winning candidate.
If you are interested in the code or want to play around with it, feel free to visit: https://github.com/bhanu-sisodia/StateElections/blob/main/Election_Winning_Votes_prob.ipynb
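Separately from the linked notebook, here is a minimal sketch of the simulation described above, using only Python's standard library. The function names, the seed, and the reduced run count of 200,000 are assumptions made for illustration and are not taken from the author's code.

```python
import random

# Summary statistics quoted in the text above.
TOTAL_VOTES_MEAN, TOTAL_VOTES_SD = 206_500, 23_000
WIN_SHARE_MEAN, WIN_SHARE_SD = 0.48, 0.05


def simulate_winning_votes(n_runs: int, seed: int = 42) -> list:
    """Draw total votes and winning share independently, then multiply them."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    return [
        rng.gauss(TOTAL_VOTES_MEAN, TOTAL_VOTES_SD)
        * rng.gauss(WIN_SHARE_MEAN, WIN_SHARE_SD)
        for _ in range(n_runs)
    ]


def share_between(samples: list, low: float, high: float) -> float:
    """Fraction of simulated results with low <= votes < high."""
    return sum(low <= s < high for s in samples) / len(samples)


# 200,000 runs keeps this sketch quick; the article used 1 million.
winning_votes = simulate_winning_votes(n_runs=200_000)
mean_votes = sum(winning_votes) / len(winning_votes)
bucket_share = share_between(winning_votes, 100_000, 101_000)

print(f"Mean winning votes: {mean_votes:,.0f}")
print(f"Share of runs in 100K-101K: {bucket_share:.2%}")
```

The share_between helper can be pointed at any vote range, which makes it easy to check how much of the simulated distribution falls inside a band of interest.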
Below is the resulting theoretical distribution of votes received by the winning candidate in a Bihar assembly constituency (based on the total votes and winning vote % distributions):

The peak of this distribution is ~97K votes. The distribution is a narrow bell shape, which means a lot of winning votes are likely to be concentrated around this peak.
Let’s make a simplification that allows us to do a quick binomial distribution check:
Let’s take ~25 buckets of 1000 votes each in and around this peak (roughly 12 on each side). While the data is normal shaped, with the area under each bucket being ~2% at the sides and ~2.7% at the peak, let’s treat it as uniform in this narrow band. The data range we are interested in is ~85K to 110K. The total area under this range (highlighted in red) is ~60% (in other words, there is a 60% chance that the winning candidate will get between 85K and 110K votes), or 2.4% per bucket.
[Note: We are being conservative by going with this uniform distribution assumption, since, as the chart shows, the peak is at ~2.7% and is very close to the 100K votes mark. In such scenarios, a simplifying assumption that takes the more conservative route is a reasonable one and wouldn’t distort our conclusion, as you will see later.]
This 2.4% is our value of p. It says that there is about a 2.4% chance that the winning candidate in a constituency will get votes in a given 1000-vote bucket in this range.
The question we are asking is: given 243 assembly results, how many of them can be expected to fall in any one of these buckets? This is now a classical binomial problem.
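To make the setup concrete, here is a rough sketch that tabulates the binomial probabilities for n = 243 and p = 2.4%. The helper name binomial_pmf and the range of k shown are illustrative choices.

```python
from math import comb

N_SEATS = 243          # constituencies, from the text
P_BUCKET = 0.60 / 25   # ~60% of the mass spread evenly over 25 buckets


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Average number of winners we would expect to land in a single bucket.
expected_per_bucket = N_SEATS * P_BUCKET
print(f"Expected winners per bucket: {expected_per_bucket:.1f}")

# Probability of exactly k winners landing in one given 1000-vote bucket.
for k in range(13):
    print(f"P(X = {k:2d}) = {binomial_pmf(k, N_SEATS, P_BUCKET):.4f}")
```

With these inputs the expected number of winners per bucket is about 5.8, so a count of 8 sits only modestly above the average.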
Results of Binomial distribution:

So what are the chances that 7 or fewer candidates (cumulative 0 to 7) win within a given 1000-vote range? That’s 76.9%. What are the chances that more than 7 candidates end up in a bucket like this? That’s ~23.1% (the complement of the previous probability). But we have 25 such buckets, so what are the chances that at least 1 of these 25 buckets gets more than 7 candidates? Treating the buckets as roughly independent, that’s 1 - (0.769)^25, or ~100% (99.86% to be precise).
The probability that at least one of the 1000-vote buckets contains ≥8 winning candidates is ~99.86%.
What’s being highlighted as a fishy scam is, under this model, a near mathematical certainty!
Though to complete the story, the data had 11 winning candidates that polled between 100K and 101K votes (8 from the winning alliance and 3 from the opposition alliance). What are the chances of 11 or more candidates landing in at least one of these 25 buckets? That’s still a very healthy 58%!!
(This aligns closely with our Monte Carlo model expectations, further confirming that the pattern is statistically normal.)
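The figures quoted above can be reproduced approximately with a short calculation. The sketch below assumes, as a simplification, that the 25 buckets behave independently of one another, which is what raising a single-bucket probability to the 25th power implies. The function names are illustrative.

```python
from math import comb

N_SEATS = 243      # constituencies
P_BUCKET = 0.024   # chance a winner lands in one given 1000-vote bucket
N_BUCKETS = 25     # buckets between ~85K and ~110K


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min), computed as 1 minus the cumulative sum below k_min."""
    return 1.0 - sum(binomial_pmf(k, n, p) for k in range(k_min))


def prob_any_bucket(k_min: int) -> float:
    """Chance that at least one bucket holds k_min or more winners.

    Assumption: buckets are treated as independent, which is only an
    approximation because one winner cannot sit in two buckets at once.
    """
    single = prob_at_least(k_min, N_SEATS, P_BUCKET)
    return 1.0 - (1.0 - single) ** N_BUCKETS


for k_min in (8, 11):
    single = prob_at_least(k_min, N_SEATS, P_BUCKET)
    print(f">= {k_min} winners: one given bucket {single:.1%}, "
          f"any of {N_BUCKETS} buckets {prob_any_bucket(k_min):.1%}")
```

Changing P_BUCKET or N_BUCKETS shows how sensitive the conclusion is to the uniform-band simplification.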
Final Words:
Isn’t it amazing how results that look very interesting at first glance turn out to be not so interesting when we dig deeper? We, as corporate workers and as data professionals, face these challenges very often. Most of the time the story originates from our biases (things that should be great, things that should not be working, etc.), and we often cherry-pick data points to feed those biases. How come the same team won the best innovator award for the last 3 consecutive years? How come 4 out of 10 folks on a team got promoted last year while the number for the whole function was only 11 out of 50? Several such questions are thrown at us every week. Many of these questions can be tested through a binomial lens.
The binomial distribution is a strong tool for quick back-of-the-envelope calculations to figure out whether things are within the domain of being “as usual” or whether they call for a deeper study.