Every time an election takes place in India (which is quite often), TV screens and newspaper pages are flooded with opinion polls, exit polls and other surveys trying to predict winners and losers. With an explosion in the number of polling agencies and election forecasters over the last decade and a half, this trend is only growing. One objective of these polls and forecasts is to make people better informed, but given the growing mistrust bred by frequently incorrect predictions, they usually end up confusing people instead. Each survey claims to be the best by citing whichever parameters suit it: the largest sample size, its past accuracy in calling elections, the years it has spent in the field and hence its grasp of the “pulse of the public”, and so on. These self-serving metrics further clutter the issue instead of clarifying which polls people should trust.
This issue flared up again recently when nearly all pollsters got the exit poll for the 2020 Bihar Assembly election wrong. Most expected the RJD-led Mahagathbandhan to win, while some called a close contest with the edge to the Mahagathbandhan over the National Democratic Alliance. The actual result, however, was a close contest with a narrow NDA victory.
Axis MyIndia, which had a consistent track record of correctly predicting elections over at least the last two years, including the 2019 Lok Sabha, had to issue a clarification for its wrong prediction, while CVoter, which had not had much success until recently, came very close to predicting this election correctly.
“Which Polls to trust?” This question has also baffled us at Dhruv Research. What are the objective parameters one can use to determine the accuracy levels of various pollsters? Are some pollsters objectively better than the others? These are the questions that led us to start working on this problem.
The first step was to collect data for as many polls as possible. For this we had to rely on data available in the public domain on the internet. We compiled exit and opinion poll data for 843 polls going back to 2005, covering 48 pollsters, with the number of polls per pollster ranging from 1 to 220. We included only those polls we could attribute to a polling agency. Most polls in the public domain are sponsored by media houses; in our methodology, such polls are attributed to the polling agencies, not the news channels. We removed the polls for which we could identify only the media channel that carried them and not the agency that conducted them, which brought our count down to 705.
Very few polls are available online for the period before 2010. However, we were able to get a considerable number of polls from 2014 onwards. Below is a detailed year-wise break-up of the number of polls.
| Election Span | # Exit/Opinion Polls in the time frame |
| --- | --- |
Now comes the question: on what parameters should we compare these polls and pollsters? One simple metric comes to mind: in how many elections did they call the winners and losers correctly? However, rating or ranking pollsters solely on the basis of calling the election winner correctly has many drawbacks.
Indian elections are far from a simple contest between a winner and a loser. There is almost always some third force that has to be taken into account. For example, of the 8 major state elections that took place in 2018 and 2019 (apart from the ones held alongside the general election), a party or pre-poll alliance gained a majority in only 4 (Chhattisgarh, Telangana, Maharashtra and Jharkhand), and even within this list, Jharkhand saw a party or pre-poll alliance win a majority for the first time. Even in states like Rajasthan and Haryana, which seemed straightforward according to most polls, the parties expected to win comprehensively failed to do so.
However, this does not mean those polls got the elections entirely wrong. Most were directionally correct: the parties expected to win did end up as the largest party in each election. The problem lies instead in the magnitude of the lead assigned to the winning party over the losing party. In other words, the margin between the top 2 parties is what most polls got wrong. The closer a poll's predicted margin is to the actual result, the better the poll.
Measuring the margin between the top two parties is a more nuanced measure than simply checking whether the winner was called correctly. Calling the winner gets tricky in races that are too close to call, where the actual difference may be smaller than the polling error. In the recent Madhya Pradesh election, INC led BJP by merely 5 seats (out of 230), and a lot of pollsters got it wrong. This is where our margin parameter comes into play: a pollster who called the election for INC with a lead of 20 seats is much further from the actual result than one who called it for BJP with a lead of 5 seats.
This, in our view, reflects public opinion more accurately, which should be the pollster's primary concern. Calling elections right or wrong is just a corollary of getting public opinion right.
We also have to acknowledge that not all elections are equally hard to predict, whether for the winner or the margin of victory. Certain elections almost every pollster has gotten massively wrong. The Chhattisgarh election of 2018 is one example: the highest predicted seat margin between INC and BJP was 34, but INC ended up getting 53 seats more than BJP. Comparing a pollster's errors only against the actual result ignores this. It would also give an undue advantage to pollsters who chose to skip polling in such hard-to-call states (for whatever reason). Hence, in our view, it makes more sense to compare an individual pollster's polling error to the average error of all pollsters in that election, rather than to the absolute result.
Here we would like to acknowledge that, while researching worldwide practices for rating or ranking pollsters, we came across an American website, fivethirtyeight.com, which uses similar parameters to rate pollsters in the US. Our idea of using these two parameters is, at some level, inspired by them. However, given the uniqueness of Indian circumstances, we have had to modify these ratings a bit.
We come back to the relevance of third forces in the Indian context. Third forces are almost negligible in US politics, so if you know the margin between the top two candidates, you can almost always infer their exact vote shares in any given election: if a poll shows the Democrats with a 5-point lead over the Republicans, vote shares will most likely be around 52% and 47% respectively. But a 5-point lead in an Indian election could mean anything from vote shares of 52%-47% to, say, 40%-35%. A framework that measures only the margin between the first two parties would call both polls equally good. Hence, in the Indian context, the margin, while a good measure, is liable to error because it can stay constant while absolute vote shares or seat shares vary widely. We needed a parameter to anchor the margin, which can otherwise remain constant while the positions of the parties change drastically.
Therefore, we decided to introduce a component measuring how accurate the pollster was in predicting the seat share/vote share of the party it predicted to win or emerge as the largest party in the election. This anchors the otherwise floating margin.
Now, with these basic concepts, let us have a look at the exact methodology that we used.
Calling the winner correctly does not by itself determine the performance of a pollster. Margin seat share and margin vote share are better metrics of polling accuracy.
SError, or Simple Error, tells us by how many percentage points a pollster was off in predicting the margin between the top two contestants. E.g., if in a national election a pollster predicted a margin of 255 seats between the top 2 contestants but the actual margin came out to be 262 seats, the SError is calculated as:
SError = (mod(255-262)/543)*100
SError = (7/543)*100 = 1.29%
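The SError calculation can be sketched in a few lines of Python; the function name and signature below are ours, for illustration only:

```python
def s_error(predicted_margin: float, actual_margin: float, total_seats: int) -> float:
    """Absolute miss on the margin between the top two contestants,
    expressed as a percentage of the total seats in play."""
    return abs(predicted_margin - actual_margin) / total_seats * 100

# Worked example from the text: predicted margin of 255 seats,
# actual margin of 262, out of 543 Lok Sabha seats.
print(round(s_error(255, 262, 543), 2))  # 1.29
```

Because of the absolute value, SError penalizes over- and under-estimating the margin symmetrically.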
As discussed previously, we contextualize a pollster's performance relative to the other pollsters working on the same election; we term this the Relative Pollster Margin Error (RPM Error). To calculate it, we average the SError at the election level (the election-level error) and compare each poll's SError against it. E.g., if poll ‘A’ had a 10% SError in a particular election and the average SError for all polls in that election was 15%, the RPM error for poll ‘A’ is -5%, indicating a performance 5 percentage points better than the average for that election. Hence, the more negative the RPM error, the better the pollster. We also had to normalize RPM across pollsters, whose poll counts range from 1 to 220, so that a pollster with only one poll does not dominate the ranking over pollsters that have done many.
Hence we normalized RPM using the following formula:
Adjusted RPM = (RPM*Number of polls done by a pollster)/(Number of polls done by a pollster+Average number of polls)
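A minimal sketch of the two steps above; the function names and the illustrative average of 15 polls per pollster are our assumptions (the real average comes from the dataset):

```python
def rpm_error(poll_s_error: float, election_avg_s_error: float) -> float:
    """Poll's SError minus the average SError of all polls in that election.
    Negative values mean better-than-average performance."""
    return poll_s_error - election_avg_s_error

def adjusted_rpm(rpm: float, polls_by_pollster: int, avg_polls: float) -> float:
    """Shrink a pollster's RPM toward zero when it has conducted few polls."""
    return rpm * polls_by_pollster / (polls_by_pollster + avg_polls)

# Example from the text: poll 'A' had 10% SError against a 15% election average.
rpm = rpm_error(10, 15)
print(rpm)  # -5

# With the same RPM, a one-poll pollster is shrunk far more toward zero
# than a prolific one (assuming an average of 15 polls per pollster).
print(round(adjusted_rpm(rpm, 1, 15), 2))    # -0.31
print(round(adjusted_rpm(rpm, 220, 15), 2))  # -4.68
```

The shrinkage means a lucky single poll cannot outrank a pollster whose good average is built on hundreds of polls.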
Further, to measure how accurate the pollster was in predicting the party in first position, we calculated the First Position Error (FPE), which tells us by how many percentage points a pollster was off in predicting the number of seats/votes for the first-placed party. E.g., if in a national election a pollster predicted the first-placed party to win 270 seats but the actual result came out to be 300 seats, the FPE is calculated as:
FPE = ((270-300)/543)*100
FPE = (-30/543)*100 = -5.52%
FPE can be negative or positive; the closer it is to 0, the better for the pollster.
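The FPE calculation, as a sketch (the function name is ours):

```python
def fpe(predicted_seats: int, actual_seats: int, total_seats: int) -> float:
    """Signed error, in percentage points, on the seat count of the party
    predicted to finish first. Negative means the pollster undershot."""
    return (predicted_seats - actual_seats) / total_seats * 100

# Worked example from the text: 270 seats predicted for the leading party,
# 300 actually won, out of 543.
print(round(fpe(270, 300, 543), 2))  # -5.52
```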
To aggregate RPM error and FPE at the pollster level, we weighted each poll by a recency factor, so that a recent poll receives a higher weight than one conducted further back in time. This rewards pollsters that may not have been accurate in the past but have improved their performance in recent years.
The final rating gives 60% weightage to Adjusted RPM and 40% weightage to the absolute value of FPE.
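Putting the pieces together, here is a sketch of the pollster-level rating. The article does not publish its exact recency weighting, so the exponential decay and half-life below are purely our illustrative assumptions; the 60/40 blend is as stated:

```python
def recency_weighted_mean(errors, years_ago, half_life=5.0):
    """Average a pollster's per-poll errors, weighting recent polls more.
    Exponential decay with an assumed half-life; the actual weighting
    scheme used in the study is not published."""
    weights = [0.5 ** (y / half_life) for y in years_ago]
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)

def final_rating(adj_rpm: float, avg_abs_fpe: float) -> float:
    """60% Adjusted RPM + 40% |FPE|; more negative is better."""
    return 0.6 * adj_rpm + 0.4 * avg_abs_fpe

# Sanity check against the seat-share table row for Jan Ki Baat:
# Adjusted RPM -5.05%, average FPE -3.30% (so |FPE| = 3.30%).
print(round(final_rating(-5.05, 3.30), 2))  # -1.71
```

The blend reproduces the published final rating for Jan Ki Baat, which suggests the 60/40 formula is applied to the table's Adjusted RPM and |FPE| columns directly.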
Another conundrum we faced was choosing between pollsters' seat share and vote share projections. Ideally, we would use the vote share projections, because that is the quantity pollsters actually measure before converting it into seat shares. The vote-share-to-seat-share conversion adds another layer of calculation that cannot strictly be considered part of polling. However, since almost every media house is fixated on seat shares, most pollsters publish only seat share data. As discussed at the beginning, while we could get seat share projections for 705 polls, the corresponding number for vote share projections was only 152.
As a start in this field, we have for now calculated the ranking of pollsters using both methodologies. Going forward, we would like to focus more on vote share projections, subject to data availability. You can see the data we used along with the calculations in this link.
Below are the tables showing the pollster level analysis:
Pollster Rating 2020:
Pollster Ranking : Basis Vote Share
| Pollster | # of polls | Elections called correctly | Average Error | RPM Error | Adjusted RPM | Average FPE | Final Rating | Rank |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
*The number of polls will not sum to 167, as we discarded any pollster with 3 or fewer polls.
Pollster Ranking : Basis Seat Share
| Pollster | # of polls | Elections called correctly | Average Error | RPM Error | Adjusted RPM | Average FPE | Final Rating | Rank |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Jan Ki Baat | 28 | 86% | 21% | -7.6% | -5.05% | -3.30% | -1.71% | 3 |
*The number of polls will not sum to 733, as we discarded any pollster with 3 or fewer polls.
Results of the Pollster Ranking
As the tables above show, Axis MyIndia and Today's Chanakya consistently rank in the top 2, with Axis having a slight edge in both methodologies. Hansa and CSDS, which do a pretty good job of predicting vote shares, are not as good at predicting seat shares. On the other hand, Jan Ki Baat, which has done a good job of predicting seat shares, has not published vote shares in enough elections to be judged on that basis.
Axis MyIndia remains the best pollster in the country even after its wrong prediction in the Bihar election, while CVoter's relatively better prediction there could not make up for its previous misjudgments.
VMR and IPSOS, meanwhile, have among the best track records in calling elections correctly, but tend to falter when it comes to a nuanced understanding of elections in terms of vote share and seat share.
While we have done our best to stay objective and present the public with a metric that can build trust in the opinion and exit polls in the country, we are open to improving our methodology and adding more data as it becomes available, in order to update the ranking. We have made our data and calculations public on GitHub and invite pollsters, psephologists and others to take a look.
Going forward, we are planning several additions to this study for a more nuanced understanding of the polling scene in the country. These include: assessing the overall state of polling in India, to see whether polls have on average become more or less accurate; assessing the partisan bias of pollsters; and examining how a poll's accuracy correlates with its proximity to election day, which could refine the rating and help us distinguish meaningfully between exit and opinion polls in terms of accuracy.