Published October 29-November 4, 2020
A Tale of Two Surveys
On October 20, India Today television aired the results of a survey conducted by Lokniti-CSDS on the Bihar elections. It reported NDA ahead with 38% vote share and RJD/INC with 32% vote share.
Prashnam’s Friday Insight #4 on October 2 reported NDA vote share of 38% and RJD/INC vote share of 28%.
Further, when asked who should be the next Chief Minister of Bihar, 31% of respondents in the Lokniti-CSDS survey chose Nitish Kumar and 27% chose Tejashwi Yadav. Others, such as Lalu Yadav, Chirag Paswan and Sushil Modi, were each chosen by a small 3-5% of voters.
Prashnam’s Friday Insight #2 on September 20 had reported that Tejashwi Yadav was as popular a choice for Chief Minister as Nitish Kumar, something many found hard to believe at the time. (In fact, Prashnam was the first to identify Tejashwi’s popularity. Before that, Tejashwi was seen as an also-ran compared to Nitish.)
Prashnam ran a fresh survey on the morning of October 21 with two questions:
- 1: Who will you vote for?
- 2: Who should be the next Chief Minister of Bihar (Nitish Kumar, Tejashwi Yadav or someone else)?
Over 2500 voters across all of Bihar’s assembly constituencies responded. The survey was conducted using Prashnam’s proprietary survey engine, powered by artificial intelligence.
- Who will you vote for? NDA 37%, UPA 31%, LJP 6%, Others/Undecided 25%
- Who should be next Chief Minister? Tejashwi Yadav 40%, Nitish Kumar 37%, Others 22%
The key takeaway: the Lokniti-CSDS Bihar survey was conducted over 7 days and probably cost twenty times as much as Prashnam’s, which was done in an hour. Yet the vote-share results were near identical: 38% versus 37% for the NDA, and 32% versus 31% for RJD/INC.
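One rough way to check “near identical” is to ask whether the Lokniti-CSDS figures fall inside the margin of error of Prashnam’s numbers. The sketch below assumes a simple random sample of 2500 (the text says “over 2500”), the usual normal approximation with z = 1.96 for 95% confidence, and treats the Lokniti-CSDS figures as fixed points, because their sample size is not given here.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation interval (95% with the default z)."""
    return z * math.sqrt(p * (1 - p) / n)

# Prashnam, October 21. Assumption: n = 2500, since the text only says "over 2500".
PRASHNAM = {"NDA": 0.37, "RJD/INC": 0.31}
# Lokniti-CSDS, aired October 20. Treated as fixed points (sample size not given).
CSDS = {"NDA": 0.38, "RJD/INC": 0.32}

def within_margin(n: int = 2500) -> dict:
    """For each alliance, return (low, high, csds_figure_inside_interval)."""
    results = {}
    for alliance, p in PRASHNAM.items():
        moe = margin_of_error(p, n)
        low, high = p - moe, p + moe
        results[alliance] = (low, high, low <= CSDS[alliance] <= high)
    return results

for alliance, (low, high, inside) in within_margin().items():
    print(f"{alliance}: {low:.1%} to {high:.1%}; Lokniti-CSDS figure inside: {inside}")
```

Both Lokniti-CSDS figures land inside Prashnam’s roughly ±2-point intervals. A fuller comparison would also account for the other survey’s own sampling error and for any weighting, both of which would widen the intervals.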
On October 22, the BJP announced in its manifesto that everyone in Bihar would be given the Covid-19 vaccine free of cost if it came to power. The next morning, Prashnam ran a survey to assess the impact, asking two questions to 2708 people across all districts and assembly constituencies of Bihar:
- Have you heard of BJP’s poll promise on Corona vaccination?
- Is it appropriate to offer free coronavirus vaccination as a poll promise?
The results: 53% had heard of the BJP’s free coronavirus vaccine poll promise, and 66% of those who had heard of it found it appropriate.
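The 66% figure is conditional: it describes only those who had heard of the promise. The short calculation below converts it into a share of all respondents and shows that the smaller subgroup carries a wider margin of error. It assumes simple random sampling and the normal approximation with z = 1.96. Any weighting Prashnam applies is not modelled.

```python
import math

respondents = 2708           # people asked both questions (from the text)
heard_share = 0.53           # share who had heard of the promise
approve_given_heard = 0.66   # share of those who had heard who found it appropriate

# P(heard and approve) = P(heard) * P(approve | heard)
approve_overall = heard_share * approve_given_heard

# The 66% rests on the subgroup who had heard, not on all 2708 respondents.
subgroup_n = round(respondents * heard_share)

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a proportion (assumes simple random sampling)."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"Heard and found it appropriate: {approve_overall:.1%} of all respondents")
print(f"Subgroup who had heard: about {subgroup_n} people")
print(f"Margin of error on the 66%: ±{margin_of_error(approve_given_heard, subgroup_n):.1%}")
print(f"Margin of error on the 53%: ±{margin_of_error(heard_share, respondents):.1%}")
```

Roughly 35% of all respondents both knew of the promise and approved of it. The 66% figure carries a margin of about ±2.5 points, compared with about ±1.9 points for the 53% figure, which is based on the full sample.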
The survey, like all done by Prashnam, was completed within an hour.
Spread, Scale, Speed: that’s how Prashnam is transforming surveys in India. Google’s search box opened up new doors to information; Prashnam’s feedback engine is doing the same for understanding how people think. It gets answers in a hundredth of the time at a tenth of the cost. How is Prashnam doing it? What is the science behind surveys? How can this change decision-making across industries? This is what I will answer.
The Birth of an Idea
Prashnam’s story has its origins eight years ago, in 2012. I was reading Sasha Issenberg’s just-published book “The Victory Lab,” in which he wrote about the science behind political campaigning in US elections. At the time, I had set up Niti Digital to work on Narendra Modi’s 2014 election campaign. The use of polling as a primary input to decision-making during election campaigns fascinated me, and I started thinking about how it could be used in Indian elections.
I got an opportunity before the 2014 Lok Sabha elections. I decided to survey voters in UP and Bihar both before and after they voted (pre- and post-poll). I followed the methodology outlined in Lokniti-CSDS’s national election surveys: use the electoral rolls to randomly select booths, and then randomly select voters within the chosen booths. Done right, a sample of about a thousand voters was all one needed to get an accurate assessment of what people were thinking.
The surveys in the 120 Lok Sabha constituencies of the two states were slow and expensive. Someone had to visit each of the identified voters and ask the questions. Responses were either tallied on paper or, where possible, entered into a mobile app. It was also hard to verify whether a conversation had actually taken place, so a smaller sample had to be called back as a cross-check.
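The booth-then-voter method is a two-stage random sample. The sketch below is a minimal illustration of the idea, not the Lokniti-CSDS procedure itself. The data structure (a dict from a hypothetical booth ID to the voter IDs on its roll), the booth counts and the per-booth quota are all assumptions made for the example.

```python
import random

def two_stage_sample(rolls: dict, n_booths: int, voters_per_booth: int, seed: int = 0) -> list:
    """Stage 1: pick booths at random. Stage 2: pick voters at random within each booth.

    rolls maps a hypothetical booth ID to the list of voter IDs on its electoral roll.
    Returns (booth, voter) pairs.
    """
    rng = random.Random(seed)
    booths = rng.sample(sorted(rolls), n_booths)      # stage 1: booths, without replacement
    chosen = []
    for booth in booths:
        voters = rolls[booth]
        k = min(voters_per_booth, len(voters))
        chosen.extend((booth, voter) for voter in rng.sample(voters, k))  # stage 2
    return chosen

# Hypothetical rolls: 100 booths with 800 voters each.
rolls = {f"booth-{b:03d}": [f"voter-{b:03d}-{v:04d}" for v in range(800)] for b in range(100)}
sample = two_stage_sample(rolls, n_booths=50, voters_per_booth=20, seed=42)
print(len(sample))  # 50 booths x 20 voters = 1000
```

Picking booths with equal probability and then taking a fixed number of voters per booth gives voters in larger booths a lower chance of selection. When booth sizes vary a lot, a common adjustment is to select booths with probability proportional to their size.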
The process worked well. When I compared the survey results with the actual outcomes, the accuracy was 90%.
Just before the lockdown started in late March, a friend and I were discussing ideas around polling in India. We both agreed that there had to be a better way. Almost everyone in India had a mobile phone, so why not just call them and ask? A call centre agent could do it, but that process was prone to manual errors. (And research has shown that respondents answer a recorded voice more truthfully than a live interviewer.)
One of the key elements we needed to ensure was stratified sampling – to make sure that the people chosen for the survey were representative of the overall population. If we could do this and combine it with an interactive voice response method, we could transform the process of surveys in India. Thus was born the idea of Prashnam.
A key conundrum needs to be addressed: how can asking just a thousand people accurately indicate the thinking of a general population numbering tens or hundreds of millions? This is where science comes in. Let’s split the problem into two parts: whom to sample and how many to sample.
Here is a short summary from the National Science Foundation on how to sample scientifically:
When conducting a survey, how a researcher selects participants is just as important as how many participate. Scientific surveys can include every member of the group to be studied, but this approach is usually impractical and/or expensive. Instead, researchers often draw conclusions about a target group using information gathered from a small representative sample of that group. Representative samples must be selected carefully and without bias.
The term “random” has a different meaning in statistics than in ordinary language. In everyday terms, a random event is one that is unpredictable, lacks purpose and/or has no discernible pattern. In statistical terms, a random event is one that occurs with a certain, measurable chance or probability of happening. For example, under the simplest circumstances, where each member of a population has one chance of being sampled, the probability of getting selected for a survey can be calculated just by knowing a population size and desired sample size. One would have a 10 percent chance of being selected for a 100-person sample out of a total population of 1000. But, researchers use several methods for randomly selecting samples. These include stratified, cluster and systematic sampling. Stratified and cluster sampling require prior knowledge about the survey population but can produce more representative samples than simpler “blind” sampling methods. Researchers often use stratified sampling to capture the diversity of large populations with distinctive, homogeneous subgroups—such as the U.S. population.
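The NSF example (a 10 percent chance of being picked for a 100-person sample out of 1000) follows directly from simple random sampling without replacement, where every person’s chance is n/N. The sketch below computes that and checks it by simulation. The function names, the seed and the trial count are illustrative choices.

```python
import random

def inclusion_probability(sample_size: int, population_size: int) -> float:
    """Chance that any given person is picked in a simple random sample without replacement."""
    return sample_size / population_size

def simulated_inclusion_rate(sample_size: int, population_size: int,
                             person: int = 0, trials: int = 10_000, seed: int = 1) -> float:
    """Empirical check: how often does one fixed person land in the sample?"""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if person in rng.sample(range(population_size), sample_size):
            hits += 1
    return hits / trials

print(inclusion_probability(100, 1000))     # 0.1
print(simulated_inclusion_rate(100, 1000))  # close to 0.1
```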
In Prashnam, we use a process called stratified random sampling. More from Wikipedia:
In statistics, stratified sampling is a method of sampling from a population which can be partitioned into subpopulations.
Assume that we need to estimate the average number of votes for each candidate in an election. Assume that a country has 3 towns: Town A has 1 million factory workers, Town B has 2 million office workers and Town C has 3 million retirees. We can choose to get a random sample of size 60 over the entire population but there is some chance that the resulting random sample is poorly balanced across these towns and hence is biased, causing a significant error in estimation. Instead if we choose to take a random sample of 10, 20 and 30 from Town A, B and C respectively, then we can produce a smaller error in estimation for the same total sample size. This method is generally used when a population is not a homogeneous group.
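The three-town example can be simulated directly. The town sizes and the 10/20/30 allocation come from the quoted example. The support levels for the candidate (20%, 50% and 80%) are hypothetical, chosen so that the towns differ. Because each town has millions of people, the draws are treated as independent. The sketch compares how much a simple random sample of 60 and a stratified sample of 10 + 20 + 30 vary across repeated surveys.

```python
import random
import statistics

# Town sizes from the example; support levels are hypothetical assumptions.
TOWNS = {
    "A": {"size": 1_000_000, "support": 0.20},  # factory workers
    "B": {"size": 2_000_000, "support": 0.50},  # office workers
    "C": {"size": 3_000_000, "support": 0.80},  # retirees
}
TOTAL = sum(t["size"] for t in TOWNS.values())
TRUE_SHARE = sum(t["size"] * t["support"] for t in TOWNS.values()) / TOTAL  # 0.6

def simple_random_estimate(rng: random.Random, n: int = 60) -> float:
    # Each draw lands in a town with probability proportional to its size,
    # so the town mix of the 60 people varies from sample to sample.
    names = list(TOWNS)
    towns = rng.choices(names, weights=[TOWNS[t]["size"] for t in names], k=n)
    return sum(rng.random() < TOWNS[t]["support"] for t in towns) / n

def stratified_estimate(rng: random.Random, allocation: dict = None) -> float:
    # Proportional allocation: a fixed 10, 20 and 30 people from towns A, B and C.
    allocation = allocation or {"A": 10, "B": 20, "C": 30}
    estimate = 0.0
    for town, n_h in allocation.items():
        p_hat = sum(rng.random() < TOWNS[town]["support"] for _ in range(n_h)) / n_h
        estimate += TOWNS[town]["size"] / TOTAL * p_hat  # weight by the town's population share
    return estimate

def compare(trials: int = 10_000, seed: int = 7) -> tuple:
    rng = random.Random(seed)
    srs = [simple_random_estimate(rng) for _ in range(trials)]
    strat = [stratified_estimate(rng) for _ in range(trials)]
    return (statistics.mean(srs), statistics.stdev(srs),
            statistics.mean(strat), statistics.stdev(strat))

srs_mean, srs_sd, strat_mean, strat_sd = compare()
print(f"simple random: mean {srs_mean:.3f}, spread {srs_sd:.4f}")
print(f"stratified:    mean {strat_mean:.3f}, spread {strat_sd:.4f}")
```

Both methods centre on the true 60%, but the stratified estimate varies less. Under these assumptions the theoretical standard errors are about 0.063 for the simple random sample and 0.056 for the stratified one. The gain grows as the towns become more different from one another.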
The next question: how many to sample? The answer will surprise you.
There are two important terms to understand for determining sample size – confidence level and margin of error. Stat Trek explains both:
Statisticians use a confidence interval to express the degree of uncertainty associated with a sample statistic. A confidence interval is an interval estimate combined with a probability statement.
For example, suppose a statistician conducted a survey and computed an interval estimate, based on survey data. The statistician might use a confidence level to describe uncertainty associated with the interval estimate. He/she might describe the interval estimate as a “95% confidence interval”. This means that if we used the same sampling method to select different samples and computed an interval estimate for each sample, we would expect the true population parameter to fall within the interval estimates 95% of the time.
Confidence intervals are preferred to point estimates and to interval estimates, because only confidence intervals indicate (a) the precision of the estimate and (b) the uncertainty of the estimate.
The margin of error expresses the maximum expected difference between the true population parameter and a sample estimate of that parameter. To be meaningful, the margin of error should be qualified by a probability statement (often expressed in the form of a confidence level).
For example, a pollster might report that 50% of voters will choose the Democratic candidate. To indicate the quality of the survey result, the pollster might add that the margin of error is ±5%, with a confidence level of 90%. This means that if the survey were repeated many times with different samples, the true percentage of Democratic voters would fall within the margin of error 90% of the time.
Most political surveys report their results with a 95% confidence level and a ±3% margin of error. For this, the sample they need from a heterogeneous population (irrespective of its size) is about 1000. This is the magic of sampling. To get a sense of what voters in Bihar (about 7 crore) think, all we need to do is sample 1000 randomly selected people. By choosing them across all assembly constituencies and ensuring proper representation by age, gender and geography, this sample of 1000 can give a very accurate view of what the general population thinks.
(If you don’t believe it, try an online sample size calculator: select a confidence level of 95% and a margin of error of 0.03, and play around with the population size.)
What we have learnt so far: a sample of about 1000 people chosen via stratified random sampling is good enough to mirror what people are thinking, with a 95% confidence level and a ±3% margin of error. (If the population is more homogeneous, as in a village, a PIN code or even an assembly constituency, the sample size can be reduced.)
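Most sample size calculators use the standard (Cochran) formula n₀ = z²·p(1−p)/e², followed by a finite population correction. The sketch below assumes that formula. A particular online calculator may round slightly differently. The default p = 0.5 is the most conservative choice, since it gives the largest sample.

```python
import math
from statistics import NormalDist

def sample_size(population: int, confidence: float = 0.95,
                margin: float = 0.03, p: float = 0.5) -> int:
    """Respondents needed for a given margin of error (Cochran formula plus finite population correction)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n0 = z**2 * p * (1 - p) / margin**2                 # size for an unlimited population
    n = n0 / (1 + (n0 - 1) / population)                # finite population correction
    return math.ceil(n)

for population in (1_000, 10_000, 1_000_000, 70_000_000):
    print(f"{population:>12,}: {sample_size(population)}")
```

Beyond a few tens of thousands, the population size barely matters. For about 7 crore voters, the answer settles near 1068. If opinion is more lopsided, for example p = 0.8, the requirement drops to about 683.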
A Pollster Speaks
A recent book by pollster Anthony Salvanto, “Where Did You Get This Number?”, has many interesting insights into surveying. He explains sampling:
The first step is to forget for a moment anything about the specific size of the poll, be it one thousand people or ten thousand people, and right now simply think in terms of knowledge about the world—knowledge that you can either get or not get.
There are plenty of times people can gauge how well they know something by what portion of all the available information they have. In school, for instance, when tomorrow’s history test covers the whole textbook, but you only read half of it, you can correctly gauge that you’re in trouble. (I found this out the hard way a few times.) Or if you’re buying a new car, and you haven’t read the crash test ratings or found out the gas mileage yet, you could justifiably feel uninformed walking into the dealership. Those are problems of completeness: you haven’t seen all the information that’s out there, and what you do know just will not substitute for what you don’t.
A poll, as traditionally conceived, does not try to fit into those categories of information gathering. There are other occasions, more akin to polling, when we gauge whether we truly know about something by whether or not we’ve sampled it well; that is, when we think what we’ve already seen is a good enough representation of all that we have not seen. It’s the restaurant you visit twice, not a hundred times, before you decide if it’s good.
A classic analogy for the mechanism behind this was mentioned by Gallup in a chapter he wrote in his book The Pulse of Democracy called “Building the Miniature Electorate,” in which he compared sampling the country to tasting a “bowl of soup.”
On sample size, he adds:
On a sample of 1,000, a poll will often report a margin of error of 3 points. If a poll reports an estimate of 50 percent with a margin of error of 3, we’re saying we’d get values between 53 and 47 if we kept repeating the poll, and that the truth is in that range. That’s often good enough for us to tell a meaningful story, such as how many movie fans there are. And we sometimes have to, because the margin doesn’t get a lot better as we collect more samples from there. On a sample of 3,000 it’s . . . about 2 points. We just tripled our sample size from 1,000 to 3,000 and barely dropped the margin of error. That’s because there’s always going to be at least some uncertainty arising from the fact that we haven’t talked to everybody. Even if we drew huge samples of one million people, sometime along the way of drawing them pick by pick we’d get some samples that were 59 percent to 41 percent, or even 60-40, instead of being evenly balanced. Not many, but some. That’s randomness at work, too. Samples, it turns out, work mathematically a lot like experience in life. Getting some is necessary, and getting a lot makes you good. But no matter how good you get, no one is perfect.
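Salvanto’s numbers follow from the standard margin-of-error formula z·√(p(1−p)/n), which shrinks with the square root of the sample size. The sketch below assumes the worst case p = 0.5 and z = 1.96 (95% confidence).

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a proportion (worst case p = 0.5 by default)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (1_000, 3_000, 10_000, 1_000_000):
    print(f"n = {n:>9,}: ±{100 * margin_of_error(n):.1f} points")
```

Tripling the sample from 1,000 to 3,000 divides the margin only by √3 ≈ 1.73, from about 3.1 to about 1.8 points. Halving the margin takes four times as many respondents.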
In India, the focus needs to be on the country’s roughly 4000 assembly constituencies to get the necessary spread. Prashnam does just that.
How Prashnam Works
Prashnam’s secret sauce lies in the way it combines the science of surveys with technology. An AI (artificial intelligence) engine helps select the people to be sampled, keeping the spread as wide as possible so that the sample is as representative of the underlying population as possible. These people are then called on their phones, and their input is sought through an interactive voice response (IVR) system. Since not everyone responds to incoming calls, care is taken to keep the responding sample representative. The calling process is highly scalable, so thousands of calls can be made in a matter of minutes, and results are visible in real time. What’s more, Prashnam allows a small subset of the numbers called to be verified: they can be called manually and asked about their response, which is then compared with the answer they gave to the automated call.
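The text does not say how Prashnam keeps the responding sample representative. One widely used technique for correcting uneven response is post-stratification weighting: respondents from under-represented groups count for more, and those from over-represented groups count for less. The sketch below is an assumption, not Prashnam’s documented method. The strata, population shares and responses are hypothetical.

```python
from collections import Counter

def post_stratified_share(responses: list, population_shares: dict) -> float:
    """Weighted share of 'yes' answers.

    responses: list of (stratum, answered_yes) pairs.
    population_shares: known share of each stratum in the population (summing to 1).
    Every stratum that appears in responses must have an entry in population_shares.
    """
    counts = Counter(stratum for stratum, _ in responses)
    n = len(responses)
    # Weight for stratum s = population share / sample share.
    weights = {s: population_shares[s] / (counts[s] / n) for s in counts}
    total = sum(weights[s] for s, _ in responses)
    yes = sum(weights[s] for s, answered_yes in responses if answered_yes)
    return yes / total

# Hypothetical example: men picked up more often than women (700 vs 300).
responses = ([("men", True)] * 420 + [("men", False)] * 280
             + [("women", True)] * 120 + [("women", False)] * 180)
raw = sum(yes for _, yes in responses) / len(responses)
weighted = post_stratified_share(responses, {"men": 0.5, "women": 0.5})
print(f"raw {raw:.1%}, weighted {weighted:.1%}")  # raw 54.0%, weighted 50.0%
```

Weighting can only rebalance groups that are present in the sample. A group that nobody from answers cannot be recovered this way, which is why the selection and calling stages matter as much as the arithmetic.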
Consider the alternatives:
- In-person surveys, where agencies need to train field staff, send them across the country, and then do data entry. While they can ask many more questions through a longer survey, this method is time-consuming and simply not scalable.
- Telecalling, used by most political parties, relies on massive call centres. The heavy manual intervention leads to mistakes, and it is not easy to ensure stratification. As such, the answers are unlikely to represent the true voice of the people.
- Online surveys, which are becoming popular, cannot capture a picture of true India at all. The sample is inherently biased towards a younger, urban population.
Prashnam has advantages over both traditional and online methods. Its IVR system, combined with AI-based sampling, delivers speed, scale and stratified sampling. Automating the entire process eliminates manual errors. Its use of the mobile phone ensures spread and eliminates urban and youth bias. Prashnam thus offers a true representation of the opinions of “real India.” And because it does this at a fraction of the cost of other methods, and fast enough that surveys can be done in under an hour, Prashnam’s feedback engine is a disruptive innovation in the same vein as Google’s search engine.
To this, Prashnam has added ease: an end-to-end Do It Yourself (DIY) capability. Anyone can record the questions from a phone or a desktop and launch the survey without relying on any human intermediary. The hope is that this will bring surveys into mass use for decision-making. Media professionals, researchers, academics, politicians, business managers, NGO leaders: everyone can now rely on data-driven inputs rather than instinct to better understand what people are thinking.
Recently in India, a huge controversy has broken out over the TV ratings published by BARC (Broadcast Audience Research Council), which some channels have allegedly manipulated. Given that advertisers spend tens of thousands of crores based on these ratings, the incentive to tamper with the process is clearly there.
So, just what is the process? BARC has 44,000 meters installed in households across the country, creating a fixed panel of television viewers. Data from these meters is collected and collated to produce weekly ratings. The meters stay in the same locations for long periods of time. You can imagine what could be done if some of the households with meters were identified. And therein lies the problem.
Now, imagine an alternative. A random sample of a few hundred people can be surveyed daily about what they watched. The sample can be changed every day, so it cannot be manipulated. Because IVR surveys cost much less, they can be run more frequently, and results on people’s viewing habits can be available in near real time.
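Why a rotating sample resists manipulation can be seen with a small simulation. The sketch assumes a hypothetical calling frame of one million households, 500 calls a day and a fresh simple random sample each day. It ignores non-response and stratification for clarity.

```python
import random

def daily_samples(frame_size: int, sample_size: int, days: int, seed: int = 2020) -> list:
    """Draw an independent simple random sample from a hypothetical calling frame each day."""
    rng = random.Random(seed)
    return [set(rng.sample(range(frame_size), sample_size)) for _ in range(days)]

FRAME, DAILY, DAYS = 1_000_000, 500, 30   # hypothetical sizes
samples = daily_samples(FRAME, DAILY, DAYS)

# Chance that one particular household is contacted on a given day.
daily_chance = DAILY / FRAME                  # 0.0005, i.e. 0.05%
# Expected number of households shared by two consecutive days' samples.
expected_overlap = DAILY * DAILY / FRAME      # 0.25
observed = [len(a & b) for a, b in zip(samples, samples[1:])]
print(daily_chance, expected_overlap, sum(observed) / len(observed))
```

Unlike a fixed panel of metered homes, no household can know in advance that it will be asked, and two days’ samples share almost no one. That leaves little for anyone to target.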
This is the power that Prashnam’s solution brings to the table. By randomising the sample, bringing down costs and eliminating manual errors, Prashnam can revolutionise surveys. Quick surveys, done frequently, can provide better inputs for decision-making. Consider how several US organisations do it (look not just at the sample size but also at the frequency):
- US Presidential surveys get inputs from 900-1000 people via phone on a daily or weekly basis to provide insights into what 200 million American voters are thinking
- The University of Michigan’s Consumer Sentiment survey does 500 interviews each month
- The Federal Reserve Bank of New York’s Survey of Consumer Expectations, which tracks inflation expectations, covers 1300 households and is done monthly
- The US Purchasing Managers’ Index surveys 400 senior purchasing managers monthly
Prashnam can bring the same ease and affordability to surveys in India. How it is used is up to the imagination of decision-makers. What we have created is an engine whose power is limited only by the human mind.
Did people see your ad during the 8 pm TV serial? What angle should the 9 pm prime-time TV show take? Which hair oil are people using? How do consumers perceive your detergent brand? Which candidate should be given the ticket for the election? What is the public perception of government programmes? Which marketing slogan sounds better? Which song tune will work better? For all these questions, a Prashnam survey can provide quick answers. Its use cases are practically unlimited, just like those of Google’s search box.
For me, Prashnam is the culmination of a journey that began during the 2014 election campaign. The seeds were sown then; now the tree has grown. My hope is that it finds use in hundreds of daily decisions that people make. As we say on the Prashnam website: “No longer does one have to make big decisions based entirely on one’s intuition or in isolation. No longer does one have to be told by self-styled pundits on what people think. Our platform seeks to revolutionise the insight gathering process cutting times from weeks to mere hours.” Try it out and experience the magic!