Exploiting survey data
NEELANJAN SIRCAR and MILAN VAISHNAV
MUCH like the weatherman, the Indian psephologist too is the object of unrelenting criticism emerging from diverse quarters. This is because survey opinion research, like meteorology, is a complicated undertaking – susceptible to flaws in measurement, subject to exogenous shocks, and open to certain variations in interpretation. There are margins of error attached to both pursuits, even though these are often ignored or understated by those who immediately flock to the headline numbers.
In this short article, we shed light on three issues that can help determine what we can and cannot learn from election opinion surveys in India. We restrict our attention to surveys with a social science orientation whose primary focus is to analyze political behaviour, as opposed to surveys commissioned with the sole focus of predicting the ‘horse race’, or which party will win or lose the election.1
First, we highlight the challenges that all opinion surveys – whether in India or elsewhere – encounter when trying to measure ‘occurrences’, or events that have taken place (e.g., turning out to vote on election day), and ‘beliefs’, or subjective assessments of performance (e.g., the incumbent’s management of the economy). How accurately voter surveys can measure either of these phenomena constitutes a good test of their quality. Especially with regard to beliefs, the precise wording of survey questions is central. After all, how a question is framed can have a subtle yet significant impact on the responses of those surveyed.2
Second, we discuss the barriers to drawing accurate inferences from survey data. This can be broken down into two separate but related issues: how ‘representative’ the sample is of the broader population the surveyors seek to characterize, and how well the survey picks up the true preferences of respondents. Because the latter enterprise could be subject to social desirability bias, we briefly discuss novel approaches that employ experimental or indirect techniques that can mitigate these concerns.3
Third, we highlight the importance of data openness and replicability for efficient aggregation of knowledge. In recent years, social science (much like the hard sciences) has been harshly criticized due to a perceived lack of data openness, which serves to limit the aggregation of knowledge as well as the possibility of additional testing for robustness of results. Indeed, numerous studies have documented widespread failures of empirical replicability, suggesting that openness is paramount not only with regard to data sources but also data methods.4
In order to assess the quality of a survey, it is important to have clarity on what the researcher is trying to learn from the data. Two of the most common uses for social survey data are the measurement of occurrences and beliefs. Occurrence questions measure and characterize actions that have taken place, asking for example, for whom a survey respondent cast her vote, while belief questions record subjective evaluations by the survey respondent. In order to construct a complete argument from survey data, an analyst must typically measure both occurrences and beliefs.
An occurrence is an objective quantity of interest, such as whether a voter has voted for the Bahujan Samaj Party (BSP) in the state of Uttar Pradesh in the most recent assembly election. They are occurrences because they are the result of an action that has taken place (e.g., casting a vote), even if the actual individual level occurrence may be difficult to verify. How well or poorly a survey measures an occurrence, therefore, is usually a measure of how closely survey data from a sample resembles the ‘true’ frequency of these objective actions in the population. One may assess the quality of a survey intended to predict the voting decisions in an election by comparing, for instance, the survey estimate of the BSP’s vote share to the BSP’s actual vote share in the Uttar Pradesh election.
In contrast, a belief is a subjective assessment by the survey respondent, such as whether she believes the incumbent state government is doing well in managing the economy. These beliefs are necessarily subjective because they cannot typically be cleanly tethered to any objective quantity or event. To continue with our example, if one wants to understand beliefs about the incumbent government’s performance in managing the economy, there are many questions that a researcher could ask through a survey. One might pose the positively framed yes-or-no question, ‘Do you believe that the incumbent government has managed the economy well?’ Or one could ask a negatively framed binary question, ‘Do you believe that the incumbent government has managed the economy poorly?’ Technically, these questions are mirror images of each other, but research shows that responses to the former will typically be biased in favour of the government, and responses to the latter against it. The reason is that the positive or negative framing of a question tends, in and of itself, to influence respondents. On the face of it, both questions (and responses) are equally valid, even if they lead to different conclusions.
In the measurement of beliefs, therefore, the analyst must pay particular attention to how questions are framed and how they are understood by the survey respondent. Such framing issues involve both the survey investigator’s word choice, which may affect answers, and the choice of response categories offered to the respondent.5
The challenges to drawing inferences from survey data can be disaggregated into two categories. First, it can be difficult to discern about whom the survey is making inferences, or what we call concerns of representativeness. Second, it is not always straightforward to discern what the survey respondent is intending to say, or what we refer to as concerns of interpretation.
The protocol developed by the Lokniti Programme of the Centre for the Study of Developing Societies (CSDS), the leading social science organization in India undertaking election survey research, typically relies on surveys of respondents after the election poll has been conducted but before the results of the election have been declared. In particular, Lokniti makes a distinction between this type of ‘post-poll’ and an ‘exit poll’, which is conducted by many other media agencies (on occasion, Lokniti also carries out pre-polls, which are done in advance of the election). In a post-poll, the surveyor conducts the survey at the home of a randomly drawn respondent from the voter list, rather than just outside the polling booth as in an exit poll. This provides greater privacy for the respondent, and allows the surveyor to attempt to reach a potential respondent multiple times. For these reasons, post-polls often have a natural advantage over exit polls when it comes to representativeness of the sample.
However, while pre- and post-polls allow for more sophisticated sampling, they too are not without their shortcomings. For instance, on account of time constraints, the surveyor must attempt to reach a large number of people in a short period of time. Furthermore, it is often difficult to find certain persons at home; due to work schedules or seasonal migration, for example, many individuals cannot be reached easily. In addition, because the period around elections may be one of heightened social tensions and uncertainty, some would-be respondents might be unwilling to participate in the survey. The upshot is that the set of respondents for the survey will always be demographically biased. Certain populations – namely those that are easier to reach and more comfortable answering the survey (and who have the time to do so) – will thus be disproportionately represented in the survey sample.
This is a common concern across social surveys the world over. In such situations, analysts calculate a set of ‘survey weights’ that underweight populations that are overrepresented in the data and overweight populations that are underrepresented in the data. The basis for these weights is typically other demographic data, such as information taken from the decennial Indian Census.
To understand how survey weights can help in bolstering the quality of the survey estimates, imagine that survey data shows Scheduled Castes (SCs) are more likely to vote for the Indian National Congress over the Bharatiya Janata Party (BJP) in a given state, while other caste groupings are more likely to support the BJP over the Congress. Then imagine that we find that the share of SCs in the survey sample is smaller than the true share of SCs in the total population (based on the latest Census data). In order to provide better estimates, one could increase the amount that each individual SC respondent matters by ‘weighting’ the data accordingly. Similarly, we know that certain areas will have a higher concentration of SCs, and these corrections can also be made by survey weighting. More sophisticated regression-based methods, beyond the scope of this piece, may also be pursued. The point is that methods for demographic adjustment are likely to increase the quality of the estimates in terms of representativeness, but, to the best of our knowledge, these sorts of adjustments are not made – or not made consistently – to a good deal of election survey data collected at present.
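As a rough illustration of the weighting logic described above, the following Python sketch computes simple post-stratification weights and compares an unweighted and a weighted estimate. All figures are hypothetical (a sample of 100 in which SCs make up 10 per cent of respondents but 20 per cent of the population), and the function names are ours, chosen for illustration. It is a sketch of one basic approach, not the procedure used by any particular survey organization.

```python
# Hypothetical illustration of post-stratification weighting.
# All numbers below are invented for the example.

def poststratification_weights(sample_counts, population_shares):
    """Weight for each group = population share / sample share."""
    n = sum(sample_counts.values())
    return {
        group: population_shares[group] / (sample_counts[group] / n)
        for group in sample_counts
    }

def weighted_share(responses, weights):
    """Weighted share of respondents who report supporting the party.

    responses: list of (group, supports_party) pairs.
    """
    total = sum(weights[group] for group, _ in responses)
    support = sum(weights[group] for group, supports in responses if supports)
    return support / total

# Assumed sample: 10 SC respondents (7 support Congress) and
# 90 other respondents (27 support Congress).
responses = (
    [("SC", True)] * 7 + [("SC", False)] * 3
    + [("Other", True)] * 27 + [("Other", False)] * 63
)
sample_counts = {"SC": 10, "Other": 90}
population_shares = {"SC": 0.20, "Other": 0.80}  # assumed Census shares

weights = poststratification_weights(sample_counts, population_shares)
equal_weights = {"SC": 1.0, "Other": 1.0}

unweighted = weighted_share(responses, equal_weights)
weighted = weighted_share(responses, weights)
print(f"Unweighted Congress share: {unweighted:.2f}")
print(f"Weighted Congress share:   {weighted:.2f}")
```

With these made-up numbers, the estimate of Congress support rises from 34 to 38 per cent once the data are weighted, because each SC respondent now counts twice as much as in the raw data.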
A second concern is that survey respondents might misrepresent their preferences, making it difficult to interpret the data. This is particularly a concern in election surveys. For instance, respondents may feel implicit or explicit pressure to answer that they have voted for a particular party, even if they have done otherwise. This is a concern because respondents will likely (correctly) believe that those conducting the survey will be able to discern their preferences, and respondents may either question the neutrality of those conducting the survey or may believe that it is inappropriate (or even harmful) to reveal their true preferences.
As an aside, beliefs about the sanctity of the secret ballot cut both ways here. Those who believe the secret ballot is not perfectly secret might be reluctant to divulge information about their ‘true’ preferences to a survey enumerator. Equally, those who do have confidence in the secret ballot might also be hesitant to respond truthfully because they view voting as a highly personal and protected form of civic duty.6 Either way, the reality is that the raw data rarely correspond to what the analyst will declare to be the expected vote share and seat share for the party. All survey organizations make adjustments to the data but they rarely reveal how they have done so.7 If the raw data are not deemed to be trustworthy and/or if most analysts cannot figure out how the data are being adjusted, this severely hampers the use of survey data by the larger research and policy communities.
Another approach to dealing with misrepresentation is to use a series of survey tools that remove incentives for respondents to misrepresent information. One way of doing this is to make sure that the information in question cannot be tied to the respondent. In a survey the authors conducted with Devesh Kapur, in collaboration with the Lok Foundation and CMIE (Centre for Monitoring Indian Economy) in advance of the 2014 general election, we embedded a ‘list experiment’ designed to do exactly this.8 In this survey we were interested in soliciting the views of potential voters on political candidates who faced criminal cases.9 A common argument made is that even though voters in India often have information about the criminal antecedents of their candidates, they nevertheless vote for them because of their credibility in ‘getting things done’ once in office.10 This goes against popular theories in political economy that hold that voter ignorance is at the heart of this puzzle.
For good reasons, asking voters directly whether they would be willing to support a candidate who delivers benefits but faces serious criminal cases is susceptible to social desirability bias. Voters might be discouraged from answering truthfully for fear that the survey enumerator may attach a normative judgment to their response. In the Lok survey we found that 26 per cent of respondents openly signalled their willingness to vote for a candidate facing serious cases.11 While one could argue that this proportion is high – after all, one in four voters answered in the affirmative – we also asked the question indirectly through a list experiment.
For the experiment we randomly assigned all respondents to one of two equal-sized groups: ‘control’ or ‘treatment’. The control group received a list of three types of candidates (a candidate who is wealthy, a candidate who is poor, and a candidate who does social service but is not affiliated with any party) and was asked how many (crucially, not which ones) trouble them. The treatment group was provided the same three options but also a fourth, ‘sensitive’ option (a candidate who delivers benefits but has serious criminal cases). This fourth option contains language identical to the direct question we asked earlier. Since the respondents are being asked how many – not which – statements trouble them, any difference in the average responses between the treatment and control groups is due to the inclusion of the sensitive item. The analysis allows us to conclude that 48 per cent, or nearly one in two respondents, are not troubled by a candidate who faces serious criminal cases but who gets things done. This represents a 22 percentage point discrepancy between the direct question and the list experiment result, which suggests considerable social desirability bias with regard to the former.
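The arithmetic behind a list experiment is a simple difference in means, which the following Python sketch illustrates. The item counts are invented for the example and are not the Lok survey data, and the function name is ours. The sketch relies on the usual assumptions that including the sensitive item does not change answers to the other items and that respondents report their counts truthfully. Because the question asks how many candidate types trouble the respondent, the difference in means estimates the share troubled by the sensitive item, and its complement estimates the share not troubled.

```python
from math import sqrt
from statistics import mean, variance

def list_experiment_estimate(control_counts, treatment_counts):
    """Difference-in-means estimate and its standard error.

    Each list holds the number of items a respondent reports as troubling.
    """
    diff = mean(treatment_counts) - mean(control_counts)
    se = sqrt(
        variance(control_counts) / len(control_counts)
        + variance(treatment_counts) / len(treatment_counts)
    )
    return diff, se

# Hypothetical item counts (not real survey data).
control = [1, 2, 0, 1, 2, 1, 1, 2]    # three-item list
treatment = [2, 2, 1, 2, 3, 1, 1, 2]  # same list plus the sensitive item

share_troubled, std_error = list_experiment_estimate(control, treatment)
share_not_troubled = 1 - share_troubled

print(f"Estimated share troubled by the sensitive item: {share_troubled:.2f}")
print(f"Estimated share not troubled:                   {share_not_troubled:.2f}")
print(f"Standard error of the difference:               {std_error:.2f}")
```

In practice the samples would be far larger than eight respondents per group; with so few observations the standard error is too wide for the estimate to be informative.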
List experiments are not the only experimental method that can be employed to tease out answers to difficult questions where there is a strong likelihood of lurking social desirability bias. Randomized response techniques, endorsement experiments, and survey experiments have already been implemented in similar election surveys around the world, and survey research organizations in India should consider using some of these techniques to reduce biases associated with the misrepresentation of data.
We would be remiss if we did not insert a caveat to these words of encouragement: our own experience, drawn from trying to utilize these experimental techniques in different parts of India, highlights the challenges associated with implementation. Many survey organizations do not (yet) have deep experience with these methodologies, so mistakes and miscommunication are common. Often researchers try to field overly ambitious (read, complicated) questions. Ensuring adherence to a strict randomization/implementation protocol also poses challenges. To be sure, these obstacles are not insurmountable, but they do require greater up-front investments in training and monitoring.
Modern standards of data analysis require the possibility of replication. In short, statistical analyses that one publishes should be verifiable by another researcher using the same data set. Replication, in turn, requires greater data transparency – an attribute which is often missing in opinion surveys.
Without adherence to the principle of transparency and the associated process of replication, there is a likelihood of mistakes going unrecognized and, in extreme cases, malfeasance going uncorrected. In short, there is no way of verifying just how robust or trustworthy the data are. Indeed, many major scholarly journals in the sciences, economics, and political science explicitly require that analysts provide a data set and statistical code that can be used to replicate the findings submitted in a research paper.
The process of replication effectively requires that the data set upon which the analysis is conducted be made publicly available (excepting personally identifiable information about the respondents, which is withheld or obfuscated). The public availability of data has several benefits for the quality of research. First, the ability to replicate and publicly check results will likely improve the quality of the research. Whenever survey data is collected, questions of how best to interpret it, as well as coding errors, emerge. But rather than relying on the researchers who directed the original survey to invest significant new resources to improve the quality of the data, ‘crowdsourcing’ will take care of many of these issues.
Second, by democratizing the space of individuals who may conduct the analysis, public availability increases the number of unique and insightful analyses emanating from the data. A small team of researchers, no matter how skilled, will never think of all the important analyses that are possible with a given data set. To take a mundane example, election surveys in India often gauge the extent to which a candidate, political party, or prime ministerial candidate influenced a voter’s decision on election day. In any given election, the average response to this question represents just one (modestly interesting) data point. But, imagine comparing the change in average response across states and over time to tease out the relative influence of these factors. One researcher’s run-of-the-mill background question is another investigator’s goldmine.
Finally, and perhaps most significantly, providing public access to the research community increases the exposure and the importance of Indian electoral behaviour in the academic and policy discourse worldwide. As the world’s largest – and the developing world’s most enduring – democracy, India provides insights into electoral behaviour that are deeply beneficial to scholars working beyond India’s borders. As of now, poor data access for researchers places severe constraints on the process of knowledge aggregation. Notably, many other key democracies – be it Brazil, Mexico, or the United States – have benefited from the availability of high quality data from election surveys which are carried out at consistent intervals. These are made publicly available with a lag, which grants researchers who invest in the survey design and implementation exclusive access for a limited duration, but then allows for public access after this window has expired. This model, with numerous variations across countries, provides incentives to the research team associated with the survey while simultaneously fulfilling the longer term objective of building the ‘theoretical and empirical foundations of national election outcomes’, to borrow a phrase from the American National Election Studies (ANES).12
Moving toward such an arrangement in India, we believe, will further improve standards of data quality and data analysis. Equally, such a shift would not only increase the quality of the analysis on India, but also bolster the relevance and visibility of Indian democratic behaviour in research and policy circles.
Election related survey research in India is ripe for growth in the coming years. As of late, new researchers and organizations have made initial forays into rigorous public opinion research. Thanks to decades of pioneering research undertaken by groups such as Lokniti, there is a strong foundation to build on. The challenge for the future involves harnessing the growing interest in Indian politics and voter behaviour and translating that interest into investments in high quality data sets that permit better measurement, more open norms of data sharing and analysis, and easier replicability. Taken together, we are confident that these investments will deepen and broaden our knowledge of the Indian voter.
* The authors would like to thank Aidan Milliff for comments on an earlier draft of this article.
1. For nearly five decades, the Lokniti Programme of the Centre for the Study of Developing Societies (CSDS) has been the standard-bearer when it comes to election survey research in India – and with good reason. Much of what we know about the behaviour of the Indian voter, her preoccupations, and preferences comes from the regular surveys the organization has carried out during state and national elections dating back to 1967. The analysis and suggestions here are not meant as a critique of the organization’s methods, but rather a set of basic principles that should guide all future election survey research in India.
2. The influence the precise wording of questions has on responses is usually referred to as a ‘framing effect’.
3. In simple terms, ‘social desirability bias’ refers to the proclivity of survey respondents to answer enumerators’ questions in a way that would show them in a good light. The upshot is that respondents tend to under-report ‘bad’ behaviour and over-report ‘good’ behaviour.
4. For an accessible account of the so-called replication crisis, see Jonah Lehrer, ‘The Truth Wears Off’, New Yorker, 13 December 2010 (accessed 8 June 2016).
5. Beyond word choice alone, there is an extensive literature examining the impact of enumerator attributes, such as race, gender, and ethnicity, on respondents’ answers. For a review, see Robert Rosenthal, ‘Experimenter Attributes as Determinants of Subjects’ Responses’, Journal of Projective Techniques and Personality Assessment 27(3), 1963, pp. 324-331.
6. In India, the poor regularly turn out to vote in greater numbers than the non-poor, precisely the opposite of what transpires in many long-standing democracies, such as the United States. Research has shown that this is because the poor perceive voting as an important civic duty. See Amit Ahuja and Pradeep Chhibber, ‘Why the Poor Vote in India: "If I Don’t Vote, I Am Dead to the State",’ Studies in Comparative International Development 47(4), December 2012, pp. 389-410.
7. For more on this, see Rajeeva Karandikar, ‘Power and Limitations of Opinion Polls: My Experiences’, Hindu Centre for Public Policy, 2 April 2014, http://www.thehinducentre.com/verdict/commentary/article5739722.ece (accessed 6 June 2016).
8. Details about this survey can be found at: https://casi.sas.upenn.edu/lok-survey-social-attitudes-and-electoral-politics/lok-survey-social-attitudes-and-electoral (accessed 5 June 2016).
9. The so-called criminalization of politics is a trend that is visible in most parts of the country and has become deeply entrenched in India’s political economy. However, to date, there have been few attempts to systematically discern – using survey-based methods – why voters lend their support to candidates who face pending criminal cases, particularly those of a serious nature.
10. Milan Vaishnav, When Crime Pays: Money and Muscle in Indian Politics. HarperCollins India, New Delhi, 2017.
11. Details of this analysis can be found in Neelanjan Sircar and Milan Vaishnav, ‘Ignorant Voters or Credible Representatives? Why Voters Support Criminal Politicians in India’. Paper presented at the annual meeting of the Midwest Political Science Association, Chicago, 16-19 April 2015 (on file with authors).
12. More information about the ANES can be found at http://www.electionstudies.org/ (accessed 7 June 2016).