New ways of conducting field surveys: Computerised data collection and responsive survey design
In November, I attended a talk at NCAER on “Computerized data collection and the management of Survey Costs and Quality” by James Wagner and Nicole Kirgis of the University of Michigan. The abstract said the talk would cover topics such as responsive survey design, survey biases and ‘paradata’. I am usually wary of talks where I don’t understand half the abstract, but this one turned out to be interesting and useful.
As most of you know, much of AI’s PAISA work involves extensive surveys of schools in our PAISA districts. To carry out these large-scale surveys, we mobilize a team of 35-50 local volunteers who visit around 140-150 schools in each district. The process involves a number of monitoring and rechecking exercises at various levels to ensure that the data collected is of the highest quality. What I learnt from the talk was that responsive survey design and ‘paradata’ can help us achieve this aim more efficiently.
So what exactly is responsive survey design?
A responsive survey design identifies in advance a set of design features that can affect survey costs and statistics, monitors them during data collection, and changes features of the survey if required. The survey administrator can respond to the data as it comes in, so mistakes can be corrected while the survey is still in the field. For example, suppose we are surveying 100 individuals aged 15-50, of whom 10 are aged 15-20, and only 5 of these 10 consent to the survey. Because non-response is concentrated in one group, the results would suffer from non-response bias, which would lead to biased estimates. Similar problems can arise for specific questions: for example, certain sections of society may not be comfortable answering a question about maternal health.
In a standard survey design, the survey would first have to be completed, compiled, data-entered and analysed before administrators could see such trends emerging, by which point it is difficult to respond to them. To overcome these issues, survey administrators can use a responsive design with computerized data collection. This lets them skip the compilation and data entry stage and start analysing the data straightaway. The main survey team can then monitor the process remotely and check whether certain sections are not responding. If required, surveyors can be instructed to follow up more with such groups to try to correct the problem.
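The monitoring step described above boils down to tracking response rates by subgroup as data comes in. The Python sketch below is a minimal illustration, not how any particular survey software does it: it assumes each sampled case is recorded as a (group, responded) pair, and both the 80% threshold and the figures for the 21-50 group are hypothetical. Only the 5-out-of-10 figure for the 15-20 group comes from the example above.

```python
from collections import Counter


def response_rates(sample):
    """Return the response rate for each subgroup.

    `sample` is a list of (group, responded) pairs, one per sampled person.
    """
    sampled = Counter(group for group, _ in sample)
    responded = Counter(group for group, ok in sample if ok)
    return {g: responded[g] / sampled[g] for g in sampled}


def flag_low_response(rates, threshold):
    """Return groups whose response rate is below `threshold`, lowest rate first."""
    low = [g for g, r in rates.items() if r < threshold]
    return sorted(low, key=lambda g: rates[g])


if __name__ == "__main__":
    # Hypothetical sample: 10 people aged 15-20 (5 respond), 90 aged 21-50 (81 respond).
    sample = ([("15-20", True)] * 5 + [("15-20", False)] * 5
              + [("21-50", True)] * 81 + [("21-50", False)] * 9)
    rates = response_rates(sample)
    print(rates)                                     # {'15-20': 0.5, '21-50': 0.9}
    print(flag_low_response(rates, threshold=0.8))   # ['15-20']
```

Run on each day's uploads, a check like this would tell the central team which group to prioritise for follow-up visits.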
Paradata, the administrative data about the survey itself (such as the time taken to complete a survey or the number of visits needed), can be very useful at this stage. With computerized data collection, we can automatically monitor surveyors on parameters such as how many times they followed up with respondents, how much time they spent on each survey, and whether they had to go back to an earlier question while administering it. We can therefore check whether surveyors are making that extra effort with the subsample where non-response is higher. Software such as SurveyTrak is easy to use for this purpose and automatically generates a lot of useful paradata for survey administrators. Such software can also record how surveyors introduce themselves and ask questions, which is handy during training because it helps us identify volunteers who need more support.
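As a rough illustration of how paradata could be summarised per surveyor, here is a small Python sketch. The record fields (`surveyor`, `group`, `visits`, `minutes`), the target group and the minimum-visits threshold are all hypothetical; this is not SurveyTrak's data format or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Contact:
    """One paradata record per sampled case (hypothetical fields)."""
    surveyor: str
    group: str       # respondent subgroup, e.g. an age band
    visits: int      # number of visits made to this case
    minutes: float   # interview length in minutes, 0 if not completed


def summarise(records, target_group):
    """Per surveyor: mean visits to target-group cases and mean completed-interview length."""
    by_surveyor = defaultdict(list)
    for r in records:
        by_surveyor[r.surveyor].append(r)
    summary = {}
    for s, rs in by_surveyor.items():
        target = [r.visits for r in rs if r.group == target_group]
        done = [r.minutes for r in rs if r.minutes > 0]
        summary[s] = {
            "target_visits": mean(target) if target else 0.0,
            "mean_minutes": mean(done) if done else 0.0,
        }
    return summary


def low_effort(summary, min_visits):
    """Surveyors whose average visits to the target group fall below `min_visits`."""
    return sorted(s for s, v in summary.items() if v["target_visits"] < min_visits)


if __name__ == "__main__":
    records = [
        Contact("A", "15-20", 3, 25.0),
        Contact("A", "15-20", 2, 0.0),
        Contact("A", "21-50", 1, 30.0),
        Contact("B", "15-20", 1, 0.0),
        Contact("B", "21-50", 1, 12.0),
    ]
    summary = summarise(records, target_group="15-20")
    print(summary)
    print(low_effort(summary, min_visits=2))  # ['B']
```

Here surveyor B made only one attempt on the hard-to-reach case, so B would be the one to contact. A short mean interview time could similarly prompt a supervisor to check whether questions are being rushed.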
Along with reducing survey biases, this design can cut the cost of transporting survey tools and getting the data entered. It also allows centralized monitoring of the survey, with both the survey data and the paradata generated in real time. Since there is no separate data entry phase, analysis can start almost as soon as data collection begins, so analysts can spot trends while the survey is still in the field and carry out any follow-ups or corrections that are needed. Finally, surveyors can communicate directly with the team and leave comments that can be useful during analysis.
However, there are some limitations to this approach. Firstly, volunteers would have to be equipped with laptops or other mobile devices to carry out the survey, which would increase costs. Secondly, training volunteers to use this technology may take more time and money. Thirdly, low internet penetration in India could slow the process down, since there may be a time lag between collecting the data and uploading it. Finally, replicating this model in a national survey in India could be difficult because the software would have to be available in multiple languages, which may increase costs significantly.
Any organization considering such a survey model will have to weigh these factors and work out which cost model suits it best. The total sample size and the length of the survey are likely to be the most important factors in deciding whether the investment is viable. However, given the benefits involved, anyone designing a survey should seriously consider this approach before proceeding.
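One simple way to frame the cost decision is as a break-even calculation. Assume computerized collection has a fixed cost F (devices, extra training) plus a per-survey cost e, and paper collection has a per-survey cost p (printing, transport, data entry). Computerized collection is then no more expensive once F + e·n ≤ p·n, i.e. n ≥ F / (p - e) when p > e. The Python sketch below encodes this simplified model; all cost figures in it are hypothetical and in arbitrary currency units.

```python
import math


def break_even_surveys(fixed_cost, paper_cost_per_survey, digital_cost_per_survey):
    """Smallest number of surveys n at which computerized collection costs no more than paper.

    Computerized total = fixed_cost + digital_cost_per_survey * n
    Paper total        = paper_cost_per_survey * n
    Returns None if computerized collection never becomes cheaper per survey.
    """
    saving = paper_cost_per_survey - digital_cost_per_survey
    if saving <= 0:
        return None
    return math.ceil(fixed_cost / saving)


if __name__ == "__main__":
    # Hypothetical figures for a short questionnaire.
    print(break_even_surveys(200000, 150, 50))   # 2000
    # Hypothetical longer questionnaire: paper costs rise more than digital ones.
    print(break_even_surveys(200000, 250, 60))   # 1053
```

Under these assumptions, a longer questionnaire lowers the break-even sample size, because each paper survey needs more printing and data entry, while the device cost is paid once.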
Such a design is currently used in the National Survey of Family Growth in the USA. For more details, see Wagner et al. (2012), "Use of Paradata in a Responsive Design Framework to Manage a Field Data Collection", available at http://www.jos.nu/Articles/abstract.asp?article=284477
For more applications and a stronger theoretical framework for this survey design, see Groves, Robert M., and Steven Heeringa. 2006. "Responsive design for household surveys: tools for actively controlling survey errors and costs." Available at www.isr.umich.edu/src/smp/Electronic%20Copies/127.pdf