
Before Data Analysis Begins…


26 September 2011

Finally, the PAISA District surveys are over! The Jalpaiguri survey has just been completed; the filled, checked and rechecked questionnaires have now reached us. It took us 4 months to cover 140-145 schools in each of 9 PAISA districts[1]. Quite an exciting and eventful few months, I must say! It gives us a great sense of achievement that within a period of 5-6 months, we took a huge leap from conducting small pilots in one block of our PAISA districts to surveying 140-145 schools spread across an entire district. We are now in the process of analysing the data.

I have observed that researchers who work with secondary data often don’t realize how much effort goes into creating the data sets they use for tabulations and regressions. A neat-looking data file is the culmination of a long process which can stretch from a few months to a couple of years, depending on the nature and scale of data collection. Our experience bears this out.

The first step was to develop a questionnaire which is easy to understand and fill in and, at the same time, capable of collecting information at the level of detail we needed. It involved thinking carefully about the PAISA questions, arguing over how many questions to include, which questions to include and how to phrase them, extensive piloting in the areas where the survey would happen, incorporating the feedback from these pilots and, not to mention, endless tinkering with the questionnaire to get it just right[2].

The next step was to figure out when the surveys were to be conducted. As Anirvan has rightly pointed out in a previous blog post, it’s not as easy as it sounds[3]. Then came the hard part, which tests a PAISA Associate’s (PA) mettle: mobilizing enough volunteers capable of conducting the survey properly[4]. Given the scale of the exercise, this was crucial. We tried different models in different places: in Nalanda, Purnia (Bihar) and Satara (Maharashtra), we had relatively more volunteers and hence fewer schools per volunteer. In Udaipur (Rajasthan), we had few volunteers and thus more schools per volunteer. Both methods have their pros and cons. The ‘more volunteers’ method is useful when the survey questionnaire is not very complicated and the window of time to complete the survey is narrow. But training and coordinating a large number of volunteers is not an easy task. On the other hand, when there are fewer surveyors, they can be trained thoroughly, monitored rigorously and coordinated easily. This is especially useful when the survey questionnaires are complex and schools need to be visited more than once.

We also made sure that there was extensive monitoring and rechecking at various levels to ensure accurate and truthful data collection. The Master Trainers (MTs) and the PAs made surprise visits to some schools where the survey was being conducted to ensure that the surveyors were indeed there and were collecting data the way they were supposed to[5]. To give one example, one of our PAs found that the surveyors were filling in the student attendance column based on what the headmaster (HM) was saying, rather than conducting a headcount. The mistake was corrected in time. In addition to spot-checking, the PAs and MTs visited a sample of schools after the survey to cross-check the information that was collected. When the questionnaires were submitted, the MTs and the PAs went through each one of them carefully to spot errors and missing fields, if any. The surveyors were also asked a few questions to make sure that they had indeed been to the school. It did not stop there. Some of the headmasters were contacted again by telephone and asked if the surveyors had visited the school. They were also asked a few questions from the questionnaire, and their answers were cross-checked against the submitted questionnaires.

Once all the questionnaires were submitted, the PAs prepared a ‘master file’ for each district, which had an exhaustive listing of all the names used for a particular entity. For example, a school management committee could be mentioned as SMC, Parent Teacher Association (PTA), Vidyalay Shiksha Samiti or some other local name. The master file lists all such possible variants, which makes data analysis a tad easier. This was a lesson from the pilots we conducted in December 2010.
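To make the idea of a master file concrete, here is a minimal sketch in Python. It is not our actual master file: the dictionary entries, the canonical label "SMC" and the function name `canonical_entity` are assumptions made up for illustration. It simply shows how different local names can be mapped to one entity before analysis.

```python
# Hypothetical master-file entries: every known local name (lower-cased)
# points to one canonical label. The real master file is prepared per district.
MASTER_FILE = {
    "smc": "SMC",
    "school management committee": "SMC",
    "pta": "SMC",
    "parent teacher association": "SMC",
    "vidyalay shiksha samiti": "SMC",
}


def canonical_entity(raw_name):
    """Return the canonical label for a name as written on a questionnaire.

    Returns None when the name is not in the master file, so that it can be
    flagged for manual review instead of being guessed.
    """
    # Normalise case and collapse repeated spaces before the lookup.
    key = " ".join(raw_name.lower().split())
    return MASTER_FILE.get(key)


print(canonical_entity("Vidyalay  Shiksha Samiti"))
print(canonical_entity("Some other local name"))
```

Returning `None` for an unknown name, rather than a best guess, keeps the decision with whoever maintains the master file.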

Once the master file was prepared, the questionnaires came to the Delhi office, where another round of rechecking took place. Only then were the questionnaires sent for data entry.

Data entry can’t begin without telling the data entry firm how to enter the data, and that requires defining data structures and creating a codebook. A codebook specifies the variable name(s) corresponding to each question in the questionnaire and the format in which each is to be entered, and gives codes where the answer can be one of multiple options. We created a ‘flat’ data format and defined around 2000 variables. The codebook, the master file and the questionnaires were then handed to the data entry firm.
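For readers who have never seen a codebook, the following Python sketch shows roughly what two entries might look like and how an entered value can be checked against them. The variable names, question wording and codes are hypothetical and are not taken from the PAISA codebook.

```python
from dataclasses import dataclass, field


@dataclass
class CodebookEntry:
    variable: str   # column name in the flat data file
    question: str   # question text on the questionnaire
    kind: str       # "coded", "int" or "text"
    codes: dict = field(default_factory=dict)  # allowed codes for "coded" answers


# Two illustrative entries (hypothetical, not from the actual codebook).
CODEBOOK = [
    CodebookEntry("grant_received", "Did the school receive the grant?",
                  "coded", {1: "Yes", 2: "No", 9: "Don't know"}),
    CodebookEntry("students_present", "Number of students present (headcount)",
                  "int"),
]


def is_valid(entry, value):
    """Check one entered value against the format the codebook defines."""
    if entry.kind == "coded":
        return value in entry.codes
    if entry.kind == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool) and value >= 0
    return isinstance(value, str)


print(is_valid(CODEBOOK[0], 3))   # 3 is not a defined code
print(is_valid(CODEBOOK[1], 42))
```

In a ‘flat’ format each school becomes one row and each codebook variable one column, which is why the number of variables grows so quickly.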

The data entry firm scans all the questionnaires before entering the data. We opted for ‘double’ entry, i.e. the firm employs two separate teams to enter the same data independently. Once both teams finish data entry, the two data sets are compared. If there are any inconsistencies, the original questionnaire is checked and the data set is corrected accordingly. The firm then hands over the data set to us.
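The comparison step in double entry can be sketched in a few lines of Python. This is only an illustration of the principle, not the firm's actual software; the school ID "S001", the variable names and the function `find_mismatches` are assumptions.

```python
def find_mismatches(entry_a, entry_b):
    """Compare two independently entered data sets.

    Each data set is a dict: school ID -> {variable name -> value}.
    Returns a list of (school_id, variable, value_a, value_b) tuples for
    every field where the two teams disagree or one team has no value.
    """
    mismatches = []
    for school_id in sorted(set(entry_a) | set(entry_b)):
        row_a = entry_a.get(school_id, {})
        row_b = entry_b.get(school_id, {})
        for variable in sorted(set(row_a) | set(row_b)):
            value_a = row_a.get(variable)
            value_b = row_b.get(variable)
            if value_a != value_b:
                mismatches.append((school_id, variable, value_a, value_b))
    return mismatches


# Hypothetical example: the second team transposed two digits.
team_a = {"S001": {"grant_received": 1, "students_present": 42}}
team_b = {"S001": {"grant_received": 1, "students_present": 24}}

for school_id, variable, a, b in find_mismatches(team_a, team_b):
    print(f"{school_id}: {variable} entered as {a} and {b}; check the questionnaire")
```

Every flagged field goes back to the original questionnaire; the comparison itself cannot tell which team was right.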

But wait… that’s not the end of the story. We then check the data set for any mistakes in data entry, and this is where the scanned questionnaires come in handy. After 4-5 days of intensive data ‘cleaning’, the data is finally ready for basic statistical analysis, which is the stage we are at now. The analysis itself could make for another blog post. But that will have to wait.


[1] PAISA Districts are the districts where the PAISA project is being implemented. They are Kangra (Himachal Pradesh), Jalpaiguri (West Bengal), Nalanda & Purnea (Bihar), Jaipur and Udaipur (Rajasthan), Sagar (Madhya Pradesh), Satara (Maharashtra), and Medak and Hyderabad (Andhra Pradesh). We did not conduct a field survey in Hyderabad.

[2] The PAISA questions are: a) do schools get their money? b) do schools get their entire entitlement? c) when do they get their money? d) do they spend it? and e) what do they spend money on? The answers to these questions throw light on problems in fund flow processes.

[3] http://www.accountabilityindia.in/accountabilityblog/2323-pot-luck-field

[4] PAISA Associates (PAs) work at the district level, with one PA for each district.

[5] The PAs trained the Master Trainers (MTs), who in turn trained the surveyors in administering the survey questionnaire.
