Dating data – What are the characteristics of dream government data?
7 April 2011
The lack of good quality government (GOI) data and the idiosyncrasies in whatever data does exist are a recurring theme in office conversations and on the blog during or just after the budget brief season. This year was no different: we sat around a round table and analysed eight centrally sponsored schemes (CSS) in the social sector, sifted through reams of data on government schemes till we knew government websites better than the government itself[1], checked, cross-checked, rechecked, re-cross-checked, and for good measure, re-rechecked and re-re-cross-checked our numbers, and despite (or perhaps because of) all these measures, we were often left holding our heads in our hands in abject dejection and despair.
Which brings us to the question – what exactly is good quality data? We deliberated on this for some time and came up with a few possible criteria which may be used to judge data quality. We will leave aside the question of the authenticity of government data, as it’s nearly impossible to verify, and concentrate on other, broader categories.
- Coverage: To evaluate the performance of a CSS during one year, we need data on how much money has been allocated, released and spent during that year. In addition, we need data on outputs and outcomes for that scheme. So one way of judging the quality of data for a CSS would be to see how comprehensively it covers inputs, outputs and outcomes.
- Inputs: These consist of monies that flow into the scheme from any source. This money may be plan/non-plan allocations or grants from GOI or state governments, or discretionary grants from the central and state finance commissions. This concept may be simple enough to understand. However, we need to keep in mind that there are nuances to it – the entire allocation may not be sanctioned for release during the financial year; what is sanctioned for release may not be released; and sometimes the entire release may not get spent! To take an example, Rs. 100 might be allocated, Rs. 90 might be sanctioned for release, Rs. 80 actually released (the remainder might be classified under strange categories such as “funds in transit”, as in the case of MGNREGA) and only Rs. 50 might actually be spent during an FY. So finally, what should we treat as inputs? To make this a little more concrete (pun intended), let’s take the example of the Pradhan Mantri Gram Sadak Yojana (PMGSY, a CSS for the construction of rural roads). The value of proposals approved (this is similar to planned allocations) for the scheme in 2008-09 was Rs. 22,027 crores, the release was Rs. 8,660 crores, while the actual expenditure was Rs. 2,84,913 crores!!! (Weird, right? How can expenditure be more than release?!) Arguably, we should consider the money spent as the pure input, but this data is often not available, and even when available has a lag associated with it (we’ll come to this point later).
- Outputs: Now, money that is put in has to be used for something. (Of course some of it may get diverted to non-productive use but that is also not the focus of this article.) To continue with the PMGSY example, the output would be the number of roads built or the length of roads constructed during a period of time. For NRHM, the output would be the number of SCs/PHCs/CHCs constructed.
- Outcomes: In most cases, especially when we look at the social sector, outputs are not the be-all and end-all; we generally have broader goals in mind and CSSs are simply tools to achieve these goals. Simply increasing the road length under the PMGSY does not serve any useful purpose unless the number of unconnected habitations decreases with time.
- Periodicity: Secondly, in order to see if there have been changes in the way that money is allocated/released/spent, or to analyse if there have been any improvements in outcomes, we need data that is published at regular intervals. While financial data (when available) is available and needed on an annual basis, social outcomes are generally slow-moving. So while allocations, release and expenditure may show variations from year to year, depending on government priorities, political will and changes in administrative machinery, outcomes like the number of unconnected habitations/IMR/MMR may take a while to show some improvement. Hence, while annual data may be needed for deconstructing inputs, a slightly lower frequency may be tolerated for outcomes. However, intertemporal analysis becomes difficult and certain nuances may be hard to capture if the gap between two observations is too large, so we need an optimal periodicity, depending on the type of variable we are interested in.
- Lags: In an ideal world, most data should be available in real time. Only thing is, it’s not. And sometimes, there are inadmissible delays in bringing data out into the public domain (e.g. the Sarva Shiksha Abhiyan website, which had not been updated since September 2008, was finally updated in January 2011). Also, while we know the budgetary allocation for a scheme in 2011, we will only get to know the actual expenditure incurred in 2013. Now, in 2011, we did not find any data more recent than 2008-09 for certain schemes such as the Mid-day Meal Scheme. Incidentally, some departments have an MIS in which data is updated on a day-to-day basis (Total Sanitation Campaign and PMGSY in our budget brief series). While this is really impressive, such a system is not without its quirks. For instance, when the MIS for PMGSY is updated, not only do the latest numbers change but previous FYs’ data also change! While this is justified for some variables (read the BB to know which ones, and why!), one would hardly expect the number of roads or the road length to decrease over a few days! We really couldn’t figure this one out – Anirvan’s theory, that aliens had developed an inordinate fondness for PMGSY asphalt, was shot down cruelly (along with the spaceship, I might add). He has never been the same since.
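For the number-minded, the fund-flow chain described under Inputs (allocated, then sanctioned, then released, then spent) can be sketched in a few lines of Python. This is only an illustration using the made-up Rs. 100/90/80/50 figures from that example; the `FundFlow` class and its method names are our own hypothetical helpers, not anything a government MIS provides. It computes how much survives each stage and flags any stage that is larger than the one before it, which is the kind of oddity the PMGSY expenditure figure shows.

```python
from dataclasses import dataclass


@dataclass
class FundFlow:
    """Hypothetical container for one scheme's money in one financial year."""
    allocated: float
    sanctioned: float
    released: float
    spent: float

    def ratios(self) -> dict[str, float]:
        # Share of money that survives each step of the chain.
        return {
            "sanctioned/allocated": self.sanctioned / self.allocated,
            "released/sanctioned": self.released / self.sanctioned,
            "spent/released": self.spent / self.released,
            "spent/allocated": self.spent / self.allocated,
        }

    def anomalies(self) -> list[str]:
        # A later stage should not exceed the stage before it.
        stages = [
            ("allocated", self.allocated),
            ("sanctioned", self.sanctioned),
            ("released", self.released),
            ("spent", self.spent),
        ]
        flags = []
        for (up_name, up), (down_name, down) in zip(stages, stages[1:]):
            if down > up:
                flags.append(f"{down_name} exceeds {up_name}")
        return flags


# Illustrative figures from the text (in Rs.), not real scheme data.
example = FundFlow(allocated=100, sanctioned=90, released=80, spent=50)
for name, value in example.ratios().items():
    print(f"{name}: {value:.2f}")
print("anomalies:", example.anomalies())
```

Which ratio counts as "the" input measure depends on the question being asked: spent/allocated shows how much of the plan turned into expenditure, while spent/released shows how much of the money that actually arrived was used.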
It can be argued that with the advent of the Right to Information Act (RTI), our life as “seekers of data” should have become easier, but here too the process is not that straightforward. For instance, when we filed an RTI requesting information on the state share under the Integrated Child Development Services (ICDS), we were informed that the state shares are only available with the respective state governments (please note that under this programme, the state share accounts for as much as 50 percent of the total allocations, so how is the central government keeping track of the performance of this scheme?!). Another anomaly: even when our RTIs were successful (i.e. we got the information we asked for), it didn’t match the data available on the GOI website! (for more details, please see here)
This makes us believe that the issue isn’t just that data is not being made publicly available at regular intervals or for different parameters, but that the data is just not available even with GOI – a scary thought – how does GOI function without knowing how much money is being spent on schemes or how they are performing?
Now, this blog was initially meant to be a humorous rant on the strange things that a hunter-gatherer of government data encounters till better (or worse) judgement took over. So, to end on a lighter note, did you know that Anganwadi helpers are supposed to[2] cook and serve food to children and marchers? What marchers are, we can’t even begin to guess. Let us know if you do.
[1] This, incidentally, is no exaggeration: it so happened that we had requested some information on the state share for NRHM (which is supposed to be 15 percent of the total allocation) during a meeting with the Health State Secretaries. We were, however, informed that the information was not available. Strangely, we were later able to get some of this information (albeit only for 2007 and 2008) through the State Plan Approval documents.
[2] Click here for a comprehensive list of roles and responsibilities of Anganwadi Workers and Anganwadi Helpers – definitely worth a read!