The Data Explosion: Big Data and Development

26 December 2012

Velocity of data. Quantity of data. Big Data. These were some of the terms that caught my attention after reading Laina’s blog on Data Governance. I was particularly surprised by the figure that showed the varied skills that a data scientist is supposed to possess. I wondered what it is about the new kinds of data being created that makes them so complex and difficult to analyse. While exploring this question, I came across a variety of literature on the new types of data, and on Big Data in particular. In fact, if you look at the Google Trends image below, you can see that the term Big Data has captured the popular imagination in the past couple of years1. However, until recently most applications of Big Data remained within the business realm. Only now are governments beginning to see the possibilities Big Data could hold for policy making. In this blog, I will try to summarise what Big Data is and how it can make policies more effective.

Defining Big Data seems to be quite a tough job, with different authors explaining the term in different ways2. However, there seems to be some agreement that Big Data is a volume of structured and unstructured data so large that it is difficult to process using traditional database and software techniques3. It is data that comes in at great velocity, in great variety and in great volume. Data from cell phone GPS, data from climate sensors and data from social media are all being generated every second, and together they constitute large datasets that arrive without long time lags. Most of this data is created passively by the data bearers, that is, they are not actively responding to a questionnaire to generate it.

The main advantage this type of data offers is real-time information, which can greatly reduce the time lag between the accumulation of data, its analysis and the subsequent response. The variety of data also allows for a broader profile of trends in society: with many more variables to analyse, you can build a better behavioural profile of the people in the area being studied.

Big Data has been used by companies like Google and Amazon to better understand the behaviour of their users and to use this information to optimise their websites and their sales4. The potential of this data is being recognised by governments as well5, and many researchers are beginning to show its uses in pre-empting possible outbreaks of disease6, estimating GDP in real time7, and even predicting where a person might be at any point in time based on past movements gathered through the GPS on their phone8.

So how does this type of data help us in policy making? Firstly, we can make much more informed decisions. If the government has to make a decision about when to implement a public health intervention in a village, for example, knowing the movements of the villagers and where they are at any point in time could be very useful. The government could use this data to identify the time at which most people are away in the fields and the time they are at home. This would help the government design a policy which is more targeted. Other issues like how people cope with shocks like inflation, natural disasters, unemployment etc. can also be understood much better using this data.
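As a rough illustration of the village example above, here is a minimal Python sketch that estimates the hour of the day at which the largest share of people are at home. Everything in it is an assumption made for illustration: the record layout `(person_id, hour_of_day, at_home)`, the function names and the sample data are hypothetical, and real GPS data would first need cleaning and a rule for deciding what counts as "home".

```python
from collections import defaultdict

def share_at_home_by_hour(pings):
    """Return {hour: share of people observed in that hour who were at home}.

    pings is an iterable of (person_id, hour_of_day, at_home) tuples.
    This record layout is hypothetical, not a real data format.
    """
    seen = defaultdict(set)  # hour -> people with any ping in that hour
    home = defaultdict(set)  # hour -> people with an at-home ping in that hour
    for person_id, hour, at_home in pings:
        seen[hour].add(person_id)
        if at_home:
            home[hour].add(person_id)
    return {hour: len(home[hour]) / len(seen[hour]) for hour in seen}

def best_hour(pings):
    """Pick the hour with the highest at-home share (earliest hour on ties)."""
    shares = share_at_home_by_hour(pings)
    return min(shares, key=lambda h: (-shares[h], h))

# Made-up sample: three villagers observed at 9:00, 19:00 and 21:00.
sample_pings = [
    ("a", 9, False), ("b", 9, False), ("c", 9, True),
    ("a", 19, True), ("b", 19, True), ("c", 19, False),
    ("a", 21, True), ("b", 21, True), ("c", 21, True),
]

if __name__ == "__main__":
    print(share_at_home_by_hour(sample_pings))
    print("Suggested hour for the intervention:", best_hour(sample_pings))
```

A real analysis would also have to ask who is missing from the data, since people without phones do not show up in it at all.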

Big Data could also help in assessing the impact of policy interventions. For example, once the government has implemented the public health initiative, its impact could be seen in a number of ways. We could see reduced visits to pharmacies and hospitals, which could be checked with GPS information. Because of the reduced expenditure of households, more people might start saving, which in turn could show up in mobile banking databases collected in real time. There could be consequent increases in purchases of livestock and more investment in agricultural activities. However, if we see that visits to pharmacies remain the same, or that expenditure on medicines increases, we can analyse why the policy is not having the desired effect. Since this impact can be gauged much more quickly, an adequate response can be implemented without much time lag. The point is that the impact could manifest across multiple databases which are being created passively. These databases, by design, have fewer missing values, and they save a lot of the input costs that would otherwise have been required to collect such data. Thus, an impact assessment can focus on the analysis itself and worry less about the collection or quality of data.

However, there are a number of problems with using this data extensively. The primary issue is privacy: individuals need to have the right to control the information about them. Appropriate safety measures need to be in place for a well-functioning system. These measures could take the form of anonymising datasets and placing strict controls on the release of data.
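To make the idea of such safety measures a little more concrete, here is a minimal Python sketch of two common steps applied before a dataset is released: replacing phone numbers with keyed hashes and rounding coordinates to a coarser grid. Strictly speaking this is pseudonymisation rather than full anonymisation, because movement patterns can still identify people, so it should be read as a starting point and not as a guarantee. The field names (`phone`, `lat`, `lon`), the function names and the sample record are all hypothetical.

```python
import hashlib
import hmac

def pseudonymise_id(raw_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    Without the secret key, the original identifier cannot simply be
    looked up by hashing every possible phone number.
    """
    return hmac.new(secret_key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()

def coarsen(lat: float, lon: float, decimals: int = 2):
    """Round coordinates; two decimals is roughly a 1 km grid."""
    return round(lat, decimals), round(lon, decimals)

def prepare_for_release(records, secret_key: bytes):
    """Return copies of the records with pseudonymous IDs and coarse locations.

    records is a list of dicts with hypothetical keys: phone, lat, lon.
    """
    released = []
    for rec in records:
        lat, lon = coarsen(rec["lat"], rec["lon"])
        released.append({
            "person": pseudonymise_id(rec["phone"], secret_key),
            "lat": lat,
            "lon": lon,
        })
    return released

if __name__ == "__main__":
    key = b"keep-this-key-out-of-the-released-data"  # illustrative only
    raw = [{"phone": "555-0100", "lat": 12.97165, "lon": 77.59456}]
    print(prepare_for_release(raw, key))
```

The strict controls on release mentioned above would still be needed on top of this, since the secret key and the raw data have to be kept somewhere safe.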

Access and sharing are other issues that create bottlenecks to using Big Data. Much of this data is held by private entities who may not be willing to release it, as it could fall into the hands of competitors. Some of this data may not be collated in a usable format by the companies, and a researcher may have to spend a large amount of time organising it. Global Pulse, the UN body working in the field of Big Data, is trying to put forth the concept of “Data Philanthropy” to encourage companies to donate their datasets.

Moreover, even if this data is available, building the technical capacity to use it requires large investments and specialised knowledge. Possibly, the private companies which have already made these investments could be engaged to structure the data so that it can be interpreted. This brings us to another issue, which is the interpretation of data. With so many different data sources, the person studying the data must know what he or she is looking for and in which datasets it could show up.

Despite all of these challenges, Big Data could be very useful because it offers early awareness, real-time information and a shorter feedback gap. As Global Pulse states, “the promise of Big Data for Development is, and will be, best fulfilled when its limitations, biases, and ultimately features, are adequately understood and taken into account when interpreting the data.”

1 http://www.google.co.in/trends/explore#q=big%20data; in fact, Google Trends itself is a form of Big Data.

2 The following links are all interesting articles about Big data and its uses- http://blogs.starcio.com/2012/12/what-is-big-data-real-challenges-beyond.html, http://radar.oreilly.com/2012/01/what-is-big-data.html?cmp=ba-conf-st12-twitter-promo, http://www.weforum.org/reports/big-data-big-impact-new-possibilities-international-development

3 Most blogs and articles define Big Data in this way; this definition is taken from http://www.unglobalpulse.org/sites/default/files/BigDataforDevelopment-GlobalPulseMay2012.pdf

4 Many companies have been using Big Data to improve and predict sales. There are many articles which talk about this trend and how companies are using Big Data analytics: http://gigaom.com/2012/03/22/synthesizing-insights-and-capitalizing-on-consumers-digital-signals-structure-data-2012/

McKinsey published a report this year on how Big data is the next frontier for innovation- http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation

Harvard Business Review talks about Big Data as a management revolution and how users like Amazon embraced and utilised its potential: http://hbr.org/2012/10/big-data-the-management-revolution/ar/1

5 http://www.zdnet.com/blog/btl/u-s-government-commits-big-r-and-d-money-to-big-data/72760 & http://www.policyexchange.org.uk/images/publications/the%20big%20data%20opportunity.pdf

6 Paul, M.J. and M. Dredze. You Are What You Tweet: Analyzing Twitter for Public Health. Rep. Center for Language and Speech Processing at Johns Hopkins University, 2011. <http://www.cs.jhu.edu/%7Empaul/files/2011.icwsm.twitter_health.pdf>

7 Helbing and Balietti. “From Social Data Mining to Forecasting Socio-Economic Crisis.” As quoted in “Big Data for Development”, United Nations Global Pulse

8 “When there is no such thing as too much information” http://www.nytimes.com/2011/04/24/business/24unboxed.html?_r=1&src=tptw
