Creating a splash with Data Diving

Over a July weekend in London four charities and more than 80 data professionals took part in a “DataDive”, organized by DataKind UK. Ricardo, Richard and Simone from Oxfam’s Research Team (see pic of handsome hunks below) went along. Here’s what happened.

If you came to London for a weekend during the best summer since 1976, how would you like to spend your time? Inside, hunched over a laptop crunching numbers, that’s how.

At the end of a glorious (weather-wise) July, we joined a large number of volunteers and data scientists for a ‘DataDive’. We came armed with access to a couple of relevant datasets, a specific problem we wanted some help with, and an acute knowledge of the limitations of our own abilities. To invoke the spirit of Donald Rumsfeld, we knew what we didn’t know, but we also knew there were things that, as number crunchers, we didn’t know we didn’t know. The event seemed like the place to find them. The sheer collection of advanced numerical skills gathered in the DataDive would give us insights we didn’t know we could have. We were lucky that our application to participate was successful.

These gatherings are a bit like speed-dating, and a bit like the reality show The Bachelor. A group of bright, successful and motivated organizers (some of them coming from the US for the event) put together a venue (and pizza, and beer) to match needs (us) and skills (volunteering number crunchers). Data scientists with very different backgrounds and abilities devote their spare time to working on something different from their everyday jobs, and hopefully meaningful. Charities and institutions (in this case HelpAge International, Hampshire County Council, CVAT and Oxfam) get exposed to innovative ways of working and try to solve very concrete problems that can support their work.

DataDives are getting popular. In January DataKind (the organization behind the event in London) and UN Global Pulse organized ‘Data science at the United Nations’ in New York. In February the World Bank and UNDP held an event on ‘Measuring Poverty’ in Vienna, followed by another one in March in Washington DC.

Each organization pitched its questions to the assembled geeks (hence the speed dating analogy). Our pitch to attract people to our problem was rather simple: using a couple of databases on monthly local food prices, we wanted to have a better understanding of the patterns that underlie their movements (long-term creeping trends? Recurring seasonal changes? Random noise? All three?). This curiosity is partly born out of our qualitative work on food price volatility, but also out of a need to better understand how and why local food prices are changing over time. Our initial questions related to the trends for local prices, their correlation with other indicators such as cost of fuel and rainfall, and how we could identify and track sentiments on local food price movements from the web and social media.

The two main datasets we proffered were from similar sources: the FAO Global Information and Early Warning System (GIEWS) and FEWS NET’s monthly food price monitor. Both datasets include monthly price data for staple food commodities in a variety of local markets in developing countries.

Our pitch must have been geek catnip, because about 30 volunteers chose to spend their time working with us and were split into groups. The first group produced a trend disaggregation analysis for different products and countries, and came up with a price prediction model, including the documentation to replicate the analysis using R, an open-source statistical software package. The same group also created a useful interface tool for visualising food price trends for different commodities and time periods in different countries. Probably the most appealing and innovative piece of analysis was on rainfall and food prices. The creators modelled the timing of rainfall throughout the year and estimated how far peak precipitation deviated from the expected day of maximal rain, linking these deviations to food price time series. Finally, the team working on signals from social media and the web prepared several useful ways to extract data from the web, Twitter and Google Trends.
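For readers wondering what a "trend disaggregation" looks like in practice, here is a minimal sketch. It is not the volunteers' code (their analysis was done in R and is not reproduced here); it is a textbook additive decomposition written in Python, and the function name and the price series are made up purely for illustration. The idea is to split a monthly price series into the three pieces we asked about in our pitch: a slow trend, a repeating seasonal pattern, and whatever noise is left over.

```python
import math
from statistics import mean


def decompose_monthly(prices, period=12):
    """Split a series into trend + seasonal + residual (additive model).

    Hypothetical helper for illustration only. Assumes an even period
    (12 for monthly data) and at least two full periods of observations.
    """
    n = len(prices)
    if period % 2 != 0 or n < 2 * period:
        raise ValueError("need an even period and at least two full periods")
    half = period // 2

    # Trend: centred moving average spanning exactly one period.
    # The two end points get half weight so the window stays centred.
    trend = [None] * n
    for t in range(half, n - half):
        window = prices[t - half : t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period

    # Seasonal: average detrended value for each position in the cycle,
    # shifted so the seasonal effects sum to zero over one period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(prices[t] - trend[t])
    raw = [mean(b) for b in buckets]
    offset = mean(raw)
    pattern = [s - offset for s in raw]
    seasonal = [pattern[t % period] for t in range(n)]

    # Residual: what neither the trend nor the seasonal pattern explains.
    # Undefined (None) at the edges where the moving average has no value.
    residual = [
        None if trend[t] is None else prices[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic example (not real GIEWS or FEWS NET data): four years of
# monthly prices with a steady upward creep and a yearly cycle.
prices = [100 + 0.5 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
trend, seasonal, residual = decompose_monthly(prices)
```

On a real market series the residual would not be close to zero, and that leftover is often the interesting part: it is where a price spike unrelated to the usual season would show up. Established routines, such as `decompose` and `stl` in R, handle the edge cases more carefully than this sketch does.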

Come on in, the data's lovely....

This was just a first date, and these preliminary explorations will be further prodded, interrogated, and developed (hopefully with the help of some of the July volunteer data crunchers). The DataDive opened new doors that we would like to explore together with the volunteers. We know much more than we did that sultry Friday riding the train from Oxford to London. And more importantly, we have an explicit workplan to build on the initial, frenetic, caffeine-fuelled effort. We plan to follow up by:

1. Adding a local price visualisation tool to the website for our existing research work on the impacts of food price volatility

2. Looking at the linkages between variations in seasonal rainfall patterns, local prices and harvest volumes

3. Merging the information on local markets from sites like M-Farm with web-based measures of “consumer sentiment”

“Matching” is one of those words that economists use to understand how mutually beneficial relationships form. The DataDive was a fantastic practical matching exercise. Like all plunges (or first dates) it was a bit of a leap into the unknown, but we were handsomely rewarded with dozens of people volunteering their (very expensive) time to help us understand a problem so central to our work. We are very grateful to the organizers, data ambassadors and volunteers who worked with us during and leading up to those three days. All in all, a great Speedo-free weekend’s diving and dating.

If you want to read more, The Economist and the Guardian had good write-ups.
