11 May 2017 | Justin Ginnetti

The future is here, ahead of schedule: detecting incidents of displacement through machine learning and natural language processing

The head of our Data & Analysis department, Justin Ginnetti, shares his initial reflections and takeaways on the IDETECT challenge, which ended in April. IDETECT enabled IDMC to leverage the efforts of talented data scientists from around the world. It also called attention to the issue of internal displacement and introduced our work to a community that we had not previously engaged with.


Girls attend class at a school in Khost Province, Afghanistan.
Photo: Andrew Quilty / Oculi for Norwegian Refugee Council

One afternoon in November 2016, Leonardo Milano and I sat down to map out a plan to develop a semi-automated tool to detect incidents of internal displacement by analysing “Big Data” gathered online. The tool would sift through databases like GDELT, which contain most of the news published around the world in nearly every language, and find and extract information and data on internal displacement – in near real time. We left the meeting with a solid plan outlining a process to develop a viable working tool ready for implementation by late 2018 or early 2019.

Despite this, we didn’t go home satisfied that night. There were a couple of reasons for this. First, we both knew how important it is for IDMC to be able to identify new incidents of displacement: if we don’t know when internal displacement started, we cannot indicate its duration; if we cannot accurately measure new displacement flows, it is that much more difficult to develop quantitative empirical evidence about the factors that drive new displacement. Yet IDMC’s estimates of the number of new displacements are often overshadowed by our global IDP headcount, and this is wrong.

The availability of quantitative empirical evidence on incidents of new, repeated and secondary displacement is critical for those working on disaster risk reduction, climate change adaptation and development, because it explains what factors need to be addressed – by whom and when – in order to reduce the risk of future displacement. This evidence is also important for humanitarian actors because it can help with disaster preparedness (e.g. building resilience to future shocks) and response (e.g. knowing how many people have just become displaced or who are likely to continue across borders).

However, our plan was going to take too long to implement: evidence about the drivers of new, secondary and repeated displacement is needed now, and its absence is felt across many global and regional policy fora. Addressing the risk of new displacement caused by climate change is part of the Paris Agreement. Such evidence is also needed to determine the factors that explain when and why some IDPs are forced to flee onward across international borders, and to begin to understand how many people are in situations of long-term, protracted displacement.

A new way of working

The second reason we went home dissatisfied that night was that we considered our plan a bit old-fashioned, given that we were trying to apply a new technology to internal displacement. The following morning, I suggested to Leonardo that for this project we try a new way of working rather than implement the plan with IDMC staff, consultants and interns. The idea was to crowdsource not just the solution but the way to develop it. By lunchtime, Leonardo had discovered the UN’s Unite Ideas platform for crowdsourcing data analysis and visualisation tools. Ten minutes later, I’d written to Unite Ideas, and by the time we went home we’d already received a positive response and an invitation to a conference call.

Unite Ideas represents the UN at its best. It’s a vehicle for enlisting talented individuals to develop innovative solutions that help solve global problems. The genius of this platform derives from its ability to harness talent and cutting-edge techniques and to organise them around a specific and clearly articulated ‘challenge’. Unite Ideas is a newly conceived UN for the 21st century: agile, democratic, and a way to merge top-down and bottom-up approaches at lightning speed.

Proof: within a week, Unite Ideas had green-lighted the idea for our #IDETECT challenge to detect, tag and cluster reports of incidents of displacement and to extract critical information from the source documents, such as the number of people displaced and their location. By the end of January, we’d launched the challenge. I write this in May, having begun to review the first batch of submissions received.
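To make the task concrete, here is a minimal sketch of the "detect" and "extract" steps, written for illustration only. It is not drawn from any challenge submission. The keyword list, the patterns and the function names are all assumptions, and simple pattern matching stands in for the machine learning and natural language processing a real tool would need. Extracting locations and clustering reports are left out entirely.

```python
import re

# Hypothetical keyword list (an assumption for illustration); a real tool
# would learn these signals from labelled reports rather than hard-code them.
DISPLACEMENT_TERMS = re.compile(
    r"\b(displaced|evacuated|fled|forced to flee|uprooted)\b", re.IGNORECASE
)

# Matches figures such as "3,000 people" or "450 families".
COUNT_PATTERN = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+(people|persons|families|households|residents)\b",
    re.IGNORECASE,
)


def detect_displacement(text):
    """Return True if the text contains a displacement-related term."""
    return DISPLACEMENT_TERMS.search(text) is not None


def extract_counts(text):
    """Return (number, unit) pairs for every figure found in the text."""
    results = []
    for match in COUNT_PATTERN.finditer(text):
        number = int(match.group(1).replace(",", ""))
        results.append((number, match.group(2).lower()))
    return results


def tag_report(text):
    """Return extracted figures for a displacement report, or None otherwise."""
    if not detect_displacement(text):
        return None
    return {"counts": extract_counts(text)}


if __name__ == "__main__":
    report = "Floods in the region displaced 3,000 people, and 450 families were evacuated."
    print(tag_report(report))
```

Even this toy version shows why the problem is hard: it cannot tell families from individuals, it ignores figures written out in words, and it will happily count people who were never displaced if they appear in the same report.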

Initial reflections and takeaways

Not only have we progressed much faster than originally planned in the development of a tool to detect new displacement, but by working in this manner we have also benefitted in several other ways.

First, we’ve met and engaged with more than 100 talented data scientists, many of whom had not worked on the issue of internal displacement before. Some weren’t aware of this issue, while others were but didn’t know how or with whom to engage. One team competing in the challenge, Data for Democracy, even organised well-publicised hackathons to help build the code for their solution. The discussions we’ve had and the people we’ve met since November are just the start of what we hope will become several lasting partnerships and collaborations with these new colleagues.

Second, partnering with Unite Ideas meant crowdsourcing both the solution and the way to develop it. Although we had our own ideas about how to produce the tool, it was up to the Unite Ideas participants to come up with their own solutions. Thus, instead of having a single possible solution developed one way, we’d receive several working submissions, each potentially unique. Working in this manner meant trusting the crowdsourcing approach, relinquishing a degree of control, and coming out of our comfort zone. In our case, that small leap of faith has been rewarded.

The #IDETECT challenge with Unite Ideas has been a big learning experience, both for IDMC and others involved. This learning has taken many forms, but one of the most important lessons is that while IDMC remains part of a large humanitarian NGO – the Norwegian Refugee Council – it can also operate like a small start-up in Kampala, Jakarta, Bangalore, Silicon Valley or elsewhere. 

What’s next?

At this point, we would like to sincerely thank everyone who has submitted an entry or contributed to one by working as part of a larger team. In June, IDMC and the rest of the judging panel will select and announce the winner of the challenge. We’ll post a follow-up blog by the challenge winner in which they describe their tool and how IDMC will be able to use it. We’re also planning to organise an event where the winning entry will be presented and demonstrated.

As excited as we are to have come this far this fast, we’re not done yet. So stay tuned. 
