What promise does machine learning hold for global development?
Surgo’s Machine Learning Initiative for Precision Public Health
Tremendous effort goes into improving global health and development outcomes by scaling up specific interventions, but little is known about the causal chain of factors that drive those outcomes. And while the volume of development data has been growing exponentially, disparate datasets make it difficult to connect the dots and construct a systemic view of all the factors affecting health and development outcomes.
One example is infant mortality. Existing data and traditional statistical analyses help to uncover some of the main causes, such as prematurity, neonatal sepsis, and birth asphyxia. We also know some of the associated risk factors, such as the mother’s failure to take iron and folic acid supplements, the place of birth (at home or at a health facility), or lack of breastfeeding. But what is the causal chain of factors that drives these outcomes? How are these factors linked to each other? What are the most important behaviors to focus on, and what are the most crucial system improvements to be made?
Machine learning has the potential to be transformative in its ability to help us look at the whole system, uncover patterns in the data, and develop “generative” models of development outcomes. Causal machine learning, in particular, can provide a full-system view of the chains of factors leading to particular outcomes. This approach has been applied in other sectors such as bioinformatics, but at Surgo Foundation we want to know whether it is applicable to the development sector, and whether it can be applied to linked but disparate datasets.
With the Surgo Machine Learning Initiative for Precision Public Health (ML4PxP) we aim to answer two fundamental questions. First, can we put machine learning to work on the kinds of development datasets that are currently available? And second, does causal machine learning yield insights that complement traditional statistical methods?
Working with partners from academia and industry, we are applying machine learning approaches to data on various development outcomes from Uttar Pradesh state in India. Our partners are trying different kinds of causal machine learning methods to see what combinations best suit the available data. And they are creating and analyzing causal maps to see how these perform compared with traditional statistical analysis.
Our progress so far indicates that, unlike the datasets used in the highly controlled settings where many machine learning tools are developed, real-world datasets have complex characteristics that are challenging for machine learning, and disparate datasets cannot always be combined easily. Moreover, a wide spectrum of traditional statistical tools and machine learning tools must be considered in tandem to guide the development sector’s approach to these new kinds of analysis. Surgo’s own large-scale data collection efforts may help us to identify and address some of these issues. The proof-of-concept will allow us to move forward in applying causal machine learning to improve the design and delivery of public-health and development programs at scale.