Data for Good: Fighting COVID-19 with Data Science
This is the first of two blog posts about our recent participation in the Pandemic Response Hackathon. Our project (CoronaRank) was one of only 5 projects out of 230 submissions chosen to present at the closing ceremony. For the technical details of our CoronaRank solution (Markov Chains, R, Shiny, and how to quickly manipulate a dataset of >100GB) see the follow-up article.
The COVID-19 pandemic is putting an unprecedented strain on communities, healthcare systems, and the economy. Much of the effort to contain the spread of the virus rests on individuals taking responsibility for the benefit of the wider community. Meanwhile, governmental agencies and international organizations are putting policies in place to contain the pandemic and to make healthcare delivery as efficient as possible.
What can a data science company do to assist these efforts?
Our Data for Good initiative aims to bridge the gap between tech expertise and the people who need it: those at the forefront of the fight for a sustainable future for our planet. Committed to this vision, we set out to contribute our data science skills to a project that could reduce the impact of the COVID-19 pandemic.
We recently got this chance during a hackathon centered around finding solutions for the global pandemic. During the hackathon, we developed CoronaRank – an algorithm that provides users with a personal coronavirus risk score and generates heat maps of risky areas.
Pandemic Response Hackathon
Devpost is a platform that provides the tech community with an opportunity to contribute to overcoming various global challenges. Their recent Pandemic Response Hackathon asked the participants to develop technologies to solve what appears to be the greatest public health challenge in decades.
The hackathon launched on the 27th of March. Over the course of the next three days more than 2,000 participants got involved and submitted upwards of 230 projects across four tracks:
- Public Health and Information Sharing
- Epidemiology & Science of the Disease
- Keeping our Health Workers Safe
- Second-Order Societal Impacts
Thirty organizations committed resources, including cloud computing from Amazon AWS, visualization tools from Mapbox, and datasets from Veraset.
We entered the hackathon in collaboration with Ewa Knitter, an infectious disease epidemiologist who kindly offered to support our efforts.
Problems we set out to tackle
After initial discussions, we identified several problems that are particularly pressing in the current outbreak and that can be addressed using geolocation data. Specifically:
- COVID-19 tests are a limited resource, and there’s not an obvious way to decide who should be tested.
- Because few tests are being done, and because many infected people are asymptomatic, it’s difficult to know which people and areas to avoid.
- Supply chain management in the healthcare sector is going to be extremely difficult moving forward and policymakers need information on the current potential hotspots where an outbreak might be imminent.
- Many young healthy people are ignoring social distancing guidance on the basis that they have a low personal risk. We need a way to illustrate how breaking isolation can affect communities.
To address these problems, we decided to create heat maps of pandemic hotspots with high human interaction. Such heat maps would give public officials an idea of the locations for the next potential outbreak and provide the users with information about the risk of non-compliance with public health measures.
To achieve this we took inspiration from Google’s PageRank algorithm, which ranks web pages based partly on their interactions and connections with other popular web pages. We replicated this methodology in epidemiology with Markov Chain modeling. The resulting CoronaRank is an algorithm that uses geolocation data, epidemiology data, self-reporting, and Markov Chain modeling to assess the likelihood of coronavirus exposure.
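To make the PageRank analogy concrete, here is a minimal Python sketch of a PageRank-style ranking computed by power iteration over a Markov chain. It is not the CoronaRank implementation: the three-location transition matrix, the function name, and the damping factor of 0.85 are illustrative assumptions.

```python
def pagerank_power_iteration(transition, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank the states of a Markov chain, PageRank style.

    transition[i][j] is the probability of moving from state i to state j;
    every row must sum to 1. The damping factor is an assumed value.
    """
    n = len(transition)
    rank = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(max_iter):
        # Every state receives a small uniform share ("teleportation") ...
        new_rank = [(1.0 - damping) / n] * n
        # ... plus the rank flowing in along the chain's transitions.
        for i, row in enumerate(transition):
            for j, p in enumerate(row):
                new_rank[j] += damping * rank[i] * p
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical movement probabilities between three locations (0, 1, 2).
transition = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
ranks = pagerank_power_iteration(transition)
```

In this toy chain, location 0 receives the most incoming traffic and so ends up with the highest rank, which mirrors the idea that a place becomes more important the more it is connected to other well-connected places.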
To create and implement CoronaRank we made use of the Veraset database for New York. Veraset provides anonymized phone geolocation data in which each individual is assigned a unique identifier.
The challenge was to analyze this large dataset (over 100GB of data per day) in a limited timeframe. However, building on our previous experience with Big Data, we were able to quickly develop the algorithm. We went on to embed it within a web application, Community Shield, designed for use on smartphones. The app displays pandemic hotspots (areas with high activity in a recent period) and gives the user a risk score that depends on how many interactions they had in these hotspot areas.
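As a rough illustration of what "areas with high activity" could mean in practice, the Python sketch below bins location pings into a coarse grid and counts distinct devices per cell. The `(device_id, lat, lon)` record layout, the 0.01-degree cell size, and the function name are assumptions made for this example. They do not describe the Veraset schema or the Community Shield code.

```python
import math
from collections import defaultdict


def hotspot_cells(pings, cell_deg=0.01, top_k=3):
    """Return the top_k grid cells ranked by number of distinct devices.

    pings is an iterable of (device_id, lat, lon) tuples (assumed layout).
    """
    devices_per_cell = defaultdict(set)
    for device_id, lat, lon in pings:
        # Map each coordinate to an integer grid cell of cell_deg degrees.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        devices_per_cell[cell].add(device_id)
    counts = {cell: len(devices) for cell, devices in devices_per_cell.items()}
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Made-up pings: three devices close together, one device elsewhere.
pings = [
    ("a", 40.7581, -73.9855),
    ("b", 40.7585, -73.9851),
    ("c", 40.7589, -73.9858),
    ("a", 40.7583, -73.9853),  # repeat ping from the same device
    ("d", 40.6892, -74.0445),
]
hotspots = hotspot_cells(pings)
```

Counting distinct devices rather than raw pings keeps a single stationary phone from making a cell look busy. At the scale mentioned above, the same grouping would have to run on a distributed or out-of-core engine rather than in an in-memory loop.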
An individual’s CoronaRank is an estimate of the likelihood that they are infected with COVID-19. Confirmed cases are assigned a CoronaRank of 1. Everyone else is assigned a CoronaRank x with 0<x<1, derived from their actual or possible interactions with others, as inferred from phone geolocation data from the past two weeks.
The more you travel to risky places, the higher your CoronaRank. The more high-rank people visit a place, the riskier it becomes.
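The mutual reinforcement between people and places can be sketched in a few lines of Python. This is a toy model and not the actual CoronaRank algorithm: the update rule, the `beta` transmission factor, the `baseline` prior, and all names in the example data are hypothetical. It only reproduces the properties described above: confirmed cases stay at 1, everyone else stays strictly between 0 and 1, and risk flows back and forth between people and the places they visit.

```python
from collections import defaultdict
from math import prod


def corona_rank_sketch(visits, confirmed, beta=0.5, baseline=0.01, rounds=20):
    """Toy person/place risk propagation (all parameters are assumptions).

    visits maps each person to the list of places they visited;
    confirmed is the set of people with a confirmed infection.
    """
    rank = {p: 1.0 if p in confirmed else baseline for p in visits}

    # Invert the visit log: which people were seen at each place?
    visitors = defaultdict(list)
    for person, places in visits.items():
        for place in places:
            visitors[place].append(person)

    def place_risk_from(current_rank):
        # A place is as risky as the average rank of its visitors.
        return {
            place: sum(current_rank[p] for p in people) / len(people)
            for place, people in visitors.items()
        }

    for _ in range(rounds):
        place_risk = place_risk_from(rank)
        # A person's rank grows with every risky place visited; confirmed
        # cases are pinned to 1 and everyone else stays below 1.
        rank = {
            person: 1.0 if person in confirmed
            else 1.0 - (1.0 - baseline)
            * prod(1.0 - beta * place_risk[pl] for pl in places)
            for person, places in visits.items()
        }
    return rank, place_risk_from(rank)


# Made-up visit log: alice is the only confirmed case.
visits = {
    "alice": ["cafe", "gym"],
    "bob": ["cafe"],
    "carol": ["gym", "park"],
    "dave": ["park"],
}
rank, place_risk = corona_rank_sketch(visits, confirmed={"alice"})
```

In this example bob shares a cafe with the confirmed case and ends up with a higher rank than dave, whose only link to alice is indirect, through carol and the park. Likewise, the cafe comes out riskier than the park.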
You can test out the demo of the CoronaRank app. For now, it includes three predefined risk profiles to showcase the app’s capabilities.
Our plans for the future
We plan to develop the CoronaRank algorithm further by adding a self-reporting feature that lets users anonymously provide information about their COVID-related symptoms (if any). A report will affect the user’s CoronaRank and, by extension, that of everyone they met in recent weeks. This would be very valuable to public health organizations that do not have the capacity to screen and test each citizen.
We also aim to integrate Google Takeout so that users can import their personal location data, making the app fully user-specific, and to improve the UI.
We hope to partner with governmental and international institutions to get endorsement for the app and deliver it to the public. A long-term collaboration would help to turn the app into a comprehensive tool that educates individuals and informs healthcare delivery policy at public institutions. To make this a reality, we need cloud resources to make the app available at scale. Please don’t hesitate to reach out if you would like to provide resources or collaborate with us on the project.