Using Big Data To Prevent Pandemics

Amid the COVID-19 crisis, with 2 million cases of infection in the United States and 110,000 Americans killed by the disease, it is important to consider what role big data has played in tracking the virus’s spread and how data analysis can be used to prevent further infectious disease outbreaks. According to an Economist Intelligence Unit survey, 73% of data scientists and professionals in the public, private, and non-governmental sectors believe that data analytics will help to prevent pandemics. Although it clearly did not prevent this one, there is still reason to believe it can be a useful tool in the future. Researchers have known for years that analyzing and acting on Internet data, mobile phone logs, and aggregated reports of symptoms can help to slow the spread of infectious diseases. When relatively few people are infected by a virus, it can be tremendously helpful to first detect where infected people are and where they are moving, then enact policies targeted specifically at hotspots of the disease. This can ensure that the virus is contained before most of the country has a chance to be exposed to it. And because standard methods of reporting case data are very slow, with lags of two weeks or more, many researchers have begun experimenting with the big data strategies mentioned above. They have used these methods in response to outbreaks of Ebola, Zika, dengue fever, and H1N1 in other countries, and the flu in the United States, all with varying degrees of success, but never at the scale necessary to deal with COVID-19. So while American medical researchers’ technology could have done relatively well at mitigating the coronavirus outbreak, it is also critical that we understand which aspects of big data analysis could be improved for the next time the world faces a threat from an infectious disease (potentially including a second wave of coronavirus).

Currently, there is a vast array of platforms that can be used for monitoring the spread of diseases, including many that were not designed for that purpose. For example, Google and Twitter have been vital resources for scientists tracking the spread of Zika in Latin America, dengue fever in Brazil, and the flu in the United States. Researchers track how often people search for (or include in their tweets) terms related to these illnesses, such as the diseases’ treatments or common symptoms. Using this anonymized data, they can see rises and falls in the number of people in a region who are likely to be infected. Although this method is not perfect, people do tend to be much more interested in terms related to a disease when they are personally affected by it, whether they or one of their relatives is suffering from symptoms. Google Flu Trends, started in November 2008, used this method to track seasonal flu cases in the U.S. by monitoring increases or decreases in flu-related search terms. The U.S. Centers for Disease Control and Prevention (CDC) reports flu case data about two weeks behind real time, but Google Flu Trends was able to report the incidence of flu much faster: its data had a lag of only about one day. The program was successful for a few years, essentially predicting the spread of flu in the United States and showing when spikes were occurring, but it fell apart after about five years, in part because it had trouble handling different strains of flu such as H1N1. That virus received considerable media attention, which caused many people who had no connection to the disease to search for it, in turn biasing the Google Flu Trends algorithm. So although social media and search data have shown potential to help the government predict the spread of infectious diseases, they cannot be the only source of information it relies on.
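As a rough illustration of the search-term approach described above, here is a minimal Python sketch. It is not how Google Flu Trends worked internally; the term list, the four-week baseline, and the two-standard-deviation threshold are all assumptions chosen for the example. It computes the share of queries that mention a flu-related term, then flags weeks where that share jumps well above the recent baseline.

```python
from statistics import mean, stdev

# Hypothetical list of flu-related terms; a real system would use a much larger,
# statistically selected vocabulary.
FLU_TERMS = {"flu", "fever", "cough", "tamiflu"}

def flu_query_share(queries):
    """Fraction of queries that mention at least one flu-related term."""
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if FLU_TERMS & set(q.lower().split()))
    return hits / len(queries)

def flag_spikes(weekly_shares, baseline_weeks=4, threshold=2.0):
    """Return indices of weeks whose share exceeds the mean of the preceding
    baseline_weeks by more than threshold standard deviations."""
    spikes = []
    for i in range(baseline_weeks, len(weekly_shares)):
        window = weekly_shares[i - baseline_weeks:i]
        mu, sigma = mean(window), stdev(window)
        if weekly_shares[i] > mu + threshold * sigma:
            spikes.append(i)
    return spikes

# Example: the fifth week (index 4) stands out against the four weeks before it.
print(flag_spikes([0.020, 0.021, 0.019, 0.020, 0.050, 0.021]))
```

Note that a detector like this cannot tell whether a spike comes from people who are sick or from people reacting to news coverage, which is exactly the bias described above.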

A related method, but one that has proven more effective, is aggregating survey data to draw conclusions about the spread of diseases. Apps such as Flu Near You prompt their users to report symptoms of infection along with demographic information, giving researchers an idea of where people are getting infected and whether particular groups of people are experiencing symptoms at different rates. In Europe, 10 countries participate in a similar platform called Influenzanet, which currently has 30,000 to 40,000 users. This gives governments more detailed information, aiding their efforts to allocate resources to the correct places. A slightly more advanced tool managed by the International Society for Infectious Diseases, called ProMED-mail, has 70,000 users, most of whom are public health experts. The service lets experts share reports of diseases and moderates these reports, giving people a central source of information about the spread of a particular virus. ProMED-mail does not only include reports on humans; it also contains information about infectious diseases in livestock and wildlife.

This database has already shown its effectiveness, forcing the Saudi Arabian and Chinese governments into action in dealing with Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS), respectively. As with search and social media data, however, survey data is a promising way to monitor viral outbreaks but is not enough on its own.
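The kind of aggregation that symptom-survey apps make possible can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`region`, `age_group`, `has_symptoms`) are hypothetical and do not reflect the actual data format of Flu Near You or Influenzanet.

```python
from collections import defaultdict

def symptom_rates(reports):
    """Share of respondents reporting symptoms, keyed by (region, age_group).

    Each report is a dict with the hypothetical keys
    'region', 'age_group', and 'has_symptoms'.
    """
    totals = defaultdict(int)
    symptomatic = defaultdict(int)
    for r in reports:
        key = (r["region"], r["age_group"])
        totals[key] += 1
        if r["has_symptoms"]:
            symptomatic[key] += 1
    return {key: symptomatic[key] / totals[key] for key in totals}

reports = [
    {"region": "North", "age_group": "18-39", "has_symptoms": True},
    {"region": "North", "age_group": "18-39", "has_symptoms": False},
    {"region": "South", "age_group": "65+", "has_symptoms": True},
]
print(symptom_rates(reports))
```

Breaking the rates down by region and demographic group is what would let health authorities see where, and among whom, symptoms are concentrated. With only a handful of self-selected respondents per group, though, such rates are noisy.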

A separate tactic that has great potential to help, but may also be more invasive, is using mobile phone data to track population movement. Many companies and governments have cooperated on this in the past to figure out where people are moving and where they are effectively quarantining. If authorities know where people are staying home and where large numbers are moving to, they can more efficiently target policies and restrictions aimed at limiting the spread of infectious diseases. For instance, in Haiti, Senegal, Mexico, and China, telecommunications companies have given anonymized voice, text, and location data to those countries’ governments and health services so that they could understand population movements and mitigate the spread of cholera, Ebola, H1N1, and coronavirus, respectively. While a survey of Americans showed that a majority of both Republicans and Democrats would be willing to share phone location data in order to get alerts about nearby COVID-19 cases, they presumably would not want their privacy to be compromised after this crisis ends. Policies that the American government enacts in response to major crises, including ones considered intrusive on citizens’ privacy, tend to become permanent afterwards; the NSA, for example, continued to perform dragnet internet surveillance well after 9/11. So before proceeding with cell phone surveillance, it is crucial to ensure that it is done in the correct way, meaning that the data is anonymized and individuals’ privacy is protected. Organizations such as the Internet Society, United Nations Global Pulse, GSMA, and the International Telecommunication Union have stated they are willing to help with this.
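To make the idea of anonymized movement data more concrete, here is a minimal Python sketch under stated assumptions. It takes trips that have already been stripped of user identifiers and reduced to (origin region, destination region) pairs, counts them, and drops any flow smaller than a minimum count so that rarely traveled routes cannot single out individuals. The `min_count` threshold of 3 is an arbitrary choice for the example, and small-count suppression alone does not guarantee anonymity.

```python
from collections import Counter

def movement_flows(trips, min_count=3):
    """Count trips per (origin, destination) pair, dropping any pair
    with fewer than min_count trips.

    trips: iterable of (origin_region, destination_region) tuples that
    carry no user identifiers.
    """
    counts = Counter((origin, dest) for origin, dest in trips)
    return {pair: n for pair, n in counts.items() if n >= min_count}

trips = [("A", "B")] * 4 + [("B", "C")] * 2
print(movement_flows(trips))  # the B -> C flow is suppressed as too small
```

Aggregated flows like these are enough to show where large numbers of people are moving without exposing any one person’s route.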

Looking at all of this information, it appears that for future pandemics, the American government should use all of these methods in conjunction with each other, and possibly with new technology as well. According to science writer Michael Eisenstein, in an article for Nature, the optimal strategy is most likely to integrate multiple sources of data in order to give medical professionals and the public a reliable central source of information. Because nobody knows which combinations of tactics will produce the best results, there is still a lot of work to be done to determine the best strategy. One example of what this could look like is integrating early reports from ProMED-mail about which diseases are being transmitted in animals, real-time warnings from search and social media data about spikes in cases of a certain virus, and concrete medical records obtained from hospitals. This method would not be limited to one disease. Ideally, it would be able to detect outbreaks of many different infectious diseases while giving the public important informational resources and prompting health authorities to act quickly to prevent further spread. There are also big data techniques still being developed, such as DNA testing of potentially infected people that can give health care workers immediate results and tell researchers the specific strain of virus involved. With the continued improvement of this strategy and others, health researchers will likely have more than enough tools at their disposal within a few years to create effective models and prevention techniques for infectious diseases. If this effort is successful, it would give the public a much better idea of when and where a viral threat is facing them and give medical authorities ample warning about a looming danger. This would buy the authorities time that is extremely valuable at the beginning of an outbreak and hopefully prevent a case explosion similar to the one that has occurred worldwide with COVID-19.

Works Cited


Chetty, Raj. "The Economics of Health Care and Insurance." 2019. YouTube, uploaded by Opportunity Insights, 15 May 2019, Accessed 11 June 2020. Lecture.

The Economist Intelligence Unit. "From chaos to coherence: Managing pandemics with data." The Economist, Accessed 11 June 2020.

Eisenstein, Michael. "Infection forecasts powered by big data." Nature Outlook, 7 Mar. 2018, Accessed 11 June 2020.

Oliver, Nuria. "Using Big Data To Fight Pandemics." TechCrunch, 8 Nov. 2014, Accessed 11 June 2020.

Paine, Neil. "Big Data Is Helping Us Fight The Coronavirus — But At What Cost To Our Privacy?" FiveThirtyEight, 9 Apr. 2020, Accessed 11 June 2020.

Richards, David. "The Ebola Crisis and Where Big Data Can Help." Vox Recode, 24 Oct. 2014, Accessed 11 June 2020.

U.S. Customs and Border Protection. "CBP COVID-19 Updates and Announcements." N.d. Accessed 12 June 2020.

