Key Epidemiological Parameters for SARS-CoV-2 Outbreaks and Variant Selection From Noisy Data
The COVID-19 pandemic changed the work strategies of epidemiologists. Instead of iteratively refining models for a fixed data set over the course of months or years, researchers experienced an insatiable demand for definitive knowledge about a new pathogen. Suddenly, preliminary predictions based on noisy and often incomplete data became elements of a real-time, public discussion in a politically charged atmosphere. Conducting scientific studies that influence public decision-making is difficult, but it is not impossible. Here we discuss some of the key takeaways from our work with both noisy data and a high demand for certainty during an ongoing pandemic.
Early Growth Rate and Reproductive Number for Initial Outbreaks
At the beginning of the SARS-CoV-2 outbreak in Wuhan, China, in late 2019, researchers around the world wanted to estimate two fundamental epidemiological parameters: the early exponential growth rate
The earliest estimates of
Our team approached this problem in January 2020 not by trying to “fix” the noisy data from Wuhan, but by collecting extensive case reports and travel data for people who moved from Wuhan to other provinces. Focusing on individuals who were infected in Wuhan but detected outside of Hubei province (where Wuhan is located) sidestepped the issue of unreliable data that came directly from Wuhan. Unlike those at the epicenter of the pandemic, provincial health systems were prepared for incoming cases and began to rigorously test everyone who entered each province.
To further isolate potential sources of bias in different data collection systems, we designed two inference approaches to reconstruct the preliminary dynamics of SARS-CoV-2 in Wuhan [6]. We found that the early epidemic in Wuhan doubled every 2.4 days, suggesting an extremely rapid spread that progressed much more quickly than previously thought. We further estimated
Using simulations to connect our results to real-world implications, we found that once we allowed for asymptomatic transmission, which was not yet evident at the time, even extensive quarantine and contact tracing of symptomatic individuals would not control the epidemic locally. Instead, early and strong control measures such as social distancing were required to stop the virus’s spread [6].
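The 2.4-day doubling time reported above can be restated as an exponential growth rate, since a quantity growing as exp(r t) doubles every ln(2)/r days. The following minimal Python sketch performs that conversion and shows how quickly cases compound at that pace. The function names are illustrative assumptions, and the snippet is not the inference code from [6].

```python
import math

def growth_rate_from_doubling_time(doubling_time_days):
    """Exponential growth rate r (per day) implied by a doubling time."""
    return math.log(2) / doubling_time_days

def fold_increase(growth_rate, days):
    """Multiplicative increase in cases after `days` of exponential growth."""
    return math.exp(growth_rate * days)

# Doubling time of 2.4 days, as stated in the text
r = growth_rate_from_doubling_time(2.4)

# Fold increase over one week if that growth continued unchecked
weekly = fold_increase(r, 7)

print(f"r = {r:.3f} per day")
print(f"fold increase over one week = {weekly:.1f}")
```

A doubling time of 2.4 days corresponds to a growth rate of roughly 0.29 per day, which is why small errors in early estimates translate into large differences in projected case counts within weeks.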
Retrospectively, we realized that inaccuracies in the earliest estimates of
Selection Coefficient for SARS-CoV-2 Variants
As COVID-19 spread globally, researchers began to worry about viral evolution and selection. Bette Korber of Los Alamos National Laboratory was the first person to identify positive selection for the D614G mutation in SARS-CoV-2 by tracking its change in frequency over time [3]. At first, her discovery faced widespread incredulity; critics wondered how a single amino acid substitution could change the phenotype so drastically. Other studies around that time employed phylogeny-based methods and found limited evidence of selection at D614G [9], thus framing an apparent controversy surrounding SARS-CoV-2’s possible evolution towards higher levels of contagiousness.
We approached this issue from a different perspective. Rather than attempting to track the places at which D614G emerged globally, we considered its first entry into a country as a unique trial and modeled the time to extinction or fixation (when all sampled viruses have the D614G mutation) [7]. By treating countries as pseudo-independent units, we explicitly modeled the heterogeneity in selection effects that arises from differences in the ways in which various countries collected SARS-CoV-2 data. This method ultimately led to a more stable estimate of the strength of selection for new variants.
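To make the idea of estimating selection from frequency changes concrete, the sketch below uses a deliberately simplified stand-in for the approach in [7]. It assumes that a variant's frequency follows a deterministic logistic curve, so that the logit of the frequency rises linearly in time with slope equal to the selection coefficient s. Each country is fitted separately and the per-country estimates are then summarized. The country labels, starting frequency, and selection coefficients are hypothetical, and the sketch omits the time-to-fixation model and the country-level random effects described above.

```python
import math
from statistics import mean, stdev

def logit(p):
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def logistic_frequency(p0, s, t):
    """Variant frequency at time t under logistic growth with coefficient s."""
    odds = p0 / (1.0 - p0) * math.exp(s * t)
    return odds / (1.0 + odds)

def fit_selection_coefficient(days, freqs):
    """Least-squares slope of logit(frequency) against time (s per day)."""
    ys = [logit(p) for p in freqs]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, ys))
    sxx = sum((t - mean_t) ** 2 for t in days)
    return sxy / sxx

# Hypothetical countries, each with its own selection coefficient (per day)
true_s = {"country_A": 0.08, "country_B": 0.10, "country_C": 0.12}
days = list(range(0, 60, 7))  # weekly sampling, illustration only

estimates = {}
for name, s in true_s.items():
    # Noise-free synthetic frequencies starting at 5% in every country
    freqs = [logistic_frequency(0.05, s, t) for t in days]
    estimates[name] = fit_selection_coefficient(days, freqs)

# Treat each country as one trial and summarize across trials
pooled_mean = mean(list(estimates.values()))
between_country_sd = stdev(list(estimates.values()))

for name, s_hat in estimates.items():
    print(f"{name}: s = {s_hat:.3f} per day")
print(f"pooled mean = {pooled_mean:.3f}, between-country SD = {between_country_sd:.3f}")
```

Because the synthetic data contain no sampling noise, each fit returns the coefficient used to generate it. With real sequence counts, the between-country spread would mix true heterogeneity with differences in data collection, which is the motivation for modeling that heterogeneity explicitly.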
After our attempts to estimate
Lessons Learned
Ultimately, we concluded that the noisy and incomplete data that accompany a novel pathogen outbreak severely complicate applied epidemiology during an ongoing pandemic. Even simple tasks, such as estimating an exponential growth rate, become challenging if auxiliary assumptions are left unexamined and subsequently unmet. While we do not have an immediate solution to these problems, we tried to implement workflows that acknowledge the potential inaccuracies of early data and that use analytical methods, such as novel data cleaning techniques and explicit modeling of random effects, to account for data uncertainty. We can collectively prepare for the next pandemic by developing a body of knowledge that encompasses strategies for handling potentially unreliable data, realistic standards for integrating uncertainty into epidemiological parameters, and statistical software that incorporates those concerns.
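One auxiliary assumption behind a simple growth rate estimate is that a constant fraction of infections is detected. The minimal Python sketch below, built entirely on hypothetical numbers, shows what happens when that assumption fails: if the detected fraction itself rises exponentially while testing expands, a log-linear fit to reported cases returns the sum of the true growth rate and the ascertainment growth rate.

```python
import math

def fit_log_slope(days, counts):
    """Least-squares slope of log(counts) against time (growth rate per day)."""
    ys = [math.log(c) for c in counts]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, ys))
    sxx = sum((t - mean_t) ** 2 for t in days)
    return sxy / sxx

# All values below are hypothetical and chosen only for illustration
true_r = 0.10                 # true epidemic growth rate per day
ascertainment_growth = 0.05   # rate at which the detected fraction rises
days = list(range(14))

infections = [100.0 * math.exp(true_r * t) for t in days]
detected_fraction = [0.01 * math.exp(ascertainment_growth * t) for t in days]
reported = [i * f for i, f in zip(infections, detected_fraction)]

r_from_infections = fit_log_slope(days, infections)  # recovers true_r
r_from_reports = fit_log_slope(days, reported)       # inflated by ascertainment

print(f"growth rate from true infections: {r_from_infections:.3f}")
print(f"growth rate from reported cases:  {r_from_reports:.3f}")
```

The fit itself is exact in both cases; the bias comes entirely from the unexamined assumption about detection, which no amount of curve fitting on the reported series alone can reveal.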
More broadly, our experiences during the beginning stages of the COVID-19 pandemic clearly demonstrate that uncertainties in early data collection for a novel outbreak can lead different research groups to diverse or even opposing conclusions. As scientists, we should resist the urge to settle scientific issues quickly and form a consensus prematurely; the rigorous discussion and evaluation of different findings will presumably lead to more accurate knowledge and better public health policies. According to theoretical physicist Richard Feynman, “The first principle is that you must not fool yourself, and you are the easiest person to fool.” This is a good reminder, especially when the stakes are high.
References
[1] Ke, R., Romero-Severson, E., Sanche, S., & Hengartner, N. (2021). Estimating the reproductive number R0 of SARS-CoV-2 in the United States and eight European countries and implications for vaccination. J. Theor. Biol., 517, 110621.
[2] Ke, R., Sanche, S., Romero-Severson, E., & Hengartner, N. (2020). Fast spread of COVID-19 in Europe and the US suggests the necessity of early, strong and comprehensive interventions. Preprint, medRxiv.
[3] Korber, B., Fischer, W.M., Gnanakaran, S., Yoon, H., Theiler, J., Abfalterer, W., … Montefiori, D.C. (2020). Tracking changes in SARS-CoV-2 spike: Evidence that D614G increases infectivity of the COVID-19 virus. Cell, 182(4), 812-827.e19.
[4] Li, Q., Guan, X., Wu, P., Wang, X., Zhou, L., Tong, Y., … Feng, Z. (2020). Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. New Engl. J. Med., 382(13), 1199-1207.
[5] Park, S.W., Bolker, B.M., Champredon, D., Earn, D.J.D., Li, M., Weitz, J.S., … Dushoff, J. (2020). Reconciling early-outbreak estimates of the basic reproductive number and its uncertainty: Framework and applications to the novel coronavirus (SARS-CoV-2) outbreak. J. R. Soc. Interface, 17(168).
[6] Sanche, S., Lin, Y.T., Xu, C., Romero-Severson, E., Hengartner, N., & Ke, R. (2020). High contagiousness and rapid spread of severe acute respiratory syndrome coronavirus 2. Emerg. Infect. Dis., 26(7), 1470-1477.
[7] Van Dorp, C.H., Goldberg, E.E., Hengartner, N., Ke, R., & Romero-Severson, E.O. (2021). Estimating the strength of selection for new SARS-CoV-2 variants. Nat. Commun., 12, 7239.
[8] Van Dorp, C., Goldberg, E., Ke, R., Hengartner, N., & Romero-Severson, E. (2022). Global estimates of the fitness advantage of SARS-CoV-2 variant Omicron. Virus Evol., 8(2), veac089.
[9] Volz, E., Hill, V., McCrone, J.T., Price, A., Jorgensen, D., O’Toole, Á., … Connor, T.R. (2021). Evaluating the effects of SARS-CoV-2 spike mutation D614G on transmissibility and pathogenicity. Cell, 184(1), 64-75.
About the Authors
Ruian Ke
Staff Scientist, Los Alamos National Laboratory
Ruian Ke is a staff scientist at Los Alamos National Laboratory whose research centers on modeling the dynamics and evolution of viral pathogens, including HIV, influenza, and HCV. Since late 2019, his research has focused heavily on the use of mathematical modeling and machine learning approaches to understand the transmission, infection, and evolution dynamics of SARS-CoV-2.
Ethan Romero-Severson
Computational Epidemiologist, Los Alamos National Laboratory
Ethan Romero-Severson is a computational epidemiologist in the Theoretical Biology and Biophysics group at Los Alamos National Laboratory. His work on infectious disease epidemiology bridges the evolutionary biology and mathematical modeling of viral pathogens like HIV, HCV, and bunyaviruses.