I've been seeing some confusion about the output of epidemiological models in some recent articles in the general press. It's often presented as "one scientist said a million people are going to die, but now they say only 10,000" or something equally confusing. So I thought it might be helpful to discuss these models, how they work, and how they're being used.
If you have an epidemic in your country, and you want to know how much it's going to spread, you can make some rough ballpark estimates using things like population size and infectiousness of the pathogen. But things are much, much more complicated than that. People who were infected once tend to be immune and can't spread it further. People won't get infected if they never come in contact with an infected person. And so on. Epidemiological models are meant to take all these complications into account and give a more realistic picture of what a disease might do.
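To make that concrete, here's a minimal sketch of the simplest kind of compartment model, a discrete-time SIR (Susceptible-Infected-Recovered) simulation. Everything here is a toy: the parameter values are invented for illustration and aren't tied to any real disease, but the structure shows how "immune people can't be reinfected" and "you can only catch it from contact with an infected person" get built into the math.

```python
def simulate_sir(population, beta, gamma, initial_infected, days):
    """Step a toy SIR model forward one day at a time.

    beta  - average number of transmitting contacts per infected person per day
    gamma - fraction of infected people who recover (and become immune) each day
    """
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # currently infected
    r = 0.0                                   # recovered and immune
    history = []
    for _ in range(days):
        # New infections require a susceptible person meeting an infected one;
        # the s/population factor shrinks as immunity builds up.
        new_infections = beta * i * s / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

# Hypothetical numbers: a city of one million, seeded with 10 cases.
history = simulate_sir(population=1_000_000, beta=0.3, gamma=0.1,
                       initial_infected=10, days=365)
peak_infected = max(i for _, i, _ in history)
```

Real models like the per-county one described below are vastly more detailed, but they are elaborations of this same bookkeeping: track who can still be infected, who is infectious, and who is out of the pool.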
So, for example, one model I looked at uses census data to create a per-county population model of the entire US. It then figures out the probability of spreading an infection based on data like the average classroom size of each school, the office sizes of the companies in each building, etc. For each disease you use it to model, you also put in details like the infectivity and lethality of the disease, how many infected people arrive from other countries, etc. The model can then produce daily estimates of how the epidemic progresses.
Now, for the coronavirus pandemic, we'd obviously want to use data specific to this virus, like its fatality rate and how many people each infected person tends to pass it on to. And here already you run into challenges. Do you take the infectivity rate from South Korea, which kept the virus in check, or Italy, which didn't? Do you use the global average lethality, or assume a lower value because you know that many people with mild symptoms were never tested?
The exact assumptions used in a model are an area where experts can disagree. So, two people could use the same model and get somewhat different results, based on their choices in configuring it. This isn't a matter of one number being right and the other wrong - both numbers are right for the set of assumptions used to configure the model. Which set of assumptions is closer to reality can be tough to tell with a fast-moving pandemic like this one.
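Here's a hedged illustration of that point using the same kind of toy SIR model as above: two runs of identical code, differing only in the assumed reproduction number and fatality rate. The parameter values are invented purely to show the mechanism (loosely, a "kept in check" scenario versus a "spread widely" scenario), not to estimate anything real.

```python
def projected_deaths(population, r0, fatality_rate,
                     recovery_rate=0.1, initial_infected=1000, days=1000):
    """Run a toy SIR epidemic to completion and apply a fatality rate.

    r0 - assumed average number of people each case infects early on
    """
    beta = r0 * recovery_rate  # daily transmission rate implied by r0
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    for _ in range(days):
        new_inf = beta * i * s / population
        new_rec = recovery_rate * i
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
    # Rough cut: deaths among everyone who was ever infected.
    return (r + i) * fatality_rate

POPULATION = 330_000_000  # illustrative US-scale population

# Same model, two defensible-looking assumption sets (both invented):
optimistic = projected_deaths(POPULATION, r0=1.4, fatality_rate=0.003)
pessimistic = projected_deaths(POPULATION, r0=3.0, fatality_rate=0.01)
```

The two runs differ by a large factor, yet neither involves a coding error or bad faith; each is "right" given its inputs. That's the gap a headline can turn into "scientists changed their minds."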
The other cause of confusion is that sometimes you want to model hypotheticals. What happens if a country does nothing in response to the arrival of the virus? (That's a potentially useful "worst case" scenario that can put an upper bound on the sorts of problems we might see.) What happens if social distancing starts three weeks into an outbreak? What happens if half the population decides not to obey social distancing advice?
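Scenario runs like these can be sketched with the same toy model: run the identical epidemic twice, once with no intervention and once where transmission drops on a chosen day to mimic social distancing. Again, every number here (the 50% contact reduction, the day-21 start, the population size) is an invented assumption for illustration.

```python
def epidemic_peak(population, beta, gamma, distancing_day=None,
                  distancing_factor=0.5, initial_infected=100, days=365):
    """Peak simultaneous infections in a toy SIR run.

    If distancing_day is set, transmission is multiplied by
    distancing_factor (an assumed 50% cut) from that day onward.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    peak = i
    for day in range(days):
        b = beta
        if distancing_day is not None and day >= distancing_day:
            b = beta * distancing_factor
        new_inf = b * i * s / population
        new_rec = gamma * i
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
        peak = max(peak, i)
    return peak

# "Do nothing" worst case vs. distancing starting three weeks in:
do_nothing = epidemic_peak(1_000_000, beta=0.3, gamma=0.1)
distance_at_week3 = epidemic_peak(1_000_000, beta=0.3, gamma=0.1,
                                  distancing_day=21)
```

Comparing the peaks across runs like these is how planners get a feel for which intervention timing matters most, even though no single run is a literal forecast.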
We know that many of these conditions are completely unrealistic. Crowded Spring Break beaches already showed us that lots of people ignore the advice of health experts, so you'll never get full, nation-wide isolation. And we already know that some states started social isolation early out of caution, before the virus had hit them too hard. So, we know in advance that the numbers produced by those model runs are going to be "wrong" in the sense of not reflecting what we'll see in the real world. But they can be useful to planners because they give a sense of how the outbreak might progress under different control approaches, so that planners can identify an approach that might be optimal.
Just to be clear on that last point: it's sometimes useful to run models that we know are going to be "wrong" in the sense of not reflecting real-world behavior.
Finally, it's important to note that models run a few weeks ago may rely on data on policies and human behavior that quickly become out of date. So, while they might have been "right" based on what we knew at the time, things have changed since then, making them "wrong" in the sense that their results are no longer relevant.
Hopefully, that gives people a sense of how the same general approach of modeling the epidemic can produce very different numbers. So, if you see headlines like "scientists thought 3 million would die, now they're saying 30,000", chances are that what you're looking at are different model runs that may have very different assumptions baked in. It's not a matter of scientists disagreeing with each other or having no idea what's going on.