The exponential impact a single person can have on flattening the curve, visualised using pandas and matplotlib in Python.
COVID-19 has taken over the world and brought it to a standstill in just a few months. Total cases worldwide will soon reach half a million, and over 20,000 deaths have been confirmed (figures as of 26th March). The worrying part is that the graph of total cases is still increasing exponentially and shows no signs of slowing down.
Flattening the curve by social distancing seems to be the only way out of this. Many countries have been locked down in the past few weeks, and people have been asked to strictly stay at home. These measures will not eliminate the virus, but they will help to slow down its spread, reducing the pressure on the health care system and, in turn, the fatality rate.
But many people still don’t seem to understand the seriousness of social distancing, and how big an impact even a single person can have. The point is that even if you are a healthy individual and the virus may not affect you much, you could still spread it to other people who may be severely affected by it.
So in this quick post, I will try to visualise the effect of social distancing using Python, to show the huge impact every single person can have in stopping the spread of COVID-19 and potentially saving thousands of lives.
The Experiment
The goal of this experiment is not to model the spread of the virus, but to understand the impact social distancing has in reducing its spread, and to realise its importance.
First let us import the essentials and define a few parameters.
Let me explain each parameter:
- DAYS: This is simply the number of days we carry out the simulation
- POPULATION: The population of our simulated city.
- SPREAD_FACTOR: The number of people an infected person infects per day. In a city, an average person is said to come in contact with at least 16 people a day. Assuming that only a quarter of those contacts get infected, I have chosen the SPREAD_FACTOR to be 4. Note that the spread factor depends on many variables and does not stay constant in real life.
- DAYS_TO_RECOVER: The number of days it takes for an infected person to recover. In real life this is also not a constant, but 10 is a good average.
- INITIALLY_AFFECTED: The number of people who were initially affected by the virus. They are the carriers who bring the virus from an infected region to a new region, like our hypothetical city.
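A minimal sketch of this setup could look like the following. POPULATION, SPREAD_FACTOR and DAYS_TO_RECOVER use the values mentioned in this post; the values for DAYS and INITIALLY_AFFECTED are not stated here, so the ones below are assumptions chosen only for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

DAYS = 60                # assumed: number of days to simulate
POPULATION = 100_000     # population of the simulated city
SPREAD_FACTOR = 4        # new infections per infected person per day
DAYS_TO_RECOVER = 10     # days it takes an infected person to recover
INITIALLY_AFFECTED = 4   # assumed: number of initial carriers
```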
We will use a DataFrame to model a city where each row corresponds to a citizen, and keep track of infected and recovered people. Using the sample function, we can randomly select people from the DataFrame. Here is what we will do:
- Create a DataFrame called city, where each row corresponds to a person in the city. It also contains columns to mark whether a person is infected or recovered. Initially, select INITIALLY_AFFECTED random people using sample and mark them as infected. Also mark their recovery day.
- Run a for loop DAYS times to simulate each passing day.
- Check the number of people who have recovered on this day, and mark them as recovered. These people won’t spread the virus anymore.
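- On each day, count the number of infected people and use the SPREAD_FACTOR to calculate the number of newly infected people on that day: new cases on a day = SPREAD_FACTOR * number of active cases. The new cases are picked at random, and a picked person is infected only if not already infected or recovered.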
- Keep track of the number of active cases and people who recovered for visualising later.
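The steps above can be sketched roughly as follows. This is an illustrative reconstruction, not necessarily the exact code behind the plots in this post: the function name simulate, the column names, the fixed random seed, and the values of DAYS and INITIALLY_AFFECTED are all assumptions.

```python
import numpy as np
import pandas as pd

DAYS = 60                # assumed simulation length
POPULATION = 100_000
SPREAD_FACTOR = 4
DAYS_TO_RECOVER = 10
INITIALLY_AFFECTED = 4   # assumed number of initial carriers


def simulate(spread_factor=SPREAD_FACTOR, seed=0):
    """Return active and recovered counts for each simulated day."""
    rng = np.random.RandomState(seed)  # fixed seed so runs are repeatable

    # One row per citizen; recovery_day stays -1 until the person is infected.
    city = pd.DataFrame(
        {"infected": False, "recovered": False, "recovery_day": -1},
        index=range(POPULATION),
    )
    first_cases = city.sample(INITIALLY_AFFECTED, random_state=rng).index
    city.loc[first_cases, "infected"] = True
    city.loc[first_cases, "recovery_day"] = DAYS_TO_RECOVER

    active_cases = [INITIALLY_AFFECTED]
    recovered = [0]
    for today in range(1, DAYS):
        # People whose recovery day is today stop spreading the virus.
        done = city["recovery_day"] == today
        city.loc[done, "infected"] = False
        city.loc[done, "recovered"] = True

        # New cases = SPREAD_FACTOR * active cases, picked at random.
        n_new = int(round(int(city["infected"].sum()) * spread_factor))
        if n_new > 0:
            picked = city.sample(n_new, replace=True, random_state=rng)
            # Only people who are neither infected nor recovered get infected.
            fresh = picked[~picked["infected"] & ~picked["recovered"]].index.unique()
            city.loc[fresh, "infected"] = True
            city.loc[fresh, "recovery_day"] = today + DAYS_TO_RECOVER

        active_cases.append(int(city["infected"].sum()))
        recovered.append(int(city["recovered"].sum()))

    return pd.DataFrame({"active_cases": active_cases, "recovered": recovered})


history = simulate()
print(history.head(15))
```

Sampling with replacement means the same person can be picked more than once on a day, which is why the picked rows are filtered and de-duplicated before being marked as infected. Calling history.plot() draws both columns against the day number.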
You can see that in around 10 to 15 days, the entire population of 100,000 has been affected and recovered. This assumes that the city is capable of treating 100,000 patients at the same time, and that everyone recovers at the same rate, in 10 days. But do you think this hypothetical city of 100,000 will have a health care system that can take care of 100,000 active cases per day for about a week? In reality, the growth may not be so drastic, but it can easily lead to something like this if we take no action at all.
Now let’s take a look at the graph for different values of SPREAD_FACTOR.
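One way to produce such a comparison is sketched below: run the same kind of simulation once per spread factor and draw the active cases in a 2 x 2 grid. The helper names simulate and plot_spread_factors, the random seed, and the values of DAYS and INITIALLY_AFFECTED are assumptions made for this sketch; DAYS is set fairly high so the slower curves have time to peak.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

DAYS = 200               # assumed; long enough for the flatter curves
POPULATION = 100_000
DAYS_TO_RECOVER = 10
INITIALLY_AFFECTED = 4   # assumed number of initial carriers


def simulate(spread_factor, seed=0):
    """Return the number of active cases per day for one spread factor."""
    rng = np.random.RandomState(seed)
    city = pd.DataFrame(
        {"infected": False, "recovered": False, "recovery_day": -1},
        index=range(POPULATION),
    )
    first_cases = city.sample(INITIALLY_AFFECTED, random_state=rng).index
    city.loc[first_cases, "infected"] = True
    city.loc[first_cases, "recovery_day"] = DAYS_TO_RECOVER

    active = [INITIALLY_AFFECTED]
    for today in range(1, DAYS):
        # Mark today's recoveries before counting who can still spread.
        done = city["recovery_day"] == today
        city.loc[done, "infected"] = False
        city.loc[done, "recovered"] = True

        n_new = int(round(int(city["infected"].sum()) * spread_factor))
        if n_new > 0:
            picked = city.sample(n_new, replace=True, random_state=rng)
            fresh = picked[~picked["infected"] & ~picked["recovered"]].index.unique()
            city.loc[fresh, "infected"] = True
            city.loc[fresh, "recovery_day"] = today + DAYS_TO_RECOVER
        active.append(int(city["infected"].sum()))
    return pd.Series(active, name="active_cases")


def plot_spread_factors(factors=(1, 0.5, 0.25, 0.2)):
    """Plot active cases for each spread factor in a 2 x 2 grid."""
    fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True, sharey=True)
    # axes.T.flat walks the grid column by column:
    # top left, bottom left, top right, bottom right.
    for ax, factor in zip(axes.T.flat, factors):
        active = simulate(factor)
        ax.plot(active.index, active.values)
        ax.set_title(f"SPREAD_FACTOR = {factor}")
        ax.set_xlabel("Day")
        ax.set_ylabel("Active cases")
    fig.tight_layout()
    return fig
```

Calling plot_spread_factors() followed by plt.show() draws the grid. The axes are shared so that the heights of the four curves can be compared directly. Because the people are picked at random, the exact numbers will differ somewhat from run to run and from the figures quoted below.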
Observations:
- SPREAD_FACTOR = 1 (top left): Every infected person comes in contact with one random person per day, who becomes infected if not already infected. Almost the entire population has been affected.
- SPREAD_FACTOR = 0.5 (bottom left): For every two infected people, one new person is infected per day. Note that this new person is selected at random, and is infected only if not already infected. Here the curve is still almost the same as in the first case, but the total cases have gone down by around 20,000.
- SPREAD_FACTOR = 0.25 (top right): For every 4 infected people, one new person is infected per day (if not already infected). In other words, one out of these 4 infected people came in contact with a new person who got infected (the other 3 were practicing social distancing!). This could be the state when people are consciously quarantining themselves and practicing social distancing. Compared to the previous case, just by halving the spread factor, the spread has decreased exponentially, and the curve is significantly flatter. Here the health care system should be able to provide good care since, at the peak, there are only 40,000 active cases.
- SPREAD_FACTOR = 0.2 (bottom right): Here one out of every 5 infected people came in contact with a new person and spread the infection. The other 4 were in isolation. The spread factor is not too different from the previous case, yet the curve is significantly flatter, and the peak active cases have gone down by almost half!
In the last two cases, you can observe the impact a single person can make on the entire spread of the virus! From this we can conclude that, although the virus spreads exponentially, social distancing also works exponentially, and every single isolated person has an exponential impact on flattening the curve!
Note: I am aware that this is an oversimplification of the real world scenario, but I think it gives us a good understanding of the relationship between the SPREAD_FACTOR and the number of active cases. Also, an exponential function can be simulated easily with math equations, but I think this is more intuitive and easier to understand.
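For comparison, here is a small sketch of the equation-based view. If recoveries and the finite population are ignored, each day multiplies the active cases by (1 + SPREAD_FACTOR), so active cases on day t = initial cases * (1 + SPREAD_FACTOR) ** t. This only approximates the early phase of the simulation; the function names and the initial value of 4 carriers are assumptions for illustration.

```python
import math


def active_cases_closed_form(initial, spread_factor, day):
    """Active cases on a given day under pure exponential growth.

    Ignores recoveries and the finite population, so it is only a rough
    guide to the early phase of the simulation.
    """
    return initial * (1 + spread_factor) ** day


def days_to_reach(target, initial, spread_factor):
    """Smallest whole number of days after which the cases reach target."""
    return math.ceil(math.log(target / initial) / math.log(1 + spread_factor))


for factor in (4, 1, 0.5, 0.25, 0.2):
    print(factor, days_to_reach(100_000, 4, factor))
```

Starting from 4 carriers, this pure exponential model reaches 100,000 cases in 7 days with a spread factor of 4, but takes 56 days with a spread factor of 0.2, which is the same flattening effect seen in the simulated curves.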
Well, now you know exactly why social distancing is given so much importance! Basically, you are saving lives by sitting at home.
You can find the code here in this Google Colab. You can try and experiment with different values for the parameters. Also try visualising other metrics, like recoveries per day. Instead of having a constant value for the spread factor throughout the simulation, you can try to reduce it at different intervals and observe the effects. I noticed that once the damage has been done, there is no going back.
So practice social distancing, wash your hands and remember, we are all in this together!
References:
https://www.worldometers.info/coronavirus/
https://medium.com/@tomaspueyo/coronavirus-act-today-or-people-will-die-f4d3d9cd99ca
https://www.washingtonpost.com/graphics/2020/world/corona-simulator/
Note: This article was originally published on towardsdatascience.com, and kindly contributed to AI Planet (formerly DPhi) to spread the knowledge.