Sunday, May 31, 2020

Looking at the Numbers Part 3: Insight into the COVID-19 Pandemic using a simulation

This is the third in a series analyzing the growth of cases in the current pandemic. In my two previous posts, I showed that the growth of cases in many countries and states resembles a log-normal distribution. In this post, I describe a model I developed for the pandemic.

TL;DR
  • My simulation supports my hypothesis that the pandemic can be modeled as a scale-free network, starting at patient zero, with nearest neighbors infected each cycle.
  • Luck plays a role: where the pandemic starts (who patient zero is) makes a big difference in how fast it spreads.
  • Re-opening will most likely result in a rise in cases.

Scale-Free Networks

I first learned about scale-free networks from the book Linked by Albert-László Barabási. In the Barabási-Albert model, a scale-free network starts with two connected nodes. Nodes are then added one at a time, and each new node connects preferentially to nodes that already have more connections. These networks describe the topology (organization) of systems as varied as the Internet and the proteins in our cells. They also describe social networks and how diseases spread.
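For readers who want to see the growth rule itself, here is a minimal pure-Python sketch of preferential attachment. It assumes the simplest variant, in which each new node brings exactly one edge. The function name preferential_attachment is my own illustration and does not come from networkx or the book. The trick is a list in which every node appears once for each edge it has. Picking uniformly from that list therefore picks nodes in proportion to their number of connections.

```python
import random

def preferential_attachment(n, seed=None):
    """Grow a scale-free network one node at a time (one edge per new node).

    Returns an adjacency dict {node: set_of_neighbors}.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}   # start with two connected nodes
    # Each node appears here once per edge it has, so a uniform pick from
    # this list chooses a node with probability proportional to its degree.
    endpoints = [0, 1]
    for new in range(2, n):
        target = rng.choice(endpoints)
        adj[new] = {target}
        adj[target].add(new)
        endpoints.extend([new, target])
    return adj

adj = preferential_attachment(10_000, seed=42)
degrees = sorted((len(nbrs) for nbrs in adj.values()), reverse=True)
print("largest hubs:", degrees[:5])
print("median degree:", degrees[len(degrees) // 2])
```

Running it shows a handful of hubs with many connections, while a typical node has very few.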

I wanted to see whether a scale-free network could reproduce the same log-normal pattern of COVID-19 case growth over time that I observed in my earlier posts. I found a Python module, networkx, with a function, not surprisingly called "barabasi_albert_graph", that constructs such a network. ("Graph" is the mathematical term for a network.)
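Here is a minimal sketch of building such a network with networkx. The first argument is the number of nodes, and the second (m) is the number of edges each new node attaches with. The values m=2 and the seed are illustrative choices, not the settings used for this post. nx.degree_histogram gives a quick look at the long tail of hubs.

```python
import networkx as nx

# 50,000 people; each newcomer attaches to 2 existing people (illustrative).
G = nx.barabasi_albert_graph(50_000, 2, seed=1)

degrees = [d for _, d in G.degree()]
print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
print("max degree:", max(degrees), " mean degree:", sum(degrees) / len(degrees))

# hist[k] = number of nodes with exactly k connections. A scale-free network
# has many low-degree nodes and a long tail of a few high-degree hubs.
hist = nx.degree_histogram(G)
for k in (2, 4, 8, 16, 32, 64):
    if k < len(hist):
        print(f"degree {k:3d}: {hist[k]} nodes")
```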

City-County-State-Country-World

A scale-free network can describe the social connections in your city. Some people have friends and family in other cities, so a county also resembles a scale-free network. Similarly, counties connect to form states, states connect to form countries, and countries connect to represent the world's population.

The Scale-Free Network Pandemic Model

Using the scale-free graph, I created a model to simulate the spread of the disease. Starting with one person, the disease spreads to that person's nearest neighbors, who spread it to their nearest neighbors, and so on. The model includes policies to reduce the spread: a probability of infection and a limit on "group-size". For a person with a large personal network, these policies can radically slow down, or "flatten", the curve.
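The spreading rule can be sketched as below. This is my reading of the description above, not the actual model code. I assume that infected people stay infectious. Each cycle, an infected person meets at most group_size randomly chosen neighbors (all of them when there is no limit) and infects each contact with probability p_infect. The function name simulate_spread and its parameters are hypothetical.

```python
import random
import networkx as nx

def simulate_spread(G, patient_zero, p_infect=1.0, group_size=None,
                    max_cycles=1000, seed=None):
    """Return the cumulative number of infected people after each cycle.

    Hypothetical rules: infected people stay infectious; each cycle an infected
    person meets at most `group_size` random neighbors (all neighbors when
    group_size is None) and infects each one met with probability `p_infect`.
    """
    rng = random.Random(seed)
    infected = {patient_zero}
    active = {patient_zero}   # infected people who still have healthy neighbors
    totals = [1]
    for _ in range(max_cycles):
        if not active:
            break
        newly = set()
        for person in active:
            contacts = list(G.neighbors(person))
            if group_size is not None and len(contacts) > group_size:
                contacts = rng.sample(contacts, group_size)
            for nbr in contacts:
                if nbr not in infected and rng.random() < p_infect:
                    newly.add(nbr)
        infected |= newly
        # People whose whole neighborhood is infected can no longer spread it.
        active = {person for person in active | newly
                  if any(nbr not in infected for nbr in G.neighbors(person))}
        totals.append(len(infected))
    return totals

G = nx.barabasi_albert_graph(5_000, 2, seed=7)
open_curve = simulate_spread(G, patient_zero=0, p_infect=0.5, seed=1)
flat_curve = simulate_spread(G, patient_zero=0, p_infect=0.5, group_size=5, seed=1)
print(f"no limit:   {len(open_curve) - 1} cycles, {open_curve[-1]} infected")
print(f"group of 5: {len(flat_curve) - 1} cycles, {flat_curve[-1]} infected")
```

The returned list is the cumulative "Total Cases" curve, with one entry per cycle. Comparing the two runs shows the flattening effect: under the limit, hubs can no longer reach all of their neighbors in a single cycle.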

Below is an animation of a scale-free network of 1,000 nodes (I used a software tool called Gephi to plot the graph). Uninfected nodes are gray, and infected nodes are red. The animation loops. You can see that the spread starts slowly, speeds up as it hits hubs (nodes with many connections), and slows again as it reaches loosely connected outliers.


A Scale-Free Network results in a Log-Normal Distribution

When I ran my model with 2.5 million nodes, the results fit a log-normal distribution very well. In the plot below, the orange "Fit" curve is the log-normal, and it almost completely covers the blue "Total Cases" curve generated by the simulation. I also ran the simulation in "flattened" mode with a group-size of 5 (simulating households of 5 people under stay-at-home orders).
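Here is one simple way to fit a log-normal to case data. It assumes you know the cycle in which each person was infected, and it treats the offset as fixed. With those assumptions, the maximum-likelihood fit reduces to the mean and standard deviation of ln(t - offset): sigma is the shape and exp(mu) is the scale. The helper names are hypothetical, and the data below is synthetic rather than output from the model.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

def fit_lognormal(infection_times, offset=0.0):
    """Maximum-likelihood log-normal fit with a fixed offset.

    Every time t must exceed `offset`. Returns (mu, sigma); scale = exp(mu).
    """
    logs = [math.log(t - offset) for t in infection_times]
    return fmean(logs), pstdev(logs)

def lognormal_cumulative(t, total, mu, sigma, offset=0.0):
    """Fitted 'Total Cases' curve: total * log-normal CDF at time t."""
    if t <= offset:
        return 0.0
    return total * NormalDist(mu, sigma).cdf(math.log(t - offset))

# Stand-in data: 20,000 infection times drawn from a known log-normal
# (mu=2.0, sigma=0.5) instead of real simulation output.
rng = random.Random(0)
times = [rng.lognormvariate(2.0, 0.5) for _ in range(20_000)]
mu, sigma = fit_lognormal(times)
print(f"mu={mu:.3f} sigma={sigma:.3f} scale={math.exp(mu):.2f}")
print("fitted cases by t=10:", round(lognormal_cumulative(10, len(times), mu, sigma)))
```

To apply this to a simulated curve, expand the per-cycle counts into one time per infected person. Then choose an offset below the first cycle (for example -1) so the logarithm is defined. If you want the offset fitted as well, scipy.stats.lognorm.fit estimates shape, loc (the offset) and scale together.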

Q: What Exactly Does the Network Represent? A: Only Infected People

My initial thinking was that I would produce a graph with 7.7 billion nodes and experiment with it. Unfortunately, my computer isn't powerful enough (it started struggling when I ran more than a couple of million nodes). What I realized is that if you take the complete graph of a population and infect only around 2% of its nodes, the infected nodes also resemble a scale-free graph. So I found it most useful to treat the graph as representing only the people who got infected within a much larger network.

Luck and Patient Zero

When the current pandemic started, I naturally asked "Why is this happening?" and "How can I protect myself?" I started using coping mechanisms: "It mostly kills people with preexisting conditions, so I'll be fine." When Italy became one of the first countries to be hit hardest, I blamed their culture, thinking "It's all the kissing and hugging Italians do."

With a simulation that produces results similar to real-world data, I could experiment. My first goal was to find a relationship between the log-normal parameters and the simulation parameters. I varied the network size, the probability of infection, and the group-size limit. The fitted log-normal parameters (sigma, scale and offset) showed no correlation with these simulation parameters.

I then examined the effect of different patient zeros (the person with whom the disease originated). I re-ran the model from 10 randomly selected starting nodes on a network of 500,000 nodes. In the plot below, you can see that the starting node does make a difference.


Ten starting nodes is too small a sample, so I ran the model from 500 randomly selected starting nodes. Each simulation ran until half of the network was infected. Below is a histogram showing the distribution of these 500 runs. To see how the results depend on network size, I repeated this for networks of 1,000, 10k, 20k, 50k, 100k and 500k nodes.
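Here is a sketch of the patient-zero experiment. It rests on three simplifying assumptions of my own, not the post's: infected people stay infectious, there is no group-size limit, and each cycle an infected person infects each neighbor independently with probability p_infect. Under those assumptions, the number of cycles person u needs to infect neighbor v is a geometric random variable. Each person's infection cycle is then exactly their shortest-path distance from patient zero when the edges carry those delays (first-passage percolation). That lets networkx's Dijkstra routine replace a cycle-by-cycle loop. The network is much smaller than in the post so that the example runs in seconds, and the function names are hypothetical.

```python
import random
from collections import Counter
import networkx as nx

def add_random_delays(G, p_infect, seed=None):
    """Directed copy of G in which edge u->v stores how many cycles an infected
    u needs to infect v: a geometric draw with success probability p_infect."""
    rng = random.Random(seed)
    D = G.to_directed()
    for u, v in D.edges:
        delay = 1
        while rng.random() >= p_infect:   # one independent try per cycle
            delay += 1
        D[u][v]["delay"] = delay
    return D

def cycles_to_infect_half(D, patient_zero):
    """Cycle at which at least half the network is infected (graph assumed connected)."""
    times = nx.single_source_dijkstra_path_length(D, patient_zero, weight="delay")
    half = (D.number_of_nodes() + 1) // 2
    return sorted(times.values())[half - 1]

G = nx.barabasi_albert_graph(3_000, 2, seed=3)
D = add_random_delays(G, p_infect=0.3, seed=4)
starts = random.Random(5).sample(list(G.nodes), 200)
runs = [cycles_to_infect_half(D, s) for s in starts]

# Text histogram: how many of the 200 runs needed each number of cycles.
for cycles, count in sorted(Counter(runs).items()):
    print(f"{cycles:3d} cycles | {'#' * count}")
```

Because the delays are drawn only once, patient zero is the only thing that changes between runs. Any spread in the histogram therefore comes purely from the luck of where the outbreak starts.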


Making Sense of the Variation

This last graph shows that for a network of 500k nodes, the disease would most likely take 3-14 weeks to infect half of the population, but it could take as long as 66 weeks! How could this be?

Let's consider two infected people from Wuhan, China:
  1. A rich businessman goes to the Alps on a ski trip
  2. A grandma goes to visit her family in a small town outside of China.

Case 1: Rich Businessman goes skiing

The rich businessman is in the ski lodge with some wealthy young Italian men. They get infected and go back to Italy. They are very socially active (hubs) and infect hundreds of people at a club they go to. Those people are also active and spread the disease to their own networks. In a matter of weeks, thousands of people are infected.

Case 2: Grandma visits her grand-kids

Grandma stays with her daughter in a small town. Her only interactions are with her daughter's family: her daughter, her son-in-law, and their 2 kids. The son-in-law works from home and the daughter takes care of the kids. Once a month, they have dinner with some friends. They infect their friends, who are also socially isolated. Slowly the disease makes its way to a hub, where it spreads more rapidly. It takes months to infect 100 people.
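The two stories can be mimicked on a synthetic network. This sketch assumes the fastest possible case, in which every contact is infected and there is no group-size limit. After c cycles, then, everyone within c hops of patient zero is infected. The sketch compares starting at the best-connected node (the businessman) with starting at the node farthest from that hub (the grandma). Both choices and the function name cycles_to_reach are illustrative.

```python
from collections import Counter
import networkx as nx

def cycles_to_reach(G, patient_zero, target):
    """Cycles until `target` people are infected, assuming every contact is
    infected each cycle, so after c cycles everyone within c hops is infected."""
    hops = nx.single_source_shortest_path_length(G, patient_zero)
    per_cycle = Counter(hops.values())   # people first infected in each cycle
    infected = 0
    for cycle in sorted(per_cycle):
        infected += per_cycle[cycle]
        if infected >= target:
            return cycle
    return None   # target never reached (disconnected graph)

G = nx.barabasi_albert_graph(10_000, 2, seed=8)
hub = max(G.nodes, key=lambda n: G.degree[n])          # the "businessman"
hops_from_hub = nx.single_source_shortest_path_length(G, hub)
far = max(hops_from_hub, key=hops_from_hub.get)        # the "grandma"

hub_cycles = cycles_to_reach(G, hub, 100)
far_cycles = cycles_to_reach(G, far, 100)
print(f"hub (degree {G.degree[hub]}): {hub_cycles} cycles to 100 infected")
print(f"far node (degree {G.degree[far]}, {hops_from_hub[far]} hops from hub): "
      f"{far_cycles} cycles to 100 infected")
```

Even when every contact is infected, an outbreak that starts far from the hub needs extra cycles just to reach it. Once infection is probabilistic and group sizes are limited, each of those extra hops costs more cycles.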

Re-opening

I tried simulating what would happen if the stay-at-home orders were lifted by removing the group-size limit partway through the simulation. In reality, the rise in cases will probably not be as dramatic, since the population overall has changed its behavior (wearing masks, washing hands, etc.).
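Here is a sketch of the re-opening experiment. The rules are my assumption, not the model's actual code. Infected people stay infectious, and each infected person meets at most group_size random neighbors per cycle until the cycle reopen_at. After that, each infected person meets all of their neighbors, and each contact is infected with probability p_infect. The function returns new infections per cycle, so any change in the rate after re-opening is easy to spot. The name simulate_reopening and its parameters are hypothetical.

```python
import random
import networkx as nx

def simulate_reopening(G, patient_zero, p_infect, group_size, reopen_at,
                       cycles, seed=None):
    """New infections per cycle when the group-size limit is lifted at
    cycle `reopen_at` (hypothetical rules; infected people stay infectious)."""
    rng = random.Random(seed)
    infected = {patient_zero}
    new_cases = []
    for cycle in range(1, cycles + 1):
        limit = group_size if cycle < reopen_at else None   # None = no limit
        newly = set()
        for person in infected:
            contacts = list(G.neighbors(person))
            if limit is not None and len(contacts) > limit:
                contacts = rng.sample(contacts, limit)
            for nbr in contacts:
                if nbr not in infected and rng.random() < p_infect:
                    newly.add(nbr)
        infected |= newly
        new_cases.append(len(newly))
    return new_cases

G = nx.barabasi_albert_graph(5_000, 2, seed=11)
daily = simulate_reopening(G, patient_zero=0, p_infect=0.05, group_size=5,
                           reopen_at=30, cycles=45, seed=2)
for cycle, n_new in enumerate(daily, start=1):
    marker = "  <- group-size limit lifted" if cycle == 30 else ""
    print(f"cycle {cycle:2d}: {n_new:4d} new{marker}")
```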


Conclusion

As public policies for responding to the pandemic change, I knew I needed a model in which I could simulate changes over time. The scale-free network has proved an interesting model to experiment with, since it fits the early log-normal distributions of cases over time. The model reveals that some things are out of our control (who patient zero is), while there are other things we can do that make a big difference (avoiding infections via hubs).

1 comment:

Unknown said...

Interesting, and a lot of work. Could the unrest of the protests bring the numbers back up? I am guessing those protesting don't care, but they could be spreading it to those who do.