Sunday, May 24, 2020

Looking at the Numbers Part 2: COVID-19 Cases in the U.S.

A week ago I posted an analysis of COVID-19 cases for various countries here.

In this post, I use some of the same methods (Python, pandas, scipy, matplotlib) to look at a dataset of U.S. county-level cases from http://usafacts.org.  I use this data to look at case growth by state.

TL;DR

  • COVID-19 cases in the U.S. also fit a log-normal distribution
  • Several U.S. states are close (90-95%) to the maximum expected total cases
  • The trend shows that the maximum expected total cases will be about 2% of the population
  • Several populous states still have a long way to go

Overview

I focus on the 3 most populous states that I have good curve fits for (NY, NJ, MA) and the 3 most populous states that I don't have good curve fits for (CA, TX, FL).  

For this post, I'm using the log-normal cumulative distribution function (CDF) because the underlying data set reports total (cumulative) cases.

Note that the data includes only reported cases, and it is plausible that the actual number of cases is much higher.
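
As a rough sketch of this kind of fit, the snippet below scales scipy's log-normal CDF by a total-cases parameter and fits it to cumulative counts with scipy.optimize.curve_fit. The function names, starting guess, bounds, and synthetic data are assumptions made for illustration and may differ from the exact fit used for the plots in this post.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import lognorm


def lognormal_cdf_model(day, total_cases, sigma, scale):
    """Cumulative cases on a given day: a log-normal CDF scaled by the final total."""
    return total_cases * lognorm.cdf(day, sigma, scale=scale)


def fit_total_cases(days, cumulative_cases):
    """Fit the scaled log-normal CDF; returns (total_cases, sigma, scale)."""
    days = np.asarray(days, dtype=float)
    cumulative_cases = np.asarray(cumulative_cases, dtype=float)
    latest = cumulative_cases[-1]
    # Assumed starting point: the final total is twice the latest count.
    initial_guess = (2.0 * latest, 0.5, float(np.median(days)))
    # The final total cannot be lower than the cases already reported.
    lower_bounds = (latest, 0.01, 1.0)
    upper_bounds = (np.inf, 5.0, np.inf)
    params, _ = curve_fit(
        lognormal_cdf_model,
        days,
        cumulative_cases,
        p0=initial_guess,
        bounds=(lower_bounds, upper_bounds),
    )
    return params


if __name__ == "__main__":
    # Synthetic data only (not the usafacts.org data): a known curve to recover.
    demo_days = np.arange(1, 121)
    demo_cases = lognormal_cdf_model(demo_days, 100_000.0, 0.5, 40.0)
    total, sigma, scale = fit_total_cases(demo_days, demo_cases)
    print(f"estimated total cases: {total:,.0f}")
```

The fitted total_cases parameter plays the role of the estimated total number of cases shown in the plot legends. Real data is noisy, so a fit made early in the curve can land far from the eventual total.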

Plots

The plots of NY, NJ and MA include the estimated log-normal CDF curve fit.  The legend includes the estimated total number of cases.  Based on the current total number of cases, the percentage complete is: NY (95%), NJ (89%), MA (89%).

I was not able to fit the CA and TX data.  FL has an estimate, but I don't consider it a reliable fit because it is too early (see my previous post on reliability).


Expected Percentage of the Population to Get COVID-19

Two metrics are compared to determine what percentage of the population will get COVID-19.  
  • Estimated % Complete - the current number of cases divided by the estimated number of total cases.  The higher the percentage, the more reliable the estimate.
  • % of the Population at Estimated Peak - the estimated number of total cases divided by the population.

The chart shows that the trend is toward about 2% of the population being reported as infected.  Only the most reliable estimates (those with an estimated % complete greater than 50%) were included.
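
The two metrics and the 50% reliability cutoff can be sketched in a few lines of Python. The function names and the example numbers are hypothetical, chosen only to show the arithmetic; they are not values from the dataset.

```python
def estimated_pct_complete(current_cases, estimated_total_cases):
    """Current cases as a percentage of the estimated total cases."""
    return 100.0 * current_cases / estimated_total_cases


def pct_population_at_peak(estimated_total_cases, population):
    """Estimated total cases as a percentage of the population."""
    return 100.0 * estimated_total_cases / population


def is_reliable(current_cases, estimated_total_cases, cutoff_pct=50.0):
    """Keep an estimate only if its estimated % complete exceeds the cutoff."""
    return estimated_pct_complete(current_cases, estimated_total_cases) > cutoff_pct


if __name__ == "__main__":
    # Hypothetical state: 1,000,000 people, 15,000 cases so far,
    # and a curve fit that estimates 20,000 total cases.
    print(estimated_pct_complete(15_000, 20_000))     # 75.0
    print(pct_population_at_peak(20_000, 1_000_000))  # 2.0
    print(is_reliable(15_000, 20_000))                # True
```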


If we assume that 2% of the population will eventually be reported as having COVID-19, then several states still have a long way to go.

State   Population   Total Cases to Date   Estimated Remaining Cases to Reach 2%
CA      39,144,818                88,226                                 694,670
TX      27,469,114                52,268                                 497,114
FL      20,271,272                48,675                                 356,750
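
The remaining-cases column is 2% of the population minus the total cases to date. A small sketch that reproduces the table from the population and case counts above (the variable and function names are illustrative):

```python
TARGET_FRACTION = 0.02  # assumed share of the population eventually reported

# (population, total cases to date), taken from the table above
states = {
    "CA": (39_144_818, 88_226),
    "TX": (27_469_114, 52_268),
    "FL": (20_271_272, 48_675),
}


def remaining_cases(population, cases_to_date, target_fraction=TARGET_FRACTION):
    """Cases still expected before reaching the target share of the population."""
    return round(target_fraction * population - cases_to_date)


if __name__ == "__main__":
    for state, (population, cases_to_date) in states.items():
        print(f"{state}: {remaining_cases(population, cases_to_date):,}")
    # CA: 694,670
    # TX: 497,114
    # FL: 356,750
```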
