Does Our Country Value Data?
This question has been on my mind for years now. Data has been a trending subject for a while.
Words like ‘data’, ‘estimates’, and ‘models’ get thrown around a lot nowadays, especially during this pandemic, because we’re all eager to know what our future may or may not be like.
What concerns me is that a lot of people don’t seem to understand the intricacies of data and models.
Let me explain.
Data is not just data
You see, data is not just data. Data is information that can easily lie to you if you don’t know what’s behind it.
Data is a reflection of multiple operations that were combined to create a summary. Good data relies on each operation doing its part so that nothing slips through the cracks. If there are any issues with the process in between, you’re going to have bad data.
If you have bad data, you risk making the wrong decisions.
To fit our current narrative, we’ll look at what went on with the novel coronavirus.
A quick overview of our data operations
Collection → Processing → Analysis → Presentation
We’ll start off with the trigger. One day everyone woke up to news that Wuhan, a city of 11 million, had been shut down and quarantined. A new virus has broken out. We don’t know much about it except that it’s causing pneumonia and, in some cases, death. Some neighboring countries have also reported cases. It’s January 23rd.
A few days after Wuhan announces its big shutdown, the virus is found circulating in the United States. This is when data collection starts, or at least when it should start.
In an ideal world, we would try to collect as much data as possible. Anyone who showed symptoms of a mild cold or flu should have had the option to be tested for COVID-19 as well. We didn’t do this because we didn’t have enough tests available early on.
This was our first mistake, but it wouldn’t have thrown us off too much if we had caught it early. We could still have tested a random sample of the population to get a picture of the actual community spread and play catch-up on prevention.
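To make the random sampling idea concrete, here is a minimal sketch of how a small random sample can stand in for testing everyone. The function name and the numbers are my own assumptions for illustration, not real testing data, and the Wilson score interval is just one common way to put a margin of error around a proportion.

```python
import math


def estimate_prevalence(positives, sample_size, z=1.96):
    """Estimate the share of a population that is infected from a random sample.

    Returns (point_estimate, lower, upper), where lower and upper come from
    the Wilson score interval. z=1.96 corresponds to roughly 95% confidence.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= positives <= sample_size:
        raise ValueError("positives must be between 0 and sample_size")

    p = positives / sample_size
    denom = 1 + z**2 / sample_size
    center = (p + z**2 / (2 * sample_size)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / sample_size + z**2 / (4 * sample_size**2))
        / denom
    )
    return p, max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical numbers, for illustration only: 30 positives among
# 1,000 randomly chosen people.
point, low, high = estimate_prevalence(positives=30, sample_size=1000)
print(f"Estimated prevalence: {point:.1%} (95% interval {low:.1%} to {high:.1%})")
```

The catch is that the sample has to be truly random. Testing only people who already feel sick would overstate the spread, which is the same collection problem in a different form.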
After we collect the data, we need to process it in a timely manner so that we have intelligence as soon as possible and are better informed before we commit to any actions.
That means the time from sample collection to test result should be as short as possible. Early on, it took about a week for people to get their results back after a sample was taken. In April, California still had more than half of its tests unprocessed.
Data analysis means taking our processed data and drawing insights from it. Can this data tell us a story? Is it teaching us something new that might have an impact on our decision making?
As a country, we have some of the smartest analysts, and they’re able to churn out actionable insights from whatever data you give them. But this step depends heavily on that data. Your analysis is only as good as the data you’re able to work with.
We didn’t have good data.
After we do our analysis and gather our findings, we need to present them to the major stakeholders. In the case of COVID-19, that’s the American public.
Presentation is probably the most important part, as it is most directly tied to your audience’s understanding of the issue. During the early stages of our bout with the pandemic, we gave out projected death tolls of 100,000–240,000 based on our analysis and modeling. As time went on, we dropped that number to about 60,000, only to now raise it back to the initial estimated range.
In addition to presenting our findings, we need to translate them into actionable steps. In the early stages, this resulted in social distancing guidelines and steps to slowly re-open the economy.
Did we truly value data?
It depends on which part of our society we take a look at.
Looking at the data process, we can easily see that those in charge of the initial steps, collecting and processing, did not care about collecting high-quality data. For example, we still don’t know the true death rate of the virus because we don’t know how many people contracted it in the first place.
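Here is a rough sketch of why the missing denominator matters so much. All numbers are hypothetical and the names are mine; the only point is that the same death count gives very different rates depending on how many infections we assume went undetected.

```python
def fatality_rate(deaths, infections):
    """Deaths divided by the number of people assumed to have been infected."""
    if infections <= 0:
        raise ValueError("infections must be positive")
    return deaths / infections


# Hypothetical figures, for illustration only.
deaths = 100
confirmed_cases = 2_000

# Assume true infections were 1x, 5x, or 10x the confirmed count.
for multiple in (1, 5, 10):
    assumed_infections = confirmed_cases * multiple
    rate = fatality_rate(deaths, assumed_infections)
    print(f"{multiple:>2}x confirmed cases -> death rate {rate:.2%}")
```

Without broad or random testing, we have no way to know which of those multiples is closest to reality, so the rate we report is only as good as our guess about the denominator.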
Those in charge of the later steps, analysis and presentation, cared about data. This includes the academic institutions and private companies that volunteered different scenarios and models based on the data we had collected. They even did a great job of making their work public and being as transparent as possible.
The main issue is that they could only do so much with the incomplete data that came from the front lines, the data collectors and processors.
If we truly valued data, we would not have had this issue in the first place.