Opinions of Monday, 8 June 2020

Columnist: Alfred Appiah

Ghana’s response to coronavirus from a data management perspective


There is no doubt that Ghana’s response to the novel coronavirus (COVID-19) is one of the best in Africa. Ghana is among the top 10 countries in Africa in tests conducted per capita.

Ghana’s strategy of testing, tracing, and testing the contacts of confirmed cases has been praised, as has the country’s adoption of pooled testing to expand testing capacity. Ghana is one of the few countries that began testing the contacts of confirmed cases even when they showed no symptoms. At the time of writing, Ghana has conducted 233,734 tests, 68% of them through enhanced contact tracing.

Despite this success, there have been a number of challenges. One of them, explored in this article, is data management and reporting. The importance of data in our fight against COVID-19 cannot be overemphasized. Data lets us examine the impact of non-pharmaceutical interventions on the transmission of the virus, and it guides government decisions on whether to increase or reduce the intensity of those interventions. Any underlying issues with the data underpinning our success story cannot be overlooked and need to be addressed.

Tests conducted versus people tested

Despite the success achieved in testing, the reporting of test data has been a source of controversy throughout the fight against COVID-19. This is partly due to a lack of clarity about the unit in which the Ghana Health Service (GHS) reports test data.

The GHS has in the past used "samples tested", "people tested" and "tests conducted" interchangeably in its situational updates. There were many discussions on social media about double-counting of tests, including a claim that the GHS was presenting the number of tests conducted as the number of people tested. Health authorities subsequently explained that any additional tests conducted on the same person are stored in a “different database”.

That explanation made some sense until the GHS started reporting both the number of people tested and the number of tests conducted. It was observed that the number of tests conducted was equal to the number of people tested for both the general surveillance and the enhanced contact tracing groups. The mandatory quarantine group was the only group where that was not the case. That raised additional questions like:

1. If multiple tests are conducted to ascertain whether a case has recovered or not, then shouldn’t the number of tests be more than the number of people tested?

2. If duplicate tests are stored in a different database as explained by health authorities, then why are we reporting duplicate tests for only the mandatory quarantine group? Do they have their own separate database then?

At this point, one might wonder why it matters whether we report tests conducted or people tested. It matters because the test positivity rate has been a key metric in government decision making. If the denominator of the test positivity rate contains duplicate tests that are not reflected in the numerator, the rate is underestimated. In the absence of mass testing of the entire Ghanaian population, the number of laboratory-confirmed cases may be lower than the true number of cases, so the test positivity rate (a proxy for the infection rate) is already inherently underestimated. That is why it is important that the metric is not distorted by practices that could underestimate it further.
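To make the denominator problem concrete, here is a minimal Python sketch using invented test records (the person IDs and results are hypothetical, not GHS data). Each record is one test, and a person retested to confirm recovery appears more than once. Confirmed cases count each person once, so dividing by tests conducted instead of people tested pulls the rate down.

```python
# Hypothetical records: (person_id, result). A person retested to confirm
# recovery appears more than once. Invented data, not GHS figures.
tests = [
    ("p1", "positive"),
    ("p1", "negative"),  # follow-up test for p1
    ("p1", "negative"),  # second follow-up test for p1
    ("p2", "negative"),
    ("p3", "positive"),
    ("p3", "negative"),  # follow-up test for p3
    ("p4", "negative"),
    ("p5", "negative"),
]

tests_conducted = len(tests)
people_tested = len({person for person, _ in tests})
# Confirmed cases: each person counted once, however many tests they had.
confirmed_cases = len({person for person, result in tests if result == "positive"})

rate_per_person = confirmed_cases / people_tested
rate_per_test = confirmed_cases / tests_conducted

print(f"tests conducted: {tests_conducted}, people tested: {people_tested}")
print(f"positivity (per person tested): {rate_per_person:.1%}")  # 40.0%
print(f"positivity (per test conducted): {rate_per_test:.1%}")   # 25.0%
```

With no repeat tests the two rates coincide; every follow-up test added to the denominator widens the gap between them.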

It is also important to mention that from the May 26 situation update onwards, the GHS has stopped reporting the number of people tested. The test positivity rate now appears to be calculated with total tests conducted (which include duplicate tests for the mandatory quarantine group) as the denominator, and no explanation has been provided.

Recoveries and active cases

Before the May 13 update, the GHS reported recoveries by type of surveillance. From the May 13 update onwards, it combined all recoveries into a single figure. As a result, analysts can no longer estimate recovery rates by type of surveillance (general surveillance, enhanced contact tracing and mandatory quarantine).

In addition, active cases cannot be analysed by surveillance type. Active cases indicate the pressure on the health care system and, in Ghana’s case, on isolation centres. Questions such as “what proportion of active cases are in hospitals versus isolation centres?” remain unanswered. Without a breakdown of active cases by type of surveillance, it is almost impossible to model the impact on treatment centres as newly reported cases continue to rise (see chart below).
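The sketch below shows the per-group figures that become impossible to compute once recoveries are reported only in aggregate. It assumes the conventional definition active = confirmed − recovered − deaths, and all counts are invented for illustration; none are GHS figures.

```python
from dataclasses import dataclass


@dataclass
class GroupCounts:
    confirmed: int
    recovered: int
    deaths: int

    @property
    def active(self) -> int:
        # Assumed definition: confirmed cases not yet recovered or deceased.
        return self.confirmed - self.recovered - self.deaths

    @property
    def recovery_rate(self) -> float:
        return self.recovered / self.confirmed if self.confirmed else 0.0


# Hypothetical cumulative counts per surveillance type (invented data).
groups = {
    "general surveillance": GroupCounts(confirmed=400, recovered=250, deaths=10),
    "enhanced contact tracing": GroupCounts(confirmed=900, recovered=300, deaths=5),
    "mandatory quarantine": GroupCounts(confirmed=100, recovered=90, deaths=0),
}

total_active = sum(g.active for g in groups.values())
for name, g in groups.items():
    share = g.active / total_active
    print(f"{name}: active={g.active}, "
          f"recovery rate={g.recovery_rate:.1%}, share of active={share:.1%}")
```

If only a single pooled recovery figure is published, only `total_active` survives; the per-group rows, and any model of where active cases sit, cannot be reconstructed.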



Data errors and how they impact other indicators

Throughout these situational updates, there have been a number of arithmetic errors. As an analyst, I initially attributed them to the pressure GHS staff were working under. For example, the May 27 update (the same update in which the GHS stopped reporting the total number of people tested) reported a total of 208,328 tests conducted.

However, summing the individual rows gave a total of 210,350 tests, indicating that the tests in the mandatory quarantine group (2,022) had been left out of the total. That total, which excludes an entire surveillance group, was nonetheless used to calculate the cumulative test positivity rate (see screenshot from the GHS COVID-19 website below).
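A simple reconciliation check would catch this kind of error before publication. The minimal Python sketch below compares a reported total with the sum of its rows and names any single row whose omission would explain the gap. The function name is hypothetical, and because the article gives only the mandatory quarantine row, the other groups are combined into one row derived as 210,350 − 2,022.

```python
def reconcile(reported_total: int, rows: dict[str, int]) -> list[str]:
    """Return the names of rows whose omission would explain the gap between
    the reported total and the sum of the rows (empty if the total matches)."""
    gap = sum(rows.values()) - reported_total
    if gap == 0:
        return []
    return [name for name, value in rows.items() if value == gap]


# May 27 figures quoted in the article. The first row combines the other
# surveillance groups; its value is derived because the article does not
# list those rows separately.
rows = {
    "general surveillance + enhanced contact tracing": 210_350 - 2_022,
    "mandatory quarantine": 2_022,
}
reported_total = 208_328

print(sum(rows.values()))               # 210350
print(reconcile(reported_total, rows))  # ['mandatory quarantine']
```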



It was the June 1 update, however, that compelled me to write about data errors. That update reported 251 new cases from 328 new tests, a daily test positivity rate of 76.5%. As shown in the chart below, that was the highest daily test positivity rate ever recorded. In addition, 328 new tests was the lowest number of tests we have ever conducted in a day, despite our expanded testing capacity.
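The daily rate is simply the day's new cases divided by the day's new tests, as in this minimal Python sketch (the function name is illustrative). Daily counts of this kind are typically obtained by subtracting consecutive cumulative totals, so an error in either day's cumulative figure may feed directly into the ratio.

```python
def daily_positivity(new_cases: int, new_tests: int) -> float:
    """Share of the day's tests that came back positive."""
    if new_tests <= 0:
        raise ValueError("no new tests reported; positivity is undefined")
    return new_cases / new_tests


# June 1 update, as quoted in the article.
rate = daily_positivity(new_cases=251, new_tests=328)
print(f"{rate:.1%}")  # 76.5%
```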



That raised additional questions that needed to be answered:

1. What was responsible for the very high test positivity rate?

2. Did we truly conduct so few tests in that period, or was it another data reporting error?

Comparing the breakdown of tests reported on May 31 and June 1 showed that the number of tests in the enhanced surveillance group on June 1 (152,465) was lower than on May 31 (153,056). Since this is a cumulative metric, the count at a later date cannot be lower than at an earlier date.
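Because cumulative counts can only stay flat or rise, a monotonicity check is an easy automated safeguard. This minimal Python sketch (the function name is hypothetical) scans a date-ordered series and reports every drop; it is applied here to the two enhanced surveillance figures quoted above.

```python
def find_decreases(series: list[tuple[str, int]]) -> list[tuple[str, str, int]]:
    """Return (earlier_date, later_date, drop) wherever a cumulative count falls.
    Assumes `series` is sorted by date."""
    problems = []
    for (d0, v0), (d1, v1) in zip(series, series[1:]):
        if v1 < v0:
            problems.append((d0, d1, v0 - v1))
    return problems


enhanced = [("2020-05-31", 153_056), ("2020-06-01", 152_465)]
print(find_decreases(enhanced))  # [('2020-05-31', '2020-06-01', 591)]
```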

No explanation has been provided for the very high daily test positivity rate or the unusually low number of new tests, given our expanded daily testing capacity. Is the data accurate, or is it a computational error? If the data is accurate as reported, there are significant implications for community spread of the virus as we plan to reopen the economy, beginning with easing restrictions on some religious and educational activities. If it is a computational error, it further compounds the issues with data quality.

Other data points

Earlier GHS situational updates presented charts of the age distribution of cases on the COVID-19 website. Recent updates no longer include them, so it is hard to tell which age groups are most likely to be infected relative to the age distribution of the population as a whole.

The GHS’ epidemic curve, which plots cases by sample collection date on the x-axis, needs further clarification. A closer look at the charts in different updates shows that the values for the most recent dates (at least the last four days) keep changing over time. My hypothesis is that when the curve is plotted, not all samples collected on the most recent dates have been tested yet, so the results reported to the GHS for those dates are incomplete.

This can lead to wrong conclusions, since it may suggest a decline in cases when the information is simply incomplete. Some jurisdictions, like the one here, get around this by highlighting the data points for the last few dates and adding a note that they may reflect an incomplete picture. The Ghana Health Service could adopt the same practice.
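The flagging practice described above takes only a few lines. This Python sketch marks the last four sample-collection dates as provisional, following the author's observation that at least the last four days keep changing; the window length, function name and daily counts are all assumptions for illustration, not GHS data.

```python
from datetime import date, timedelta


def mark_provisional(counts: dict[date, int], as_of: date,
                     window_days: int = 4) -> list[tuple[date, int, bool]]:
    """Tag each sample-collection date as provisional if it falls within the
    last `window_days` days up to and including `as_of`."""
    return [
        (day, n, (as_of - day).days < window_days)
        for day, n in sorted(counts.items())
    ]


# Hypothetical daily case counts by sample collection date (invented data).
# The apparent drop over the last few days is exactly what incomplete
# results can produce.
as_of = date(2020, 6, 8)
counts = {as_of - timedelta(days=k): n
          for k, n in enumerate([40, 95, 130, 160, 210, 205, 220])}

for day, n, provisional in mark_provisional(counts, as_of):
    note = "  (provisional: results may still be pending)" if provisional else ""
    print(f"{day.isoformat()}: {n}{note}")
```

A chart built from this output would shade or annotate the provisional points rather than present them as a genuine decline.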

In closing, these data management issues cannot be overlooked in our fight against COVID-19: they could lead to inaccurate conclusions and, to some extent, undo the gains of the Government of Ghana’s otherwise excellent response to COVID-19.

The writer is a program evaluation analyst and a data scientist. He regularly tweets data visualizations made with R on Twitter (@CallmeAlfredo).