We've Now Collected Over 1 Million Coronavirus Sequences

WRITTEN BY: Tara Fernandes

Over 1.2 million genome sequences collected from 172 countries have been uploaded to a data-sharing site over the course of the pandemic. This is a major milestone for epidemiologists and researchers looking to track how SARS-CoV-2 emerged and evolved as it swept the globe.

Critically, this data also helps scientists keep a close eye on variants on the move, some of which are more transmissible or cause more severe forms of COVID. Tracking it is important for determining whether emerging variants can evade current diagnostics and therapeutics.

The sequences were uploaded to GISAID, a global science initiative established in 2006 that has become a popular and valuable open-access source of genomic data on influenza viruses and SARS-CoV-2. Most of the coronavirus genome sequences were from samples isolated in the United States, Europe, and Asia.

Mapping the spread of the virus hasn’t been smooth sailing for all data contributors. Scientists in West Africa, for example, did not have the bioinformatics training required to effectively use the analytical tools on the GISAID platform. To address these challenges, GISAID affiliates conducted workshops to educate scientists on how to navigate the sequence display and analytical functions.

Still, experts note gaps in the data uploaded to GISAID. Countries such as El Salvador and Lebanon, which are experiencing massive outbreaks, have contributed only a small number of entries. Also, not everyone agrees with the site’s fine print, which stipulates that users must not publish studies using sequences without acknowledging their contributors.

Nevertheless, experts are pleasantly surprised by the degree of participation. As Tulio de Oliveira, from the KwaZulu-Natal Research Innovation and Sequencing Platform in Durban, South Africa, says, “This is the first time I’ve seen people sharing so much data before publication.”