Chinese researchers removed key COVID gene data from NIH database

The US National Institutes of Health has deleted gene sequences of early COVID-19 cases from a key scientific database at the request of Chinese researchers, a virologist claims.

Sentinel Digital Desk

Published: 25th Jun, 2021 at 4:35 AM

WASHINGTON: The US National Institutes of Health has deleted gene sequences of early COVID-19 cases from a key scientific database at the request of Chinese researchers, claimed a Seattle-based virologist. Jesse Bloom, a virologist at the Fred Hutchinson Cancer Research Center in Seattle, described the removal of the sequencing data in a new paper posted online on bioRxiv on Tuesday.

The paper, which has not been peer reviewed, raises concerns that the missing gene sequences could hamper scientists' ongoing probe into the origin of the pandemic. It claims that Chinese researchers took virus samples from some of the earliest COVID patients in Wuhan in January and February 2020, then posted the viral sequences to a widely used US database. After three months, the sequences were removed to "obscure their existence", an editorial in the journal Science reported on Wednesday.

"Here I identify a data set containing SARS-CoV-2 sequences from early in the Wuhan epidemic that has been deleted from the NIH's Sequence Read Archive," Bloom wrote in the paper posted on bioRxiv.

"I recover the deleted files from the Google Cloud, and reconstruct partial sequences of 13 early epidemic viruses. Phylogenetic analysis of these sequences in the context of carefully annotated existing data suggests that the Huanan Seafood Market sequences that are the focus of the joint WHO-China report are not fully representative of the viruses in Wuhan early in the epidemic," he added.

Meanwhile, the NIH has confirmed that it deleted the sequences after receiving a request from a Chinese researcher who had submitted them three months earlier, the Wall Street Journal reported on Wednesday.

"Submitting investigators hold the rights to their data and can request withdrawal of the data," the NIH said in a statement.

Bloom's search led him to a study which listed all SARS-CoV-2 sequences submitted before March 31, 2020, to the Sequence Read Archive (SRA) — a database overseen by the National Center for Biotechnology Information, a division of NIH. But when he checked SRA for one of the listed projects, he couldn't find its sequences, the Science report said.

Further research led him to another study, by Ming Wang of Renmin Hospital at Wuhan University in China, which was published in the journal Small. While the paper lists some of the earliest Wuhan COVID patients and the specific mutations in their viruses, it does not give the full sequence data.

Additional internet sleuthing led Bloom to discover that SRA backs up its information on the Google Cloud platform, and a search there turned up files containing some of the earlier data submissions from Wang's team.

The paper in Small makes no mention of any corrections to the viral sequences that might explain why they were removed from SRA. This led Bloom to conclude in his preprint that "the trusting structures of science have been abused to obscure sequences relevant to the early spread of SARS-CoV-2 in Wuhan", the report said. (IANS)
