Tracing the geographic origin of a virus

The geographic origin of COVID-19 is a hot topic on social media. Rumours, disinformation and unfounded conspiracy theories about the virus run wild, fuelled in part by political calculation. The media and the public seem to prefer this kind of frightening news to rational analysis, which is one weakness of humanity.

From a scientific perspective, virus sequences and genomics tools can give us some clues. I have to emphasize at the outset: we are not tracing the origin in order to blame anyone, any country or any animal. Knowledge of geographic origin helps us understand the virus better. How is it distributed across the world? Are there features in its genome that help it adapt quickly to the local environment?

In this post, I present some analyses that can be performed to understand the geographic distribution of the virus. Suppose I can collect virus samples from different places around the world, genotype them by sequencing, and obtain an array of genetic variants: SNPs, copy number variations and indels.

First, genetic structure analyses. Cluster the strains from different places using PCA and Bayesian clustering methods. Then construct phylogenetic trees to see how these strains group together.
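To make the PCA step concrete, here is a minimal sketch using NumPy. The genotype matrix, the region labels and the function name `pca_scores` are all made up for illustration; real data would come from a variant-calling pipeline, and Bayesian clustering and tree building are not covered here.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    """Project samples onto the top principal components of a genotype matrix."""
    X = np.asarray(genotypes, dtype=float)
    # Centre each SNP column so the components describe variation between samples.
    X = X - X.mean(axis=0)
    # SVD of the centred matrix: U * S gives the sample coordinates on each PC.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    # Share of the total variance carried by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Hypothetical toy data: 6 strains x 5 SNPs (0 = reference allele, 1 = alternate).
genotypes = [
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
]
regions = ["A", "A", "A", "B", "B", "B"]  # hypothetical sampling regions

scores, explained = pca_scores(genotypes)
for region, (pc1, pc2) in zip(regions, scores):
    print(f"{region}\tPC1={pc1:+.2f}\tPC2={pc2:+.2f}")
```

In this toy example the first three SNPs differ consistently between the two regions, so the strains from region A and region B fall on opposite sides of PC1. The sign of a principal component is arbitrary, so only the separation matters, not which side each group lands on.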

Second, isolation by distance and environment. Conduct linear regression of genetic differentiation (Fst) on geographic distance and environmental factors.
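As a rough sketch of the isolation-by-distance idea, the code below regresses pairwise genetic distance on pairwise geographic distance. Two simplifications are my own assumptions: I use the proportion of differing sites between two sequences (p-distance) as a stand-in for Fst, which would need allele frequencies per population, and the sites and sequences are invented toy data.

```python
import math
from itertools import combinations

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def p_distance(s1, s2):
    """Proportion of differing sites between two aligned sequences of equal length."""
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)

def ols(x, y):
    """Slope, intercept and r^2 of a simple least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)

# Hypothetical sampling sites: (lat, lon) and one aligned sequence per site.
sites = {
    "site1": ((0.0, 0.0), "AAAAAAAAAA"),
    "site2": ((0.0, 10.0), "AAAAAAAAAT"),
    "site3": ((0.0, 20.0), "AAAAAAAATT"),
    "site4": ((0.0, 30.0), "AAAAAAATTT"),
}

geo, gen = [], []
for (_, (xy1, seq1)), (_, (xy2, seq2)) in combinations(sites.items(), 2):
    geo.append(haversine_km(xy1, xy2))
    gen.append(p_distance(seq1, seq2))

slope, intercept, r2 = ols(geo, gen)
print(f"slope={slope:.2e} per km, intercept={intercept:.3f}, r^2={r2:.3f}")
```

A positive slope means strains sampled farther apart are more different genetically. Note that pairwise distances are not independent of each other, so the significance of such a regression is usually assessed with a permutation approach such as a Mantel test rather than read off the ordinary regression p-value.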

Third, GIS-based distribution analyses. Use non-linear regression methods, such as spatial distribution modelling, to associate the occurrence of different virus strains with climate and geographic factors.
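Dedicated distribution-modelling tools are beyond a short post, so as a deliberately simple stand-in here is a logistic regression of presence/absence on a single climate variable, fitted by gradient descent in NumPy. The temperatures, the presence records and the function name `fit_logistic` are hypothetical; this is a sketch of the idea, not a full spatial model.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit a logistic regression by gradient descent; return a prediction function."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardise covariates so one learning rate works for all of them.
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = np.column_stack([np.ones(len(X)), (X - mu) / sd])
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ w))
        # Gradient of the mean negative log-likelihood.
        w -= lr * Z.T @ (p - y) / len(y)

    def predict(X_new):
        X_new = np.asarray(X_new, dtype=float)
        Zn = np.column_stack([np.ones(len(X_new)), (X_new - mu) / sd])
        return 1.0 / (1.0 + np.exp(-Zn @ w))

    return predict

# Hypothetical data: mean temperature (deg C) at 10 sites, and whether
# a given strain was observed there (1) or not (0).
temperature = [[4], [7], [10], [13], [16], [19], [22], [25], [28], [31]]
presence = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

predict = fit_logistic(temperature, presence)
p_cold, p_warm = predict([[5], [30]])
print(f"P(present | 5 C) = {p_cold:.2f}, P(present | 30 C) = {p_warm:.2f}")
```

With these made-up records the strain shows up more often at warm sites, so the fitted probability of occurrence rises with temperature. In practice one would use several climate layers extracted from GIS rasters and a method that can capture non-linear responses.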

Fourth, landscape genetics analyses. Associate genetic variants with climate/geographic factors using GWAS and outlier methods.
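A very reduced version of a genotype-environment association scan can be sketched as follows: correlate each SNP with an environmental variable and look for SNPs whose correlation stands out. This is my own simplified stand-in for GWAS and outlier methods (it does not correct for population structure, which real analyses must do), and the genotypes and temperatures are invented.

```python
import numpy as np

def snp_env_correlations(genotypes, env):
    """Pearson correlation between each SNP column and one environmental variable."""
    G = np.asarray(genotypes, dtype=float)
    e = np.asarray(env, dtype=float)
    Gc = G - G.mean(axis=0)
    ec = e - e.mean()
    denom = np.sqrt((Gc ** 2).sum(axis=0) * (ec ** 2).sum())
    # Monomorphic SNPs have zero variance; report r = 0 for them instead of NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (Gc.T @ ec) / denom
    return np.nan_to_num(r)

# Hypothetical data: 8 strains x 4 SNPs, and mean temperature at each sampling site.
genotypes = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 0],
]
temperature = [5, 8, 11, 14, 20, 23, 26, 29]

r = snp_env_correlations(genotypes, temperature)
candidate = int(np.argmax(np.abs(r)))
for i, ri in enumerate(r):
    print(f"SNP{i}: r = {ri:+.2f}")
print(f"strongest association: SNP{candidate}")
```

Here the third SNP (index 2) carries the alternate allele only at the warmer sites, so it is the one flagged as a candidate. A real scan would test many thousands of variants, account for shared ancestry between strains, and correct for multiple testing.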

Taken together, these methods can help us understand the spatial distribution of the virus and the genes underlying its adaptation.