Investigating the likely association between genetic ancestry and COVID-19 manifestation ======================================================================================== * Ranajit Das * Sudeep Ghate ## Abstract The novel coronavirus 2019-nCoV/SARS-CoV-2 infection has shown discernible variability across the globe. While in some countries people are recovering relatively quicker, in others, recovery times have been comparatively longer and numbers of those succumbing to it high. In this study, we aimed to evaluate the likely association between an individual’s ancestry and the extent of COVID-19 manifestation employing Europeans as the case study. We employed 10,215 ancient and modern genomes across the globe assessing 597,573 single nucleotide polymorphisms (SNPs). Pearson’s correlation coefficient (r) between various ancestry proportions of European genomes and COVID-19 death/recovery ratio was calculated and its significance was statistically evaluated. We found significant positive correlation (p=0.03) between European Mesolithic hunter gatherers (WHG) ancestral fractions and COVID-19 death/recovery ratio and a marginally significant negative correlation (p=0.06) between Neolithic Iranian ancestry fractions and COVID-19 death/recovery ratio. We further identified 404 immune response related single nucleotide polymorphisms (SNPs) by comparing publicly available 753 genomes from various European countries against 838 genomes from various Eastern Asian countries in a genome wide association study (GWAS). Prominently we identified that SNPs associated with Interferon stimulated antiviral response, Interferon-stimulated gene 15 mediated antiviral mechanism and 2′-5′ oligoadenylate synthase mediated antiviral response show large differences in allele frequencies between Europeans and East Asians. Overall, to the best of our knowledge, this is the first study evaluating the likely association between genetic ancestry and COVID-19 manifestation. While our current findings improve our overall understanding of the COVID-19, we note that the development of effective therapeutics will benefit immensely from a more detailed analyses of individual genomic sequence data from COVID-19 patients of varied ancestries. Key words * Genomic ancestry * COVID-19 manifestation * novel coronavirus * Genome Wide Association study (GWAS) ## Introduction The novel coronavirus 2019-nCoV/SARS-CoV-2, was officially identified on 7th January 2020 in Wuhan, China1. Since then, the virus has spread rapidly throughout the world leading to catastrophic consequences. Notably, COVID-19, caused by 2019-nCoV has shown significant worldwide variability, especially in terms of the death/recovery ratio that pertains to a ratio of the number of deaths caused by the novel coronavirus to numbers of infected individuals recovered within the same time interval. While infected individuals from some countries have recovered relatively quickly with lower morbidities, those from some other parts of the world appear to remain affected for longer with slower recovery times and demonstrate relatively higher death/recovery ratios2. This is suggestive of a likely population level genetic variation in terms of susceptibility to the coronavirus and COVID-19 manifestation. A case in point in this context appears to be *ACE2* that encodes for angiotensin-converting enzyme-2 and has been speculated to be the host receptor for the novel coronavirus3,4. Recently, Cao et al. analysed 1700 variants in *ACE2* from ChinaMAP and 1000 Genomes Project databases5. While they reported the absence of natural resistant mutations for coronavirus S-protein binding across their study populations, their study revealed significant variation in allele frequencies (AF) among the populations assessed, including between people of East Asian and European ancestries. They speculated that the differences in AFs of *ACE2* coding variants, likely associated with its elevated expression may influence *ACE2* function and putatively impact SARS infectivity among the evaluated populations. Additionally, a majority of the 15 eQTL variants they identified had discernibly higher AFs among East Asian populations compared to Europeans, which may suggest differential susceptibility towards coronavirus in these two populations under similar conditions. In addition to *ACE2*, COVID-19 manifestation may also be modulated by several immune response modulating genes. Li et al. (2020) mentioned that coronavirus infection causes secretion of large quantities of chemokines and cytokines (such as ILC1, ILC6, ILC8, ILC21, TNFCβ and MCPC1) in the infected cells, additionally, the host’s major antiviral machinery comprising of Interferons (IFNs) may also the limit of coronavirus6. Overall it can be speculated that variations in host genes encoding for interleukins and interferons may be responsible for differential manifestation of COVID-19. Further recent studies have shown that variants associated with immune responsiveness display significant population-specific signals of natural selection7. In particular, the authors showed that admixture with Neanderthals may have differentially shaped the immune systems in European populations, introducing novel regulatory variants, which may affect preferential responses to viral infections7. European genomes were further modulated and distinguished owing to multiple waves of migration throughout their ancient past. We note here that while Neolithic European populations predominantly descended from Anatolian migrants, European ancestry shows a distinct West–East cline in indigenous European Mesolithic hunter gatherers (WHGs)8. At the same time Eastern Europe was in the rudimentary stages of the formation of Bronze Age steppe ancestry, which later spread into Central and Northern Europe through East-West expansion9. Further, high genetic substructure between North-Western and South-Eastern hunter-gatherers has also been documented recently10. Ancestry specific variation in immune responses and its potential genetic and evolutionary determinants are poorly understood especially in the context of the novel coronavirus infection and COVID-19 manifestation. While COVID-19 has affected huge numbers of individuals worldwide, this pandemic has cause catastrophic damage in Europe. Here we aimed to unravel the potential association between genetic ancestry and COVID-19 employing Europeans as the case study. In addition, using a genome wide association study (GWAS) approach we sought to discern the putative SNP markers and genes that may be likely associated with differential novel coronavirus infection by comparative analyses of the European and East Asian genomes. To this end we employed 10,215 ancient and modern genomes, obtained from the databank of Dr. David Reich, Harvard Medical School, USA ([https://reich.hms.harvard.edu/datasets](https://reich.hms.harvard.edu/datasets)) to evaluate the likely correlation between European ancestry and COVID-19. Our findings revealed significant positive correlation between WHG-related ancestry and COVID-19 death/recovery ratio and marginally significant negative correlation between Neolithic Iranian ancestry and COVID-19. Finally, we identified several SNPs that may influence immune responsiveness and displaying significant differential AF between Europeans and East Asians. Our current results shine light on ancestry driven genetic factors underlying variabilities in COVID-19 infection and manifestation; it strongly advocates concerted sequencing/genotyping of COVID-19 affected patients worldwide not only to expand and substantiate our current knowledge but to develop focused population specific therapeutic approaches to mitigate this worldwide challenge. ## Method ### Dataset Genome data for the current study was obtained from the personal database of Dr. David Reich’s Lab Harvard Medical School, USA ([https://reich.hms.harvard.edu/datasets](https://reich.hms.harvard.edu/datasets)). The final dataset comprised of 10,215 ancient and modern genomes across the globe assessing 597,573 single nucleotide polymorphisms (SNPs). File conversions and manipulations were performed using EIG v7.211 and PLINK v1.912. COVID-19 data was obtained from 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE2. The most recent data available in this web portal as of Sunday, 5th April 2020, 6 PM was used in the analyses. Indian data was obtained from Government of India coronavirus portal ([https://www.mygov.in/corona-data/covid19-statewise-status](https://www.mygov.in/corona-data/covid19-statewise-status)). ### Determination of ancestry proportions in European genomes in a global context We used *qpAdm*13 implemented in AdmixTools v5.114 to estimate ancestry proportions in the European genomes originating from a mixture of ‘reference’ populations by utilizing shared genetic drift with a set of ‘outgroup’ populations. 14 ancient genomes namely *Luxembourg_Loschbour*.*DG, Luxembourg\_Loschbour\_published*.*DG, Luxembourg\_Loschbour, Iberia\_HG* (N=5), *Iberia\_HG\_lc, Iberia\_HG\_published, LaBrana1_published*.*SG, Hungary\_EN\_HG_Koros* (N=2), *Hungary\_EN\_HG\_Koros\_published*.*SG* were grouped together as European Mesolithic hunter gatherers (WHGs); three ancient genomes namely *Russia\_EHG, Russia\_HG_Samara* and *Russia\_HG\_Karelia* (N=2) were grouped together as Mesolithic hunter-gatherers of Eastern Europe (EHGs); two ancient genomes: *KK1*.*SG, SATP*.*SG* were grouped as Caucasus hunter-gatherers (CHGs) and *Sweden\_HG\_Motala* (N=8) were renamed as Scandinavian Hunter-Gatherers (SHGs) in *qpAdm* analysis. After repeating the analyses with several population combinations, we inferred that Europeans could be best modelled as a combination of three source populations namely EHGs, WHGs and Neolithic Iranians (*Iran\_GanjDareh_N*) as *Left* (EHG, WHG, Iran_GanjDareh_N). We used a mixture of eight ancient and modern-day populations comprising of *Ust Ishim, MA1, Kostenki14, Han, Papuan, Chukchi, Karitiana, Mbuti* as our ‘*Right*’ outgroup populations (O8). Pearson’s correlation coefficient (r) between various ancestry proportions of European genomes and COVID-19 death/recovery ratio was calculated and its significance was statistically evaluated using GraphPad Priam v8.4.0, GraphPad Software, San Diego, California USA, [www.graphpad.com](http://www.graphpad.com). Notably, the death to recovery ratio is used here as the test statistic reflecting the extent to which individuals are succumbing versus recovering following the COVID-19 infection. We surmise this ratio to be indicative of the host immune responsiveness against viral assault. Notably since the death to recovery ratio does not take into account the number of active cases, statistical bias arising due to under-reporting/under-testing of COVID-19 cases is avoided. ### Genome-wide association analyses (GWAS) GWAS was performed to investigate SNPs with significant AF variations among European and East Asian genomes and likely correlated with differential COVID-19 infectivity. To this end 753 European genomes were compared to 838 Eastern Asian genomes. Standard case (higher death/recovery ratio: Europeans) – control (lower death/recovery ratio: East Asians) based association analyses was performed in PLINK v1.9 using --assoc command. A Manhattan plot was generated in Haploview16 by plotting Chisquare values of all assessed SNPs to identify the SNPs that are likely associated with COVID-19 manifestation. Highly significant SNPs were annotated using SNPnexus web-based server17. ## Results ### Ancestry proportions in European genomes and COVID-19 manifestation We modelled all Europeans as a combination of three source populations namely EHGs, WHGs and Neolithic Iranians (see Methods) in *qpAdm* analysis. Among 10 European populations employed in this study, GBR (British in England and Scotland) genomes were found to have the highest WHG ancestry proportions (26.1%) and the lowest Neolithic Iranian ancestry fractions (51.5%) (Table 1). Notably, among the European populations evaluated, COVID-19 death to recovery ratio till date, is the highest among the British people: death/recovery ratio=31.94. In contrast, Russian and Finnish populations which could not be modelled with WHG ancestry so far appear to have less detrimental COVID-19 manifestation with 0.13 and 0.08 death to recovery ratio respectively. Congruent with these results, we found significant positive correlation (Pearson’s Correlation; r=0.58; p=0.03) between WHG ancestry fraction and COVID-19 death/recovery ratio and marginally significant negative correlation (Pearson’s Correlation; r= −0.52; p=0.06) between Neolithic Iranian ancestry fractions and COVID-19 death/recovery ratio. We did not find any correlation between EHG ancestry fraction and COVID-19 manifestation. View this table: [Table 1:](http://medrxiv.org/content/early/2020/04/07/2020.04.05.20054627/T1) Table 1: Ancestry fractions and COVID-19 death/recovery ratio among 12 European countries employed in the study ### Genome-wide association analyses For GWAS, 753 genomes from across Europe with high COVID-19 death/recovery ratio (case) were compared to 838 Eastern Asian genomes with relatively lower COVID-19 death/recovery ratio (control). Out of 597,573 SNPs employed in this study, 385,450 (64.5%) markers revealed highly significant variation (Chi square≥ 6.63; P<0.01) between Europeans and East Asians, indicating the discernible differences in the genetic tapestry of the two populations (Fig. 1). For stringency, we annotated only the top 10,000 ranked SNPs (N=10,013, Chi square≥ 739; P≤9.69×10−163) using SNPnexus web-based server. Among top 10,000 ranked SNPs, 404 were associated with host immune responsiveness including Interferon (IFN) stimulated antiviral response, Interferon-stimulated gene 15 (*ISG15*) mediated antiviral mechanism and 2′-5′ oligoadenylate synthase (OAS) mediated antiviral response. These SNPs were associated with 161 immune system related genes including those related to innate immunity (eg. *IFI27, IFIH1* and *IFNG*), interleukins (eg. *IL16, IL17RB, IL17RD, IL1RAP, IL4R* and *IL6*), antiviral response (eg. *OAS3, SEC13, MX1, NUP54, TRIM25, NUP88, HERC5, FLNB, SEH1L, EIF4E3* and *USP18*) and receptors (eg.*EDAR*). Furthermore, six SNPs (rs2243250, rs1800872, rs1800896, rs1544410, rs1800629 and rs1805015) that display large AF variability between Europeans and East Asians are likely associated with the development of immune responses in the first year of life contingent upon the gene-environment interactions ([https://www.snpedia.com](https://www.snpedia.com)). Finally, the SNPs with the highest Chi square values (≥1500; P≈0) were found to be associated with immune system related pathways such as Senescence-Associated Secretory Phenotype (SASP), Interleukin-4 and Interleukin-13 signaling, TNFR2 non-canonical NF-kB pathway and Class I MHC mediated antigen processing & presentation. ![Fig. 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/04/07/2020.04.05.20054627/F1.medium.gif) [Fig. 1:](http://medrxiv.org/content/early/2020/04/07/2020.04.05.20054627/F1) Fig. 1: Manhattan Plot. GWAS with 753 European genomes were compared to 838 Eastern Asian genomes. Out of 597,573 SNPs employed, 385,450 (64.5%) markers revealed highly significant variation between Europeans and East Asians. The chi square values are plotted in Y-axis. The SNPs are designated with dots. The SNPs with chi square values >1000 are indicated with the blue line and those with chi square value >2000 are indicated with the red line. While 18 SNPs showed chi square values >2000, 2,992 SNPs had chi square values >1000. ## Discussion The novel coronavirus 2019-nCoV/SARS-CoV-2 infection has shown discernible variability across the globe. While in some countries people are recovering relatively quicker, in others, recovery times have been comparatively longer, and numbers of those succumbing are higher. Here we sought to investigate the likely association between genetic ancestry determinants and the extent of COVID-19 manifestation employing Europeans as a case study. ### The European story Till date, Europe has been most severely impacted owing to the COVID-19 infection. Our findings revealed a significant positive correlation between WHG-related ancestry and COVID-19 death/recovery ratio (p=0.03) and marginally significant negative correlation between Neolithic Iranian ancestry and COVID-19 (p=0.06). Previously it has been shown that European genomes evolved uniquely with regards to immune responsiveness, particularly pertaining to responses against viral infections7. Admixture with Neanderthals is believed to have introduced unique regulatory variants into European genomes, which likely modulated immune responses in European populations7. We surmise that the European genome architecture has been extensively modulated by the complex origin and migration history of modern-day Europeans during Paleolithic, Mesolithic and Neolithic periods that in turn contributed via introduction of novel variants in European genomes. Modern-day European genomic diversity has been thought to be shaped by variable proportions of local hunter-gatherer ancestry (WHGs)18. While Neolithic, Iberian genomes (modern day Spanish and Portuguese populations) revealed widespread evidence of WHG admixture (10-27%), Neolithic German populations (Linearbandkeramik – LBK culture) revealed ∼ 4–5% of the same18. Notably the variation in WHG ancestry fractions among German and Spanish people correlated with the observed variabilities in COVID-19 death/recovery ratio in these two countries; German populations with discernably lower WHG fractions have a current death/recovery ratio of 0.05, however, in Spain the ratio is 0.33, which corresponded to high WHG fraction among Spanish populations. Interestingly, the other Iberian country, Portugal, despite reporting fewer numbers of COVID-19 cases compared to neighboring Spain, has a significantly high death to recovery ratio (3.93) underscoring the likely association between high WHG ancestry proportions in Iberians and the severity of COVID-19 infection. The WHG ancestry proportion is discernibly high among the Basque people in Spain (33.9%) and congruent with our hypothesis, the death percentage in Basque country is significantly higher compared to other Spanish provinces ([https://www.statista.com/statistics/1102882/cases-of-coronavirus-confirmed-in-spain-in-2020-by-region/](https://www.statista.com/statistics/1102882/cases-of-coronavirus-confirmed-in-spain-in-2020-by-region/)). However, owing to unavailability of data we were unable to calculate the death to recovery ratio in Basque people. Nevertheless, taking our findings into consideration, since the WHG fraction was found to be the highest among the Basque people among all populations assessed in this study, we surmise the death to recovery ratio in Basque country will be higher than Spain’s countrywide mean. A similar correlation between WHG ancestry and the extent of COVID-19 manifestation was observed for the UK, where GBR genomes depicted high WHG ancestry fractions (>25%) and was correlated with high COVID-19 death/recovery ratio in the country (31.94). Italy, one of the worst hit countries with the COVID-19 pandemic, showed significant nationwide variation in the severity of disease manifestation (death/recovery ratio=0.73). While COVID-19 related deaths in Northern Italy has surpassed 12,000, less than 1,000 deaths have been so far reported in Southern Italy. Interestingly, these numbers are consistent with WHG fractions among Italian genomes, thus while WHG ancestry fractions were found to be ∼ 23.1% and 9.2% among Northern and Southern Italian genomes respectively. In contrast, only a handful of COVID-19 cases have been so far reported in Russia, the largest country in the world, and we assessed whether this is owing to their genomic make-up. All modern-day Russians originated from two groups of East Slavic tribes: Northern and Southern and have 0% WHG ancestry fraction. They are genetically similar to modern-day other Slavic populations such as Belarusians. We note that COVID-19 death/recovery ratios are significantly low in both Russia (0.13) and Belarus (0.09). Furthermore, COVID-19 manifestation was found to be far less detrimental in the Central Asian countries that were once part of former Soviet Union (USSR) such as Kazakhstan (death to recovery ratio=0.16), Kyrgyzstan (0.11), Tajikistan (0), Turkmenistan (0) and Uzbekistan (0.08). Finally, Finnish genomes that were shaped by migrations from Siberia ∼3500 years ago19 with no WHG ancestral proportions, have a relatively smaller COVID-19 death to recovery ratio (0.08). Overall, our results have revealed a clear association between WHG and Neolithic Iranian ancestry fractions and the acuteness of COVID-19 manifestation suggesting the presence of unique underlying genetic variants in European genomes that maybe correlated with their variable susceptibility and immune responsiveness. ### The curious case of Central and South Americans The COVID-19 infection has so far shown an interesting pattern in Central and South America. While in countries like Ecuador and Brazil the death to recovery ratio is significantly higher (Ecuador: 1.72 and Brazil: 3.5), it is intermediate in Columbia (0.38), and discernibly low in countries such as Chile (0.05), Uruguay (0.05), Peru (0.07), Mexico (0.12) and Argentina (0.15). We hypothesized that this variation in COVID-19 severity may be attributed to the genetic ancestry of Central and South American populations. It has been reported that all present-day Native Americans have descended from at least four distinct ancient migration waves from Asia20. Two of them being ancient migrations (∼15,000 years ago) from south-central Siberia (Mal’ta Upper Palaeolithic site: MA1 related) and the more recent (∼5000 years ago) when Palaeo-Eskimos spread throughout the American Arctic20. MA1, is genetically proximal to all modern-day Native Americans (14-38% Native American ancestry) and has been found to be basal to modern-day West Eurasians, without any close affinity to East Asians21. MA-1 mitochondrial genome was determined to be associated with haplogroup U, which is found at high frequency among Mesolithic European hunter-gatherers (WHGs)21, depicting a connection between the WHG and modern-day South American genomes. In our analyses Americans could be best modelled as a combination of two source populations namely MA1 and Eskimos. We used a mixture of seven ancient and modern-day populations comprising of CHG, WHG, Sweden_HG_Motala, Papuan, Chukchi, Karitiana and Mbuti as our ‘Right’ outgroup populations (O7). Interestingly, our results indicated that countries such as Mexico and Peru with lower death to recovery ratio have higher Eskimo ancestry fraction (13.2% and 20.3%) and lower MA1 fraction (86.8% and 79.7%) compared to our other study populations. Further, we found ∼ zero Eskimo ancestry fraction among Columbians and Purto Ricans where death to recovery ratio is relatively high. The high death to recovery ratio in Brazil (3.5) is reminiscent of similarly high ratios in Portugal (3.9) and may be attributed to the predominant Portuguese ancestry in the former. ### Intrinsic ‘protection’ for East Asians? As noted above, significant variation in AFs of *ACE2* gene, the presumed receptor of novel coronavirus has been reported among people from East Asian and European ancestry5. The authors of this study speculated that the differences in AFs of *ACE2* coding variants are associated with variable expression of ACE2 in tissues. Additionally, they reported that most eQTL variants identified by them had discernibly higher AFs among East Asian populations compared to Europeans, which may result in differential susceptibility towards the novel coronavirus in these two populations under similar conditions. Consistent with these findings we identified 404 SNPs that are associated with host immune response such as Interferon (IFN) stimulated antiviral response, Interferon-stimulated gene 15 (ISG15) mediated antiviral mechanism and 2′-5′ oligoadenylate synthase (OAS) mediated antiviral response and show large differences in AFs between Europeans and East Asians. The genetic differences between East and Southeast Asians with Europeans can be largely attributed to their distinctive ancestral origins. Most East Asians derive their ancestry from Mongolian hunter-gatherers who dispersed over Northeast Asia 6000-8000 years ago22. Similarly, Southeast Asians exhibit a mixture of East Asian ancestry (Southern Chinese agriculturalist) and a diverged form of Eastern Eurasian hunter-gatherer ancestry (EHGs)23. People with similar ancestry can be found as far south as Indonesia23. As noted above EHG ancestry fraction does not appear to be significantly associated with COVID-19 manifestation. In congruence with their unique ancestral make-up, all East and Southeast Asian countries exhibit low COVID-19 death/recovery ratio (Japan=0.15, South Korea=0.03, Hong Kong=0.02, Taiwan=0.1, Thailand=0.03, Malaysia=0.06 and Singapore=0.02). Notably, in mainland China, where the novel coronavirus originated and the catastrophe first initiated, COVID-19 death to recovery ratio has remained discernibly low (0.04). Overall, our results indicate that people of East and Southeast Asian ancestry appear to be intrinsically protected against the most debilitating effects of the novel coronavirus infection. ### South Asians: a variegated canvas for COVID-19 South Asians have a long and complex history of admixture between immigrant gene-pools originating primarily in West Eurasia, Southeast Asia and the South Asian hunter-gatherer lineage with close genetic proximity to the present-day Andamanese people (Ancient Ancestral South Indians: AASI) who likely arrived in India through the “southern exit” wave out of Africa24. People of AASI ancestry admixed with an undivided ancient Iranian lineage that subsequently split and lead to the formation of early Iranian farmers, herders, and hunter gatherers approximately in the 3rd millennium BCE. This gene pool is referred to as the ‘Indus Periphery’ gene pool24,25, which is thought to be the major source of subsequent peopling of South Asia. Modern-day South Asian genome is composed of largely four ancestral components: Ancestral North Indian (ANI), Ancestral South Indian (ASI), Ancestral Tibeto-Burman (ATB), and Ancestral Austro-Asiatic (AAA)26. ANI and ASI gene pools likely arose around 2nd millennium BCE during the decline of Indus Valley Civilization (IVC), which prompted multiple waves of migrations across the South Asia. The southward migration of Middle to Late Bronze Age people from Steppe (Steppe MLBA) into South Asia is thought to have coincided with the decline of IVC. It is speculated that the people of Indus-Periphery-related ancestry, while migrating northward, admixed with Steppe MLBA immigrants to form the ANI, while the others, who migrated southward and eastward admixed with AASI and formed the ASI25. Austroasiatic speakers originated in Southeast Asia and subsequently migrated to South Asia during the Neolithic period. The admixture between local South Asians and incoming Southeast Asians took place ∼2000-3800 years ago giving rise to the AAA ancestry. Notably, the South Asian population(s) with whom the incoming Southeast Asians mingled were AASI related with little to no West Eurasian ancestry fraction27. Finally, it has been showed that the Tibeto-Burmans (ATBs) derived their ancestry through admixture with low-altitude East Asians who migrated from China and likely across Northern India or Myanmar28 leading to the high genomic proximity between South Asians of ATB ancestry and East Asians. The genomic proximity between East Asians and South Asians mostly from Northeast India with prominent ATB ancestry, is likely reflected through fewer numbers of COVID-19 cases from this region, so far (one case each in Arunachal Pradesh and Mizoram, and two in Manipur). Notably the death to recovery ratio in the largely ATB dominated region of Ladakh, India is zero, indicating complete recovery so far of COVID-19 infected individuals from this region, while the same in its neighboring state of Jammu and Kashmir with predominant ANI ancestral fractions is discernibly higher (0.5). Further, Indian states with large number of indigenous tribal population with prevalence of AAA ancestry and/or with large fractions of AASI-related ancestry, eg. Chhattisgarh, Jharkhand and Orissa have not registered any COVID related deaths till date, indicating that South Asians with dominant AAA and AASI related ancestry may also likely be less severely affected against the most detrimental effects of COVID-19 infection. Nevertheless, the severity of COVID-19 manifestation is likely to vary appreciably across the South Asian populations. Approximately 12% people of ANI and ASI ancestries belong to mitochondrial haplogroup U, which, as described above, is found at high frequency among Mesolithic European hunter-gatherers (WHGs)21. Our interrogation of worldwide populations suggests that the WHG ancestral fraction is likely associated with acute COVID-19 manifestation and is predictive of debilitating effects of COVID-19 infection among South Asian populations with substantial fractions of WHG ancestry. Overall COVID-19 manifestation in South Asia is likely to be somewhat intermediate to that observed in Europe and East Asia. This is underscored by the recent finding that AFs of *ACE2* variants among South Asians are intermediate between East Asians and Europeans5. ### Limitations and conclusion The present study shines light of underlying genetic signatures that may be associated with disparate COVID-19 severity and manifestation in worldwide populations. Nevertheless we note that the current work has been performed using publicly available genomic data and a more robust understanding in this regard will emanate from sequencing/genotyping endeavours for COVID-19 patients across the spectrum of varied nationalities/ancestries and geographical locations including individuals with mild to moderate symptoms, severe manifestations and death. Our results strongly advocate the adoption of a rigorous worldwide population genetics driven approach to expand and substantiate our current knowledge, as well as facilitate the development of population specific therapeutics to mitigate this worldwide challenge. ## Data Availability The current study employed publicly available data, which is freely available through the database of Dr. David Reich, Harvard Medical School (https://reich.hms.harvard.edu/datasets). * Received April 5, 2020. * Revision received April 5, 2020. * Accepted April 7, 2020. * © 2020, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/) ## References 1. 1.World Health Organization. Laboratory testing of human suspected cases of novel coronavirus (nCoV) infection [published online ahead of print January 21, 2020]. [https://apps.who.int/iris/bitstream/handle/10665/330374/WHOC2019CnCoVClaboratoryC2020.1Ceng.pdf](https://apps.who.int/iris/bitstream/handle/10665/330374/WHOC2019CnCoVClaboratoryC2020.1Ceng.pdf) 2. 2.Dong et al. An interactive web-based dashboard to track COVID-19 in real time. The Lancet [https://doi.org/10.1016/S1473-3099(20)30120-1](https://doi.org/10.1016/S1473-3099(20)30120-1) (2020). 3. 3.Zhou, P. et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature [https://doi.org/10.1038/s41586-020-2012-7](https://doi.org/10.1038/s41586-020-2012-7) (2020). 4. 4.Lu, R. et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet [https://doi.org/10.1016/S0140-6736(20)30251-8](https://doi.org/10.1016/S0140-6736(20)30251-8) (2020). 5. 5.Cao, et al. Comparative genetic analysis of the novel coronavirus (2019-nCoV/SARS-CoV-2) receptor ACE2 in different populations. Cell Discovery [https://doi.org/10.1038/s41421-020-0147-1](https://doi.org/10.1038/s41421-020-0147-1) (2020). 6. 6.Li, et al. Coronavirus infections and immune responses. Journal of Medical Virology [https://doi.org/10.1002/jmv.25685](https://doi.org/10.1002/jmv.25685) (2020). 7. 7.Quach, et al. Genetic Adaptation and Neandertal Admixture Shaped the Immune System of Human Populations. Cell [http://dx.doi.org/10.1016/j.cell.2016.09.024](http://dx.doi.org/10.1016/j.cell.2016.09.024) (2016). 8. 8.Mathieson et al. The genomic history of southeastern Europe. Nature [https://doi.org/10.1038/nature25778](https://doi.org/10.1038/nature25778) (2018) 9. 9.Olade et al. The Beaker phenomenon and the genomic transformation of northwest Europe. Nature [https://doi.org/10.1038/nature25738](https://doi.org/10.1038/nature25738) (2018) 10. 10.Olade et al. The genomic history of the Iberian Peninsula over the past 8000 years. Science [https://doi.org/10.1126/science.aav4040](https://doi.org/10.1126/science.aav4040) (2019) 11. 11.Price et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics [https://doi.org/10.1038/ng1847](https://doi.org/10.1038/ng1847) (2006) 12. 12.Purcell et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics [https://doi.org/10.1086/519795](https://doi.org/10.1086/519795) (2007) 13. 13.Haak et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature [https://doi.org/10.1038/nature14317](https://doi.org/10.1038/nature14317) (2015) 14. 14.Patterson et al. Ancient admixture in human history. Genetics [https://doi.org/10.1534/genetics.112.145037](https://doi.org/10.1534/genetics.112.145037) (2012) 15. 15.Alexander et al. Fast model-based estimation of ancestry in unrelated individuals. Genome Research [https://doi.org/10.1101/gr.094052.109](https://doi.org/10.1101/gr.094052.109) (2009) 16. 16.Barrett et al. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics [https://doi.org/10.1093/bioinformatics/bth457](https://doi.org/10.1093/bioinformatics/bth457) (2005) 17. 17. Dayem Ullah et al. SNPnexus: assessing the functional relevance of genetic variation to facilitate the promise of precision medicine. Nucleic Acids Research [https://doi.org/10.1093/nar/gky399](https://doi.org/10.1093/nar/gky399) (2018) 18. 18.Lipson et al. Parallel palaeogenomic transects reveal complex genetic history of early European farmers. Nature [https://doi.org/10.1038/nature24476](https://doi.org/10.1038/nature24476) (2017) 19. 19.Lamnidis et al. Ancient Fennoscandian genomes reveal origin and spread of Siberian ancestry in Europe. Nature Communications [https://doi.org/10.1038/s41467-018-07483-5](https://doi.org/10.1038/s41467-018-07483-5) (2018) 20. 20.Flegontov et al. Palaeo-Eskimo genetic ancestry and the peopling of Chukotka and North America. Nature [https://doi.org/10.1038/s41586-019-1251-y](https://doi.org/10.1038/s41586-019-1251-y) (2019) 21. 21.Raghavan et al. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature [https://doi.org/10.1038/nature12736](https://doi.org/10.1038/nature12736) (2014) 22. 22.Wang et al. The Genomic Formation of Human Populations in East Asia. bioRxiv [https://doi.org/10.1101/2020.03.25.004606](https://doi.org/10.1101/2020.03.25.004606) (2020) 23. 23.Lipson et al. Ancient genomes document multiple waves of migration in Southeast Asian prehistory. Science [https://doi.org/10.1126/science.aat3188](https://doi.org/10.1126/science.aat3188) (2018) 24. 24.Shinde et al. An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers. Cell [https://doi.org/10.1016/j.cell.2019.08.048](https://doi.org/10.1016/j.cell.2019.08.048) (2019) 25. 25.Narasimhan et al. The formation of human populations in South and Central Asia. Science [https://doi.org/10.1126/science.aat7487](https://doi.org/10.1126/science.aat7487) (2019) 26. 26.Basu et al. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure. PNAS [https://doi.org/10.1073/pnas.1513197113](https://doi.org/10.1073/pnas.1513197113) (2016) 27. 27.Tatte et al. The genetic legacy of continental scale admixture in Indian Austroasiatic speakers. Scientific Reports [https://doi.org/10.1038/s41598-019-40399-8](https://doi.org/10.1038/s41598-019-40399-8) (2019) 28. 28.Gnecchi-Ruscone et al. The genomic landscape of Nepalese Tibeto-Burmans reveals new insights into the recent peopling of Southern Himalayas. Scientific Reports [https://doi.org/10.1038/s41598-017-15862-z](https://doi.org/10.1038/s41598-017-15862-z) (2017)