Variants in ACE2 and TMPRSS2 genes are not major determinants of COVID-19 severity in UK Biobank subjects
=========================================================================================================

* David Curtis

## Abstract

It is plausible that variants in the *ACE2* and *TMPRSS2* genes might contribute to variation in COVID-19 severity and that these could explain why some people become very unwell whereas most do not. Exome sequence data was obtained for 49,953 UK Biobank subjects of whom 74 had tested positive for SARS-CoV-2 and could be presumed to have severe disease. A weighted burden analysis was carried out using SCOREASSOC to determine whether there were differences between these cases and the other sequenced subjects in the overall burden of rare, damaging variants in *ACE2* or *TMPRSS2*. There were no statistically significant differences in weighted burden scores between cases and controls for either gene. There were no individual DNA sequence variants with a markedly different frequency between cases and controls. Whether there are small effects on severity, or whether there might be rare variants with major effect sizes, would require studies in much larger samples. Genetic variants affecting the structure and function of the ACE2 and TMPRSS2 proteins are not a major determinant of whether infection with SARS-CoV-2 results in severe symptoms.

This research has been conducted using the UK Biobank Resource.

Keywords

* *ACE2*
* *TMPRSS2*
* COVID-19
* SARS-CoV-2

## Introduction

There is wide variation in the severity of symptoms in patients infected with SARS-CoV-2 and there are reports in the UK that members of ethnic minorities are more severely affected. An obvious possible explanation for these findings would be that genetic polymorphisms affecting structure or function of key proteins could influence host susceptibility and/or responses to infection.
If these polymorphisms varied in frequency between different ethnic groups then this could contribute to differential outcomes. Two key proteins involved in SARS-CoV-2 infective processes are ACE2, which is expressed on the cell surface and acts as a receptor for the viral S protein, and TMPRSS2, which cleaves the S protein to allow fusion of the viral and cellular membranes (Hoffmann et al., 2020). Variants in the genes coding for these proteins might contribute to different responses to infection.

## Methods

The UK Biobank dataset was downloaded along with the variant call files for 49,953 subjects who had undergone exome sequencing and been genotyped using the GRCh38 assembly with coverage 20X at 94.6% of sites on average (Hout et al., 2019). Informed consent from the subjects and ethical approval for research uses of the data had been obtained by UK Biobank. All variants were annotated using VEP, PolyPhen and SIFT (Adzhubei et al., 2013; Kumar et al., 2009; McLaren et al., 2016). To obtain population principal components reflecting ancestry, version 1.90beta of *plink* ([https://www.cog-genomics.org/plink2](https://www.cog-genomics.org/plink2)) was run with the options *--maf 0.1 --pca header tabs --make-rel* (Chang et al., 2015; Purcell et al., 2007, 2009).

The COVID-19 results table was downloaded from UK Biobank on 28th April 2020. This contained results for 2,724 subjects who had undergone testing for SARS-CoV-2 infection between 16th March and 14th April 2020 (Armstrong et al., 2020). During this period, testing in the UK was done almost exclusively on patients admitted to hospital and thus patients testing positive can be assumed to have severe disease because patients with milder symptoms were generally left at home. Of the subjects tested, 185 had been exome sequenced, of whom 74 had tested positive, meaning that they had at least one swab which demonstrated the presence of viral RNA at detectable levels.
The proportion of infected subjects who require hospitalisation rises with age but is still only 0.18 for those aged 80 or over (Verity et al., 2020). Thus the subjects who tested positive could be regarded as cases with an unusually severe response to infection whereas the subjects who tested negative or who were not tested could be regarded as unscreened controls, most of whom would not have severe symptoms even if infected.

SCOREASSOC was then used to carry out a weighted burden analysis to test whether, in *ACE2* or *TMPRSS2*, sequence variants which were rarer and/or predicted to have more severe functional effects occurred more commonly in cases, i.e. subjects who tested positive for SARS-CoV-2, than all the other sequenced subjects. All available variants in each gene were included in the analyses. As originally described, variants were weighted according to frequency so that rare variants were accorded 10 times the weight of common variants (Curtis, 2012). Variants were additionally weighted according to their functional annotation using the default weights provided with the GENEVARASSOC program, which was used to generate input files for weighted burden analysis by SCOREASSOC (Curtis, 2019, 2016, 2012). For example, a weight of 5 was assigned for a synonymous variant, 10 for a non-synonymous variant and 20 for a stop gained variant. Additionally, 10 was added to the weight if the PolyPhen annotation was possibly or probably damaging and also if the SIFT annotation was deleterious, meaning that a non-synonymous variant annotated as both damaging and deleterious would be assigned an overall weight of 30. The full set of weights is shown in Table 1, copied from the previous reports which used this method (Curtis et al., 2019, 2018). Variants were excluded if there were more than 10% of genotypes missing in the controls or if the heterozygote count was smaller than both homozygote counts in the controls.
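
The annotation weighting and the two exclusion rules just described can be illustrated with a minimal Python sketch. It is not the GENEVARASSOC implementation: it covers only the three example consequence types quoted in the text rather than the full Table 1, and the function names, argument names and annotation label strings are assumptions made for illustration.

```python
# Illustrative sketch only, not the GENEVARASSOC code.
# Base weights for the three example consequence types quoted in the text;
# the full set of weights is in Table 1. The label strings are assumed.
BASE_WEIGHTS = {
    "synonymous_variant": 5,
    "missense_variant": 10,  # "non-synonymous" in the text
    "stop_gained": 20,
}


def annotation_weight(consequence, polyphen=None, sift=None):
    """Return the annotation-based weight for one variant."""
    weight = BASE_WEIGHTS[consequence]
    # 10 is added if PolyPhen calls the variant possibly or probably damaging.
    if polyphen in ("possibly_damaging", "probably_damaging"):
        weight += 10
    # A further 10 is added if SIFT calls the variant deleterious.
    if sift == "deleterious":
        weight += 10
    return weight


def passes_control_filters(n_hom_ref, n_het, n_hom_alt, n_missing):
    """Apply the two exclusion rules to genotype counts in controls."""
    total = n_hom_ref + n_het + n_hom_alt + n_missing
    # Exclude if more than 10% of control genotypes are missing.
    if total > 0 and n_missing / total > 0.10:
        return False
    # Exclude if heterozygotes are rarer than both homozygote classes.
    if n_het < n_hom_ref and n_het < n_hom_alt:
        return False
    return True
```

Under these assumptions, `annotation_weight("missense_variant", "probably_damaging", "deleterious")` gives the overall weight of 30 mentioned in the text.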
For each variant, an overall weight was obtained consisting of the product of the frequency-based weight and the annotation-based weight. For each subject a gene-wise weighted burden score was derived as the sum of the variant-wise weights, each multiplied by the number of alleles of the variant which the given subject possessed. *ACE2* is located on the X chromosome and males were treated as if they were homozygotes. If a subject was not genotyped for a variant then they were assigned the subject-wise average score for that variant.

[Table 1.](http://medrxiv.org/content/early/2020/06/24/2020.05.01.20085860/T1) The table shows the weight accorded to each type of variant as annotated by VEP (McLaren et al., 2016). 10 was added to this weight if the variant was annotated by PolyPhen as possibly or probably damaging and 10 was added if SIFT annotated it as deleterious (Adzhubei et al., 2013; Kumar et al., 2009).

A t test was carried out to determine whether the gene-wise burden scores differed between cases and controls and additionally ridge regression analysis with lambda=1 was performed incorporating the first 20 principal components, as described previously (Curtis, 2019; Curtis et al., 2019, 2018). To do this, SCOREASSOC first calculates the likelihood for the phenotypes as predicted by the principal components and then calculates the likelihood using a model which additionally incorporates the gene-wise burden scores. It then carries out a likelihood ratio test assuming that twice the natural log of the likelihood ratio follows a chi-squared distribution with one degree of freedom to produce a p value.

## Results

The genotype counts and frequencies of variants are presented in Supplementary Table 1, with variant positions and annotations redacted in order to preserve subject anonymity.
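
As a rough illustration of the scoring and testing steps described in the Methods, the following Python sketch computes gene-wise burden scores and converts a likelihood ratio statistic into a p value. It is not the SCOREASSOC code. The frequency-based weights are taken as given inputs, allele counts are assumed to be already coded so that males carrying an *ACE2* variant count as 2, the handling of missing genotypes (the mean weighted count among genotyped subjects) is one reading of the text, and all function names are hypothetical.

```python
import math


def burden_scores(genotypes, freq_weights, annot_weights):
    """Gene-wise weighted burden score for each subject.

    genotypes[s][v] is the allele count (0, 1 or 2) of variant v in
    subject s, or None if the subject was not genotyped for it.
    """
    # Overall weight per variant: frequency weight times annotation weight.
    weights = [f * a for f, a in zip(freq_weights, annot_weights)]

    # Mean allele count per variant among subjects with a genotype call.
    mean_counts = []
    for v in range(len(weights)):
        called = [row[v] for row in genotypes if row[v] is not None]
        mean_counts.append(sum(called) / len(called) if called else 0.0)

    scores = []
    for row in genotypes:
        score = 0.0
        for v, w in enumerate(weights):
            # Missing genotypes contribute the average for that variant.
            count = mean_counts[v] if row[v] is None else row[v]
            score += w * count
        scores.append(score)
    return scores


def lrt_statistic(loglik_null, loglik_full):
    """Twice the natural log of the likelihood ratio."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_1df_p_value(stat):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
```

Applying `chi2_1df_p_value` to the chi-squared statistics reported in the Results for the ridge regression analyses, 0.33 and 4.17, gives values that round to the reported p values of 0.57 and 0.04.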
There were 512 valid variants in *ACE2* and there was no tendency for the weighted burden scores to be different between cases (mean (sd): 25.9 (45.7)) and controls (22.6 (37.8)): t=0.74, 49951 df, p=0.45 and, in the ridge regression analysis incorporating principal components, chi-squared=0.33, 1 df, p=0.57. There were 658 valid variants in *TMPRSS2* and although the weighted burden scores were lower in cases (64.6 (38.1)) than in controls (74.0 (48.9)) this difference would not meet conventional standards for statistical significance after applying a Bonferroni correction for the fact that two genes were tested, which halves a nominal threshold of 0.05 to 0.025: t=-1.6, 49951 df, p=0.10 and, in the ridge regression analysis incorporating principal components, chi-squared=4.17, 1 df, p=0.04.

On visual inspection of the results there were no individual variants with markedly different frequencies between cases and controls. Of course, for both genes there were many rare variants which were observed in controls but not in cases but this is as expected given the disparity in sample sizes.

## Discussion

Although the number of severely affected subjects who had been sequenced is very small it is nevertheless possible to draw some preliminary conclusions and given the importance of the topic it seems reasonable to communicate these findings. In general, the results are negative. It is not the case that a large proportion of severely affected subjects have a particular genetic variant in one of these genes which is relatively rare in the general population. Nor is it the case that there is a common variant which confers strong protection against severe infection. It remains possible that there might be rare variants which have a major effect on risk in individual subjects but such effects would only be detected with larger sample sizes.
The fact that the weighted burden scores for *TMPRSS2* were higher in controls than in cases is consistent with the hypothesis that rare variants in this gene which disrupt functioning of the protein might be protective against severe infection. Although this is biologically plausible it should be emphasised that the results obtained are not statistically significant. This could be investigated further by carrying out targeted sequencing of this gene in a sample of a few hundred severely affected subjects.

Genetic variants affecting the structure and function of the ACE2 and TMPRSS2 proteins are not a major determinant of whether infection with SARS-CoV-2 results in severe symptoms.

## Data Availability

No new data was generated for this study. The data used is available on application from UK Biobank.

**Supplementary Table 1**

Table showing genotype counts and allele frequencies for variants in *ACE2* and *TMPRSS2* in cases tested positive for SARS-CoV-2, presumed to have severe illness, against background controls. Each variant is assigned a weight, with higher weights for variants which are rarer and/or with predicted damaging effects. For variants in *ACE2*, which is on the X chromosome, males are treated as homozygotes. To preserve subject anonymity, variant positions and annotations are redacted.

[Supplementary Table 1A.](http://medrxiv.org/content/early/2020/06/24/2020.05.01.20085860/T2) Genotype counts and frequencies for variants in *ACE2*.

[Supplementary Table 1B.](http://medrxiv.org/content/early/2020/06/24/2020.05.01.20085860/T3) Genotype counts and frequencies for variants in *TMPRSS2*.

## Acknowledgments

This research has been conducted using the UK Biobank Resource. The author wishes to acknowledge the staff supporting the High Performance Computing Cluster, Computer Science Department, University College London.
This work was carried out in part using resources provided by BBSRC equipment grant BB/R01356X/1.

* Received May 1, 2020.
* Revision received June 24, 2020.
* Accepted June 24, 2020.
* © 2020, Posted by Cold Spring Harbor Laboratory. The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## References

1. Adzhubei, I., Jordan, D.M., Sunyaev, S.R. (2013) Predicting functional effect of human missense mutations using PolyPhen-2. Curr. Protoc. Hum. Genet. 7 Unit7.20.
2. Armstrong, J., Rudkin, J.K., Allen, N., Crook, D.W., Wilson, D.J., Wyllie, D.H., O’Connell, A.M. (2020) Dynamic linkage of COVID-19 test results between Public Health England’s Second Generation Surveillance System and UK Biobank. Microb. Genet.
3. Chang, C.C., Chow, C.C., Tellier, L.C., Vattikuti, S., Purcell, S.M., Lee, J.J. (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7.
4. Curtis, D. (2012) A rapid method for combined analysis of common and rare variants at the level of a region, gene, or pathway. Adv Appl Bioinform Chem 5, 1–9.
5. Curtis, D. (2016) Pathway analysis of whole exome sequence data provides further support for the involvement of histone modification in the aetiology of schizophrenia. Psychiatr. Genet. 26, 223–7.
6. Curtis, D. (2019) A weighted burden test using logistic regression for integrated analysis of sequence variants, copy number variants and polygenic risk score. Eur. J. Hum. Genet. 27, 114–124.
7. Curtis, D., Bakaya, K., Sharma, L., Bandyopadhay, S. (2019) Weighted burden analysis of exome-sequenced late onset Alzheimer’s cases and controls provides further evidence for involvement of PSEN1 and demonstrates protective role for variants in tyrosine phosphatase genes. Ann Hum Genet 84, 291–302.
8. Curtis, D., Coelewij, L., Liu, S.-H., Humphrey, J., Mott, R. (2018) Weighted Burden Analysis of Exome-Sequenced Case-Control Sample Implicates Synaptic Genes in Schizophrenia Aetiology. Behav. Genet. 43, 198–208.
9. Hoffmann, M., Kleine-Weber, H., Schroeder, S., Krüger, N., Herrler, T., Erichsen, S., Schiergens, T.S., Herrler, G., Wu, N.H., Nitsche, A., Müller, M.A., Drosten, C., Pöhlmann, S. (2020) SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell 181, 271-280.e8.
10. Hout, C.V. Van, Tachmazidou, I., Backman, J.D., Hoffman, J.X., Ye, B., Pandey, A.K., Gonzaga-Jauregui, C., Khalid, S., Liu, D., Banerjee, N., Li, A.H., Colm, O., Marcketta, A., Staples, J., Schurmann, C., Hawes, A., Maxwell, E., Barnard, L., Lopez, A., Penn, J., Habegger, L., Blumenfeld, A.L., Yadav, A., Praveen, K., Jones, M., Salerno, W.J., Chung, W.K., Surakka, I., Willer, C.J., Hveem, K., Leader, J.B., Carey, D.J., Ledbetter, D.H., Collaboration, G.-R.D., Cardon, L., Yancopoulos, G.D., Economides, A., Coppola, G., Shuldiner, A.R., Balasubramanian, S., Cantor, M., Nelson, M.R., Whittaker, J., Reid, J.G., Marchini, J., Overton, J.D., Scott, R.A., Abecasis, G., Yerges-Armstrong, L., Baras, A., Center, on behalf of the R.G.
(2019) Whole exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank. bioRxiv 572347.
11. Kumar, P., Henikoff, S., Ng, P.C. (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–1081.
12. McLaren, W., Gil, L., Hunt, S.E., Riat, H.S., Ritchie, G.R.S., Thormann, A., Flicek, P., Cunningham, F. (2016) The Ensembl Variant Effect Predictor. Genome Biol. 17, 122.
13. Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A.R., Bender, D., Maller, J., Sklar, P., de Bakker, P.I.W., Daly, M.J., Sham, P.C. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–75.
14. Purcell, S.M., Wray, N.R., Stone, J.L., Visscher, P.M., O’Donovan, M.C., Sullivan, P.F., Sklar, P., Purcell Leader, S.M., Ruderfer, D.M., McQuillin, A., Morris, D.W., O’Dushlaine, C.T., Corvin, A., Holmans, P.A., Macgregor, S., Gurling, H., Blackwood, D.H.R., Craddock, N.J., Gill, M., Hultman, C.M., Kirov, G.K., Lichtenstein, P., Muir, W.J., Owen, M.J., Pato, C.N., Scolnick, E.M., St Clair, D., Sklar Leader, P., Williams, N.M., Georgieva, L., Nikolov, I., Norton, N., Williams, H., Toncheva, D., Milanova, V., Thelander, E.F., Sullivan, P.F., Kenny, E., Quinn, E.M., Choudhury, K., Datta, S., Pimm, J., Thirumalai, S., Puri, V., Krasucki, R., Lawrence, J., Quested, D., Bass, N., Crombie, C., Fraser, G., Leh Kuan, S., Walker, N., McGhee, K.A., Pickard, B., Malloy, P., Maclean, A.W., Van Beck, M., Pato, M.T., Medeiros, H., Middleton, F., Carvalho, C., Morley, C., Fanous, A., Conti, D., Knowles, J.A., Paz Ferreira, C., Macedo, A., Helena Azevedo, M., Kirby, A.N., Ferreira, M.A.R., Daly, M.J., Chambert, K., Kuruvilla, F., Gabriel, S.B., Ardlie, K., Moran, J.L. (2009) Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 10, 8192–8192.
15. Verity, R., Okell, L.C., Dorigatti, I., Winskill, P., Whittaker, C., Imai, N., Cuomo-Dannenburg, G., Thompson, H., Walker, P.G.T., Fu, H., Dighe, A., Griffin, J.T., Baguelin, M., Bhatia, S., Boonyasiri, A., Cori, A., Cucunubá, Z., FitzJohn, R., Gaythorpe, K., Green, W., Hamlet, A., Hinsley, W., Laydon, D., Nedjati-Gilani, G., Riley, S., van Elsland, S., Volz, E., Wang, H., Wang, Y., Xi, X., Donnelly, C.A., Ghani, A.C., Ferguson, N.M. (2020) Estimates of the severity of coronavirus disease 2019: a model-based analysis. Lancet. Infect. Dis.