Exome-wide association study to identify rare variants influencing COVID-19 outcomes: Results from the Host Genetics Initiative =============================================================================================================================== * Guillaume Butler-Laporte * Gundula Povysil * Jack Kosmicki * Elizabeth T Cirulli * Theodore Drivas * Simone Furini * Chadi Saad * Axel Schmidt * Pawel Olszewski * Urszula Korotko * Mathieu Quinodoz * Elifnaz Çelik * Kousik Kundu * Klaudia Walter * Junghyung Jung * Amy D Stockwell * Laura G Sloofman * Alexander W Charney * Daniel Jordan * Noam Beckmann * Bartlomiej Przychodzen * Timothy Chang * Tess D Pottinger * Ning Shang * Fabian Brand * Francesca Fava * Francesca Mari * Karolina Chwialkowska * Magdalena Niemira * Szymon Pula * J Kenneth Baillie * Alex Stuckey * Andrea Ganna * Konrad J Karczewski * Kumar Veerapen * Mathieu Bourgey * Guillaume Bourque * Robert JM Eveleigh * Vincenzo Forgetta * David Morrison * David Langlais * Mark Lathrop * Vincent Mooser * Tomoko Nakanishi * Robert Frithiof * Michael Hultström * Miklos Lipcsey * Yanara Marincevic-Zuniga * Jessica Nordlund * Kelly M. Schiabor Barrett * William Lee * Alexandre Bolze * Simon White * Stephen Riffle * Francisco Tanudjaja * Efren Sandoval * Iva Neveux * Shaun Dabe * Nicolas Casadei * Susanne Motameny * Manal Alaamery * Salam Massadeh * Nora Aljawini * Mansour S. Almutairi * Yaseen M. Arabi * Saleh A. Alqahtan * Fawz S. Al Harthi * Amal Almutairi * Fatima Alqubaishi * Sarah Alotaibi * Albandari Binowayn * Ebtehal A. Alsolm * Hadeel El Bardisy * Mohammad Fawzy * COVID-19 Host Genetics Initiative * DeCOI Host Genetics Group * GEN-COVID Multicenter Study * GenOMICC Consortium * Japan COVID-19 Task Force * Regeneron Genetics Center * Daniel H Geschwind * Stephanie Arteaga * Alexis Stephens * Manish J. Butte * Paul C. Boutros * Takafumi N. Yamaguchi * Shu Tao * Stefan Eng * Timothy Sanders * Paul J. Tung * Michael E. 
Broudy * Yu Pan * Alfredo Gonzalez * Nikhil Chavan * Ruth Johnson * Bogdan Pasaniuc * Brian Yaspan * Sandra Smieszek * Carlo Rivolta * Stephanie Bibert * Pierre-Yves Bochud * Maciej Dabrowski * Pawel Zawadzki * Mateusz Sypniewski * Elżbieta Kaja * Pajaree Chariyavilaskul * Voraphoj Nilaratanakul * Nattiya Hirankarn * Vorasuk Shotelersuk * Monnat Pongpanich * Chureerat Phokaew * Wanna Chetruengchai * Yosuke Kawai * Takanori Hasegawa * Tatsuhiko Naito * Ho Namkoong * Ryuya Edahiro * Akinori Kimura * Seishi Ogawa * Takanori Kanai * Koichi Fukunaga * Yukinori Okada * Seiya Imoto * Satoru Miyano * Serghei Mangul * Malak S Abedalthagafi * Hugo Zeberg * Joseph J Grzymski * Nicole L Washington * Stephan Ossowski * Kerstin U Ludwig * Eva C Schulte * Olaf Riess * Marcin Moniuszko * Miroslaw Kwasniewski * Hamdi Mbarek * Said I Ismail * Anurag Verma * David B Goldstein * Krzysztof Kiryluk * Alessandra Renieri * Manuel Ferreira * J Brent Richards

## Abstract

Host genetics is a key determinant of COVID-19 outcomes. Previously, the COVID-19 Host Genetics Initiative genome-wide association study used common variants to identify multiple loci associated with COVID-19 outcomes. However, variants with the largest impact on COVID-19 outcomes are expected to be rare in the population. Hence, studying rare variants may provide additional insights into disease susceptibility and pathogenesis, thereby informing therapeutics development. Here, we combined whole-exome and whole-genome sequencing from 21 cohorts across 12 countries and performed rare variant exome-wide burden analyses for COVID-19 outcomes. In an analysis of 5,048 severe disease cases and 571,009 controls, we observed that carrying a rare deleterious variant in the SARS-CoV-2 sensor toll-like receptor *TLR7* (on chromosome X) was associated with a 5.3-fold increase in severe disease (95% CI: 2.75-10.05, p=5.41×10⁻⁷).
These results further support *TLR7* as a genetic determinant of severe disease and suggest that larger studies on rare variants influencing COVID-19 outcomes could provide additional insights.

Key words

* COVID-19
* SARS-CoV-2
* Whole-Exome Sequencing
* Whole-Genome Sequencing

## Introduction

Despite successful vaccine programs, SARS-CoV-2 is still a major cause of mortality and widespread societal disruption1,2. While disease severity has correlated with well-established epidemiological and clinical risk factors (e.g., advanced age, obesity, immunosuppression), these do not explain the wide range of COVID-19 presentations3. Hence, individuals without one of these known risk factors may have a genetic predisposition to severe COVID-194. These genetic determinants of severe disease can, in turn, inform about the pathophysiology underlying COVID-19 severity and accelerate therapeutics development5,6.

Previous work on COVID-19 host genetics using genome-wide association studies (GWASs) revealed 23 statistically robust genetic loci associated with either COVID-19 severity or susceptibility7–11. Given that most GWASs use genetic data obtained from genome-wide genotyping followed by imputation to measure the association between a phenotype and genetic variation, their reliability and statistical power decline as a variant’s frequency decreases, especially at allele frequencies of less than 1%12. Ascertainment of rare genetic variation can be improved with sequencing technology13. Rare variants are expected to be enriched for larger effect sizes, due to evolutionary pressure on highly deleterious variants, and may therefore provide unique insights into genetic predisposition to COVID-19 severity. Identifying such genes may highlight critical control points in the host response to SARS-CoV-2 infection.

Measuring the effect of rare genetic variants on a given phenotype (here COVID-19) is difficult.
Specifically, while variants of large effect on COVID-19 are more likely to be rare, the converse is not true, and most rare variants are not expected to strongly impact COVID-19 severity14. Therefore, unless large sample sizes and careful statistical adjustments are used, most rare variant genetic association studies risk being underpowered, and are at higher risk of false or inflated effect estimates if significant associations are found between COVID-19 and genetic loci. This is exemplified by the fact that several rare variant associations reported for COVID-19 have not been replicated in independent cohorts15–17.

Here, we investigated the association of rare genetic variants with the risk of COVID-19 by combining gene burden test results from whole exome and whole genome sequencing. To our knowledge, this is the first rare genetic variant burden test meta-analysis ever performed on a worldwide scale, including 21 cohorts in 12 countries and all main continental genetic ancestries.

## Results

### Study population and outcome

The final analysis included up to 28,159 individuals infected with SARS-CoV-2, and up to 596,189 controls from 21 cohorts in 12 countries (**Figure 1**). Most participants were of European genetic ancestry (n=576,389), but the consortium also included participants of Admixed American (n=4,529), African (n=25,465), East Asian (n=4,058), Middle Eastern (n=4,977) and South Asian ancestries (n=9,943). Participating cohorts enrolled patients based on local protocols, and both retrospective and prospective designs were used. Genetic sequencing was also performed locally, and cohorts were provided with a specific framework for quality control analyses, but each was allowed to deviate based on individual needs. Both exome (n = 11 cohorts) and genome sequencing (n = 10 cohorts) were included in the meta-analyses. The mean age of participants was 55.7, and 55.9% were females.

[Figure 1:](http://medrxiv.org/content/early/2022/03/31/2022.03.28.22273040/F1) Maps of countries contributing data to the consortium. Sample sizes (cases and controls) for each phenotype were added and represented on the logarithmic scale by each circle. Relative contribution to each phenotype is represented by the three colors.

We studied three separate outcome phenotypes, as previously described by the COVID-19 Host Genetics Initiative (COVID-19 HGI)8. Briefly, the outcome cases were defined according to three standard COVID-19 HGI outcomes: A) severe disease: individuals with SARS-CoV-2 infection who died or required invasive respiratory support (extracorporeal membrane oxygenation, intubation with mechanical ventilation, high-flow oxygen support, or new bilevel or continuous positive airway pressure ventilation), B) hospitalisation: individuals with SARS-CoV-2 who died or required hospitalisation, and C) susceptibility to infection: any individual with SARS-CoV-2 infection. These are also referred to as A2, B2, and C2, respectively, in the COVID-19 HGI meta-analyses8. For all three phenotypes, controls were all individuals not classified as cases (including population controls with unknown COVID-19 status). The final meta-analyses included up to 5,048 cases and 571,009 controls for the severe disease outcome, 12,267 cases and 589,175 controls for the hospitalisation outcome, and 28,159 cases and 596,189 controls for the susceptibility outcome.

### Burden test definition

Given the expected paucity of large-effect size rare deleterious variants, strategies have been devised to increase statistical power to test associations between rare variants and biomedically-relevant outcomes.
One such strategy is to use burden tests18, where each variant is collapsed into larger sets of variants, and association is tested between groups of variants and an outcome. Here, we collapsed deleterious variants in each gene and devised the following burden test: for each gene, an individual received a score of 0 if they did not carry any deleterious variant, a score of 1 if they carried at least one non-homozygous deleterious variant, and a score of 2 if they carried at least one homozygous deleterious variant. Similar to previous studies on burden testing of rare variants17,19, deleterious variants were chosen using three masks: 1) “pLoF”, which uses only predicted loss of function variants, 2) “coding5”, which uses all variants in pLoF, as well as indels of moderate consequence as predicted by Ensembl20, and missense variants classified as deleterious in 5 *in-silico* algorithms (see Methods), and 3) “coding1”, which uses all variants in pLoF and coding5, and also adds all missense variants classified as deleterious in at least 1 of the *in-silico* algorithms.

The analyses were performed at two minor allele frequency (MAF) thresholds: < 1% and < 0.1%. MAFs were obtained from a combination of gnomAD21 and cohort-specific common variant exclusion lists. These common variant lists included variants that achieved a MAF of > 1% or > 0.1% in at least one study population within the consortium. To reduce the effect of fluctuations due to sampling, a minor allele count (MAC) ≥ 6 in the corresponding study was required for inclusion in the common variant list. Such “blacklists” have been shown to increase statistical power by removing variants at lower risk of being highly deleterious, and they reduce the risk of cohort-specific false-positive variants being retained in the overall analysis22.
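
The scoring rule and the common variant exclusion list described above can be illustrated with a minimal Python sketch. This is not the consortium's pipeline (the analyses were run with Regenie); the class `CohortVariantStats`, the function names, and the variant identifiers are hypothetical and chosen only for illustration. The sketch assumes genotypes are coded as the number of alternate alleles (0, 1, or 2) per variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortVariantStats:
    """Per-cohort frequency summary for one variant (hypothetical structure)."""
    variant_id: str
    maf: float  # minor allele frequency observed in this cohort
    mac: int    # minor allele count observed in this cohort


def build_exclusion_list(stats, maf_threshold, min_mac=6):
    """Collect variants that are common (MAF > threshold) in at least one cohort.

    The MAC >= 6 requirement mirrors the rule in the text, which guards
    against frequency estimates driven by sampling fluctuations.
    """
    return {
        s.variant_id
        for s in stats
        if s.maf > maf_threshold and s.mac >= min_mac
    }


def gene_burden_score(genotypes, mask_variants, exclusion_list):
    """Return the 0/1/2 burden score for one individual and one gene mask.

    genotypes: dict mapping variant_id -> alternate allele count (0, 1, 2).
    mask_variants: variant ids in the gene that qualify for the mask.
    exclusion_list: variant ids removed because they are common somewhere.
    """
    score = 0
    for variant_id in mask_variants:
        if variant_id in exclusion_list:
            continue
        dosage = genotypes.get(variant_id, 0)
        if dosage == 2:
            return 2  # at least one homozygous deleterious variant
        if dosage == 1:
            score = 1  # at least one heterozygous deleterious variant
    return score


# Toy example with made-up variants
stats = [
    CohortVariantStats("v1", maf=0.02, mac=40),   # common in one cohort
    CohortVariantStats("v2", maf=0.02, mac=3),    # MAC too low to be listed
    CohortVariantStats("v3", maf=0.0005, mac=8),  # rare
]
exclusion = build_exclusion_list(stats, maf_threshold=0.01)
mask = ["v1", "v2", "v3"]
carrier_score = gene_burden_score({"v1": 2, "v2": 1}, mask, exclusion)
```

In this toy example the homozygous call at `v1` is ignored because `v1` is on the exclusion list, so the individual is scored through the heterozygous call at `v2`.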

The resulting score (either 0, 1, or 2) for each mask was then regressed on each of our three phenotypes using logistic regression, controlling for age, age^2, sex, sex*age, sex*age^2, and 10 common variant (MAF > 1%) genetic principal components (the same covariates as for COVID-19 HGI GWASs7,8). Additionally, given that population genetic structure and its confounding effect on phenotypes is different at the rare variant level23, we also used the first 20 genetic principal components from rare variants (MAF<1%) as covariates in all our analyses. Analyses were performed separately by each cohort and each ancestry using Firth regression as applied in the Regenie software24. Firth regression is a penalized likelihood regression method that provides unbiased effect estimates even in highly unbalanced case-control analyses, as expected with rare variants25. The summary statistics were then meta-analyzed with a fixed effect inverse-variance weighted model within each ancestry, and then with a DerSimonian-Laird random effect model across ancestries.

### Main analysis results

Our meta-analysis included a total of 18,883 protein-coding genes, and all burden test genetic inflation factors, for all masks, were less than 1, suggesting that our results were not biased by population stratification and that Firth regression adequately adjusted for unbalanced case-control counts. Using an exome-wide significance p-value threshold of 0.05/20,000 = 2.5×10⁻⁶, we found 3 genes associated with one of the COVID-19 phenotypes in at least one mask in our meta-analyses (**Table 1**). Of specific interest, we observed that carrying a predicted loss of function (pLoF) or missense variant (mask coding5) in the toll-like receptor 7 (*TLR7*) gene was associated with a 5.3-fold increase (95% CI: 2.7-10.1, p=5.41×10⁻⁷) in odds of severe COVID-19.
*TLR7* is an important part of innate antiviral immunity, encoding a protein that recognizes coronaviruses and other single-stranded RNA viruses, leading to upregulation of the type-1 and type-2 interferon pathways26. Results from the severe COVID-19 outcome analyses of *TLR7* with other masks also nearly reached our statistical significance threshold, with a larger effect found in the pLoF mask (OR: 13.1, 95% CI: 4.41-44.3, p=1.64×10⁻⁵) and a smaller effect in the coding1 mask (OR: 3.12, 95% CI: 1.91-5.10, p=5.30×10⁻⁶), though the latter was balanced by smaller standard errors due to the larger number of cases (3275 cases in coding1 vs 1577 in pLoF), as expected. These findings further support previous reports of *TLR7* errors of immunity underlying severe COVID-19 presentations17,27–30.

[Table 1:](http://medrxiv.org/content/early/2022/03/31/2022.03.28.22273040/T1) Exome-wide significant findings, as well as other *TLR7* results (for the severe phenotype only). Note that for the pLoF mask, all deleterious variants had a MAF<0.1%, and hence both burden tests (MAF<1% and 0.1%) gave the same results. Full results available in **Supp. Table 6**.

In the meta-analyses, we also found that pLoFs in *MARK1* were associated with a 23.9-fold increase in the odds of severe COVID-19 (95% CI: 6.5-88.2, p=1.89×10⁻⁶), and a 12.3-fold increase in the odds of hospitalisation due to COVID-19 (95% CI: 4.8-31.2, p=1.43×10⁻⁷). While the number of *MARK1* pLoFs found in severe and hospitalized cases was small (MAC=4 and MAC=8, respectively), the signal was consistent in our three largest cohorts: UK Biobank, Penn Medicine, and Geisinger Health Services. *MARK1* is a member of the microtubule affinity-regulating kinase family, and is involved in multiple biological processes, chief among which is the promotion of microtubule dynamics31. *MARK1* has previously been shown to interact with the SARS-CoV-2 ORF9b protein32, further supporting its potential role in COVID-19.
Lastly, our meta-analyses also found marginal evidence for an association between severe COVID-19 and pLoFs in *RILPL1* (OR: 20.2, 95% CI: 5.8-70.7, p=2.42×10⁻⁶), a gene that, like *MARK1*, is associated with microtubule formation and ciliopathy33.

When we meta-analyzed p-values using the aggregated Cauchy association test34 (ACAT), the association between *TLR7* and severe COVID-19 (p=1.58×10⁻⁶), and between *MARK1* and hospitalisation (p=4.30×10⁻⁷) remained exome-wide significant (**Figure 2**). Full summary statistics are available in **Supp. Table 6**.

[Figure 2:](http://medrxiv.org/content/early/2022/03/31/2022.03.28.22273040/F2) Exome burden test ACAT p-value meta-analysis Manhattan plots and QQ plots.

### Rare variants in interferon-related genes and at previously reported genome-wide significant loci

Despite a 7.7-fold increase in number of cases, and a 1,069-fold increase in number of controls, the previously reported associations of genes in the interferon pathway with COVID-19 outcomes15,16 could not be replicated with either our exome-wide significance threshold (**Supp. Table 7**) or a more liberal one of p=0.05/10=0.005 (based on Bonferroni correction by the number of genes in the interferon pathway defined in a previous study15).

We also tested for rare variant associations between GWAS candidate genes from genome-wide significant loci in the COVID-19 HGI GWAS meta-analyses, but observed no exome-wide significant associations (**Supp Table 8**). However, at a more liberal Bonferroni threshold of p=0.05/46=0.001 (correcting for the 46 genes in the COVID-19 HGI GWAS associated loci), we observed an increased burden of pLoF or missense variants (coding5 mask) in the *ABO* gene among those susceptible to SARS-CoV-2 infection.
For example, individuals carrying a pLoF with MAF<0.1% in *ABO* were at a 2.34-fold higher risk of having a positive SARS-CoV-2 infection (95% CI: 1.50-3.64, p=1.6×10⁻⁴). Note that deleterious variants in *ABO* often lead to blood groups A and B35,36, which is consistent with the epidemiological association that non-type-O individuals are at higher risk of COVID-1937. However, more work is required to better understand the genetics of this locus as it relates to COVID-19 outcomes. Lastly, missense variants in *NSF* (mask coding1, MAF<1%) were also associated with higher susceptibility to SARS-CoV-2 (OR: 1.48, 95% CI: 1.21-1.82, p=1.4×10⁻⁴), but this association was not present in other masks (**Supp Table 8**).

### Replication in GenOMICC

The pLoF mask results for *TLR7* and *MARK1* in the severe COVID-19 phenotype were then tested for replication in the GenOMICC cohort11, a prospective study enrolling critically ill individuals with COVID-19, with controls selected from the 100,000 genomes cohort38. Results are shown in **Table 2**. For *TLR7*, European ancestry individuals with a pLoF had a 4.70-fold increase in odds of severe disease (95% CI: 1.58 to 14.0, p=0.005). In the sample of South Asian ancestry individuals, a pLoF was associated with a 1.90-fold increase in odds of severe disease, but the 95% confidence interval crossed the null (0.23 to 15.6, p=0.55), which was likely due to a much smaller sample size than in the European ancestry subgroup (1,202 vs 10,645). Of interest, in both Europeans and South Asians, no pLoFs were observed in either of the control groups.

[Table 2:](http://medrxiv.org/content/early/2022/03/31/2022.03.28.22273040/T2) Replication of pLoF mask, severe COVID-19, *MARK1* and *TLR7* results in the GenOMICC cohort. Note that the same variants were included in both the MAF<1% and MAF<0.1% replication, and the same results were obtained (shown here).

[Table 3:](http://medrxiv.org/content/early/2022/03/31/2022.03.28.22273040/T3) Results of burden tests at genes identified from common variants GWAS in the COVID-19 HGI. Only genes with p<0.05/46 are shown here. Full results available in **Supp. Table 8**.

On the other hand, we could not replicate an effect from *MARK1*, which demonstrated an OR of 1.21 in European ancestry participants (95% CI 0.075 to 19.7, p=0.89) and an OR of 4.21 in South Asian ancestry individuals (95% CI 0.058 to 307, p=0.51).

### Single-variant analysis

We performed an exome-wide association study using single variants with MAF higher than 0.1% and allele count of 6 or more, with the same analysis design used in the COVID-19 HGI GWAS8, and the same outcome phenotypes used in the burden testing above. The previously described Neanderthal chromosome 3 locus associated with COVID-19 outcomes was also found in all three phenotypes, with lead variants in the *FYCO1* gene for the severe COVID-19 and hospitalisation phenotypes (rs13059238 and rs41289622, respectively), and in the *LIMD1* gene for the susceptibility phenotype (rs141045534). One other variant was found in the hospitalisation phenotype in *SRRM1* (rs1479489847); this association was found in only two of our smaller cohorts (Genentech EUR and AMR ancestries, and Vanda EUR ancestry), and was not replicated in the larger ones (p=0.30 in the UK Biobank EUR ancestry). Summary statistics for genome-wide significant variants can be found in **Supp. Table 9**, and Manhattan plots can be found in the **Supp. Figures**.

## Discussion

Whole genome and whole exome sequencing can provide unique insights into genetic determinants of COVID-19, by uncovering associations between rare genetic variants and COVID-19. Specifically, burden tests can be particularly helpful, because they test for coding variants, thereby pointing directly to a causal gene and often suggesting a direction of effect.
However, such studies require careful control for population stratification and an adapted analysis method such as burden testing, in order to have enough statistical power to find those associations. In our study, we observed that individuals with rare deleterious variants at *TLR7* are at increased risk of severe COVID-19 (up to 13.1-fold increase in odds in those with pLoFs). Although this association was suggested by previous studies27–29, our study provides the most definitive evidence for the role of TLR7 in COVID-19 pathogenesis, with exome-wide significance for this gene in the discovery phase followed by strong replication in a large independent cohort. *TLR7* is a well-studied part of the antiviral immunity cascade and stimulates the interferon pathway after recognizing viral pathogen-associated molecular patterns. We also uncovered a potential role for cellular microtubule disruption in the pathogenesis of COVID-19 and the microtubule network is known to be exploited by other viruses during infections39. Indeed, the MARK1 protein has been shown to interact with SARS-CoV-2 in previous *in-vitro* experiments32. Nevertheless, these findings at *MARK1* were not replicated in the GenOMICC cohort and will need to be tested in larger cohorts, especially given the small number of highly deleterious variants that we found in our consortium. To our knowledge, this is the first time a rare variant burden test meta-analysis has been attempted on such a large scale. Our framework allowed for easy and interpretable summary statistics results, while at the same time preventing participant de-identification or any breach of confidentiality that stems from sharing results of rare genetic variant analyses40. It also provides important insights into how these endeavours should be planned in the future. 
First, our burden test operated under the assumption that the effect of any of the deleterious variants on the phenotype would be in the same direction and did not account for compound deleterious variant heterozygosity. This allowed for easier meta-analysis across cohorts but may have decreased statistical power. Other methods may be needed in future analyses to soften this assumption, though some of these cannot be easily meta-analyzed across multiple cohorts directly from summary statistics (e.g., SKAT-O41). Similarly, methods that combine both rare and common variants might also provide additional insights into disease outcomes30,42. Second, our results highlight the importance of looking at different categories of variants through different masks to increase sensitivity and specificity of our burden tests. Lastly, work remains to be done to standardize sequencing and annotation pipelines to allow easy comparison of results across studies and cohorts. Here, we provided a pipeline framework to every participating cohort, but there remains room for process harmonization in the future.

Our study had limitations. First, even if this is one of the world’s largest consortia using sequencing technologies for the study of rare variants, we remain limited by a relatively small sample size. For example, in a recent analysis of UK Biobank exomes, many of the phenotypes for which multiple genes were found using burden tests had a much higher number of cases than in our analyses (e.g. blonde hair colour, with 48,595 cases)19. Further, rare variant signals were commonly found in regions enriched in common variants found in GWASs. The fact that *ABO* and *NSF* were the only genes from the COVID-19 HGI GWAS that were also identified in our burden test (albeit using a more liberal significance threshold) also suggests a lack of statistical power. Similarly, GenOMICC, a cohort of similar size, was also unable to find rare variant associations using burden tests11.
However, their analysis methods were different from ours, making further comparisons difficult. Nevertheless, this provides clear guidance that smaller studies looking at the effect of rare variants across the genome are at considerable risk of finding both false positive and false negative associations. Second, many cohorts used population controls, which may have decreased statistical power given that some controls may have been misclassified. However, given that COVID-19 critical illness remains a rare phenomenon43, our severe disease phenotype results are unlikely to be strongly affected by this. Further, the use of population controls is a long-established strategy in GWAS and burden tests7,8,11,19,44, and the statistical power gain from increasing our sample size is likely to have counter-balanced the misclassification bias.

In summary, we reproduced an exome-wide significant association with severe COVID-19 outcomes in carriers of rare deleterious variants at *TLR7*. Our results also suggest an association between the cellular microtubule network and severe disease, which requires further validation. More importantly, our results underline the fact that future genome-wide studies of rare variants will require considerably larger sample sizes, but our work provides a roadmap for such collaborative efforts.

## Methods

### COVID-19 outcome phenotypes

For all analyses, we used three case-control definitions: A) Severe COVID-19, where cases were those who died, or required either mechanical ventilation (including extracorporeal membrane oxygenation), high-flow oxygen supplementation, new continuous positive airway pressure ventilation, or new bilevel positive airway pressure ventilation, B) Hospitalized COVID-19, where cases were all those who died or were admitted with COVID-19, and C) Susceptibility to COVID-19, where cases were anyone who tested positive for COVID-19, self-reported an infection to SARS-CoV-2, or had a mention of COVID-19 in their medical record.
For all three, controls were individuals who did not match case definitions, including population controls for which case status was unknown (given that most patients are neither admitted with COVID-19, nor develop severe disease45). These three analyses are also referred to as analyses A2, B2, and C2 by the COVID-19 Host Genetics Initiative8, respectively.

### Cohort inclusion criteria and genetic sequencing

Any cohort with access to genetic sequencing data and the associated patient-level phenotypes was allowed in this study. Specifically, both whole-genome and whole-exome sequencing were allowed, and there were no limitations on the platform used. There was no minimum number of cases or controls necessary for inclusion. However, the first step of Regenie, which was used to perform all tests (see below), uses a polygenic risk score which implicitly requires that a certain sample size threshold be reached (which depends on the phenotype and the observed genetic variation). Hence, cohorts were included if they were able to perform this step.

All cohorts obtained approval from their respective institutional review boards, and informed consent was obtained from all participants. More details on each cohort’s study design and ethics approval can be found in the **Supp Tables 1-2**.

### Variant calling and quality control

Variant calling was performed locally by each cohort, with the pre-requisite that variants not be joint-called separately between cases and controls. Quality control was also performed individually by each cohort according to individual needs. However, a general quality control framework was made available using the Hail software46. This included variant normalization and left alignment to a reference genome, and removal of samples with call rate less than 97% or mean depth less than 20.
Genotypes were set to unknown if they had genotype quality less than 20, depth less than 10, or poor allele balance (more than 0.1 for homozygous reference calls, less than 0.9 for homozygous alternative calls, and either below 0.25 or above 0.75 for heterozygous calls). Finally, variants were removed if the mean genotype quality was less than 11, mean depth was less than 6, mean call rate was less than or equal to 0.8, or Hardy-Weinberg equilibrium p-value was less than or equal to 5×10⁻⁸ (10⁻¹⁶ for single variant association tests). Details on variant calling and quality control are described for each cohort in the **Supp. Table 1**.

### Variant exclusion list

For the burden tests, we also compiled a list of variants that had a MAF > 1% or > 0.1% in any of the participating cohorts. This list was used to filter out variants that were less likely to have a true deleterious effect on COVID-19, even if they were considered rare in other cohorts, or in reference panels22. We created two such variant exclusion lists: one to be used in our burden test with variants of MAF less than 1%, and the other for the analysis with MAF less than 0.1%. In any cohort, if a variant had a minor allele count of 6 or more, and a MAF of more than 1% (or 0.1%), this variant was added to our exclusion list. This list was then shared with all participating cohorts, and all variants contained were removed from our burden tests.

### Gene burden tests

The following analyses generally followed the methods used by recent literature on large-scale whole-exome sequencing19 and the COVID-19 HGI8. The burden tests were performed by pooling variants in three different variant sets (called masks): “pLoF” which included loss of function variants as defined by high impact variants in the Ensembl database20 (i.e. 
transcript ablation, splice acceptor variant, splice donor variant, stop gained, frameshift variant, stop lost, start lost, transcript amplification), “coding5” which included all variants in pLoF as well as moderate impact indels and any missense variant that was predicted to be deleterious based on all of the *in-silico* pathogenicity prediction scores used, and “coding1” which included all variants in coding5 as well as all missense variants that were predicted to be deleterious in at least one of the *in-silico* pathogenicity prediction scores used. For *in-silico* prediction, we used the following five tools: SIFT47, LRT48, MutationTaster49, PolyPhen250 with the HDIV database, and PolyPhen2 with the HVAR database. Protein coding variants were collapsed on canonical gene transcripts. Masks pLoF, coding5, and coding1 are equivalent to masks M1, M3, and M4, respectively, from the recent UK Biobank whole-exome sequencing paper by Backman *et al*.19 and Kosmicki *et al*.17.

Once variants were collapsed into genes in each participant, for each mask, genes were given a score of 0 if the participant had no variants in the mask, a score of 1 if the participant had one or more heterozygous variants in this mask, and a score of 2 if the participant had one or more homozygous variants in this mask. These scores were used as regressors in logistic regression models for the three COVID-19 outcomes above. These regressions were also adjusted for age, age*age, sex, age*sex, age\*age\*sex, 10 genetic principal components obtained from common genetic variants (MAF>1%), and 20 genetic principal components obtained from rare genetic variants (MAF<1%). The Regenie software24 was used to perform all burden tests, and generate the scores above. Regenie uses Firth penalized likelihood to adjust for rare or unbalanced events, providing unbiased effect estimates.
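
The nesting of the three masks can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the annotation code used by the cohorts: the function `assign_masks` is hypothetical, the consequence strings follow Ensembl's Sequence Ontology naming, and "moderate impact indels" is assumed here to mean in-frame insertions and deletions. The input `n_deleterious` is the number of the five *in-silico* predictions (SIFT, LRT, MutationTaster, PolyPhen2 HDIV, PolyPhen2 HVAR) that call a missense variant deleterious.

```python
# High-impact consequences listed in the text for the pLoF mask
PLOF_CONSEQUENCES = frozenset({
    "transcript_ablation",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "stop_lost",
    "start_lost",
    "transcript_amplification",
})

# Assumption: "moderate impact indels" are taken to be in-frame indels
MODERATE_INDEL_CONSEQUENCES = frozenset({
    "inframe_insertion",
    "inframe_deletion",
})

N_PREDICTORS = 5  # SIFT, LRT, MutationTaster, PolyPhen2 HDIV, PolyPhen2 HVAR


def assign_masks(consequence, n_deleterious=0):
    """Return the set of masks a variant belongs to.

    consequence: most severe consequence on the canonical transcript.
    n_deleterious: how many of the five predictors call the variant
        deleterious (only meaningful for missense variants).
    """
    if consequence in PLOF_CONSEQUENCES:
        # pLoF variants are included in every mask
        return {"pLoF", "coding5", "coding1"}
    if consequence in MODERATE_INDEL_CONSEQUENCES:
        return {"coding5", "coding1"}
    if consequence == "missense_variant":
        if n_deleterious == N_PREDICTORS:
            return {"coding5", "coding1"}
        if n_deleterious >= 1:
            return {"coding1"}
    return set()


stop_gained_masks = assign_masks("stop_gained")
strict_missense_masks = assign_masks("missense_variant", n_deleterious=5)
lenient_missense_masks = assign_masks("missense_variant", n_deleterious=2)
```

Because each mask is a superset of the previous one, a gene's coding1 score can never be lower than its coding5 score, which in turn can never be lower than its pLoF score.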
All analyses were performed separately for each of 6 genetic ancestries (African, Admixed American, East Asian, European, Middle Eastern, and South Asian). Summary statistics were then meta-analyzed using a fixed-effect model within each ancestry and a DerSimonian-Laird random-effect model across ancestries, with the Metal package51 and its random effect extension52. Participant assignment to genetic ancestry was done locally by each cohort; more details on the methods can be found in **Supp. Table 1**. Lastly, we used ACAT34 to meta-analyze p-values across masks, within each phenotype separately. ACAT is not affected by lack of independence between tests. These values were used to draw Manhattan and QQ plots in **Figure 2**.

### Single variant association tests

We performed single variant association tests using a GWAS additive model framework. We used the same COVID-19 outcomes and covariates as above, except for the addition of the 20 rare genetic variant principal components. Once again, each cohort performed its analyses separately for each genetic ancestry, but also restricted its variants to those with MAF>0.01% and MAC>6. Summary statistics were meta-analyzed as above. Lastly, given that multiple technologies were used for sequencing, and that whole-exome sequencing can provide variant calls of worse quality in its off-target regions53, we used the UKB, GHS, and Penn Medicine whole-exome sequencing variants as our “reference panel” for whole-exome sequencing. Hence, only variants reported in at least one of these biobanks were used in the final analyses.
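The ACAT step used in the gene burden tests to combine p-values across the three masks can be illustrated with a short Python sketch. It follows the Cauchy combination formula from the ACAT publication34, where each p-value is mapped to a Cauchy variate, the variates are averaged with weights, and the average is mapped back to a p-value. The function name `acat_combine`, the equal default weights, and the small-p cutoff of 10⁻¹⁶ are assumptions made for this example and are not taken from the authors' code.

```python
import math
from typing import Optional, Sequence


def acat_combine(p_values: Sequence[float],
                 weights: Optional[Sequence[float]] = None) -> float:
    """Combine p-values with the Cauchy combination (ACAT) formula.

    T = sum_i w_i * tan((0.5 - p_i) * pi), with weights normalized to sum to 1,
    and combined p = 0.5 - arctan(T) / pi.
    """
    if len(p_values) == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)  # assumption: equal weights per mask
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")

    total_weight = sum(weights)
    statistic = 0.0
    for p, w in zip(p_values, weights):
        if p < 1e-16:
            # tan((0.5 - p) * pi) = cot(p * pi), close to 1 / (p * pi) for tiny p
            statistic += (w / total_weight) / (p * math.pi)
        else:
            statistic += (w / total_weight) * math.tan((0.5 - p) * math.pi)

    # atan2(1, T) equals pi/2 - arctan(T), so this is 0.5 - arctan(T)/pi,
    # written in a form that stays accurate when T is very large.
    return math.atan2(1.0, statistic) / math.pi


# Hypothetical p-values for one gene under the pLoF, coding5, and coding1 masks.
combined = acat_combine([0.01, 0.5, 0.9])
print(combined)
```

Because the Cauchy transform is heavy-tailed, the combined value is dominated by the smallest p-values, and the result remains valid when the per-mask tests are correlated, which is the case here since the masks are nested.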
## Code availability

Code guidance is available at [https://github.com/DrGBL/WES.WGS](https://github.com/DrGBL/WES.WGS).

## Data availability

The exome-wide burden test summary statistics are available in the Supplements. The single variant association studies summary statistics will be made available openly on the GWAS Catalog54.

## Supplements

**Supplementary Table**: Contains **Supp. Tables 1-9**.

**Supplementary Figures**: Contains single variant exome-wide association studies Manhattan plots.

## Author contributions

Conceptualization and methodology: GBL, GP, JK, ETC, TD, SF, CS, ASchmidt, PO, MQ, EÇ, KK, KV, TN, MSA, HZ, SO, ECS, KUL, HM, DBG, KK, AR, MF, JBR

Formal analyses: GBL, GP, JK, ETC, TD, SF, CS, ASchmidt, PO, MQ, EÇ, KK, KW, JJ, ADS, LGS, BP, TC, TDP, AStuckey, YK, YO, AR

Investigation: all authors.

Resources: ETC, KW, AWC, FM, JKB, MB, GB, RJME, VF, DM, MLathrop, VM, MH, RF, MLipcsey, BP, BY, SS, CR, SB, PYB, MS, EK, PL, YK, YO, SM, MSA, HZ, SO, KUL, ECS, OR, MK, HM, SII, AV, DBG, KK, AR, MF, JBR

Data curation: GBL, GP, JK, ETC, TD, SF, CS, ASchmidt, PO, MQ, EÇ, KK, KW, JJ, ADS, KUL, LGS, BP, TC, TDP, YK, YO

Writing – original draft: GBL

Writing – review & editing: all authors.

Visualization: GBL

Supervision: ETC, KW, AWC, FM, JKB, MH, VM, MH, RF, MLipcsey, BP, BY, SS, CR, SB, PYB, MS, EK, PL, YK, YO, SM, MSA, HZ, SO, KUL, ECS, OR, MK, HM, SII, AV, DBG, KK, AR, MF, JBR

Project administration: GBL, FF, FM, AG, DBG, KK, AR, MF, JBR

## Competing interests

See **Supp. Table 2**.

**Materials & Correspondence**

J Brent Richards is the corresponding author (brent.richards{at}mcgill.ca).

## Acknowledgement

We thank the patients who volunteered for all participating cohorts, and the researchers and clinicians who enrolled them into the respective studies. A full list of acknowledgements can be found in **Supp. Table 2**.

## Footnotes

* **Funding:** See **Supp. Tables 2-4.**
* **Disclosures:** See **Supp. Tables 2-4.**
* The author list was updated.
William Lee was added as a co-author, and Urszula Korotko was moved up the author list. No additional changes were made.
* Received March 28, 2022.
* Revision received March 31, 2022.
* Accepted March 31, 2022.
* © 2022, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1. Cai Y, Kwek S, Tang SSL, et al. Impact of the COVID-19 pandemic on a tertiary care public hospital in Singapore: Resources and economic costs. J Hosp Infect 2021; published online Dec 14. DOI:10.1016/j.jhin.2021.12.007.
2. Mulholland RH, Wood R, Stagg HR, et al. Impact of COVID-19 on accident and emergency attendances and emergency and planned hospital admissions in Scotland: an interrupted time-series analysis. J R Soc Med 2020; 113: 444–53.
3. Huang C, Wang Y, Li X, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 2020; 395: 497–506.
4. Nakanishi T, Pigazzini S, Degenhardt F, et al. Age-dependent impact of the major common genetic risk factor for COVID-19 on severity and mortality. J Clin Invest 2021; 131: e152386.
5. Zhou S, Butler-Laporte G, Nakanishi T, et al.
A Neanderthal OAS1 isoform protects individuals of European ancestry against COVID-19 susceptibility and severity. Nat Med 2021; 27: 659–67.
6. Gaziano L, Giambartolomei C, Pereira AC, et al. Actionable druggable genome-wide Mendelian randomization identifies repurposing opportunities for COVID-19. Nat Med 2021; 27: 668–76.
7. COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19: an update. medRxiv 2021; 2021.11.08.21265944.
8. COVID-19 Host Genetics Initiative, Niemi MEK, Karjalainen J, et al. Mapping the human genetic architecture of COVID-19. Nature 2021. DOI:10.1038/s41586-021-03767-x.
9. Ellinghaus D, Degenhardt F, Bujanda L, et al. Genomewide Association Study of Severe Covid-19 with Respiratory Failure. N Engl J Med 2020; 383: 1522–34.
10. Pairo-Castineira E, Clohisey S, Klaric L, et al. Genetic mechanisms of critical illness in COVID-19. Nature 2021; 591: 92–8.
11. Kousathanas A, Pairo-Castineira E, Rawlik K, et al. Whole genome sequencing reveals host factors underlying critical Covid-19. Nature 2022. DOI:10.1038/s41586-022-04576-6.
12. Tam V, Patel N, Turcotte M, Bossé Y, Paré G, Meyre D.
Benefits and limitations of genome-wide association studies. Nat Rev Genet 2019; 20: 467–84.
13. Taliun D, Harris DN, Kessler MD, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 2021; 590: 290–9.
14. Ganna A, Satterstrom FK, Zekavat SM, et al. Quantifying the Impact of Rare and Ultra-rare Coding Variation across the Phenotypic Spectrum. Am J Hum Genet 2018; 102: 1204–11.
15. Zhang Q, Bastard P, Liu Z, et al. Inborn errors of type I IFN immunity in patients with life-threatening COVID-19. Science 2020; 370: eabd4570.
16. Povysil G, Butler-Laporte G, Shang N, et al. Rare loss-of-function variants in type I IFN immunity genes are not associated with severe COVID-19. J Clin Invest 2021; 131. DOI:10.1172/JCI147834.
17.
Kosmicki JA, Horowitz JE, Banerjee N, et al. Pan-ancestry exome-wide association analyses of COVID-19 outcomes in 586,157 individuals. Am J Hum Genet 2021; 108: 1350–5.
18. Cirulli ET. The Increasing Importance of Gene-Based Analyses. PLOS Genet 2016; 12: e1005852.
19. Backman JD, Li AH, Marcketta A, et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 2021; 599: 628–34.
20. Howe KL, Achuthan P, Allen J, et al. Ensembl 2021. Nucleic Acids Res 2021; 49: D884–91.
21. Karczewski KJ, Francioli LC, Tiao G, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 2020; 581: 434–43.
22. Maffucci P, Bigio B, Rapaport F, et al. Blacklisting variants common in private cohorts but not in public databases optimizes human exome analysis. Proc Natl Acad Sci 2019; 116: 950–9.
23. Mathieson I, McVean G. Differential confounding of rare and common variants in spatially structured populations. Nat Genet 2012; 44: 243–6.
24. Mbatchou J, Barnard L, Backman J, et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet 2021; 53: 1097–103.
25. Wang X. Firth logistic regression for rare variant association tests. Front Genet 2014; 5.
[https://www.frontiersin.org/article/10.3389/fgene.2014.00187](https://www.frontiersin.org/article/10.3389/fgene.2014.00187).
26. Petes C, Odoardi N, Gee K. The Toll for Trafficking: Toll-Like Receptor 7 Delivery to the Endosome. Front Immunol 2017; 8: 1075.
27. van der Made CI, Simons A, Schuurs-Hoeijmakers J, et al. Presence of Genetic Variants Among Young Men With Severe COVID-19. JAMA 2020; 324: 663–73.
28. Fallerini C, Daga S, Mantovani S, et al. Association of Toll-like receptor 7 variants with life-threatening COVID-19 disease in males: findings from a nested case-control study. Elife 2021; 10: e67569.
29. Asano T, Boisson B, Onodi F, et al. X-linked recessive TLR7 deficiency in ∼1% of men under 60 years old with life-threatening COVID-19. Sci Immunol 2021; 6. DOI:10.1126/sciimmunol.abl4348.
30. Mantovani S, Daga S, Fallerini C, et al. Rare variants in Toll-like receptor 7 results in functional impairment and downregulation of cytokine-mediated signaling in COVID-19 patients. Genes Immun 2022; 23: 51–6.
31. Drewes G, Ebneth A, Preuss U, Mandelkow EM, Mandelkow E.
MARK, a novel family of protein kinases that phosphorylate microtubule-associated proteins and trigger microtubule disruption. Cell 1997; 89: 297–308.
32. Gordon DE, Jang GM, Bouhaddou M, et al. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature 2020; 583: 459–68.
33. Schaub JR, Stearns T. The Rilp-like proteins Rilpl1 and Rilpl2 regulate ciliary membrane content. Mol Biol Cell 2013; 24: 453–64.
34. Liu Y, Chen S, Li Z, Morrison AC, Boerwinkle E, Lin X. ACAT: A Fast and Powerful p Value Combination Method for Rare-Variant Analysis in Sequencing Studies. Am J Hum Genet 2019; 104: 410–21.
35. Ying Y, Hong X, Xu X, et al.
Molecular Basis of ABO Variants Including Identification of 16 Novel ABO Subgroup Alleles in Chinese Han Population. Transfus Med Hemotherapy 2020; 47: 160–6.
36. Hult AK, Yazer MH, Jørgensen R, et al. Weak A phenotypes associated with novel ABO alleles carrying the A2-related 1061C deletion and various missense substitutions. Transfusion 2010; 50: 1471–86.
37. Zietz M, Zucker J, Tatonetti NP. Associations between blood type and COVID-19 infection, intubation, and death. Nat Commun 2020; 11: 5761.
38. 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report. N Engl J Med 2021; 385: 1868–80.
39. Simpson C, Yamauchi Y. Microtubules in Influenza Virus Entry and Egress. Viruses 2020; 12. DOI:10.3390/v12010117.
40. Sankararaman S, Obozinski G, Jordan MI, Halperin E. Genomic privacy and limits of individual detection in a pool. Nat Genet 2009; 41: 965–7.
41. Lee S, Emond MJ, Bamshad MJ, et al. Optimal Unified Approach for Rare-Variant Association Testing with Application to Small-Sample Case-Control Whole-Exome Sequencing Studies. Am J Hum Genet 2012; 91: 224–37.
42. Wainschtein P, Jain D, Zheng Z, et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat Genet 2022. DOI:10.1038/s41588-021-00997-7.
43. Williamson EJ, Walker AJ, Bhaskaran K, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature 2020; 584: 430–6.
44. Cirulli ET, White S, Read RW, et al. Genome-wide rare variant analysis for thousands of phenotypes in over 70,000 exomes from two cohorts. Nat Commun 2020; 11: 542.
45. Mutambudzi M, Niedzwiedz C, Macdonald EB, et al. Occupation and risk of severe COVID-19: prospective cohort study of 120 075 UK Biobank participants. Occup Environ Med 2021; 78: 307–14.
46. Hail Team. Hail 0.2. 2021. [https://github.com/hail-is/hail](https://github.com/hail-is/hail).
47. Vaser R, Adusumalli S, Leng SN, Sikic M, Ng PC. SIFT missense predictions for genomes. Nat Protoc 2016; 11: 1–9.
48. Chun S, Fay JC. Identification of deleterious mutations within three human genomes. Genome Res 2009; 19: 1553–61.
49. Schwarz JM, Rödelsperger C, Schuelke M, Seelow D. MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods 2010; 7: 575–6.
50. Adzhubei I, Jordan DM, Sunyaev SR. Predicting Functional Effect of Human Missense Mutations Using PolyPhen-2. Curr Protoc Hum Genet 2013; 76: 7.20.1-7.20.41.
51. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010; 26: 2190–1.
52. Hemani G. random-metal. GitHub Repos. 2017.
[https://github.com/explodecomputer/random-metal](https://github.com/explodecomputer/random-metal) (accessed March 15, 2022).
53. Guo Y, Long J, He J, et al. Exome sequencing generates high quality data in non-target regions. BMC Genomics 2012; 13: 194.
54. Buniello A, Macarthur JAL, Cerezo M, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 2019. DOI:10.1093/nar/gky1120.