Abstract
Genome-wide association studies (GWAS) of coronavirus disease 2019 (COVID-19) and idiopathic pulmonary fibrosis (IPF) have identified genetic loci associated with both traits, suggesting possible shared biological mechanisms. Using updated GWAS of COVID-19 and IPF, we evaluated the genetic overlap between these two diseases and identified four genetic loci (including one novel) with likely shared causal variants between severe COVID-19 and IPF. Although there was a positive genetic correlation between COVID-19 and IPF, two of these four shared genetic loci had an opposite direction of effect. IPF-associated genetic variants related to telomere dysfunction and spindle assembly showed no association with COVID-19 phenotypes. Together, these results suggest there are both shared and distinct biological processes driving IPF and severe COVID-19 phenotypes.
Introduction
Coronavirus disease 2019 (COVID-19) is an infectious disease potentially leading to long lasting respiratory symptoms and has resulted in over 4 million deaths worldwide. Idiopathic pulmonary fibrosis (IPF) is a chronic interstitial lung disease (ILD) characterised by an aberrant response to alveolar injury leading to progressive scarring of the lungs. Individuals with ILD are at a higher risk of death from COVID-191.
Large genome-wide association studies (GWAS) have identified multiple genetic signals associated with severe COVID-192, including a signal within the DPP9 gene that is also associated with increased IPF risk3. GWAS have identified 20 genome-wide significant signals of association with IPF risk4,5 with the largest genetic risk factor being a common variant located in the promoter region of MUC5B (rs35705950, odds ratio>4). Previous analyses suggest IPF is a causal risk factor for severe COVID-19 but noted that the effect of rs35705950 was in the opposite direction (i.e. the allele associated with increased risk of IPF was protective for severe COVID-19)6.
We aimed to further explore the shared genetic architecture and identify novel shared genetic loci between the two diseases, using new enlarged GWAS of IPF and COVID-19 risk.
Methods
Data
We used the largest GWAS of IPF risk which consisted of unrelated European individuals from across five studies5. Cases were selected from centres in the USA, UK and Spain diagnosed using American Thoracic Society and European Respiratory Society guidelines.
For COVID-19, the summary statistics from version 6 of the COVID-19 Host Genetics Initiative (HGI_v6, https://www.COVID-19hg.org/results/r6/) were used. This analysis considered four different COVID-19 phenotypes according to the severity of the disease and the controls used; A2) Very severe respiratory confirmed COVID-19 vs. population, B1) Hospitalised COVID-19 vs. not hospitalised COVID-19, B2) Hospitalised COVID-19 vs. population and, C2) COVID-19 vs. population.
Genetic Overlap
Genome-wide genetic correlation analyses were performed using LD Score Regression. Twenty previously reported IPF genetic association signals4,5 were investigated for their association in the four COVID-19 GWAS, and 26 variants reaching genome-wide significance in the COVID-19 GWAS were tested for their association with IPF (proxy variants – r2>0.8 in European population – were investigated if the top associated variant was not included). Genetic loci showing an association with both traits (after Bonferroni correction for multiple testing) were investigated to determine whether the same causal variant was driving both the IPF and COVID-19 associations using coloc7. Regions with a posterior probability>80% of there being a shared causal variant (assuming up to one causal variant for each trait in the region and that variant has been measured) were deemed to have colocalised.
Identification of putative causal genes and pleiotropic effects
We investigated whether shared genetic signals were associated with gene expression in lung tissue (GTEx_lung8, n=515) and whole blood (eQTLGen9, n=31,684). Where the variant met a false discovery rate of 5%, colocalisation analyses were performed using coloc and deemed to be linked to gene expression if the posterior probability of a shared causal variant was greater than 80%.
We performed a phenome-wide association study (PheWAS) to identify if the overlapping genetic signals had been previously reported for association with other traits (p<10−5) using publicly available resources (PhenoScanner_v2, GWAS Catalog and Open Targets). Colocalisation analyses were performed to determine if the same causal variant was driving both traits. For overlapping signals for traits of relevance, we additionally assessed gene expression in all GTEx tissues, as described above.
Results
There was a significant weak positive genome-wide correlation between IPF and severe COVID-19 (A2 r2=0.274 p=0.0045, B1 r2=0.279 p=0.0093, and B2 r2=0.261 p=0.0005) but not with COVID-19 infection (C2 r2=0.066 p=0.433).
Four genetic association signals showed evidence of a shared causal variant between IPF and at least one COVID-19 phenotype (posterior probability>80%), namely loci at 7q22.1, near MUC5B, near ATP11A and near DPP9 (Table 1a). The 7q22.1 locus has not previously been reported for association with COVID-19. Three additional IPF genetic signals (at 17q21.31, DSP and DEPTOR) showed an association with COVID-19 but did not colocalise, suggesting there are different causal variants between the two traits at these loci. Visual inspection of the 17q21.31 locus revealed extended linkage disequilibrium (due to the presence of a large inversion) meaning colocalisation analyses could not determine whether there were shared or distinct causal variants.
Three of the four shared signals colocalised with expression of the single nearest gene in blood or lung (MUC5B, ATP11A and DPP9) (Table 1b). The IPF and COVID-19 risk increasing alleles at the 7q22.1 signal colocalised with decreased expression of ZKSCAN1 and TRIM4 in blood.
The signal on chromosome 7 was associated with a number of blood traits and the signal near ATP11A was associated with blood traits and HbA1c (average blood glucose levels, used in diagnosing diabetes) (Table 1b). The IPF and HbA1c signals did not colocalise, however, as diabetes is a risk factor for COVID-1910, we further investigated the effects on gene expression for this signal. The allele (rs423117_T) associated with higher Hb1AC levels was associated with increased ATP11A expression in liver and decreased expression in cultured fibroblasts, but there was no association with ATP11A expression in blood.
Discussion
Genetic association signals near MUC5B, DPP9 and ATP11A have previously been reported for both COVID-19 severity and IPF risk; we show for the first time that these signals are likely due to the same underlying causal variant. In addition, we report a novel overlapping signal at 7q22.1 implicating TRIM4 and ZKSCAN1.
Despite a positive genome-wide genetic correlation between IPF risk and the COVID-19 severity phenotypes (A2, B1 and B2), we show that two of the four shared signals (at MUC5B and ATP11A) have opposite directions of effect on risk for the two diseases. The allele associated with increased risk of IPF and increased ATP11A expression in blood (rs9577395_C) was associated with decreased risk of severe COVID-19.Our PheWAS highlighted a potential link with HbA1c and diabetes risk at this locus via ATP11A expression, although effects were tissue dependent.
The rs2897075_T allele at 7q22.1, associated with increased IPF and COVID-19 risk, was linked to decreased TRIM4 and ZKSCAN1 expression. TRIM4 is an important regulator of virus-induced interferon induction pathways and a proteomic study identified significant adjacency between SARS-CoV-2 M protein and TRIM411. Viral infection-induced micro-injury to the alveolar epithelium is thought to be a trigger for development of IPF12, suggesting the interferon-mediated innate immune response could be central to both risk of chronic lung disease and worse outcomes due to SARS-CoV-2 infection.
Loci previously implicated by IPF GWAS relating to telomere dysfunction (TERT, TERC, RTEL1) and mitotic spindle assembly (KIF15, MAD1L1, SPDL1, KNL1) were not associated with COVID-19.
The colocalisation analyses assume a single measured causal variant. Although conditional analyses found no evidence of multiple independent association signals at the regions studied, we cannot guarantee all causal variants were measured. Furthermore, we utilised whole blood and lung tissue for gene expression so we cannot rule out cell-specific effects. Further analyses in non-European populations could help identify other ancestry specific overlapping variants and increase the generalisability of the results.
In conclusion, using the largest IPF and COVID-19 GWAS to date, we show there is a positive genetic correlation between IPF and severe COVID-19 risk. However, some IPF-related pathways may have an opposite (e.g. MUC5B and ATP11A pathways) effect on severe COVID-19 risk.
Data Availability
This study used previously published publicly available data obtained from obtained from: https://github.com/genomicsITER/PFgenetics (IPF genome-wide summary statistics), https://www.covid19hg.org/results/r6/ (COVID-19 genome-wide summary statistics), https://gtexportal.org/home/datasets (lung eQTL data) and https://www.eqtlgen.org/index.html (blood eQTL data).
Funding and conflicts of interest
R Allen is an Action for Pulmonary Fibrosis Mike Bray Research Fellow. B Guillen-Guio is supported by Wellcome Trust grant 221680/Z/20/Z. G Jenkins is a trustee of Action for Pulmonary Fibrosis and reports personal fees from Astra Zeneca, Biogen, Boehringer Ingelheim, Bristol Myers Squibb, Chiesi, Daewoong, Galapagos, Galecto, GlaxoSmithKline, Heptares, NuMedii, PatientMPower, Pliant, Promedior, Redx, Resolution Therapeutics, Roche, Veracyte and Vicore. L Wain holds a GSK/British Lung Foundation Chair in Respiratory Research (C17-1) and reports research funding from GSK and Orion and consultancy for Galapagos, outside of the submitted work. The research was partially supported by the National Institute for Health Research (NIHR) Leicester Biomedical Research Centre; the views expressed are those of the author(s) and not necessarily those of the National Health Service (NHS), the NIHR or the Department of Health. This research used the SPECTRE High Performance Computing Facility at the University of Leicester.