ABSTRACT
Clinical and molecular characterization by Whole Exome Sequencing (WES) is reported in 35 COVID-19 patients attending the University Hospital in Siena, Italy, from April 7 to May 7, 2020. Eighty percent of patients required respiratory assistance, half of them being on mechanical ventilation. Fifty-one percent had hepatic involvement and hyposmia was ascertained in 3 patients. Searching for common genes by collapsing methods against 150 WES of controls of the Italian population failed to give straightforward statistically significant results, with the exception of two genes. This result is not unexpected since we are facing the most challenging common disorder triggered by environmental factors with a strong underlying heritability (50%). The lesson learned from Autism Spectrum Disorders prompted us to re-analyse the cohort treating each patient as an independent case, following a Mendelian-like model. We identified for each patient an average of 2.5 pathogenic mutations involved in virus infection susceptibility and pointing to one or more rare disorder(s). To our knowledge, this is the first report on WES and COVID-19. Our results suggest a combined model for COVID-19 susceptibility with a number of common susceptibility genes which represent the favorable background in which additional host private mutations may determine disease progression.
INTRODUCTION
Italy has been the first European country to experience the epidemic wave of SARS-CoV-2 infection, with an apparently more severe clinical picture compared to other countries. Indeed, the case fatality rate has peaked at 14% in Italy, while it remains stable around 5% in China (https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200512-covid-19-sitrep-113.pdf?sfvrsn=feac3b6d_2). Currently, SARS-CoV-2 positive subjects in Italy have reached the threshold of 200,000 cases, and we are now experiencing the long-awaited descent of viral spread. Since the beginning of the epidemic wave, one of the first observations has been a highly heterogeneous phenotypic response to SARS-CoV-2 infection among individuals. Indeed, while most affected subjects show mild symptoms, a subset of patients develops severe pneumonia requiring mechanical ventilation: 20% of cases require hospitalization, 5% are admitted to the Intensive Care Unit (ICU), and ~2.5% require intensive support with ventilators or extracorporeal membrane oxygenation (ECMO) machines [1]. Although patients undergoing ventilatory assistance are often older and are affected by other diseases, like diabetes [2], the existing comorbidities alone do not fully explain the differences in clinical severity. A reasonable hypothesis is that these different outcomes are based on host predisposing genetic factors leading to different immunogenicity/cytokine responses as well as specific receptor permissiveness to virus and antiviral defence [3-6]. Similarly, during the study of host genetics in influenza disease, a pattern of genetic markers has been identified which underlies increased susceptibility to a more severe clinical outcome (as reviewed in [7]). This hypothesis is also supported by a recent work reporting 50% heritability of COVID-19 symptoms [8].
The identification of host genetic variants associated with disease severity is of utmost importance to develop both effective treatments, based on a personalized approach, and novel diagnostics. Also, it is expected to be of high relevance in providing guidance for the health care systems and societal organizations. However, nowadays, little is known about the impact of host genome variability on COVID-19 susceptibility and severity.
On March 16th, 2020 the University Hospital in Siena launched a study named GEN-COVID with the aim of collecting the genomic DNA of 2,000 COVID-19 patients for host genetic analysis. More than 30 different hospitals and community centers throughout Italy joined the study and are providing samples and detailed clinical information on COVID-19 patients. This study aims to identify common and rare genetic variants of SARS-CoV-2 infected individuals, using a whole exome sequencing (WES) analysis approach, in order to establish an association between host genetic variants and COVID-19 severity and prognosis.
RESULTS
Clinical data
The cohort consists of 35 COVID-19 patients (33 unrelated and 2 sisters) admitted to the University Hospital in Siena, Italy, from April 7 to May 7, 2020. All patients are of Caucasian ethnicity, except for one North African and one Hispanic. The mean and median age is 64 years (range 31-98): 11 females (median age 66 years) and 24 males (median age 62 years).
The population is clustered into four qualitative severity groups depending on the respiratory impairment and the need for ventilation: high care intensity group (those requiring invasive ventilation), intermediate care intensity group (those requiring non-invasive ventilation, i.e. CPAP and BiPAP, and high-flow oxygen therapy), low care intensity group (those requiring conventional oxygen therapy) and very low care intensity group (those not requiring oxygen therapy) (groups 1-4 in Table 1 and different colors in Fig. 1). In the two most severe groups (groups 1 and 2, including 13 patients) there are 11 males and 2 females, while in the two mildest groups (groups 3 and 4, including 22 patients) there are 13 males and 9 females.
Patients were also assigned a lung imaging grading according to X-Rays and CT scans. The mean value is 13 for high care intensity group, 12 for intermediate care intensity group, 8 for low care intensity group and 5 for very low care intensity group.
Regarding immunological findings, a decrease in the total number of peripheral CD4+ T cells was identified in 13 subjects, while the NK cell count was impaired in 10 patients. Six patients showed a reduction of both parameters. IL-6 serum level was elevated in 13 patients. Based on blood groups, the cohort is divided into 15 patients of group O, 16 patients of group A, 4 patients of group B and none of group AB.
Hyposmia was present in 3 out of 34 evaluated cases (8.8%), and hypogeusia was present in the same subjects plus another case. These four cases belong to the first three severity groups. Liver involvement alone was present in 7 cases (20%) and pancreas involvement alone in 4 cases (11%); 10 patients presented both (29%). Heart involvement was detected in 13 cases (37%). Nine patients (25%) showed kidney involvement. Fibrinogen values below 200 mg/dL were identified in 2 cases (6%), between 200 and 400 mg/dL in 7 cases (20%), and above 400 mg/dL in 22 cases (63%). D-dimer values were below 500 ng/mL in 1 case (3%), between 500 and 5000 ng/mL in 26 cases (74%), and more than 10 times higher than the normal value (>5000 ng/mL) in 7 cases (20%) (Table 1).
Unbiased collapsing gene analysis
At first, we tested the hypothesis that susceptibility could be due to one or more common factor(s) in the cohort of patients compared to controls. According to this idea, damaging variants of that/those gene(s) should be either over- or under-represented in patients vs controls. We used, as controls, individuals of the Italian population, assuming that the majority of them, if infected, would have shown no severe symptoms. WES data of 35 patients were compared with those of 150 controls (corresponding to a sub-group of the Siena part of the Network of Italian Genomes, NIG, http://www.nig.cineca.it) using a gene burden test which compares the rate of disrupting mutations per gene. The variants were collapsed on a gene-by-gene basis, in order to identify genes with a mutational burden statistically different between COVID-19 samples and controls. The analysis identified genes harboring deleterious mutations (according to the DANN score) with a statistically significant higher frequency in controls than in COVID-19 patients, such as the olfactory receptor gene OR4C5 (adjusted p-value 1.5E-10) and the zinc finger transcription factor ZNF717, which regulates IL6 (adjusted p-value 9.2E-06) and, to a lesser extent, FAM104B and NDUFAF7 (Fig. 2 and Supplementary Table S1). For all these genes, the susceptibility factor is represented by the functioning (or more functioning) gene. We also identified two additional genes, PRKRA and LAPTM4B, for which the probability of observing a deleterious variant was higher in the COVID-19 samples compared to controls (Fig. 2 and Supplementary Table S2). In these latter cases, the functioning gene represents a protective factor.
Gene analysis using the Mendelian-like model
We then tested the hypothesis that COVID-19 susceptibility is due to different variants in different individuals. Recently acquired knowledge on the genetic bases of Autism Spectrum Disorders suggests that a common disorder could be the sum of many different rare disorders and that this genetic landscape can appear indistinguishable at the clinical level [9]. Therefore, we analyzed our cohort treating each patient as an independent case, following a Mendelian-like model. According to the “pathogenic” definition in the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/), for each patient we identified an average of 1 mutated gene involved in viral infection susceptibility and pointing to one or more rare disorder(s) or a carrier status of rare disorders (Fig. 1). Following the pipeline used in routine clinical practice for WES analysis in rare disorders, we then checked for rare variants “predicted” to be relevant for infection by means of common annotation tools. We thus identified an average of 1-5 additional variants per patient, which add to the previously identified pathogenic variants (Fig. 1, Supplementary Table S3).
Known common susceptibility/protective variants analysis
We then checked the cohort for known non rare variants classified as either “pathogenic” or “protective” in ClinVar database and related to viral infection. Variants in six different genes matched the term of “viral infection” and “pathogenic” according to ClinVar (Fig. 1). Overall, a mean of 3 genes with “pathogenic” common variants involved in viral infection susceptibility were present (Fig. 1).
Among the common protective variants, we list as examples three variants which confer protection against Human Immunodeficiency Virus (HIV), the first two, and leprosy, the third one: a CCR2 variant (rs1799864) identified in 8 patients, a CCR5 variant (rs1800940) in one patient and a TLR1 variant (rs5743618) in 26 patients (not shown). An IL4R variant (rs1805015) associated with slow HIV progression was present in 8 patients (not shown).
Candidate gene overview
Although not identified by unbiased collapsing gene analysis, a number of obvious candidate genes were specifically analyzed. First of all, we noticed that the SARS-CoV-2 receptor, the ACE2 protein, is preserved in the cohort, the only change being a silent mutation, V749V, present in 2 males and 2 heterozygous females. This is in line with our previous suggestion that either rare variants or polymorphisms may impact infectivity [10]. The IFITM3 polymorphism (rs12252) was found in heterozygosity in 4 patients, as expected from its frequency. Eight patients had heterozygous missense mutations in the CFTR gene reported as VUS/mild variants, 7 of the 8 being among the more severely affected patients.
DISCUSSION
In this study, we present a cohort of 35 COVID-19 patients admitted between April and May 2020 to the University Hospital of Siena who were clinically characterized by a team of 29 MDs belonging to 7 different specialties. As expected, the majority of hospitalized patients are males, confirming previously published data reporting a predominance of males among the most severe COVID-19 affected patients [11]. The distribution of blood types in our patients is not statistically different from that of the general population according to a chi-square test with alpha equal to 0.05 (data not shown). Lung imaging involvement, evaluated through a modified lung imaging grading system [12], did not completely correlate with respiratory impairment, since among the 13 patients who required mechanical ventilation (groups 1 and 2), grading was either moderate (10) or mild (3). In line with our previous data, lymphocyte subset immunophenotyping revealed a decrease in the total number of CD4 and NK cells, especially in the most severe patients [13]. Laboratory tests revealed multiple-organ involvement, confirming that COVID-19 is a systemic disease rather than just a lung disorder (Fig. 1). We thus propose that only a detailed clinical characterization can allow us to disentangle the complex relationship between genes and signs/symptoms.
In order to test the hypothesis that COVID-19 susceptibility is due to one or more genes in common among patients, we used the gene burden test to compare the rate of disrupting mutations per gene. This test has already been successfully applied to discover susceptibility genes for Respiratory Syncytial Virus infection [14]. We identified two genes with a robust signal whose damage represents a protective factor: OR4C5 and ZNF717.
OR4C5 is a “resurrected” pseudogene, known to be non-functioning in half of the European population. In our genome, in addition to 413 Olfactory Receptor (OR) loci, there are 244 segregating pseudogenes, 26 of which are “resurrected” from a pseudogene status. The OR4C5 locus belongs to this subgroup and it has the highest intrapopulation variability, with a frequency of the inactive allele of 0.62 in Asians, 0.48 in Europeans and 0.16 in Africans [15]. The locus has 68 different haplotypes and it presents an extensive range of copy numbers across individuals [15,16]. Our data demonstrate an inverse correlation (p-value 2.1E-14 and adjusted p-value 1.5E-10) between the OR4C5 mutation burden and COVID-19 status, suggesting that “resurrected” functional alleles of OR4C5 are susceptibility alleles for COVID-19 infection.
Each sensory cell expresses a single allele of a single OR locus, thus transmitting a molecularly defined signal to the brain. It is known that the olfactory bulb is a critical immunosensory effector organ that effectively clears viruses. After the infection, the neuroepithelium triggers nitric oxide (NO) and major histocompatibility antigens (MHC) activation which starts innate immunity [17]. The entry of the virus into neuroepithelial cells induces the activation of microglia/macrophages which are involved in the phagocytic process, clinically corresponding to Hypo(an)osmia. This response is essential during the initial stages of infection and blocks the virus access to the CNS [18].
Expression of the “resurrected” pseudogene OR4C5 may help in triggering the natural immunity leading to virus and cell death. It is interesting to note that the Protein Atlas shows OR4C5 protein expression in the liver without the corresponding mRNA expression (www.proteinatlas.org). Usually, dissociation between protein and mRNA means that the protein is produced elsewhere and then transported into the organ. This is the case, for instance, for neurotransmitters that are synthesized in neuronal cell bodies located in abdominal ganglia and are subsequently transported to the liver through the axons innervating the organ [19]. Thus, we may speculate that OR4C5 reaches the liver through nerve terminals. If this is the case, those individuals expressing the resurrected OR4C5 gene may have more triggers of innate immunity and subsequently higher organ damage. It is noteworthy that in our cohort the putative expression of OR4C5 (white boxes) is identified in patients with liver damage (Fig. 1).
Previous studies reported a prevalence of olfactory disorders in COVID-19 population ranging from 5% to 98%. A recent meta-analysis of 10 studies demonstrated a 52.73% prevalence for smell dysfunction in COVID-19 subjects [20]. In our population, only 3/35 (8.6%) subjects reported olfactory disorders. Both the limited sample size and the characteristic of the population (severely affected hospitalized subjects) could explain this result. However, a report focusing on smell dysfunction in severely affected hospitalized subjects reported a prevalence of 23.7% among 59 patients [21].
Kruppel-associated box zinc-finger protein 717 (ZNF717) belongs to a large group of transcriptional regulators playing important roles in different cellular processes, including cell proliferation, differentiation and apoptosis, and in the regulation of viral replication and transcription. ZNF717 variations were detected in more than 10% of the WGS samples in Hepatitis B Virus (HBV)-related hepatocellular carcinoma (HCC) [22], where it has been identified as a potential driver gene with high-frequency mutations at both single-cell and population levels. Moreover, this gene is one of the most recurrent somatically mutated genes in gastric cancer, together with TP53 [23]. From a functional standpoint, it acts through the regulation of the IL-6/STAT3 pathway. ZNF717 knockdown in HCC cell lines results in increased levels of IL-6 and upregulation of STAT3 and its target genes [24]. According to our results, individuals carrying more damaging variants in this gene are more protected from SARS-CoV-2 infection, possibly because of a more effective innate immune response.
PRKRA (protein kinase activator A, also known as PACT; OMIM# *603424) is a protein kinase activated by double-stranded RNA which mediates the effects of interferon in response to viral infection, resulting in antiviral activity [25]. Multiple transcript variants have been identified, including a polymorphism removing the canonical ATG. Mutations in the PRKRA gene have been associated with autosomal recessive dystonia [26]. The innate immune response is activated by the detection of viral structures, potentially via dsRNA-binding partners such as PRKRA [27]. In line with PRKRA antiviral activity, in our cohort we found a higher burden of potentially deleterious variants in COVID-19 patients, suggesting that these variants might reduce PRKRA functionality, potentially impairing IFN-mediated immune response.
Fourteen percent of patients bear highly disrupting mutations in the LAPTM4B (Lysosomal Protein Transmembrane 4 Beta) gene, which was selected using the gene burden test (p-value 4.0E-06 and adjusted p-value 0.029). The LAPTM4B protein is involved in the endosomal network, which eventually enables productive viral infection [28]. In particular, in HBV infection, endocytosis of the EGF Receptor (EGFR) drives the translocation of HBV particles from the cell surface to the endosomal network, enabling productive viral infection [28]. In this process, LAPTM4B downregulates the formation of late lysosomes, suppressing EGFR lysosomal degradation and thus leading to a prolonged permanence of EGFR on the cell surface [29]. Accordingly, LAPTM4B knockdown significantly promotes HBV infection [28]. This is consistent with our results, which indicate a significantly increased probability of deleterious changes in COVID-19 patients compared to controls.
Being affected by a rare disorder and/or being a carrier of rare disorders may represent a susceptibility factor to infections (Fig. 1). Having this in mind and driven by the lesson learned from the studies on the genetics bases of Autism Spectrum Disorders, we explored the possibility that each patient could have one or a unique (personalized approach) combination of rare pathogenic or highly relevant variants related for different reasons to infection susceptibility [9]. For instance, one male patient is affected by Glucose-6-phosphate dehydrogenase (G6PD)-deficiency (rs137852318), the most common enzymopathy in humans, affecting 400 million people worldwide. G6PD-deficient cells are more susceptible to several viruses including coronavirus and in G6PD-deficient cells, innate immunity is down regulated, in line with the observed very low levels of IL-6 in this patient (Fig. 1) [30]. The same patient also presents a ZEB1-linked corneal dystrophy. ZEB1 gene is known to function in immune cells, playing an important role in establishing both the effector response and future immunity in response to pathogens [31].
In addition, two sisters have TGFBI mutations, associated with corneal dystrophy, several patients are carriers of pseudoxanthoma elasticum bearing ABCC6 gene mutations, others have likely hypomorphic mutations in CHD7 or COL5A1/2 variants. All these genes play a role as modulators of immune cells activity and/or response to infections [32-39].
Other rare variants were identified in the following interesting genes: ADAR, involved in viral RNA editing; CLEC4M, an alternative receptor for SARS-CoV [40]; HCRTR1/2, receptors of Hypocretin, important in the regulation of fatigue during infections [41]; FURIN, a serine protease that cleaves the SARS-CoV-2 minor capsid protein important for ACE2 contact and viral entry into the host cells [42,43].
Additional interesting variants have been identified in NOS3 and OPRM1. COVID-19 aggravates the nitric oxide (NO) production deficit in patients with NOS3 polymorphisms. Management of the eNOS/iNOS ratio (endothelial/inducible NO synthase) and of NO levels can prevent the development of severe acute respiratory distress syndrome [44]. From an immunological point of view, NO is mainly produced through iNOS, which can be selectively expressed both in epithelial and white blood cells. It is postulated that NO plays a crucial role in innate and specific host defense, particularly against protozoa and bacteria. However, its role in viral infection is debated, as conflicting results have been reported in the literature. Even though iNOS is generally overexpressed in patients with active viral infection, experiments conducted in murine models show that the inhibition of iNOS leads to a significant improvement of HSV-viral pneumonia, despite the impairment of viral clearance [45]. Moreover, NO production may facilitate virus mutations and the selection of more resistant strains. However, it seems that NO may have different effects depending on the causative agent. Notably, a few studies demonstrated that NO was able to significantly reduce viral infection and replication of SARS-CoV through two distinct mechanisms: impairment of the fusion between the spike protein and its receptor ACE2, and reduction of viral RNA production [46]. Very few data are currently available on the specific effects of NO in COVID-19. The promising results reported in SARS-CoV infection may suggest a similar effectiveness in COVID-19, considering that SARS-CoV and SARS-CoV-2 share more than 70% of their RNA sequence and have shown similar mechanisms of infection and viral replication.
Clinical trials are ongoing to evaluate the effectiveness of inhaled NO in COVID-19 patients: although inhaled NO is conceived as a vasodilator therapy and therefore is indicated for the optimization of ventilation/perfusion mismatch in intubated patients, the results may be helpful to elucidate its potential antiviral properties [47,48].
Opioid ligands may regulate the expression of chemokines and chemokine receptors [49]. Due to immunomodulatory effect of morphine, OPRM1 has been supposed to be involved in immune response and in HIV expansion [49,50]. Proudnikov et al. suggested that Opioid receptor OPRM1 variants may affect the pathophysiology of HIV infection in the response to HIV treatment. They found an association with clinical improvement in subjects bearing alleles IVS1+1959A, IVS1+14123A, and IVS2+31A. Association of the same variants with HIV status was also reported [50].
Several rare variants in Interleukins (ILs) and Interleukin receptors (ILRs) were found. Interleukins are crucial in modulating the immune response against all types of infective agents. The variants reported in this study include different interleukins that are not specifically involved in the defense against viruses but are critical in balancing both innate and specific adaptive immune responses. IL-2 and IL-15 are essential for proliferation and activation of CD8+ effector T cells, while IL-13 is fundamental, together with IL-4, in inducing and maintaining a type-2 inflammation in airways, causing mucus hypersecretion and favouring potential fibrogenetic processes of the bronchial wall. On the other hand, IL-12 promotes Th1 polarization of inflammatory processes in airways and has an important role in the defense against intracellular pathogens and mycobacteria. Evidence concerning the role of IL-12 against viruses is scarce and conflicting; however, IL-12 is also reported to enhance NK cell cytotoxic properties and therefore should be helpful in the innate response against viral infection. IL-16 is a pleiotropic cytokine chemotactic for T cells and able to modulate lymphocyte activation. In HIV infection, IL-16 seems to inhibit virus replication [51]. On the other hand, in mice infected with influenza A virus, an IL-16 deficiency was associated with increased Th1 and cytotoxic lymphocyte responses.
We also identified common “pathogenic” variants in genes known to be linked to viral infection, such as MBL2, IRGM and SAA1, and/or specific organ damage as PRSS1.
PRSS1 encodes for a serine protease secreted from the pancreas. Mutations in this gene, including those identified in our cohort, are associated with autosomal dominant hereditary pancreatitis (OMIM#167800) [52]. MBL2 encodes a mannose-binding lectin (MBL) secreted by the liver as part of the acute-phase response and involved in innate immune defense. Its deficiency causes increased susceptibility to infections, possibly due to a negative impact on the ability to mount an immune response [53,54]. MBL2 deficiency has also been suggested to be a susceptibility factor for vascular disease [55]. IRGM plays a role in autophagy and control of intracellular mycobacteria [56]. Autophagy plays relevant roles in many processes, including innate and adaptive immunity and antimicrobial defense. Accordingly, alterations in IRGM regulation affect the efficacy of autophagy and are considered relevant contributing factors in chronic inflammatory diseases, including Inflammatory Bowel Disease (IBD; OMIM#612278) [57]. SAA1, encoding the serum amyloid A (SAA) protein, is an apolipoprotein reactant, mainly produced by hepatocytes and regulated from inflammatory cytokines. In patients with chronic inflammatory diseases, the SAA cleavage product, Amyloid protein A (AA), is deposited systemically in vital organs including liver, spleen and kidneys, causing amyloidosis [58].
For the last above reported genes and pathogenic variants or predicted variants relevant for infection, a statistically significant difference in variant’s frequency was not found between cases and controls looking at either the single variant or the single gene, as a burden effect of variants. However, as depicted in the overall Fig. 1, we could hypothesize a combined model in which common susceptibility genes will sum to less common or private susceptibility variants. A specific combination of these 2 categories may determine type (organotropism) and severity of the disease.
Our observations related to the huge amount of data, both on phenome and genome sides, and represented in Figure 1, could also lay the bases for association rule mining approaches. Artificial intelligence techniques based on pattern recognition may discover an intelligible picture which appears blurred at present.
Further analyses in larger cohorts of cases are mandatory in order to test this hypothesis of a combined model for COVID-19 susceptibility, with a number of common susceptibility genes which represent the fertile background in which additional private, rare or low-frequency mutations give the host the most favourable environment for virus growth and organ damage.
METHODS
Patient clinical data and sample collection
Thirty-five patients admitted to the University Hospital in Siena, Italy, from April 7 to May 7, 2020 were recruited. The study was consistent with Institutional guidelines and approved by the University Hospital (Azienda Ospedaliera Universitaria Senese) Ethical Review Board, Siena, Italy (Prot n. 16929, dated March 16, 2020). Written informed consent was obtained from all patients. Peripheral blood samples in EDTA-containing tubes and detailed clinical data were collected. All these data were inserted in a section dedicated to COVID-19 of the established and certified Biobank and Registry of the Medical Genetics Unit of the Hospital. An example of the Clinical questionnaire is illustrated in Supplementary Fig. S1. Each patient was assigned a continuous quantitative respiratory score, the PaO2/FiO2 ratio (normal values >300) (P/F), as the worst value during the hospitalization.
Patients were also assigned a lung imaging grading according to X-Rays and CT scans. In particular, lung involvement was scored through imaging at the time of admission and during hospitalization (worst score), annotating the chest X-Ray (CXR) score (in 34 patients) and the CT score in 1 patient for whom X-Rays were not available. To obtain the score (from 0 to 28), each CXR was divided into four quadrants (right upper, right lower, left upper and left lower) and for each quadrant the presence of consolidation (0 = no consolidation, 1 = <50%, 2 = >50%), ground glass opacities (GGOs: 0 = no GGOs, 1 = <50%, 2 = >50%) and reticulation (0 = no reticulation, 1 = <50%, 2 = >50%) was recorded, together with pleural effusion on the left or right side (0 = no, 1 = minimal, 2 = large). The same score was applied for CT (1 patient).
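The scoring scheme described above can be illustrated with a minimal Python sketch. It assumes that the total is the plain sum of the individual grades, which is consistent with the stated 0-28 range (4 quadrants x 3 findings x 2 points, plus 2 sides x 2 points for pleural effusion); the names QuadrantFindings and cxr_score are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

QUADRANTS = ("right_upper", "right_lower", "left_upper", "left_lower")


@dataclass
class QuadrantFindings:
    """Grades for one quadrant: 0 = absent, 1 = <50%, 2 = >50%."""
    consolidation: int = 0
    ggo: int = 0            # ground glass opacities
    reticulation: int = 0


def cxr_score(quadrants, effusion_left=0, effusion_right=0):
    """Sum all grades into a 0-28 score (assumed to be a plain sum).

    quadrants: dict mapping each name in QUADRANTS to QuadrantFindings.
    effusion_left / effusion_right: 0 = no, 1 = minimal, 2 = large.
    """
    if set(quadrants) != set(QUADRANTS):
        raise ValueError("exactly four quadrants are required")
    grades = [effusion_left, effusion_right]
    for name in QUADRANTS:
        f = quadrants[name]
        grades += [f.consolidation, f.ggo, f.reticulation]
    if any(g not in (0, 1, 2) for g in grades):
        raise ValueError("every grade must be 0, 1 or 2")
    return sum(grades)


# Example: bilateral lower-zone GGOs with a minimal left effusion.
example = {name: QuadrantFindings() for name in QUADRANTS}
example["right_lower"] = QuadrantFindings(ggo=2)
example["left_lower"] = QuadrantFindings(ggo=1, consolidation=1)
print(cxr_score(example, effusion_left=1))
```

Keeping the per-quadrant grades in a small record rather than a flat list makes it harder to mix up findings when the worst score during hospitalization is selected.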
For each patient, the presence of hyposmia and hypogeusia was also investigated through otolaryngology examination, Burghart sniffin’ sticks [59] and a visual analog scale (VAS). Whenever the sign was present, a score ranging from 0 to 10 was assigned to each patient using VAS where 0 means the best sense of smell and 10 represents the absence of smell sensation [60].
The presence of hepatic involvement was defined on the basis of a clear elevation of hepatic enzymes, with glutamic pyruvic transaminase (ALT) and glutamic oxaloacetic transaminase (AST) both higher than 40 IU/L. Pancreatic involvement was considered on the basis of an increase in pancreatic enzymes, with pancreatic amylase higher than 53 IU/L and lipase higher than 60 IU/L. Heart involvement was defined on the basis of one or more of the following abnormal data: Troponin T (indicative of ischemic disorder), NT-proBNP (indicative of heart failure) and arrhythmias (indicative of electrical disorder). Kidney involvement was defined in the presence of a creatinine value higher than 1.20 mg/dL in males and higher than 1.10 mg/dL in females.
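As an illustration of how these definitions translate into a rule set, the following Python sketch flags organ involvement from laboratory values. It is a sketch under stated assumptions rather than the procedure actually used: the function name and argument names are hypothetical, pancreatic involvement is assumed to require both enzymes above their cut-offs, and the cardiac markers are passed as already-interpreted abnormal/normal flags because the text gives no numeric thresholds for them.

```python
def organ_involvement(alt, ast, amylase, lipase, creatinine, sex,
                      troponin_abnormal=False, nt_probnp_abnormal=False,
                      arrhythmia=False):
    """Return a dict of organ-involvement flags from the cut-offs in the text.

    alt, ast, amylase, lipase: enzyme levels in IU/L.
    creatinine: mg/dL; sex: "M" or "F".
    """
    if sex not in ("M", "F"):
        raise ValueError("sex must be 'M' or 'F'")
    creatinine_cutoff = 1.20 if sex == "M" else 1.10
    return {
        # Both transaminases must exceed 40 IU/L.
        "liver": alt > 40 and ast > 40,
        # Assumption: both pancreatic enzymes must be elevated.
        "pancreas": amylase > 53 and lipase > 60,
        # One or more abnormal cardiac findings is sufficient.
        "heart": troponin_abnormal or nt_probnp_abnormal or arrhythmia,
        "kidney": creatinine > creatinine_cutoff,
    }


flags = organ_involvement(alt=75, ast=62, amylase=40, lipase=30,
                          creatinine=1.15, sex="F", arrhythmia=True)
print(flags)
```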
Whole Exome Sequencing analysis
Genomic DNA was extracted from peripheral blood using the MagCore® Genomic DNA Whole Blood kit (RBC Biosciences) according to the manufacturer’s protocol. Whole exome sequencing analysis was performed on the Illumina NovaSeq 6000 system (Illumina, San Diego, CA, USA). DNA fragments were hybridized and captured by the Illumina Exome Panel (Illumina) according to the manufacturer’s protocol. The libraries were tested for enrichment by qPCR, and the size distribution and concentration were determined using an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). Sequencing was performed on the NovaSeq 6000 platform (Illumina) with 150 bp paired-end reads.
Genetic data analysis
Reads were mapped to the hg19 reference genome by the Burrows-Wheeler Aligner (BWA) [61]. Variant calling was performed according to the GATK4 best practice guidelines [62]. Namely, duplicates were first removed by MarkDuplicates, and base qualities were recalibrated using BaseRecalibrator and ApplyBQSR. HaplotypeCaller was used to calculate Genomic VCF files for each sample, which were then used for multi-sample calling by GenomicsDBImport and GenotypeGVCFs. In order to improve the specificity-sensitivity balance, variant quality scores were calculated by VariantRecalibrator and ApplyVQSR. Variants were annotated by ANNOVAR [63], and with the number of articles answering the query “gene_name AND viral infection” in PubMed, where gene_name is the name of the gene affected by the variant.
In order to identify candidate genes according to the Mendelian-like model, rare variants were filtered by a prioritization approach. We used the ExAC database (http://exac.broadinstitute.org/), in particular the frequency reported for the non-Finnish European population (ExAC_NFE), to retain variants with a minor allele frequency < 0.01. Synonymous, intronic and non-coding variants were excluded from the analysis. The ClinVar database of disease-associated variants (ncbi.nlm.nih.gov/clinvar/) was used to identify previous pathogenicity classifications, and variants reported as likely benign/benign were discarded. Filtering and prioritization of variants were completed using the CADD_Phred pathogenicity prediction score. Finally, we selected genes involved in infection susceptibility by searching the PubMed database with the term “viral infection”.
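The filtering cascade described above can be sketched in Python as follows. This is a minimal illustration and not the code used in the study. The record fields are hypothetical names rather than ANNOVAR column names. Three points are assumptions: variants absent from ExAC are treated as rare, CADD_Phred is used only to order the retained variants because no cut-off is stated in the text, and a gene is considered linked to viral infection when the PubMed query returns one or more articles.

```python
from dataclasses import dataclass
from typing import Optional

EXCLUDED_EFFECTS = {"synonymous", "intronic", "non-coding"}
BENIGN_LABELS = {"benign", "likely benign", "benign/likely benign"}


@dataclass
class Variant:
    """Hypothetical annotated variant record."""
    gene: str
    effect: str                # e.g. "missense", "synonymous"
    exac_nfe: Optional[float]  # None when the variant is absent from ExAC
    clinvar: Optional[str]     # None when there is no ClinVar entry
    cadd_phred: float
    pubmed_hits: int           # articles for "gene_name AND viral infection"


def prioritize(variants, maf_cutoff=0.01):
    """Keep rare, potentially relevant variants, highest CADD_Phred first."""
    kept = []
    for v in variants:
        # Rare in non-Finnish Europeans (assumption: absent counts as rare).
        if v.exac_nfe is not None and v.exac_nfe >= maf_cutoff:
            continue
        if v.effect in EXCLUDED_EFFECTS:
            continue
        if v.clinvar is not None and v.clinvar.lower() in BENIGN_LABELS:
            continue
        # Assumption: at least one PubMed hit links the gene to viral infection.
        if v.pubmed_hits == 0:
            continue
        kept.append(v)
    return sorted(kept, key=lambda v: v.cadd_phred, reverse=True)
```

Applying the filters in this order mirrors the text; because every step is an independent exclusion, reordering them would not change which variants are retained.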
In order to identify genes with a different prevalence of functionally relevant variants between COVID-19 patients and control samples, the following score was calculated for each gene and each sample $j$:

$$ S_j = \sum_{i} w_i \, x_{i,j} \qquad (1) $$

where $w_i$ is a weight associated with the i-th variant, and $x_{i,j}$ is equal to 0 if the variant is not present in sample $j$, 1 if sample $j$ has the variant in the heterozygous state, and 2 if sample $j$ has the variant in the homozygous state. The weight $w_i$ was assumed equal to the DANN score of the variant [64], which provides an estimate of the likelihood that the variant has deleterious functional effects (i.e. variants more likely to have a functional effect contribute more to the score). The sum in equation (1) was performed over all the variants in the gene for which the DANN score was available. Genes with fewer than 5 annotated variants were discarded from the analysis. The scores calculated by equation (1) were ranked across all the samples, and the sum of the ranks of the COVID-19 samples, named $r_{\mathrm{COVID}}$, was calculated. Then, sample labels were permuted 10,000 times, and these permutations were used to estimate the average value and the standard deviation of $r_{\mathrm{COVID}}$ under the null hypothesis. The p-value was calculated assuming a normal distribution for the sum of the ranks [65].
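The gene-level score and the rank-sum permutation test described above can be sketched in pure Python as follows. This is an illustrative sketch and not the code used in the study. The function names are hypothetical, tied scores are assigned their average rank, and the p-value is computed as two-sided; the text specifies neither the handling of ties nor the sidedness of the test, so both are assumptions.

```python
import math
import random
import statistics


def gene_scores(weights, genotypes):
    """Per-sample score: sum over variants i of w_i * x_ij.

    genotypes[i][j] is 0, 1 or 2 for variant i in sample j.
    """
    n_samples = len(genotypes[0])
    return [sum(w * row[j] for w, row in zip(weights, genotypes))
            for j in range(n_samples)]


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = shared
        start = end + 1
    return ranks


def rank_sum_permutation_test(scores, is_case, n_perm=10_000, seed=0):
    """Return (observed rank sum of cases, z-score, two-sided p-value)."""
    rng = random.Random(seed)
    ranks = average_ranks(scores)
    n_cases = sum(is_case)
    observed = sum(r for r, c in zip(ranks, is_case) if c)
    # Permuting labels is equivalent to drawing n_cases ranks at random.
    null = [sum(rng.sample(ranks, n_cases)) for _ in range(n_perm)]
    mu = statistics.fmean(null)
    sigma = statistics.pstdev(null)  # zero only if every score is tied
    z = (observed - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    return observed, z, p_value
```

In use, `gene_scores` would be called once per gene with the DANN scores as weights, and the resulting list passed to `rank_sum_permutation_test` together with a Boolean list marking the COVID-19 samples. Fixing the random seed makes the estimated mean and standard deviation reproducible between runs.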
DATA AVAILABILITY
Data about the gene-based analyses and variants are available as Supplementary Material. The results of variant calling are available as aggregated data in the Network for Italian Genomes database (http://www.nig.cineca.it). The datasets generated during the current study are available from the corresponding author on reasonable request.
COVID-19 MULTICENTER STUDY (composition at May 22, 2020)
Alessandra Renieri1,2, Elisa Benetti3, Francesca Montagnani3,5, Rossella Tita2, Chiara Fallerini1, Sara Amitrano2, Mirella Bruttini2, Gabriella Doddato1, Annarita Giliberti1, Floriana Valentino1, Susanna Croci1, Laura Di Sarno1, Francesca Fava1,2, Margherita Baldassarri1, Andrea Tommasi1,2, Sergio Daga1, Maria Palmieri1, Arianna Emiliozzi3,5, Massimiliano Fabbiani5, Barbara Rossetti5, Giacomo Zanelli3,5, Laura Bergantini6, Miriana d’Alessandro6, Paolo Cameli6, David Bennett6, Federico Anedda7, Simona Marcantonio7, Sabino Scolletta7, Federico Franchi7, Maria Antonietta Mazzei8, Edoardo Conticini9, Luca Cantarini19, Bruno Frediani20, Danilo Tacconi10, Chiara Spertilli10, Marco Feri11, Alice Donati11, Raffaele Scala12, Luca Guidelli12, Agostino Ognibene13, Genni Spargi14, Marta Corridi14, Cesira Nencioni15, Leonardo Croci15, Gian Piero Caldarelli16, Maurizio Spagnesi17, Paolo Piacentini17, Anna Canaccini18, Agnese Verzuri18, Valentina Anemoli18, Massimo Vaghi23, Antonella D’Arminio Monforte24, Esther Merlini24, Mario Umberto Mondelli25,26, Stefania Mantovani25, Serena Ludovisi26, Massimo Girardis27, Sophie Venturelli27, Andrea Cossarizza28, Andrea Antinori29, Alessandra Vergori29, Stefano Rusconi30,31, Matteo Siano30,31, Arianna Gabrieli31, Daniela Francisci32,33, Elisabetta Schiaroli32, Pier Giorgio Scotton34, Francesca Andretta34, Sandro Panese35, Renzo Scaggiante36, Saverio Giuseppe Parisi37, Francesco Castelli38, Maria Eugenia Quiros Roldan38, Paola Magro38, Cristina Minardi38, Matteo Della Monica39, Carmelo Piscopo39, Mario Capasso40,41,42, Massimo Carella43, Marco Castori43, Giuseppe Merla43, Filippo Aucella44, Pamela Raggi45, Matteo Bassetti46,47, Antonio Di Biagio47, Maurizio Sanguinetti48,49, Luca Masucci48,49, Chiara Gabbi21, Serafina Valente22, Susanna Guerrini8, Elisa Frullanti1, Ilaria Meloni1, Maria Antonietta Mencarelli2, Caterina Lo Rizzo2, Anna Maria Pinto2, Elena Bargagli6, Marco Mandalà4, Simone Furini3, Francesca Mari1,2.
1. Medical Genetics, University of Siena, Italy
2. Genetica Medica, Azienda Ospedaliera Universitaria Senese, Italy
3. Department of Medical Biotechnologies, University of Siena, Italy
4. Otolaryngology Unit, University of Siena, Italy
5. Department of Specialized and Internal Medicine, Tropical and Infectious Diseases Unit, Azienda Ospedaliera Universitaria Senese, Italy
6. Unit of Respiratory Diseases and Lung Transplantation, Department of Internal and Specialist Medicine, University of Siena
7. Department of Emergency and Urgency, Medicine, Surgery and Neurosciences, Unit of Intensive Care Medicine, Siena University Hospital, Italy
8. Department of Medical, Surgical and Neuro Sciences and Radiological Sciences, Unit of Diagnostic Imaging, University of Siena, Azienda Ospedaliera Universitaria Senese, Italy
9. Rheumatology Unit, Department of Medicine, Surgery and Neurosciences, University of Siena, Policlinico Le Scotte, Italy
10. Department of Specialized and Internal Medicine, Infectious Diseases Unit, San Donato Hospital Arezzo, Italy
11. Department of Emergency, Anesthesia Unit, San Donato Hospital, Arezzo, Italy
12. Department of Specialized and Internal Medicine, Pneumology Unit and UTIP, San Donato Hospital, Arezzo, Italy
13. Clinical Chemical Analysis Laboratory, San Donato Hospital, Arezzo, Italy
14. Department of Emergency, Anesthesia Unit, Misericordia Hospital, Grosseto, Italy
15. Department of Specialized and Internal Medicine, Infectious Diseases Unit, Misericordia Hospital, Grosseto, Italy
16. Clinical Chemical Analysis Laboratory, Misericordia Hospital, Grosseto, Italy
17. Department of Prevention, Azienda USL Toscana Sud Est, Italy
18. Territorial Scientific Technician Department, Azienda USL Toscana Sud Est, Italy
19. Department of Medical Sciences, University of Siena, Italy
20. Research Center of Systemic Autoinflammatory Diseases and Behçet’s Disease and Rheumatology-Ophthalmology Collaborative Uveitis Center, Department of Medical Sciences, Surgery and Neurosciences, University of Siena, Italy
21. Independent Scientist, Milan, Italy
22. Department of Cardiovascular Diseases, University of Siena, Italy
23. Chirurgia Vascolare, Ospedale Maggiore di Crema, Italy
24. Department of Health Sciences, Clinic of Infectious Diseases, ASST Santi Paolo e Carlo, University of Milan, Italy
25. Division of Infectious Diseases and Immunology, Department of Medical Sciences and Infectious Diseases, Pavia, Italy
26. Department of Internal Medicine and Therapeutics, University of Pavia, Italy
27. Department of Anesthesia and Intensive Care, University of Modena and Reggio Emilia, Modena, Italy
28. Department of Medical and Surgical Sciences for Children and Adults, University of Modena and Reggio Emilia, Modena, Italy
29. HIV/AIDS Department, National Institute for Infectious Diseases, IRCCS, Lazzaro Spallanzani, Rome, Italy
30. III Infectious Diseases Unit, ASST-FBF-Sacco, Milan, Italy
31. Department of Biomedical and Clinical Sciences Luigi Sacco, University of Milan, Milan, Italy
32. Infectious Diseases Clinic, Department of Medicine 2, Azienda Ospedaliera di Perugia and University of Perugia, Santa Maria Hospital, Perugia, Italy
33. Infectious Diseases Clinic, “Santa Maria” Hospital, University of Perugia, Perugia, Italy
34. Department of Infectious Diseases, Treviso Hospital, Local Health Unit 2 Marca Trevigiana, Treviso, Italy
35. Infectious Diseases Department, Ospedale Civile “SS. Giovanni e Paolo”, Venice, Italy
36. Infectious Diseases Clinic, ULSS1, Belluno, Italy
37. Department of Molecular Medicine, University of Padova, Italy
38. Department of Infectious and Tropical Diseases, University of Brescia and ASST Spedali Civili Hospital, Brescia, Italy
39. Medical Genetics and Laboratory of Medical Genetics Unit, A.O.R.N. “Antonio Cardarelli”, Naples, Italy
40. Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Naples, Italy
41. CEINGE Biotecnologie Avanzate, Naples, Italy
42. IRCCS SDN, Naples, Italy
43. Division of Medical Genetics, Fondazione IRCCS Casa Sollievo della Sofferenza Hospital, San Giovanni Rotondo, Italy
44. Department of Nephrology and Dialysis, Fondazione IRCCS Casa Sollievo della Sofferenza Hospital, San Giovanni Rotondo, Italy
45. Department of Medical Sciences, Fondazione IRCCS Casa Sollievo della Sofferenza Hospital, San Giovanni Rotondo, Italy
46. Department of Health Sciences, University of Genova, Genova, Italy
47. Infectious Diseases Clinic, Policlinico San Martino Hospital, IRCCS for Cancer Research Genova, Italy
48. Microbiology, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Catholic University of Medicine, Rome, Italy
49. Department of Laboratory Sciences and Infectious Diseases, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
ETHICAL APPROVAL
The GEN-COVID study was approved by the University Hospital of Siena Ethical Review Board (Prot n. 16929, dated March 16, 2020).
AUTHOR CONTRIBUTIONS STATEMENT
A.R. and F.M. designed the project and experiments. F.F., M.B., A.E., F.A., E.C., M.dA., S.M., M.A.M., F.M., M.M., E.B., A.R. and F.M. performed clinical evaluations. A.G., F.V., S.A., L.B., and M.B. carried out the experiments. E.B. and S.F. performed bioinformatic analyses. R.T., C.F., and E.B. carried out statistical analysis and prepared the tables. E.B., A.R. and F.M. wrote the manuscript. E.B. submitted this paper. All authors reviewed the manuscript.
ADDITIONAL INFORMATION
The authors declare no competing interests.
ACKNOWLEDGEMENTS
This study is part of GEN-COVID (https://sites.google.com/dbm.unisi.it/gen-covid), the Italian multicenter study aimed at identifying the host genetic bases of COVID-19. The Genetic and COVID-19 Biobank of Siena, a member of BBMRI-IT, of the Telethon Network of Genetic Biobanks (project no. GTB18001), of EuroBioBank, and of D-Connect, provided us with specimens. We thank the CINECA consortium for providing computational resources and the Network for Italian Genomes (NIG, http://www.nig.cineca.it). We thank private donors for their support to A.R. (Department of Medical Biotechnologies, University of Siena) for the COVID-19 host genetics research project (D.L n.18 of March 17th 2020).
REFERENCES
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].