Abstract
Background Although COVID-19 infection has been associated with a number of clinical and environmental risk factors, host genetic variation has also been associated with the incidence and morbidity of infection. The CRP gene codes for a critical component of the innate immune system and CRP variants have been reported associated with infectious disease and vaccination outcomes. We investigated possible associations between COVID-19 outcome and a limited number of candidate gene variants including rs1205.
Methodology/Principal Findings The Strong Heart and Strong Heart Family studies have accumulated detailed genetic, cardiovascular risk and event data in geographically dispersed American Indian communities since 1988. Chi-square tests, logistic regression and generalized linear mixed models (implemented in SOLAR) were used in analysis. Genotypic data and 91 COVID-19 adjudicated deaths or hospitalizations from 2/1/20 through 3/1/23 were identified among 3,780 participants in two subsets. Among 21 candidate variants including genes in the interferon response pathway, APOE, TMPRSS2, TLR3, the HLA complex and the ABO blood group, only rs1205, a 3’ untranslated region variant in the CRP gene, showed nominally significant association in T-dominant model analyses (odds ratio 1.859, 95%CI 1.001-3.453, p=0.049) after adjustment for age, sex, center, body mass index, and a history of cardiovascular disease. Within the younger subset, association with the rs1205 T-Dom genotype was stronger, both in the same adjusted logistic model and in the SOLAR analysis also adjusting for other genetic relatedness.
Conclusion A T-dominant genotype of rs1205 in the CRP gene is associated with COVID-19 death or hospitalization, even after adjustment for relevant clinical factors and potential participant relatedness. Additional study of other populations and genetic variants of this gene are warranted.
Author summary Considerable inter-individual variability in COVID-19 clinical outcome has been noted from the onset of this pandemic. Some risk factors, such as age, diabetes, obesity, prior cardiovascular disease (CVD) are well established. The possibility of inherited, host genetic risk factors has also been examined and a number validated in very large, population based datasets. The present study leveraged on-going clinical surveillance of CVD and available genetic information in an American Indian population to investigate potential host genetic contributors to COVID-19 morbidity and mortality. Although the number of cases ascertained was relatively small, a novel variant in the C-reactive protein gene was found to be associated with COVID-19 outcome. This protein constitutes a critical component of the innate immune system and CRP variants have been reported associated with infectious disease and vaccination outcomes. Improved understanding of the pathophysiology of this infection may allow more targeted therapy.
Introduction
The ongoing pandemic of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infections has taken a devastating toll on American Indian, Alaskan Native (AI/AN) populations, with hospitalization and death rate ratios for COVID-19 (compared with White ethnicity) reported as 2.5 fold; and 2.1 fold, respectively.[1]
A number of demographic and clinical characteristics, such as male gender, socio-economic deprivation, diabetes, cardiovascular disease and obesity, have been identified as risk factors for COVID-19 associated morbidity and mortality.[2] The hypothesis that this increased burden of COVID-19 morbidity and mortality among ethnic minorities is due to the increased prevalence of these co-morbidities in many populations, has been considered; but the proportion of effect attributable to clinical co-morbidities may, be small.[2] In spite of considerable interest in possible host genetic factors that influence the severity of COVID-19 among all patients, there have been few reliably replicating studies in specific racial/ethnic groups.[3]
The expectation that ethnic disparities in COVID-19 outcomes are primarily driven by population specific behavioral or clinical burden of disease may have resulted in less interest in investigating genetic susceptibility among AI/AN communities. This is of importance for the possible identification of therapeutic targets and prevention strategies assisting AI/AN as well as other populations.
The Strong Heart Study (SHS) and allied Strong Heart Family Study (SHFS), comprise the largest, ongoing prospective epidemiological cohort study among American Indians in 12 different Tribal communities located in Arizona, North and South Dakotas, and in Oklahoma. These cohorts have the requisite genetic datasets and ongoing surveillance of medical records for cardiovascular disease (CVD) and risk factors, including COVID-19, to also assess genetic contributions to COVID-19 death and morbidity risks. The present results derive from a case/cohort analysis of SNPs reported associated with COVID mortality or morbidity in other populations.
Materials and Methods
Participants and Cohorts
The SHS/SHFS methodology and design has been described previously.[4,5] Diabetes and hypertensive status were defined as previously;[6] and the dichotomous CVD covariate indicates any history of myocardial infarction, coronary artery disease, congestive heart failure, and atherosclerotic stroke and peripheral vasculature disease.[7] While SHS/SHFS medical record surveillance of participants focuses on ascertainment of outcome events and clinical risk factors related to CVD, in early 2020 ascertainment for COVID-19 was begun, with the recognition of a bidirectional relationship between CVD and COVID-19 infection. The physicians conducting medical records reviews identified COVID-19 as either a contributing or definitive factor in death or hospitalization (disease severity). However, medical record review was triggered primarily by CVD related events (for ascertainment criteria, see Lee et al [4]) and a small number of other conditions, but not primarily by infections. Thus, the identified COVID-19 endpoint of hospitalization is probably under-reported, given the surveillance focus on CVD outcomes. However, all deaths associated with COVID-19 during the study period are included in this report.
The study period was 2/1/2020 through 3/1/2023, when medical record reviews ascertaining COVID-19 outcomes were available. We included all participants alive on 2/1/2020 and identified outcome events (either COVID-19 death or hospitalization) during follow-up. There were no specific ICD-9/10 codes used to identify relevant morbid COVID-19 cases, but physician review of the medical record determined COVID-19 case status based on immune measures and clinical diagnoses, with morbid events for review selected on the basis of standard SHS protocol to ascertain incident CVD.[4]
Rationale for genetic variants selected
A PubMed literature search for candidate variants previously reported to be associated with COVID-19 pathogenesis or clinical outcomes found previously implicated genes in the interferon response pathway,[8,9] APOE,[10] TMPRSS2,[11] TLR3,[9] ACE,[12] FURIN[13] and the human leukocyte antigen (HLA)[14,15] regions. We also included ABO blood group polymorphisms which have been among the most consistently identified host factors influencing the COVID-19 phenotype.[16-18] The top 17 “hits” of of the Covid-19 Host Genetics Initiative Browser, round 7 identified additional variants of interest.[19]
Thus our literature search developed a total of 73 plausible candidate variants and resulted in 21 polymorphisms that were available from the genetic resources of the SHS, albeit within two subsets of participants that were genotyped in SHS/SHFS substudies. The candidate SNPs and identifying, published references are found in Table 1.
Although genotypes were derived from extensive microarray sources, no attempt was made to screen or identify variants of interest on the basis of genome-wide analysis of SHS microarray data.
The only exception to the qualification of previously identified association was the choice of CRP polymorphisms as explained below. This was felt to be justified due to the central role of CRP in the innate immune system and dramatic elevation during COVID-19 and other viral infections.[43]
C-reactive protein (CRP) levels are typically thought of as a biomarker for inflammatory response and not as a pathogenetic factor, although inherited CRP variants have been associated with, and thus possibly a modifying factor in some infections[22,23] response to vaccination,[26] as well as other conditions such as cancer[44,45] and pre-eclampsia.[46,47] For this reason, we selected 4 available variants related to CRP expression for our analysis.
Laboratory methods and considerations
Clinical, anthropometric and laboratory measures are reported as obtained from the most recent SHS exam prior to the study period. Laboratory methodology for determination of serum creatinine,[48] HgbA1c[49] and high sensitivity C-Reactive Protein (hsCRP)[50] have been reported previously
The SHFS [5] participants were genotyped with the Illumina Human Cardio-Metabo BeadChip microarray (Illumina, San Diego, CA), incorporating approximately 200,000 single nucleotide polymorphisms (SNPs) in loci previously identified as significantly associated with metabolic and CVD traits.[51] These genotypes were generated exclusively from SHFS participants without diabetes mellitus during exams between 1997-2003. Further details related to quality filters and data preparation have been published.[50]
The SHS participants were genotyped using the Illumina Infinium Multi-Ethnic Global-8 vs1. Quality control included filtering variants with call rates <95% and duplicated QC samples. The CRP rs1205 variant was genotyped from both the SHFS (N=1,666) and SHS (N=2,114) participants, providing a total of (N=3,780) rs1205 genotypes. This is presented graphically in Figure 1. Except for rs1205 and rs8176746, genotypic data from other variants were available from either the SHFS or the SHS participants (Table 1).
Although two different genotyping platforms were used for the SHS and SHFS participants, the genotyping for both was conducted in the same laboratory at Texas Biomedical Research Institute in San Antonio, Texas. All 121 participants genotyped for rs1205 and the ABO variant rs8176746 on both platforms and meeting the criteria for inclusion in the study had concordant genotyping.
Statistical analysis methods
Descriptive statistics presented counts and percentages for discrete variables and means with standard deviations for continuous variables. Between-group comparison of continuous variables used T-tests and assumed independent samples. Discrete variables utilized Pearson chi-square comparisons between groups.
Logistic regression models were developed for both univariate and multivariate analysis of COVID-19 associated death or hospitalization during the observation period (2/1/20 through 3/29/22). Only a single COVID-event was counted for each participant. To select covariates included in the final multivariate model, BMI and any history of CVD were chosen on the basis of apparent independent association (p<0.05) when all available covariates (sex, body mass index (BMI), hypertension, diabetes mellitus (DM), impaired fasting glucose (IFG) or normal fasting glucose (NFG), history of CVD event, (tobacco (current, ever, never), serum creatinine, and hemoglobin A1c (HgbA1c)) were considered jointly. Age, sex and center were deemed important, standard adjustments and therefore kept in models. The covariate related to diabetes and dysglycemia was not independently associated with COVID-19 when included with BMI (as seen in the first section of Table 5), so was not retained in the final logistic model. The serum hsCRP was not included as a covariate, given we were testing CRP genetic variants with known effects on serum levels. Significance was accepted at a p-value of 0.05, since the genetic variants were chosen on the basis of a priori evidence of potential association with COVID-19 as well as recognizing the hypothesis driven nature of the presented analysis.
The SHFS cohort enrolled large families, and therefore we accounted for the relatedness using random effects of pedigree relationships (established with participants at recruitment) in generalized linear mixed models implemented in SOLAR.[52,53] For discrete phenotypes such as affection status, SOLAR employs a classic liability model.[54] Briefly: the unmeasured, presumed multifactorial, basis of liability to disease is modeled as a standard normal distribution (mean=0, SD=1), with affected individuals represented by an upper tail of the distribution that has the same density as the incidence of affection status in the general population. For this study we used the mean weekly incidence of COVID-19 in the US population during the study period (217.72 cases/100,000 population, 0.218%)[55] Heritability of liability is estimated from genetic correlations between relatives[56] and the effects of measured factors (age, sex, etc., as well as measured genotypes) are estimated as deviations of the liability threshold from the population incidence.
Ethics Statement
All participants in this study have provided written, informed consent to allow genetic and other research related to CVD and its risk factors. Consent was obtained at least once in all cases since Phase I of SHS in 1988, and in most instances has been renewed during each the 6 subsequent phases until the present. Approval has been obtained from the following institutional review boards (IRB) of record: Arizona Area Indian Health Service IRB, MedStar Health Research Institute IRB, Great Plains Area Indian Health Service IRB, Oglala Sioux Tribe Research Review Board, University of Oklahoma Health Sciences IRB, and Oklahoma Area Indian Health Service IRB. In addition to these formal, IRB approvals, all of the participants’ Tribal governments have approved the conduct of the SHS/SHFS studies for these purposes. The close association between CVD (especially stroke) and COVID-19 infection, often in a bidirectional manner, justified IRB approval for this substudy.
Results
Table 1 shows the genotyped variants selected, their associated genes and allele frequencies in the SHFS or SHS cohorts. The characteristics of both the SHS and SHFS cohorts are summarized in Table 2.
Unadjusted results limited to the primary variant of interest from the SHFS, the SHS cohort or combined SHFS/SHS cohorts, using additive and dominant models based on risk or non-risk alleles are shown in Table 3.
Results for the remaining variants are found in Supplemental Table S1. For variants with low frequency, logistic models were often unstable and not reported. Analyses in the SHFS cohort showed nominally significant associations for the rs1205 T allele dominant (T-Dom) genotype in the combined SHFS/SHS cohort and both additive and T-Dom models in the SHFS cohort).
Table 4 shows the distribution of COVID-19 cases among recruitment centers, highlighting the relatively larger number of cases identified in the Dakota center compared to other centers, and provides the allelic prevalence of rs1205 by center. Among the combined SHFS/SHS cohorts, the prevalence of the rs1205 T allele was 0.525 (95% CI 0.51 - 0.54).
Multivariate logistic regression model findings for the primary SNPs of interest are reported in Table 5. Results of the rs1205 T-Dom model is attenuated after adjustment for covariates, but retains nominally significant association (OR 1.859, 95% CI 1.001-3.453, p=0.049) with COVID-19 in in the combined SHS/SHFS cohorts. Within the younger SHFS cohort, the same model showed an odds ratio of 2.857, 95%CI 1.108-7.362, p=0.030.
In the SHFS analysis, using a generalized linear threshold model in SOLAR to adjust for the random effect of family relatedness, and adjusting for fixed effects of sex, age, BMI, CVD, and center, either fatal or non-fatal COVID-19 was significantly associated with the fixed effect of the rs1205 T allele-dominant genotype (p=0.0003). The regression beta for this association was negative (= -0.50 standard deviations), indicating a tendency to increase the affected upper tail of the liability distribution (thus confirming rs1205-T Dom genotype as a risk allele).
While those with rs1205 T-Dom genotypes trended toward lower mean hsCRP levels at baseline than others (p=0.087 excluding levels over 15 mg/L to avoid spurious elevations due to intercurrent infections), mean hsCRP levels were not significantly different between COVID-19 cases or controls. (Supplemental Table S2).
Discussion
We present previously unreported evidence of association between a common human variant (rs1205) and the clinical impact of the SARS-CoV-2 virus causing COVID-19. While our study has a modest number of cases, these results derive from a large cohort of American Indian individuals who are being followed longitudinally using standardized, physician review of medical records[4] The rs1205 variant has been recognized as functionally affecting serum levels of CRP,[25,28,29] in several populations and, in the present SHFS population as well.[50] CRP is an important component of the innate immune system, thus supporting a role of the gene and this variant’s potential effects on the pathophysiology of COVID-19. A genome-wide association meta-analysis has reported an intron variant (rs67579710) associated with COVID-19 hospitalization among 24,741cases and 2,835,201controls.[42] This variant is 4.5Mb from rs1205 [57] and within the thrombospondin3 gene, thus it may play a role in the thrombosis associated with COVID-19, rather than inflammatory pathways.
There are ample theoretical reasons that heritable genetic variation could affect SARS-CoV-2 infection, beginning with the viral requirement for surface receptors to gain entry to the cytoplasm[58] and running through multiple immune response pathways generating the “cytokine storm” which has been prominently noted in the pathogenesis of COVID-19.[59] A number of genetic variants have been associated with morbidity and mortality, including variants in the angiotensin converting enzyme 1 (ACE1) gene,[60] the TMPRSS2 gene,[61] the IL-6 gene[62] and others extensively reviewed by Ishak et al.[63] The COVID-19 Host Genetics Initiative, a massive ascertainment of nearly 50,000 cases from 19 countries has identified 13 genome-wide loci related to either initial infection, or morbidity.[64]
The rs1205 variant is a C/T single nucleotide polymorphism in the 3’ untranslated region of the C-reactive protein gene (CRP), with a prevalence of the C allele ranging from ∼ 0.67 in European populations, to ∼ 0.40 in Asian populations,[65] and 0.51 to 0.54 in the present study. The 4 CRP variants were chosen for their role in the innate immune system and availability in the SHS dataset, but no literature was found that examined CRP variants related to COVID-19 severity, perhaps because CRP has traditionally been viewed as a biomarker and not as a potential causal factor. There are, however, reports of CRP variants associated with the clinical outcomes of other infections[22,23] and the response to vaccination.[26] Additionally, reduced expression of serum CRP is consistently reported associated with the rs1205 T allele, with the beta coefficient for each additional T allele ranging between -0.17 and -0.27;[25,28,29] and estimated at -0.23 in the SHS Dakota center.[50] As noted in Table S2 the mean serum CRP level is lower in the present study, among those with the rs1205 T-Dom genotype compared with those homozygous for the C allele, although the difference is not significant.
Although the prevalent view of COVID-19 pathophysiology implicates a hyperactive immune response (often characterized by increased CRP serum levels as well as coordinated production of other cytokines), it also seems plausible that a relative, baseline insufficiency of an innate immune factor, particularly in the early phase of exposure, could also result in increased morbidity.
CRP genotypes correlating with increased baseline CRP levels have been positively associated with risk of pre-eclampsia[46,47,66] and conversely, as in the current study, CRP genotypes (including rs1205 T-Dom) correlated with lower CRP levels have been associated with increased infectious burden.[22-24] These divergent effects invite speculation that CRP polymorphisms may have experienced balanced selection during evolution.
Our multivariate logistic regression results are in accord with prior studies that showed the contribution of commonly reported clinical factors, such as age,[67] BMI[64] and pre-existing CVD,[68,69] which negatively impact COVID-19 outcomes. Although the mean BMI of the SHFS is somewhat greater than the SHS cohort, it should be noted that the SHFS cohort has a low prevalence of DM (13%) and is younger than the SHFS, due to selection for that substudy of those without diabetes at baseline. Perhaps this allows a clearer genetic (vs an acquired clinical or environmental) association to become apparent.
The small number of cases when stratified by Center suggests cautious interpretation, but is presented since the number of cases from the Dakota Center is considerably higher than from the comparable Oklahoma center. Although the possibility of population stratification confounding our results is present, we believe the fact that all 3 centers show an effect of the T-Dom genotype in the same direction, including the Arizona Center, which is likely to be most distant in their genetic background, suggests genetic background effects are not likely. The statistical significance of the center covariate could be due to various differences in regional, environmental factors, such as household density, weather, and accessibility of medical care.
The lack of significant results in the other SNPs is not surprising, given the number of these variants with low allele frequencies and the small number of COVID-19 cases. Further surveillance of the SHS cohort will likely identify new cases and improve the power for other variants (eg rs111837807 and rs2109069).
The limitations of this analysis are chiefly related to the relatively small number of cases obtained to this point. The morbid case ascertainment is subject to undercounting since the choice of medical records for review is based primarily on CVD outcomes.
This does not affect ascertainment of mortal cases, since all deaths are reviewed and recorded as soon as possible. The possibility of population stratification is noted, although seems unlikely for reasons noted above. P values have not been adjusted for multiple testing.
The study’s strengths include a robust physician assignment of outcomes using medical records and comprehensive analysis of genetic variabilities as well as adjustment for relatedness between individuals.
It is clear that race/ethnicity is a valid and critical risk marker for other underlying conditions affecting the complexity of COVID-19 disease, such as structural racism, discrimination and socioeconomic status,[70] lack of health care access,[71] and exposure to infectious agents related to high risk and service industry occupations.[72] Although one can never be confident that all confounding factors have been adjusted properly, Williamson et al[2] showed that ethnicity among a very large cohort in England was associated with COVID-19 outcomes, even after adjusting for socioeconomic factors. Thus, even though we are committed to improving our understanding of the complex social and behavioral factors influencing COVID-19 disease landscape among Tribal communities we also believe that investigating the potential influence of common genetic variants among the high risk American Indian/Alaska Native populations may provide opportunities for therapeutic or preventive options. Although severely impacted by COVID-19,[73,74] Tribal communities are, at times, excluded from scientific studies due to their smaller proportion in the U.S. population and various socio-political factors.
In conclusion, a statistically significant association was found between the rs1205 T-Dom genotype and risk of COVID-19 death or hospitalization among American Indian participants, employing chi-square tests, logistic regression models adjusting for age, sex, center, BMI and prior history of CVD, and SOLAR software analysis also adjusting for relatedness within the SHFS cohort. The direction of effect suggests a lower level of CRP during early phases of infection may increase the risk of subsequent complications, a novel finding we will continue to investigate further as the pandemic continues and our samples size tragically continues to increase.
Supporting Information:
Supplemental Table 1. Additional univariant associations with either fatal or non-fatal COVID-19.
Supplemental Table 2. Baseline hsCRP measures* in relation to rs1205 T-Dom genotype and COVID-19 case/control status.
Data Availability
The data underlying the results of this study are owned and controlled by the Tribal entities that approved its collection. This fact is clearly stated in the Tribal resolutions authorizing the research; and it must be recognized that these Tribal communities are an independent, sovereign governments, in control over research activities within their borders. Access to data and materials can be accomplished by application to Mr. Guthrie Ducheneaux, IT director, Missouri Breaks Research Inc, 505 S Willow St, Eagle Butte, SD 57625, 605-964-3419, email: guthrie.ducheneaux{at}mbiri.com, who will arrange for further consultation with the appropriate Tribal partners. Approximately 2 to 3 months may be required. The authors received special access privileges to the data due to their relationship with the Tribal Governments, however, interested researchers who apply for data access will be able to access the same data as the authors.
Acknowledgements
We thank the study participants, Indian Health Service facilities, and participating Tribal communities for their extraordinary cooperation and involvement, which has been critical to the success of the Strong Heart Study since 1988.