Abstract
Objective Poor metabolic health and certain lifestyle factors have been associated with risk and severity of coronavirus disease 2019 (COVID-19), but data for diet are lacking. We aimed to investigate the association of diet quality with risk and severity of COVID-19 and its intersection with socioeconomic deprivation.
Design We used data from 592,571 participants of the smartphone-based COVID Symptom Study. Diet quality was assessed using a healthful plant-based diet score, which emphasizes healthy plant foods such as fruits or vegetables. Multivariable Cox models were fitted to calculate hazard ratios (HR) and 95% confidence intervals (95% CI) for COVID-19 risk and severity defined using a validated symptom-based algorithm or hospitalization with oxygen support, respectively.
Results Over 3,886,274 person-months of follow-up, 31,815 COVID-19 cases were documented. Compared with individuals in the lowest quartile of the diet score, high diet quality was associated with lower risk of COVID-19 (HR, 0.91; 95% CI, 0.88-0.94) and severe COVID-19 (HR, 0.59; 95% CI, 0.47-0.74). The joint association of low diet quality and increased deprivation on COVID-19 risk was higher than the sum of the risk associated with each factor alone (Pinteraction=0.005). The corresponding absolute excess rate for lowest vs highest quartile of diet score was 22.5 (95% CI, 18.8-26.3) and 40.8 (95% CI, 31.7-49.8; 10,000 person-months) among persons living in areas with low and high deprivation, respectively.
Conclusions A dietary pattern characterized by healthy plant-based foods was associated with lower risk and severity of COVID-19. These association may be particularly evident among individuals living in areas with higher socioeconomic deprivation.
Introduction
Poor metabolic health1,2 has been associated with increased risk and severity of coronavirus disease 2019 (COVID-19), and excess adiposity or preexisting liver disease might be causally associated with increased risk of death from COVID-19.3,4 Underlying these conditions is the contribution of a diet, which may be independently associated with COVID-19 risk and severity.
On the basis of prior scientific evidence, diet quality scores have been developed to evaluate the healthfulness of dietary patterns.5–7 Dietary patterns capture the complexity of food intakes better than any one individual food item and offer the advantage of describing usual consumption of foods in typical diets.8 One such diet score is the healthful plant-based diet index (hPDI), which emphasizes intake of healthy plant foods, such as fruits, vegetables, and whole grains, and has been associated with lower risk of metabolic diseases.5,9,10
Adherence to healthful dietary patterns may also be a proximal manifestation of distal social determinants of health.11–13 Addressing adverse social determinants of health, such as poor nutrition, has been shown to reduce the burden of certain infectious diseases in the past,14 supporting calls for prioritizing social determinants of health in the public health response to COVID-19. However, evidence on the association between diet quality and the risk and severity of COVID-19 is lacking, especially in the context of upstream social determinants of health. To address this evidence gap, we analyzed data for 592,571 United Kingdom (UK) and United States (US) participants from the smartphone-based COVID Symptom Study,15 to prospectively investigate the association of diet quality with risk and severity of COVID-19 and its intersection with socioeconomic deprivation.
Materials and Methods
Study design and participants
The COVID Symptom Study is a smartphone-based study conducted in the UK and US. Study design and sampling procedures have been published elsewhere.15 This analysis included participants recruited from March 24, 2020 and followed until December 2, 2020. Participants who reported any symptoms related to COVID-19 prior to start of follow-up, or reported symptoms that classified them as having predicted COVID-19 within 24 hours of first entry, or who tested positive for COVID-19 at any time prior to start of follow-up or 24 hours after first entry were excluded. We also excluded participants younger than 18 years old, pregnant, and participants who logged only one daily assessment during follow-up. At enrollment, we obtained informed consent to the use of volunteered information for research purposes and shared relevant privacy policies and terms of use agreements. The study protocol was approved by the Mass General Brigham Human Research Committee (protocol 2020P000909) and King’s College London Ethics Committee (REMAS ID 18210, LRS-19/20-18210).
Data collection procedures
Information on demographic factors was collected through standardized questionnaires at baseline,15 including self-reported COVID-19 or any COVID-19 related symptoms and personal medical history including lung disease, diabetes, cardiovascular disease, cancer, kidney disease, and use of medications. During follow-up, daily prompts queried for updates on interim symptoms, health care visits, and COVID-19 testing results. Through software updates, a survey to examine self-reported diet and lifestyle habits during the pre/early-pandemic period was launched between August and September 2020. Details about this survey are available in the Supplementary Methods and published elsewhere.16
Assessment of diet quality
Diet quality was assessed using information obtained from an amended version of the Leeds Short Form Food Frequency Questionnaire17 that included 27 food items (online supplementary methods). Participants were asked how often on average they had consumed one portion of each item in a typical week. The responses had eight frequency categories ranging from “rarely or never” to “five or more times per day”.
Diet quality was quantified using the validated hPDI score.5 To compute the hPDI, the 27 food items were combined into 14 food groups (online supplementary table 1). The original hPDI score included 18 food groups but nuts, vegetable oils, tea or coffee, and animal fat were not specifically queried. Food groups were ranked into quintiles and given positive (healthy plant food groups) or reverse scores (less healthy plant and animal food groups). With positive scores, participants within the highest quintile of a food group received a score of 5, following on through to participants within the lowest quintile who received a score of 1. With reverse scores, this pattern of scoring was inverted. All component scores were summed to obtain a total score ranging from 14 (lowest diet quality) to 70 (highest) points. Criteria for generation of the hPDI are provided in online supplemental table 2. As an additional method to quantify diet quality based on available diet information, we used the Diet Quality Score (DQS).17 The DQS is a score for adherence to UK dietary guidelines and was computed from five broad categories including fruits, vegetables, total fat, oily fish, and non-milk extrinsic sugars. Each component was scored from 1 (unhealthiest) to 3 (healthiest) points, with intermediate values scored proportionally (online supplementary table 3). All component scores were summed to obtain a total score ranging from 5 (lowest diet quality) to 15 (highest) points (online supplementary table 4).
Assessment of COVID-19 risk and severity
The primary outcome of this analysis was COVID-19 risk defined using a validated symptom-based algorithm,18 which provides similar estimates of COVID-19 prevalence and incidence as those reported from the Office for National Statistics Community Infection Survey.19 Details on the symptoms included in the predictive algorithm and corresponding weights are provided in the online supplementary methods. In brief, the symptom-based approach uses an algorithm to predict whether a participant has been infected with SARS-CoV-2 on the basis of their reported symptoms, age, and sex. The rationale for symptom-based classifier as a primary outcome was due to widespread difficulties obtaining testing during the early stages of the pandemic.20 Secondary outcomes were confirmed COVID-19 based on a self-report of a reverse transcription polymerase chain reaction (RT-PCR) positive test and COVID-19 severity. COVID-19 severity was ascertained based on a report of the need for a hospital visit which required 1) non-invasive breathing support, 2) invasive breathing support, and 3) administration of antibiotics combined with oxygen support (online supplementary methods).
Statistical analysis
We summarized continuous measurements by using medians and interquartile ranges, and present categorical observations as frequency and percentages. Based on zip code (US) or post code (UK) of residence, participants were assigned to country-specific community-level socioeconomic measures including socioeconomic deprivation and population density (online supplementary methods). The methods for classifying socioeconomic deprivation, population density, and other a priori selected covariates are provided in online supplementary methods. Multiple imputations by chained equations with five imputations were used to impute missing values. All covariates in the primary analysis were included in the multiple imputation procedure, and estimates generated from each imputed dataset were combined using Rubin’s rules.21
Follow-up time for each participant started 24 hours after first log-in to the time of predicted COVID-19 (or to time of secondary outcomes) or date of last entry prior to December 2, 2020, whichever occurred first. We modeled the diet quality score as a continuous variable and generated categories of the score based on quartiles of the distribution (quartile 1, low diet quality; quartiles 2-3, intermediate diet quality; quartile 4, high diet quality). Cox regression models stratified by calendar date at study entry, country of origin, and 10-year age group were used to calculate hazard ratio (HR) and 95% confidence intervals (95% CI) for COVID-19 risk and severity (age-adjusted model 1). Model 2 was further adjusted for sex, race/ethnicity, index of multiple deprivation, population density, presence of diabetes, cardiovascular disease, lung disease, cancer, kidney disease, and healthcare worker status. Model 3 was further adjusted for body mass index, smoking status, and physical activity. We verified the proportional hazards assumption of the Cox model by using the Schoenfeld residuals technique.22 Absolute risk was calculated as the percentage of COVID-19 cases occurring per 10,000 person-months in a given group. We used restricted cubic splines with four knots (at the 2.5th, 25th, 75th, and 97.5th percentiles) to assess for non-linear associations between diet quality and COVID-19 risk.
In secondary analyses, we used a self-report of a positive test to define COVID-19 risk. For these analyses, we used inverse probability-weighted Cox models to account for predictors of obtaining country-specific testing. Inverse probability-weighted analyses included presence of COVID-19-related symptoms, interaction with a person with COVID-19, occupation as a healthcare worker, age group, and race. Inverse probability-weighted Cox models were stratified by 10-year age group and date with additional adjustment for the covariates used in previous models. For severe COVID-19 analyses, we adjusted for the same covariates used in previous models. As an additional method to quantify diet quality we used the DQS and tested for associations between diet quality and COVID-19 risk and severity. In addition, we censored our analyses to cases that occurred after completing the diet survey to investigate potential bias due to time-varying confounding.
In subgroup analyses, we assessed the association between diet quality and COVID-19 risk according to comorbidities, demographic, and lifestyle characteristics. We also classified participants according to categories of the diet quality score and socioeconomic deprivation (nine categories based on thirds of diet quality score and deprivation index) and conducted joint analyses for COVID-19 risk. We tested for additive interactions by assessing the relative excess risk due to interaction, and further examined the COVID-19 risk proportions attributable to diet, deprivation, and to their interaction (online supplementary methods).23
We conducted sensitivity analyses to account for regional differences in the effective reproductive number (Rt) or other risk mitigating behaviors such as mask wearing. Details on how we obtained and classified individuals for these analyses are provided in online supplementary methods. Two-sided p-values of <0.05 were considered statistically significant for main analyses. All statistical analyses were performed using R software, version 4.0.3 (R Foundation).
Results
Self-reported diet quality was evaluated in 647,137 survey responders, of which 54,566 were excluded due to prevalent COVID (n=1,555), presence of any symptoms at baseline (n=47,594), logged only once (n=1,201), pregnancy (n=1,129), or age under 18 year (n=3,087; online supplementary figure 1). Baseline characteristics of the 592,571 participants included in this study according to categories of the hPDI score are shown in table 1. Participants in the highest quartile of the diet score (reflecting a healthier diet) were more likely than participants in the lowest quartile to be older, female, healthcare workers, of lower BMI, engage in physical activities ≥ 5 days/week, and less likely to reside in areas with higher socioeconomic deprivation. The hPDI score was normally distributed (online supplementary figure 2).
Over 3,886,274 person-months of follow-up, 31,815 COVID-19 cases were documented. Crude COVID-19 rates per 10,000 person-months were 72.0 (95% CI, 70.4-73.7) for participants in the highest quartile of the diet score and 104.1 (95% CI, 101.9-106.2) for those in the lowest quartile. The corresponding age-adjusted HR for COVID-19 risk was 0.80 (95% CI, 0.78-0.83, table 2). Differences in the risk of COVID-19 persisted after adjustment for potential confounders. In fully adjusted models, the multivariable-adjusted HR for COVID-19 risk was 0.91 (95% CI, 0.88-0.94) when we compared participants with high diet quality to those with low diet quality. We observed non-linear decreasing trends in the risk of COVID-19 with higher diet quality (P < 0.001 for non-linearity), in which COVID-19 risk plateau among individuals with a diet quality score > 50 (online supplementary figure 3). The association between diet quality and COVID-19 risk was consistent but attenuated in secondary analyses using the DQS score (HR, 0.92; 95% CI, 0.89-0.95; online supplementary table 5), and became non-significant in fully adjusted models (HR, 1.00; 95% CI, 0.97-1.03). We also investigated whether our primary findings were consistent in an analysis censored to cases that occurred after the completion of the diet survey. These analyses showed that high diet quality, compared to low diet quality, was associated with lower COVID-19 risk (multivariable-adjusted HR 0.88; 95% CI, 0.83-0.93; online supplementary table 6).
In secondary analyses for COVID-19 risk based on a positive test, we showed that crude COVID-19 incidence rates per 10,000 person-months were 12.9 (95% CI 12.2-13.6) for individuals with high diet quality and 16.4 (95% CI 15.5-17.2) for individuals with low diet quality. The corresponding multivariable-adjusted HR for risk of COVID-19 was 0.82 (95% CI, 0.78-0.86; table 2). For risk of severe COVID-19, crude incidence rates were lower for individuals reporting high diet quality compared to those with low diet quality (1.6 (95% CI, 1.3-1.8) vs. 2.1 (95% CI, 1.9-2.5; per 10,000 person-months) table 2). In the fully adjusted model, high diet quality, as compared to low diet quality, was associated lower risk of severe COVID-19 with an a HR of 0.59 (95% CI, 0.47-0.74; table 2).
In stratified analyses, the inverse association between diet quality and COVID-19 risk was more evident in participants living in areas of high socioeconomic deprivation and those reporting low physical activity levels (P < 0.05; table 3). We found no significant effect modification for other characteristics such age, BMI, race/ethnicity or population density. When diet quality and socioeconomic deprivation were combined, there was a risk gradient with low diet quality and high socioeconomic deprivation. Compared with individuals living in areas with low socioeconomic deprivation and high diet quality, the multivariable-adjusted HR for risk of COVID-19 for low diet quality was 1.08 (95% CI, 1.03-1.14) among those living in areas with low socioeconomic deprivation, 1.23 (95% CI, 1.17-1.29) for those living in areas with intermediate socioeconomic deprivation, and 1.47 (95% CI, 1.38-1.52) for those living in areas with high socioeconomic deprivation (figure 1). The joint associations of diet quality and socioeconomic deprivation was higher than the sum of the risk associated with each factor alone (relative excess risk due to interaction (RERI) = 0.05 (95% CI 0.02-0.08); Pinteraction=0.005; online supplementary table 7). The proportion of contribution to excess COVID-19 risk was estimated to be 31.9% (95% CI, 18.2-45.6) to diet quality, 38.4% (95% CI, 26.5-50.3) to socioeconomic deprivation, and 29.7% (95% CI, 2.1-57.3) to their interaction. The absolute excess rate of COVID-19 per 10,000 person-months for lowest vs highest quartile of the diet score was 22.5 (95% CI, 18.8-26.3) among individuals living in areas with low socioeconomic deprivation and 40.8 (95% CI, 31.7-49.8) among individuals living in areas with high deprivation (online supplementary figure 4)
We conducted a series of sensitivity analyses to further account for variation in Rt or mask wearing. For peak Rt censored analyses, crude COVID-19 rates per 10,000 person-months were 148.1 (95% CI, 139.9-156.8) among participants with low diet quality and 92.9 (95% CI, 86.6-99.5) for participants with high diet quality. The corresponding multivariable-adjusted HR was 0.84 (95% CI, 0.76-0.92, figure 2). The same trend was observed for nadir Rt censored analyses, in which crude COVID-19 rates per 10,000 person-months were 67.1 (95% CI, 61.7-73.0) among participants with low diet quality and 45.8 (95% CI, 41.3-50.5) for participants with high diet quality (multivariable-adjusted HR, 0.89; 95% CI, 0.80-1.00, figure 2). We further adjusted our models for mask wearing. This analysis showed that high diet quality, as compared to low diet quality, was associated with lower risk of COVID-19 with an adjusted HR of 0.88 (95% CI, 0.83-0.94; online supplementary table 8).
Discussion
In this large survey among UK and US participants prospectively assessing risk and severity of COVID-19 infection, we found that a dietary patterns characterized by healthy plant foods was associated with lower risk and severity of COVID-19. We observed a risk gradient of poor diet quality and increased socioeconomic deprivation that departed from the additivity of the risks attributable to each factor separately, suggesting that the beneficial association of diet with COVID-19 may be particularly evident among individuals with higher socioeconomic deprivation.
Our findings are aligned with preliminary evidence showing that improving nutrition could help reduce the burden of infectious diseases.12,14,24 Early studies have shown that the administration of arachidonic or linoleic acid partially suppresses SARS-CoV-1 and coronavirus 229E viral replication,25 and that specific nutrients or dietary supplements associate with modest reductions in COVID-19 risk.26 Results from this observational study could expand previous single nutrient observations and highlight the beneficial association of healthy dietary patterns, which was most pronounced for risk of severe COVID-19. Our findings also concur with a comparative risk assessment study suggesting that a 10% reduction in the prevalence of diet-related conditions such as obesity and type 2 diabetes would have prevented ∼11% of the COVID-19 hospitalizations that have occurred among US adults since November 2020.27
The association of healthy diet with lower COVID-19 risk appears particularly evident among individuals living in areas of higher socioeconomic deprivation. Our models estimate that nearly a third of COVID-19 cases would have been prevented if one of two exposures (diet and deprivation) were not present. Although these estimated attributable risks should be interpreted in the context of the population-specific prevalence, and are likely to change over time with the prevailing SARS-CoV-2 infection rate, our observations are consistent with data from ecological studies showing that people living in regions with greater social inequalities are likely to have higher rates of COVID-19 incidence and deaths.28 By generating a granular deprivation index based on zip code information our study adds to previous country-level ecological studies. In addition, recent studies on the impact of socioeconomic status on COVID-19 have shown that community-level deprivation indices are strongly associated with COVID-19 risk and mortality.29,30 However, it is still possible that differences in deprivation exists within communities. Further studies including information about household characteristics, built environment, or access to healthy foods are needed to expand these initial associations.
Our study adds to knowledge by formally investigating how diet quality, in the context of distal social determinants of health, associates with risk and severity of COVID-19. While our study supports the beneficial association of diet quality with COVID-19 risk and severity, particularly among individuals with higher deprivation, we cannot completely rule out the potential for residual confounding. Individuals who eat healthier diets are likely to share other features that might be associated with lower risk of infection such as the adoption of other risk mitigation behaviors, better household conditions and hygiene, or access to care. However, it is reassuring that our findings were consistent despite controlling for additional surrogate markers of SARS-CoV-2 infection such as mask wearing or community transmission rate, two of the most relevant factors associated with virus transmission and COVID-19 risk.31 These findings suggest that efforts to address disparities in COVID-19 risk and severity should consider specific attention to access to healthy foods as a social determinants of health.
We acknowledge several limitations. First, as an observational study, we are unable to confirm a direct causal association between diet and COVID-risk or infer specific mechanisms. Second, our study population was not a random sampling of the population. Although this limitation is inherent to any study requiring voluntary provision of information, we recognize our participants are mainly white participants and less likely to live in low deprived areas and are less ethnically diverse than the general population.19 Thus, generalizability of our finding even to the wider British and American population is uncertain. Third, our results could be biased due to the time lapse between the dietary recalls, administered a few months after the relevant period of exposure (pre-pandemic). However, our sensitivity analyses in which we censored cases that had occurred before the administration of the diet survey showed consistent results. Fourth, the self-reported nature of the diet questionnaire is prone to measurement error and bias, and the use of a short food frequency survey could have further reduced the resolution of dietary data collected. More accurate dietary intake assessment methods such as the use of dietary intake biomarkers would be valuable in future studies,32 but also difficult to implement in large-scale and time-sensitive investigations. Fifth, we defined risk of severe COVID-19 according to reports of hospitalization with oxygen support, which may not have captured more severe or fatal cases.
Conclusions
In conclusion, our data provide evidence that a healthy diet was associated with lower risk of COVID-19 and severe COVID-19 even after accounting for other healthy behaviors, social determinants of health, and virus transmission measures. The joint association of diet quality with socioeconomic deprivation was greater than the addition of the risks associated with each individual factor, suggesting that diet quality may play a direct influence in COVID-19 susceptibility and progression. Our findings suggest that public health interventions to improve nutrition and poor metabolic health and address social determinants of health may be important for reducing the burden of the pandemic.
Data Availability
Data availability statement Zoe Platform data used in this study is available to researchers through UK Health Data Research using the following link: https://web.www.healthdatagateway.org/dataset/fddcb382-3051-4394-8436-b92295f14259. The diet quality data used for this study are held by the department of Twin Research at Kings College London. The data can be released to bona fide researchers using our normal procedures overseen by the Wellcome Trust and its guidelines as part of our core funding. We receive around 100 requests per year for our datasets and have a meeting three times a month with independent members to assess proposals. Application is via https://twinsuk.ac.uk/resources-for-researchers/access-our-data/. This means that the data needs to be anonymized and conform to GDPR standards.
Conflict of Interest
JW, CH, SS, and JC are employees of Zoe Global Ltd. TDS, ERL, SEB, area consultant to Zoe Global Ltd. DAD, JM, and ATC previously served as investigators on a clinical trial of diet and lifestyle using a separate mobile application that was supported by Zoe Global Ltd. Other authors have no conflict of interest to declare.
Contributions
JM, ADJ, LHN, ERL, TDS, SEB, and ATC conceived the study design. JM, ADJ, ERL, MSG, JC, BM, SS contributed to the statistical analysis. All authors were involved in acquisition, analysis, or interpretation of data. JM, LHN, and DAD wrote the first draft of the manuscript. DAD, WCW, SO, CJS, JW, PWF, TDS, SEB, ATC obtained funding. JM, ADJ, LHN provided administrative, technical, or material support. TDS, SEB, ATC jointly supervised this work. All authors contributed to the critical revision of the manuscript for important intellectual content and approved the final version of the manuscript. The corresponding authors attest that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Data availability statement
Zoe Platform data used in this study is available to researchers through UK Health Data Research using the following link:
https://web.www.healthdatagateway.org/dataset/fddcb382-3051-4394-8436-b92295f14259. The diet quality data used for this study are held by the department of Twin Research at Kings’ College London. The data can be released to bona fide researchers using our normal procedures overseen by the Wellcome Trust and its guidelines as part of our core funding. We receive around 100 requests per year for our datasets and have a meeting three times a month with independent members to assess proposals. Application is via https://twinsuk.ac.uk/resources-for-researchers/access-our-data/. This means that the data needs to be anonymized and conform to GDPR standards.
Grant Support
National Institutes of Health, American Gastroenterological Association, Massachusetts Consortium on Pathogen Readiness, National Institute for Health Research, Crohn’s and Colitis Foundation, Chronic Disease Research Foundation, UK Medical Research Council, Wellcome Trust, UK Research and Innovation. The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Acknowledgments
We express our sincere thanks to all of the participants who entered data into the app, including study volunteers enrolled in cohorts within the Coronavirus Pandemic Epidemiology (COPE) consortium. We thank the staff of Zoe Global, the Department of Twin Research at King’s College London and the Clinical and Translational Epidemiology Unit at Massachusetts General Hospital for tireless work in contributing to the running of the study and data collection. This work was conducted using the Short Form FFQ tool developed by Cleghorn as reported in https://doi.org/10.1017/S1368980016001099 and listed in the Nutritools (www.nutritools.org) library.