Modeling the relative influence of socio-demographic variables on post-acute COVID-19 quality of life
=====================================================================================================

* Tigist F. Menkir
* Barbara Wanjiru Citarella
* Louise Sigfrid
* Yash Doshi
* Luis Felipe Reyes
* Jose A. Calvache
* Anders Benjamin Kildal
* Anders B. Nygaard
* Jan Cato Holter
* Prasan Kumar Panda
* Waasila Jassat
* Laura Merson
* Christl A. Donnelly
* Mauricio Santillana
* Caroline Buckee
* Stéphane Verguet
* Nima S. Hejazi

## Abstract

**Background** Post-acute sequelae of SARS-CoV-2, referred to as “long COVID”, are a globally pervasive threat. While their many clinical determinants are commonly considered, their plausible social correlates are often overlooked.

**Methods** Here, we use data from a multinational prospective cohort study to compare social and clinical predictors of differences in quality of life with long COVID. We further measure the extent to which clinical intermediates may explain relationships between social variables and quality of life with long COVID.

**Findings** Beyond age and neuropsychological and rheumatological comorbidities, educational attainment, employment status, and female sex were important predictors of long COVID-associated quality-adjusted life days (long COVID QALDs). Furthermore, most of their associations could not be attributed to key long COVID-predicting comorbidities. In Norway, 90% (95% CI: 77%, 100%) of the adjusted association between belonging to the top two quintiles of educational attainment and long COVID QALDs was not explained by these clinical intermediates. The same was true for 86% (73%, 100%) and 93% (80%, 100%) of the adjusted association between full-time employment and long COVID QALDs in the United Kingdom (UK) and Russia, respectively. Additionally, 77% (46%, 100%) and 73% (52%, 94%) of the adjusted associations between female sex and long COVID QALDs in Norway and the UK, respectively, were unexplained by the clinical mediators.

**Interpretation** Our findings highlight that socio-economic proxies and sex are key predictors of long COVID QALDs and that other (non-clinical) mechanisms drive their observed relationships. Importantly, we outline a multi-method, adaptable causal approach for evaluating the isolated contributions of social disparities to experiences with long COVID.

**Funding** UK Foreign, Commonwealth and Development Office; Wellcome Trust; Bill & Melinda Gates Foundation; Oxford COVID-19 Research Response Funding; UK National Institute for Health and Care Research; UK Medical Research Council; Public Health England; Liverpool Experimental Cancer Medicine Centre; Research Council of Norway; Vivaldi Invest A/S; South Eastern Norway Health Authority

## Introduction

Long-term COVID-19 sequelae, referred to as long COVID, have resulted in a pressing public health crisis since early 2020. As defined by the World Health Organization, long COVID encompasses unexplainable symptoms which persist at least three months after an infection, occurring over two or more months.1 Its widespread presence and impacts have been immense: a multinational study found that nearly half of individuals who were previously infected with SARS-CoV-2 went on to experience long-term symptoms around four months post infection2; further, an estimated 59% of previously infected subjects reported a reduced quality of life (QoL).3

Prior work has focused on identifying a myriad of clinical risk factors for post-COVID-19 consequences, including co-infections and pre-existing conditions, non-vaccination, age, and female sex.4–7 Conditions that have been consistently identified as key correlates of long COVID sequelae include obesity, asthma and other pulmonary diseases, chronic cardiac disease, diabetes, and smoking.6,7

Beyond clinical factors, social vulnerabilities are often critical determinants of differential disease burden overall.8–11 Such inequities are attributed to broader challenges in access to health
services and an array of health-threatening exposures, including but not limited to food and housing insecurity, financial discrimination, and air pollution.8–11 While there have been efforts to examine social factors potentially linked to long-term symptoms of COVID-19, findings on these relationships have been somewhat mixed.7,12–17 For instance, a 2021 study in the United Kingdom (UK) found that living in high-deprivation settings was associated with both a higher and lower odds of symptom persistence, depending on the measure of deprivation index used18, and a 2021 study in Michigan (USA) found that lower income was both significantly associated and not associated with long COVID symptoms’ prevalence, depending on the post-illness duration considered.13 It is also important to highlight that many of these studies rely on self-reported binary measures of post-COVID recovery, which do not capture nuanced experiences in recovery and may be differentially classified across demographic groups. Given this context, we aimed to complement efforts centered on uncovering disparities in long COVID outcomes, by leveraging a large dataset from a prospective, observational, multinational study (n=76 countries) of COVID patients with post-infection follow-up data, focusing on Norway, the UK, and Russia. Specifically, we formally assessed the relationship between a diverse group of biological and social exposures, and our long COVID QoL measure, reasoning that factors like socio-economic status (SES) would be as much or more critical risk factors than important comorbidities, as has been recently illustrated for related outcomes, such as “healthy aging”.19 We further evaluated the extent to which clinical intermediates contribute to any observed disparities, hypothesizing that they may only partially explain these differences. 
Our analysis applies a similar mediation-centered lens to that of Vahidy et al.17 and Lu et al.20 However, rather than focusing on evaluating whether various mediators can independently explain the effects of a given social factor on long COVID risk17 or on how much social factors mediate disparities in all-cause mortality20, we measured the degree to which social variables’ associations with long COVID-linked QoL *cannot* be explained by a collective set of comorbidities.

## Methods

This study uses data from the International Severe Acute Respiratory and emerging Infection Consortium’s (ISARIC) multi-cohort consortium.21 This prospective study across 76 countries collected demographic and illness-associated data during acute SARS-CoV-2 infection, with a subset of sites assessing participants any time from one to three months following their infection, and subsequent check-ins at three- to six-month intervals thereafter, depending on the site.22 Recruitment was aimed at clinical settings, among patients who had COVID-19, which included hospitalized and non-hospitalized cases. Complete details on the study design and recruitment procedures can be found in the published follow-up protocol.22

We focused our analysis on countries with data available on SES, age, sex, QoL, *and* comorbidities, with combined demographic, comorbidity, and QoL datasets yielding sample sizes of at least n=1000 subjects. The countries meeting this criterion were Norway (n=1672), the UK (n=1064), and Russia (n=1155). Our study incorporates information on two measures capturing long COVID experiences, self-reported continued symptomatology and post-illness QoL, thereby capturing a spectrum of post-COVID experiences in recovery.

Health utility values were obtained using standard QoL-adjustment estimation procedures, based on subjects’ responses to the EQ-5D-5L survey in follow-up forms, eliciting self-reported rankings of the intensity of problems experienced with each of five dimensions of health (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression)23, prior to their COVID infection and in the present. We subset each cohort to only subjects reporting at least one long COVID-associated symptom that was not present prior to illness. Utility scores were computed following standard practice and then time-transformed (Supplementary Appendix: Methods). Similar to Sandmann et al.24, we used a measure of quality-adjusted life days or QALDs (See Supplementary Appendix: Methods) and additionally chose to focus on QoL at least three months following infection, or QoL in the ‘present’. Issues with recalling experiences several months in the past are likely to result in erroneous measures of pre-COVID QALDs, lending to biased measures of QALD differences. We incorporated covariate data on age, socio-demographic variables, female sex at birth and SES proxies, a set of clinical comorbidities and risk behaviors, COVID-19 severity, antiviral treatment, and vaccination status. For all countries included in our analysis, indicators of SES were selected depending on the data available (quintiles of educational attainment (years) in Norway and pre-illness employment status in the UK and Russia) (Supplementary Appendix Table S2). Finally, for a subset of variables with incomplete data, we used multiple imputation by chained equations to assign values to missing entries (Supplementary Appendix: Methods). The distributions of sex at birth in our sampled populations non-negligibly deviated from those in the underlying Norwegian and British populations. 
Thus, to correct for this binary mis-representativeness, we conducted sensitivity analyses where subjects were assigned post-stratification weights according to the *raking* method (Supplementary Appendix: Methods and Results).25 To identify social predictors of long COVID QALDs, we applied a series of random forest ensemble learners26 for each country, fit to all available clinical and demographic data, where variables were either treated individually (RF #1), pre-grouped based on subject matter knowledge (RF #2), or grouped algorithmically via hierarchical clustering (RF #3). We implemented a pre-grouped procedure, which incorporates subject matter knowledge, as an alternative to model-grouped approaches agnostic to such context. For RF #1 and RF #2, the percent increase in mean squared error (MSE) associated with each variable was reported as a measure of importance, while, for RF #3, the frequency of variable selection was used. To estimate the natural direct effects (NDE) and natural indirect effects (NIE) of binary SES proxies or female sex on long COVID QALDs in each cohort, we applied a flexible semi-parametric statistical approach (Supplementary Appendix: Methods).27 The NDE and NIE arise from a decomposition of the average treatment effect or total effect (TE), as first described by Robins and Greenland.28 In this context, the NDE describes the relationship between a given social variable and long COVID QALDs, operating through all pathways excluding the mediators of interest, while the NIE describes this exposure/outcome relationship *through* the mediators. We define the proportion non-mediated as NDE/the total effect (TE), i.e. the share of the TE of the social variable that cannot be explained by the clinical intermediates. 
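
As a purely arithmetic illustration of the decomposition described above (TE = NDE + NIE, with the proportion non-mediated defined as NDE/TE), the following minimal Python sketch computes a plug-in ratio from point estimates. The function name and the numeric inputs are hypothetical and are not taken from the study's code or results. A plug-in ratio of this kind need not coincide with a proportion estimated directly by a dedicated semi-parametric estimator.

```python
def proportion_non_mediated(nde: float, nie: float) -> float:
    """Plug-in share of the total effect that does not pass through the mediators.

    Assumes the additive decomposition TE = NDE + NIE.
    """
    te = nde + nie
    if te == 0:
        raise ValueError("Total effect is zero; the proportion is undefined.")
    return nde / te


# Hypothetical point estimates (in QALDs), chosen only for illustration.
same_direction = proportion_non_mediated(nde=30.0, nie=10.0)

# When the NDE and NIE have opposite signs, the ratio exceeds 1,
# i.e. the implied proportion mediated (1 - ratio) is negative.
opposite_direction = proportion_non_mediated(nde=-8.0, nie=1.0)

print(f"same direction:     {same_direction:.2f}")
print(f"opposite direction: {opposite_direction:.2f}")
```

The second call shows why a proportion non-mediated above 1 is not by itself an error: it arises whenever the direct and indirect components point in opposite directions.
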
In order for these measures to be interpreted causally, several assumptions are necessary, including exposure and mediator positivity, well-defined exposures and potential outcomes, no interference between study units, and no unmeasured confounding of the exposure-outcome, exposure-mediator, and mediator-outcome relationships.28,29 Rather than make the strict, and possibly untenable, assumption that these criteria for causal identifiability are all met, we eschew any claims about causality, using causal mediation analysis as a framework to arrive at *interpretable, model-free* statistical (i.e., non-causal) target parameters. Thus, the estimates we report reflect adjusted associations, where the NDE, NIE, and proportion non-mediated are used to communicate, in a clearly interpretable manner, how mediators play a role in any observed disparities.

All code, except that used for pre-processing, is publicly available at: [https://github.com/goshgondar2018/social\_long\_covid](https://github.com/goshgondar2018/social_long_covid).

### Role of the funding source

The funders of this study had no role in study design, data collection, data analysis, data interpretation, or manuscript writing.

## Results

### Norway

The median of long COVID QALDs in this cohort was 345 (interquartile range [IQR]: 313-360). There was no broadly consistent trend in long COVID QALDs across quintiles of educational attainment, although the lowest mean QALDs occurred in the bottom two quintiles. The greatest differences in long COVID QALDs, in order of increasing magnitude, occurred between quintiles 3 and 1, 5 and 1, and 4 and 1. Estimated long COVID QALDs among males slightly exceeded those among females (W=265746, p<0.0001). This cohort was the youngest of the three, with a mean age of 51.8 years (SD: 13.6 years). The most commonly reported comorbidity was asthma (22%). Of the total cohort, 50% reported receiving at least one dose of any COVID-19 vaccine.

Among the leading individual predictors of long COVID QALDs from RF #1, anxiety/depression ranked first, followed by educational attainment, rheumatological disorder, and age (Figure 1a). For RF #2, the first and second PCs of the cluster containing all socio-demographic variables, i.e., educational attainment indicators and sex, ranked below the first and second PCs of the cluster containing psychological disorder and chronic neurologic disorder (Figure 1b). RF #3 largely corroborated these orderings, where psychological disorder, rheumatological disorder, chronic neurological disorder, and asthma were the most consistently selected variables within identified important clusters, followed by educational attainment (in years) and a dummy educational attainment indicator for quintile 5 (vs 1) (Figure 1c).

![Figure 1a.](https://www.medrxiv.org/content/medrxiv/early/2024/07/08/2024.02.21.24303099/F1.medium.gif)

Figure 1a. Estimated variable importance measures, i.e. % increase in mean squared error or MSE, from individual random forest implementation (RF #1) for Norway. Variables with negative % MSE values are considered unimportant.

![Figure 1b.](https://www.medrxiv.org/content/medrxiv/early/2024/07/08/2024.02.21.24303099/F2.medium.gif)

Figure 1b. Estimated variable importance measures, i.e. % increase in mean squared error or MSE, from pre-grouped random forest implementation (RF #2) for Norway. Rows indicate cluster names (a full list of variables belonging to each cluster can be found in Supplementary Table S3) and corresponding principal components, if the cluster consists of multiple variables. PC1 denotes principal component 1 and PC2 denotes principal component 2.

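
To make the "% increase in MSE" importance measure in these captions concrete, the following is a minimal sketch of permutation importance for an arbitrary fitted predictor, taking the description literally as a percent change in MSE after shuffling one variable. It is an illustration under stated assumptions, not the authors' implementation: the toy data, the `predict` function, and the helper names are hypothetical, and random forest software typically computes this quantity on out-of-bag samples and may scale it differently.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted outcomes."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def pct_increase_mse(predict, rows, y, col, n_repeats=20, seed=0):
    """Average percent increase in MSE after shuffling column `col`.

    `rows` is a list of tuples; `predict` maps one row to a prediction.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        values = [r[col] for r in rows]
        rng.shuffle(values)  # break the link between this variable and the outcome
        permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, values)]
        permuted_mse = mse(y, [predict(r) for r in permuted])
        increases.append(100.0 * (permuted_mse - baseline) / baseline)
    return mean(increases)


# Hypothetical toy data: the outcome depends on x0 only; x1 is pure noise.
data_rng = random.Random(1)
rows = [(data_rng.uniform(0, 10), data_rng.uniform(0, 10)) for _ in range(200)]
y = [3.0 * x0 + data_rng.gauss(0, 1) for x0, _ in rows]


def predict(row):
    """Stand-in for a fitted model that uses only the first variable."""
    return 3.0 * row[0]


imp_x0 = pct_increase_mse(predict, rows, y, col=0)
imp_x1 = pct_increase_mse(predict, rows, y, col=1)
print(f"x0: {imp_x0:.1f}% increase in MSE")
print(f"x1: {imp_x1:.1f}% increase in MSE")
```

In this sketch the irrelevant variable scores exactly zero because the stand-in predictor ignores it. With a fitted forest evaluated on held-out data, shuffling an uninformative variable can lower the error slightly by chance, which is how negative importance values arise.
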
![Figure 1c.](https://www.medrxiv.org/content/medrxiv/early/2024/07/08/2024.02.21.24303099/F3.medium.gif)

Figure 1c. Number of times (frequency) each variable appears in clusters selected for each CoV-VSURF run (RF #3) for Norway.

We estimated that falling in the top two quintiles of educational attainment was significantly associated with 12.3 (6.49, 18.2) additional long COVID QALDs, on average, via the NDE, and 0.67 (−0.982, 2.32) additional long COVID QALDs, on average, via the NIE (non-significant), corresponding to a proportion non-mediated of 0.897 (0.773, 1). That is, 89.7% of the adjusted association between high educational attainment and long COVID QALDs could not be explained by the included mediators and must thus be attributed to other mechanisms. The exact relationship between the NDE/NIE and proportion non-mediated may not hold for computational reasons, because CIs are estimated separately for each of the measures using cross-validation, which may introduce technical noise due to the randomness inherent in sample splitting. We obtained consistent directionality in findings for pairwise comparisons of quintiles 3 and 1, 4 and 1, and 5 and 1, with the greatest proportion non-mediated for the quintile 5 versus 1 comparison. However, we note that such pairwise comparisons warrant multiple testing corrections before any inferential claims can be made.

A clear and statistically significant negative association was also observed between female sex and long COVID QALDs, with an estimated NDE of −6.79 (−12.8, −0.723), NIE of −3.05 (−5.89, −0.215), and proportion non-mediated of 0.773 (0.455, 1).

### UK

The median (IQR) of long COVID QALDs was 295 (233, 342). Employment status was markedly skewed towards full-time employment (51%), followed by retirement (30%), part-time employment (10%), and unemployment (7%).
The least represented categories were students (0.6%) and the furloughed (0.5%). Estimated long COVID QALDs were greatest among participants who reported being furloughed, students, or full-time employees and lowest among those in the unemployed and retired categories. Estimated long COVID QALDs were also slightly higher among males (W=104752, p<0.0001). This cohort was skewed towards older adults (mean (SD): 59.0 (12.6) years) and the most commonly reported comorbidity was hypertension (36%).

Employment status was the leading predictor for long COVID QALDs in the UK, followed by psychological disorder, age, employment status category, chronic neurological disorder, and rheumatological disorder, based on RF #1 (Figure 2a). Sex followed in the rankings, which, along with the acute COVID-19 severity indicator, fell among the top ten predictors (Figure 2a). RF #2 further supported the predictive role of employment status and sex as a group, with the PCs of the socio-demographic variables leading, closely followed by the PCs of the group of mental health and neurological disorders (Figure 2b). Age alone ranked highly, even in comparison to grouped factors (Figure 2b). Findings from RF #3 aligned well with these results, with age, chronic neurological disorder, employment status indicators, and psychological disorder the most commonly selected variables, followed by rheumatological disorder, across key clusters (Figure 2c).

![Figure 2a.](https://www.medrxiv.org/content/medrxiv/early/2024/07/08/2024.02.21.24303099/F4.medium.gif)

Figure 2a. Estimated variable importance measures, i.e. % increase in mean squared error or MSE, from individual random forest implementation (RF #1) for the UK. Variables with negative % MSE values are considered unimportant.

![Figure 2b.](https://www.medrxiv.org/content/medrxiv/early/2024/07/08/2024.02.21.24303099/F5.medium.gif)

Figure 2b. Estimated variable importance measures, i.e. % increase in mean squared error or MSE, from pre-grouped random forest implementation (RF #2) for the UK. Rows indicate cluster names (a full list of variables belonging to each cluster can be found in Supplementary Table S3) and corresponding principal components, if the cluster consists of multiple variables. PC1 denotes principal component 1 and PC2 denotes principal component 2.

![Figure 2c.](https://www.medrxiv.org/content/medrxiv/early/2024/07/08/2024.02.21.24303099/F6.medium.gif)

Figure 2c. Number of times (frequency) each variable appears in clusters selected for each CoV-VSURF run (RF #3) for the UK.

Increased income/job stability, as proxied by employment status, was consistently associated with increased long COVID QALDs, irrespective of the binary designation. We found that self-reported full-time employment versus any other employment status category (excluding retirement) was associated with, on average, 31.7 (14.2, 49.3) more long COVID QALDs, via the NDE, and 4.90 (−0.0652, 9.86) more long COVID QALDs, via the NIE (narrowly non-significant), with a proportion non-mediated of 0.862 (0.729, 0.996). We obtained an even stronger and significant relationship between self-reported full-time employment versus unemployment, with, on average, 79.5 (50.0, 109) more long COVID QALDs among the full-time employed relative to the unemployed via the NDE and 9.50 (1.04, 18.0) more long COVID QALDs among the full-time employed versus unemployed via the NIE.
The proportion non-mediated was 0.905 (0.829, 0.981), suggesting that around 90.5% of the adjusted association between full-time employment (versus unemployment) and long COVID QALDs does not operate through the considered mediators.

Female sex was associated with lower expected long COVID QALDs, with an NDE and NIE of −24.2 (−37.8, −10.7) and −9.61 (−16.3, −2.95), respectively. The corresponding proportion non-mediated was the lowest observed among all contrasts, with 72.9% (51.9%, 93.5%) of the adjusted association between female sex and long COVID QALDs being unexplained by the clinical intermediates.

### Russia

The median (IQR) of long COVID QALDs was 353 (334-365). Employment status was markedly skewed towards full-time employment (55%) and retirement (39%), with the unemployed, part-time employees, and carers (2% each) and students (0.17%) less represented. Long COVID QALDs were highest among students, part-time employees, full-time employees, and carers, and lowest among those in the retired and unemployed categories. Males reported higher long COVID QALDs than females (W=132896, p<0.0001). The mean age of participants was 59.6 years (SD: 14.4 years) and hypertension was the most frequently reported comorbidity (59%).

According to RF #1, age, followed by employment status indicators, hypertension, and chronic neurological disorder, outranked all other variables in predicting long COVID QALDs in this cohort (Figure 3a). RF #2 generally supported these findings. The cluster containing solely age led the rankings, followed by the principal components (PCs) of the cluster containing the socio-demographic variables, the first PC of the cluster containing hypertension and other cardiac disease, and the first PC of the cluster containing dementia and chronic neurological disorder (Figure 3b).
Similarly, for RF #3, age, other chronic cardiac disease, chronic neurological disorder, as well as dementia, employment status indicators, hypertension, rheumatological disorder, and sex led the set of most frequently selected variables (Figure 3c).

![Figure 3a.](https://www.medrxiv.org/content/medrxiv/early/2024/07/08/2024.02.21.24303099/F7.medium.gif)

Figure 3a. Estimated variable importance measures, i.e. % increase in mean squared error or MSE, from individual random forest implementation (RF #1) for Russia. Variables with negative % MSE values are considered unimportant.

![Figure 3b.](https://www.medrxiv.org/content/medrxiv/early/2024/07/08/2024.02.21.24303099/F8.medium.gif)

Figure 3b. Estimated variable importance measures, i.e. % increase in mean squared error or MSE, from pre-grouped random forest implementation (RF #2) for Russia. Rows indicate cluster names (a full list of variables belonging to each cluster can be found in Supplementary Table S3) and corresponding principal components, if the cluster consists of multiple variables. PC1 denotes principal component 1 and PC2 denotes principal component 2.

![Figure 3c.](https://www.medrxiv.org/content/medrxiv/early/2024/07/08/2024.02.21.24303099/F9.medium.gif)

Figure 3c. Number of times (frequency) each variable appears in clusters selected for each CoV-VSURF run (RF #3) for Russia.

Full-time employment was associated with higher long COVID QALDs compared to all other employment status categories. An estimated 12.9 (95% CI: 5.28, 20.5) more long COVID QALDs were expected among subjects self-reporting full-time employment compared to all other employment categories, via the NDE, which was significant.
An additional 4.03 (−1.37, 1.56) long COVID QALDs were expected among subjects self-reporting full-time employment compared to all other employment categories, via the NIE. The proportion non-mediated was estimated to be 0.93 (0.80, 1).

Female sex was associated with lower long COVID QALDs, with an estimated NDE of −7.49 (−13.3, −1.69) and NIE of −0.0547 (−2.02, 1.91). The estimated proportion mediated was negative, which corresponds to a proportion non-mediated exceeding 1. This result is intuitive given our estimates of the NIE, where the upper bound of the CI fell markedly above 0, indicating insufficient evidence in favor of a *negative* NIE. In other words, the NDE and NIE for female sex may act in opposite directions, i.e. mediators have an opposing intermediary effect, in this cohort.

## Discussion

In this study, we provided a quantitative assessment of the extent to which social factors, compared to commonly highlighted clinical conditions, may relate to varying experiences with long COVID. The data provided compelling evidence that specific categories of pre-existing comorbidities (namely neurological, psychological, and rheumatological) and age are major predictors of long COVID-associated QoL. Of note, we observed that educational attainment or employment status and sex at birth were generally as predictive of long COVID QALDs as these clinical factors, or more so. Our mediation analyses further suggest that not only are indicators of social disadvantage highly predictive of lower long COVID QALDs, but also that the connection between these variables and long COVID QALDs could only be partially explained by key long COVID-predicting comorbidities.
This general finding, i.e., that disparities are not solely attributable to underlying differences in comorbidity rates across various demographic groups, has been validated in other studies conducted over the course of the pandemic.12,17,30 Our study benefited from the use of a sizable and multi-national cohort with long-term QoL data, providing more nuanced information on post-acute COVID-19 experiences beyond simply whether patients experienced long COVID symptoms. Given the reasonably large sample sizes of our study cohorts, we were able to apply data-adaptable machine learning tools, including recent developments in causal machine learning.27 The variable selection methods we used avoid strict modeling assumptions and further accommodate inherent variable groupings. The causal mediation approaches we applied integrate flexible, but simultaneously relatively precise, algorithms27, providing a promising alternative to strictly parametric approaches.31 There are several important limitations of our analysis. First, we note that we did not have information on subject-specific duration of long COVID, and instead assumed a uniform duration, consistent with examples in the literature.32,33 Additionally, we were unable to examine the varying roles of the different social and clinical factors on *changes* in QoL, due to the aforementioned issues with recall. However, to assess whether differences in post-COVID QALDs between socioeconomic groups and self-reported sex were simply artifacts of baseline QoL differences, we compared pre-COVID QoL scores across groups. While these measures may be biased, they can provide a partial lens to any differences in baseline QoL. We found that for Norway and Russia, variability in estimated QALDs across socioeconomic categories and sex was much higher post-COVID, suggesting that underlying (non-COVID attributed) heterogeneity in QoL in these groups did not fully drive the differences we observed. 
The UK cohort had pre-COVID QoL measures for only 26% of subjects, so we could not make a standardized comparison. For future work, it is imperative to collect information on QoL measures at all stages of illness, not simply ex post facto, by positioning readily implementable study protocols at the outset of an outbreak.

It can also be argued that survivorship bias34 might affect our results, as only subjects who completed a follow-up survey at any follow-up interval can have their QoL measures recorded. Those lost to follow-up due to death from early myocardial infarction, vascular strokes, etc., are also likely to have had a reduced QoL prior to this event. However, no participant in any of the three cohorts died at any point during follow-up.

While participant recruitment was extensive for each of our cohorts of focus, filtering the full multi-country dataset to subjects with available demographic, comorbidity, and QoL data led to sharply reduced sample sizes, especially for middle-income countries. Relatedly, variables like vaccination status and antiviral treatment were generally sparsely recorded. Thus, it is crucial to prioritize both data unification procedures that enable the synchronous collection of the range of variables considered and the collection of data in less resource-rich/high-attention settings.

Additionally, the marked underrepresentation of racial/ethnic minorities in the UK and Norway cohorts, and the lack of race/ethnicity data and relevance of this construct in the Russian cohort, prevented us from investigating the impacts of race/ethnicity. It is crucial that future studies seek to recruit more balanced cohorts in this respect, and, where applicable, address constraints with documenting such information.

Furthermore, we were limited to specific socio-economic variables that may not fully reflect participants’ levels of socio-economic deprivation, which further varied by country.
Thus, we can only draw conclusions about the role of socio-economic status in relation to the specific measures defined for each country. For future work, it would be useful to emphasize the collection of more proximate *shared* indicators of socio-economic status. Finally, we were limited to data on sex at birth, which does not capture important gender-based disparities that exist beyond this binary.35

Despite these challenges, the central aim of this analysis was to outline a robust statistical and causal-analytic framework for highlighting the contribution of social disparities to chronic ill-health. Our framework can be used to compare a collective of diverse variables, grouping related factors when necessary, as predictors for worsened post-acute COVID-19 QoL and to distill the unique role of any social variable of interest. Our data highlight the multifactorial relationship between pre-existing risk factors and socio-economic factors and long COVID QoL. As such, we demonstrate that accounting for social vulnerabilities when evaluating determinants of post-acute COVID-19 trajectories is essential and that studies and interventions focusing solely on clinical targets may not be sufficient. Conversely, transformational societal interventions that can address disease exposures, access to care, education, employment, and other social determinants of health have the opportunity to lead to potentially more comprehensive benefits and improve overall well-being in marginalized communities.

## Supporting information

Supplementary Appendix [supplements/303099_file02.pdf]

## Data Availability

The data that underpin this analysis are highly detailed clinical data on individuals hospitalised with COVID-19. Due to the sensitive nature of these data and the associated privacy concerns, they are available via a governed data access mechanism following review by a data access committee.
Data can be requested via the IDDO COVID-19 Data Sharing Platform (http://www.iddo.org/covid-19). The Data Access Application, Terms of Access and details of the Data Access Committee are available on the website. Briefly, the requirements for access are a request from a qualified researcher working with a legal entity that has a health and/or research remit, and a scientifically valid reason for data access which adheres to appropriate ethical principles. The full terms are at: https://www.iddo.org/document/covid-19-data-access-guidelines. A small subset of sites that contributed data to this analysis have not agreed to pooled data sharing as above. In the case of requiring access to these data, please contact the corresponding author in the first instance, who will look to facilitate access. All code (with the exception of code used to process the individual datasets) is publicly available at: [https://github.com/goshgondar2018/social\_long\_covid](https://github.com/goshgondar2018/social_long_covid).

## Funding Statement

TFM acknowledges support from NIH Training Grant 2T32AI007535. LFR was funded by Universidad de La Sabana (MED-309-2021). MS has been funded (in part) by contracts 200-2016-91779 and cooperative agreement CDC-RFA-FT-23-0069 with the Centers for Disease Control and Prevention (CDC). The findings, conclusions, and views expressed are those of the author(s) and do not necessarily represent the official position of the CDC.
MS was also partially supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number R01GM130668. This work was made possible by the UK Foreign, Commonwealth and Development Office and Wellcome [215091/Z/18/Z, 222410/Z/21/Z, 225288/Z/22/Z and 220757/Z/20/Z], the Bill & Melinda Gates Foundation [OPP1209135], the philanthropic support of the donors to the University of Oxford’s COVID-19 Research Response Fund (0009109), grants from the National Institute for Health and Care Research (NIHR award CO-CIN-01/DH_/Department of Health/United Kingdom), the Medical Research Council (MRC grant MC_PC_19059), and by the NIHR Health Protection Research Unit (HPRU) in Emerging and Zoonotic Infections at University of Liverpool in partnership with Public Health England (PHE) (award 200907), NIHR HPRU in Respiratory Infections at Imperial College London with PHE (award 200927), Liverpool Experimental Cancer Medicine Centre (grant C18616/A25153), NIHR Biomedical Research Centre at Imperial College London (award ISBRC-1215-20013), and NIHR Clinical Research Network providing infrastructure support, the Comprehensive Local Research Networks (CLRNs), Cambridge NIHR Biomedical Research Centre (award NIHR203312), the Research Council of Norway grant no 312780, and a philanthropic donation from Vivaldi Invest A/S owned by Jon Stephenson von Tetzchner to the Norwegian SARS-CoV-2 study, the South Eastern Norway Health Authority and the Research Council of Norway.

## Competing interests

MS has received institutional research funds from the Johnson and Johnson foundation and from Janssen global public health. MS also received institutional research funding from Pfizer.

## ISARIC Clinical Characterization Group

Beatrice Alex, Eyvind W. Axelsen, Benjamin Bach, John Kenneth Baillie, Wendy S.
Barclay, Joaquín Baruch, Husna Begum, Lucille Blumberg, Debby Bogaert, Fernando Augusto Bozza, Sonja Hjellegjerde Brunvoll, Polina Bugaeva, Aidan Burrell, Denis Butnaru, Roar Bævre-Jensen, Gail Carson, Meera Chand, Barbara Wanjiru Citarella, Sara Clohisey, Marie Connor, Graham S. Cooke, Andrew Dagens, John Arne Dahl, Jo Dalton, Ana da Silva Filipe, Emmanuelle Denis, Thushan de Silva, Pathik Dhangar, Annemarie B. Docherty, Christl A. Donnelly, Thomas Drake, Murray Dryden, Susanne Dudman, Jake Dunning, Anne Margarita Dyrhol-Riise, Linn Margrete Eggesbø, Merete Ellingjord-Dale, Cameron J. Fairfield, Tom Fletcher, Victor Fomin, Robert A. Fowler, Christophe Fraser, Linda Gail Skeie, Carrol Gamble, Michelle Girvan, Petr Glybochko, Christopher A. Green, William Greenhalf, Fiona Griffiths, Matthew Hall, Sophie Halpin, Bato Hammarström, Hayley Hardwick, Ewen M. Harrison, Janet Harrison, Lars Heggelund, Ross Hendry, Rupert Higgins, Antonia Ho, Jan Cato Holter, Peter Horby, Samreen Ijaz, Mette Stausland Istre, Clare Jackson, Waasila Jassat, Synne Jenum, Silje Bakken Jørgensen, Karl Trygve Kalleberg, Christiana Kartsonaki, Seán Keating, Sadie Kelly, Kalynn Kennon, Saye Khoo, Beathe Kiland Granerud, Anders Benjamin Kildal, Eyrun Floerecke Kjetland, Paul Klenerman, Gry Kloumann Bekken, Stephen R Knight, Andy Law, Jennifer Lee, Gary Leeming, Wei Shen Lim, Andreas Lind, Miles Lunn, Laura Marsh, John Marshall, Colin McArthur, Sarah E. McDonald, Kenneth A. McLean, Alexander J. Mentzer, Laura Merson, Alison M. Meynert, Sarah Moore, Shona C. Moore, Caroline Mudara, Daniel Munblit, Srinivas Murthy, Fredrik Müller, Karl Erik Müller, Nikita Nekliudov, Alistair D Nichol, Mahdad Noursadeghi, Anders Benteson Nygaard, Piero L. Olliaro, Wilna Oosthuyzen, Peter Openshaw, Massimo Palmarini, Carlo Palmieri, Prasan Kumar Panda, Rachael Parke, William A. Paxton, Frank Olav Pettersen, Riinu Pius, Georgios Pollakis, Mark G. Pritchard, Else Quist-Paulsen, Dag Henrik Reikvam, David L. 
Robertson, Amanda Rojek, Clark D. Russell, Aleksander Rygh Holten, Vanessa Sancho-Shimizu, Egle Saviciute, Janet T. Scott, Malcolm G. Semple, Catherine A. Shaw, Victoria Shaw, Louise Sigfrid, Mahendra Singh, Vegard Skogen, Sue Smith, Lene Bergendal Solberg, Tom Solomon, Shiranee Sriskandan, Trude Steinsvik, Birgitte Stiksrud, David Stuart, Charlotte Summers, Andrey Svistunov, Arne Søraas, Emma C. Thomson, Mathew Thorpe, Ryan S. Thwaites, Peter S Timashev, Kristian Tonby, Lance C.W. Turtle, Anders Tveita, Timothy M. Uyeki, Steve Webb, Jia Wei, Murray Wham, Maria Zambon.

[Table 1](http://medrxiv.org/content/early/2024/07/08/2024.02.21.24303099/T1): Summary of demographic variables (excluding SES proxies) and common comorbidities in the final study populations for each cohort, post-missing data imputation.

## Acknowledgments

The investigators thank all the clinical and research staff who performed the follow-up assessments and collected these data, and the participants for their individual contributions in these difficult times. We would also like to thank the Long Covid Support group and ISARIC’s Global Support Centre for their invaluable support. We also acknowledge the support of the COVID clinical management team, AIIMS, Rishikesh, India; the Liverpool School of Tropical Medicine and the University of Oxford; Imperial NIHR Biomedical Research Centre; the dedication and hard work of the Norwegian SARS-CoV-2 study team; and preparedness work conducted by the Short Period Incidence Study of Severe Acute Respiratory Infection. This work uses data provided by patients and collected by the NHS as part of their care and support (#DataSavesLives). The data used for this research were obtained from ISARIC4C.
We are extremely grateful to the 2648 frontline NHS clinical and research staff and volunteer medical students who collected these data in challenging circumstances, and for the generosity of the patients and their families for their individual contributions in these difficult times. The COVID-19 Clinical Information Network (CO-CIN) data were collated by ISARIC4C Investigators. We also acknowledge the support of Jeremy J Farrar and Nahoko Shindo.

*The computations in this paper were run on the FASRC Cannon cluster supported by the FAS Division of Science Research Computing Group at Harvard University.*

## Footnotes

* Latest version of manuscript & supplementary appendix
* Received February 21, 2024.
* Revision received July 7, 2024.
* Accepted July 8, 2024.
* © 2024, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1. World Health Organization. Post COVID-19 condition (Long COVID). (2022).
2. O’Mahoney, L. L. et al. The prevalence and long-term health effects of Long Covid among hospitalised and non-hospitalised populations: a systematic review and meta-analysis. eClinicalMedicine 55, 101762 (2023).
3. Malik, P. et al. Post-acute COVID-19 syndrome (PCS) and health-related quality of life (HRQoL): A systematic review and meta-analysis. J Med Virol 94, 253–262 (2022). doi:10.1002/JMV.27309.
4. Sudre, C. H. et al. Attributes and predictors of long COVID. Nat Med 27, 626–631 (2021). doi:10.1038/s41591-021-01292-y.
5. Thompson, E. J. et al. Long COVID burden and risk factors in 10 UK longitudinal studies and electronic health records. Nat Commun 13, 3528 (2022).
6. Tsampasian, V. et al. Risk Factors Associated With Post-COVID-19 Condition: A Systematic Review and Meta-analysis. JAMA Intern Med 183, 566 (2023).
7. Subramanian, A. et al. Symptoms and risk factors for long COVID in non-hospitalized adults. Nat Med 28, 1706–1714 (2022). doi:10.1038/S41591-022-01909-W.
8. Office of Disease Prevention and Health Promotion, Office of the Assistant Secretary for Health, Office of the Secretary, U.S. Department of Health and Human Services. Social Determinants of Health.
9. Jones, C. P., Jones, C. Y., Perry, G. S., Barclay, G. & Jones, C. A. Addressing the Social determinants of children’s Health: A cliff Analogy. JHCPU 20, 1–12 (2009).
10. Berger, Z., Altiery De Jesus, V., Assoumou, S. A. & Greenhalgh, T. Long COVID and Health Inequities: The Role of Primary Care. Milbank Q 99, 519–541 (2021). doi:10.1111/1468-0009.12505.
11. Bibbins-Domingo, K. Integrating Social Care Into the Delivery of Health Care. JAMA 322 (2019).
12. Shabnam, S. et al. Socioeconomic inequalities of Long COVID: a retrospective population-based cohort study in the United Kingdom. J R Soc Med 116, 263–273 (2023). doi:10.1177/01410768231168377.
13. Hirschtick, J. L. et al. Population-based estimates of post-acute sequelae of SARS-CoV-2 infection (PASC) prevalence and characteristics. Clin Infect Dis 73, 2055–2064 (2021).
14. Müller, S. A. et al.
Prevalence and risk factors for long COVID and post-COVID-19 condition in Africa: a systematic review. Lancet Glob Health 11, e1713–e1724 (2023).
15. Robinson-Lane, S. G. et al. Race, Ethnicity, and 60-Day Outcomes After Hospitalization With COVID-19. J Am Med Dir Assoc 22, 2245–2250 (2021).
16. Naidu, S. et al. The impact of ethnicity on the long-term sequelae of COVID-19: Follow-up from the first and second waves in North London. Thorax 76, A141 (2021).
17. Vahidy, F. S. et al. Racial and ethnic disparities in SARS-CoV-2 pandemic: analysis of a COVID-19 observational registry for a diverse US metropolitan population. BMJ Open 10, e039849 (2020).
18. Park, C., Ayoubkhani, D., et al. Short Report on Long COVID. (2021).
19. Santamaria-Garcia, H. et al. Factors associated with healthy aging in Latin American populations. Nat Med 29, 2248–2258 (2023).
20. Lu, J. et al. Educational inequalities in mortality and their mediators among generations across four decades: nationwide, population based, prospective cohort study based on the ChinaHEART project. BMJ 382, e073749 (2023). doi:10.1136/bmj-2022-073749.
21. ISARIC Clinical Characterization Group et al.
ISARIC-COVID-19 dataset: A Prospective, Standardized, Global Dataset of Patients Hospitalized with COVID-19. Sci Data 9, 454 (2022).
22. International Severe Acute Respiratory and emerging infection Consortium. COVID-19 Long term protocol. [https://isaric.org/research/covid-19-clinical-research-resources/covid-19-long-term-follow-up-study/](https://isaric.org/research/covid-19-clinical-research-resources/covid-19-long-term-follow-up-study/)
23. Euroqol. EQ-5D-5L | About. (2021). [https://euroqol.org/information-and-support/euroqol-instruments/eq-5d-5l/](https://euroqol.org/information-and-support/euroqol-instruments/eq-5d-5l/)
24. Sandmann, F. G. et al. Long-Term Health-Related Quality of Life in Non-Hospitalized Coronavirus Disease 2019 (COVID-19) Cases With Confirmed Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Infection in England: Longitudinal Analysis and Cross-Sectional Comparison With Controls. Clin Infect Dis 75, e962–e973 (2022).
25. DeBell, M. & Krosnick, J.A. Computing Weights for American National Election Study Survey Data. [https://electionstudies.org/wp-content/uploads/2018/04/nes012427.pdf](https://electionstudies.org/wp-content/uploads/2018/04/nes012427.pdf).
26. Breiman, L. Random Forests. Machine Learning 45 (2001).
27. Hejazi, N., Rudolph, K. & Díaz, I. medoutcon: Nonparametric efficient causal mediation analysis with machine learning in R. JOSS 7, 3979 (2022).
28. Robins, J.M. & Greenland, S. Identifiability and Exchangeability for Direct and Indirect Effects. Epidemiology 3 (1992).
29. Benkeser, D., Díaz, I. & Carone, M. Statistical Learning in Mediation Analysis - Chapter 3: Natural direct and indirect effects. (2021).
30. Qeadan, F. et al. Racial disparities in COVID-19 outcomes exist despite comparable Elixhauser comorbidity indices between Blacks, Hispanics, Native Americans, and Whites. Sci Rep 11, 8738 (2021).
doi:10.1038/s41598-021-88308-2.
31. VanderWeele, T. & Vansteelandt, S. Mediation Analysis with Multiple Mediators. Epidemiologic Methods 2 (2014).
32. Mizrahi, B. et al. Long covid outcomes at one year after mild SARS-CoV-2 infection: nationwide cohort study. BMJ 380, e072529 (2023).
33. Cai, J. et al. A one-year follow-up study of systematic impact of long COVID symptoms among patients post SARS-CoV-2 omicron variants infection in Shanghai, China. Emerg Microbes & infect 12, 2220578 (2023).
34. Smith, L. H. Selection Mechanisms and Their Consequences: Understanding and Addressing Selection Bias. Curr Epidemiol Rep 7, 179–189 (2020). doi:10.1007/s40471-020-00241-6.
35. US National Center for Health Statistics. Long COVID: Household Pulse Survey. (2022). [https://www.cdc.gov/nchs/covid19/pulse/long-covid.htm](https://www.cdc.gov/nchs/covid19/pulse/long-covid.htm)