Replicated blood-based biomarkers for Myalgic Encephalomyelitis not explicable by inactivity
============================================================================================

* Sjoerd Viktor Beentjes
* Julia Kaczmarczyk
* Amanda Cassar
* Gemma Louise Samms
* Nima S. Hejazi
* Ava Khamseh
* Chris P. Ponting

## Abstract

Myalgic Encephalomyelitis (ME; sometimes referred to as chronic fatigue syndrome) is a relatively common and female-biased disease of unknown pathogenesis that profoundly decreases patients’ health-related quality-of-life. ME diagnosis is hindered by the absence of robustly-defined and specific biomarkers that are easily measured from available sources such as blood, and unaffected by ME patients’ low level of physical activity. Previous studies of blood biomarkers have not yielded replicated results, perhaps due to low study sample sizes (*n* < 100). Here, we use UK Biobank (UKB) data for up to 1,455 ME cases and 131,303 population controls to discover hundreds of molecular and cellular blood traits that differ significantly between cases and controls. Importantly, 116 of these traits are replicated, as they are significant for both female and male cohorts. Our analysis used semi-parametric efficient estimators, an initial Super Learner fit followed by a one-step correction, three types of mediators, and natural direct and indirect estimands, to decompose the average effect of ME status on molecular and cellular traits. Strikingly, these trait differences cannot be explained by ME cases’ restricted activity. Of 3,237 traits considered, ME status had a significant effect on only one, via the “Duration of walk” (UKB field 874) mediator. By contrast, ME status had a significant direct effect on 290 traits (9%). As expected, these effects became more significant with increased stringency of case and control definition. Significant female and male traits were indicative of chronic inflammation, insulin resistance and liver disease.
Individually, however, significant effects on blood traits were not sufficient to cleanly distinguish cases from controls. Nevertheless, their large number, lack of sex-bias, and strong significance, despite the ‘healthy volunteer’ selection bias of UKB participants, keep alive the future ambition of a blood-based biomarker panel for accurate ME diagnosis.

## 1. Introduction

Physical inactivity accelerates the loss of cardiovascular and strength fitness, shortens healthspan and increases all-cause mortality risk [1, 2, 3]. It lowers insulin sensitivity and elevates the synthesis of triglyceride, ceramide and sphingomyelin in muscle [4]. According to UK National Health Service guidance, exercise is “the miracle cure we’ve all been waiting for” [5]. Nevertheless, exercise is not a universal panacea: it is contraindicated among those with cardiovascular disease, anaemia and hyperthyroidism, for example [6]. A patient might also only accept exercise as treatment if they believe its benefit outweighs its cost [7].

Myalgic encephalomyelitis (ME; also known as chronic fatigue syndrome, CFS) is a disease of unknown pathogenesis defined by post-exertional malaise (PEM), the dramatic worsening of symptoms after even minor mental or physical exertion [8], which usually lasts at least 24 hours, in contrast to other fatiguing illnesses [9]. ME has no cure and no widely effective therapy [10]. About 10% of people experiencing a viral infection (such as with Epstein-Barr virus, Ross River virus or SARS-CoV-2) or a bacterial infection (such as with *Coxiella burnetii*) subsequently present with ME or ME-like symptoms [11, 12]. In addition, over one-third of people with ME report not experiencing an infectious episode preceding their initial symptoms [13, 14]. Full recovery from ME is rare, at about 5% [15]. It is a female-dominant disease, with females outnumbering males by up to five-to-one; females also report more severe symptoms [13, 14].
In common with many female-biased diseases it has a high burden (*e.g.*, in disability-adjusted life years) and low overall research funding [16]. ME is not rare, as it affects 0.19% *−* 0.86% of people in western countries [17, 18]. Individuals with ME commonly report PEM, pain, fatigue, sensitivities to noise, and cognitive and autonomic deficits [13] and a health-related quality of life worse than 20 other conditions [19]. There are no clinical biomarkers for ME. A high priority for people with ME is an accurate and reliable diagnostic test [20]. Findings from dozens of biomarker studies have shown limited reproducibility, perhaps due to their typically low sample sizes, their frequent use of inappropriate statistical tests [21] and the known heterogeneity of ME’s symptoms and potentially aetiology [22]. Whilst cardiopulmonary exercise testing does not initially differentiate between people with ME and control individuals, it does so in a follow-up test one day later [23, 24]. This test, however, is not in common use because it risks triggering PEM. Any clinical biomarker would need to account for individuals’ inactivity relative to the general population. This is because many people with ME do not exercise and often restrict their activity [25] to reduce the risk of subsequent PEM. Some have proposed that it is this avoidance of activity that inhibits recovery by perpetuating ME symptoms following an acute illness [26, 27, 28]. However, therapies based on physical activity or exercise are not effective as a cure [29], implying that ME is instead an ongoing organic illness [30, 31]. It has also been claimed that any physiological abnormalities seen in people with ME might be caused by their inactivity [32]. In this study, we undertake 3 groups of analyses using UK Biobank (UKB) data [33] on (i) 31 blood cell and 30 blood biochemistry phenotypes; (ii) 251 NMR-measured metabolites; and, (iii) 2,923 proteins. 
Specifically, we quantify which blood traits, Nuclear Magnetic Resonance (NMR) metabolomics, and proteomics features are significantly different between ME cases and controls, for males or females, or all combined, controlling for age (and sex for male and female combined analyses). The large UK Biobank data sets for ME cases and controls provided substantial statistical power to evaluate hypotheses, also allowing comparison between male-only and female-only analyses, something that had not been previously achievable. We take advantage of three mediators of sedentary lifestyle to determine whether any molecular or cellular trait associated with ME cases is explicable by physical inactivity.

## 2. Results

### 2.1. Study population: cases and controls

We first defined 1,455 ME cases and 131,303 non-ME population control individuals from the UKB ([33]; see Materials and Methods). For each group of analyses, cases and controls were restricted to those with measurements of 31 blood count and 30 blood biochemistry markers, or 251 NMR metabolites, or 2,923 protein levels, respectively. Collection of these biological samples was contemporaneous with self-reporting of CFS at the first visit to a UKB Assessment Centre (2006-2010). Numbers of samples in each category are shown in Table 1. ME sample sizes for measured outcome in blood traits, NMR metabolites and proteins are shown in Supplementary Fig. S1.

[Table 1.](http://medrxiv.org/content/early/2024/08/28/2024.08.26.24312606/T1) Table 1. Numbers of UKB ME cases or non-ME controls per category.

### 2.2. Molecular and cellular traits significantly associated with ME

We simultaneously quantified two effects of ME case status on molecular and cellular traits, the natural direct effect (NDE) and natural indirect effect (NIE) (Fig. 1A). We needed to control for age and sex because levels of some molecules are known to be age- (*e.g.* HRG protein) and/or sex-dependent (*e.g.* ALT).
NDE and NIE are mediational estimands that decompose the average effect of ME case status on molecular or cellular trait into (a) direct paths – those not involving the mediator (Fig. 1A, green) – and (b) indirect paths – those acting through the mediator (blue) – with level of activity as the mediator variable [34, 35] (see Materials and Methods). ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/08/28/2024.08.26.24312606/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2024/08/28/2024.08.26.24312606/F1) Figure 1. **(A)** Directed Acyclic Graph for ME, taking age and sex as confounders and sedentary lifestyle (physical activity) as a mediator for ME’s effect on molecular and cellular traits. The causes of ME are an unknown variable (red). Therefore, all effect estimators are quantifying an association between ME and molecular or cellular traits and no causal statements are made. The “Age” variable (UKB field 21022) represents age at recruitment to UKB, rather than age of onset or diagnosis of ME. This variable affects the probability of having a ME diagnosis: recovery is minimal (*≈* 5%, [15]), and as they age people are increasingly more likely to be diagnosed with ME. As it also affects the molecular and cellular traits, age is treated as a confounder. **(B)** Venn diagrams displaying the number of significant findings in the males, females, combined and their intersection for NDE, mediator 874. Proteomics data have the smallest sample size (see Table 1) and least power, implying fewer significant results in male and females separately as compared to the combined analysis. As a mediator variable, we first used “Duration of walk” (UKB field 874). As expected, ME cases reported a lower duration of walk (mean: 44.0 mins/day) than controls (55.3 mins/day). 
At a false discovery rate [36] (FDR) *<* 0.05, significant direct effects were found for 36 (of 61 + 2 composite) blood traits, 189 (of 251) NMR metabolites and 65 (of 2,923) proteins (Fig. 1A). All estimates restrict to complete cases, removing individuals with missing trait data in that estimate. For all three analyses, the number of significant NDE results and their intersection in each of the male, female and combined categories, are presented in Fig. 1B. Significant NDEs on molecular and cellular blood traits for females or males or combined are shown in Fig. 2A. NDEs are strongly correlated between females and males (Fig. 2B). Twenty traits are separately significant in the two sexes (Fig. 2A, B) and thus their associations to ME status are independently replicated. A single trait (erythrocyte distribution width, sometimes a sign of anaemia) was also significant with positive NDE for males and negative NDE for females (Fig. 2A). ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/08/28/2024.08.26.24312606/F2/graphic-3.medium.gif) [](http://medrxiv.org/content/early/2024/08/28/2024.08.26.24312606/F2/graphic-3) ![](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/08/28/2024.08.26.24312606/F2/graphic-4.medium.gif) [](http://medrxiv.org/content/early/2024/08/28/2024.08.26.24312606/F2/graphic-4) Figure 2. Associational natural direct effects (NDE) of ME/CFS on molecular and cellular blood traits. **(A)** The sex-stratified analyses are presented in orange (female) and blue (male). For the combined analysis (grey), sex is additionally taken as a confounder. All traits that are significant for the UKB 874 mediator are shown (see Supplementary Table 7 for the UKB 884 and 894 mediators). Effect sizes (left) are plotted for the UKB 874 mediator (“Duration of walks”), for significant estimates (FDR *<* 0.05). Error bars indicate 95% confidence intervals. 
Note that the scale and unit of measurement for each trait (x-axis) are different. The analysis was repeated for the UKB 884 mediator (“Number of days/week of moderate physical activity”) and for the UKB 894 mediator (“Duration of moderate activity”), with the significant results (FDR *<* 0.05) in each category indicated by ‘+’ symbols for positive effects and ‘*−*’ for negative effects. Where there is no symbol, the effect was not significant. Notably, there were no discordant results across the three mediators. All blood trait names are as they appear in the UKB showcase, aside from TyG and TG-to-HDL-C ratio (indicated by *), which are composite measures of other blood traits. **(B)** Blood trait NDE z-scores, males (x-axis), females (y-axis). The Pearson correlation is 0.67 and significant. The red dots represent 14 blood traits that are significant in both males and females (FDR *<* 0.05). The yellow and blue dots represent blood traits that are significant in females only and males only, respectively (FDR *<* 0.05). The grey dots are significant in neither group while controlling FDR *<* 0.05. **(C)** Raw data empirical cumulative distribution functions (ECDFs) for TyG (top) and TG-to-HDL-C ratio (bottom), comparing controls (black) and cases (female on the left, male on the right).

Among the 20 significantly associated traits for females and for males were traits indicative of chronic inflammation (elevated C-reactive protein [CRP] and cystatin C levels, and leukocyte and neutrophil counts), insulin resistance (elevated triglycerides-to-HDL cholesterol [TG-to-HDL-C] ratio, alanine aminotransferase [ALT], alkaline phosphatase [ALP] and gamma glutamyltransferase [GGT]), and liver disease (elevated ALT, ALP and GGT, and low urea levels) (Fig. 2A). Fig. 2C illustrates the shifts in two measures of insulin resistance, the TyG index [37, 38] (top) and TG-to-HDL-C ratio [39] (bottom), between ME cases and controls.
These are the UKB raw data, rather than results from mediation analysis. Strikingly, for the UKB 874 mediator, significant effects on ME case status were abundant for direct effects (*i.e.*, NDE; Fig. 2A; Fig. S2), but occurred only once (mean corpuscular haemoglobin; adjusted *p* = 0.043) for indirect effects (*i.e.*, NIE; Fig. 3). For every other one of the 61 + 2 composite blood traits, for females or males or both sexes combined, none was significant when controlling the FDR at *≤* 0.05 (Fig. 3). Results from applying two other mediators (Fig. 2A and Fig. 3) are presented later. ![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/08/28/2024.08.26.24312606/F3.medium.gif) [Figure 3.](http://medrxiv.org/content/early/2024/08/28/2024.08.26.24312606/F3) Figure 3. Associational natural indirect effects (NIE) of ME/CFS on molecular and cellular blood traits. The sex-stratified analyses are presented in orange (female) and blue (male). For the combined analysis (grey), sex is additionally taken as a confounder. All traits that are significant for UKB mediator 884 are shown. This is the mediator with the most number of significant indirect effects. UKB mediator 874 has a single significant NIE (mean corpuscular haemoglobin for females) after FDR. UKB mediator 894 has no significant NIEs after FDR. Effect sizes are plotted for UKB mediator 884 “Number of days/week of moderate physical activity”, for significant estimates (FDR *<* 0.05). Error bars indicate 95% confidence intervals. Two other mediators, 874 “Duration of walks”, and 894 “Duration of moderate activity” do not yield significant NIEs (FDR *<* 0.05). Note that the scale and unit of measurement for each trait (x-axis) are different. Significant results (FDR *<* 0.05) for mediator 884 are indicated by ‘+’ for positive effects and ‘*−*’ for negative effects. Where there is no symbol, the effect was not significant. 
All blood trait names are as they appear in the UKB showcase, aside from TyG and TG-to-HDL-C ratio (indicated with *) which are composite measures of other blood traits. ### 2.3. Metabolite traits significantly associated with ME Of 251 NMR metabolite traits, 189 (75%) were significantly associated with ME status in an NDE analysis with females only (68 traits) or males only (10 traits) or in both the females only and males only analyses (96 traits) (UKB 874 mediator; Fig. 1B, Supplementary Table 4). Significant traits were mostly lipid levels, involving lipoproteins, cholesterol, and triglycerides. Results were highly concordant between females only and males only analyses (Fig. 4A and B) indicating that ME-specific blood metabolite differences are, again, generally not sex-biased. Previous ME/CFS metabolomic biomarker studies used one and three orders-of-magnitude fewer cases and controls, respectively [21]. The largest among these identified lowered phosphatidylcholines and cholines in blood from ME cases ([40], see also [41]), results that we replicated here (Fig. 4A). Higher triglycerides and lower HDL cholesterol in ME cases, observed using UKB enzymatic assays (Fig. 2A), were also observed as significant in the NMR metabolomics assays (Fig. 4A). Of 9 amino acids measured, only alanine was significantly elevated, and then only in female ME cases. Blood pyruvate and lactate, previously predicted to be ME biomarkers [42, 43], were also not significantly different between cases and controls. ![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/08/28/2024.08.26.24312606/F4.medium.gif) [Figure 4.](http://medrxiv.org/content/early/2024/08/28/2024.08.26.24312606/F4) Figure 4. Associational natural direct effects (NDE) of ME/CFS on NMR metabolites. **(A)** The sex-stratified analyses are presented in orange (female) and blue (male). For the combined analysis (grey), sex is additionally taken as a confounder. 
Eighteen of 184 traits are shown; results for all traits are provided in Supplementary Table 4. Effect sizes are plotted for mediator 874 “Duration of walks” for significant estimates (FDR *<* 0.05). Error bars indicate 95% confidence intervals. Note that the scale and unit of measurement (X-axis) are different for each metabolite. Asterisks (right) indicate effects that are significant (FDR *<* 0.05). Where there is no asterisk, the effect was not significant. There were no discordant results across the three analyses. All NMR metabolite names are as they appear in the UKB showcase. **(B)** NMR NDE values are strongly concordant between the two sexes. Shown are per-metabolite z-scores for males (x-axis) and females (y-axis). The Pearson correlation is 0.8 and significant. Red dots indicate metabolites that are significant in both males and females (FDR *<* 0.05). Yellow and blue dots represent metabolites that are significant in females only and males only, respectively (FDR *<* 0.05). Grey dots are significant in neither. None of the 251 metabolite traits was significant when controlling the FDR at *≤* 0.05 for indirect effects using the “Duration of walk” (UKB 874) mediator, for females or for males or for both combined (Fig. S2B). ### 2.4. Proteomic traits significantly associated with ME Repeating this NDE analysis using the UKB 874 mediator on levels of 2,923 proteins, measured using antibody-based assays, yielded only a single protein, extracellular superoxide dismutase or SOD3, whose abundance was significantly altered (FDR *<* 0.05) between cases and controls in both females and males. Relative to preceding analyses, this proteomic analysis is under-powered owing to there being fewer cases for whom data was available (Table 1) and its larger multiple testing burden. Implications of this association to SOD3 are unclear, although superoxide, SOD3’s substrate, is known to modulate the hyperalgesic response [44]. 
Male- or female-specific effects for the same protein are again correlated (Fig. 5; Supplementary Table 5). Considering all cases combined, 54 proteins are significant (FDR *<* 0.05; Figure 1B). Among these are 7 complement proteins (C1RL, C2, CFB, CFH, CFI, CFP and CR2) of the innate immune system, whose levels are all elevated in ME cases, including CR2 (complement C3d receptor 2), the receptor for Epstein-Barr virus (EBV) binding on B and T lymphocytes. Two of the up-regulated proteins (CDHR2 and CDHR5) together form the extracellular portion of the intermicrovillar adhesion complex, whose disruption leads to intestinal dysfunction and inflammatory bowel disease [45, 46]. ME cases also show increased levels of leptin (LEP), which has a role in energy homeostasis [47]. Again, not a single protein among the 2,923 yielded a significant NIE estimate for this mediator (Fig. S2B).

![Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/08/28/2024.08.26.24312606/F5.medium.gif)

[Figure 5.](http://medrxiv.org/content/early/2024/08/28/2024.08.26.24312606/F5) Figure 5. Protein NDE z-scores, males (x-axis), females (y-axis). The Pearson correlation is 0.26 and significant. The red dot represents the single protein (SOD3) that is significant in both males and females (FDR *<* 0.05). Yellow and blue dots indicate proteins that are significant in females only and in males only, respectively (FDR *<* 0.05). Grey dots show proteins that are significant in neither (*i.e.*, FDR *≥* 0.05).

![Figure 6.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/08/28/2024.08.26.24312606/F6.medium.gif)

[Figure 6.](http://medrxiv.org/content/early/2024/08/28/2024.08.26.24312606/F6) Figure 6. **(A)** Blood trait total effect z-scores, males (x-axis), females (y-axis). The Pearson correlation is 0.86 and significant. The red dots represent 20 blood traits that are significant in both males and females (FDR *<* 0.05).
The yellow and blue dots represent blood traits that are significant in females only and males only, respectively (FDR *<* 0.05). The grey dots are significant in neither for FDR *<* 0.05. The *x* = *y* line indicates the line of equal z-scores for males and females. In general, in absolute value, the z-scores are higher for females than males. This is to be expected as the sample size is larger for females. **(B)** Metabolite total effect values are strongly concordant between the two sexes. Shown are per-metabolite z-scores for males (x-axis) and females (y-axis). The Pearson correlation is 0.91 and significant. Red dots indicate metabolites that are significant in both males and females (FDR *<* 0.05). Yellow and blue dots represent metabolites that are significant in females only and males only, respectively (FDR *<* 0.05). Grey dots are significant in neither. **(C)** Protein total effect z-scores, males (x-axis), females (y-axis). The Pearson correlation is 0.33 and significant. Red dots represent the proteins (LEP, CDHR5, ADH4, RTN4R) that are significant in both males and females (FDR *<* 0.05). Yellow and blue dots represent proteins that are significant in females only and males only, respectively (FDR *<* 0.05). The grey dots are significant in neither for FDR *<* 0.05.

### 2.5. Total effects

We have shown above that direct effects dominate, so that indirect effects contribute little-or-nothing to molecular and cellular effects. In real-world settings, the quantity of most interest to clinicians will be the total effect (TE), accounting for age and sex, rather than the direct effect. Estimating the total effect for 63 blood traits finds 39 to be significant (FDR *<* 0.05) predictors of ME case status for females and males combined (Supplementary Fig. S2A). The traits that are robustly predictive of ME are those shown in Fig. 2A (with 4 exceptions: erythrocyte_distribution_width, apoliprotein_b, creatinine and ldl_direct).
For one or more of female- or male-specific or combined TE analyses, a total of 251 proteins and 216 metabolites were additionally significant (FDR *<* 0.05; Supplementary Fig. S2A). Significantly enriched Gene Ontology (GO) terms for TE-significant proteins highlighted tumour necrosis factor (TNF) and interleukin-4 (IL4) production, and natural killer (NK) cell mediated cytotoxicity (Fig. S3). Nevertheless, TNF and IL4 proteins themselves were not significantly altered in abundance. Impaired NK cell cytotoxicity in ME/CFS, however, is one of the few cellular or molecular biomarkers that has often been replicated [48].

### 2.6. Sensitivity analyses for blood traits

Next, we investigated whether blood trait results replicate for 2 further mediators: “Number of days/week of moderate physical activity 10+ mins” (UKB field 884) and “Duration of moderate activity” (UKB field 894) questionnaire responses. As expected, ME cases reported less activity than controls: mean 2.77 vs 3.51 days/week, and 53.9 vs 60.0 mins/day for mediators 884 and 894, respectively. As before, significant effects on ME status were observed for direct effects, but never for indirect effects, for the “Duration of moderate activity” mediator (UKB field 894) (Fig. 2A, Fig. 4). By contrast, for the “Number of days/week of moderate physical activity 10+ mins” (UKB field 884) mediator, 22 significant NIEs were identified: 14 (0.4%) and 8 (0.2%) traits at FDR *≤* 0.05 for combined female and male, and female-only data, respectively. This is an order of magnitude fewer indirect-effect findings than the 290 (9% of all traits investigated) identified for direct effects using the UKB 874 (“Duration of walks”) mediator. Importantly, even when significant NIEs are found, they almost always contribute less to the total effect than NDEs (Fig. 7).
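
The comparison of NDE and NIE contributions can be made concrete with a minimal sketch. It assumes the additive decomposition TE = NDE + NIE; the function name and the numeric inputs are hypothetical illustrations and are not values from this study.

```python
def effect_fractions(nde: float, nie: float) -> tuple[float, float]:
    """Return (NDE / TE, NIE / TE), assuming TE = NDE + NIE."""
    total = nde + nie
    if total == 0:
        raise ValueError("total effect is zero; fractions are undefined")
    return nde / total, nie / total


# Hypothetical point estimates for a single trait (not taken from the paper).
direct_share, indirect_share = effect_fractions(nde=3.0, nie=1.0)
print(f"direct share: {direct_share:.2f}, indirect share: {indirect_share:.2f}")
```

When the two effects have opposite signs, these shares can fall outside the interval [0, 1], so they are best read alongside the confidence intervals of the underlying estimates.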
![Figure 7.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/08/28/2024.08.26.24312606/F7.medium.gif)

[Figure 7.](http://medrxiv.org/content/early/2024/08/28/2024.08.26.24312606/F7) Figure 7. Associational NDE (blue) and NIE (red) as a fraction of the total effect for the effect of ME/CFS on molecular and cellular blood traits. The results are presented for male and female combined, for mediator 884 “Number of days/week of moderate physical activity”, the only mediator that exhibits indirect effects. Across all 61 blood traits, and the two composite metrics TyG and TG-to-HDL-C ratio, only 1 feature, Urate, has a larger NIE than NDE, for this mediator only.

We additionally investigated the dependence of results on the choice of fitting algorithm(s) for blood traits. Specifically, the results in Fig. 2A were obtained using a cross-validated library of algorithms (Super Learner (SL), see Materials and Methods). Results obtained with no SL – reducing the library to the baseline GLMnet – with mediator 874, for TE, NDE and NIE are provided in Supplementary Table 9. Although we recommend its use, leaving out the SL has only a minor effect: 36 of 39 significant TE blood traits using UKB field 874 as mediator with the SL were also significant without its use (Supplementary Table 9). Full NDE and NIE values for all mediators with SL are provided in Supplementary Table 7. For TEs, 41 blood traits (as well as TyG and TG-to-HDL-C ratio) differ significantly between female or male ME cases and controls (Supplementary Table 3). To test whether extreme values affect these results, we winsorized the blood trait data at 0.5% and 1%. The results on the combined dataset are presented in Supplementary Fig. S4 and Supplementary Table 8, and remain robust.
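
As an illustration of the winsorization step, the sketch below clips a trait vector at a chosen tail fraction. It assumes symmetric, two-sided winsorization (the same fraction in each tail), which the text does not specify; the function name and the simulated data are hypothetical.

```python
import numpy as np


def winsorize(values, fraction):
    """Clip values below the `fraction` quantile and above the
    (1 - fraction) quantile; missing values (NaN) are left as NaN."""
    x = np.asarray(values, dtype=float)
    lower, upper = np.nanquantile(x, [fraction, 1.0 - fraction])
    return np.clip(x, lower, upper)


# Simulated right-skewed trait, standing in for a blood measurement.
rng = np.random.default_rng(0)
trait = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

for fraction in (0.005, 0.01):  # the 0.5% and 1% levels named in the text
    clipped = winsorize(trait, fraction)
    print(fraction, clipped.min(), clipped.max())
```

Re-running the effect estimation on the clipped data and comparing the significant sets is then a direct check of sensitivity to extreme values.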
To obtain a high confidence set, we further restricted these traits to those significant for NDE for females and for males (mediators 874 and 884) and for females (mediator 894), resulting in 18 traits listed in Supplementary Table 2. Lastly, we found that TEs and NDEs increase as the stringency of case and control definitions increases (Fig. 8; see Supplementary Table 10 for full results). Specifically, we compared NDEs for molecular and cellular blood traits calculated from cases and controls as defined in Materials and Methods, but with or without overall health rating (UKB field 2178) of ‘Poor’ or ‘Fair’ at baseline for cases, and/or ‘Good’ or ‘Excellent’ for controls. ![Figure 8.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/08/28/2024.08.26.24312606/F8.medium.gif) [Figure 8.](http://medrxiv.org/content/early/2024/08/28/2024.08.26.24312606/F8) Figure 8. Total effects and NDEs for blood traits become more significant as the stringency of case and control definitions increases. **(A)** Total effect z-scores for ‘Poor/Fair’ for cases and ‘All’ (without restricting by health rating (UKB field 2178)) for controls versus z-scores for ‘All’ for cases and ‘All’ for controls (without restricting by health rating for cases or controls). The null hypothesis – that significance does not change for increasing stringency of case or control definition – is represented by the diagonal line. **(B)** Total effect z-scores for ‘Poor/Fair’ for cases and ‘Good/Excellent’ for controls, versus ‘Poor/Fair’ for cases vs ‘All’ for controls. **(C) and (D)** As in (A) or (B) but for NDE. ## 3. Discussion Our results reveal 511 blood-based biomarkers whose levels differ significantly between people with ME and those without ME (Fig. S2A). Our approach decomposed the total effect of ME on blood traits into two components: (1) the indirect effect of ME on these traits via activity, and (2) the direct effect through all other paths, not mediated via activity. 
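
The two components can be written out in the standard counterfactual notation for natural effects. The symbols are an assumption introduced here for illustration: $A$ denotes ME status, $M$ the activity mediator, $Y$ a blood trait, and $Y(a, M(a'))$ the trait value under status $a$ with the mediator set to the value it would take under status $a'$. As stated in the text, the corresponding estimates are interpreted associationally rather than causally.

```latex
\begin{align}
\mathrm{NDE} &= \mathbb{E}\bigl[\,Y(1, M(0)) - Y(0, M(0))\,\bigr], \\
\mathrm{NIE} &= \mathbb{E}\bigl[\,Y(1, M(1)) - Y(1, M(0))\,\bigr], \\
\mathrm{TE}  &= \mathbb{E}\bigl[\,Y(1, M(1)) - Y(0, M(0))\,\bigr]
              = \mathrm{NDE} + \mathrm{NIE}.
\end{align}
```

The final equality follows by adding and subtracting $\mathbb{E}[Y(1, M(0))]$, which is why a negligible NIE implies that the total effect is carried almost entirely by the direct path.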
We do not claim causality for our estimates, because the assumptions of no unmeasured confounding may be violated. Nevertheless, any “causal gap”, the difference between our estimates and any underlying causal estimand, cannot be due to age and sex, as we account for these factors. Our findings constitute differences of population estimates of blood biomarkers between case and control populations and do not provide individual-level predictions of caseness based on biomarker values. However, our results can be used for variable selection in training a prediction model, as long as an independent data set is used. If the same data is used twice, *i.e.*, both for variable selection and for training a prediction model, the resulting predictions will suffer from selective inference [49], with overly optimistic (invalid) prediction scores, and thus will not generalise to new cases. The large number of discoveries relative to previous studies likely reflects our study’s substantially higher numbers of cases and controls (Table 1). These large numbers allow many small average effects of ME status on molecular and cellular traits to be detected. Importantly, and unlike most previous studies, we *independently replicated* 166 biomarkers in both females and males (TEs; Fig. S2A). This indicates that our discoveries are both robust and not sex-biased. It thus provides strong evidence for ME disease pathophysiology being equivalent in both sexes. This is despite sex-bias of ME with respect to prevalence and onset, comorbidities, symptoms and other features [13, 50]. Importantly, these biomarker differences are not explicable by dissimilarities in physical activity: among 3,237 NIE estimates we obtained, ME status was significantly associated with only one trait (Fig. S2B). Blood traits thus distinguish ME cases from population controls, but not because of ME cases’ reduced physical activity levels. 
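
Significance calls such as the single significant NIE among 3,237 estimates depend on the multiple-testing correction. The sketch below shows one common way to control the FDR, assuming the Benjamini-Hochberg step-up procedure; this section does not state which procedure reference [36] denotes, and the function name and p-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


# Hypothetical raw p-values for four traits.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted < 0.05)  # which traits pass FDR < 0.05
```

With thousands of tests, a raw p-value must be very small to survive this adjustment, which is consistent with under-powered analyses yielding few discoveries.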
What, then, causes these molecular and cellular changes in blood if not physical activity? Our findings provide strong and replicated evidence for chronic low-level inflammation (elevated CRP and cystatin-C levels, and platelet, leukocyte and neutrophil counts), insulin resistance (elevated triglycerides-to-HDL-C ratio, ALT, ALP, GGT and HbA1c) and/or liver disease (elevated ALT, ALP, and GGT, and low urea levels) in ME (Fig. 2A). ME is thus characterised by insulin resistance and systemic inflammation, with liver inflammation and dysfunction likely affecting lipid metabolism and the balance between HDL and LDL cholesterol. To our knowledge, the overall combination of blood marker changes we observed does not present in any other disease. For example, although primary biliary cholangitis is accompanied by elevated ALP and GGT levels (and post-exertional malaise [51]), it is also marked by high circulating levels of bilirubin rather than the lower levels we observe for ME (Fig. 2A). Nevertheless, because ME likely arises from multiple pathomechanisms and we did not further stratify cases, we cannot conclude that our results exclude other diseases from sharing a common aetiology with some ME cases.

In general, shifts in trait values were modest. Among all 116 significant female- and male-replicated traits, 91% had small-to-medium shifts (Cohen’s *d* between 0.2 and 0.5 [52]; Supplementary Table 11). No trait yielded clear separation in estimated effects between ME cases and controls; rather, trait values overlapped extensively. For example, despite CRP level being significantly elevated in ME cases (TE analysis: adjusted *p* = 2.8 *×* 10⁻⁹; both sexes), only 4.8% of female and 2.5% of male ME cases (versus 2.2% and 1.8% controls, respectively) had CRP levels over 10 mg/L, a moderate elevation that can indicate systemic inflammation in autoimmune disease. Consequently, no single blood trait we analysed will be an effective biomarker for ME.
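
For reference, Cohen’s *d* for a single trait can be computed from case and control samples as sketched below. This assumes the pooled-standard-deviation form of *d* with no covariate adjustment; the text does not state which variant underlies Supplementary Table 11, and the function name and toy data are hypothetical.

```python
import numpy as np


def cohens_d(cases, controls):
    """Standardised mean difference (cases minus controls), pooled SD."""
    a = np.asarray(cases, dtype=float)
    b = np.asarray(controls, dtype=float)
    n_a, n_b = a.size, b.size
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (
        n_a + n_b - 2
    )
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Toy values only; real trait data would come from the cohort.
print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))
```

A *d* in the 0.2 to 0.5 range corresponds to distributions that overlap heavily, which is why a trait can be highly significant at this sample size while still separating individuals poorly.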
The major strength of the study is its large and deeply phenotyped cohort who were recruited, and their blood traits measured, using a single protocol. The study also accounted for age and sex as potential confounders, and for physical inactivity as a mediator. Additional mediators beyond physical activity were not considered as they were not directly relevant to this study’s principal hypothesis. The study was limited by the UK Biobank’s known healthy volunteer bias [53], possibly resulting in few, if any, people with severe ME symptoms at baseline participating. Future studies could test for the effect of symptom severity on the levels of biomarkers found to be significant in this study. UK Biobank recruited 40–69-year-old participants [53], an age range when individuals are less likely to have a clinical diagnosis of ME [54]. We note that the list of cellular and molecular measurements in the UK Biobank is not exhaustive. For example, others have investigated potential biomarkers for oxidative stress [55] as well as gut metagenomics, immune-profiling and cytokines [56], which are absent from UKB.

Evidence that there is a large number of replicated and diverse blood biomarkers that differentiate between ME cases and controls should now dispel any lingering perception that ME is psychosomatic [57]. These findings should also accelerate research into the minimum panel of blood traits required to accurately diagnose ME in real-world populations. Such a panel would be invaluable for diagnosis, for measuring response to future treatment or drug trials, and potentially for determining the worsening or progression of ME. Such a panel might also help to determine the distinctions or overlap between ME and symptomatologically similar diseases such as Long Covid and fibromyalgia. To assist the search for an effective biomarker panel for ME we provide the full results of this study in Supplementary Tables 3-5.

## 4. Materials and Methods

### 4.1. UK Biobank ME/CFS data processing

We defined 1,455 ME cases and 131,303 non-ME control individuals from UKB [33] as follows. Cases self-reported a diagnosis of ‘Chronic Fatigue Syndrome’ (CFS) in verbal interview at their first visit to a UKB Assessment Centre (UKB field 20002); also, either they answered “Yes” to the question “Have you ever been told by a doctor that you have Myalgic Encephalomyelitis/Chronic Fatigue Syndrome?” in the ‘Experience of Pain Questionnaire’ (PQ) (2019-2020) (UKB field 120010), or they did not complete the PQ. They further reported an overall health rating (UKB field 2178) of ‘Poor’ or ‘Fair’ at baseline, and were of known genetic sex. Population controls did not self-report a CFS diagnosis in any of the 4 visits, answered “No” to the PQ question about a ME/CFS diagnosis, and were not linked to a Primary Care record (CTV3 or Read v2 code, Supplementary Table 1) of ME/CFS or to the ICD10:G93.3 code (‘Postviral fatigue syndrome’) in Hospital Inpatient Data. They further reported an overall health rating (UKB field 2178) of ‘Good’ or ‘Excellent’ at baseline.

UKB participants are older and report healthier lifestyles, higher levels of education and better health relative to the general UK population [58, 59]. UKB assessment at baseline was demanding in time (2-3h) and energy, including travel to the nearest of 22 centres. These requirements will have diminished the recruitment of people with severe or moderate, or even mild, ME symptoms.

UKB blood samples were acquired and analysed as described previously [60, 61, 62]. For blood traits, we included two composite markers of insulin resistance: the triglyceride glucose (TyG) index [63, 64], and TG-to-HDL-C ratio [65]. Note that TyG is normally calculated using fasting levels of triglycerides and plasma glucose [66], but these are not available from the UK Biobank. The ratio of triglycerides to HDL-cholesterol correlates positively with the plasma level of small, dense LDL particles.
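The two composite markers can be sketched in code as follows. This is a minimal illustration, not the authors’ pipeline: it uses the commonly cited TyG definition, ln(triglycerides [mg/dL] × glucose [mg/dL] / 2), and assumes inputs in mmol/L converted with standard approximate factors. The function names, conversion constants and example values are assumptions made here, and the units used by the authors for the TG-to-HDL-C ratio are not stated in the text.

```python
import math

# Approximate mmol/L -> mg/dL conversion factors (assumed here, not from the paper).
TG_MMOL_TO_MGDL = 88.57
GLUCOSE_MMOL_TO_MGDL = 18.016


def tyg_index(triglycerides_mmol_l: float, glucose_mmol_l: float) -> float:
    """Triglyceride-glucose index: ln(TG [mg/dL] * glucose [mg/dL] / 2)."""
    tg_mgdl = triglycerides_mmol_l * TG_MMOL_TO_MGDL
    glucose_mgdl = glucose_mmol_l * GLUCOSE_MMOL_TO_MGDL
    return math.log(tg_mgdl * glucose_mgdl / 2.0)


def tg_to_hdl_ratio(triglycerides: float, hdl_cholesterol: float) -> float:
    """Ratio of triglycerides to HDL-cholesterol (both in the same units)."""
    return triglycerides / hdl_cholesterol


# Hypothetical (non-fasting) values in mmol/L for one individual.
example_tyg = tyg_index(triglycerides_mmol_l=1.5, glucose_mmol_l=5.0)
example_ratio = tg_to_hdl_ratio(triglycerides=1.5, hdl_cholesterol=1.25)
```

Because the ratio depends on the unit system (mmol/L versus mg/dL), values are only comparable across studies when the same units are used throughout.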
For NMR metabolomics, we removed individuals whose NMR metabolite measurement has a QC flag indicating irregularities in the measurement, as per UKB category 221.

For each estimator of type TE, NDE and NIE (below), we only considered individuals with the relevant variables measured. Specifically, for TE, we restricted to individuals with measured age, sex and outcome variable. For NDE and NIE, we additionally restricted to individuals with measured mediators of activity. Furthermore, for NDE and NIE, we removed individuals who answered ‘do not know’ or ‘prefer not to answer’ to the activity question (UKB data field 874, 884, or 894).

### 4.2. Mediation estimators

Causal mediation analysis, concerned with the quantification of the portion of a causal effect of an exposure on an outcome through a particular pathway, has been extensively discussed in the literature [67, 68]. The methodologies utilised in this work build upon natural (or pure) mediation estimands [34, 35]. Strategies for the construction of efficient estimators of non-parametrically defined causal mediation estimands, capable of incorporating machine learning, have been used in a variety of applications. Recent examples include understanding the biological mechanisms by which vaccines causally alter infection risk [69, 70, 71, 72], quantifying the effect of novel pharmacological therapies on substance abuse disorder relapse [73, 74] and the effects of housing vouchers on adolescent development [75], and modeling the effects of health disparities on quality of life [76]. Here we use state-of-the-art semi-parametric estimation techniques for non-parametric causal mediation analysis [77], implemented in the R package medoutcon [78, 79].

The NDE and NIE are mediational estimands that decompose the average effect (or average treatment effect, ATE) of ME status on molecular and cellular traits, Eq. 1.
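The decomposition referred to as Eq. 1 can be checked numerically on a toy population. The sketch below is not the medoutcon estimator and involves no data: it assumes a binary exposure, a binary mediator, no covariates, and known (entirely hypothetical) conditional means and mediator probabilities, and it evaluates the counterfactual means by plugging these into the mediation formula.

```python
def counterfactual_mean(a, a_star, outcome_mean, mediator_prob):
    """E[Y(a, M(a_star))] for a binary mediator.

    outcome_mean[(a, m)] is E[Y | exposure = a, mediator = m];
    mediator_prob[a_star] is P(M = 1 | exposure = a_star).
    """
    p_m1 = mediator_prob[a_star]
    return outcome_mean[(a, 1)] * p_m1 + outcome_mean[(a, 0)] * (1.0 - p_m1)


# Hypothetical toy values: exposure 1 = ME, mediator 1 = "active".
outcome_mean = {(0, 0): 1.0, (0, 1): 1.2, (1, 0): 1.5, (1, 1): 1.6}
mediator_prob = {0: 0.7, 1: 0.4}

y_1_m1 = counterfactual_mean(1, 1, outcome_mean, mediator_prob)  # E[Y(1, M(1))]
y_1_m0 = counterfactual_mean(1, 0, outcome_mean, mediator_prob)  # E[Y(1, M(0))]
y_0_m0 = counterfactual_mean(0, 0, outcome_mean, mediator_prob)  # E[Y(0, M(0))]

total_effect = y_1_m1 - y_0_m0
natural_direct_effect = y_1_m0 - y_0_m0
natural_indirect_effect = y_1_m1 - y_1_m0
```

In this toy example the direct effect (0.43) dominates and the indirect effect is small (−0.03), and the two sum to the total effect (0.40), which mirrors the structure, though not the numbers, of the decomposition used in the study.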
NDEs involve a comparison of two counterfactual trait outcomes, specifically:

* (I) the level of the trait in a hypothetical scenario where every individual has ME, but rather than allowing ME to determine the level of activity, we fix their level of activity to the values they would naturally assume if they were not to have ME; and,
* (II) the level of the trait in a hypothetical scenario where every individual is in the control group and their levels of activity are allowed to naturally respond to being in the control group.

Comparison of these two trait levels yields a “direct” causal effect that quantifies the effect of ME on the trait through all paths other than the one mediated by activity (NDE).

NIEs involve a comparison of two counterfactual trait outcomes, specifically:

* (III) the level of the trait when every individual has ME and their levels of activity are allowed to naturally respond to ME; and,
* (IV) the level of the trait in a hypothetical scenario where every individual has ME, but rather than allowing ME to determine the level of activity, we fix their activity level to the value they would naturally assume if they were not to have ME.

Comparison of these two trait levels yields a causal “indirect” effect that quantifies the impact of ME on the trait through activity (NIE). Crucially, the counterfactual trait outcomes (I) and (IV) are exactly the same quantity, and this insight gives rise to the “mediation formula” as follows:

$$
\mathbb{E}\left[Y(1) - Y(0)\right] = \underbrace{\mathbb{E}\left[Y(1, M(0)) - Y(0, M(0))\right]}_{\text{NDE}} + \underbrace{\mathbb{E}\left[Y(1, M(1)) - Y(1, M(0))\right]}_{\text{NIE}} \qquad (1)
$$

where *Y*(1) and *Y*(0) are potential outcomes in which an individual does or does not have ME, respectively. Similarly, *Y*(1, *M*(0)) is the potential outcome of an individual who has ME and whose mediator takes on the value it would have had if the individual did not have ME (given in words as (I) and (IV) above). Note also that *Y*(1) = *Y*(1, *M*(1)) and *Y*(0) = *Y*(0, *M*(0)). The left hand side of Eq.
1 defines the average treatment effect (ATE) of ME on blood trait *Y*, which we refer to as the total effect (TE). The right hand side of this equation is the sum of the NDE and NIE.

Causal identification is the process of turning a causal quantity we wish to estimate (causal estimand – a functional of unobservable counterfactual data) into a statistical quantity we can estimate from observed data (statistical estimand – a functional of observed data). Causal identification does not require access to any data and is entirely distinct from statistical inference. There are 5 assumptions required for causal identifiability of Eq. 1:

* (i) the Stable Unit Treatment Values Assumption (SUTVA) which includes consistency and no interference between units [80, 81];
* (ii) exchangeability (unconfoundedness), which is analogous to the randomization assumption applied to a joint intervention on both the treatment variable (here ME) and the mediator (here activity);
* (iii) treatment positivity, which states that it must be possible to observe any given treatment value (here ME) across all strata of baseline covariates (age and sex);
* (iv) mediator positivity, which states that it must be possible to observe any given mediator value across all strata defined by both treatment (ME) and baseline covariates (age and sex); and,
* (v) cross-world counterfactual independence, *Y*(*T* = *t*, *M* = *m*) ⊥⊥ *M*(*T* = *t′*) conditional on covariates, which is not empirically verifiable [82].

In our case, we do not claim causal identifiability because the assumption of unconfoundedness (ii) may be violated, as made explicit in Fig. 1A (in red). Nevertheless, we can estimate the NDE and NIE as statistical quantities knowing that any causal gap will not be due to age or sex, as both of these variables have been taken into account as confounders.

### 4.3. Super Learner and one-step estimation

We have used semi-parametric efficient estimators to estimate the TE, as well as the mediation effects NDE and NIE [79], on multiomic measurements. This estimation procedure consists of an initial Super Learner (SL) [83] fit to estimate relevant nuisance functions in as flexible a manner as allowed by the available data. This ensures that any model mis-specification bias is minimised. We then construct estimates of the NDE and NIE using a one-step bias-correction procedure, which appropriately handles the use of SL for nuisance parameter estimation while also allowing for uncertainty quantification, facilitating the construction of valid Wald-style confidence intervals based on the asymptotic properties of the one-step bias-corrected estimator [84]. The precise specification of these estimators is as follows.

For the total effect, we have used the R package npcausal [85]. This package relies on the SuperLearner R package to specify models for fitting nuisance functions. We used:

* (1) SL.earth, an implementation of multivariate adaptive regression splines [86];
* (2) SL.glmnet, penalised regression with a generalised linear model and hyperparameter *α* = 1, i.e., *L*1-penalised or Least Absolute Shrinkage and Selection Operator (LASSO) regression, with default 10-fold cross-validation;
* (3) SL.glm.interaction, generalised linear model with main terms and 2-way interactions;
* (4) SL.xgboost, extreme gradient boosting (XGB) used with default parameters [87].

For the mediation effects NDE and NIE, we used the R package medoutcon [78]. This package instead relies on the sl3 R package [88], an implementation of the ensemble machine learning algorithm of [83], to specify models for fitting nuisance functions.
We used:

* (1) Lrnr_earth, an implementation of multivariate adaptive regression splines [86];
* (2) Lrnr_glmnet, penalised linear regression with a generalised linear model and hyperparameter *α* = 1, i.e., *L*1-penalised or Least Absolute Shrinkage and Selection Operator (LASSO) regression, and default 3-fold cross-validation;
* (3) Lrnr_glm_fast, a fast implementation of a generalised linear model used with main terms and 2-way interactions; and,
* (4) Lrnr_lightgbm, a fast and memory-efficient implementation of extreme gradient boosting (XGB) models from the lightgbm R package [89], used with default parameters.

The estimation of NDE and NIE relies on the fitting of further nuisance functions for which we have used algorithms, such as the Highly Adaptive Lasso (HAL) [90, 91, 92], and parameter specifications recommended by medoutcon.

### 4.4. GO enrichment analysis

We performed Gene Ontology analysis [93, 94, 95] on the set of significant TE estimates (positive only, negative only, or all combined) obtained from the male, female or combined populations. For the background protein set, we used all 2,923 proteins measured in UKB. We obtained significant results only for the set of proteins with a significant positive total effect in the female subset of the population at FDR < 0.05. The results are presented in Fig. S3. We used rrvgo [95] to reduce redundancy of GO terms.
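For readers unfamiliar with this kind of analysis, the sketch below shows one common way to test a single GO term for over-representation against a background set and to control the FDR across terms. The text does not state which test or software produced the enrichment results, so the hypergeometric test, the Benjamini-Hochberg adjustment, the function names and all counts other than the 2,923-protein background are assumptions made for illustration only.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated proteins among
    n selected proteins, when K of the N background proteins carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: 100 significant proteins, 15 of which carry a term
# that annotates 120 of the 2,923 background proteins.
p_term = hypergeom_upper_tail(k=15, n=100, K=120, N=2923)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Using all measured proteins as the background, as described above, matters here: a smaller or differently composed background changes `K` and `N` and therefore the resulting p-values.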
## Supporting information

* Supplementary Table captions (supplements/312606_file02.docx)
* Supplementary Table 1 (supplements/312606_file03.xlsx)
* Supplementary Table 2 (supplements/312606_file04.xlsx)
* Supplementary Table 3 (supplements/312606_file05.xlsx)
* Supplementary Table 4 (supplements/312606_file06.xlsx)
* Supplementary Table 5 (supplements/312606_file07.xlsx)
* Supplementary Table 6 (supplements/312606_file08.xlsx)
* Supplementary Table 7 (supplements/312606_file09.xlsx)
* Supplementary Table 8 (supplements/312606_file10.xlsx)
* Supplementary Table 9 (supplements/312606_file11.xlsx)
* Supplementary Table 10 (supplements/312606_file12.xlsx)
* Supplementary Table 11 (supplements/312606_file13.xlsx)

## Data Availability

No data has been produced. All data is from UK Biobank.

## 5. Competing interests

No competing interests declared.

![Figure S1.](https://www.medrxiv.org/content/medrxiv/early/2024/08/28/2024.08.26.24312606/F9.medium.gif)

Figure S1. ME sample sizes for males and females, restricting to complete cases (individuals for whom a measurement is available). The minimum number of cases is indicated on each plot. **(A)** Blood traits, **(B)** NMR metabolites, **(C)** Proteomics. Neither of the two proteins with case sample size below 30 is significant after FDR correction. Full sample size data is provided as Supplementary Table 6.

![Figure S2.](https://www.medrxiv.org/content/medrxiv/early/2024/08/28/2024.08.26.24312606/F10.medium.gif)

Figure S2. Venn diagrams displaying the number of significant findings in the males, females, combined and their intersection, mediator 874, for **(A)** total effect, and **(B)** NIE.

![Figure S3.](https://www.medrxiv.org/content/medrxiv/early/2024/08/28/2024.08.26.24312606/F11.medium.gif)

Figure S3. GO pathway enrichment [93] for proteins with a significant positive total effect for ME/CFS vs control, restricted to females only. This is the subset with maximal power for GO analysis. All effects are TE, *i.e.*, there are no significant NIE for proteins. We performed a similar pathway GO enrichment analysis for proteins with a significant positive total effect for ME/CFS vs control on the population of males and the combined dataset, as well as all significant negative total effects and all significant total effects on the female, male and combined populations. These resulted in no significant GO term enrichments at FDR < 0.05. All measured UKB proteins were used as background for the GO analyses.

![Figure S4.](https://www.medrxiv.org/content/medrxiv/early/2024/08/28/2024.08.26.24312606/F12.medium.gif)

Figure S4. Significant blood traits are robust to winsorisation. The points represent total effect z-scores for blood traits in the combined female and male analysis. The three shades of grey represent different degrees of winsorisation of the original data, with cases and controls combined prior to winsorisation. Nucleated red blood cell count and percent are only estimable at 0% winsorisation because for 0.5% winsorisation the number of cases is ≤ 5. Fib4 and eGFR composite measures were not estimated for 0% winsorisation due to extreme values in control samples (*e.g.*, individuals with platelet counts close to 0).
## Acknowledgements

This work was supported by a grant for PhD-level research to GLS from ME Research UK (SCIO charity number SC036942). This research has been conducted using the UK Biobank Resource under Application Number 76173. Access to this data was funded by the National Institute for Health and Care Research (NIHR) and Medical Research Council (MRC) under grant number MC_PC_20005. AK was supported by a Langmuir Talent Development Fellowship from the Institute of Genetics and Cancer, and a philanthropic donation from Hugh and Josseline Langmuir. SB, AK and CP are thankful to M. E. Khamseh for helpful discussions, and to Simon McGrath and Julia Oakley for commenting on the draft manuscript.

* Received August 26, 2024.
* Revision received August 26, 2024.
* Accepted August 28, 2024.
* © 2024, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/)

## References

1. [1] Booth FW, Roberts CK, Thyfault JP, Ruegsegger GN, Toedebusch RG. Role of Inactivity in Chronic Diseases: Evolutionary Insight and Pathophysiological Mechanisms. Physiological Reviews. 2017;97(4):1351–1402.
2. [2] Moore SC, Patel AV, Matthews CE, Berrington de Gonzalez A, Park Y, Katki HA, et al. Leisure Time Physical Activity of Moderate to Vigorous Intensity and Mortality: A Large Pooled Cohort Analysis. PLOS Medicine. 2012 11;9(11):1–14.
3. [3] Mok A, Khaw KT, Luben R, Wareham N, Brage S. Physical activity trajectories and mortality: population based cohort study. BMJ. 2019;365.
4. [4] Eggelbusch M, Charlton BT, Bosutti A, Ganse B, Giakoumaki I, Grootemaat AE, et al. The impact of bed rest on human skeletal muscle metabolism. Cell Reports Medicine. 2024;5(1):101372.
5. [5] Benefits of exercise. Accessed: 2024-07-27. [https://www.nhs.uk/live-well/exercise/exercise-health-benefits/](https://www.nhs.uk/live-well/exercise/exercise-health-benefits/).
6. [6] Fletcher GF, Ades PA, Kligfield P, Arena R, Balady GJ, Bittner VA, et al. Exercise Standards for Testing and Training. Circulation. 2013;128(8):873–934.
7. [7] Rosenstock I, Strecher V, Becker M. Social learning theory and the Health Belief Model. Health education quarterly. 1988;15(2):175–183.
8. [8] Medicine I, Populations B, Syndrome C. Beyond myalgic encephalomyelitis/chronic fatigue syndrome: Redefining an illness; 2015.
9. [9] Cotler J, Holtzman C, Dudun C, Jason LA. A Brief Questionnaire to Assess Post-Exertional Malaise. Diagnostics. 2018;8(3).
10. [10] Myalgic encephalomyelitis (or encephalopathy)/chronic fatigue syndrome: diagnosis and management. Accessed: 2023-08-15. [https://www.nice.org.uk/guidance/ng206](https://www.nice.org.uk/guidance/ng206).
11. [11] Hickie I, Davenport T, Wakefield D, Vollmer-Conna U, Cameron B, Vernon SD, et al. Post-infective and chronic fatigue syndromes precipitated by viral and non-viral pathogens: prospective cohort study. BMJ. 2006;333(7568):575. Available from: [https://www.bmj.com/content/333/7568/575](https://www.bmj.com/content/333/7568/575).
12. [12] Komaroff AL, Lipkin WI. ME/CFS and Long COVID share similar symptoms and biological abnormalities: road map to the literature. Frontiers in Medicine. 2023;10. Available from: [https://www.frontiersin.org/articles/](https://www.frontiersin.org/articles/) doi:10.3389/fmed.2023.1187163.
13. [13] Bretherick A, McGrath S, Devereux-Cooke A, Leary S, Northwood E, Redshaw A, et al. Typing myalgic encephalomyelitis by infection at onset: A DecodeME study. NIHR Open Research. 2023;3(20).
14. [14] Jason LA, Yoo S, Bhatia S. Patient perceptions of infectious illnesses preceding Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. Chronic Illness. 2022;18(4):901–910. PMID: 34541918. Available from: doi:10.1177/17423953211043106.
15. [15] Cairns R, Hotopf M. A systematic review describing the prognosis of chronic fatigue syndrome. Occupational Medicine. 2005 01;55(1):20–31. Available from: doi:10.1093/occmed/kqi013.
16. [16] Smith K. Women’s health research lacks funding – these charts show how. Nature. 2023;617(7959):28–29.
17. [17] Nacul LC, Lacerda EM, Pheby D, Campion P, Molokhia M, Fayyaz S, et al. Prevalence of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) in three regions of England: a repeated cross-sectional study in primary care. BMC Medicine. 2011;9(1):91.
18. [18] Valdez AR, Hancock EE, Adebayo S, Kiernicki DJ, Proskauer D, Attewell JR, et al. Estimating Prevalence, Demographics, and Costs of ME/CFS Using Large Scale Medical Claims Data and Machine Learning. Frontiers in Pediatrics. 2019;6. Available from: [https://www.frontiersin.org/articles/10.3389/fped.2018.00412](https://www.frontiersin.org/articles/10.3389/fped.2018.00412).
19. [19] Falk Hvidberg M, Brinth LS, Olesen AV, Petersen KD, Ehlers L. The Health-Related Quality of Life for Patients with Myalgic Encephalomyelitis / Chronic Fatigue Syndrome (ME/CFS). PLOS ONE. 2015 07;10(7):1–16. Available from: doi:10.1371/journal.pone.0132421.
20. [20] Tyson S, Stanley K, Gronlund TA, Leary S, Dean ME, Dransfield C, et al. Research priorities for myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS): the results of a James Lind alliance priority setting exercise. Fatigue: Biomedicine, Health & Behavior. 2022;10(4):200–211. Available from: doi:10.1080/21641846.2022.2124775.
21. [21] Maksoud R, Magawa C, Eaton-Fitch N, Thapaliya K, Marshall-Gradisnik S. Biomarkers for myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS): a systematic review. BMC Medicine. 2023;21(1):189.
22. [22] Huber K, Sunnquist M, Jason L. Latent class analysis of a heterogeneous international sample of patients with myalgic encephalomyelitis/chronic fatigue syndrome. Fatigue: Biomedicine, Health & Behavior. 2018 07;6:1–16.
23. [23] Snell CR, Stevens SR, Davenport TE, Van Ness JM. Discriminative Validity of Metabolic and Workload Measurements for Identifying People With Chronic Fatigue Syndrome. Physical Therapy. 2013 11;93(11):1484–1492. Available from: doi:10.2522/ptj.20110368.
24. [24] Keller B, Receno CN, Franconi CJ, Harenberg S, Stevens J, Mao X, et al. Cardiopulmonary and metabolic responses during a 2-day CPET in myalgic encephalomyelitis/chronic fatigue syndrome: translating reduced oxygen consumption to impairment status to treatment considerations. Journal of Translational Medicine. 2024;22(1):627.
25. [25] Silver A, Haeney M, Vijayadurai P, Wilks D, Pattrick M, Main CJ. The role of fear of physical movement and activity in chronic fatigue syndrome. Journal of Psychosomatic Research. 2002;52(6):485–493.
26. [26] Wessely S, David A, Butler S, Chalder T. Management of chronic (post-viral) fatigue syndrome. British Journal of General Practice. 1989;39(318):26–29.
27. [27] Moss-Morris R, Deary V, Castell B. Chapter 25 Chronic fatigue syndrome. In: Barnes MP, Good DC, editors. Neurological Rehabilitation. vol. 110 of Handbook of Clinical Neurology. Elsevier; 2013. p. 303–314.
28. [28] Sharpe M. Cognitive behavior therapy for chronic fatigue syndrome. The American Journal of Medicine. 1995 2024/06/22;98(4):419–420.
29. [29] The National Institute for Health and Care Excellence (NICE). Myalgic encephalomyelitis (or encephalopathy)/chronic fatigue syndrome: diagnosis and management; 2021. Available from: [https://www.nice.org.uk/guidance/ng206](https://www.nice.org.uk/guidance/ng206).
30. [30] Geraghty K, Jason L, Sunnquist M, Tuller D, Blease C, Adeniji C. The ‘cognitive behavioural model’ of chronic fatigue syndrome: Critique of a flawed model. Health Psychology Open. 2019;6(1):2055102919838907. PMID: 31041108. Available from: doi:10.1177/2055102919838907.
31. [31] Institute of Medicine. Beyond Myalgic Encephalomyelitis/Chronic Fatigue Syndrome: Redefining an Illness. Washington, DC: The National Academies Press; 2015. Available from: [https://nap.nationalacademies.org/catalog/19012/beyond-myalgic-encephalomyelitischronic-fatigue-syndrome-redefining-an-illness](https://nap.nationalacademies.org/catalog/19012/beyond-myalgic-encephalomyelitischronic-fatigue-syndrome-redefining-an-illness).
32. [32] White PD. The role of physical inactivity in the chronic fatigue syndrome. Journal of Psychosomatic Research. 2000;49(5):283–284. Available from: [https://www.sciencedirect.com/science/article/pii/S0022399900001951](https://www.sciencedirect.com/science/article/pii/S0022399900001951).
33. [33] Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–209.
34. [34] Robins JM, Greenland S. Identifiability and Exchangeability for Direct and Indirect Effects. Epidemiology. 1992;3(2):143–155. Available from: [http://www.jstor.org/stable/3702894](http://www.jstor.org/stable/3702894).
35. [35] Pearl J. Direct and indirect effects. UAI’01. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 2001. p. 411–420.
36. [36] Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological). 1995;57(1):289–300. Available from: [http://www.jstor.org/stable/2346101](http://www.jstor.org/stable/2346101).
37. [37] Fritz J, Bjørge T, Nagel G, Manjer J, Engeland A, Haggstrom C, et al. The triglyceride-glucose index as a measure of insulin resistance and risk of obesity-related cancers. International Journal of Epidemiology. 2019 04;49(1):193–204. Available from: doi:10.1093/ije/dyz053.
38. [38] Won KB, Park EJ, Han D, Lee JH, Choi SY, Chun EJ, et al. Triglyceride glucose index is an independent predictor for the progression of coronary artery calcification in the absence of heavy coronary artery calcification at baseline. Cardiovascular Diabetology. 2020;19(1):34.
39. [39] Oliveri A, Rebernick RJ, Kuppa A, Pant A, Chen Y, Du X, et al. Comprehensive genetic study of the insulin resistance marker TG:HDL-C in the UK Biobank. Nature Genetics. 2024;56(2):212–221.
40. [40] Nagy-Szakal D, Barupal DK, Lee B, Che X, Williams BL, Kahn EJR, et al. Insights into myalgic encephalomyelitis/chronic fatigue syndrome phenotypes through comprehensive metabolomics. Scientific Reports. 2018;8(1):10056.
41. [41] Naviaux RK, Naviaux JC, Li K, Bright AT, Alaynick WA, Wang L, et al. Metabolic features of chronic fatigue syndrome. Proceedings of the National Academy of Sciences. 2016;113(37):E5472–E5480. Available from: [https://www.pnas.org/doi/abs/10.1073/pnas.1607571113](https://www.pnas.org/doi/abs/10.1073/pnas.1607571113).
42. [42] Yamano E, Sugimoto M, Hirayama A, Kume S, Yamato M, Jin G, et al. Index markers of chronic fatigue syndrome with dysfunction of TCA and urea cycles. Scientific Reports. 2016;6(1):34990.
43. [43] Ghali A, Lacout C, Ghali M, Gury A, Beucher AB, Lozac’h P, et al. Elevated blood lactate in resting conditions correlate with post-exertional malaise severity in patients with Myalgic encephalomyelitis/Chronic fatigue syndrome. Scientific Reports. 2019;9(1):18817.
44. [44] Wang ZQ, Porreca F, Cuzzocrea S, Galen K, Lightfoot R, Masini E, et al. A Newly Identified Role for Superoxide in Inflammatory Pain. Journal of Pharmacology and Experimental Therapeutics. 2004;309(3):869–878. Available from: [https://jpet.aspetjournals.org/content/309/3/869](https://jpet.aspetjournals.org/content/309/3/869).
45. [45] Crawley SW, Shifrin DA, Grega-Larson NE, McConnell RE, Benesh AE, Mao S, et al. Intestinal Brush Border Assembly Driven by Protocadherin-Based Intermicrovillar Adhesion. Cell. 2014;157(2):433–446. Available from: [https://www.sciencedirect.com/science/article/pii/S0092867414002153](https://www.sciencedirect.com/science/article/pii/S0092867414002153).
46. [46] Mödl B, Awad M, Zwolanek D, Scharf I, Schwertner K, Milovanovic D, et al. Defects in microvillus crosslinking sensitize to colitis and inflammatory bowel disease. EMBO reports. 2023;24(10):e57084. Available from: [https://www.embopress.org/doi/abs/10.15252/embr.202357084](https://www.embopress.org/doi/abs/10.15252/embr.202357084).
47. [47] Triantafyllou G, Paschou S, Mantzoros C. Leptin and Hormones: Energy Homeostasis. Endocrinology & Metabolism Clinics of North America. 2016 09;45:633–45.
48. [48] Eaton-Fitch N, du Preez S, Cabanas H, Staines D, Marshall-Gradisnik S. A systematic review of natural killer cells profile and cytotoxic function in myalgic encephalomyelitis/chronic fatigue syndrome. Systematic Reviews. 2019;8(1):279.
49. [49] Taylor J, Tibshirani RJ. Statistical learning and selective inference. Proceedings of the National Academy of Sciences. 2015;112(25):7629–7634.
50. [50] Faro M, Sàez-Francás N, Castro-Marrero J, Aliste L, Fernandez de Sevilla T, Alegre J. Gender Differences in Chronic Fatigue Syndrome. Reumatología Clínica (English Edition). 2016;12(2):72–77.
51. [51] Jopson L, Dyson JK, Jones DEJ. Understanding and Treating Fatigue in Primary Biliary Cirrhosis and Primary Sclerosing Cholangitis. Clinics in Liver Disease. 2016;20(1):131–142. Advances in Cholestatic Liver Diseases. Available from: [https://www.sciencedirect.com/science/article/pii/S1089326115000793](https://www.sciencedirect.com/science/article/pii/S1089326115000793).
52. [52] Cohen J. Statistical Power Analysis for the Behavioral Sciences. Taylor & Francis; 2013. Available from: [https://books.google.co.uk/books?id=cIJH0lR33bgC](https://books.google.co.uk/books?id=cIJH0lR33bgC).
53. [53] Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, et al. Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. American Journal of Epidemiology.
2017 06;186(9):1026–1034. Available from: doi:10.1093/aje/kwx246. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/aje/kwx246&link_type=DOI) 54. [54].Samms GL, Ponting CP. Unequal access to diagnosis of myalgic encephalomyelitis in England. medRxiv. 2024;Available from: [https://www.medrxiv.org/content/early/2024/02/01/2024.01.31.24302070](https://www.medrxiv.org/content/early/2024/02/01/2024.01.31.24302070). 55. [55].Shankar V, Wilhelmy J, Curtis EJ, Michael B, Cervantes L, Mallajosyula VA, et al. Oxidative Stress is a shared characteristic of ME/CFS and Long COVID. bioRxiv. 2024;. 56. [56].Xiong R, Fleming E, Caldwell R, Vernon SD, Kozhaya L, Gunter C, et al. BioMapAI: Artificial Intelligence Multi-Omics Modeling of Myalgic Encephalomyelitis / Chronic Fatigue Syndrome. bioRxiv. 2024;. 57. [57].Froehlich L, Hattesohl DB, Cotler J, Jason LA, Scheibenbogen C, Behrends U. Causal attributions and perceived stigma for myalgic encephalomyelitis/chronic fatigue syndrome. Journal of Health Psychology. 2022;27(10):2291– 2304. pmid:PMID: 34240650. Available from: doi:10.1177/13591053211027631. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1177/13591053211027631&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=PMID&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F28%2F2024.08.26.24312606.atom) 58. [58].Stamatakis E, Owen KB, Shepherd L, Drayton B, Hamer M, Bauman AE. Is Cohort Representativeness Pasśe? Poststratified Associations of Lifestyle Risk Factors with Mortality in the UK Biobank. Epidemiology. 2021;32(2). 59. [59].Davis KAS, Coleman JRI, Adams M, Allen N, Breen G, Cullen B, et al. Mental health in UK Biobank – development, implementation and results from an online questionnaire completed by 157 366 participants: a reanalysis. BJPsych Open. 2020;6(2):e18. 60. [60].Elliott P, Peakman oboUB Tim C. 
The UK Biobank sample handling and storage protocol for the collection, processing and archiving of human blood and urine. International Journal of Epidemiology. 2008 04;37(2):234–244. Available from: doi:10.1093/ije/dym276. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dym276&link_type=DOI) 61. [61].UK Biobank biochemistry assay quality procedures;. Accessed: 2024-07-28. [https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=5636](https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=5636). 62. [62].UK Biobank companion document for serum biomarker data;. Accessed: 2024-07-28. [https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=1227](https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=1227). 63. [63].Che B, Zhong C, Zhang R, Pu L, Zhao T, Zhang Y, et al. Triglyceride-glucose index and triglyceride to highdensity lipoprotein cholesterol ratio as potential cardiovascular disease risk factors: an analysis of UK biobank data. Cardiovascular Diabetology. 2023;22(1):34. 64. [64].Si S, Li J, Li Y, Li W, Chen X, Yuan T, et al. Causal Effect of the Triglyceride-Glucose Index and the Joint Exposure of Higher Glucose and Triglyceride With Extensive Cardio-Cerebrovascular Metabolic Outcomes in the UK Biobank: A Mendelian Randomization Study. Frontiers in Cardiovascular Medicine. 2021;7. Available from: [https://www.frontiersin.org/articles/10.3389/fcvm.2020.583473](https://www.frontiersin.org/articles/10.3389/fcvm.2020.583473). 65. [65].Cordero A, Alegria-Ezquerra E. TG/HDL ratio as surrogate marker for insulin resistance; 2009. Available from: [https://www.escardio.org/Journals/E-Journal-of-Cardiology-Practice/Volume-8/TG-HDL-ratio-as-surrogate-marker-for-insulin-resistance](https://www.escardio.org/Journals/E-Journal-of-Cardiology-Practice/Volume-8/TG-HDL-ratio-as-surrogate-marker-for-insulin-resistance). 66. [66].Simental-Mendía LE, Rodŕıguez-Moŕan M, Guerrero-Romero F. 
The product of fasting glucose and triglycerides as surrogate for identifying insulin resistance in apparently healthy subjects. Metabolic syndrome and related disorders. 2008 December;6(4):299—304. Available from: doi:10.1089/met.2008.0034. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1089/met.2008.0034&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19067533&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F28%2F2024.08.26.24312606.atom) 67. [67].VanderWeele TJ. Mediation Analysis: A Practitioner’s Guide [Journal Article]. Annual Review of Public Health. 2016;37(Volume 37, 2016):17–32. Available from: [https://www.annualreviews.org/content/journals/](https://www.annualreviews.org/content/journals/) doi:10.1146/annurev-publhealth-032315-021402. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1146/annurev-publhealth-032315-021402&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26653405&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F28%2F2024.08.26.24312606.atom) 68. [68].Nguyen TQ, Schmid I, Stuart EA. Clarifying causal mediation analysis for the applied researcher: Defining effects based on what we want to learn. Psychological Methods. 2021 Apr;26(2):255–271. Available from: doi:10.1037/met0000299. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1037/met0000299&link_type=DOI) 69. [69].Hejazi NS, van der Laan MJ, Janes HE, Gilbert PB, Benkeser DC. Efficient Nonparametric Inference on the Effects of Stochastic Interventions under Two-Phase Sampling, with Applications to Vaccine Efficacy Trials. Biometrics. 2020 09;77(4):1241–1253. Available from: doi:10.1111/biom.13375. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/biom.13375&link_type=DOI) 70. [70].Benkeser D, Montefiori DC, McDermott AB, Fong Y, Janes HE, Deng W, et al. 
Comparing antibody assays as correlates of protection against COVID-19 in the COVE mRNA-1273 vaccine efficacy trial. Science Translational Medicine. 2023;15(692):eade9078. Available from: [https://www.science.org/doi/abs/10.1126/scitranslmed](https://www.science.org/doi/abs/10.1126/scitranslmed). ade9078. 71. [71].Huang Y, Hejazi NS, Blette B, Carpp LN, Benkeser D, Montefiori DC, et al. Stochastic interventional vaccine efficacy and principal surrogate analyses of antibody markers as correlates of protection against symptomatic COVID-19 in the COVE mRNA-1273 trial. Viruses. 2023;15(10):2029. 72. [72].Hejazi NS, Shen X, Carpp LN, Benkeser D, Follmann D, Janes HE, et al. Stochastic interventional approach to assessing immune correlates of protection: Application to the COVE mRNA-1273 vaccine trial. International Journal of Infectious Diseases. 2023;137:28–39. 73. [73].Rudolph KE, Díaz I, Hejazi NS, van der Laan MJ, Luo SX, Shulman M, et al. Explaining differential effects of medication for opioid use disorder using a novel approach incorporating mediating variables. Addiction. 2021;116(8):2094–2103. Available from: [https://onlinelibrary.wiley.com/doi/abs/10.1111/add.15377](https://onlinelibrary.wiley.com/doi/abs/10.1111/add.15377). 74. [74].Hejazi NS, Rudolph KE, Van Der Laan MJ, Díaz I. Nonparametric causal mediation analysis for stochastic interventional (in)direct effects. Biostatistics. 2022 02;24(3):686–707. Available from: doi:10.1093/ biostatistics/kxac002. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/&link_type=DOI) 75. [75].Rudolph KE, Gimbrone C, Díaz I. Helped into Harm: Mediation of a Housing Voucher Intervention on Mental Health and Substance Use in Boys. Epidemiology. 2021;32(3). 76. [76].Menkir TF, Citarella B, Sigfrid L, Doshi Y, Reyes F, Calvache JA, et al. 
Modeling the relative influence of socio-demographic variables on post-acute COVID-19 quality of life: an application to settings in Europe, Asia, Africa, and South America. 2024;. 77. [77].Díaz I, Hejazi NS, Rudolph KE, van der Laan MJ. Non-parametric efficient causal mediation with intermediate confounders. Biometrika. 2020;108(3):627–641. Available from: [https://arxiv.org/abs/1912.09936](https://arxiv.org/abs/1912.09936). 78. [78].Hejazi NS, Rudolph KE, Díaz I. medoutcon: Nonparametric efficient causal mediation analysis with machine learning in R. Journal of Open Source Software. 2022;Available from: doi:10.21105/joss.03979. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.21105/joss.03979&link_type=DOI) 79. [79].Hejazi NS, Díaz I, Rudolph KE. medoutcon: Efficient natural and interventional causal mediation analysis; 2022. R package version 0.1.6. Available from: [https://github.com/nhejazi/medoutcon](https://github.com/nhejazi/medoutcon). 80. [80].Rubin DB. Bayesian Inference for Causal Effects: The Role of Randomization. The Annals of Statistics. 1978;6(1):34–58. Available from: [http://www.jstor.org/stable/2958688](http://www.jstor.org/stable/2958688). 81. [81].Rubin DB. Randomization Analysis of Experimental Data: The Fisher Randomization Test Comment. Journal of the American Statistical Association. 1980;75(371):591–593. Available from: [http://www.jstor.org/stable/](http://www.jstor.org/stable/) 2287653. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2287653&link_type=DOI) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1980KL55500018&link_type=ISI) 82. [82].Robins JM, Richardson TS. Alternative graphical causal models and the identification of direct effects. In: Shrout P, Keyes K, Ornstein K, editors. Causality and psychopathology: Finding the determinants of disorders and their cures. Oxford: Oxford University Press; 2011. . 83. [83].van der Laan MJ, Polley EC, Hubbard AE. Super learner. 
Stat Appl Genet Mol Biol. 2007;6:Art. 25, 23. 84. [84].Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins series in the mathematical sciences. Springer New York; 1998. Available from: [https://books.google.co.uk/books?id=lSnTm6SC\_SMC](https://books.google.co.uk/books?id=lSnTm6SC_SMC). 85. [85].Kennedy E. npcausal: Nonparametric causal inference methods; 2021. R package version 0.1.1. Available from: [https://github.com/ehkennedy/npcausal](https://github.com/ehkennedy/npcausal). 86. [86].Friedman JH. Multivariate Adaptive Regression Splines. The Annals of Statistics. 1991;19(1):1 – 67. 87. [87].Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’16. New York, NY, USA: Association for Computing Machinery; 2016. p. 785–794. Available from: doi:10.1145/2939672.2939785. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1145/2939672.2939785&link_type=DOI) 88. [88].Coyle JR, Hejazi NS, Malenica I, Phillips RV, Sofrygin O. sl3: Super Machine Learning with pipelines;. R package. Available from: [https://github.com/tlverse/sl3](https://github.com/tlverse/sl3). 89. [89].Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, et al., editors. Advances in Neural Information Processing Systems. vol. 30. Curran Associates, Inc.; 2017. Available from: [https://proceedings.neurips.cc/paper\_files/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf](https://proceedings.neurips.cc/paper_files/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf). 90. [90].van der Laan MJ. A generally efficient targeted minimum loss based estimator based on the highly adaptive lasso. The international journal of biostatistics. 2017;13(2). 91. 
[91].Hejazi NS, Coyle JR, van der Laan MJ. hal9001: Scalable highly adaptive lasso regression in R. Journal of Open Source Software. 2020 9;5(53):2526. Available from: doi:10.21105/joss.02526. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.21105/joss.02526&link_type=DOI) 92. [92].Coyle JR, Hejazi NS, Phillips RV, van der Laan LWP, van der Laan MJ. hal9001: The scalable highly adaptive lasso;. R package. Available from: [https://CRAN.R-project.org/package=hal9001](https://CRAN.R-project.org/package=hal9001). 93. [93].Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene Ontology: tool for the unification of biology. Nature Genetics. 2000;25(1):25–29. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/75556&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10802651&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F08%2F28%2F2024.08.26.24312606.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000086884000011&link_type=ISI) 94. [94].Aleksander SA, Balhoff J, Carbon S, Cherry JM, Drabkin HJ, Ebert D, et al. The Gene Ontology knowledgebase in 2023. Genetics. 2023 03;224(1):iyad031. Available from: doi:10.1093/genetics/iyad031. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/genetics/iyad031&link_type=DOI) 95. [95].Sayols S. rrvgo: a Bioconductor package for interpreting lists of Gene Ontology terms. microPublication Biology. 2023;Available from: [https://www.micropublication.org/journals/biology/micropub-biology-000811](https://www.micropublication.org/journals/biology/micropub-biology-000811). [1]: /embed/graphic-11.gif