ABSTRACT
Background Modelling the prodrome to severe mental disorders (SMD), including unipolar mood disorders (UMD), bipolar mood disorders (BMD) and psychotic disorders (PSY), should consider both the evolution and interactions of symptoms and substance use (prodromal features) over time. Temporal network analysis can address this by representing prodromal features as nodes, with their connections (edges) indicating the likelihood of one feature preceding the other. Node centrality could help identify important prodromal features and potential intervention targets. We developed an SMD network and compared sub-networks specific to UMD, BMD and PSY.
Methods We analysed 7,049 individuals with an SMD diagnosis (UMD: 2,306; BMD: 817; PSY: 3,926) from the South London and Maudsley NHS Foundation Trust electronic health records. Using validated natural language processing algorithms, we extracted the occurrence of 61 prodromal features every three months from two years to six months prior to SMD onset. To construct temporal networks of prodromal features, we employed generalized vector autoregression panel analysis, adjusting for covariates. We computed edge weights (correlation coefficients, z) for autocorrelative, unidirectional and bidirectional relationships. Centrality was calculated as the sum of absolute edge weights leaving (out-centrality, cout) or entering (in-centrality, cin) a node. We compared the three sub-networks (UMD, BMD, PSY) using permutation analysis.
Findings The strongest autocorrelation in the SMD network was for tearfulness (z=·10). Positive unidirectional relationships were observed for irritability-agitation (z12=·03), mood instability-tearfulness (z12=·03) and irritability-aggression (z12=·03). Aggression-hostility (z12=·04, z21=·03), delusions-hallucinations (z12=·04, z21=·03) and aggression-agitation (z12=·03, z21=·03) were the strongest bidirectional relationships. The most central features included aggression (cout=·098) and tearfulness (cin=·124). Few edge-weight comparisons differed significantly between the PSY sub-network and the UMD (3·9%) and BMD (1·6%) sub-networks, and even fewer between UMD and BMD (0·4%).
Interpretations This study represents the most extensive temporal network analysis conducted on the longitudinal interplay of SMD prodromal features. These findings provide further evidence to support early detection services across SMD.
Evidence before this study Preventive approaches for severe mental disorders (SMD) can improve outcomes; however, their effectiveness relies on accurate knowledge of the prodromal symptoms and substance use preceding their onset and of how these evolve over time. We searched PubMed from database inception to 26th January 2024 for studies published in English investigating the dynamic prodromes of unipolar mood disorders (UMD), bipolar mood disorders (BMD) or psychotic disorders (PSY). The search terms were prodrom* AND (depression OR bipolar OR psychosis) AND (timecourse OR dynamic OR “network analysis” OR longitudinal). First, while many studies have investigated the prodromal phases of SMD, particularly for PSY, the majority have taken a cross-sectional rather than a longitudinal approach, which cannot detect causal dependence between and within prodromal symptoms and substance use. Second, we identified no studies focusing on the evolution of features during the prodromal period. Finally, studies have focused on diagnosis-specific analyses, considering UMD, BMD or PSY alone, limiting the possibility of comparison between them.
Added value of this study We used a temporal network analysis approach, in combination with a large electronic health record database (n=7,049) and natural language processing, to examine the dynamic evolution of symptoms and substance use in the prodrome to an SMD diagnosis in secondary mental healthcare. This is the largest network analysis investigating prodromal features in SMD, the first assessing longitudinal changes and the first to directly compare the prodromes to UMD, BMD and PSY. Our results add to the growing evidence for a transdiagnostic prodrome to SMD by showing only small differences between UMD, BMD and PSY in how symptoms and substance use evolve over the course of the prodrome.
Implications of all the available evidence Our study explores the patterns of evolution of symptom and substance use events across and within SMD diagnostic groups. We highlight the importance of understanding the dynamic progression of these prodromal features to fully characterise the prodrome to SMD. These findings, together with a growing literature base, also support the potential for broader transdiagnostic early detection services that provide preventive psychiatric care to individuals at risk for SMD.
1. BACKGROUND
Severe mental disorders (SMD) include non-psychotic unipolar mood disorders (UMD), non-psychotic bipolar mood disorders (BMD) and psychotic disorders (PSY), and are characterised by high clinical, societal, familial and personal burden.1–3 Electronic health records (EHRs) provide an opportunity to examine prodromal symptoms contemporaneously, reducing recall bias and enriching our insight into symptom presentation during the prodrome.4 This knowledge can help enhance specialised preventive care for young people at risk of emerging SMD.
Temporal network analysis allows us to statistically model the relationships between nodes (prodromal features) as edges within a network (prodrome) over time. Weak, sparse networks are more modifiable, whereas strong, dense networks resist change5 and may need intensive interventions to alter them6 (e.g. to prevent SMD onset). Edge estimates in temporal networks could suggest directed causality between features, potentially enhancing our understanding of SMD development.7 Node centrality, representing connection strength into and out of a node,8 may highlight the significance of a prodromal feature in the progression of the disorder and its potential as an intervention target.9–12
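To make the notion of a directed temporal edge concrete, the expression below assumes a generic lag-1 vector autoregression with person-specific means. It is an illustrative formulation only, not necessarily the exact specification fitted in this study (see Methods and Supplement). Here y(i,t) is the vector of prodromal features for individual i at follow-up interval t, mu(i) holds that individual's mean levels and epsilon(i,t) the residuals:

```latex
\mathbf{y}_{i,t} = \boldsymbol{\mu}_i + \mathbf{B}\,\left(\mathbf{y}_{i,t-1} - \boldsymbol{\mu}_i\right) + \boldsymbol{\varepsilon}_{i,t},
\qquad t = 2, \dots, 6
```

Under this sketch, a non-zero element B_jk is a directed edge from feature k at interval t-1 to feature j at interval t, and the diagonal elements B_jj are the autocorrelative edges (a feature predicting itself at the next interval).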
Firstly, we aimed to develop a global transdiagnostic SMD network to quantify the temporal relationships between prodromal features. Secondly, we aimed to examine differences between diagnostic groups by computing and comparing sub-networks specific to UMD, BMD and PSY.
2. METHODS
2.1. Data Source
Data were from the South London and Maudsley National Health Service Foundation Trust (SLaM). SLaM provides secondary mental healthcare across four socioeconomically diverse South London boroughs (Lambeth, Southwark, Lewisham and Croydon, 1.3 million people, eMethods 1). A Clinical Record Interactive Search (CRIS) tool was implemented in the EHR to facilitate research with full but anonymised clinical information.13 CRIS has already been extensively validated in previous research studies.14–16 CRIS received ethical approval as an anonymised dataset for secondary analyses from Oxfordshire REC C (Ref: 23/SC/0257).
2.2. Study Design
Retrospective (2-year), real-world, EHR cohort study (Figure 1). The 2-year period was chosen to mirror the typical duration of care in clinical services for primary indicated prevention of SMD (72.4% provide care for 24 months or less).17 The index date was the date of the first diagnosis within an individual’s SMD group recorded in the EHR (index diagnosis, T-0mo, Figure 1). The antecedent date (T-6mo) was defined as a data cut-off 6 months before the index date; the 6 months between the antecedent and index dates form the antecedent period, which separates the lookback period from the index diagnosis to avoid overlap with the actual onset of SMD. The lookback period (Figure 1) was defined as the 1.5-year period prior to the antecedent date (T-6mo). To minimise the time invariance imposed by network analyses,18 we split the 1.5-year lookback period into six three-month follow-up intervals.
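The interval structure described above can be sketched as follows. This is a minimal Python illustration, not the study's code (the analyses were run in R); it assumes calendar-month arithmetic, half-open intervals [start, end) and that FU1 is the earliest interval, and the function names `add_months`, `follow_up_intervals` and `assign_interval` are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day to the target month's length."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    first_of_next = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    days_in_month = (first_of_next - date(year, month, 1)).days
    return date(year, month, min(d.day, days_in_month))


def follow_up_intervals(
    index_date: date,
    antecedent_months: int = 6,
    n_intervals: int = 6,
    interval_months: int = 3,
) -> List[Tuple[int, date, date]]:
    """Return (FU number, start, end) for each half-open interval of the lookback period."""
    antecedent_date = add_months(index_date, -antecedent_months)  # T-6mo
    lookback_start = add_months(antecedent_date, -n_intervals * interval_months)  # T-24mo
    # Boundaries are computed from the lookback start to avoid day-clamping drift.
    return [
        (k + 1,
         add_months(lookback_start, k * interval_months),
         add_months(lookback_start, (k + 1) * interval_months))
        for k in range(n_intervals)
    ]


def assign_interval(entry_date: date, intervals: List[Tuple[int, date, date]]) -> Optional[int]:
    """Map an EHR entry date to its follow-up interval; None if outside the lookback period."""
    for fu, start, end in intervals:
        if start <= entry_date < end:
            return fu
    return None
```

For an index date of 15 January 2020, this places FU1 at 15 January to 15 April 2018 and FU6 at 15 April to 15 July 2019; entries from 15 July 2019 onwards fall in the antecedent period and are not assigned to any interval.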
2.3. Study Population
All individuals accessing SLaM services between 1st January 2008 and 10th August 2021 and receiving a primary (i.e. not comorbid) ICD-10 index diagnosis of any SMD were eligible. SMD was defined as UMD, BMD or PSY (operationalised as in eTable 1). Individuals with multiple SMD diagnoses were assigned the diagnosis of greatest severity (ranked UMD<BMD<PSY). For example, an individual receiving diagnoses of UMD and BMD simultaneously would be included in the BMD group, owing to its higher severity.
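A minimal sketch of the severity hierarchy and the index-date rule, assuming each individual's diagnoses are available as hypothetical (group, date) pairs already mapped to UMD, BMD or PSY using the eTable 1 definitions. `assign_group_and_index` is an illustrative name, not part of the study code.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# Severity ranking stated in the text: UMD < BMD < PSY.
SEVERITY = {"UMD": 1, "BMD": 2, "PSY": 3}


def assign_group_and_index(
    diagnoses: Iterable[Tuple[str, date]],
) -> Optional[Tuple[str, date]]:
    """Return (SMD group, index date) from (group, diagnosis date) pairs, or None if no SMD diagnosis."""
    relevant = [(g, d) for g, d in diagnoses if g in SEVERITY]
    if not relevant:
        return None
    # The most severe group wins, regardless of the order of diagnoses.
    group = max((g for g, _ in relevant), key=SEVERITY.__getitem__)
    # Index date: first recorded diagnosis within the assigned group.
    index_date = min(d for g, d in relevant if g == group)
    return group, index_date
```

Taking the earliest date within the assigned group (rather than the earliest SMD diagnosis overall) follows the definition of the index date given in the Study Design.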
Individuals whose EHR entries (e.g. clinical notes and letters recorded in each month) were recorded exclusively after the index date or exclusively within the antecedent period were excluded, as they had no detectable prodrome. We also excluded individuals whose EHR entries within the lookback period were all empty, and those with EHR entries in four or fewer follow-up intervals of the lookback period, as they did not have sufficient data to contribute to the fitted networks.
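The coverage rule can be expressed as a short filter. This sketch assumes EHR entries have already been assigned to lookback intervals (so entries after the index date or in the antecedent period are simply absent) and treats whitespace-only text as an empty entry; `has_sufficient_data` is a hypothetical helper.

```python
from typing import Dict, List


def has_sufficient_data(entries_by_interval: Dict[int, List[str]], min_intervals: int = 5) -> bool:
    """True if non-empty EHR entries occur in at least `min_intervals` lookback intervals."""
    covered = sum(
        1 for texts in entries_by_interval.values() if any(text.strip() for text in texts)
    )
    return covered >= min_intervals
```

An individual with no lookback entries at all (an empty mapping) has zero covered intervals and is therefore excluded by the same check.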
2.4. Variables
At index date, data were extracted from structured text on age, gender, self-assigned ethnicity (UK Office of National Statistics, eTable 2), ICD-10 diagnoses and prescription of antipsychotics, antidepressants, mood stabilisers and anxiolytics (see eTable 3 for medication classification details).
During the lookback period, data were extracted as binary variables on the occurrence (yes:1/no:0) of 61 natural language processing (NLP)-based prodromal features in each follow-up interval (FU 1-6; Figure 1). These NLP algorithms convert unstructured EHR information (i.e. free text) into structured, quantifiable data.19 NLP algorithms with precision ≥80% (mean=90%) were included. Precision was defined as the proportion of NLP-labelled positive instances that were relevant on human annotation of the EHR, i.e. true positives / (true positives + false positives) (see eMethods 2 for further details on NLP algorithm development and validation, and eTable 4 for the final list of NLP algorithms employed). Within each follow-up interval, the EHR entry frequency (number of entries) and length (total number of words recorded across all entries) were computed.
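The precision threshold and the two entry-level covariates can be sketched as below. This assumes validation counts are available as hypothetical (true positives, false positives) pairs per algorithm and that words are counted by whitespace splitting; the study's exact tokenisation is not specified here, and the algorithm names in the example are illustrative.

```python
from typing import Dict, List, Tuple


def precision(true_pos: int, false_pos: int) -> float:
    """Share of NLP-labelled positives confirmed as relevant by human annotation."""
    return true_pos / (true_pos + false_pos)


def select_algorithms(validation: Dict[str, Tuple[int, int]], threshold: float = 0.80) -> List[str]:
    """Keep algorithms whose annotated precision meets the threshold (counts are (TP, FP))."""
    return sorted(name for name, (tp, fp) in validation.items() if precision(tp, fp) >= threshold)


def entry_covariates(texts: List[str]) -> Tuple[int, int]:
    """EHR entry frequency (number of entries) and length (total words) within one interval."""
    return len(texts), sum(len(text.split()) for text in texts)
```

For example, an algorithm with 90 true and 10 false positives (precision 0·90) would be retained, whereas one with 70 and 30 (0·70) would not.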
2.5. Statistical analysis
All analyses were conducted in R (version 4.2.3) on a virtual machine (AMD EPYC 7763 64-Core Processor) running the Ubuntu 22.04.1 operating system. All analysis code is publicly available on GitHub: https://github.com/m-arribas/network_analysis.git.
2.5.1. Sociodemographic and Clinical Characteristics
We computed descriptive analyses for sociodemographic variables at index date (age, gender, self-assigned ethnicity) as well as the proportion (N [%]) of individuals with specific ICD-10 diagnoses and prescription of antipsychotics, antidepressants, mood stabilisers and anxiolytics at index in UMD, BMD and PSY.
In a sensitivity analysis, to test for any sampling bias in the final population, we compared excluded individuals (with four or fewer follow-up intervals) to those included (with five or more follow-up intervals) on sociodemographic variables (age, gender, self-assigned ethnicity), clinical variables (proportion of individuals belonging to each SMD group and medication prescriptions at index), and the severity of presenting features (frequency of prodromal clusters within the antecedent period).
2.5.2. Network analysis
As a primary analysis, we quantified a set of local network metrics in a transdiagnostic SMD network (hereafter called the “SMD network”) on the entire study population. In a secondary analysis, we repeated this on each SMD sub-sample separately (UMD, BMD, PSY) to compute three diagnosis-specific sub-networks (hereafter called “sub-networks”). For each network (SMD network and three sub-networks), the following steps (pre-processing, network development and stability assessment) were repeated separately in each relevant dataset, using a step-wise procedure similar to prior work modelling temporal features in psychopathology.20
Pre-processing and network development methods are detailed in the Supplement (eMethods 3 and 4, respectively). For each network (SMD network and three sub-networks), we extracted the temporal (within individuals), contemporaneous (relationships between nodes averaged over time and averaged across the sample), and between-subjects matrices. From each matrix, the strengths of connections between features (edge weights) were estimated as correlation coefficients (z) and categorised into three types: autocorrelative (a node predicts itself at the next time point), unidirectional (a node predicts another, without reciprocation) and bidirectional (two nodes predict each other). Degree centrality was extracted from each graph. For temporal networks, centrality was defined as the sum of absolute (directed) edge weights into (in-centrality, cin) and out of (out-centrality, cout) a node, including autocorrelative edges. For contemporaneous and between-subjects networks, centrality was defined as the sum of absolute (undirected) edge weights for a node (autocorrelative edges do not exist in these networks).
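The centrality and edge-type definitions for temporal networks can be sketched as follows. This assumes a temporal matrix B in which B[j][k] is the edge from feature k at interval t-1 to feature j at interval t (rows are outcomes, columns are predictors), and treats edges already set to zero after bootstrapping as absent. It mirrors the definitions in the text but is not the study's R code; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def centrality(B: Matrix) -> Tuple[List[float], List[float]]:
    """In- and out-centrality: sums of absolute edge weights, autocorrelative (diagonal) edges included."""
    n = len(B)
    c_in = [sum(abs(B[j][k]) for k in range(n)) for j in range(n)]   # edges entering node j (row sums)
    c_out = [sum(abs(B[j][k]) for j in range(n)) for k in range(n)]  # edges leaving node k (column sums)
    return c_in, c_out


def classify_edges(B: Matrix, names: Sequence[str]):
    """Split non-zero edges into autocorrelative, unidirectional and bidirectional lists."""
    n = len(B)
    auto = [(names[i], B[i][i]) for i in range(n) if B[i][i] != 0]
    uni, bi = [], []
    for a in range(n):
        for b in range(a + 1, n):
            z_ab, z_ba = B[b][a], B[a][b]  # z12 (a -> b) and z21 (b -> a)
            if z_ab != 0 and z_ba != 0:
                bi.append((names[a], names[b], z_ab, z_ba))
            elif z_ab != 0:
                uni.append((names[a], names[b], z_ab))
            elif z_ba != 0:
                uni.append((names[b], names[a], z_ba))
    return auto, uni, bi
```

With this orientation, in-centrality is a row sum and out-centrality a column sum; software that stores the transpose (rows as predictors) would swap the two.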
To evaluate the robustness of the edge weight estimates and to avoid overfitting in our networks, we computed the stability of edges within each full fitted network using bootstrapping procedures: over 250 iterations, 25% of the sample was randomly held out and the full model was refitted on the remaining 75% of participants (following standard methods).21 Within each iteration, the selected data were pre-processed in the same manner as in the full model to control for errors and variance within the data cleaning and scaling process. The averaged edge weights and 95%CIs over all 250 iterations were retained and reported. All edges with 95%CIs crossing zero were forced to 0.
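The hold-out procedure can be sketched generically. Here `fit` is a hypothetical callable standing in for the full pre-processing and network-fitting pipeline and must return a mapping from edges to weights; the 95% interval is taken as percentiles of the subsample estimates, which is an assumption about how the CIs were formed.

```python
import random
from typing import Any, Callable, Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def bootstrap_edges(
    participants: Sequence[Any],
    fit: Callable[[List[Any]], Dict[Edge, float]],
    n_iter: int = 250,
    holdout: float = 0.25,
    seed: int = 0,
) -> Dict[Edge, Tuple[float, float, float]]:
    """Mean edge weight and 95% percentile interval over repeated subsamples.

    Each iteration holds out `holdout` of participants (sampling without replacement)
    and refits on the rest. Edges whose interval crosses zero are set to 0.
    """
    rng = random.Random(seed)
    n_keep = len(participants) - round(holdout * len(participants))
    draws: Dict[Edge, List[float]] = {}
    for _ in range(n_iter):
        subset = rng.sample(list(participants), n_keep)
        for edge, weight in fit(subset).items():
            draws.setdefault(edge, []).append(weight)

    summary: Dict[Edge, Tuple[float, float, float]] = {}
    for edge, weights in draws.items():
        ws = sorted(weights)
        lower = ws[int(0.025 * (len(ws) - 1))]
        upper = ws[int(0.975 * (len(ws) - 1))]
        mean = sum(ws) / len(ws)
        summary[edge] = (0.0 if lower <= 0.0 <= upper else mean, lower, upper)
    return summary
```

Passing the whole pipeline as `fit` keeps pre-processing inside each iteration, matching the text's point that cleaning and scaling are repeated on every subsample.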
2.5.3. Permutation analysis
To test for statistically significant differences in the temporal, contemporaneous, and between-subject relationships across the three sub-networks (UMD, BMD, PSY) we conducted permutation analyses22.
To generate networks with the same topology required for valid comparisons, we re-fitted the three original sub-networks restricted to common features only, after pre-processing the data for each sub-sample.
In each permutation iteration, the raw data in each sub-sample (UMD, BMD, PSY) were randomly re-sampled for each individual node. The permuted dataset for each sub-sample was then pre-processed in the same manner as in the original sub-networks, and permuted sub-networks were fitted. From each permuted sub-network, temporal, contemporaneous and between-subjects matrix estimates were obtained, as in the main analysis. For each edge, the difference in permuted edge weights was calculated (UMD-BMD, UMD-PSY and BMD-PSY). Each edge weight comparison was visually inspected using histograms to assess normality of the permuted data.
For each edge weight comparison, the number of iterations where the permuted difference was equal to or greater than the absolute observed difference (from the actual dataset) was divided by the total number of permutations (250) to obtain the p-value. We corrected the resulting p-values for multiple comparisons using the false discovery rate method set at 5% to ensure the robustness of our findings and used ppermuted<0·05 as the threshold for statistical significance.
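The p-value and correction steps can be sketched as follows. This assumes the null distribution for one comparison is a list of permuted edge-weight differences, that exceedance is judged on absolute values (two-sided), and that the false discovery rate method is Benjamini-Hochberg (equivalent to `method = "fdr"` in R's `p.adjust`). The function names are hypothetical.

```python
from typing import List, Sequence


def permutation_p(observed_diff: float, permuted_diffs: Sequence[float]) -> float:
    """Share of permutations whose |difference| is at least the |observed difference|."""
    hits = sum(1 for d in permuted_diffs if abs(d) >= abs(observed_diff))
    return hits / len(permuted_diffs)


def fdr_bh(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down, enforcing monotonicity
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With 250 permutations and this count/total definition, the smallest non-zero p-value is 1/250 = 0·004; some implementations instead use (count + 1)/(total + 1), which avoids p-values of exactly zero.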
Observed differences in edge weights (from the actual dataset) and corrected p-values (from the permutation analysis) were reported, and visualised with heatmaps.
3. RESULTS
3.1. Study Population
The final study population (n=7,049, Table 1; eTables 5-6) had EHR entries in more than four follow-up intervals (mean number of intervals [SD]=5·69 [0·46]) within the lookback period (Figure 2). Included participants were similar to excluded participants in terms of sociodemographic and clinical characteristics (eResults 1). Eight participants were excluded because the imputation method was unable to converge on stable approximations.
3.2. Primary Analysis (SMD network)
Out of the 61 NLP-derived prodromal features, 38 displayed near-zero variance and were excluded, leaving 23 features for the analyses (eTable 7): aggression, agitation, anxiety, cannabis use, cocaine use, cognitive impairment, delusional thinking, disturbed sleep, emotional withdrawal, feeling hopeless, guilt, hallucinations (all), hostility, irritability, mood instability, paranoia, poor concentration, poor insight, poor motivation, suicidality, tearfulness, tobacco use, and weight loss (eFigure 1).
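The near-zero-variance criterion is not stated here (pre-processing is described in eMethods 3). One common rule, mirroring the defaults of `nearZeroVar` in the R caret package (a most-common to second-most-common frequency ratio above 95/5, with at most 10% unique values), is sketched below, assuming it is applied to a feature's pooled binary occurrences across individuals and intervals. The function is a hypothetical illustration, not the study's pre-processing code.

```python
from collections import Counter
from typing import Sequence


def near_zero_variance(values: Sequence[int], freq_cut: float = 95 / 5, unique_cut: float = 10.0) -> bool:
    """Flag a feature whose values are (almost) constant."""
    counts = Counter(values)
    if len(counts) < 2:
        return True  # zero variance: only one observed value
    (_, most), (_, second) = counts.most_common(2)
    freq_ratio = most / second                           # dominance of the most common value
    percent_unique = 100.0 * len(counts) / len(values)   # distinct values as % of observations
    return freq_ratio > freq_cut and percent_unique <= unique_cut
```

For a binary feature in a large sample the percent-unique condition is always met, so the rule effectively flags features present in fewer than about 5% (or more than about 95%) of observations.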
A saturated model (a densely connected network with all available edges) was fitted with the 23 features at 6 follow-up intervals (Figure 3A). This network demonstrated excellent fit (RMSEA=·0091 [95%CI: ·0088, ·0094]; X2(8625)=13650, p<·0001; CFI=·97; TLI=·97) and had better fit than a sparse network (pruned edges) (ΔX2(733)=2734·10, p<·0001). The model showed high recoverability (eResults 2) and robustness (see Figure 3B and eTables 8-9 for actual model and bootstrapped estimates).
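As a consistency check, the RMSEA point estimate can be recovered from the reported chi-square, degrees of freedom and sample size using the standard formula sqrt(max(X2 - df, 0) / (df × (N - 1))). This assumes N is the full study population (n=7,049) and that no software-specific correction is applied; using N rather than N - 1 makes no difference at this precision.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate from a model chi-square, its degrees of freedom and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


print(round(rmsea(13650, 8625, 7049), 4))  # prints 0.0091
```

The `max(..., 0)` guard sets RMSEA to zero when the chi-square is below its degrees of freedom.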
The strongest autocorrelation was observed for tearfulness (correlation coefficient, z=·10), with all the other autocorrelations between 0·05-0·10 (Figure 3A). The most prominent unidirectional relationships were positive: irritability-aggression (z12=·03), irritability-agitation (z12=·03), hallucinations (all)-disturbed sleep (z12=·03) and mood instability-tearfulness (z12=·03). All other unidirectional relationships were |z12|<·03.
With respect to bidirectional relationships, positively recurring pairs were observed between aggression-hostility (z12=·04, z21=·03), delusional thinking-hallucinations (all) (z12=·04, z21=·03), aggression-agitation (z12=·03, z21=·03) and delusional thinking-hostility (z12=·02, z21=·03).
Considering centrality (Figure 3C), aggression (cout=·098), hostility (cout=·082), and hallucinations (all) (cout=·081) had the strongest out-centrality, whereas tearfulness (cin=·124), aggression (cin=·09) and delusional thinking (cin=·085) had the strongest in-centrality (eTable 10).
Results and visualisations for the contemporaneous and between-subject relationships of nodes are presented in eResults 3 and eFigure 2. See eTable 12 for actual model and bootstrapped estimates.
3.3. Secondary Analysis (sub-networks)
Out of the 61 NLP-derived prodromal features, after applying the relevant exclusions within each sub-sample, 21 features were included for the UMD network, 19 for BMD and 24 for PSY (eMethods 5).
A saturated model was fitted with the relevant features at 6 follow-up intervals in each sub-sample (UMD, BMD, PSY). Similarly to the primary analysis, saturated networks showed excellent fit and better fit than sparse models for the three networks (UMD: ΔX2(687)=1737, p<·0001; BMD: ΔX2(606)=1547, p<·0001; PSY: ΔX2(856)=2961, p<·0001). Further model fit results, including recoverability (eResults 4), and bootstrapping estimates (eTable 11, eFigure 3) can be found in the Supplement.
a. UMD
The strongest autocorrelations were observed for cannabis use (z=·12), feeling lonely (z=·12) and hallucinations (all) (z=·11) with all the other autocorrelations between 0·03-0·10 (Figure 4A).
The most prominent unidirectional relationships were all positive: poor motivation-low energy (z12=·06), tobacco use-weight loss (z12=·04), paranoia-nightmares (z12=·04) and mood instability-weight loss (z12=·04). All other unidirectional relationships were |z12|<·04. With respect to bidirectional relationships, positively recurring pairs were observed between guilt and tearfulness (z12=·03, z21=·03).
Considering centrality, weight loss (cin=·140), aggression (cin=·128) and suicidality (cin=·106) had the strongest in-centrality, whereas tobacco use (cout=·105), mood instability (cout=·103) and poor motivation (cout=·092), had the strongest out-centrality (eTable 13A).
b. BMD
The strongest autocorrelation was observed for hallucinations (all) (z=·13), with all the other autocorrelations between 0·03-0·10 (Figure 4B). The most prominent unidirectional relationships were mixed, with some positive: guilt-feeling hopeless (z12=·07), aggression-elation (z12=·06) and hallucinations (all)-suicidality (z12=·06); and others negative: guilt-paranoia (z12=-·07), irritability-tobacco use (z12=-·06) and feeling hopeless-elation (z12=-·06). All other unidirectional relationships were |z12|<·06. With respect to bidirectional relationships, a positively recurring pair was observed between elation and irritability (z12=·06, z21=·06).
Considering centrality, elation (cin=·176), irritability (cin=·157) and tobacco use (cin=·152), had the strongest in-centrality, whereas elation (cout=·165), irritability (cout=·163) and guilt (cout=·143) had the strongest out-centrality (eTable 13B).
c. PSY
The strongest autocorrelations were observed for feeling hopeless (z=·11) and tearfulness (z=·11), with all the other autocorrelations between 0·04-0·10 (Figure 4C). The most prominent unidirectional relationships were all positive: hallucinations (all)-disturbed sleep (z12=·04), hostility-arousal (z12=·04) and irritability-agitation (z12=·04). All other unidirectional relationships were |z12|<·04. With respect to bidirectional relationships, positively recurring pairs were observed between aggression-hostility (z12=·04, z21=·03), delusional thinking-hallucinations (all) (z12=·04, z21=·03), aggression-agitation (z12=·04, z21=·04) and arousal-elation (z12=·04, z21=·04).
Considering centrality, agitation (cin=·115), aggression (cin=·078) and arousal (cin=·072), had the strongest in-centrality, whereas aggression (cout=·143), hostility (cout=·135) and hallucinations (all) (cout=·104) had the strongest out-centrality (eTable 13C).
Results and visualisations for the contemporaneous and between-subject relationships of nodes for all sub-networks are presented in eResults 5. See eTable 12 for actual model and bootstrapped estimates.
3.4. Permutation Analysis
The final nodes for permutation analysis are found in eMethods 6, with the actual model estimates in eTable 14. The histograms of permuted edge weights exhibited a normal (bell-shaped), zero-centred curve, indicating that further iterations are unlikely to affect the distribution or the results of these analyses. Out of all possible edge weight comparisons in the permutation analysis, few of them were significantly different: UMD-PSY (3·9%), followed by BMD-PSY (1·6%) and then UMD-BMD (0·4%).
UMD showed a significantly stronger edge weight from irritability to tobacco use compared to BMD (zUMD-BMD=·059, ppermuted<·001). In addition, the following edge weights were significantly stronger compared to PSY: cannabis use (autocorrelation) (zUMD-PSY=·051, ppermuted<·001), agitation-suicidality (zUMD-PSY=·048, ppermuted<·001), tobacco use-suicidality (zUMD-PSY=·038, ppermuted<·001), mood instability-aggression (zUMD-PSY=·035, ppermuted<·001), suicidality-hallucinations (all) (zUMD-PSY=·034, ppermuted<·001) and cannabis-suicidality (zUMD-PSY=·033, ppermuted<·001) (Figure 5).
BMD did not show any edge weights that were significantly stronger than in UMD, and only one edge weight (tearfulness-cannabis use) was significantly stronger than in PSY (zBMD-PSY=·049, ppermuted<·001) (Figure 5).
PSY showed significantly stronger edge weights compared to UMD for the following edges: hallucinations (all)-paranoia (zPSY-UMD=·048, ppermuted<·001), paranoia (autocorrelation) (zPSY-UMD=·048, ppermuted<·001), tobacco use (autocorrelation) (zPSY-UMD=·045, ppermuted<·001) and paranoia-disturbed sleep (zPSY-UMD=·033, ppermuted<·001). PSY also showed significantly stronger edge weights compared to BMD for the edges: hallucinations (all)-disturbed sleep (zPSY-BMD=·072, ppermuted<·001), irritability-tobacco use (zPSY-BMD=·070, ppermuted<·001) and disturbed sleep-paranoia (zPSY-BMD=·053, ppermuted<·001) (Figure 5).
Histograms showing null distributions for significant temporal comparisons are presented in eFigure 4, with the results from the contemporaneous and between-subject matrices presented in eFigures 5-6.
4. DISCUSSION
This study represents the most extensive temporal network analysis modelling the evolution of prodromal features in SMD, with respect to both the breadth of features and the large sample. We found a dynamic and densely interconnected prodromal phase in the lead-up to SMD onset, mainly with autocorrelations and unidirectional relationships. Notably, this network structure shows consistencies across the SMD diagnostic groups, highlighting a transdiagnostic overlap in the prodromal stages of these conditions.
First, our analysis elucidates the dynamic progression of the SMD prodrome. The best-fitting networks were those in which all nodes were heavily interconnected. The associations between prodromal features were predominantly positive: presenting with one feature typically predicts the emergence, rather than the absence, of the same or another feature in the future. This is consistent with a sequential build-up in severity or complexity of the SMD prodrome nearing disorder onset and supports early detection efforts. The strongest positive associations were autocorrelations, meaning that once an individual experiences a feature, it tends to persist during the prodrome. Moreover, unidirectional relationships (feature A leads to feature B at the next time point but not vice versa) were more prevalent than bidirectional relationships (each feature leads to the other emerging at the next time point). Understanding these dynamics can help map how prodromal features evolve and anticipate the impact of targeted interventions.
Second, our analysis revealed that denser, more saturated networks fitted better than sparse networks for both the SMD network and the sub-networks specific to SMD diagnostic groups. This highlights the complexity of the interrelationships among prodromal features and has implications for early detection strategies. Effective prevention of SMD, or addressing existing prodromal symptoms, may require high-intensity interventions targeting the most influential network features to reduce the risk of further prodromal features emerging.
Third, our findings provide additional evidence for the concept of transdiagnostic features within the prodromal phase of SMD (at least in the context of secondary mental health care).23–28 The minimal edge weight differences among the UMD, BMD and PSY sub-networks in the permutation analysis suggest that only a few relationships are specific to diagnostic groups.
Echoing our earlier findings,28 PSY exhibited the most distinctive pattern of relationships between prodromal features, with stronger connections than both UMD and BMD in symptom pairs relating to positive symptoms, tobacco use and disturbed sleep. The prominence of positive symptoms in PSY affirms the relevance of psychometric tools such as the CAARMS29 and SIPS,30 which primarily assess positive symptoms to identify psychosis risk with excellent population-level prognostic accuracy.31 However, individuals who test positive on these tools are more likely not to develop psychosis than to develop it.31 Our results could inform refined versions of these tools that are more sensitive to psychosis risk, or individualised prediction models. These findings are also consistent with a disruptive impact of positive psychotic symptoms on sleep. Given that individuals at clinical high risk for psychosis (CHR-P)32,33 and people with psychosis34,35 often experience sleep issues, our findings reinforce the potential for interventions targeting sleep disturbances.36
Moreover, central features, indicating highest network influence, differed across sub-networks. In UMD, suicidality was central, which has been shown to be more common in UMD than other mental disorders37 (with suicidal ideation and attempt rates at 53% and 31%, respectively38,39), but less prominent in early prodromal stages.40 For BMD, centrality of elation and irritability aligns with hypomanic symptoms as diagnostic risk factors41, supporting psychometric instruments42–44 for bipolar at-risk45 focusing on these symptoms. In PSY, aggression and agitation were central, but this finding requires careful interpretation as our NLP algorithm for aggression does not distinguish between forms of violence directed to others or oneself, and individuals with PSY are more likely to be victims of violence than the general population.2,46
Our results show evidence for an existing prodrome in UMD, BMD and PSY that is similar, but not completely overlapping, across the SMD diagnostic groups. This finding supports the potential to broaden preventive services which target SMD. While transdiagnostic early detection services are emerging, they can encompass a range of at-risk states beyond those studied here, including eating disorders, anxiety and personality disorders,47–49 and their effectiveness is yet to be determined.50 As with the CHR-P state, effective recruitment strategies are crucial for risk enrichment31 and for optimising preventive intervention potential.
This study, while comprehensive, is subject to several limitations. First, the features which comprise the networks are prodromal in the sense that they are the symptoms that are detectable in secondary care prior to these diagnoses. However, despite our extensive range of prodromal, sociodemographic and treatment variables, there may still be unaccounted factors that influence the temporal evolution of SMD prodromes, such as functioning.51 Future work should focus on mapping specific symptom trajectories to identify confounding factors affecting symptom presence and absence in extensive networks such as ours. Second, to reduce the missingness in the dataset, we used a relatively short look-back period. However, this two-year period before disorder onset aligns with the typical duration of clinical care for at-risk individuals.52 Third, the final population presents a selection bias towards those receiving more frequent secondary care, limiting generalizability. Similarly, specific features, such as disorganized symptoms, may be underrepresented due to the need for consistent clinical visits. However, there were minimal differences between included and excluded individuals, in terms of demographics, clinical variables and presenting symptoms. EHR and NLP-related limitations are discussed in eLimitations 1.
Overall, our study highlights the presence of a detectable transdiagnostic SMD prodrome by modelling the evolution of symptoms and substance use over time. Our findings illustrate the need to understand dynamic symptom progression to fully characterise the prodrome to SMD. These findings also support the potential for broader transdiagnostic early detection services for SMD that provide preventive care to individuals at-risk and a research platform for investigating putative interventions.
Funding
MA is supported by the UK Medical Research Council (MR/N013700/1) and King’s College London member of the MRC Doctoral Training Partnership in Biomedical Sciences. JMB has received funding from the Wellcome Trust (WT228268/Z/23/Z). RP has received funding from an NIHR Advanced Fellowship (NIHR301690) and a Medical Research Council (MRC) Health Data Research UK Fellowship (MR/S003118/1). PFP is supported by #NEXTGENERATIONEU (NGEU), funded by the Ministry of University and Research (MUR), National Recovery and Resilience Plan (NRRP), project MNESYS (PE0000006) – A Multiscale integrated approach to the study of the nervous system in health and disease (DN. 1553 11.10.2022).
Data sharing
The data accessed by CRIS remain within an NHS firewall and governance is provided by a patient-led oversight committee. Subject to these conditions, data access is encouraged and those interested should contact Robert Stewart (robert.stewart{at}kcl.ac.uk), CRIS academic lead. Further details regarding the CRIS platform can be found elsewhere13. There is no permission for data sharing. Covariance matrices to estimate networks and all analysis code are available on GitHub: https://github.com/m-arribas/network_analysis.git.
Ethics committee approval
Permissions for the study were granted by the Oxfordshire Research Ethics Committee C; because the data set comprised deidentified data, informed consent was not required13.
Authors’ contribution
MA, JMB and DO designed the study under PFP’s supervision. MA and DO ran the statistical analyses. All authors drafted, edited, and approved the final version of the manuscript.
Conflict of interest
MA has been employed by F. Hoffmann-La Roche AG outside of the current study. RP has received grant funding from Janssen, and consulting fees from Holmusk, Akrivia Health, Columbia Data Analytics, Boehringer Ingelheim and Otsuka. PFP has received research funds or personal fees from Lundbeck, Angelini, Menarini, Sunovion, Boehringer Ingelheim, Mindstrong, Proxymm Science, outside the current study.
Footnotes
* Joint senior authorship