Longitudinal evolution of the transdiagnostic prodrome to severe mental disorders: a dynamic temporal network analysis informed by natural language processing and electronic health records
============================================================================================================================================================================================

* Maite Arribas
* Joseph M. Barnby
* Rashmi Patel
* Robert A. McCutcheon
* Daisy Kornblum
* Hitesh Shetty
* Kamil Krakowski
* Daniel Stahl
* Nikolaos Koutsouleris
* Philip McGuire
* Paolo Fusar-Poli
* Dominic Oliver

## ABSTRACT

**Background** Modelling the prodrome to severe mental disorders (SMD), including unipolar mood disorders (UMD), bipolar mood disorders (BMD) and psychotic disorders (PSY), should consider both the evolution and the interactions of symptoms and substance use (prodromal features) over time. Temporal network analysis can address this by representing prodromal features as nodes, with their connections (edges) indicating the likelihood of one feature preceding the other. Node centrality could identify important prodromal features and potential intervention targets. We developed an SMD network and compared sub-networks specific to UMD, BMD and PSY.

**Methods** We analysed 7,049 individuals with an SMD diagnosis (UMD: 2,306; BMD: 817; PSY: 3,926) from the South London and Maudsley NHS Foundation Trust electronic health records. Using validated natural language processing algorithms, we extracted the occurrence of 61 prodromal features every three months from two years to six months before SMD onset. To construct temporal networks of prodromal features, we employed generalized vector autoregression panel analysis, adjusting for covariates. We computed edge weights (correlation coefficients, *z*) for autocorrelative, unidirectional and bidirectional relationships.
Centrality was calculated as the sum of connections leaving (out-centrality, *cout*) or entering (in-centrality, *cin*) a node. We compared the three sub-networks (UMD, BMD, PSY) using permutation analysis.

**Findings** The strongest autocorrelation in the SMD network was for tearfulness (*z*=·10). Unidirectional positive relationships were observed for irritability-agitation (*z*12=·03), mood instability-tearfulness (*z*12=·03) and irritability-aggression (*z*12=·03). Aggression-hostility (*z*12=·04, *z*21=·03), delusions-hallucinations (*z*12=·04, *z*21=·03) and aggression-agitation (*z*12=·03, *z*21=·03) were the strongest bidirectional relationships. The most central features included aggression (*cout*=·082) and tearfulness (*cin*=·124). In the permutation analysis, only a small proportion of edge weights differed significantly between the PSY sub-network and the UMD (3·9%) and BMD (1·6%) sub-networks, and even fewer differed between UMD and BMD (0·4%).

**Interpretations** This study represents the most extensive temporal network analysis conducted on the longitudinal interplay of SMD prodromal features. These findings provide further evidence to support early detection services across SMD.

**Evidence before this study** Preventive approaches for severe mental disorders (SMD) can improve outcomes; however, their effectiveness relies on accurate knowledge of the prodromal symptoms and substance use that precede onset, and of how these evolve over time. We searched PubMed from database inception to 26th January 2024 for studies published in English investigating the dynamic prodromes of unipolar mood disorders (UMD), bipolar mood disorders (BMD) or psychotic disorders (PSY). The search terms were prodrom* AND (depression OR bipolar OR psychosis) AND (timecourse OR dynamic OR “network analysis” OR longitudinal).
First, while many studies have investigated the prodromal phases of SMD, particularly for PSY, most have taken a cross-sectional rather than a longitudinal approach, which cannot capture temporal dependence between and within prodromal symptoms and substance use. Second, we found no studies focusing on the evolution of features during the prodromal period. Finally, studies have focused on diagnosis-specific analyses, considering UMD, BMD or PSY alone, which limits the possibility of comparison between them.

**Added value of this study** We used a temporal network analysis approach, combined with a large electronic health record database (n=7,049) and natural language processing, to examine the dynamic evolution of symptoms and substance use in the prodrome to an SMD diagnosis in secondary mental healthcare. This is the largest network analysis investigating prodromal features in SMD, the first to assess longitudinal changes and the first to directly compare the prodromes to UMD, BMD and PSY. Our results add to the growing evidence for a transdiagnostic prodrome to SMD by showing only small differences between UMD, BMD and PSY in how symptoms and substance use evolve over the course of the prodrome.

**Implications of all the available evidence** Our study explores the patterns of evolution of symptom and substance use events across and within SMD diagnostic groups. We highlight the importance of understanding the dynamic progression of these prodromal features to fully characterise the prodrome to SMD. These findings, together with a growing literature base, also support the potential for broader transdiagnostic early detection services that provide preventive psychiatric care to individuals at risk of SMD.

Keywords

* psychosis
* bipolar
* depression
* network analysis
* electronic health record
* artificial intelligence
* natural language processing
* early detection
* severe mental disorder
* temporal network analysis

## 1. BACKGROUND

Severe mental disorders (SMD) include non-psychotic unipolar mood disorders (UMD), non-psychotic bipolar mood disorders (BMD) and psychotic disorders (PSY), and are characterised by high clinical, societal, familial and personal burden.1–3 Electronic health records (EHRs) provide an opportunity to examine prodromal symptoms contemporaneously, reducing recall bias and enriching our insight into symptom presentation during the prodrome.4 This knowledge can help enhance specialised preventive care for young people at risk of emerging SMD.

Temporal network analysis allows us to statistically model the relationships between nodes (prodromal features) as edges within a network (prodrome) over time. Weak, sparse networks are more modifiable, while strong, dense networks resist change5 and need intensive interventions to alter them6 (e.g. to prevent SMD onset). Edge estimates in temporal networks could suggest directed causal relationships between features, potentially enhancing our understanding of SMD development.7 Node centrality, representing the strength of connections into and out of a node,8 may highlight the significance of a prodromal feature in the progression of the disorder and its potential as an intervention target.9–12

Firstly, we aimed to develop a global transdiagnostic SMD network to quantify the temporal relationships between prodromal features. Secondly, we aimed to examine differences between diagnostic groups by computing and comparing sub-networks specific to UMD, BMD and PSY.

## 2. METHODS

### 2.1. Data Source

Data were from the South London and Maudsley National Health Service Foundation Trust (SLaM). SLaM provides secondary mental healthcare across four socioeconomically diverse South London boroughs (Lambeth, Southwark, Lewisham and Croydon; 1.3 million people; eMethods 1).
A Clinical Record Interactive Search (CRIS) tool was implemented in the EHR to facilitate research with full but anonymised clinical information.13 CRIS has been extensively validated in previous research studies.14–16 CRIS received ethical approval as an anonymised dataset for secondary analyses from Oxfordshire REC C (Ref: 23/SC/0257).

### 2.2. Study Design

This was a retrospective (2-year), real-world EHR cohort study (Figure 1). The 2-year period was chosen to mirror the typical duration of care in clinical services for primary indicated prevention of SMD (72.4% provide care for 24 months or less).17 The index date was the date of the first diagnosis within an individual’s SMD group recorded in the EHR (index diagnosis, T-0mo, Figure 1). The antecedent date (T-6mo) was set as a data cut-off 6 months before the index date, defining the antecedent period, to avoid overlap with the actual onset of SMD. The lookback period (Figure 1) was defined as the 1.5-year period before the antecedent date (i.e. T-24mo to T-6mo). To minimise the time invariance imposed by network analyses,18 we split the 1.5-year lookback period into six three-month follow-up intervals.

![Figure 1.](https://www.medrxiv.org/content/medrxiv/early/2024/03/09/2024.03.08.24303965/F1.medium.gif)

Figure 1. Study design. The lookback period was split into six three-month follow-up intervals (FU 1-6) relative to the index date (T-0mo) of SMD diagnosis. This pipeline (steps 1-4) was followed for both the primary analysis (SMD model) and the secondary analysis (sub-networks).

### 2.3. Study Population

All individuals accessing SLaM services between 1st January 2008 and 10th August 2021 and receiving a primary (i.e. not comorbid) ICD-10 index diagnosis of any SMD were eligible. SMD was defined as either UMD, BMD or PSY (operationalised as in eTable 1).
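The timeline in the study design can be made concrete with a short sketch. The Python below is a minimal illustration, not the study's extraction pipeline: it assumes that FU1 is the earliest interval (T-24mo to T-21mo) and FU6 the latest (T-9mo to T-6mo), that intervals are half-open calendar-month windows, and that each NLP-derived feature is coded as a binary occurrence per interval. The function names (`add_months`, `assign_interval`, `occurrence_matrix`) are hypothetical.

```python
"""Assign dated feature mentions to six three-month lookback intervals (illustrative sketch)."""
import calendar
from datetime import date

N_INTERVALS = 6       # FU1..FU6 (assumed: FU1 earliest, FU6 latest)
INTERVAL_MONTHS = 3   # each follow-up interval spans three months
LOOKBACK_START = -24  # T-24mo: start of the 1.5-year lookback, in months relative to index


def add_months(d, months):
    """Shift a date by whole calendar months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def assign_interval(event_date, index_date):
    """Return 1..6 for FU1..FU6, or None if the event falls outside the lookback.

    Intervals are half-open [start, end); FU6 ends at the T-6mo cut-off, so events
    in the antecedent period (T-6mo up to the index date) are excluded.
    """
    for k in range(N_INTERVALS):
        start = add_months(index_date, LOOKBACK_START + k * INTERVAL_MONTHS)
        end = add_months(index_date, LOOKBACK_START + (k + 1) * INTERVAL_MONTHS)
        if start <= event_date < end:
            return k + 1
    return None


def occurrence_matrix(events, index_date, features):
    """Binary occurrence (0/1) of each feature in each interval for one individual.

    `events` is an iterable of (feature_name, event_date) pairs, e.g. NLP-extracted mentions.
    """
    matrix = {f: [0] * N_INTERVALS for f in features}
    for feature, event_date in events:
        k = assign_interval(event_date, index_date)
        if feature in matrix and k is not None:
            matrix[feature][k - 1] = 1
    return matrix


# Hypothetical individual with an index diagnosis on 1 July 2020.
example = occurrence_matrix(
    [("tearfulness", date(2018, 8, 15)), ("aggression", date(2020, 3, 1))],
    index_date=date(2020, 7, 1),
    features=["tearfulness", "aggression"],
)
# -> {"tearfulness": [1, 0, 0, 0, 0, 0], "aggression": [0, 0, 0, 0, 0, 0]}
```

In this sketch the aggression mention (March 2020) falls inside the antecedent period and is dropped, mirroring the 6-month data cut-off; the tearfulness mention lands in FU1.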
Individuals with multiple SMD diagnoses were assigned the diagnosis of greatest severity (i.e. UMD ·022) and labelled (|*z*| > ·03). For visualisation purposes, nodes are clustered into six categories (depressive, manic, negative, positive, substance use and other) according to the type of prodromal feature. **B.** Bootstrapped (250 repetitions; black) vs actual model (n=7,140; red) edge weight estimates (|*z*| > ·022). Edges are directed such that “node1 – node2” represents the edge from node1 to node2. All edges were positive except the one marked with an asterisk (HOST-INS). **C.** Centrality measures for all nodes.

AGGR: aggression, AGIT: agitation, ANX: anxiety, CANN: cannabis use, COC: cocaine use, COGN: cognitive impairment, CONC: poor concentration, DEL: delusional thinking, EMOT: emotional withdrawal, GUIL: guilt, HALL: hallucinations (all), HOPE: feeling hopeless, HOST: hostility, INS: poor insight, IRR: irritability, MOOD: mood instability, MOTIV: poor motivation, PAR: paranoia, SLEEP: disturbed sleep, SUIC: suicidality, TEAR: tearfulness, TOB: tobacco use, WGHT: weight loss

The strongest autocorrelation was observed for tearfulness (correlation coefficient, *z*=·10), with all other autocorrelations between 0·05 and 0·10 (Figure 3A). The most prominent unidirectional relationships were positive: irritability-aggression (*z*12=·03), irritability-agitation (*z*12=·03), hallucinations (all)-disturbed sleep (*z*12=·03) and mood instability-tearfulness (*z*12=·03). All other unidirectional relationships were |*z*12|<·03. With respect to bidirectional relationships, positively recurring pairs were observed between aggression-hostility (*z*12=·04, *z*21=·03), delusional thinking-hallucinations (all) (*z*12=·04, *z*21=·03), aggression-agitation (*z*12=·03, *z*21=·03) and delusional thinking-hostility (*z*12=·02, *z*21=·03).
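The distinction the results draw between autocorrelative, unidirectional and bidirectional relationships, together with the out- and in-centrality defined in the abstract, can be sketched on a temporal weight matrix `B`, where `B[i][j]` is the edge from feature i at one interval to feature j at the next. This is a minimal sketch under stated assumptions: an edge is treated as present when |*z*| exceeds a chosen threshold (the ·022 from Figure 3 is reused only as an example, and the study's own criterion may differ), centrality sums absolute edge weights and excludes autocorrelations, and the three-node matrix is invented toy data rather than study estimates. The function names are hypothetical.

```python
"""Classify temporal edges and compute out-/in-centrality (illustrative sketch)."""


def classify_edges(names, B, threshold):
    """Split a temporal weight matrix into autocorrelations, unidirectional and bidirectional edges.

    B[i][j] is the edge weight from node i at interval t to node j at t+1.
    An edge counts as present when |B[i][j]| > threshold (an assumed rule).
    """
    n = len(names)
    autocorrelations = {names[i]: B[i][i] for i in range(n)}
    unidirectional, bidirectional = [], []
    for i in range(n):
        for j in range(i + 1, n):
            forward = abs(B[i][j]) > threshold
            backward = abs(B[j][i]) > threshold
            if forward and backward:
                bidirectional.append((names[i], names[j], B[i][j], B[j][i]))
            elif forward:
                unidirectional.append((names[i], names[j], B[i][j]))
            elif backward:
                unidirectional.append((names[j], names[i], B[j][i]))
    return autocorrelations, unidirectional, bidirectional


def centrality(names, B):
    """Out-centrality: summed |weights| leaving a node; in-centrality: summed |weights| entering it.

    Self-loops (autocorrelations) are excluded here by assumption.
    """
    n = len(names)
    c_out = {names[i]: sum(abs(B[i][j]) for j in range(n) if j != i) for i in range(n)}
    c_in = {names[j]: sum(abs(B[i][j]) for i in range(n) if i != j) for j in range(n)}
    return c_out, c_in


# Toy 3-node example (invented values, not study estimates); rows are "from", columns are "to".
names = ["IRR", "AGIT", "AGGR"]
B = [
    [0.06, 0.03, 0.03],  # IRR -> IRR, AGIT, AGGR
    [0.00, 0.07, 0.03],  # AGIT -> IRR, AGIT, AGGR
    [0.00, 0.03, 0.08],  # AGGR -> IRR, AGIT, AGGR
]
auto, uni, bi = classify_edges(names, B, threshold=0.022)
c_out, c_in = centrality(names, B)
```

Here IRR→AGIT and IRR→AGGR come out as one-way edges, AGIT↔AGGR as a reciprocal pair, and IRR has the highest out-centrality but zero in-centrality, which is the kind of asymmetry the centrality results below describe.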
Considering centrality (Figure 3C), aggression (*cout*=·098), hostility (*cout*=·082) and hallucinations (all) (*cout*=·081) had the strongest out-centrality, whereas tearfulness (*cin*=·124), aggression (*cin*=·09) and delusional thinking (*cin*=·085) had the strongest in-centrality (eTable 10). Results and visualisations for the contemporaneous and between-subject relationships of nodes are presented in eResults 3 and eFigure 2. See eTable 12 for actual model and bootstrapped estimates.

### 3.3 Secondary Analysis (sub-networks)

Of the 61 NLP-derived prodromal features, after applying the relevant exclusions within each sub-sample, 21 features were included in the UMD network, 19 in the BMD network and 24 in the PSY network (eMethods 5). A saturated model was fitted with the relevant features at 6 follow-up intervals in each sub-sample (UMD, BMD, PSY). As in the primary analysis, saturated networks showed excellent fit, and better fit than sparse models, for all three networks (UMD: Δχ²(687)=1737, p<·0001; BMD: Δχ²(606)=1547, p<·0001; PSY: Δχ²(856)=2961, p<·0001). Further model fit results, including recoverability (eResults 4), and bootstrapping estimates (eTable 11, eFigure 3) can be found in the Supplement.

#### a. UMD

The strongest autocorrelations were observed for cannabis use (*z*=·12), feeling lonely (*z*=·12) and hallucinations (all) (*z*=·11), with all other autocorrelations between 0·03 and 0·10 (Figure 4A).

![Figure 4](https://www.medrxiv.org/content/medrxiv/early/2024/03/09/2024.03.08.24303965/F4.medium.gif)

Figure 4. Temporal relationships between nodes in sub-networks. Temporal network graphs displaying positive (blue) and negative (red) relationships between nodes from actual model estimates for sub-networks (**A.** UMD, **B.** BMD, **C.** PSY).
Edges are displayed as lines, with the thickness representing the strength of the edge weight estimate (correlation coefficient, *z*). Edges are thresholded (UMD: |*z*| > ·026, BMD: |*z*| > ·045, PSY: |*z*| > ·03) and labelled (UMD: |*z*| > ·04, BMD: |*z*| > ·06, PSY: |*z*| > ·05). For visualisation purposes, nodes are clustered into six categories (depressive, manic, negative, positive, substance use and other) according to the type of prodromal feature. **D.** Centrality measures for all nodes in sub-networks (green: UMD, blue: BMD, red: PSY).

AGGR: aggression, AGIT: agitation, ANX: anxiety, AROUS: arousal, CANN: cannabis use, COC: cocaine use, COGN: cognitive impairment, CONC: poor concentration, DEL: delusional thinking, ELAT: elation, EMOT: emotional withdrawal, GUIL: guilt, HALL: hallucinations (all), HOPE: feeling hopeless, HOST: hostility, INS: poor insight, IRR: irritability, LONE: feeling lonely, LOW: low energy, MOOD: mood instability, MOTIV: poor motivation, NIGHT: nightmares, PAR: paranoia, SLEEP: disturbed sleep, SUIC: suicidality, TEAR: tearfulness, TOB: tobacco use, WGHT: weight loss

The most prominent unidirectional relationships were all positive: poor motivation-low energy (*z*12=·06), tobacco use-weight loss (*z*12=·04), paranoia-nightmares (*z*12=·04) and mood instability-weight loss (*z*12=·04). All other unidirectional relationships were |*z*12|<·04. With respect to bidirectional relationships, a positively recurring pair was observed between guilt and tearfulness (*z*12=·03, *z*21=·03). Considering centrality, weight loss (*cin*=·140), aggression (*cin*=·128) and suicidality (*cin*=·106) had the strongest in-centrality, whereas tobacco use (*cout*=·105), mood instability (*cout*=·103) and poor motivation (*cout*=·092) had the strongest out-centrality (eTable 13A).

#### b. BMD

The strongest autocorrelation was observed for hallucinations (all) (*z*=·13), with all other autocorrelations between 0·03 and 0·10 (Figure 4B).
The most prominent unidirectional relationships were mixed, with some positive: guilt-feeling hopeless (*z*12=·07), aggression-elation (*z*12=·06) and hallucinations (all)-suicidality (*z*12=·06); and others negative: guilt-paranoia (*z*12=-·07), irritability-tobacco use (*z*12=-·06) and feeling hopeless-elation (*z*12=-·06). All other unidirectional relationships were |*z*12|<·06. With respect to bidirectional relationships, a positively recurring pair was observed between elation and irritability (*z*12=·06, *z*21=·06). Considering centrality, elation (*cin*=·176), irritability (*cin*=·157) and tobacco use (*cin*=·152) had the strongest in-centrality, whereas elation (*cout*=·165), irritability (*cout*=·163) and guilt (*cout*=·143) had the strongest out-centrality (eTable 13B).

#### c. PSY

The strongest autocorrelations were observed for feeling hopeless (*z*=·11) and tearfulness (*z*=·11), with all other autocorrelations between 0·04 and 0·10 (Figure 4C). The most prominent unidirectional relationships were all positive: hallucinations (all)-disturbed sleep (*z*12=·04), hostility-arousal (*z*12=·04) and irritability-agitation (*z*12=·04). All other unidirectional relationships were |*z*12|<·04. With respect to bidirectional relationships, positively recurring pairs were observed between aggression-hostility (*z*12=·04, *z*21=·03), delusional thinking-hallucinations (all) (*z*12=·04, *z*21=·03), aggression-agitation (*z*12=·04, *z*21=·04) and arousal-elation (*z*12=·04, *z*21=·04). Considering centrality, agitation (*cin*=·115), aggression (*cin*=·078) and arousal (*cin*=·072) had the strongest in-centrality, whereas aggression (*cout*=·143), hostility (*cout*=·135) and hallucinations (all) (*cout*=·104) had the strongest out-centrality (eTable 13C). Results and visualisations for the contemporaneous and between-subject relationships of nodes for all sub-networks are presented in eResults 5. See eTable 12 for actual model and bootstrapped estimates.
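The permutation analysis in the next subsection compares edge weights between sub-networks against a null distribution obtained by shuffling diagnostic-group labels. A minimal sketch of that logic for a single edge follows, under loud assumptions: in the study the edge weights come from the fitted network model, whereas here a pooled lag-1 Pearson correlation between feature `a` at one interval and feature `b` at the next stands in for that estimate; the data layout (a dict of six binary values per feature per individual) and all function names are hypothetical.

```python
"""Permutation test for a group difference in one lagged edge (illustrative sketch)."""
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def lagged_edge(group, a, b):
    """Stand-in edge estimate: correlation of feature a at interval t with feature b at t+1,
    pooled over individuals and consecutive interval pairs (NOT the study's model estimate)."""
    xs, ys = [], []
    for person in group:
        for t in range(len(person[a]) - 1):
            xs.append(person[a][t])
            ys.append(person[b][t + 1])
    return pearson(xs, ys)


def permutation_test(group1, group2, a, b, n_perm=1000, seed=0):
    """Observed group1-minus-group2 edge difference and its two-sided permutation p-value."""
    rng = random.Random(seed)
    observed = lagged_edge(group1, a, b) - lagged_edge(group2, a, b)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign diagnostic-group labels
        diff = lagged_edge(pooled[:n1], a, b) - lagged_edge(pooled[n1:], a, b)
        if abs(diff) >= abs(observed):
            extreme += 1
    # +1 correction keeps the p-value strictly above zero
    return observed, (extreme + 1) / (n_perm + 1)
```

Each shuffle keeps individuals intact and only reassigns group membership, so within-person temporal structure is preserved while any genuine group difference is destroyed; the spread of `diff` across shuffles plays the role of the zero-centred histograms of permuted edge weights described below.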
### 3.4 Permutation Analysis

The final nodes for permutation analysis are listed in eMethods 6, with the actual model estimates in eTable 14. The histograms of permuted edge weights showed a normal (bell-shaped), zero-centred distribution, indicating that further iterations are unlikely to affect the distribution or the results of these analyses. Of all possible edge weight comparisons in the permutation analysis, few were significantly different: UMD-PSY (3·9%), followed by BMD-PSY (1·6%) and then UMD-BMD (0·4%).

UMD showed a significantly stronger edge weight from irritability to tobacco use compared to BMD (*z*UMD-BMD=·059, p*permuted*<·001). In addition, the following edge weights were significantly stronger in UMD compared to PSY: cannabis use (autocorrelation) (*z*UMD-PSY=·051, p*permuted*<·001), agitation-suicidality (*z*UMD-PSY=·048, p*permuted*<·001), tobacco use-suicidality (*z*UMD-PSY=·038, p*permuted*<·001), mood instability-aggression (*z*UMD-PSY=·035, p*permuted*<·001), suicidality-hallucinations (all) (*z*UMD-PSY=·034, p*permuted*<·001) and cannabis use-suicidality (*z*UMD-PSY=·033, p*permuted*<·001) (Figure 5).

![Figure 5.](https://www.medrxiv.org/content/medrxiv/early/2024/03/09/2024.03.08.24303965/F5.medium.gif)

Figure 5. Heat-maps for pairwise edge comparisons (UMD-BMD, BMD-PSY, UMD-PSY) in temporal sub-networks in permutation analysis. Magnitude and direction of effect size are colour-coded such that, for the pairwise comparison Group1-Group2, yellow indicates the edge estimate is more positive in Group1 (Group1>Group2) and blue indicates the opposite Group1