Predictive Machine Learning for Personalised Medicine in Major Depressive Disorder
==================================================================================

* Viktoria-Eleni Gountouna
* Mairead L. Bermingham
* Ksenia Kuznetsova
* Daniel Urda Muñoz
* Felix Agakov
* Siân E. Robson
* Joeri J. Meijsen
* Archie Campbell
* Caroline Hayward
* Eleanor M. Wigmore
* Toni-Kim Clarke
* Ana Maria Fernandez
* Donald J. MacIntyre
* Paul McKeigue
* David J. Porteous
* Kristin K. Nicodemus

## Abstract

Depression is a common psychiatric disorder with substantial recurrence risk. Accurate prediction from easily collected data would aid in diagnosis, treatment and prevention. We used machine learning in the Generation Scotland cohort to predict lifetime risk of depression and, among cases, recurrent depression. Rank aggregation was used to combine results across ten different algorithms and identify highly predictive variables. The model containing all but the cardiometabolic predictors had the highest predictive ability on independent data. Rank aggregation produced a reduced set of predictors without decreasing predictive performance (lifetime: 20 out of 154 predictors and Receiver Operating Characteristic area under the curve (AUC)=0·84, recurrent: 10 out of 180 predictors and AUC=0·76). Here we develop a pipeline which leads to a small set of highly predictive variables. This information can be easily collected with a smartphone ‘application’ to help diagnosis and treatment, while longitudinal tracking may help patients in self-management.

**Significance** Depression is the most common psychiatric disorder and a leading cause of disability worldwide. Patients are often diagnosed and treated by non-specialist clinicians who have limited time available to assess them. We present a novel methodology which allowed us to identify a small set of highly predictive variables for a diagnosis of depression, or recurrent depression in patients. This information can easily be collected using a tablet or smartphone application in the clinic to aid diagnosis.

Keywords
*   machine learning
*   prediction
*   psychiatry
*   depression

## Introduction

Major depressive disorder (MDD) is one of the most common mental disorders with a lifetime prevalence of around 15% (1), and it is also frequently recurrent (2-5) or persistent. The World Health Organization predicts that by 2030 13% of the total global disease burden will be accounted for by depression (6). Reducing the burden of MDD is a key public health challenge for the 21st century (7), with accurate diagnosis and prediction of recurrence of critical importance in reducing the disease burden. However, algorithms to predict incident and recurrent MDD that include data above that obtained on the basis of a structured clinical interview have contributed to improved prediction accuracies and may also improve diagnosis (8-10), especially when many patients see non-specialist clinicians such as general practitioners for diagnosis and treatment, where structured clinical interviews are impractical due to time constraints.

Prediction algorithms using standard statistical methodology have been developed previously for incident and/or recurrent MDD (7-12). These algorithms generated nominal to fairly high discriminative accuracy up to a C-statistic or AUC value of 0·79 (8-10). In these previous studies, the number and type of available predictors were limited to a small number of clinical features and applied standard statistical methodologies to develop prediction algorithms. However, in the age of “Big Data” – large-scale biobanking efforts with large sample sizes and hundreds to thousands of potential predictors – these standard statistical methodologies may be limited in their ability to advance personalized medicine in high-dimensional data. However, for clinical utility it is crucial to identify a concise and easily measured set of predictors with high predictive ability which can be used to determine a patient’s risk for lifetime or recurrent MDD. Ideally, these data could be combined with electronic health records to improve prediction and increase clinical utility. Indeed, advanced approaches such as machine learning have been successfully applied to prediction of treatment outcome in MDD (13).

We applied a set of modern “Big Data” approaches – a set of state-of-the-art machine learning methodologies – to the Generation Scotland: Scottish Family Health Study (GS:SFHS) cohort to predict lifetime and recurrent MDD. The predictors in GS:SFHS include cognitive function, personality, health and family history, socio-demographic, biometric, clinical and genomic data (14-15). We sought to apply these algorithms to a set of nested models to assess (a) algorithm performance and (b) to define a sparse set of predictors that may be useful in prediction in clinical practice by using a novel approach: the Markov Chain 4 (MC4) algorithm, which was developed for rank aggregation of Internet search engine rankings (16).

## Results

### Design Framework

We employed an array of machine learning algorithms, including tree-based, regression-based, neural networks and support vector machines (Methods). Data was divided into training and test sets (Fig. 1a). The training data was used to optimise hyperparameters for each algorithm using ten-fold cross validation and build prediction models (Fig. 1b). We assessed the performance of four nested models, described in detail in the Methods. Rank aggregation was used on the variable importance measures from the training data to select the top variables across methods. We performed independent replication on the test data using the models built on the training data with the area under the ROC curve as an outcome of interest.

![Fig. 1](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/02/11/2022.02.11.22270724/F1.medium.gif)

[Fig. 1](http://medrxiv.org/content/early/2022/02/11/2022.02.11.22270724/F1)

Fig. 1 
Schematic of Study Design. (a). Training and test sample creation. (b). Overall schematic of the study design. First, the training data are put through 10-fold cross validation to estimate appropriate hyperparameters. Second, the correct hyperparameters are applied to the full training set to create a model for prediction, and the variable importance measure rankings are recorded. Finally, the models developed on the training data are used for prediction of either lifetime MDD or single versus recurrent MDD on the test data.

### Demographics

For lifetime MDD, in both the training and test data, individuals with MDD were significantly younger, more likely to be female, to not be living as a couple, to live alone, to report lower income and live in areas that are more socioeconomically deprived than controls (all *p*-values < Bonferroni-corrected threshold of 0·005; SI Appendix, Table S1). There were no significant demographic differences between single and recurrent cases in the training or test sets using the Bonferroni *p*-value threshold.

View this table:
[Table S1.](http://medrxiv.org/content/early/2022/02/11/2022.02.11.22270724/T6)

Table S1. Demographic and Socioeconomic Characteristics in Cases and Controls and Single vs. Recurrent MDD

### Predictive Performance

The predictive performance across the four models and machine learning algorithms for lifetime and recurrent MDD are summarised in Table 1 for lifetime MDD and Table 2 for single versus recurrent MDD. In both study designs, model one performed the poorest, model two increased predictive performance, and model three outperformed all other models. Model four, including cardiometabolic predictors, did not significantly improve upon model three; in fact, for some algorithms the inclusion of these additional variables led to significantly decreased predictive ability. For the best-performing algorithm in lifetime MDD, gradient descent boosting (GDB), model three increased the AUC versus model two from approximately 0·71 to 0·84, which is a more than threefold increase in information for discrimination, from 0.4 bits to 1.4 bits, although AUC values were slightly lower than what is commonly used for biomarkers (17). This improvement was also observed for recurrent MDD, although AUC values for Model two were smaller, less than 0·6. The increase in AUC for Model three for the best model, random forest (RF), increased the AUC from Model two of 0·58 to around 0·76.

View this table:
[Table 1.](http://medrxiv.org/content/early/2022/02/11/2022.02.11.22270724/T1)

Table 1. AUC Results for Predicting MDD on Independent Test Data: Lifetime MDD

View this table:
[Table 2.](http://medrxiv.org/content/early/2022/02/11/2022.02.11.22270724/T2)

Table 2. AUC Results for Predicting MDD on Independent Test Data: Single versus Recurrent MDD

After adjusting for 45 tests within each model and focusing on Model three as the best performing model, the GDB AUC was significantly higher than both C5.0 and conditional inference forest (CIF; *p*-values 0·00017 and 0·000028, respectively; SI Appendix, Table S2); other algorithms performed as well as GDB. For single versus recurrent MDD, C5.0 and linear support vector machine (SVM-L) showed significantly poorer performance versus the best-performing algorithms (SI Appendix, Table S3).

View this table:
[Table S2.](http://medrxiv.org/content/early/2022/02/11/2022.02.11.22270724/T7)

Table S2. ROC Curve Comparisons between Methods within Models for Lifetime MDD

View this table:
[Table S3.](http://medrxiv.org/content/early/2022/02/11/2022.02.11.22270724/T8)

Table S3. ROC Curve Comparisons between Methods within Models for Single versus Recurrent MDD

### Ranked Variable Importance and MC4 Analysis

The MC4 aggregate-ranked variable importance measures were used to calculate AUC for both lifetime and recurrent MDD, selecting ten, 20, 30…N of the ranked list to determine the minimum number of variables required to attain maximum AUC values for prediction of MDD in the independent test set. For lifetime MDD, this value was reached after the addition of the top 20 variables. The top ranked variables included (in order): neuroticism, General Health Questionnaire (GHQ) total score, GHQ depression, GHQ somatic symptoms, age, family history of depression, income, live as a couple, sex, mother with depression, whether the participant owned their home, GHQ anxiety, ever smoked, Mood Disorder Questionnaire (MDQ) score, Schizotypal Personality Questionnaire (SPQ) total score, educational qualification, pain intensity, GHQ social dysfunction, whether they ever had chronic pain, and if they live with someone else or live alone (Fig. 2a). For recurrent MDD, the maximum AUC was achieved after including only ten of the top-ranked variables. The top-ranking variables included (in order of rank): age at MDD onset, neuroticism, GHQ total score, digit symbol substitution (processing speed), GHQ somatic symptoms, GHQ depression, age, GHQ social dysfunction, whether they own their home and age when starting smoking (Fig. 2b). We observed considerable overlap between the two sets of predictors, including GHQ total and subscale depression, somatic symptoms and depression scores, neuroticism, age, and whether the participant owns their home. However, there were interesting differences, where for lifetime MDD family history of depression, MDQ and SPQ scores were critical predictors that were not included in the models predicting recurrent depression; and for recurrent depression processing speed was ranked highly by MC4, which was not included in the set of predictors for lifetime depression.

![Fig. 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/02/11/2022.02.11.22270724/F2.medium.gif)

[Fig. 2.](http://medrxiv.org/content/early/2022/02/11/2022.02.11.22270724/F2)

Fig. 2. 
Concordance of individual algorithm rankings versus MC4 overall ranking. (a) Top panel shows concordance in lifetime MDD. Darkest blue = individual methods ranked in top 10, next darkest blue = top 20, light blue = top half, lightest blue = bottom half. The number of predictors in C5.0 (52) and FSR (41) was less than the top half of predictors, so any predictor with a zero-valued VIM was set to dark blue. GHQ = General Health Questionnaire, MDQ = Mood Disorder Questionnaire, SPQ = Schizotypal Personality Questionnaire (b) Bottom panel shows concordance in single versus recurrent episode MDD. Darkest orange = individual methods ranked in top 10, next darkest orange = top 20, light orange = top half, lightest orange = bottom half. The number of predictors in C5.0 (40), EN (37) and FSR (30) was less than the top half of predictors, so any predictor with a zero-valued VIM was set to dark orange. GHQ = General Health Questionnaire.

A comparison of the MC4-ranked variables and variables ranked by individual algorithms (Fig. 2a and 2b) showed that, with the exception of CIF in recurrent depression, none of the methods were fully consistent with the aggregated ranking. In fact, EN only ranked three predictors in the top 20 for lifetime depression and none in the top ten for recurrent depression. However, using the MC4-ranked subset of 20 variables for lifetime MDD and ten variables for single episode versus recurrent MDD led to equal – if not improved – performance across all methods. In lifetime MDD, the MC4-based model AUC values did not differ from those using the full set of 154 variables (Table 2). In single versus recurrent MDD, the reduced set of ten variables produced from MC4 improved predictive performance at the *p*-value < 0·05 level across six of the ten methods, for example, increasing the AUC for SVM-L from 0·695 using 180 variables in Model three to 0·771 using the MC4 set of ten.

### Leave-One-Out Analysis of MC4 Sets

To assess the relative contribution of predictors in the MC4 sets, we performed leave-one-variable-out analysis of these subsets, holding the random number seed constant. For lifetime MDD, only the removal of neuroticism showed a significant reduction in AUC values across machines (Δ AUC range = 0·024-0·034; *p*-values range = 0·00018 – 2·80 × 10−08; Supplementary Table 4). For single versus recurrent MDD, the removal of age at onset showed the largest reduction in AUC values across algorithms, with a change in AUC ranging from 0.052 for CIF to 0.088 for C5.0 (significant *p*-value range = 0·00052 to 2·36×10−05). Other variables showed a variable pattern across methods when being removed; none showed a consistent increase or decrease, which was expected as MC4 aggregates rankings across methods. However, the change in AUC values were small for all other variables.

View this table:
[Table S4.](http://medrxiv.org/content/early/2022/02/11/2022.02.11.22270724/T9)

Table S4. AUC Differences between MC4 Full Model and Leave-One-Out Analysis of Top 20 Predictors in Lifetime Depression

### Effect size of Model 3 and MC4 Models

Hedges *g* was used to estimate effect sizes across different algorithms for Model three and the MC4 model in both study designs (Supplementary Table 6). Lifetime effect sizes were large, with all *g* values > 1·0 (Model three range = 1·06 to 1·41 MC4 range = 1·05 to 1·26). For single versus recurrent MDD, the use of the MC4 set increased *g* values; in fact, for several algorithms the *g* values were within the range of “medium” (e.g., < 0·8) when using Model three, but all *g* values were large (ranging from 0·93 to 1·13) using MC4 (Supplementary Table 6).

View this table:
[Table S5.](http://medrxiv.org/content/early/2022/02/11/2022.02.11.22270724/T10)

Table S5. AUC Differences between MC4 Full Model and Leave-One-Out Analysis of Top 20 Predictors in Single versus Recurrent Depression

View this table:
[Table S6.](http://medrxiv.org/content/early/2022/02/11/2022.02.11.22270724/T11)

Table S6. Effect Size Given by Hedges *g* for Model 3 and MC4 Model in Lifetime and Single versus Recurrent MDD

## Discussion

Using state-of-the-art machine learning methods combined with a novel usage of the MC4 algorithm, we have defined a set of predictive variables with AUC values that are larger than those observed in previous studies; for lifetime MDD, the AUC value is within the range of predictive ability for use in clinic. These algorithms were trained on a large biobank with deep phenotyping, in contrast to previous approaches, and may be easily applied to other similar “Big Data” applications.

The most consistent demographic factors associated with lifetime MDD risk in previous studies have been female sex, younger age and low socioeconomic status (18-22). Psychosocial risk factors commonly associated with MDD include negative life events, traumatic experiences, work-related stress, financial strain, poor marital or interpersonal relationships, lack of social support and low self-esteem (10, 23-26). The number of previous episodes, the level of residual symptoms, and childhood maltreatment are consistent risk factors of recurrent MDD (27-28). We highlight that our findings suggest neuroticism is one of the top predictors of lifetime MDD and also recurrent MDD. Neuroticism is easily measured by a general practitioner in clinic and would be the best single measure for prediction of both types of depression.

Using our algorithm, we provide descriptive characteristics of individuals who are at each quintile of our prediction engine, as an illustration of patients a general practitioner may see in clinic (Table 3). For example, versus individuals who would have the lowest predicted risk of lifetime MDD (Table 4), individuals at higher risk levels would be more likely to score higher on measures of neuroticism, general psychological distress (GHQ), have a family history of depression, be a current smoker and live alone. For individuals with a higher predicted probability of having recurrent depression versus single-episode (Table 5), higher risk for recurrent MDD would be more likely observed in individuals with a lower age at MDD onset, higher neuroticism, higher psychological distress (GHQ), and poorer performance on digit symbol coding.

View this table:
[Table 3.](http://medrxiv.org/content/early/2022/02/11/2022.02.11.22270724/T3)

Table 3. AUC Results for Best Performing Model versus MC4 Model

View this table:
[Table 4.](http://medrxiv.org/content/early/2022/02/11/2022.02.11.22270724/T4)

Table 4. Examples of Individuals at Quintiles of Predicted MDD using MC4 Predictors: Lifetime MDD

View this table:
[Table 5.](http://medrxiv.org/content/early/2022/02/11/2022.02.11.22270724/T5)

Table 5. Examples of Individuals at Quintiles of Predicted MDD using MC4 Predictors :Single versus Recurrent MDD

The main strength of this study is that it has assessed the importance and contribution of well-known and novel predictors of MDD measured using standardised tests in a large population-based cohort; and further introduces the MC4 algorithm for rank aggregation of results. Generation Scotland is a large homogeneous population that to date is one of the few cohorts containing a clinical definition of MDD on a large-scale sample. The two study designs (lifetime MDD–healthy controls and single episode-recurrent MDD) allow for insights into shared and differing risk factors. Limitations of this study are the sample size and the lack of replication in an independent large-scale cohort with a clinical definition of MDD.

This study has shown that a combination of simple self-report measures can form the basis of models that provide excellent prediction of lifetime or recurrent MDD. Given that the measures identified through the MC4 algorithm are relatively easy to obtain, a possible implementation of the findings of this work could be to collect these measures at regular intervals, which would provide a basis for longitudinal assessment of change in individuals or groups of patients. Such evidence on patient outcomes would be useful in tracking the impact of interventions, especially treatment response to antidepressants or psychological interventions such as cognitive behavioural therapy. If incorporated into clinical practice, the use of algorithms to predict a patient’s risk of MDD could have a substantial impact on the timeliness and effectiveness of diagnosis and treatment. The measures can be obtained outside of the clinic and the resulting data-based risk prediction can inform clinicians’ decisions, allowing more time during consultations to discuss issues specific to the patient to accommodate a personalised approach. Another exciting prospect for the implementation of the results of this study could therefore be to use the MC4 algorithm as the basis of a decision support tool that could be incorporated into a mobile application (app) for the benefit of patients, support workers and clinicians. Patients could enter their own data into the app and a built-in algorithm would be able to forewarn or reassure about lifestyle changes that might presage or mitigate recurrence. Apps are increasingly being developed and used successfully in health and social care to access treatment guidelines and to support decisions about patient screening, treatment options and drug dosage (29). In the current era of increased availability of data, new systems for linking and utilising healthcare and other records could provide crucial and timely information to support personalised healthcare.

## Materials and Methods

### Generation Scotland: Scottish Family Health Study

Our sample was drawn from the GS:SFHS, a large population-based study. A full description of the cohort and protocol for recruitment is described in detail elsewhere (14-15). Briefly, the GS:SFHS is a representative survey of the Scottish population, consisting of 23,960 individuals over 18 years of age recruited between the years 2006 and 2011. All components of GS:SFHS received ethical approval from the NHS Tayside Committee on Medical Research Ethics (REC Reference Number: 05/S1401/89) and all participants gave written informed consent. Generation Scotland data are available under managed access by submitting a proposal to the Generation Scotland Data Access Committee.

The selection criteria for participants included: Caucasian ethnicity, born in the UK and phenotype data available from attendance at a Generation Scotland research clinic (30). We further restricted the analysis to unrelated individuals, as the machine learning algorithms employed required independent observations. Cases were not matched to controls as doing so would remove the effect of the matching criteria from analysis; thus preventing assessment of potential interaction effects for those criteria.

### Assessment of MDD

All participants in GS:SFHS were screened for psychiatric disorders and those who responded positively to screening questions were invited to continue with the MDD and bipolar disorder modules from the Structured Clinical Interview for Diagnostic and

Statistical Manual of Mental Disorders, Fourth Edition (SCID;31). The diagnostic interview was conducted in person by trained clinical research nurses.

### Predictors of MDD: Self-Report, Clinical, Cognitive and Demographic

#### MDD Symptomology

For analysis of single versus recurrent depression, we included 12 symptom predictors from the SCID interview, plus age at onset and a variable indicating which nurse conducted the interview. Missing values for SCID interview questions were controlled for by creating a binary vector indicating which values were not recorded.

#### General Psychological Distress

Psychological distress was measured using the 28-item General Health Questionnaire (GHQ; 32) at baseline by a trained clinical nurse. Fifty-seven percent of the study participants were also given the Mood Disorder Questionnaire (MDQ; 33) and Schizotypal Personality Questionnaire-Brief (SPQ-B; 34).

#### Cognitive Performance

Cognitive performance was assessed using measures of processing speed (Wechsler Digit Symbol Substitution Task; 35), verbal declarative memory (Wechsler Logical Memory Test; 36) both immediate and delayed, executive function measured with the letter-based phonemic verbal fluency test using the letters C, F, and L, each for one minute (37) and vocabulary (Mill Hill Vocabulary Scale; 38).

#### Demographic and Socioeconomic Status

The Scottish Index of Multiple Deprivation (SIMD; 39) 2009, matched to each participant’s postcode, was used to assess socioeconomic status. SIMD is a ranking based on seven domains: income, employment, health, education, geographic access, crime, and housing.

#### Clinical Measurements, Personal and Family Medical History

Clinical measures included height, weight, BMI, waist, hip, waist:hip ratio, percent body fat, systolic and diastolic blood-pressure (both mean of two measurements), respiratory function (forced expiratory volume in one second, forced vital capacity and forced expiratory flow; each measured three times) and blood sodium, potassium, urea, creatinine, glucose, total cholesterol and HDL-cholesterol levels (14). Participants were asked about whether they or their first-degree relatives had a history of depression, Parkinson disease, Alzheimer disease, diabetes, asthma, cardiovascular disease (including heart disease, hypertension and stroke), osteoarthritis, rheumatoid arthritis, hip fracture or common cancers (including breast, bowel, lung and prostate).

#### Alcohol and Tobacco Use

Participants were identified as current drinkers, former drinkers (either stopping greater than or less than 12 months) or never drinkers. Current alcohol consumption was characterised as either more or less than usual for each participant. Total consumption was measured in self-reported units of alcohol consumed in the previous week. Participants were classified into two groups: ever-smokers (current smokers and former smokers) and never smokers, with ever-smokers also asked at what age they started smoking and amount of tobacco consumption.

#### Chronic Pain

A validated self-report chronic pain questionnaire was used to assess pain severity based on its intensity and impact of pain on daily functioning in the previous three months40.

### Analytic Methods

#### Training and Test Datasets

One thousand and nineteen randomly-selected unrelated participants who had at least one episode of major depressive disorder were randomly selected as cases. Randomly-selected, healthy unrelated individuals (N = 3,994) with no lifetime diagnosis of psychiatric disorders served as controls. For analyses examining single versus recurrent major depressive disorder, we randomly resampled unrelated individuals from GS:SFHS to increase sample size (as controls were excluded), leading to a total of 1,198.

The final data sets were divided into (a) training data for machine learning hyper-parameter optimisation and model building and (b) test data (Fig. 1a). The test data were not used to develop any prediction models and formed an independent replication set. Many machine learning methods for binary classification require relatively equal numbers of cases and controls to assure optimised prediction (41). We performed a down-sampling procedure to balance the case-control ratio in the training data. The minority class (e.g., cases with MDD and recurrent MDD) were randomly split into a training set (63%) and an independent test set (37%). The majority class (e.g., controls and single episode MDD) was randomly under-sampled to make its frequency equal to the minority class for the training data (see Table 1 for Ns). Hedges g (42) was used to estimate effect sizes because of imbalance in the independent test sets.

#### Machine Learning Algorithms and Hyper-Parameter Settings

A panel of predictive machine learning algorithms was applied to the data. We applied the C5.0 tree-based algorithm (43-44), two tree-based ensemble algorithms (random forest, RF; 45-46) and conditional inference forest (CIF; 47-48), a gradient descent boosting algorithm (GDB; 49-50), elastic net, a regularised regression technique (EN; 51-52), feed-forward neural networks (NN; 53-54), and support vector machines (SVM; 55-57) with the following kernels: linear (SVM-L), polynomial (SVM-P) and radial basis function (SVM-R). We also used forward stepwise regression with Akaike’s Information Criterion (58). These methods were selected as best examples of a range of machine learning methodologies commonly applied to biologic data.

The R package caret (59) was used to perform ten-fold cross validation on the training data to set optimised values for the hyper-parameters for each algorithm (Fig. 1b; details about hyper-parameter selection in SI Appendix, Methods). All analyses were conducted in R version 3.2.4 (60).

#### Receiver Operating Characteristic (ROC) area under the curve (AUC) and Variable Importance Measures (VIMs)

The outcome of interest was the area under the ROC curve from the independent test data using models built on the training data. We used the nonparametric DeLong (61) method as implemented in pROC (62) to test for statistical differences between areas under the curve among independent test data. We sought to find the largest ROC AUC whilst minimising the number of variables required such that a smaller set could potentially be used for prediction by general practitioners. We also report an alternative measure to the AUC to aid in interpretation, where to quantify differences in predictive performance we use the expected weight of evidence as measured in bits (63). Machine learning algorithms generate measures, or VIMs, of the relative contribution of predictors in classifier construction, or a measure of the strength of association between predictor and outcome. For C5.0 the VIM used was the percent of observations in the nodes directly under the split for that variable; for RF and CIF it was the permutation-based VIM; for EN and FSR it was the absolute value of the coefficients, for NN and all SVM we used caret’s internal VIM function, which is not dependent on the algorithm and simply measures the AUC increase when including the variable over a null model as these methods do not provide a direct measure of variable importance.

#### Markov Chain 4 (MC4) Algorithm

We took ranked variable importance measures from the training data using the optimal set of hyper-parameters for each algorithm and applied the Markov Chain 4 (MC4) algorithm (16, 64) to provide an aggregated variable ranking. To begin, the MC4 algorithm considers a single set *U* that contains the union of the top *k* elements from all lists. For each pair *i* and *j* in the list (*i* ≠ *j*), set the Markov chain transition matrix *M* element *m**ij* to 1/|*U*| if > 50% of the lists rank *j* above *i* and 0/|*U*| otherwise. If *i* and *j* are never contained in the same list, then *m**ij* = *m**ji* = 0.5/|*U*|, and let *m**ii* =. If ε is a small positive constant, it creates an ergodic transition matrix *M* by ((1-ε)**m*) + ε/|*U*|. The resulting stationary probabilities are then used to create an aggregate ranking of the variables, with higher probabilities denoting a higher rank (i.e., the Markov process spent longer time in those states). The proposed novel use of meta-ranking technology provides robust inferences about the relative contribution to risk for MDD across different machines. We then selected the top ten, 20, …, N ranked variables and fitted them in the best-performing model to assess the number of variables required to reach maximum predictive ability. Leaving one predictor out at a time, we assessed the relative contribution of each predictor to the final model AUC values in the independent test set.

#### Models Assessed

We assessed performance of four nested models, where *outcome**k* was either (1) lifetime MDD versus controls or (2) single versus recurrent depression: ![Formula][1]</img>  ![Formula][2]</img>  Building upon model two, the third model included mood, schizotypy and cognitive variables; family and personal medical history (excluding stroke, metabolic and cardiovascular history); as well as smoking and alcohol consumption. In addition, for the analysis of single versus recurrent MDD, we added variables derived from the SCID interview; this led to a total of 154 variables for lifetime MDD and 180 for analysis of single versus recurrent MDD. We hypothesised that cardiometabolic predictors may not improve prediction in all MDD cases but instead particularly in those with a later age at onset (65). Model four included Model three plus cardiovascular and stroke medical and family history; obesity, diabetes, FEV and lab measurements as these are commonly co-morbid with depression (N predictors: lifetime MDD: 211; single versus recurrent MDD: 237).

Hyperparameters were optimised using ten-fold cross validation on the training data. We used the nonparametric DeLong approach to test for statistical differences between areas under the curve among independent test data.

## Supporting information

Supplemental Materials [[supplements/270724_file02.docx]](pending:yes)

## Data Availability

Generation Scotland data are available under managed access by submitting a proposal to the Generation Scotland Data Access Committee.

[https://www.ed.ac.uk/generation-scotland](https://www.ed.ac.uk/generation-scotland) 

## Author Contributions

KKN designed the study, supervised and performed analysis, and wrote the manuscript. VEG, MLB, and KK performed analysis and wrote the manuscript. JJM and SER wrote the manuscript. DJP co-designed the study and edited the manuscript. DUM, FA, PM, AC, CH, EW, TC, AMF, DJM performed analysis or assistance in data coding and edited the manuscript. AM edited the manuscript.

## Supporting Information

### Supplementary Methods

#### Machine learning optimisation

The optimisation was set to a grid search for C5.0 (number of trials 1-20), CIF and RF (number of trees set to 1000, number of variables selected at each split (*mtry*) set to a grid of 10-15 values up to the total number of variables), GDB (interaction depth grid set to 1-3, number of iterations grid set to between 1 000 and 5 000, shrinkage held constant at 0.001), EN (grid for alpha and lambda set between 0 and 1). For NN and SVM models, we used a random search as the number of hyper-parameters was large and a random search has been shown to have equal or improved performance versus a grid search for these methods (1). Data were centred and scaled before analysis using SVMs as different scales of measurement can lead to numerical difficulties during the calculation of the inner products of the variables.

#### Imputation

For variables with less than 5% missing data, multivariate imputation by chained equations (MICE; 2-3) was used to replace missing values.

## Acknowledgments

VEG and KKN were supported by a University of Edinburgh-Medical Research Council ([https://mrc.ukri.org/](https://mrc.ukri.org/)) Confidence in Concept award and by a Wellcome Trust ([https://wellcome.ac.uk/](https://wellcome.ac.uk/))-University of Edinburgh Institutional Strategic Support Fund award (to KKN). KKN, MLB, DUM, FA, PM, DJP were supported by a University of Edinburgh-Medical Research Council ([https://mrc.ukri.org/](https://mrc.ukri.org/)) Confidence in Concept award (to DJP). KKN and JJM were supported by the Rosetrees Trust ([http://www.rosetreestrust.co.uk/](http://www.rosetreestrust.co.uk/); M405) and VEG, SER and KKN by a Chancellor’s Fellowship from the University of Edinburgh. DUM was supported by European Union FP7 ([https://ec.europa.eu/research/fp7/index_en.cfm](https://ec.europa.eu/research/fp7/index_en.cfm)) 316861 MLPM. DJM acknowledges the financial support of NHS Research Scotland ([http://www.nhsresearchscotland.org.uk/](http://www.nhsresearchscotland.org.uk/); NRS), through NHS Lothian. Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates ([http://www.cso.scot.nhs.uk/](http://www.cso.scot.nhs.uk/)) [CZD/16/6] and the Scottish Funding Council ([http://www.sfc.ac.uk/](http://www.sfc.ac.uk/)) [HR03006]. Genotyping of the GS:SFHS samples was carried out by the Genetics Core Laboratory at the Wellcome Trust Clinical Research Facility, Edinburgh, Scotland and was funded by the Medical Research Council UK ([https://mrc.ukri.org/](https://mrc.ukri.org/)) and the Wellcome Trust ([https://wellcome.ac.uk/](https://wellcome.ac.uk/); Wellcome Trust Strategic Award “STratifying Resilience and Depression Longitudinally” (STRADL) Reference 104036/Z/14/Z). We are grateful to all the families who took part, the general practitioners and the Scottish School of Primary Care for their help in recruiting them, and the whole Generation Scotland team. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

## Footnotes

*   The authors have declared that no conflict of interest exists.

*   Received February 11, 2022.
*   Revision received February 11, 2022.
*   Accepted February 11, 2022.


*   © 2022, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NoDerivs 4.0 International), CC BY-ND 4.0, as described at [http://creativecommons.org/licenses/by-nd/4.0/](http://creativecommons.org/licenses/by-nd/4.0/)

## References

1.  1.Kessler RC, et al. (2005) Lifetime Prevalence and Age-of-Onset Distributions of DSM-IV Disorders in the National Comorbidity Survey Replication. Arch Gen Psychiatry 62(6):593.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/archpsyc.62.6.593&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15939837&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000229628400003&link_type=ISI) 

2.  2.Hardeveld F, Spijker J, De Graaf R, Nolen WA, Beekman ATF (2013) Recurrence of major depressive disorder and its predictors in the general population: results from The Netherlands Mental Health Survey and Incidence Study (NEMESIS). Psychol Med 43(01):39–48.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1017/S0033291712002395&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23111147&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 

3.  3.Mueller TI, et al. (1999) Recurrence after recovery from major depressive disorder during 15 years of observational follow-up. Am J Psychiatry 156(7):1000–6.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10401442&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000081187200004&link_type=ISI) 

4.  4.Wang J (2004) A longitudinal population-based study of treated and untreated major depression. Med Care 42(6):543–550.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/01.mlr.0000128001.73998.5c&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15167322&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000221585500006&link_type=ISI) 

5.  5.Wang JL, Patten SB, Currie S, Sareen J, Schmitz N (2012) Predictors of 1-year outcomes of major depressive disorder among individuals with a lifetime diagnosis: a population-based study. Psychol Med 42(02):327–334.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1017/S0033291711001218&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21740627&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 

6.  6.Sayers J (2001) The world health report 2001 - Mental health: new understanding, new hope. Bull World Health Organ 79:1085–1085.
    
    
7.  7.King M, et al. (2008) Development and Validation of an International Risk Prediction Algorithm for Episodes of Major Depression in General Practice Attendees. Arch Gen Psychiatry 65(12):1368.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/archpsyc.65.12.1368&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=19047523&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000261286400003&link_type=ISI) 

8.  8.van Loo HM, Aggen SH, Gardner CO, Kendler KS (2015) Multiple risk factors predict recurrence of major depressive disorder in women. J Affect Disord 180:52–61.
    
    
9.  9.Wang JL, et al. (2013) Development and validation of prediction algorithms for major depressive episode in the general population. J Affect Disord 151(1):39–45.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jad.2013.05.045&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23790813&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 

10. 10.Wang JL, et al. (2014) DEVELOPMENT AND VALIDATION OF A PREDICTION ALGORITHM FOR USE BY HEALTH PROFESSIONALS IN PREDICTION OF RECURRENCE OF MAJOR DEPRESSION. Depress Anxiety 31(5):451–457.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/da.22215&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24877248&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 

11. 11.King M, et al. (2013) Predicting onset of major depression in general practice attendees in Europe: extending the application of the predictD risk algorithm from 12 to 24 months. Psychol Med 43(09):1929–1939.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1017/S0033291712002693&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23286278&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 

12. 12.Wang J, et al. (2014) A prediction algorithm for first onset of major depression in the general population: development and validation. J Epidemiol Community Health 68(5):418–24.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiamVjaCI7czo1OiJyZXNpZCI7czo4OiI2OC81LzQxOCI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIyLzAyLzExLzIwMjIuMDIuMTEuMjIyNzA3MjQuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

13. 13.Chekroud AM, et al. (2016) Cross-trial prediction of treatment outcome in depression: a machine learning approach. The Lancet Psychiatry 3(3):243–250.
    
    
14. 14.Smith BH, et al. (2013) Cohort Profile: Generation Scotland: Scottish Family Health Study (GS:SFHS). The study, its participants and their potential for genetic research on health and illness. Int J Epidemiol 42(3):689–700.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dys084&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22786799&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000322955900012&link_type=ISI) 

15. 15.Smith BH, et al. (2006) Generation Scotland: the Scottish Family Health Study; a new resource for researching genes and heritability. BMC Med Genet 7(1):74.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1471-2350-7-74&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17014726&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 

16. 16.Dwork C, Kumar R, Naor M, Sivakumar D (2001) Rank aggregation methods for the Web. Proceedings of the Tenth International Conference on World Wide Web - WWW ‘01 (ACM Press, New York, New York, USA), pp 613–622.
    
    
17. 17.Xia J, Broadhurst DI, Wilson M, Wishart DS (2013) Translational biomarker discovery in clinical metabolomics: an introductory tutorial. Metabolomics 9(2):280–299.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s11306-012-0482-9&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23543913&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000316913200003&link_type=ISI) 

18. 18.Skapinakis P, Weich S, Lewis G, Singleton N, Araya R (2006) Socio-economic position and common mental disorders. Br J Psychiatry 189(02):109–117.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTA6ImJqcHJjcHN5Y2giO3M6NToicmVzaWQiO3M6OToiMTg5LzIvMTA5IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDIvMTEvMjAyMi4wMi4xMS4yMjI3MDcyNC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

19. 19.Wang JL, Schmitz N, Dewa CS (2010) Socioeconomic status and the risk of major depression: the Canadian National Population Health Survey. J Epidemiol Community Health 64(5):447–52.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiamVjaCI7czo1OiJyZXNpZCI7czo4OiI2NC81LzQ0NyI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIyLzAyLzExLzIwMjIuMDIuMTEuMjIyNzA3MjQuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

20. 20.Weich S, Lewis G (1998) Material standard of living, social class, and the prevalence of the common mental disorders in Great Britain. J Epidemiol Community Health 52(1):8–14.
    
    [Abstract](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiamVjaCI7czo1OiJyZXNpZCI7czo2OiI1Mi8xLzgiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMi8wMi8xMS8yMDIyLjAyLjExLjIyMjcwNzI0LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 

21. 21.Weich S, Lewis G (1998) Poverty, unemployment, and common mental disorders: population based cohort study. BMJ 317(7151):115–9.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjEyOiIzMTcvNzE1MS8xMTUiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMi8wMi8xMS8yMDIyLjAyLjExLjIyMjcwNzI0LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 

22. 22.Weich S, Sloggett A, Lewis G (1998) Social roles and gender difference in the prevalence of common mental disorders. Br J Psychiatry 173(06):489–493.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTA6ImJqcHJjcHN5Y2giO3M6NToicmVzaWQiO3M6OToiMTczLzYvNDg5IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMDIvMTEvMjAyMi4wMi4xMS4yMjI3MDcyNC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

23. 23.Blazer DG, Hybels CF (2005) Origins of depression in later life. Psychol Med 35(09):1241.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1017/S0033291705004411&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16168147&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000232320800001&link_type=ISI) 

24. 24.Goldberg D (2006) The aetiology of depression. Psychol Med 36(10):1341.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1017/S0033291706007665&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16740176&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000241671800001&link_type=ISI) 

25. 25.Hardeveld F, Spijker J, De Graaf R, Nolen WA, Beekman ATF (2010) Prevalence and predictors of recurrence of major depressive disorder in the adult population. Acta Psychiatr Scand 122(3):184–191.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20003092&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000280366000002&link_type=ISI) 

26. 26.Hope S, Power C, Rodgers B (1999) Does financial hardship account for elevated psychological distress in lone mothers? Soc Sci Med 49(12):1637–1649.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0277-9536(99)00251-8&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=10574235&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000082985100005&link_type=ISI) 

27. 27.Wang J, Patten SB, Currie S, Sareen J, Schmitz N (2012) A Population-based Longitudinal Study on Work Environmental Factors and the Risk of Major Depressive Disorder. Am J Epidemiol 176(1):52–59.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/aje/kwr473&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22556191&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000305827500006&link_type=ISI) 

28. 28.Nanni V, Uher R, Danese A (2012) Childhood Maltreatment Predicts Unfavorable Course of Illness and Treatment Outcome in Depression: A Meta-Analysis. Am J Psychiatry 169(2):141–151.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1176/appi.ajp.2011.11020335&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22420036&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 

29. 29.NHS Education for Scotland Mobile Knowledge.
    
    
30. 30.Raven JC, Court JH (1977) Manual for Raven’s Progressive Matrices and Vocabulary Scales (HK Lewis).
    
    
31. 31.SIMD (2009) Scottish Index of Multiple Deprivation.
    
    
32. 32.Kerr SM, et al. (2013) Pedigree and genotyping quality analyses of over 10,000 DNA samples from the Generation Scotland: Scottish Family Health Study. BMC Med Genet 14(1):38.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1471-2350-14-38&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=23521772&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 

33. 33.First M (2005) Structured clinical interview for DSM-IV-TR axis I disorders: patient edition (Biometrics Research Department, Columbia University).
    
    
34. 34.Goldberg DP, Hillier VF (1979) A scaled version of the General Health Questionnaire. Psychol Med 9(01):139.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1017/S0033291700021644&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=424481&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1979GM11800016&link_type=ISI) 

35. 35.Hirschfeld RMA (2002) The Mood Disorder Questionnaire: A Simple, Patient-Rated Screening Instrument for Bipolar Disorder. Prim Care Companion J Clin Psychiatry 4(1):9–11.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.4088/PCC.v04n0104&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15014728&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 

36. 36.Raine A, Benishay D (1995) The SPQ-B: A Brief Screening Instrument for Schizotypal Personality Disorder. J Pers Disord 9(4):346–355.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1521/pedi.1995.9.4.346&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1995TK92300006&link_type=ISI) 

37. 37.Wechsler D (1998) WAIS-III UK Wechsler Adult Intelligence Scale (Psychol Corp).
    
    
38. 38.Wechsler D (1998) WAIS-III UK Wechsler Memory Scale-Revised (Psychol Corp).
    
    
39. 39.Lezak M, Howieson D, Bigler E, Tranel D (2012) Neuropsychological Assessment (Oxford University Press).
    
    
40. 40.Von Korffa M, Ormela J, Keefeb FJ, Dworkinc SF (1992) Grading the severity of chronic pain. Pain 50(2):133–149.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/0304-3959(92)90154-4&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=1408309&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1992JL12900002&link_type=ISI) 

41. 41.Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2012) A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches. IEEE Trans Syst Man, Cybern Part C (Applications Rev 42(4):463–484.
    
    
42. 42.Ridgeway G (2010) gbm: generalized boosted regression models. R package version 1.6-3.1.
    
    
43. 43.Hedges L V. (1981) Distribution Theory for Glass’s Estimator of Effect size and Related Estimators. J Educ Stat 6(2):107–128.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/1164588&link_type=DOI) 

44. 44.Quinlan JR (John R (1993) C4.5 : programs for machine learning (Morgan Kaufmann Publishers).
    
    
45. 45.Kuhn M, Weston S, Coulter N, Culp M (2015) C5.0 decision trees and rule-based models for pattern recognition. R package version 0.1.0-24.
    
    
46. 46.Breiman L (2001) Random Forests. Mach Learn 45(1):5–32.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1023/A:1010933404324&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=28752533&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000170489900001&link_type=ISI) 

47. 47.Wright MN (2017) ranger: A fast implementation of Random Forests. R package version 0.8.0.
    
    
48. 48.Hothorn T, Hornik K, Zeileis A (2006) Unbiased Recursive Partitioning: A Conditional Inference Framework. J Comput Graph Stat 15(3):651–674.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1198/106186006X133933&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000240705500009&link_type=ISI) 

49. 49.Hothorn T, Hornik K, Strobl C, Zeileis A (2017) party: A laboratory for recursive partytioning. R package version 1.2-3.
    
    
50. 50.Friedman JH (2001) Greedy Function Approximation: A Gradient Boosting Machine. Ann Stat 29:1189–1232.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1214/aos/1013203451&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000173361700001&link_type=ISI) 

51. 51.Friedman J, Hastie T, Tibshirani R (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw 33(1):1–22.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1467-9868.2005.00503.x&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20808728&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000275203200001&link_type=ISI) 

52. 52.Friedman J, Hastie T, Simon N, Qian J, Tibshirani R (2017) glmnet: Lasso and elastic-net regularized generalized linear models. R package version 2.0-13.
    
    
53. 53.DeLong ER, DeLong DM, Clarke-Pearson DL (1988) Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics 44(3):837.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/2531595&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=3203132&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1988Q069100016&link_type=ISI) 

54. 54.Robin X, et al. (2011) pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12(1):77.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1471-2105-12-77&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21414208&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 

55. 55.Ripley BD (1996) Pattern Recognition and Neural Networks (Cambridge University Press).
    
    
56. 56.Ripley B (2016) nnet: Feed-forward neural networks and multinomial log-linear models. R package version 7.3-12.
    
    
57. 57.Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/BF00994018&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1995RX35400003&link_type=ISI) 

58. 58.David Meyer D, et al. (2017) e1071: Misc functions of the Department of Statistics, Probability Theory Group (formerly: E1071), TU Wien. R package version 1.6-8.
    
    
59. 59.Karatozoglou A, Smola A, Hornik K (2016) kernlab: Kernel-based machine learning lab. R package version 0.9-25.
    
    
60. 60.Ripley B, et al. (2017) Support functions and datasets for Venables and Ripley’s MASS. R package 7.3-47.
    
    
61. 61.Kuhn M (2017) caret: Classification and regression training. R package 6.0-77.
    
    
62. 62.R Core Team (2016) R: A Language and Environment for Statistical Computing. Available at: [https://www.r-project.org/](https://www.r-project.org/).
    
    
63. 63.McKeigue P (2017) Sample size requirements for learning to classify with high-dimensional biomarker panels. Stat Methods Med Res:096228021773880.
    
    
64. 64.Slawski M, Boulesteix A-L (2017) GeneSelector: Stability and aggregation of ranked gene lists. R package version 2.28.0.
    
    
65. 65.Seldenrijk A, et al. (2011) Carotid atherosclerosis in depression and anxiety: Associations for age of depression onset. World J Biol Psychiatry 12(7):549–558.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3109/15622975.2011.583942&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21745125&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F02%2F11%2F2022.02.11.22270724.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000296238500008&link_type=ISI) 

## Supporting Information References

1.  1.Bergstra J, Bengio Y (2012) Random Search for Hyper-Parameter Optimization. J Mach Learn Res 13(Feb):281–305.
    
    
2.  2.van Buuren S, Groothuis-Oudshoorn K, van Buuren S, Groothuis-Oudshoorn K (2011) mice: Multivariate Imputation by Chained Equations in R. J Stat Softw 045(i03).
    
    
3.  3.van Buuren S, et al. (2017) mice: Multivariate imputation by chained equations. R package version 2.46.0.

 [1]: /embed/graphic-8.gif
 [2]: /embed/graphic-9.gif