Abstract
Multiple sclerosis is a heterogeneous disease with an unpredictable course. We applied machine learning to generate individualised risk scores of disability worsening and stratify patients into subgroups with different prognosis.
Clinical data and MRI scans from published randomised clinical trials in patients with relapsing-remitting and progressive MS were divided into training (n=5,483) and external validation data sets (n=2,668). We processed brain MRI scans to obtain 18 measures for lobar grey matter, deep grey matter and lesion volumes, and T1-/T2-weighted ratio of the normal-appearing white matter regions. We developed a machine learning model, called subpopulation risk stratification (SunRiSe), that combines multi-parametric clinical and MRI data to estimate individualised risk scores and stratify patients into subgroups on the basis of this risk; in particular, we entered MRI measures, the Expanded Disability Status Scale, age and gender to generate risk scores of disability worsening (i.e., the time to confirmed disability worsening). Based on SunRiSe risk scores, high-, medium-, and low-risk subpopulations were defined at study entry. We assessed whether selecting patients at high risk of disability worsening reduces sample size compared to when all risk groups were sampled together.
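The abstract does not state how the high-, medium- and low-risk groups were cut from the risk scores. As an illustration only, the sketch below assumes tertile cut-offs estimated on the training scores and then applied unchanged to new patients; the function names and the tertile rule are hypothetical and are not taken from SunRiSe.

```python
from statistics import quantiles


def tertile_cutoffs(train_scores):
    """Return the two cut points that split the training scores into thirds.

    Assumption: groups are defined by tertiles of the training risk scores.
    The source does not say which thresholds SunRiSe actually uses.
    """
    low_cut, high_cut = quantiles(train_scores, n=3)
    return low_cut, high_cut


def assign_risk_group(score, low_cut, high_cut):
    """Map one individual risk score to a low/medium/high label."""
    if score <= low_cut:
        return "low"
    if score <= high_cut:
        return "medium"
    return "high"


# Hypothetical risk scores, for illustration only.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cuts = tertile_cutoffs(train_scores)
groups = [assign_risk_group(s, *cuts) for s in (0.2, 0.5, 0.95)]
print(groups)
```

Fixing the cut-offs on the training data and reusing them on the validation data keeps the group definition independent of the patients being evaluated.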
In both the training and external validation data sets, SunRiSe stratified patients into three groups associated with different levels of risk of disability worsening. In the external validation data set, patients at high risk mainly had progressive MS and had more disability events than those at medium risk (hazard ratio [HR]=1.34, p<0.0001) and low risk (HR=1.51, p<0.0001). At study entry, male gender, older age, higher lesion load, higher disability, lower lobar cortical grey matter, lower normal-appearing white matter T1/T2 ratio and lower deep grey matter volumes were the most important variables in defining the SunRiSe risk score.
The inclusion of patients predicted to be at high risk reduced (i) the duration of an event-driven trial by an average of 4.5 months (±2.1 months); and (ii) the number of participants in a randomised trial by approximately 200, with 80% statistical power to detect a 30% treatment effect.
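To make the enrichment argument concrete, the sketch below uses Schoenfeld's approximation for the number of events needed in a two-arm, 1:1 time-to-event trial. It is not the calculation used in the study. It assumes that a "30% treatment effect" corresponds to a hazard ratio of 0.7 and a two-sided alpha of 0.05, and the event probabilities (0.25 for an unselected cohort, 0.35 for a high-risk cohort) are hypothetical values chosen only to show the direction of the effect.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Schoenfeld approximation for a 1:1 randomised time-to-event trial."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


def required_participants(hazard_ratio, event_probability, alpha=0.05, power=0.80):
    """Participants needed so that the expected number of events is reached."""
    events = required_events(hazard_ratio, alpha, power)
    return ceil(events / event_probability)


# Assumed hazard ratio of 0.7 (a 30% reduction in hazard).
events = required_events(0.7)
# Hypothetical probabilities of a worsening event during follow-up.
n_unselected = required_participants(0.7, event_probability=0.25)
n_high_risk = required_participants(0.7, event_probability=0.35)
print(events, n_unselected, n_high_risk)
```

The number of events depends only on the effect size, alpha and power, so a cohort in which events are more likely reaches that target with fewer participants or a shorter follow-up.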
Machine learning provides a personalised risk score that can identify patients who have the greatest risk of disability worsening and therefore should be treated with the most effective medications and monitored more closely. Risk stratification allows the enrichment of clinical trials with patients more likely to worsen, and thereby reduces trial duration and sample size.
Competing Interest Statement
The full disclosure statement can be found at the end of the manuscript.
Funding Statement
This investigation was supported in part by an award from the International Progressive MS Alliance, award reference number PA-1412-02420.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Institutional Review Board at the Montreal Neurological Institute (MNI), Quebec, Canada, approved this study (Reference number: IRB00010120) under the auspices of the International Progressive MS Alliance.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The data sets are controlled by pharmaceutical companies. Requests to access the data can be forwarded to the data controllers listed in our previous publication (ref. 17). Processed CSV files can be requested from the corresponding author by any qualified investigator for the purpose of reproducing the results of this study.