Abstract
Background Transparent, interpretable algorithms that summarize clinical trial outcomes and accurately predict individual patients' responses would be a significant advance. We hypothesized that software designed to analyze biomedical data, based on evolutionary computation (EC), could produce summary algorithmic biomarkers from a clinical trial that are predictive of individual responses to therapy.
Methods and Findings A previously published randomized, double-blind, placebo-controlled clinical trial was analyzed. Patients with active rheumatoid arthritis who were on a stable dose of methotrexate and naive to anti-tumor necrosis factor biologic therapy were randomized to receive infliximab or placebo. The primary endpoint was synovial disease activity assessed by magnetic resonance imaging. Secondary endpoints included the Disease Activity Score 28 (DAS28). Baseline peripheral blood gene expression data were available for 59 patients; together with the treatment variable (infliximab or placebo), this yielded a total of 52,379 baseline variables. The binary dependent variable for analysis was DAS28 response, defined as a decrease in DAS28 score of ≥1.2 at 14 weeks. At 14 weeks, 20 of the 30 patients receiving infliximab had responded, and 10 of the 29 patients receiving placebo had responded. The software derived an algorithm, with 4 gene expression variables plus treatment assignment and 12 mathematical operations, that correctly predicted responders versus non-responders for all 59 patients with available gene expression data, giving 100% accuracy, 100% sensitivity and 100% specificity. We present the algorithm to provide transparency and to enable verification. After excluding these 4 gene expression variables, we derived similarly predictive algorithms with 4 other gene expression variables. We then hypothesized that, using just these 8 gene expression variables, the software could derive algorithms predictive of treatment response to anti-tumor necrosis factor biologic therapy in previously published independent datasets from 6 rheumatoid arthritis studies. In each validation analysis, the accuracy of the predictors we derived surpassed that previously reported by the original study authors.
Conclusions and Relevance Software based on EC summarized the outcome of a clinical trial with transparent biomarker algorithms that correctly predicted the clinical outcome for all 59 RA patients. The biomarker variables were validated in 6 independent RA cohorts. This approach simplifies and expedites the development of algorithmic biomarkers that accurately predict individual treatment response, thereby enabling the deployment of precision medicine and, in the future, providing a basis for dynamic labeling of prescription drugs. Original Trial Registration used for analysis: ClinicalTrials.gov registration: NCT01313520
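The response definition and performance figures quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software or their published algorithm: the function names `is_responder` and `classification_metrics` are hypothetical, and only the 1.2 threshold and the responder counts (30 responders, 29 non-responders) are taken from the abstract.

```python
# Illustrative sketch only; names are hypothetical, not from the manuscript.

DAS28_RESPONSE_THRESHOLD = 1.2  # decrease in DAS28 defining a responder (per the abstract)


def is_responder(das28_baseline: float, das28_week14: float) -> bool:
    """Return True when DAS28 decreased by at least 1.2 from baseline to week 14."""
    return (das28_baseline - das28_week14) >= DAS28_RESPONSE_THRESHOLD


def classification_metrics(actual: list[bool], predicted: list[bool]) -> dict[str, float]:
    """Compute accuracy, sensitivity and specificity for binary responder labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    tp = sum(a and p for a, p in zip(actual, predicted))          # responders predicted as responders
    tn = sum(not a and not p for a, p in zip(actual, predicted))  # non-responders predicted as non-responders
    fp = sum(not a and p for a, p in zip(actual, predicted))      # non-responders predicted as responders
    fn = sum(a and not p for a, p in zip(actual, predicted))      # responders predicted as non-responders
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both responders and non-responders are needed")
    return {
        "accuracy": (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Counts from the abstract: 20 + 10 = 30 responders, 10 + 19 = 29 non-responders.
actual = [True] * 30 + [False] * 29
# A predictor that classifies all 59 patients correctly reproduces the labels exactly.
perfect_prediction = list(actual)
print(classification_metrics(actual, perfect_prediction))
```

With a perfect predictor, all three metrics equal 1.0, which is what the reported 100% accuracy, sensitivity and specificity amount to for these 59 patients.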
Competing Interest Statement
Competing interests: In accordance with the journal's policy we report that the authors of this manuscript have the following competing interests:
- PL is the Chief Executive Officer, a founder and employee of Liquid Biosciences.
- PL is a Director of Ignite Biomedical.
- KH is an unpaid advisor to Liquid Biosciences.
- PL and KH own stock in Liquid Biosciences.
- MMcD, DH and VS have no competing interests.
- There are no additional declarations from the authors relevant to this research relating to employment, consultancy, products in development, patents, or revenues from marketed products to declare.
- Ignite Biomedical is developing a predictive test based on the biomarkers reported in the manuscript.
Clinical Trial
N/A because all the data used were de-identified and publicly available in prior publications.
Funding Statement
Yes
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Not applicable: because all the data used were de-identified and publicly available, neither ethics committee approval nor informed consent was required.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The authors confirm that the data underlying the findings are fully available without restriction from the Gene Expression Omnibus archive: GEO accessions GSE58795, GSE5392, GSE12051, GSE15258, GSE33377, GSE78068 and GSE20690. Other relevant data are in the paper and Supporting Information files. To facilitate validation, we have also made the pivotal discovery algorithm in the manuscript available in different formats and have provided the data for the 4 gene expression variables it contains in the Supporting Information files.
Abbreviations
- ACR: American College of Rheumatology
- CDAI: clinical disease activity index
- CRP: C-reactive protein
- DAS28: disease activity score 28
- DLDA: diagonal linear discriminant analysis
- DMARD: disease modifying anti-rheumatic drugs
- DQDA: diagonal quadratic discriminant analysis
- EC: evolutionary computation
- EULAR: European League Against Rheumatism
- GEO: gene expression omnibus
- LOO: leave one out
- LDA: linear discriminant analysis
- MTX: methotrexate
- OLS: orthogonal least squares
- RA: rheumatoid arthritis
- RF: random forest
- SVM: support vector machine
- TNF: tumor necrosis factor
- UHC: unsupervised hierarchical clustering