PT - JOURNAL ARTICLE
AU - Horgan, Kevin
AU - McDermott, Michael F.
AU - Harrington, Douglas
AU - Simonyan, Vahan
AU - Lilley, Patrick
TI - AI based on evolutionary computation yields algorithmic biomarker summary of a randomized rheumatoid arthritis clinical trial, accurately predicting individual patient outcomes, enabling precision medicine
AID - 10.1101/2024.01.29.24301910
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.01.29.24301910
4099 - http://medrxiv.org/content/early/2024/01/30/2024.01.29.24301910.short
4100 - http://medrxiv.org/content/early/2024/01/30/2024.01.29.24301910.full
AB - Background: Producing transparent, interpretable algorithms that summarize clinical trial outcomes and accurately predict individual patients' responses would be a significant advance. We hypothesized that software designed to analyze biomedical data, based on evolutionary computation (EC), could produce summary algorithmic biomarkers from a clinical trial, predictive of individual responses to therapy. Methods and Findings: A previously published randomized, double-blind, placebo-controlled clinical trial was analyzed. Patients with active rheumatoid arthritis on a stable dose of methotrexate and naive to anti-tumor necrosis factor biologic therapy were randomized to receive infliximab or placebo. The primary endpoint was synovial disease activity assessed by magnetic resonance imaging. Secondary endpoints included the Disease Activity Score 28 (DAS28). Baseline peripheral blood gene expression variable data were available for 59 patients, plus the treatment variable, infliximab or placebo, yielding a total of 52,379 baseline variables. The binary dependent variable for analysis was DAS28 response, defined by a decrease in DAS28 score of ≥1.2, at 14 weeks. At 14 weeks, 20 of the 30 patients receiving infliximab had responded, and 10 of the 29 patients receiving placebo had responded.
The software derived an algorithm, with 4 gene expression variables plus treatment assignment and 12 mathematical operations, that correctly predicted responders versus non-responders for all 59 patients with available gene expression data, giving 100% accuracy, 100% sensitivity and 100% specificity. We present the algorithm to provide transparency and to enable verification. Excluding the 4 gene expression variables, we then derived similarly predictive algorithms with 4 other gene expression variables. We hypothesized that the software could derive algorithms predicting treatment response to anti-tumor necrosis factor biologic therapy from just these 8 gene expression variables, using previously published independent datasets from 6 rheumatoid arthritis studies. In each validation analysis the accuracy of the predictors we derived surpassed that previously reported by the original study authors. Conclusions and Relevance: Software based on EC summarized the outcome of a clinical trial, with transparent biomarker algorithms that correctly predicted the clinical outcome for all 59 RA patients. The biomarker variables were validated in 6 independent RA cohorts. This approach simplifies and expedites the development of algorithmic biomarkers accurately predicting individual treatment response, thereby enabling the deployment of precision medicine and, in the future, providing a basis for dynamic labeling of prescription drugs.
Original Trial Registration used for analysis: ClinicalTrials.gov registration: NCT01313520
Competing Interest Statement
Competing interests: In accordance with the journal's policy we report that the authors of this manuscript have the following competing interests:
- PL is the Chief Executive Officer, a founder and employee of Liquid Biosciences.
- PL is a Director of Ignite Biomedical.
- KH is an unpaid advisor to Liquid Biosciences.
- PL and KH own stock in Liquid Biosciences.
- MMcD, DH and VS have no competing interests.
- The authors have no additional declarations relevant to this research relating to employment, consultancy, products in development, patents, or revenues from marketed products.
- Ignite Biomedical is developing a predictive test based on the biomarkers reported in the manuscript.
Clinical Trial
N/A because all the data used were de-identified and publicly available in prior publications.
Funding Statement
Yes
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Not applicable. Because all the data used were de-identified and publicly available, neither ethics committee approval nor informed consent was required.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes
The authors confirm that the data underlying the findings are fully available without restriction from the Gene Expression Omnibus archive: GEO accessions GSE58795, GSE5392, GSE12051, GSE15258, GSE33377, GSE78068 and GSE20690. Other relevant data are in the paper and Supporting Information files. We have also made the pivotal discovery algorithm in the manuscript available in different formats, and have provided the data for the 4 gene expression variables it contains in the Supporting Information files to facilitate validation.
ACR, American College of Rheumatology; CDAI, clinical disease activity index; CRP, C-reactive protein; DAS28, disease activity score 28; DLDA, diagonal linear discriminant analysis; DMARD, disease modifying anti-rheumatic drugs; DQDA, diagonal quadratic discriminant analysis; EC, evolutionary computation; EULAR, European League Against Rheumatism; GEO, gene expression omnibus; LOO, leave one out; LDA, linear discriminant analysis; MTX, methotrexate; OLS, orthogonal least squares; RA, rheumatoid arthritis; RF, random forest; SVM, support vector machine; TNF, tumor necrosis factor; UHC, unsupervised hierarchical clustering