PT - JOURNAL ARTICLE
AU - Windisch, Paul
AU - Dennstädt, Fabio
AU - Koechli, Carole
AU - Förster, Robert
AU - Schröder, Christina
AU - Aebersold, Daniel M.
AU - Zwahlen, Daniel R.
TI - Metastatic vs. Localized Disease As Inclusion Criteria That Can Be Automatically Extracted From Randomized Controlled Trials Using Natural Language Processing
AID - 10.1101/2024.06.17.24309020
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.06.17.24309020
4099 - http://medrxiv.org/content/early/2024/06/17/2024.06.17.24309020.short
4100 - http://medrxiv.org/content/early/2024/06/17/2024.06.17.24309020.full
AB - Background: Extracting inclusion and exclusion criteria in a structured, automated fashion remains an obstacle to developing better search functionality and to automating systematic reviews of randomized controlled trials in oncology. The question “Did this trial enroll patients with localized disease, metastatic disease, or both?” could be used to narrow down the number of potentially relevant trials when conducting a search. Methods: A total of 600 trials from high-impact medical journals were classified according to whether they allowed the inclusion of patients with localized and/or metastatic disease. Of these, 500 trials were used to develop and validate three different models, and the remaining 100 trials were held out for testing. Results: On the test set, a rule-based system using regular expressions achieved an F1 score of 0.72 (95% CI: 0.64 - 0.81) for predicting whether a trial allowed the inclusion of patients with localized disease and 0.77 (95% CI: 0.69 - 0.85) for metastatic disease. A transformer-based machine learning model achieved F1 scores of 0.97 (95% CI: 0.93 - 1.00) and 0.88 (95% CI: 0.82 - 0.94), respectively.
The best performance was achieved by a combined approach in which the rule-based system was allowed to overrule the machine learning model, with F1 scores of 0.97 (95% CI: 0.94 - 1.00) and 0.89 (95% CI: 0.83 - 0.95), respectively. Conclusion: Automatic classification of cancer trials with regard to the inclusion of patients with localized and/or metastatic disease is feasible. Turning the extraction of trial criteria into classification problems could, in selected cases, improve text-mining approaches in evidence-based medicine.
Competing Interest Statement: P.W. has a patent application titled "Method for detection of neurological abnormalities" outside of the submitted work. The remaining authors declare no conflict of interest.
Funding Statement: No funding was received for this project.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The study only used results from research published as journal articles.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes
Data is currently available online at https://github.com/windisch-paul/metastatic_vs_local. Submission to a repository (Dryad) will be initiated after the submission of the preprint.
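The abstract describes three components: a regular-expression classifier, a transformer model, and a combination in which the rules may overrule the model. All three are scored by F1 with 95% confidence intervals. The Python sketch below covers only the rule-based step, the combination step, and the metric, and it rests on several assumptions. The patterns in `LOCALIZED_PATTERNS` and `METASTATIC_PATTERNS` are hypothetical examples, not the authors' rules. The override policy in `combine` (a firing rule forces a positive label, otherwise the model's label is kept) is one plausible reading of "allowed to overrule". The percentile bootstrap in `bootstrap_f1_ci` is an assumed CI method, since the record does not state how the intervals were computed. The transformer model is not reproduced; its predictions are passed in as booleans.

```python
import random
import re

# HYPOTHETICAL patterns for illustration only; not the authors' regular expressions.
LOCALIZED_PATTERNS = [
    re.compile(r"\b(localized|locally advanced|early[- ]stage|non-metastatic)\b", re.IGNORECASE),
    re.compile(r"\bstage\s+(?:I|II|III)[A-C]?\b", re.IGNORECASE),
]
METASTATIC_PATTERNS = [
    # Negative lookbehind so "non-metastatic" does not count as metastatic.
    re.compile(r"(?<!non-)\bmetastatic\b", re.IGNORECASE),
    re.compile(r"\bstage\s+IV[A-C]?\b", re.IGNORECASE),
]


def rule_based(text):
    """Return (localized, metastatic): True if any pattern fired, else None (no opinion)."""
    localized = True if any(p.search(text) for p in LOCALIZED_PATTERNS) else None
    metastatic = True if any(p.search(text) for p in METASTATIC_PATTERNS) else None
    return localized, metastatic


def combine(model_labels, rule_labels):
    """Assumed override policy: a firing rule overrules the model; None defers to the model."""
    return [m if r is None else r for m, r in zip(model_labels, rule_labels)]


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for boolean labels; 0.0 if there are no positives at all."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for F1 (an assumed method), resampling trials with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((alpha / 2) * n_boot)]
    hi = scores[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # Toy inputs, not data from the study.
    texts = [
        "Patients with metastatic colorectal cancer",
        "Women with early-stage breast cancer",
        "Men with non-metastatic prostate cancer",
    ]
    truth_metastatic = [True, False, False]
    model_metastatic = [False, False, True]  # stand-in for transformer output
    rules_metastatic = [rule_based(t)[1] for t in texts]
    final = combine(model_metastatic, rules_metastatic)
    print(final, f1_score(truth_metastatic, final))
```

Because these assumed rules only ever assert a positive label, this override can recover positives the model missed but cannot remove the model's false positives. The toy demo's third example shows exactly that case.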