PT - JOURNAL ARTICLE
AU - de la Oliva, Víctor
AU - Esteban-Medina, Alberto
AU - Alejos, Laura
AU - Muñoyerro-Muñiz, Dolores
AU - Villegas, Román
AU - Dopazo, Joaquín
AU - Loucera, Carlos
TI - Early prediction of ovarian cancer risk based on real world data
AID - 10.1101/2024.07.26.24310994
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.07.26.24310994
4099 - http://medrxiv.org/content/early/2024/07/27/2024.07.26.24310994.short
4100 - http://medrxiv.org/content/early/2024/07/27/2024.07.26.24310994.full
AB - This study presents the development of an early prediction model for high-grade serous ovarian cancer (HGSOC) using real-world data from the Andalusian Health Population Database (BPS), containing electronic health records (EHR) of over 15 million patients. Leveraging the extensive data availability, the model aims to identify individuals at high risk of HGSOC without the need for specific tumor markers or prior stratification into risk groups. Utilizing an Explainable Boosting Machine (EBM) algorithm, the model incorporates diverse clinical variables including demographics, chronic diseases, symptoms, blood test results, and healthcare utilization patterns. The model was trained and validated using a total of 3,088 HGSOC patients diagnosed between 2018 and 2022 along with 114,942 controls of similar characteristics, to emulate the prevalence of the disease, achieving a sensitivity of 0.65 and a specificity of 0.85. This study underscores the importance of using patient data from the general population, demonstrating that effective early detection models can be developed from routinely collected healthcare data. The approach addresses limitations of traditional screening methods by providing a cost-effective and broadly applicable tool for early cancer detection, potentially improving patient outcomes through timely interventions.
The interpretability of the early prediction model also offers insights into the most significant predictors of cancer risk, further enhancing its utility in clinical settings.

Competing Interest Statement
The authors have declared no competing interest.

Funding Statement
This study was funded by AstraZeneca project "Retrospective observational study for the development of early predictors of high grade serous ovarian cancer" (ES-2021-3211) and is also supported by grants PID2020-117979RB-I00 from the Spanish Ministry of Science and Innovation and grant IE19_259 FPS from Consejeria de Salud y Consumo, Junta de Andalucia.

Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Ethics Committee for the Coordination of Biomedical Research in Andalusia granted approval for the study titled "Retrospective observational study for the development of early predictor of ovarian cancer" (29th March, 2022, Acta 03/22) and waived informed consent for the secondary use of clinical data for research purposes.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
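
Note on the reported operating point: the abstract gives a sensitivity of 0.65 and a specificity of 0.85 on 3,088 cases and 114,942 controls. The sketch below shows what those figures would imply for positive and negative predictive value via Bayes' rule. It is a worked illustration, not a result reported by the authors. It assumes that the case-to-control ratio stands in for the prevalence at which the model would be applied, and that the stated sensitivity and specificity hold across the whole cohort. The function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, n_cases, n_controls):
    """Return (prevalence, PPV, NPV) implied by a sensitivity/specificity pair.

    Assumption (not from the source): the cohort's case:control ratio is
    used as the prevalence at which the model is applied.
    """
    prevalence = n_cases / (n_cases + n_controls)

    # Expected fraction of the cohort falling in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)  # P(case | flagged)
    npv = true_neg / (true_neg + false_neg)  # P(control | not flagged)
    return prevalence, ppv, npv


# Figures taken from the abstract.
prevalence, ppv, npv = predictive_values(0.65, 0.85, 3088, 114942)
print(f"prevalence={prevalence:.4f} PPV={ppv:.3f} NPV={npv:.3f}")
```

Under these assumptions the cohort prevalence is about 2.6%, so roughly one in ten flagged individuals would be a true case, while a negative result would be correct about 99% of the time. At a lower real-world prevalence the PPV would be correspondingly lower.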