Abstract
This study investigates the potential of multimodal data for prostate cancer (PCa) risk prediction using the All of Us (AoU) Research Program dataset. By integrating polygenic risk scores (PRSs) with diverse clinical, survey, and genomic data, we developed a model that identifies established PCa risk factors, such as age and family history, as well as a novel one: recent healthcare visits are associated with reduced risk. The model's performance, notably its false positive rate, improves on traditional methods despite the lack of prostate-specific antigen (PSA) data. The findings demonstrate that incorporating comprehensive multimodal data from AoU can enhance PCa risk prediction and provide a robust framework for future clinical applications.
Code available at https://github.com/ashlew23/pc_multimodal
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported in part by the Blavatnik Fellowship.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Stanford University has signed a DUA with the All of Us Research Program allowing all researchers who complete the necessary steps to utilize the data (https://www.researchallofus.org/institutional-agreements/). The Registered Tier and Controlled Tier data available on the Research Hub contain data from participants who have consented to be involved in the All of Us Research Program, including data from electronic health records (EHRs), surveys, and physical measurements. All data available to researchers has had direct identifiers removed and has been further modified to minimize re-identification risks. This includes removing all explicit identifiers in both EHRs and participant-provided information, all free-text fields, geolocation data smaller than U.S. state level, living situations, race and ethnicity subcategories, active duty military status, cause of death, and diagnosis codes subject to public knowledge. Additionally, the following demographic fields are generalized: race and ethnicity, education, employment, and information regarding sex at birth, gender identity, and sexual orientation. Also, all dates are systematically shifted backwards by a random number between 1 and 365, and data from participants over the age of 89 are removed. The All of Us Research Program data will be accessed for research strictly using the Researcher Workbench (researchallofus.org). External data can be brought into this secure environment; however, researchers are restricted from importing any individually identifiable information and from row-level linkage of the external data. Data searches, cohort building, and analysis will solely take place on the Researcher Workbench, a secure cloud-based resource with statistical analysis software available for use with All of Us data.
Researchers are granted access to the Researcher Workbench after their affiliated institution signs a Data Use and Registration Agreement and they create an account (including setting up two-factor authentication), verify their identity through Login.gov or ID.me, complete the All of Us Responsible Conduct of Research training, and sign a Data User Code of Conduct, which prohibits any re-identification of All of Us participants. For more information, please visit researchallofus.org.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Email: alewis23{at}stanford.edu, yashk{at}live.com, boussard{at}stanford.edu, jbrooks1{at}stanford.edu
* Shared senior authorship
Typographical error in the number of cases and controls corrected; Information on data source added; Discussion updated to enhance clarity; Author affiliations updated
Data Availability
All data produced in the present study are available upon reasonable request to the authors.