Abstract
Precision medicine promises significant health benefits but faces challenges such as the need for complex data management and analytics, interdisciplinary collaboration, and education of researchers, healthcare professionals, and participants. Addressing these needs requires the integration of computational experts, engineers, designers, and healthcare professionals to develop user-friendly systems and shared terminologies. The widespread adoption of large language models (LLMs) like GPT-4 and Claude 3 highlights the importance of making complex data accessible to non-specialists. The Stanford Data Ocean (SDO) strives to mitigate these challenges through a scalable, cloud-based platform that supports data management for various data types, advanced research, and personalized learning in precision medicine. SDO provides AI tutors and AI-powered data visualization tools to enhance educational and research outcomes and make data analysis accessible for users from diverse educational backgrounds. By extending engagement and cutting-edge research capabilities globally, SDO particularly benefits economically disadvantaged and historically marginalized communities, fostering interdisciplinary biomedical research and bridging the gap between education and practical application in the biomedical field.
Competing Interest Statement
MPS is a cofounder and scientific advisor of Personalis, SensOmics, Qbio, January AI, Fodsel, Filtricine, Protos, RTHM, Iollo, Marble Therapeutics, Crosshair Therapeutics, NextThought, and Mirvie. He is also a scientific advisor of Jupiter, Neuvivo, Swaza, Mitrix, Yuvan, TranscribeGlass, and Applied Cognition. PS is currently an employee of Amazon Web Services. The other authors declare no competing interests.
Funding Statement
This work was supported by NIH grants (5R01NR020105, U54HG012723, S10OD025212, U01HG007611, U54HG006996, U54DK102556) and by gifts from the departmental funding of the Stanford Genetics department. SMS-FR was supported by the National Institutes of Health (NIH) Grant K08 ES028825. We acknowledge Amazon Web Services, Microsoft Azure, and OpenAI for their support of this research. This research also received support from the Stanford Institute for Human-Centered Artificial Intelligence (HAI) and the Stanford Accelerator for Learning. We acknowledge the Stanford Genetics Bioinformatics Service Center for providing this research's gateway to the SCG cluster, Google Cloud Platform, and Amazon Web Services. We gratefully acknowledge Conectado Inc. for their crucial support in the outreach and recruitment of underserved students for our research.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
At the outset of the project, the research team consulted with institutional experts to ensure that systems and processes utilizing the Stanford Data Ocean (SDO) reflected lifecycle planning and oversight of data, especially data derived from human beings and sensitive categories of health data. Foremost among these experts were Stanford University's IRB and the Research Data Governance and Privacy Director, based within the office of the Vice Provost and Dean of Research. These authorities agreed that SDO's systems and processes enable IRB oversight of data derived from human beings on a per-project basis. Studies that involve human subjects research or sensitive data must show that they have received IRB approval. In addition, SDO's anticipated uses of data were reviewed and approved by Stanford University's Privacy Office through its Data Risk Assessment process (DRA #2622).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The datasets supporting the findings of Figures 5 and 6 are publicly accessible, as detailed in the “Data Availability” sections of the respective source papers. For the Stanford Data Ocean’s (SDO) learners’ pre- and post-surveys, as well as the AI Tutor questions, all personally identifiable information has been removed to ensure privacy and confidentiality. This includes the deletion of email addresses, first and last names, and any other information that could be used to identify individual participants.