Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools
Justin T Reese, Leonardo Chimirri, Yasemin Bridges, Daniel Danis, J Harry Caufield, Kyran Wissink, Julie A McMurry, Adam SL Graefe, Elena Casiraghi, Giorgio Valentini, Julius OB Jacobsen, Melissa Haendel, Damian Smedley, Christopher J Mungall, Peter N Robinson
doi: https://doi.org/10.1101/2024.07.22.24310816
Justin T Reese
1 Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
2 Monarch Initiative
Leonardo Chimirri
2 Monarch Initiative
3 Berlin Institute of Health at Charite Universitaetsmedizin Berlin, Berlin, Germany
Yasemin Bridges
2 Monarch Initiative
4 William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
Daniel Danis
2 Monarch Initiative
3 Berlin Institute of Health at Charite Universitaetsmedizin Berlin, Berlin, Germany
J Harry Caufield
2 Monarch Initiative
5 University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Kyran Wissink
3 Berlin Institute of Health at Charite Universitaetsmedizin Berlin, Berlin, Germany
Julie A McMurry
2 Monarch Initiative
5 University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Adam SL Graefe
3 Berlin Institute of Health at Charite Universitaetsmedizin Berlin, Berlin, Germany
Elena Casiraghi
6 Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
7 AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano, Italy
8 ELLIS-European Laboratory for Learning and Intelligent Systems
Giorgio Valentini
7 AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano, Italy
8 ELLIS-European Laboratory for Learning and Intelligent Systems
Julius OB Jacobsen
2 Monarch Initiative
4 William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
Melissa Haendel
2 Monarch Initiative
5 University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Damian Smedley
2 Monarch Initiative
4 William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
Christopher J Mungall
2 Monarch Initiative
6 Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Peter N Robinson
2 Monarch Initiative
3 Berlin Institute of Health at Charite Universitaetsmedizin Berlin, Berlin, Germany
8 ELLIS-European Laboratory for Learning and Intelligent Systems
9 The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
Data Availability
All data produced are available online on Zenodo at: https://zenodo.org/records/14008477.
Posted November 07, 2024.
medRxiv 2024.07.22.24310816; doi: https://doi.org/10.1101/2024.07.22.24310816