Comparing natural language processing representations of disease sequences for prediction in the electronic healthcare record
Thomas Beaney, Sneha Jha, Asem Alaa, Alexander Smith, Jonathan Clarke, Thomas Woodcock, Azeem Majeed, Paul Aylin, Mauricio Barahona
doi: https://doi.org/10.1101/2023.11.16.23298640
Thomas Beaney (1, 2)
Sneha Jha (2)
Asem Alaa (2)
Alexander Smith (3)
Jonathan Clarke (2)
Thomas Woodcock (1)
Azeem Majeed (1)
Paul Aylin (1)
Mauricio Barahona (2)
1 Department of Primary Care and Public Health, Imperial College London, London, W6 8RP, United Kingdom
2 Centre for Mathematics of Precision Healthcare, Department of Mathematics, Imperial College London, London, SW7 2AZ, United Kingdom
3 Department of Epidemiology and Biostatistics, Imperial College London, London, W2 1PG, United Kingdom
Data Availability
This study uses patient data that are not publicly available but can be requested by users who meet certain requirements: https://cprd.com/research-applications. Codes, including the Medcode-to-disease mapping, are available from https://tbeaney.github.io/MMclustering/
Posted November 17, 2023.
Comparing natural language processing representations of disease sequences for prediction in the electronic healthcare record
Thomas Beaney, Sneha Jha, Asem Alaa, Alexander Smith, Jonathan Clarke, Thomas Woodcock, Azeem Majeed, Paul Aylin, Mauricio Barahona
medRxiv 2023.11.16.23298640; doi: https://doi.org/10.1101/2023.11.16.23298640