PT - JOURNAL ARTICLE
AU - Jaskari, Joel
AU - Sahlsten, Jaakko
AU - Summanen, Paula
AU - Moilanen, Jukka
AU - Lehtola, Erika
AU - Aho, Marjo
AU - Säpyskä, Elina
AU - Hietala, Kustaa
AU - Kaski, Kimmo
TI - DR-GPT: a large language model for medical report analysis of diabetic retinopathy patients
AID - 10.1101/2024.01.12.24301230
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.01.12.24301230
4099 - http://medrxiv.org/content/early/2024/01/17/2024.01.12.24301230.short
4100 - http://medrxiv.org/content/early/2024/01/17/2024.01.12.24301230.full
AB - Diabetic retinopathy (DR) is a sight-threatening condition caused by diabetes. Screening programmes for DR include eye examinations, where the patient's fundi are photographed, and the findings, including DR severity, are recorded in the medical report. However, statistical analyses based on DR severity require structured labels, which call for a laborious manual annotation process if the report format is unstructured. In this work, we propose DR-GPT, a large language model for classifying DR severity from unstructured medical reports. On a clinical set of medical reports, DR-GPT reaches a quadratic weighted Cohen's kappa of 0.975 using the truncated Early Treatment Diabetic Retinopathy Study (ETDRS) scale. When DR-GPT annotations for unlabeled data are paired with the corresponding fundus images, the additional data yield a statistically significant improvement in image classifier performance.
Our analysis shows that large language models can be applied to unstructured medical report databases to classify diabetic retinopathy, with a variety of applications.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: Yes
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Not Applicable
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: This study is based on a retrospective, registry-based dataset and as such does not involve experiments on humans and/or the use of human tissue samples, and no patients were imaged for this study. Studies based on retrospective, registry-based datasets do not need ethical permission or informed consent from subjects according to the law of Finland (Medical Research Act (488/1999) and Act on Secondary Use of Health and Social Data (552/2019)) and according to the European General Data Protection Regulation (GDPR), Regulation (EU) 2016/679. The research permit was granted by the Helsinki University Hospital Chief Medical Officer (decision number 67/2020), Helsinki, Finland, on July 1, 2020.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Not Applicable
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Not Applicable
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Not Applicable
Data cannot be shared publicly because of the data protection law of Finland, the General Data Protection Regulation (GDPR) of the European Union, and our research permit granted by Helsinki University Hospital, which do not allow sharing of individual patients' data. Data are available from Helsinki University Hospital for researchers who meet the criteria for access to confidential data.
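The abstract reports agreement with reference labels as a quadratic weighted Cohen's kappa. As an illustration only, the sketch below computes that statistic from its standard definition, kappa = 1 - sum(w * O) / sum(w * E), where O is the observed confusion matrix, E is the matrix expected by chance from the two raters' label marginals, and w_ij = (i - j)^2 / (k - 1)^2. The function name `quadratic_weighted_kappa`, the integer encoding of grades as 0..k-1, and the toy labels are assumptions for this example; this is not the authors' evaluation code, and the toy labels do not correspond to the truncated ETDRS grades used in the study.

```python
from typing import Sequence


def quadratic_weighted_kappa(
    rater_a: Sequence[int], rater_b: Sequence[int], num_classes: int
) -> float:
    """Cohen's kappa with quadratic disagreement weights.

    Hypothetical helper: labels are assumed to be integers in 0..num_classes-1,
    ordered from least to most severe.
    """
    if num_classes < 2:
        raise ValueError("need at least two ordinal classes")
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)

    # Observed confusion matrix: observed[i][j] counts items rated i by A and j by B.
    observed = [[0.0] * num_classes for _ in range(num_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal label counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]

    weighted_observed = 0.0  # weighted disagreement actually seen
    weighted_expected = 0.0  # weighted disagreement expected by chance
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic weight: 0 on the diagonal, 1 for the most distant grades.
            w = (i - j) ** 2 / (num_classes - 1) ** 2
            weighted_observed += w * observed[i][j]
            weighted_expected += w * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Toy example with three ordinal grades (not study data).
    print(round(quadratic_weighted_kappa([0, 1, 2, 2], [0, 2, 2, 1], 3), 4))
```

On the toy labels above the function returns 7/11 (printed as 0.6364), and identical label sequences give exactly 1.0. The quadratic weights penalise a disagreement of two grades four times as heavily as a disagreement of one grade, which is why this variant is commonly used for ordinal severity scales.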