PT - JOURNAL ARTICLE
AU - Gao, Xin
AU - Zhang, Meihui
AU - Chen, Longfei
AU - Qiu, Jun
AU - Zhao, Shanbo
AU - Li, Junjie
AU - Hua, Tiantian
AU - Jin, Ying
AU - Wu, Zhiqiang
AU - Hou, Haotian
AU - Wang, Yunling
AU - Zhao, Wei
AU - Li, Yuxin
AU - Duan, Yunyun
AU - Ye, Chuyang
AU - Liu, Yaou
TI - Simple Words over Rich Imaging: Accurate Brain Disease Classification via Language Model Analysis of Radiological Reports
AID - 10.1101/2024.11.13.24317214
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.11.13.24317214
4099 - http://medrxiv.org/content/early/2024/11/15/2024.11.13.24317214.short
4100 - http://medrxiv.org/content/early/2024/11/15/2024.11.13.24317214.full
AB - Brain diseases exert profound detrimental effects on human health by affecting the central nervous system. Accurate automated diagnosis of brain diseases is imperative to delay the progression of illness and enhance long-term prognosis. However, existing image-based diagnostic approaches struggle to achieve satisfactory performance due to the high dimensionality of imaging data. Radiological reports, which are required in clinical routine to describe image findings, provide a more straightforward comprehension of the imaging data, yet they have been neglected in automated brain disease classification. In this work, we explore automated brain disease classification via radiological reports and language models and compare the results with conventional image-based methods. Specifically, in the report-based diagnostic approach, we fine-tune Pre-trained Language Models (PLMs) and Large Language Models (LLMs) based on the findings part of radiological reports to achieve disease classification. Four clinically relevant brain disease classification tasks were performed in our experiments, involving 12 datasets with a total number of 14,970 patients, including two independent validation sets.
The best language model reached an average area under the receiver operating characteristic curve (AUC) of 84.75%, an average accuracy (ACC) of 79.48%, and an average F1-score of 79.45%. Compared with the best image-based model, it achieved an average improvement of 10.34%, 10.75%, and 9.95% in terms of AUC, ACC, and F1-score, respectively. The language model also outperformed junior radiologists by 9.47% in terms of ACC. Moreover, the report-based model exhibited better adaptability to missing image contrasts and cross-site data variability than image-based models. Together, these results show that brain disease classification via language model analysis of radiological reports can be more reliable than image-based classification, and our work demonstrates the potential of using radiological reports for accurate diagnosis of brain diseases. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This study was supported by the Beijing Municipal Natural Science Foundation (7242273 & JQ20035), Fundamental Research Funds for the Central Universities (2022CX11008), Xiaomi Young Scholars Program, National Natural Science Foundation of China (81870958 & 81571631), and Special Fund of the Pediatric Medical Coordinated Development Center of Beijing Hospitals Authority (XTYB201831). Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: IRB of Beijing Tiantan Hospital, Capital Medical University gave ethical approval for this work. The Ethics Approval Number: KY2022-078-04. The full project title: A study on Non-invasive Prediction of Molecular Pathological Classification and Clinical Prognosis of Brain Gliomas Based on Preoperative MRI.
The date of approval: 2022/09/14. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. All data produced in the present study are available upon reasonable request to the authors. All data produced in the present work are contained in the manuscript.