Abstract
Background With the advent of large language models (LLMs), such as ChatGPT, natural language processing (NLP) is revolutionizing healthcare. We systematically reviewed NLP’s role in rheumatology and assessed its impact on diagnostics, disease monitoring, and treatment strategies.
Methods Following PRISMA guidelines, we conducted a systematic search to identify original research articles exploring NLP applications in rheumatology. This search was performed in PubMed, Embase, Web of Science, and Scopus until January 2024.
Results Our search produced 17 studies that showcased diverse applications of NLP in rheumatology, addressing disease diagnosis, data handling, and monitoring.
Notably, GPT-4 demonstrated strong performance in diagnosing and managing rheumatic diseases. Performance metrics indicated high accuracy and reliability in various tasks. However, challenges like data dependency and limited generalizability were noted.
Conclusion NLP, and especially LLMs, show promise in advancing rheumatology practice, enhancing diagnostic precision, data handling, and patient care. Future research should address current limitations, focusing on data integrity and model generalizability.
Introduction
The integration of artificial intelligence (AI) in medicine is revolutionizing healthcare (1–3). Central to this AI revolution in healthcare are Natural Language Processing (NLP) methodologies and the use of generative Large Language Models (LLMs) (4–6), marked by the introduction of ChatGPT at the end of 2022. These advanced technologies have demonstrated remarkable capability in interpreting and analyzing clinical data in a human-like manner (6,7).
This technological evolution holds particular promise for rheumatology—a field grappling with a diverse range of disorders characterized by significant variability in organs involved, symptoms, treatment responses, and prognosis, which complicates patient management (8–10). NLP, known for its capability to process unstructured clinical data, is gaining recognition in rheumatology as a valuable tool, enhancing both patient care and research methodologies (11–14).
Our study aims to systematically review the contributions of NLP and LLMs to the field of rheumatology. This effort seeks not only to inform healthcare professionals about the benefits of NLP but also to pave the way for future research.
Methods
Search Strategy
This systematic review was registered with the International Prospective Register of Systematic Reviews - PROSPERO (Registration code CRD42024509490). We adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (15,16).
A systematic search was conducted across PubMed, Embase, Web of Science, and Scopus, up until January 2024. We complemented the search by screening reference lists for additional papers. We aimed to identify original research articles that investigated the application of NLP and LLMs in the diagnosis or prediction of rheumatological diseases.
The search utilized a combination of keywords including “Natural Language Processing”, “NLP”, “Large Language Models”, “LLMs”, “Artificial Intelligence Models”, “AI Models”, “Rheumatology”, “Rheumatologic Diseases”, “Rheumatoid Arthritis”, “Systemic Lupus Erythematosus”, “Sjogren’s Syndrome”, “Scleroderma”, “Polymyositis”, “Dermatomyositis”, “Ankylosing Spondylitis”, “Psoriatic Arthritis”, “Gout”, “Osteoarthritis”, “Data Analysis”, “Predictive Modeling”, “Pattern Recognition”, “Text Mining”, “Electronic Health Records”, “EHR Analysis”, “Diagnosis”, and “Prediction”.
The specific search strings for PubMed, Embase, Scopus, and Web of Science are detailed in the Supplementary Materials. The strings varied slightly between databases to optimize the retrieval of relevant articles, encompassing a broad spectrum of studies at the intersection of natural language processing and rheumatologic diseases.
Study Selection
We included original research articles that focused on the application of NLP and LLMs in diagnosing, classifying, or predicting rheumatic diseases. Studies were selected if they provided data for assessing the performance metrics of AI models, such as area under the curve, accuracy, sensitivity, and specificity. We excluded review papers, case reports, conference abstracts, editorials, preprints, and studies not published in English. We also excluded studies employing AI techniques unrelated to NLP.
Data Extraction
Two independent reviewers extracted relevant information using a standardized form. Data points included year of publication, study design, sample size, specific conversational NLP techniques used, dataset details for model training and validation, performance metrics, and key findings. Discrepancies between reviewers were resolved through discussion, and a third reviewer was consulted when necessary.
Risk of Bias
To evaluate the quality and robustness of the methodologies in the included studies, we used the Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies (17).
Results
Search Results and Study Selection
The process of study selection and the screening methodology are detailed in the PRISMA flow chart (Figure 1). Our search yielded a total of 691 articles, with a breakdown of 106 from PubMed, 213 from Embase, 246 from Scopus, and 126 from Web of Science. Following the removal of 402 duplicates, 289 articles remained for title and abstract screening. This process led to the exclusion of 226 articles, narrowing the field to 63 full-text articles for thorough evaluation.
Ultimately, 16 studies were chosen for inclusion based on their relevance and adherence to our criteria. One additional study was identified and included through reference screening (11–14,18–30). Therefore, the review culminated in a total of 17 studies. The included studies were published between 2010 and 2024 (Figure 1).
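The selection counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the review's methodology; the variable names are illustrative, and the numbers are taken directly from the text.

```python
# Record counts as reported in the text (per-database hits and exclusions).
database_hits = {"PubMed": 106, "Embase": 213, "Scopus": 246, "Web of Science": 126}
duplicates_removed = 402
excluded_at_screening = 226
included_from_full_text = 16
included_from_references = 1

# Each stage of the PRISMA flow is the previous stage minus its exclusions.
total_identified = sum(database_hits.values())
screened = total_identified - duplicates_removed
full_text_assessed = screened - excluded_at_screening
total_included = included_from_full_text + included_from_references

print(total_identified, screened, full_text_assessed, total_included)
# prints: 691 289 63 17
```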
Quality assessment
We assessed the quality of the 17 included studies using the Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies (17), excluding criteria not applicable to our study designs. Most studies clearly stated their research questions, defined study populations, and employed consistent selection criteria. However, a common gap was the lack of sample size justification and power description.
While exposure measures were typically clear and valid, not all studies measured exposure before the outcome, affecting their quality ratings. Additionally, the blinding of outcome assessors and adjustment for key confounding variables were frequently overlooked. Despite these issues, the overall quality of the studies varied from ‘Fair’ to ‘Good’, with only one study rated as ‘Poor’ (Table 1).
Summary of the included studies
The 17 included studies highlight the evolving application of NLP and language models like ChatGPT and BERT in rheumatology (Table 2). These studies, with sample sizes ranging from a few hundred to over 34 million clinical notes, underscore the role of NLP in various rheumatologic conditions, including axial spondyloarthritis, psoriatic arthritis, rheumatoid arthritis, gout, and systemic sclerosis. The clinical tasks addressed were diverse, covering disease activity assessment, condition identification, and diagnostic accuracy enhancements (Table 3). Importantly, the studies primarily focused on the use of NLP and language models, often supplementing them with tabular machine learning techniques to improve outcomes (Figure 2). Performance metrics, such as precision, recall, F1 scores, and area under the curve, generally pointed to a high level of accuracy and reliability in the application of NLP and language models (Tables 3-4).
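For readers less familiar with the metrics named here, the sketch below shows how sensitivity (recall), specificity, positive predictive value (precision), and the F1 score follow from a 2x2 confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they do not come from any included study.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute common classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: share of true cases that were found
    specificity = tn / (tn + fp)   # share of non-cases correctly rejected
    ppv = tp / (tp + fp)           # precision: share of flagged cases that are true
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts: 100 true cases and 900 non-cases in 1,000 charts.
metrics = diagnostic_metrics(tp=90, fp=10, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because PPV depends on how common the condition is in the screened population, a PPV reported in one cohort may not transfer to a setting with a different prevalence.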
NLP tasks in Rheumatology Research
Disease Diagnosis and Classification
Significant strides in rheumatology diagnosis and classification are evident through studies utilizing NLP. Krusche et al. (2024) employed GPT-4 in diagnosing inflammatory rheumatic diseases, achieving a correct top diagnosis in 35% of cases (13). In a similar vein, van Leeuwen et al. (2024) effectively identified ANCA-associated vasculitis using an AI tool with NLP, showing a sensitivity range of 96.3% to 98.0% (11). Additionally, Love et al. (2011) improved the accuracy of psoriatic arthritis diagnoses in EMRs using NLP, achieving a PPV of 93% (26). These studies illustrate the potential of NLP in disease identification and classification, although performance varied by task. Zhao et al. (2020) also contributed to this area by improving the identification of axial spondyloarthritis, sacroiliitis, and HLA-B27 positive patients using EHRs, with their unsupervised algorithm achieving a sensitivity of 78% and specificity of 94% (20). Humbert-Droz et al. (2023) exemplified the use of NLP in analyzing large-scale clinical data, extracting rheumatoid arthritis outcomes from over 34 million notes with high sensitivity (95%) and PPV (87%) (23). Walsh et al. (2020) developed algorithms to identify axial spondyloarthritis with impressive accuracy. Their Spond NLP algorithm particularly stood out for its sensitivity and specificity, illustrating the potential of NLP in early disease detection and risk assessment (29).
Disease Activity Assessment and Management
NLP has shown promise in advancing disease activity assessment and management (Figure 3).
The SpAINET study by Benavent et al. (2023) utilized NLP to manage axial spondyloarthritis and psoriatic arthritis, with notable precision in disease activity assessment (Etanercept precision score of 1.000) (18). England et al. (2024) developed an NLP tool for extracting forced vital capacity from EHRs, demonstrating high correlation (r = 0.94) with pulmonary function test values, thus underlining the potential of NLP in enhancing patient care through precise data analysis (25).
Predictive Modeling and Risk Assessment
Redd et al. (2014) utilized NLP to identify systemic sclerosis patients at risk for renal crisis, underscoring the role of NLP in predictive health analytics (30). Additionally, Lin et al. (2015) focused on identifying methotrexate-induced liver toxicity in patients with rheumatoid arthritis using NLP and machine learning classification algorithms (12). Their approach achieved a positive predictive value of 0.756, highlighting NLP’s capability to anticipate treatment-related complications, an essential aspect of patient safety and personalized care.
Discussion
Our systematic review reveals the potential for integrating NLP and LLMs into rheumatology. These models can improve diagnostics, disease monitoring, and treatment strategies across various rheumatic disorders.
NLP has the potential to transform rheumatology by enhancing patient care and research (9). With the rise of digital patient health records and advanced diagnostics, there is a surge in patient data. AI methods, including NLP, machine learning, and deep learning, are pivotal in harnessing this data for predicting outcomes and guiding clinical decisions (1,8). In rheumatology, AI models have significantly improved the diagnosis of diseases like rheumatoid arthritis (9,10,31). These models aid in screening, disease identification, patient phenotyping in EHRs, assessing treatment responses, and monitoring disease progression (31,32).
Additionally, AI contributes to risk assessment for comorbidities, drug discovery, and advancing basic science research, making it a powerful tool in modern rheumatology practice (8,9,31–34). Research has shown NLP has practical applications in rheumatology. Studies like Osborne et al.’s (2021) on identifying gout flares using NLP in emergency settings, and Krusche et al.’s (2024) work on employing GPT-4 for diagnosing inflammatory rheumatic diseases, showcase NLP’s potential in enhancing diagnostic accuracy (13,19).
The application of LLMs, such as GPT-4, appears to hold significant potential. In Krusche et al.’s study, GPT-4 slightly outperformed expert rheumatologists, achieving 60% accuracy in the top 3 diagnoses compared with 55% for rheumatologists (13). Despite the study’s limited sample size, these findings indicate promise for LLMs in enhancing triage and diagnostic processes in routine clinical practice.
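Top-1 and top-3 diagnostic accuracy, as reported for GPT-4, measure how often the correct diagnosis appears among the first k ranked suggestions. The sketch below illustrates the calculation under stated assumptions: the function name and the four cases are hypothetical and are not data from Krusche et al.

```python
def top_k_accuracy(cases: list[tuple[list[str], str]], k: int) -> float:
    """Fraction of cases whose true diagnosis is among the first k suggestions."""
    hits = sum(1 for ranked, truth in cases if truth in ranked[:k])
    return hits / len(cases)


# Hypothetical cases: (ranked list of suggested diagnoses, confirmed diagnosis).
cases = [
    (["rheumatoid arthritis", "psoriatic arthritis", "gout"], "rheumatoid arthritis"),
    (["gout", "osteoarthritis", "rheumatoid arthritis"], "rheumatoid arthritis"),
    (["systemic lupus erythematosus", "sjogren's syndrome", "systemic sclerosis"], "fibromyalgia"),
    (["axial spondyloarthritis", "psoriatic arthritis", "osteoarthritis"], "psoriatic arthritis"),
]

print(top_k_accuracy(cases, k=1))  # prints 0.25
print(top_k_accuracy(cases, k=3))  # prints 0.75
```

Top-3 accuracy can never be lower than top-1 accuracy on the same cases, which is why the two figures should be read together rather than compared across studies that report only one of them.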
Similarly, Zhao et al.’s (2020) improvement in identifying axial spondyloarthritis patients and Humbert-Droz et al.’s (2023) extraction of rheumatoid arthritis outcomes from clinical notes demonstrate NLP’s effectiveness in disease classification and management (20,23).
Despite the benefits, the application of NLP in rheumatology comes with limitations. Challenges include data quality dependency, small sample sizes, and the need for wider generalizability in studies (9). Additionally, the complexity of rheumatic diseases necessitates sophisticated and adaptable models, which can accurately reflect diseases’ heterogeneity (8,9).
In conclusion, NLP, and especially LLMs, show promise in advancing rheumatology practice, enhancing diagnostic precision, data handling, and patient care. Future research should address current limitations, focusing on data integrity and model generalizability.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Supplementary materials
The Boolean strings used for each database:
PubMed: (“Natural Language Processing” OR “NLP” OR “Large Language Models” OR “LLMs” OR “Artificial Intelligence Models” OR “AI Models”) AND (“Rheumatology” OR “Rheumatologic Diseases” OR “Rheumatoid Arthritis” OR “Systemic Lupus Erythematosus” OR “Sjogren’s Syndrome” OR “Scleroderma” OR “Polymyositis” OR “Dermatomyositis” OR “Ankylosing Spondylitis” OR “Psoriatic Arthritis” OR “Gout” OR “Osteoarthritis”) AND (“Data Analysis” OR “Predictive Modeling” OR “Pattern Recognition” OR “Text Mining” OR “Electronic Health Records” OR “EHR Analysis” OR “Diagnosis” OR “Prediction”)
Embase: (‘natural language processing’ OR ‘nlp’ OR ‘large language models’ OR ‘llms’ OR ‘artificial intelligence models’ OR ‘ai models’) AND (‘rheumatology’ OR ‘rheumatologic diseases’ OR ‘rheumatoid arthritis’ OR ‘systemic lupus erythematosus’ OR ‘sjogren’s syndrome’ OR ‘scleroderma’ OR ‘polymyositis’ OR ‘dermatomyositis’ OR ‘ankylosing spondylitis’ OR ‘psoriatic arthritis’ OR ‘gout’ OR ‘osteoarthritis’) AND (‘data analysis’ OR ‘predictive modeling’ OR ‘pattern recognition’ OR ‘text mining’ OR ‘electronic health records’ OR ‘ehr analysis’ OR ‘diagnosis’ OR ‘prediction’)
Scopus: (TITLE-ABS-KEY (“Natural Language Processing” OR “NLP” OR “Large Language Models” OR “LLMs” OR “AI” OR “Artificial Intelligence”) AND TITLE-ABS-KEY (“Rheumatology” OR “Rheumatoid Arthritis” OR “Systemic Lupus Erythematosus” OR “Sjogren’s Syndrome” OR “Scleroderma” OR “Polymyositis” OR “Dermatomyositis” OR “Ankylosing Spondylitis” OR “Psoriatic Arthritis”) AND TITLE-ABS-KEY (“Machine Learning” OR “Predictive Models” OR “Text Mining” OR “EHR Analysis”))
Web of Science: (TS=(“Natural Language Processing” OR “NLP” OR “Large Language Models” OR “LLMs” OR “AI” OR “Artificial Intelligence”) AND TS=(“Rheumatology” OR “Rheumatoid Arthritis” OR “Systemic Lupus Erythematosus” OR “Sjogren’s Syndrome” OR “Scleroderma” OR “Polymyositis” OR “Dermatomyositis” OR “Ankylosing Spondylitis” OR “Psoriatic Arthritis”) AND TS=(“Machine Learning” OR “Predictive Models” OR “Text Mining” OR “EHR Analysis”))
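The strings above share one pattern: terms within a concept group are joined with OR, and the groups are joined with AND. As a minimal sketch of that pattern (not the tooling used in this review), the PubMed string could be assembled from its three keyword groups as follows. The helper names `or_group` and `build_query` are hypothetical, and the code uses straight quotes where the text shows typographic ones.

```python
def or_group(terms: list[str]) -> str:
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups: list[str]) -> str:
    """Join the OR-groups with AND to form one Boolean search string."""
    return " AND ".join(or_group(group) for group in groups)


# Keyword groups as listed in the PubMed string above.
methods = ["Natural Language Processing", "NLP", "Large Language Models",
           "LLMs", "Artificial Intelligence Models", "AI Models"]
diseases = ["Rheumatology", "Rheumatologic Diseases", "Rheumatoid Arthritis",
            "Systemic Lupus Erythematosus", "Sjogren's Syndrome", "Scleroderma",
            "Polymyositis", "Dermatomyositis", "Ankylosing Spondylitis",
            "Psoriatic Arthritis", "Gout", "Osteoarthritis"]
tasks = ["Data Analysis", "Predictive Modeling", "Pattern Recognition",
         "Text Mining", "Electronic Health Records", "EHR Analysis",
         "Diagnosis", "Prediction"]

pubmed_query = build_query(methods, diseases, tasks)
print(pubmed_query)
```

Field tags such as TITLE-ABS-KEY (Scopus) or TS= (Web of Science) would need to be added per group for the other databases; this sketch does not attempt that.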
Acknowledgment
none
Footnotes
Financial disclosure – none