Abstract
Osteoarthritis (OA) and rheumatoid arthritis (RA) are the two most common rheumatic diseases worldwide, causing pain and disability. Both conditions are highly heterogeneous, and their onset occurs insidiously with non-specific symptoms, so they are not always distinguishable from other arthritis during the initial stages. This makes early diagnosis difficult and resource-demanding in clinical environments. Here, we estimated its diagnostic performance in classifying ATR-FTIR spectra obtained from serum samples from OA patients, RA patients, and healthy controls. Altogether, 334 serum samples were obtained from 100 OA patients, 134 RA patients, and 100 healthy controls. The infrared spectral acquisition was performed on air-dried 1µl of serum with a diamond-ATR-FTIR spectrometer. Machine learning models combining Partial Least Squares Discriminant Analysis (PLS-DA) and Support Vector Machine (SVM) were trained to binary classify preprocessed ATR-FTIR spectra as follows: controls vs. OA, controls vs. RA, and OA vs. RA. For a separated test dataset and the validation dataset, the overall model performance was better in classifying OA and RA patients, followed by the RA and controls, and lastly, between OA and controls, with corresponding AUC-ROC values: 0.84, 0.76, 0.72 (test dataset) and 0.94, 0.92, 79 (validation dataset). In conclusion, this study reports robust binary classifier models to differentiate blood serum from the two most common rheumatic diseases, showing the potential of ATR-FTIR as an effective aid in rheumatic disease classification.
1. Introduction
Osteoarthritis (OA) and rheumatoid arthritis (RA) are the two most common forms of arthritis, responsible for disability and chronic pain in adults, causing significant socioeconomic burden worldwide [1,2]. Both diseases are highly heterogeneous, with a complex pathophysiology of an unknown origin. OA is best characterized as a degenerative whole-joint disease most frequently occurring in knees, where various joint tissues undergo structural modifications, leading to failure. [3,4]. RA is a systemic inflammatory autoimmune disease often characterized as symmetrical polyarthritis, affecting small and large synovial joints throughout the body. In addition, some patients develop extra-articular disease manifestations, such as interstitial lung disease, vasculitis, neuropathy, serositis, glomerulonephritis, inflammatory eye involvement, felty syndrome, myopathy, and amyloidosis. [5,6] It is well-recognized that in both OA and RA, the disruption at the molecular level occurs years before the clinical symptoms, making them asymptomatic and thus difficult to diagnose in the initial stages. [5,7] Therefore, the diagnostic process for OA and RA is complex, relying on a combination of clinical exams, serological tests, and imaging, as there is no single diagnostic marker. Current classification criteria (ACR/EULAR [8] and NICE [9]) show varying sensitivity and specificity over time [10,11], highlighting the need for faster, more efficient diagnostic tools for early-stage detection and personalized treatment.
In recent years, Attenuated Total Reflectance Fourier Transform Infrared spectroscopy (ATR-FTIR) studies with blood serum and machine learning (ML) based analysis techniques have been the subject of active research to develop diagnostic methods for various diseases [12–15]. ATR-FTIR is a non-expensive vibrational spectroscopy method that provides a label-free, analytical way to study the biochemical composition of small sample volumes with minimal preparation. In ATR-FTIR spectroscopy, the sample is placed in contact with the ATR reflection element that directs the IR radiation into the sample. Due to the optical properties of the ATR element, the IR light travels through the element as an evanescent wave as a frequency-dependent absorption occurs within the sample if the chemical bond of the molecule is vibrating at the same frequency as the incident IR radiation. [16] The absorbance spectrum represents the convolution of individual spectra initiated by all covalent bonds of infrared-active biomolecules, providing characteristic information of the biochemical composition of the sample, i.e., the molecular “fingerprint.”
This is also evidenced by previous studies demonstrating the potential of ATR-FTIR and FTIR spectroscopy to discriminate rheumatic diseases from human blood serum samples [17–22] and synovial fluid [23]. Although synovial fluid can reflect a more detailed metabolic status of OA and RA and may lead to better diagnostic results, as the early study suggests, the clinical value of blood serum is higher since it is more available and better represents the systematic effects of a disease in individuals, considering the complexity of rheumatic diseases. [24,25] Recent studies in food science [26], pharmacology [27], and cancer research [28–31] have utilized Partial Least Squares Discriminant Analysis (PLS-DA) as a dimension reduction technique in combination with Support Vector Machine (SVM) to classify spectral data.
In this study, we assessed the potential of ATR-FTIR spectroscopy in diagnosing two common rheumatic diseases. We aimed to investigate the classification of OA and RA patients from healthy controls using blood serum with IR spectra using an innovative approach, combining both PLS-DA and SVM algorithms. We determined the robustness and potential of the model in diagnosing arthritic diseases.
2. Material and Methods
2.1 Ethics and Study Design
All the blood serum samples were collected from a local biobank, Biobank Borealis of Northern Finland (Oulu, Finland), governed by the Finnish Biobank Act 688/2012. The biobank approved study approval requests with a detailed study protocol and a sample request (corresponding biobank project number: BB_2021_5009) and did not require a statement from the local ethics committee. Table 1 represents the available inclusion and exclusion criteria for the participants of this study.
2.2 Sample set
The sample set consisted of 334 serum samples from healthy (control group), OA patients, and RA patients. The samples were collected into serum gel sample tubes, following a 30-minute stand time before centrifugation. The centrifugation was performed at room temperature according to the instructions by Nord Lab (https://www.nordlab.fi/) (2500g, 10 min) and stored at −80°C until pipetting. Subsequently, 20µl of serum was sectioned into Eppendorf tubes and then frozen at −80°C until the ATR-FTIR measurements. Table 2 represents the available demographics of the study participants obtained from Biobank Borealis.
2.3 ATR-FTIR measurements
The measurements were performed with an ATR-FTIR spectrometer (Thermo Scientific Nicolet iS5, Thermo Nicolet Corporation, Madison, WI, USA) with an id7-ATR-diamond accessory. For spectral acquisition, the vendor-provided OMNIC™ software was used. Before each measurement, a background spectrum was collected. One microliter of serum was pipetted onto the ATR crystal, and the spectral acquisition was performed within the MID-IR region (400 - 4000 cm-1), averaging 64 repeated scans with a spectral resolution of 4 cm-1. During each measurement, 10 IR spectra were collected while the serum sample was drying (approx. 20 min) to monitor water evaporation. Only the final (tenth) spectrum without the interfering water band (900 – 1800 cm-1 & 3000 – 3700 cm-1) [32] was used for further preprocessing and analysis. To guarantee the reliability of the measurements and to prevent any potential distortions resulting from the device or the surrounding environment, the measurement protocol described above was repeated three times and averaged for each sample. Between repetitions, the ATR crystal was cleaned from serum by erasing it with 70% ethanol, then with Virkon, and finally with 70% ethanol.
2.4 Spectral pre-processing
All spectral analyses were performed using MATLAB® (MathWorks R2023b, 9.13.0.2105380, Natick, Massachusetts). While the fingerprint region (800 – 1800 cm-1) is commonly chosen for its distinct peak characteristics, we also aim to assess if the entire spectral range contributes to the model development, given the intricate nature of rheumatic diseases. Thus, the spectral data was truncated into two separate regions: 800 – 1800 cm-1 and 800 – 3700 cm-1, which were further processed similarly. [33] After the data truncation, the offset was corrected for the spectra, followed by vector normalization to minimize the non-biochemical effects on the data. In addition, the first and second derivatives of the spectra were calculated to examine whether the derivative spectral data affects the classification performance and enhances the differences between groups. The derivatives were then truncated and normalized. The derivatives were calculated using the Savitzky-Golay filter, a built-in function provided by MATLAB®, with a polynomial order of 2 and a window length of 9.
2.5 Binary classifier model
First, a Principal Component Analysis (PCA) was applied to the preprocessed data, as it is a common unsupervised method for dimension reduction before classification analysis. However, PCA could not find discriminative patterns between groups, as shown in Supplementary Figure S2. Furthermore, statistical tests were assessed to find relevant wavenumbers (t-test with permutation); however, no significant wavenumbers were identified. Therefore, a supervised method for dimension reduction for further classification was seen as the most robust approach.
Three separate binary classification models were trained between the groups: 1) control and OA, 2) control and RA, and 3) OA and RA. The well-established chemometric method, PLS-DA, was combined with SVM for classification. PLS-DA is a powerful classification technique for high-dimensional datasets by projecting them into latent variables (LVs) in which Linear Discriminant Analysis (LDA) is then applied [33]. However, in cases where the differences between groups are minimal, the groups may not be linearly separable in latent variable space, making LDA inadequate. Therefore, combining PLS-DA and SVM may provide superior performance compared to PLS-DA alone. The power of SVMs is that they perform better on data with nonlinear characteristics when kernel functions are used. These functions map the data into higher-dimensional space, where a linear separation plane can be found. Here, we used the Gaussian Radial Basis Function: as a kernel.
The training protocol of the PLS-DA-SVM model is summarized in Figure 1. To prevent data leakage and overfitting, the preprocessed data was split into a train set Xtrain constituting 70% of the data and the rest 30% to test set, Xtest. Following the division, a MATLAB build-in PLS algorithm was implemented on the training set to construct the PLS-score matrix ttrain and weight vector wtrain. The PLS-score matrix ttrain, was used as an input in the SVM classifier to train the model.
Three hyperparameters needed to be optimized before the final SVM model training: the number of PLS components for PLS-DA, the Kernel scale σ, and the Penalty coefficient C for SVM. Prior to actual SVM model training, we tried to optimize the number of PLS components by assessing the change in the 10-fold cross-validation (CV) loss of the SVM model as a function of the maximum number of 30 PLS components. However, as illustrated in Supplementary Figure S5, the CV error does not reach a saturation point or reaches zero (indicating overfitting). This makes the PLS component selection based on a cut-off (saturation) point inadequate. Therefore, we trained SVM models with five, ten, fifteen, and twenty PLS components and selected the best one based on the AUC-ROC value of the model. Following the dimension reduction, Bayesian optimization was employed to optimize the kernel scale C and penalty coefficient σ by minimizing (resampled) the 10-fold CV loop.
After the hyperparameter optimization, the SVM model was trained in a new (resampled)10-fold CV loop. After training, the performance of the SVM model was evaluated with ttest determined based on the weight matrix calculated from Xtrain according to the following formula:
The training protocol described above was iterated 100 times. In each iteration, a new random splitting (resampling) in training and testing sets Xtrain, and Xtest, was performed, and the results across iterations were aggregated and averaged. This nested CV is recommended for limited data sets, ensuring an unbiased model development and an honest generalization estimate. [34]
3. Results
3.1 Spectral differences between OA, RA, and healthy patients
Visual examination of the mean spectra reveals minimal biochemical differences between groups, as shown in Figure 2 (all preprocessed spectra are plotted in Figure S1), with assignments of the standard vibrational modes of blood serum. [35] The difference spectrum also reveals minor differences between groups, mainly in locations assigned to Amide I and II peaks and PO2--stretching (Figure 3). Moreover, the first and second mean derivative spectra (supplementary Figure S3-S4) showed no significant differences. Figure 4 represents the data in LV space. The first three LVs do not show linear group separation, thus necessitating a non-linear discrimination method (SVM).
3.2 Classification performance
Overall, the PLS-DA-SVM classifier performed best with normalized spectral data at the wavenumber region of 800 – 3700 cm-1, the results of which are presented here and summarized in Table 3. The supplementary material (Table S1-S10) presents the classifier’s performance from the fingerprint region of the normalized and derivative data (800 – 1800 cm-1).
The PLS-DA-SVM model achieved its best performance when differentiating OA and RA spectra. The best observed AUC-ROC value in the validation set was 0.94, with corresponding accuracy, sensitivity, and specificity of 94%, 89%, and 85%, respectively. The OA vs. RA classification performed best in the validation set with AUC-ROC value, accuracy, sensitivity, and specificity, 0.94, 89%, 90%, and 85%, respectively. The second-best performance was control vs. RA classification (AUC-ROC of 0.92), with accuracy, sensitivity, and specificity of 84%, 96%, and 69%, respectively. However, the classification of OA and control spectra demonstrated a lower AUC-ROC of 0.79, with 73% accuracy, 70% sensitivity, and 79% specificity in the validation set, and 0.72, 67%, 70%, and 63% in the test set. In the test set, OA vs. RA classification maintained the best performance, with an AUC-ROC of 0.84, accuracy of 83%, sensitivity of 78%, and specificity of 75%. The classification between RA and control spectra showed similarly high performance, achieving an AUC-ROC of 0.76, with 71% accuracy, 72% sensitivity, and 71% specificity. However, the OA and control spectra classification demonstrated a lower AUC-ROC of 0.72, with 67% accuracy, 70% sensitivity, and 63% specificity.
In the validation set, the averaged performance across 100 iterations, OA vs. RA classification remained the highest, with an AUC-ROC of 0.91, accuracy of 85%, sensitivity of 87%, and specificity of 83%. The second-best performance occurred in the control vs. RA classification, with an AUC-ROC of 0.87, 81% accuracy, 84% sensitivity, and 77% specificity. The classification of OA and control spectra was the lowest, with an AUC-ROC of 0.70, accuracy of 66%, sensitivity of 67%, and specificity of 68%. In the test set, OA vs. RA classification maintained the best performance, with an AUC-ROC of 0.70, accuracy of 83%, sensitivity of 78%, and specificity of 75%. The second-best performance was control vs. RA classification (AUC-ROC of 0.60), with accuracy, sensitivity, and specificity of 69%, 76%, and 55%, respectively. The OA vs. control classification yielded an AUC-ROC of 0.61, 58% accuracy, 62% sensitivity, and 56% specificity.
4. Discussion
In this study, we evaluated the potential of ATR-FTIR spectroscopy to distinguish well-characterized cohorts of OA, RA patients, and a healthy control group based on their blood serum. We trained PLS-DA-SVM hybrid classifier models with ATR-FTIR spectral data and tested the classification performance of different spectral preprocessing approaches and spectral regions of interest. The PLS-DA-SVM models were trained with a nested cross-validation (CV) approach and achieved over 80% AUC-ROC in distinguishing spectra of the RA group from the OA group.
The difference spectrum between the RA and control groups (control mean spectrum subtracted from RA mean spectrum) reveals a positive peak at 1626 cm-1, consistent with findings from a previous study that identified the same peak, along with two additional peaks at 1628 cm-1 and 1627 cm-1, all of which were strongly negatively correlated with RF [22]. We also observed a prominent positive peak in the difference spectrum at 1629 cm-1 (OA mean spectrum subtracted from RA mean spectrum), which corresponds well to the two other identified peaks. This wavenumber is associated with the β-sheet of IgG3 and IgG2. [36]. Considering the ATR-FTIR’s sensitivity to conformational changes in proteins and the dominant structure of β-sheet in FC-tail for RF binding, this finding may reflect the conformational differences in the antibody binding sites between individuals. It has recently been reported that these individual conformation patterns can explain the presence of natural RF and pathophysiological RFs in autoimmunity [37] and, therefore, play a role in the difference peaks observed. Moreover, the glycosylation of IgG is linked to both OA and RA, which could have contributed to the subtle differences observed between bands in the carbohydrate region (1180-1000 cm⁻¹), as it is reported to be sensitive to protein glycosylation in human plasma [38,39].
Prior studies have performed classification using the whole spectral range (700 – 4000 cm-1) [17] or selecting significant wavenumbers from various ranges (650 – 4000 cm-1 and combinations of 450 – 1700 cm-1 and 1701 – 4000 cm-1) [18,22]. Furthermore, two of these studies utilized the first derivative of IR spectra [17,22]. In our study, the IR spectra among control, OA, and RA groups were similar, with only minor fluctuations in absorbance values. However, our mean spectra (both normalized and first derivative) across groups were similar to the previously reported data [17,18,22]. Moreover, unlike previous ATR-FTIR studies, we did not find statistically significant wavenumbers that could serve as spectral markers or aid in feature reduction [18,22]. Our results suggest that using normalized data from the entire spectral region results in the most generalizable model. Furthermore, derivative spectra did not add value to our model compared to studies by Durlik-Popińska et al. [22] and Lechowicz et al. [17], in which the first derivative had the most value for classification.
The highest model performance between OA and RA groups suggests that the serum biochemical responses differ most between rheumatic diseases. This can also be because OA phenotypes include more biomechanical and structural changes, while RA has metabolic primarily and inflammatory phenotypes. This hypothesis is supported by a previous FTIR study by Wu X et al., which shows the most prominent differences between mean FTIR spectra of ankylosing spondylitis (AS), RA, and OA patients [21]. Moreover, a similar study used deep learning to classify FTIR spectra obtained from AS and RA patients with encouraging results. This also could explain the classification performance between the control RA group since RA is a well-established inflammatory disease with several serological effects. We found the lowest performance between the control and OA groups. This finding may reflect that knee OA’s pathophysiology is mainly due to biomechanical forces [40], and thus, biochemical processes behind knee OA are better reflected in the synovial fluid than in the serum, as systematic effects may play a minor role and are undetectable [41].
We used a nested CV to train the classifier models, which is shown to be the most relatable of true error [34,42,43]. This may yield lower overall performance for the models. A previous study used discriminant analysis as a feature selection technique for all the data before the actual modeling process [18]. This process causes potential data leakage and may lead to overfitting. They also used more than one spectrum from the same individual in modeling. This may lead to an over-optimistic model, primarily when leave-one-patient-out cross-validation (LOPOCV) was not used [18]. Correspondingly, the study by Lechowicz et al.[17] used three spectra replicates from each individual in the KNN model without LOPOCV, uncertainly estimating the model’s generalization ability.
The PLS-DA-SVM method combines aspects of principal component analysis (PCA), linear discriminant analysis (LDA), and canonical correlation analysis (CCA) to reduce the dimensionality of the data [44]. In PLS-DA (or PLS regression), the covariance between predictive and responsible variables is maximized to maximize the difference between sample groups in LV space (PLS-score matrix) [44], in which the response variable information is needed. Therefore, it is essential not to imply this latent LV construction straightaway to the whole data set before training the model due to the evident data leakage and to produce a non-biased, reliable generalization ability estimation of the model. In this study, we acknowledged the challenge of potential over-optimism in model evaluation by applying PLS as a dimension-reduction technique. This approach is the first to handle generalization issues in spectroscopy-based cancer diagnostics effectively [28–31,45,46], alprazolam qualification [27], and the classification of ionic liquids, coffee variants, and olive oils [47]. The latent variable spaces, derived from the equation presented in Section 2.5, demonstrate a more realistic but limited generalization, highlighting a significant step forward for future research.
One limitation of this study is the relatively small sample size and lack of diversity. Additionally, not filtering out abundant serum proteins may have affected the spectra. For instance, Hackshaw et al. utilized a centrifugal membrane filter to eliminate nominal molecular mass blood components in metabolic fingerprinting of Raman spectra of rheumatic disorders [19]. Detecting crucial protein markers can be difficult due to the high levels of proteins in the serum, which might interfere with the spectral signal. Nevertheless, we maintained a consistent measurement setup for all serum samples for comparison and classification. Furthermore, it is crucial to avoid algorithmic bias and acquire sufficient data. However, achieving adequate laboratory data collection requires substantial resources, rendering it a challenging task in the future. The differences in results compared to previous ATR-FTIR studies may also indicate the challenges of ATR-FTIR’s ability to apply to various populations and spectrometers when identifying OA and RA from serum. Our sample set displayed a significant degree of similarity in their biochemical composition, which may have contributed to the consistency of the results. Despite these challenges, the primary strength of our study lies in the use of robust and generalized PLS-DA-SVM classification models.
5. Conclusions
In conclusion, our PLS-DA-SVM classifier models performed best while separating OA from RA serum samples. These results indicate that ATR-FTIR has the potential to detect systematic inflammation-induced differences within the human serum. It may be a rapid and economic aid in classifying rheumatic diseases.
Author contributions: CRediT
Conceptualization: MM, SDG, LR & SS, Data curation: MM, Formal analysis: MM, Investigation: MM, Methodology: MM, Resources: MM & SDG, Supervision: SDG & SS, Writing – original draft: MM & SDG, Writing – review and editing: SDG, LR & SS
Declaration of conflict of interest
All the authors gave their final approval and agreed to be accountable for all aspects of the work. The authors declare that there is no conflict of interest.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Acknowledgments
We want to acknowledge the clinical staff at Biobank Borealis, Oulu, Finland (https://oys.fi/biopankki/) and sample donors for providing the blood serum sample set.
Footnotes
Co-authors: Shuvashis Das Gupta, E-mail: Shuvashis.Dasgupta{at}oulu.fi, Lassi Rieppo, E-mail: lassi.rieppo{at}polar.com, Simo Saarakkala, E-mail: Simo.Saarakkala{at}oulu.fi