ABSTRACT
Infectious diseases can pose significant global threats to public health and economic stability by causing pandemics; SARS-CoV-2 is a recent example. Early detection of infectious diseases is crucial to prevent global outbreaks. Mpox, a contagious viral disease first detected in humans in 1970, has experienced multiple outbreaks in recent decades, which underscores the need for tools for its early detection. In this paper, we develop a hybrid deep learning framework for Mpox detection. This framework allows us to construct hybrid deep learning models that combine deep learning architectures, used as feature extraction tools, with Machine Learning classifiers, and to perform a comprehensive analysis of Mpox detection from image data. Our best-performing model combines MobileNetV2 with a LightGBM classifier and achieves an accuracy of 91.49%, a weighted precision of 91.87%, a weighted recall of 91.49%, a weighted F1-score of 91.51%, and a Matthews Correlation Coefficient score of 0.83.
1. Introduction
Mpox, also known as monkeypox (MPX), is a zoonotic disease caused by the Mpox virus (MPXV); more specifically, it is an infectious skin disease that can spread between animals and humans. MPXV is a 200-250 nm brick-like or ovoid-shaped double-stranded DNA virus that belongs to the Orthopoxvirus genus, Poxviridae family, and Chordopoxvirinae subfamily (Moore et al. (2021)). Along with MPXV, the variola major virus, which causes smallpox, and the variola minor virus (also known as variola alastrim), both from the same genus, also infect the human body (Petersen et al. (2019)). MPXV can be transmitted to humans in several ways, from animal to human and from human to human (Vaughan et al. (2020)). Moreover, it can be transmitted by contact with contaminated objects or with a patient’s droplets, body fluids, and lesions (Vaughan et al. (2020)). After exposure to MPXV, it takes 3-17 days to develop symptoms, which can last nearly 2-5 weeks (Centers for Disease Control and Prevention (2024a)). Symptoms of Mpox include fever, swollen lymph nodes, headache, muscle aches, and backache (Centers for Disease Control and Prevention (2024a)).
The first human case caused by MPXV was reported in 1970 (Ladnyj et al. (1972); Gessain et al. (2022)). In 2022-23, there was a global outbreak of Mpox (Ianache et al. (2024)). Before that, between 1970 and 2017, several outbreaks occurred in central and west Africa (Organization et al. (2022); Breman et al. (1980); Foster et al. (1972)). Clade I, previously known as the Congo Basin Clade or Central African Clade, and Clade II, previously known as West African Clade, were responsible for the outbreaks in Central and West Africa (Eke (1972); Brown and Leggat (2016)). The first outbreak outside Africa occurred in the USA in 2003 (McCollum and Damon (2014); Brown and Leggat (2016)). In 2023, 809 positive cases were reported in different states of the USA, and 1,945 have been reported up to September 09, 2024 (Centers for Disease Control and Prevention (2024b)). In Figure 1, we present a scenario based on data collected from Centers for Disease Control and Prevention (2024b) and Ritchie et al. (2024) respectively for different states in the USA and Worldwide. From these reported cases, it is visible that developing tools for the early detection of Mpox has evolved into a significant research problem.
Figure 1: Mpox reported cases in the USA and worldwide. Panels 1a and 1b illustrate the number of infected cases in 2023 and 2024, respectively, in different states of the USA, and panel 1c shows cumulative infected cases reported globally up to September 2024.
Biomedical images have been a valuable tool in disease diagnosis for an extended period. In general, images have found wide-ranging applications in various fields, including health care, for example, brain tumor detection from MRI images (Bhattacharyya and Kim (2011); Amin et al. (2020); Anantharajan et al. (2024); Arumugam et al. (2024); Khan et al. (2024)), skin cancer detection from histopathological images (Lu and Mandal (2015); Esteva et al. (2017); Kimeswenger et al. (2021); Akilandasowmya et al. (2024); Ali et al. (2021); Strzelecki et al. (2024)); in agriculture, such as plant disease detection, for example, wheat disease detection (Franke and Menz (2007); Lu et al. (2017); Goyal et al. (2021); Verma et al. (2024)); in disaster management such as flood detection by satellite images (Kussul et al. (2008); Vanama et al. (2021); Composto et al. (2024)); in retail and E-commerce (Li and Li (2019); Chen et al. (2021); Liang et al. (2016)). Mathematically, one can model a grayscale image as a function f : ℝ² → [0, 255]. Similarly, a color image can be modeled as f : ℝ² → [0, 255]³. Note that [0, 255] denotes the intensity values of the image.
Recently, biomedical images have become very useful for detecting infectious diseases, such as COVID-19 from chest X-ray images (Sailunaz et al. (2023); Bhattacharya et al. (2021); Jiang et al. (2020); Ebenezer et al. (2022); Ismael and Şengür (2021)). It is important to note that a commonly used diagnostic tool for infectious diseases is the Polymerase Chain Reaction (PCR) test (Fredricks and Relman (1999)). It provides precise genetic analysis. However, PCR carries a risk of contamination and has limitations in detecting large or unknown sequences (Yang and Rothman (2004)). Moreover, it requires specialized equipment and expertise, making it less accessible in some resource-limited settings (Yang and Rothman (2004)).
Deep Learning (DL) techniques have become an alternative for detecting infectious diseases. For a detailed review, one can see Sharma and Guleria (2024), Ibrahim et al. (2024), Bhatele et al. (2024), Chen et al. (2024), Foltz et al. (2024). While PCR depends on laboratory resources and carries a risk of contamination, DL techniques rely mainly on biomedical images. DL is a data-driven subclass of artificial intelligence (Huang et al. (2019)). It can identify patterns in large and complex datasets (Huang et al. (2019)). It involves training neural networks that mimic the neural organization of the human brain (Huang et al. (2019)). Most importantly, once a model is trained, it can be deployed on a simpler system to predict, classify, and generate output (Huang et al. (2019)).
In addition to DL, there are Machine Learning (ML) tools for classification, for example, the Support Vector Machine (SVM) (Suthaharan and Suthaharan (2016b)), Logistic Regression (Musa (2013)), Random Forest (RF) (Breiman (2001); Biau and Scornet (2016); Rashidi et al. (2019)), and Naïve Bayes (NB) (Webb et al. (2010)). Owing to their computational efficiency, these classifiers have found wide-ranging applications in various fields. For a detailed review, see Pereira et al. (2009), Narudin et al. (2016), and Abro et al. (2021).
Furthermore, one can study the dynamics of any infectious disease by developing models based on a system of ordinary or partial differential equations. A simple framework for these models is the SIR (Susceptible, Infectious, Recovered) model, given by (Hattaf and Dutta (2020)):
with initial conditions S(0) = S0, I(0) = I0, R(0) = R0. Here S, I, and R are the numbers of individuals in the Susceptible, Infectious, and Recovered compartments, Λ represents the recruitment number in the Susceptible compartment, a is the disease transmission rate from the Susceptible to the Infectious compartment, and b denotes the recovery rate. Note that t > 0 and the total population is given by:
$$N(t) = S(t) + I(t) + R(t).$$
Using this model framework, one can compute the basic reproduction number ℛ0, which is a threshold that determines whether the disease will spread or die out.
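To make the compartmental idea concrete, the following is a minimal numerical sketch rather than the model of Hattaf and Dutta (2020). It assumes a mass-action form, dS/dt = Λ − aSI, dI/dt = aSI − bI, dR/dt = bI, which uses only the parameters Λ, a, and b named above; the function name `simulate_sir`, the forward-Euler scheme, and all numerical values are hypothetical choices made for illustration.

```python
def simulate_sir(s0, i0, r0, recruitment, a, b, dt=0.01, steps=1000):
    """Forward-Euler integration of an ASSUMED SIR system (illustrative only).

    Assumed form (not reproduced from the source):
        dS/dt = recruitment - a * S * I
        dI/dt = a * S * I - b * I
        dR/dt = b * I
    Returns a list of (t, S, I, R) tuples.
    """
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(0.0, s, i, r)]
    for step in range(1, steps + 1):
        new_infections = a * s * i   # flow from Susceptible to Infectious
        recoveries = b * i           # flow from Infectious to Recovered
        s, i, r = (
            s + dt * (recruitment - new_infections),
            i + dt * (new_infections - recoveries),
            r + dt * recoveries,
        )
        trajectory.append((step * dt, s, i, r))
    return trajectory


# Hypothetical parameter values, chosen only to exercise the code.
trajectory = simulate_sir(s0=990, i0=10, r0=0, recruitment=2.0, a=0.0005, b=0.1)
t_end, s_end, i_end, r_end = trajectory[-1]
```

Under this assumed form the three right-hand sides sum to Λ, so the total population N(t) = S(t) + I(t) + R(t) grows linearly at rate Λ, which gives a simple sanity check on any implementation.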
Owing to its mathematical elegance, this model is widely popular and has found wide-ranging applications in the published literature. For example, Li (2011) proposed an SEIR model, an extended version of (1), to study malaria transmission. This model incorporates the life stages of mosquitoes, reflecting their biological life cycles and the malaria transmission dynamics between humans and mosquitoes. Etbaigha et al. (2018) developed a compartmental model to study influenza A virus (IAV) transmission within a swine farm. The model investigated different stages of infection and immunity within the swine population. Ahmed et al. (2021) proposed a SEIAPH (Susceptible, Exposed, Infectious, Asymptomatic, Pre-symptomatic, Symptomatic hospitalized, and Recovered) model that reflects a critical aspect of SARS-CoV-2 virus transmission concerning the role of asymptomatic carriers. Mahata et al. (2022) proposed an advanced model incorporating fractional-order derivatives and vaccination strategies in the classical SEIR model. Moreover, they introduced optimal control strategies and provided a stability analysis of the model. Furthermore, they provided a comparison between the classical and fractional-order SEIR models.
Recently, this infectious disease model has been used to analyze the dynamics of Mpox and predict the virus’s spread. Liu et al. (2023) proposed a SEIR-type model for continuous monitoring and early intervention in the spread of the Mpox virus. They proposed a robust model and performed sensitivity analyses for different parameters such as the infection rate and the recovery rate. Most importantly, they estimated the basic reproduction number for the 2022 outbreak to investigate how rapidly the virus could spread. Batiha et al. (2023) investigated the dynamics of the Mpox outbreak using a fractional-order SEIR model. They analyzed the effects of vaccination on the transmission and mitigation of the Mpox virus. Betti et al. (2023) developed a model focusing on pair formation that reflects how close contact between individuals contributes to the spread of Mpox. Additionally, they investigated recovery dynamics to observe how quickly infected individuals recover and how this recovery rate affects the overall spread of the infection. Using infectious disease modeling and statistical techniques, Zhang et al. (2024) forecasted the global spread and trajectory of the Mpox virus. They emphasized identifying key factors responsible for the spread of MPXV. By analyzing the current data, they predicted the future scenario of the outbreak. Kaftan et al. (2024) performed a comprehensive analysis of the 2022 Mpox outbreak in New York City. They analyzed and compared different mathematical models, including SEIR, and evaluated their effectiveness in predicting and forecasting outbreak patterns. Molla et al. (2023) have provided a comprehensive review of infectious disease modeling for investigating Mpox transmission.
Ali et al. (2022) created an image dataset to detect and analyze Mpox with DL algorithms. The dataset consists of images with a resolution of 224 × 224 × 3 and allows the implementation of DL models. Ali et al. (2022) applied three pre-trained models, VGG-16, InceptionV3, and ResNet-50, to classify Mpox images. In addition, they combined these DL algorithms through majority voting and proposed an ensemble model. Among these models, ResNet-50 achieved 82.96% accuracy, 87% precision, 83% recall, and 84% F1 score, while the other models, including the ensemble, produced lower scores in every metric. Using the same dataset, Sahin et al. (2022) developed a DL framework tailored for mobile-based applications to diagnose Mpox lesions. They obtained 91.11% accuracy, 90% precision, 90% recall, and 90% F1 score. Sitaula and Shahi (2022) proposed an ensemble-based DL framework focused not only on Mpox detection but also on classifying other skin diseases. Their dataset was curated with samples of different skin diseases. Initially, they implemented 13 different DL techniques, then ensembled the two best-performing algorithms, Xception and DenseNet-169, and achieved 87.13% accuracy, 85.44% precision, 85.47% recall, and 85.40% F1 score. Dwivedi et al. (2022) explored ResNet-50, EfficientNetb3, and EfficientNetb7 on a dataset that contains 160 Mpox lesions, 178 chickenpox, 54 cowpox, 358 smallpox, and 50 healthy skin images. They obtained 87% accuracy, 92% precision, 87% recall, and 90% F1 score with EfficientNetb3.
Uysal (2023) proposed a hybrid DL framework for Mpox detection. Rather than building an ensemble model through majority voting, they combined the two models with the highest accuracy with a Long Short-Term Memory (LSTM) model and obtained 87% accuracy, 84% precision, 87% recall, and 85% F1 score. Nayak et al. (2023) tested GoogleNet, ResNet-18, ResNet-50, ResNet-101, and SqueezeNet for multi-class classification and Mpox detection and obtained 91.19% average accuracy. Explainable Artificial Intelligence (XAI) is crucial in medical research. Nayak et al. (2023) introduced XAI to make the DL model more interpretable. Gupta et al. (2023) developed a blockchain-enabled DL framework for early detection of Mpox lesions. They obtained 98% accuracy using a dataset with 1905 images. Sorayaie Azar et al. (2023) implemented ResNetV2, InceptionV3, ResNet152V2, VGG-16, VGG-19, Xception, and DenseNet-201 on a dataset that consists of 43 Mpox, 47 Chickenpox, 27 Measles, and 54 normal images. They found that DenseNet-201 outperformed the other models and produced 95.18% accuracy. However, they augmented the sample size tenfold before implementing the transfer learning models. Akula and Pushkar (2023) investigated seven different pre-trained transfer learning models and achieved an accuracy of 89.24% with the Xception model.
Recently, Ahsan et al. (2024) proposed modified VGG-16 and ResNet-50 models to classify Mpox. To ensure data privacy, they used a federated learning strategy. Meena et al. (2024) used the VGG19, Xception, and InceptionV3 transfer learning models on an augmented dataset created by Ali et al. (2021), which includes 1428 Mpox and 1764 other skin lesion images. They achieved 98% accuracy using the InceptionV3 model. Yolcu Oztel (2024) explored Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) for Mpox lesion analysis. They proposed an ensemble learning strategy using the bagging ensemble technique and obtained 81.91% accuracy.
In this paper, we predict Mpox from an image dataset.
Our contributions include:
We develop a hybrid Deep Learning (DL) framework combining a pre-trained DL architecture with a Machine Learning (ML) classifier. The pre-trained DL architecture is used for feature extraction from image data. After that, the ML model is trained for Mpox prediction.
We develop several hybrid DL models using our DL framework and perform a comprehensive analysis of Mpox prediction. To evaluate the model, we compute accuracy, precision, weighted precision, recall, weighted recall, F1-score, weighted F1-score, and Matthews Correlation Coefficient (MCC) score.
Finally, we compare our best-performing model with previously published results to highlight its effectiveness in Mpox prediction.
We organize this paper as follows: In section 2, we develop a hybrid DL framework. Next, we present our experimental resources along with a brief discussion about our dataset, train-test splitting procedure, and data augmentation strategy in section 3. In the same section, we also discuss the pre-trained DL architectures that we have used in this paper and different ML tools. After that, in section 4, we discuss our results and compare our best performing model with previously published results. Finally, we conclude our work.
2. Hybrid Deep Learning Framework for Mpox Detection
Consider the dataset $S = \{(\mathcal{X}_i, \mathcal{Y}_i)\}_{i=1}^{n}$, where 𝒳i denotes the ith image, 𝒴i represents its corresponding image label, and n is the number of images. We develop the framework by combining a pre-trained DL architecture and an ML classifier in the following steps:
Step 1 (Preprocessing): In this initial phase, we clean the dataset through various processes, including noise reduction, handling mislabeled images, and removing duplicate images to ensure the dataset is prepared for optimal model performance.
Step 2 (Train-test split): We partition the given dataset S into two distinct subsets, STrain and STest, such that
$$S = S_{\text{Train}} \cup S_{\text{Test}}, \qquad S_{\text{Train}} \cap S_{\text{Test}} = \emptyset,$$
where STrain denotes the training set containing 80% of the samples and STest denotes the test set containing 20% of the samples from the dataset S. We perform data augmentation on STrain.
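Step 3 (Feature extraction): In this step, using a pre-trained DL architecture ℱ, we extract features from each image as follows:
$$F_j^{\text{Train}} = \mathcal{F}\left(\mathcal{X}_j^{\text{Train}}\right), \qquad F_k^{\text{Test}} = \mathcal{F}\left(\mathcal{X}_k^{\text{Test}}\right),$$
where $F_j^{\text{Train}}$ and $F_k^{\text{Test}}$ represent the features extracted from $\mathcal{X}_j^{\text{Train}} \in S_{\text{Train}}$ and $\mathcal{X}_k^{\text{Test}} \in S_{\text{Test}}$, respectively.
Note that each extracted feature is a tensor of dimension l × p × q, where l and p are the height and width of the feature map and q denotes the number of feature channels. In order to train an ML classifier, we have reshaped each of these features into a d-dimensional vector, where d = l × p × q. The reshaping procedure is performed as follows:
$$z_j^{\text{Train}} = \operatorname{reshape}\left(F_j^{\text{Train}}\right) \in \mathbb{R}^d, \qquad z_k^{\text{Test}} = \operatorname{reshape}\left(F_k^{\text{Test}}\right) \in \mathbb{R}^d,$$
where j = 1, 2, …, N, and k = 1, 2, …, M.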
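Step 4 (Train classification model): Using the reshaped training feature vectors $z_j^{\text{Train}}$ and their labels, we train a classification model 𝒢. After that, we test the model using the reshaped test feature vectors $z_k^{\text{Test}}$ and predict the labels $\hat{\mathcal{Y}}_k$ as follows:
$$\hat{\mathcal{Y}}_k = \mathcal{G}\left(z_k^{\text{Test}}\right), \qquad k = 1, 2, \ldots, M.$$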
Step 5 (Model Evaluation): Finally, we evaluate the hybrid model using standard metrics such as accuracy, precision, weighted precision, recall, weighted recall, F1 score, weighted F1 score, and Matthews Correlation Coefficient (MCC) score. We discuss these metrics in subsection 3.4.
In Figure 2, we illustrate this framework in detail.
Figure 2: A hybrid Deep Learning framework for Mpox classification, combining a pre-trained Deep Learning architecture and a Machine Learning classifier.
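The splitting and reshaping steps of the framework can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this paper: the helper names `train_test_split` and `flatten_feature_map`, the fixed random seed, and the toy data are assumptions introduced here, and the pre-trained extractor ℱ is replaced by a hand-made feature map.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and split them into disjoint train and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def flatten_feature_map(feature_map):
    """Reshape an l x p x q nested list into a vector of length d = l * p * q."""
    return [value for row in feature_map for cell in row for value in cell]


# Toy dataset: 10 (image_id, label) pairs standing in for (X_i, Y_i).
dataset = [(f"img_{i}", i % 2) for i in range(10)]
train, test = train_test_split(dataset)

# A hand-made 2 x 3 x 4 feature map; flattening gives a vector with d = 24.
feature_map = [
    [[float(a * 12 + b * 4 + c) for c in range(4)] for b in range(3)]
    for a in range(2)
]
z = flatten_feature_map(feature_map)
```

Splitting before augmentation, as in Step 2, keeps augmented copies of an image out of the test set, so the classifier is always evaluated on images it has not seen in any form.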
3. Resources for Constructing Hybrid Models
In this section, we discuss our dataset, data augmentation strategy, DL architectures, and ML classification models, which we have used to construct hybrid models.
3.1. Data Description
In this paper, we have used a publicly available dataset for Mpox detection created by Ali et al. (2021). The dataset contains images with a resolution of 224 × 224 × 3. It is divided into two categories, Mpox and non-Mpox, where the latter includes images of other diseases, namely chickenpox and measles. In Table 1, we present a summary of the dataset, including the number of images in each category.
3.1.1. Train-Test Split and Data Augmentation
We split the dataset into two subsets: the Train set, which contains 80% of the samples from the original dataset, and the Test set, which contains the remaining 20%. We present a detailed summary in Table 2.
We have performed data augmentation on the training set. Since the dataset is imbalanced, as one can observe from Table 1 and Table 2, we have balanced the number of Mpox and Others images using standard augmentation strategies. In Table 3, we present these in detail.
3.2. Pre-trained DL Architectures
We have used several pre-trained DL architectures for feature extraction, including DenseNet, MobileNet, Inception, Inception-Residual Network, and Xception.
We discuss them briefly in the following subsections.
3.2.1. DenseNet
All the layers of this architecture are connected in a feedforward manner, where the features from all preceding layers are concatenated and passed to the subsequent layers (Zhu and Newsam (2017); Dhillon and Verma (2020)). This design allows features to be reused, improves gradient flow, and mitigates the vanishing gradient problem. In addition, it has transition layers that consist of batch normalization, the ReLU activation function, a 1 × 1 convolution to reduce dimensions, and 2 × 2 average pooling, which control the growth of the number of features and consequently reduce the computational complexity (Zhu and Newsam (2017); Dhillon and Verma (2020)). In this paper, we have implemented three different versions of DenseNet: DenseNet-121, DenseNet-169, and DenseNet-201 (see Huang et al. (2017)). These versions contain 121, 169, and 201 layers, respectively (Huang et al. (2017)).
3.2.2. MobileNet
MobileNet is a lightweight CNN architecture that is computationally more efficient than conventional CNNs and tailored for deployment on devices with limited computational capacity, such as smartphones (Howard et al. (2017)). Unlike conventional CNNs, which use regular convolution operations where a filter processes all the input channels (e.g., red, green, blue for a color image) together, MobileNet uses a depth-wise separable convolution (DPSC) operation (Nelson and Gailly (1995)). DPSC comprises two steps: Depthwise convolution (DC) and Pointwise convolution (PC) (Nelson and Gailly (1995)). In DC, a filter is applied to each input channel individually, which reduces the number of parameters and makes the computation faster than the regular convolution operation. Then, PC combines the outputs from DC across channels using 1 × 1 convolutions (Nelson and Gailly (1995)). In this paper, we employed MobileNetV1 (Howard et al. (2017)) and MobileNetV2 (Sandler et al. (2018)) for feature extraction from images.
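The parameter saving from a depthwise separable convolution can be checked with a short calculation. The sketch below counts weights only (biases are ignored), and the layer sizes are hypothetical values chosen for illustration rather than the configuration of MobileNetV1 or MobileNetV2.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a regular convolution: every filter spans all input channels."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel * kernel * c_in   # one kernel x kernel filter per input channel
    pointwise = c_in * c_out             # 1 x 1 filters that mix the channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernels, 32 input channels, 64 output channels.
regular = standard_conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)
reduction = regular / separable
```

For this hypothetical layer the separable version needs 2,336 weights instead of 18,432, roughly an eightfold reduction.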
3.2.3. Inception
This DL architecture uses an inception module, which allows Inception to learn both global and local patterns and thus to learn features comprehensively (Kolla et al. (2023)). More specifically, the inception module applies convolutional filters of different sizes (1 × 1, 3 × 3, and 5 × 5) in parallel, which allows this architecture to learn features simultaneously at multiple scales (Alzubaidi et al. (2021)). In addition, a 1 × 1 convolution reduces the dimensionality before the larger filters are applied, which improves the architecture’s computational efficiency (Alzubaidi et al. (2021)). It was first introduced in 2014 by Google (Szegedy et al. (2015)). The Inception-V3 architecture was introduced in 2016. It has 159 layers, and for feature extraction, it uses a standard convolutional block (Kassani et al. (2019)).
3.2.4. Inception-Residual Network
It combines the architectures of Inception and Residual networks, incorporating residual connections within the inception block structure (Szegedy et al. (2017)). Residual connections mitigate the vanishing gradient issue and enhance the efficiency of this architecture in classification tasks, particularly for image data (Szegedy et al. (2017)). In this paper, we have used the Inception-ResNet-V2 architecture for feature extraction.
3.2.5. Xception
The architecture of Xception is an improvement over Inception (Chollet (2017)). One can consider this a bundle of depthwise separable convolution (DSC) with residual connection (RC), where DSC reduces the computational cost as well as improves memory usage by separating the learning of channel-wise and space-wise features, and RC solves the vanishing gradient issue (Alzubaidi et al. (2021)). It has 36 convolutional layers structured into 14 modules (Kassani et al. (2019)).
3.3. Machine Learning Classifiers
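We have used the following feature set
$$\{(z_j, \mathcal{Y}_j)\}_{j=1}^{N},$$
where $z_j \in \mathbb{R}^d$ is the reshaped feature vector of the j-th training image and $\mathcal{Y}_j$ is its label, to train several ML classifiers. In the following subsections, we provide an overview of how these classifiers generally function.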
3.3.1. Support Vector Machine
A crucial step in training a support vector machine (SVM) is to find a hyperplane, given by:
$$w^{T} z + b = 0, \tag{8}$$
so that the margin between the nearest data points of each class and the hyperplane is maximized. Here, w ∈ ℝd denotes the weight, and b ∈ ℝ denotes the bias in (8). It is important to note that, to train an SVM, the labels must be mapped to either -1 and 1 or 0 and 1 (Stitson et al. (1996); Suthaharan and Suthaharan (2016b)). For linearly separable data, to obtain the final classifier z ↦ sign(wT z + b), one has to solve the following optimization problem (Kecman (2005)):
where j ∈ {1, 2, …, N}. Note that sign(wT z + b) = 1 if wT z + b > 0, and 0 otherwise.
In this paper, to train SVM, we have mapped the labels associated with each feature into 0 and 1.
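For reference, the textbook hard-margin problem for linearly separable data is usually stated with labels in {-1, 1}. The form below is a sketch under that assumption and is not reproduced from this paper; the relabeling $\tilde{y}_j = 2\mathcal{Y}_j - 1$ is introduced here only to connect that convention to the 0/1 encoding used above.

```latex
\min_{w,\, b} \; \frac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
\tilde{y}_j \left( w^{T} z_j + b \right) \ge 1,
\qquad \tilde{y}_j = 2\mathcal{Y}_j - 1 \in \{-1, 1\},
\qquad j \in \{1, 2, \ldots, N\}.
```

Minimizing the norm of w is equivalent to maximizing the margin 2/‖w‖ between the two classes, which is why the constraint fixes the closest points of each class at distance 1/‖w‖ from the hyperplane.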
3.3.2. Random Forest
It performs the classification task by aggregating the predictions made by multiple decision trees through a majority voting strategy (Breiman (2001)). Suppose there are ℓ decision trees T1, T2, …, Tℓ in the forest. These trees are also known as “base learners” (Cutler et al. (2012)). Initially, it trains each decision tree on a subset drawn from the training feature space using bootstrap sampling with replacement. Each of these trees makes a prediction for a given input z, and the final prediction is then made as (Cutler et al. (2012)):
$$\hat{\mathcal{Y}}(z) = \arg\max_{c \in C} \sum_{i=1}^{\ell} \mathbb{I}\left(\hat{T}_i(z) = c\right),$$
where 𝕀 denotes the indicator function, $\hat{T}_i(z)$ denotes the prediction of the i-th fitted tree at z, and C is the set of all possible values of 𝒴.
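The majority voting step can be illustrated with a short sketch. The three "trees" below are hypothetical hand-written threshold rules standing in for fitted decision trees; they are not trained models, and `forest_predict` is a name introduced here for illustration.

```python
from collections import Counter


def forest_predict(trees, z):
    """Return the label predicted by the largest number of trees for input z."""
    votes = [tree(z) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical base learners: each maps a 2-dimensional feature vector to 0 or 1.
trees = [
    lambda z: int(z[0] > 0.5),
    lambda z: int(z[1] > 0.5),
    lambda z: int(z[0] + z[1] > 1.0),
]

label_a = forest_predict(trees, (0.9, 0.2))  # votes: 1, 0, 1
label_b = forest_predict(trees, (0.1, 0.6))  # votes: 0, 1, 0
```

Using an odd number of trees avoids ties in the binary case; with an even number, a tie-breaking rule has to be chosen explicitly.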
3.3.3. Logistic Regression
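For any probability of success p, expressed in terms of odds as (Kleinbaum et al. (2002); Nick and Campbell (2007); LaValley (2008)):
$$\text{odds} = \frac{p}{1 - p},$$
the Logistic regression (LR) is defined as follows:
$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \cdots + \beta_n z_n.$$
Here β = {β1, β2, …, βn} is the coefficient vector of z and β0 is a constant.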
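The relationship between the linear score and the predicted probability can be checked numerically. The sketch below assumes the standard logit link, in which the log-odds equal β0 plus the weighted sum of the features; the coefficient values are hypothetical and are not fitted to any data.

```python
import math


def logistic_probability(z, beta, beta0):
    """Probability of the positive class for feature vector z under the logit link."""
    score = beta0 + sum(b * x for b, x in zip(beta, z))
    return 1.0 / (1.0 + math.exp(-score))


def log_odds(p):
    """Logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients and features: score = 0.1 + 0.5 * 1.0 - 0.25 * 2.0 = 0.1
p = logistic_probability([1.0, 2.0], beta=[0.5, -0.25], beta0=0.1)
recovered_score = log_odds(p)
```

Taking the log-odds of the predicted probability recovers the linear score, which is why each coefficient can be read as the change in log-odds per unit change in the corresponding feature.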
3.3.4. Decision Tree
In general, a Decision tree (DT) is trained by growing a binary decision tree on the training features. It starts from a root node and ends with the decision outputs, known as the leaf nodes (De Ville (2013); Song and Ying (2015); Suthaharan and Suthaharan (2016a)).
3.3.5. K-Nearest Neighbors
The K-Nearest Neighbor (K-NN) classifier predicts the label of a target point z∗ based on the patterns that are nearest to z∗. To identify these patterns, the classifier computes the similarity between the target point z∗ and the input features using distance metrics such as the Euclidean distance or the Minkowski distance (Peterson (2009); Kramer (2013)). For binary classification, K-NN is defined as (Kramer (2013)):
where K denotes the neighborhood size, and NK (z∗) denotes the set of indices of the K nearest neighbors. In this paper, we have considered K = 5.
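A minimal sketch of K-NN classification with the Euclidean distance and K = 5 is given below. It uses a plain majority vote among the K nearest training points, which is one common formulation and an assumption here, not necessarily the exact rule used in this paper; the function name `knn_predict` and the two-dimensional toy features are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, target, k=5):
    """Predict the label of target by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(z, target), label)
        for z, label in zip(train_features, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy two-dimensional features: three points of class 0 and four of class 1.
features = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = [0, 0, 0, 1, 1, 1, 1]

near_class_one = knn_predict(features, labels, (5.5, 5.5))
near_class_zero = knn_predict(features, labels, (0.2, 0.2))
```

Because every prediction requires distances to all training points, K-NN has no training cost but its prediction cost grows with both the number of training samples and the feature dimension d.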
3.3.6. Naïve Bayes Classifier
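It assumes that the features are conditionally independent given any target label (Szostak (2012)). In other words, it considers the contribution of each feature independently. For binary classification, it predicts a label as follows (Szostak (2012); Permission (2005)):
$$\hat{\mathcal{Y}} = \arg\max_{k \in \{0, 1\}} \; \mathbb{P}(\mathcal{Y} = k) \prod_{j=1}^{d} \mathbb{P}(z_j \mid \mathcal{Y} = k),$$
where $z_j$ denotes the j-th component of the feature vector z.
In this paper, for each zj, we have considered that the likelihood function ℙ(zj |𝒴 =k) is normally distributed with mean $\mu_{jk}$ and variance $\sigma_{jk}^{2}$.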
3.3.7. Adaptive Boosting
Adaptive boosting (AdaBoost) combines weak classifiers T1, T2, …, Tm and creates a strong classifier as follows:
$$H(z) = \operatorname{sign}\left(\sum_{i=1}^{m} \alpha_i T_i(z)\right),$$
where m denotes the total number of classifiers, and $\alpha_i$ is the weight assigned to the weak classifier $T_i$ (see Freund and Schapire (1997); Singh et al. (2023)).
3.3.8. Extreme Gradient Boosting (XGBoost)
It is an improvement over Gradient boosting (GB), which forms a prediction model by ensembling multiple decision trees, where each successive tree depends upon the previous tree and tries to fix the errors made by the preceding ones (Chen (2015); Chen and Guestrin (2016)). One of the important features of XGBoost is that it helps prevent overfitting by regularizing the loss function as follows:
$$\mathcal{L}_{\text{reg}} = \sum_{i} \mathcal{L}\left(\mathcal{Y}_i, \hat{\mathcal{Y}}_i\right) + \sum_{k} \mathcal{R}(f_k),$$
where $\hat{\mathcal{Y}}_i$ denotes the predicted observations, ℒ denotes the loss function, $f_k$ denotes the k-th tree in the ensemble, and ℛ is the regularization term, given by:
$$\mathcal{R}(f) = \gamma t + \frac{1}{2} \lambda \lVert w \rVert^{2}.$$
Here λ denotes the regularization parameter that scales the penalty, γ is the minimum loss reduction needed to further partition a leaf node, w denotes the weights of the leaves, and t is the number of leaves in the tree.
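The effect of the two penalty parameters can be seen by evaluating the regularization term for a single tree. The sketch below assumes the standard XGBoost form, γt + ½λ‖w‖², where t is the number of leaves and w the vector of leaf weights; the function name `tree_regularization` and the numerical values are hypothetical.

```python
def tree_regularization(leaf_weights, gamma, lam):
    """Penalty for one tree: gamma per leaf plus an L2 penalty on the leaf weights."""
    t = len(leaf_weights)
    return gamma * t + 0.5 * lam * sum(w * w for w in leaf_weights)


# Hypothetical tree with three leaves.
penalty = tree_regularization([0.5, -0.5, 1.0], gamma=1.0, lam=2.0)

# A deeper tree with the same weights repeated pays more through the gamma term.
penalty_larger_tree = tree_regularization([0.5, -0.5, 1.0] * 2, gamma=1.0, lam=2.0)
```

Increasing γ discourages additional leaves, while increasing λ shrinks the leaf weights toward zero; both make individual trees more conservative.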
3.3.9. Light Gradient Boosting (LightGBM)
It was built on top of the Gradient boosting framework for classification and other machine-learning tasks. It was developed by Microsoft (Ke et al. (2017)). Its architecture includes histogram-based learning, which reduces memory usage and accelerates computation; leaf-wise tree growth; Gradient-based one-sided sampling (GOSS); exclusive feature bundling, which bundles mutually exclusive features that rarely take nonzero values simultaneously; parallel and GPU learning; and regularization, such as L1 or L2 regularization, to prevent overfitting (Ke et al. (2017)).
In this paper, we utilized packages and modules from the Scikit-learn library in Python to implement the classifiers (Pedregosa et al. (2011)).
3.4. Evaluation Metrics
In this section, we present evaluation metrics to evaluate hybrid DL models. Our goal is to classify Mpox and other conditions from image data. To formalize the evaluation process, we construct the following hypotheses:
H0 : The observation belongs to the “Others” class
HA : The observation belongs to the “Mpox” class
where H0 denotes the null hypothesis and HA is the alternative hypothesis. Based on the predictions made by a hybrid DL model, the possible outcomes of this hypothesis test are:
i) True positive (TP): the observation is correctly predicted as Mpox. In other words, H0 is rejected when it is false.
ii) True negative (TN): the observation is correctly classified as Others. In other words, H0 is accepted when it is true.
iii) False positive (FP): the observation is incorrectly classified as Mpox when it is actually Others, which means that H0 is rejected when it is true.
iv) False negative (FN): the observation is incorrectly classified as Others when it is actually Mpox. In other words, H0 is accepted when it is false.
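These outcomes can be represented in a matrix form, commonly referred to as the confusion matrix (CM), as follows, with rows indexed by the actual class and columns by the predicted class (each ordered as Others, Mpox):
$$\text{CM} = \begin{pmatrix} \text{TN} & \text{FP} \\ \text{FN} & \text{TP} \end{pmatrix}$$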
We have computed accuracy, precision, recall, F1 score, weighted precision, weighted recall, weighted F1 score, and Matthews Correlation Coefficient (MCC) using these outcomes. In Table 4, we present the evaluation metrics in detail.
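The metrics can be computed directly from the four confusion-matrix counts. The sketch below uses the standard definitions, with weighted scores taken as support-weighted averages of the per-class scores; the function name `binary_metrics` is introduced here for illustration. The example counts (TP = 19, TN = 22, FP = 4, FN = 2) are not read from a confusion matrix in this paper: they are inferred from the results reported for D169LR (19 of 21 Mpox test images detected, 82.61% precision, 87.23% accuracy) and should be treated as an assumption.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion-matrix counts, with Mpox as the positive class.

    Assumes every denominator is nonzero (no empty class or empty prediction column).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total

    # Scores for the positive class (Mpox).
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Scores with the negative class (Others) treated as the class of interest.
    precision_neg = tn / (tn + fn)
    recall_neg = tn / (tn + fp)
    f1_neg = 2 * precision_neg * recall_neg / (precision_neg + recall_neg)

    # Weighted scores average the per-class scores by class support.
    support_pos, support_neg = tp + fn, tn + fp

    def weighted(pos_score, neg_score):
        return (support_pos * pos_score + support_neg * neg_score) / total

    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "weighted_precision": weighted(precision, precision_neg),
        "weighted_recall": weighted(recall, recall_neg),
        "weighted_f1": weighted(f1, f1_neg),
        "mcc": mcc,
    }


# Counts inferred (an assumption) from the results reported for D169LR.
metrics = binary_metrics(tp=19, tn=22, fp=4, fn=2)
```

With these counts the sketch gives an accuracy of about 87.23%, a precision of about 82.61%, a recall of about 90.48%, and an MCC of about 0.75, in line with the values reported for D169LR. It also shows that the weighted recall always equals the accuracy in the binary case.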
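In addition to these evaluation metrics, we have also computed the area under the receiver operating characteristic (ROC) curve. The ROC curve plots the sensitivity (true positive rate, TPR) against the false positive rate (FPR). FPR is given by:
$$\text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}}.$$
For n points on the ROC curve, ordered by increasing FPR, the area under the curve (AUC) can be computed using the Trapezoidal rule approximation as follows:
$$\text{AUC} \approx \sum_{i=1}^{n-1} \frac{\left(\text{FPR}_{i+1} - \text{FPR}_{i}\right)\left(\text{TPR}_{i} + \text{TPR}_{i+1}\right)}{2}.$$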
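The trapezoidal approximation of the AUC is straightforward to implement. The sketch below assumes the ROC points are already sorted by increasing FPR; the function name `trapezoidal_auc` and the example points are hypothetical and are not taken from the ROC curves of this paper.

```python
def trapezoidal_auc(fpr, tpr):
    """Area under a ROC curve whose points are sorted by increasing FPR."""
    if len(fpr) != len(tpr) or len(fpr) < 2:
        raise ValueError("fpr and tpr must have the same length, at least 2")
    area = 0.0
    for i in range(len(fpr) - 1):
        width = fpr[i + 1] - fpr[i]
        mean_height = (tpr[i] + tpr[i + 1]) / 2.0
        area += width * mean_height
    return area


# Hypothetical ROC points, including the end points (0, 0) and (1, 1).
auc_example = trapezoidal_auc([0.0, 0.5, 1.0], [0.0, 0.8, 1.0])
auc_perfect = trapezoidal_auc([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])
auc_chance = trapezoidal_auc([0.0, 1.0], [0.0, 1.0])
```

A perfect classifier gives an AUC of 1 and a chance-level classifier gives 0.5, which provides two quick checks for any AUC implementation.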
4. Results and Discussion
The objective of this research is to develop a hybrid DL framework that integrates pre-trained DL architectures with ML classifiers to predict Mpox from image data. Lack of image data and computational complexity are two stumbling blocks for Mpox prediction. This framework enables the construction of hybrid DL models capable of identifying Mpox using a small, publicly available dataset while optimizing memory usage and computational time. We have comprehensively analyzed this framework using several hybrid DL models (see Table 5, Table 6, and Table 7).
The hybrid models listed in Table 5 combine different DenseNet architectures (DenseNet-201, DenseNet-169, and DenseNet-121) with ML classifiers. Notably, the hybrid model consisting of DenseNet-201 and LR (D201LR) obtained an accuracy of 85.11%, outperforming the other models that incorporated DenseNet-201 as a feature extraction tool. This model identified 18 true positive Mpox images out of the 21 Mpox cases in the test dataset (see Figure 3a). On the other hand, DenseNet-201 combined with K-NN (D201KNN) achieved the highest precision (90%) among all hybrid models incorporating DenseNet-201, highlighting its ability to minimize false positive cases.
Figure 3: Confusion matrices and ROC curves for the selected hybrid models, incorporating DenseNet-201, DenseNet-169, and DenseNet-121 deep learning architectures as feature extraction tools from the image data. These models are selected based on their MCC score from each category listed in Table 5. Figures 3a, 3b, and 3c are the confusion matrices for D201LR, D169LR, and D121LightGBM, respectively, and 3d, 3e, and 3f illustrate their corresponding ROC curves.
The highest overall accuracy, 87.23%, was achieved by the model consisting of DenseNet-169 with LR (D169LR). While it outperforms the other hybrid models in accuracy, its precision (82.61%) is considerably lower than that of D201KNN, which shows that D169LR had a somewhat higher rate of false positive predictions. However, D169LR identified 19 true positive Mpox images (Figure 3b), which is slightly higher than D201KNN.
DenseNet-169 with AdaBoost (D169Adaboost) is another model that attained the same accuracy as D169LR (87.23%). Its other evaluation metrics, such as weighted precision (87.39%), weighted recall (87.23%), F1-score (85%), and weighted F1-score (87.15%), are comparable to those of D169LR, although its precision is higher than that of D169LR, which indicates its strength when minimizing false positives is essential.
DenseNet-169 with SVM (D169SVM) is another noteworthy hybrid model because of its recall of 90.48%, the highest observed and the same as that of D169LR. The hybrid models DenseNet-121 with LR (D121LR) and DenseNet-121 with LightGBM (D121LightGBM) performed equally in every metric for Mpox prediction (see Table 5).
However, D169LR outperformed the other models in MCC score (Table 5). Note that the MCC score provides valuable information about model performance, most importantly when the dataset is imbalanced. D169LR achieved a score of 0.75, which highlights its strength in Mpox prediction. While D169Adaboost achieved a very close score (0.74), the scores of the other competing models are considerably lower. From the hybrid models listed in Table 5, we selected the three best-performing models based on their MCC scores.
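To make the role of the MCC on imbalanced data concrete, the following Python sketch computes it from the four confusion-matrix counts using the standard definition. The D169LR counts (19, 2, 4, 22) and the 21/26 class split of the test set are inferred from the reported percentages and are an assumption; the function name `mcc` is illustrative only.

```python
import math


def mcc(tp, fn, fp, tn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Assumed counts, inferred from the figures reported for D169LR.
print(round(mcc(tp=19, fn=2, fp=4, tn=22), 2))  # 0.75

# A hypothetical classifier that labels every image as "other":
# accuracy would be 26/47 (about 55%), yet the MCC is 0.
print(mcc(tp=0, fn=21, fp=0, tn=26))  # 0.0
```

The second call shows why accuracy alone can mislead: a model that never predicts Mpox still scores above 50% accuracy on such a split, while its MCC is zero.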
In Figure 3, we present the confusion matrices and Receiver Operating Characteristic (ROC) curves of these models. Since D121SVM and D121LightGBM performed equally in each evaluation metric, we considered only one of these two models in Figure 3. The confusion matrices show how accurately each model separates positive and negative cases. Along with these matrices, we examined the ROC curves to assess model performance. The Area Under the Curve (AUC) of D169LR reflects its efficiency in Mpox identification.
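As a reminder of what the AUC measures, the sketch below computes it in Python as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count as one half). This pairwise formulation is equivalent to the area under the ROC curve. The labels and scores are made-up toy values, not outputs of any model in this paper, and the function name `roc_auc` is illustrative only.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # A correctly ordered pair counts 1, a tie counts 0.5, otherwise 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = Mpox, 0 = other; scores are hypothetical probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))  # 0.889
```

An AUC of 1.0 corresponds to a perfect ranking of Mpox above other images, and 0.5 to a ranking no better than chance.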
In Table 6, we report further hybrid models built using the InceptionV3, InceptionResNetV2, and Xception architectures as feature extraction tools. Of all the models based on InceptionV3, InceptionV3 with AdaBoost (IV3AdaBoost) obtained the highest accuracy (87.23%). Although IV3XGBoost (InceptionV3 with XGBoost) outperformed IV3AdaBoost in sensitivity (95.24%), its MCC score (0.72) and other metric scores are lower than those of IV3AdaBoost. IV3AdaBoost and D169LR performed equally in classifying Mpox and other images, since both models scored equal values in every evaluation metric.
InceptionResNetV2 with LR (IRV2LR) performed better in every metric than the other models based on the InceptionResNetV2 architecture. Even though IRV2LR and D201LR have equal accuracy (85.11%) and MCC score (0.70), their precision, weighted precision, recall, weighted recall, F1 score, and weighted F1 score differ. However, the AUC of D201LR is slightly higher than that of IRV2LR.
Among hybrid models integrating Xception as a feature extraction tool, XcepSVM (Xception with SVM) and XcepLR (Xception with LR) performed better than the others. However, their accuracy (78.72%) is considerably lower than that of D169LR and IRV2LR.
XcepSVM and XcepLR each identified 13 images as Mpox (see Figure 4c). Since the two models performed equally, we include a confusion matrix and ROC curve only for XcepSVM (Figures 4c and 4f).
Performance evaluation of the hybrid models IV3AdaBoost (4a, 4d), IRV2LR (4b, 4e), and XcepSVM (4c, 4f) based on confusion matrices and ROC curves. These models are selected based on their MCC score from the list of models generated incorporating InceptionV3, InceptionResNetV2, and Xception as feature extraction tools (see Table 6).
Table 7 lists further hybrid models developed by combining MobileNet architectures (MobileNetV1 and MobileNetV2) with ML classifiers. MV1LR (MobileNetV1 with LR) obtained 80.85% accuracy, which indicates its efficiency in Mpox prediction. In addition, its other evaluation metrics, such as precision (83.33%), are also higher than those of the other models based on MobileNetV1. Although its sensitivity or recall (71.43%) is lower than that of MV1LightGBM (MobileNetV1 with LightGBM), which is 76.19%, its MCC score of 0.61 is the highest among these models. It correctly identified 15 Mpox images and 23 other images (see Figure 5a).
Performance evaluation of hybrid models constructed using MobileNetV1 and MobileNetV2 architectures. Figures 5a and 5c are the confusion matrix and ROC curve for the MV1LR model, respectively, while Figures 5b and 5d show the confusion matrix and ROC curve for the MV2LightGBM model.
MV2LightGBM obtained an accuracy of 91.49%, which is the highest among the models based on the MobileNetV2 architecture. Although its precision score (86.96%) is slightly lower than that of MV2AdaBoost (MobileNetV2 with AdaBoost) (88.89%), it outperformed the other models in MCC score. Moreover, MV2LightGBM obtained the highest accuracy of all the hybrid models listed in Tables 5, 6, and 7.
The 86.96% precision score of MV2LightGBM, which is higher than that of MV1LR, indicates that most of the images it labels as Mpox are true positive cases. Furthermore, its weighted precision of 91.87% shows how well suited MV2LightGBM is to binary classification overall, especially for an imbalanced dataset. In addition, its 95.24% recall and 91.49% weighted recall highlight the robustness of this model in detecting true positive cases out of all actual positive cases. More specifically, this model correctly identified 20 Mpox images out of 21 (Figure 5b), which is the highest number of correct Mpox predictions among all models.
Next, the 90.91% F1 score and the 91.51% weighted F1 score of MV2LightGBM highlight the strength of this model in minimizing false positive and false negative cases. In other words, the harmonic mean of its precision and recall for the Mpox class is 90.91%, which means that it keeps both false positive and false negative cases low. Finally, the MCC score of this model (0.83) establishes it as an efficient model for the prediction of Mpox.
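The weighted scores quoted above can be reproduced from a single confusion matrix, as in the following Python sketch. It computes precision, recall, and F1 for each class and averages them with weights equal to each class's number of test images. The MV2LightGBM counts (20 true positives, 1 false negative, 3 false positives, 23 true negatives) are inferred from the reported 20 of 21 Mpox images, 86.96% precision, and 91.49% accuracy, and are therefore an assumption; the function names are illustrative only.

```python
def per_class(tp, fp, fn):
    """Precision, recall, and F1 for one class treated as positive."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def weighted_metrics(tp, fn, fp, tn):
    """Support-weighted precision, recall, and F1 over both classes."""
    mpox = per_class(tp=tp, fp=fp, fn=fn)
    # For the "other" class, the roles of the error types swap.
    other = per_class(tp=tn, fp=fn, fn=fp)
    n_mpox, n_other = tp + fn, tn + fp
    total = n_mpox + n_other
    return tuple((n_mpox * m + n_other * o) / total
                 for m, o in zip(mpox, other))


# Assumed counts, inferred from the figures reported for MV2LightGBM.
weighted = weighted_metrics(tp=20, fn=1, fp=3, tn=23)
print([round(100 * w, 2) for w in weighted])  # [91.87, 91.49, 91.51]
```

Note that with support weighting, the weighted recall coincides with the overall accuracy, which is consistent with the two values of 91.49% reported for this model.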
4.1. Comparison with Existing Methods
Through our analysis, MV2LightGBM emerged as the best-performing hybrid model, achieving the highest accuracy of all the evaluated hybrid models. This section compares this model with models previously published in the literature for Mpox detection. In this paper, we used the dataset created by Ali et al. (2022). MV2LightGBM, which obtained 91.49% accuracy, outperformed the model proposed by Ali et al. (2022). Moreover, it outperformed the model proposed by Sahin et al. (2022), which was also evaluated on the dataset created by Ali et al. (2022), by a slight margin. Furthermore, MV2LightGBM outperformed Sahin et al. (2022) and Dwivedi et al. (2022) not only in terms of accuracy but also across other evaluation metrics (see Table 8).
5. Conclusion
Mpox is a zoonotic infectious skin disease that has raised concern due to its global outbreaks. Building an early detection tool for Mpox has become an important problem in preventing a possible future outbreak. In this paper, we developed a framework for the early detection of Mpox from biomedical image data. Our framework combines a DL architecture, which is responsible for feature extraction, with an ML classifier, which performs the classification. We built several hybrid models to perform a comprehensive analysis and evaluated these models with standard evaluation metrics. Finally, we compared the results of our best-performing model with those published in the literature. Our results showed improvements over the results reported in the literature. In future work, we aim to investigate the efficiency of this framework for other infectious diseases.
Data Availability
All data produced are available online at kaggle.com
https://www.kaggle.com/datasets/nafin59/monkeypox-skin-lesion-dataset
Funding
The authors did not receive any funding for this research.
Competing Interests
The authors declare no competing interests.