Abstract
The unintended biases introduced by optimization and machine learning (ML) models are a topic of great interest to medical professionals. Bias in healthcare decisions can cause patients from vulnerable populations (e.g., those who are racially minoritized, have low income, or live in rural areas) to have lower access to resources and inferior outcomes, thus exacerbating societal unfairness. In this systematic literature review, we present a structured overview of the literature on fair decision making in healthcare published through April 2024. After screening 782 unique references, we identified 103 articles within the scope of our review. We organize the identified articles into three sections: bias in healthcare decision making, fairness metrics, and bias mitigation techniques. Specifically, we identify examples of algorithmic, data, and publication bias as they are typically encountered in research and practice. Subsequently, we define and discuss the fairness metrics previously considered in the literature, including notions of fairness through unawareness, demographic parity, equal opportunity, and equal odds. Lastly, we summarize the bias mitigation techniques available in the optimization and ML literature by classifying them into pre-processing, in-processing, and post-processing approaches. Fairness in decision making is an emerging field with the potential to substantially reduce social inequities and improve the overall well-being of underrepresented groups. Our review aims to increase awareness of fairness in healthcare decision making and to facilitate the selection of appropriate approaches under varying scenarios.
1. Introduction
An increasing number of healthcare researchers and practitioners are leveraging quantitative methodologies to improve decision making. These methodologies aim to identify desirable resource allocation strategies, testing procedures, or treatment protocols. Optimization and machine learning (ML) models have proven highly useful in medical decision making and health policy [1–7]. For example, researchers have applied optimization models to schedule patients based on their predicted no-show rates, and reinforcement learning has been used to recommend treatments that promote effective prescribing and low mortality [8, 9]. However, such advanced decision-making methods can lead to inequitable outcomes when they do not give sufficient attention to underrepresented groups [10], such as those who are racially minoritized or have low income. Our survey reviews fair decision-making techniques and concepts in healthcare, showing how optimization and ML approaches can be leveraged to promote equitable access to healthcare.
The unintended biases introduced by optimization and ML algorithms are of special interest to practitioners and researchers [11–14]. For example, compared with policies generated for White veterans, medical policies generated by reinforcement learning often offer Black veterans fewer opportunities to receive cardiovascular screenings. Hispanic patients are disproportionately underdiagnosed by convolutional neural networks because their limited access to healthcare resources leaves fewer data on them, and convolutional neural networks perform poorly when data are inadequate [15]. Such bias may cause decision-makers to distribute fewer medical resources to racially minoritized subgroups. Similarly, clinicians have used ML models for warfarin dosing; these models perform well for European patients but yield unsatisfactory outcomes for Asian patients [16, 17].
Existing literature reviews focus on the biases introduced or perpetuated by ML in prediction settings. For example, Ahmad and collaborators show that ML-based predictions often yield less satisfactory outcomes for underrepresented groups because of insufficient data on these groups [18]. Mishler and coauthors reveal that predictors are sensitive to the proportions of different populations, and that incorporating fairness definitions can help avoid such issues [19]. In contrast to previous reviews centered on prediction, we focus on fairness in the context of decision making. Our review overlaps to some extent with Smith et al.’s survey [14]. The main differences between the two surveys are: 1) their work focuses exclusively on fairness in reinforcement learning, while our paper also explores other in-processing techniques such as mixed-integer programming and stochastic programming; 2) we review pre-processing techniques, such as fair data transformers and natural language processing; and 3) we consider post-processing methods such as Laplacian smoothing and multi-accuracy approaches. The fair reinforcement learning methods cited in Smith et al. are included in our review for completeness [20–23]; we refer the interested reader to their review for an in-depth description of these works.
Our review begins by describing our literature search strategy. Then, we describe bias categories commonly encountered in decision making, followed by a summary of fairness metrics. Next, we present bias mitigation techniques that extend the traditional decision-making framework to achieve fair choices. Finally, we outline our survey’s contributions and limitations, as well as promising future directions.
2. Search Strategies
For our systematic review, we searched the Google Scholar database for records related to fair decision making in healthcare. The electronic search strategy combined the terms “decision making” and “healthcare” with one of the terms “bias”, “fairness”, or “equity”, and one of the terms “optimization”, “machine learning”, “deep learning”, “reinforcement learning”, “game”, or “network”. The keyword combinations we used are shown in Figure 1. We included all records mentioning these keywords in the publication title, abstract, or full text. The publication language was restricted to English. The search was last updated in April 2024.
The titles and abstracts of the articles were screened by one investigator, and the selected manuscripts were then double-checked by another researcher. Records were excluded if the publications: 1) did not address healthcare topics and could not easily be extended to healthcare; 2) did not focus on decision making (e.g., papers focusing on predictions); or 3) consisted only of introductory text or a conference abstract. For the remaining publications on methodology or reviews of bias, fairness metrics, and bias mitigation methods, one investigator screened the full text before a final discussion with the second researcher. The articles and surveys are categorized into three sections: bias in healthcare (section 3.1), fairness metrics (section 3.2), and fair decision-making models and algorithms (section 3.3). Disagreements between authors were resolved through discussion until consensus was reached.
Fair decision-making concepts (i.e., types of biases, fairness metrics, or bias mitigation approaches) were extracted from the full text of the selected works by one investigator. To ensure the accuracy and completeness of the extracted concepts, they were double-checked by another researcher. All chosen concepts were grouped into section 3.1, section 3.2, or section 3.3 to reflect their relations.
3. Literature Review Results
The systematic search yielded 782 records; 208 of them were unrelated to healthcare and could not easily be transferred to the healthcare domain, and 212 of them did not focus on decision making. Furthermore, 70 records were excluded because they were introductory text or conference abstracts. Of the remaining 292 articles, 153 were excluded because they discussed fair decision making in healthcare without a focus on methodology or specific examples of application. Of the 139 papers left, we excluded 36 additional articles because they did not meet the inclusion criteria described in section 2. Of the remaining 103 papers, 21 were review papers and 82 were original papers. The flow diagram for the literature review is shown in Figure 2.
From the 103 included papers, we extracted all concepts related to fair decision making in healthcare and organized the key findings as follows. Section 3.1 categorizes biases into three groups: 1) algorithmic bias, 2) data bias, and 3) publication bias. Section 3.2 classifies fairness metrics into three types: 1) fairness through unawareness, 2) demographic parity, and 3) equal opportunity. Section 3.3 categorizes bias mitigation techniques into three classes: pre-processing, in-processing, and post-processing methodologies.
For the in-processing section, we describe the cutting-edge optimization and ML methods used to alleviate bias. At a high level, there are two mainstream approaches to achieving fairness in these methods: incorporating fairness in objectives or adding fairness-enhancing constraints.
3.1 Bias in Healthcare Decision Making
Biases can exist in data and algorithms, which may prevent decision-making systems from generating equitable outcomes across subgroups. In this section, we summarize sources of bias affecting decision making across healthcare domains. These biases can be categorized into one of the following classes: algorithmic bias, data bias, and publication bias. Section 3.3 summarizes methods that address these biases.
3.1.1 Algorithmic Bias
Algorithmic bias stems from computational procedures that fail to consider fairness in their execution. This type of bias results from improper algorithmic design, which in turn may influence user behavior [24]. For example, optimization-based vaccine allocation algorithms that maximize overall social welfare may exacerbate demographic disparities, since underrepresented populations may have less access to vaccines under this objective [25]. Similarly, ambulance allocation models that solely maximize the overall survival rate may ignore the fairness of unit availability across populations. While the survival rate may be high for the population as a whole, it can be low among patients from vulnerable populations, such as people with lower socioeconomic status. Algorithmic bias also exists in ML models. For instance, Samorani and coauthors found that ML-based scheduling models are more likely to assign Black patients to overbooked slots [26], resulting in worse service experiences and longer waiting times.
3.1.2 Data Bias
Data bias refers to the unfairness generated by prejudiced data sources. Unbiased datasets are often necessary for high-quality decision-making models [27]. However, socioeconomic and racial disparities in resource availability may lead to skewed datasets [28]. Two common data biases in healthcare are aggregation bias and representation bias. Aggregation bias refers to the effect of aggregating data without considering disparities among subgroups [29]. For example, the hemoglobin A1c level, a widely accepted indicator of diabetes, varies across sex and ethnicity. If we ignore subgroup differences in the data and draw conclusions for subpopulations based on the entire population, we may introduce aggregation bias [10, 11, 30, 31]. Representation bias occurs when the data do not represent the characteristics of all subgroups [32–36]. For instance, providers may have fewer electronic health records (EHRs) for people of lower socioeconomic status, as they may have limited access to electronic healthcare systems. Hence, we can expect more missing data among people of lower socioeconomic status, so the available data may not capture this population’s overall characteristics. If decision-making models are built with data that underrepresent this population, the resulting models may be biased against people of lower socioeconomic status [37, 38].
Another source of data bias is response bias, which occurs when data are labeled inconsistently or collected by unreliable methods. Response bias frequently arises in self-reported data or surveys because participants give inaccurate answers. For instance, when medical students rate their mental health in surveys, they tend to provide socially acceptable answers that often do not reflect their true mental health [39]. Since policymakers may use such data to make public health decisions, response bias can skew decision making [40]. For example, models built from data with response bias may underestimate the seriousness of medical students’ mental health problems [28].
3.1.3 Publication Bias
Publication bias occurs when researchers’ decision to publish a paper depends on the study results. Medical studies with positive results are generally easier to publish than studies without positive results [41]. This phenomenon may lead to overestimating the benefit of certain clinical treatments, as evidence against a treatment is not made available. Medical practitioners may then make treatment decisions based on biased evidence, leading to poorer-than-expected treatment effects.
An example of publication bias occurred during the COVID-19 pandemic, when academic papers on COVID-19 treatments were published rapidly. Most publications reported promising outcomes of COVID-19 treatments, while less favorable results had only a slim chance of being published [42]. When these treatments were applied to the general population, many of them produced inferior outcomes compared with the published results [42]. Another example comes from the Cochrane Review on topical benzoyl peroxide, a widely used acne treatment. In 2019, Yang and coauthors found that most benzoyl peroxide studies from before 2015 were still unpublished because of their negative results [43]. This finding suggests that the published literature did not reliably reflect the overall effect of benzoyl peroxide, and medical practitioners prescribing it may not have achieved the desired treatment effects.
3.2 Fairness Metrics
In this section, we present three types of fairness metrics: fairness through unawareness, demographic parity, and equal opportunity. To illustrate these metrics, we use a vaccine distribution example. We restrict our attention to a single binary sensitive attribute, denoted by A. An example of such an attribute is a dichotomized version of race with the categories White and people of color. The metrics covered in this section are summarized in Table 1.
In our notation, X denotes the characteristics of the entire population, g denotes the sensitive attribute (A in the text; in our case, race), X+ denotes the characteristics of qualified patients, r_{g1} denotes the cumulative rewards of White patients, and r_{g2} denotes the cumulative rewards of people of color. A qualified subgroup is a subset of the general population that may be of special interest to decision makers. For example, certain patients, such as senior citizens and those residing in areas with inadequate healthcare resources, are considered qualified patients because they may be more susceptible to the disease.
3.2.1 Fairness Through Unawareness
Fairness through unawareness is the most basic fairness metric [15]. It requires that no sensitive attribute be used during the decision-making process. In our vaccine distribution example, fairness through unawareness deems the model fair if it does not use race when deciding the vaccine distribution. However, simply ignoring sensitive attributes may not remove inequity, because other variables can be highly correlated with the sensitive traits. This approach has been shown to fail in many cases [44].
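To make the proxy problem concrete, the following Python sketch uses a hypothetical toy dataset in which a neighborhood code (`zip_code`) is correlated with race. The allocation rule never reads race, yet allocation rates still differ sharply between groups. The records, the rule, and all names are illustrative assumptions.

```python
# Hypothetical toy records. The allocation rule below never reads "race",
# but "zip_code" is strongly correlated with it and acts as a proxy.
patients = [
    {"race": "White", "zip_code": "A"}, {"race": "White", "zip_code": "A"},
    {"race": "White", "zip_code": "A"}, {"race": "White", "zip_code": "B"},
    {"race": "POC", "zip_code": "B"}, {"race": "POC", "zip_code": "B"},
    {"race": "POC", "zip_code": "B"}, {"race": "POC", "zip_code": "A"},
]

def allocate_vaccine(patient):
    """'Unaware' rule (hypothetical): prioritizes zip code A and ignores race."""
    return 1 if patient["zip_code"] == "A" else 0

def allocation_rate(records, group):
    """Share of patients in `group` who receive the vaccine under the rule."""
    members = [p for p in records if p["race"] == group]
    return sum(allocate_vaccine(p) for p in members) / len(members)

rate_white = allocation_rate(patients, "White")
rate_poc = allocation_rate(patients, "POC")
print(f"White: {rate_white:.2f}, people of color: {rate_poc:.2f}")
```

In this toy example, the rule allocates vaccines to 75% of White patients and 25% of patients of color without ever using race as an input.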
3.2.2 Demographic Parity
The demographic parity fairness metric aims to ensure that the expected reward of a decision-making model is independent of sensitive attributes [13]. Independence from sensitive attributes means that the outcomes (i.e., expected cumulative rewards) must be equal in privileged and unprivileged groups [45]. In our vaccine distribution example, demographic parity ensures that, when other information is the same (e.g., age, socioeconomic status), White patients and patients of color have the same expected cumulative rewards. The problem with demographic parity is that it does not consider the population’s ground-truth qualifications. For example, suppose patients of color are more vulnerable to a disease; these patients then have a higher ground-truth qualification and should receive a greater expected cumulative reward from the vaccine. If we apply demographic parity in this case, we fail to account for the varying ground-truth qualification across races in decision making.
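A minimal sketch of how a demographic parity gap could be computed for a vaccine allocation policy, assuming per-patient expected rewards are already available (for example, from simulation). The function name, the data, and the tolerance `epsilon` are hypothetical. This computes the unconditional gap; the conditional version described above would apply the same function within each stratum of covariates such as age.

```python
from statistics import mean

def demographic_parity_gap(rewards, groups, g1, g2):
    """Absolute difference in mean (expected) reward between groups g1 and g2."""
    r1 = [r for r, g in zip(rewards, groups) if g == g1]
    r2 = [r for r, g in zip(rewards, groups) if g == g2]
    return abs(mean(r1) - mean(r2))

# Hypothetical per-patient rewards (1.0 = vaccinated in time) under one policy.
rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
groups = ["White"] * 4 + ["POC"] * 4

gap = demographic_parity_gap(rewards, groups, "White", "POC")
epsilon = 0.1  # hypothetical tolerance for declaring the policy "fair enough"
print(gap, gap <= epsilon)
```

Here the White group's mean reward is 0.75 and the other group's is 0.25, so the gap of 0.5 exceeds the assumed tolerance.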
3.2.3 Equal Opportunity
Similar to demographic parity, equal opportunity checks whether the expected cumulative rewards for privileged and unprivileged groups are the same [24]. However, demographic parity applies to the entire population, while equal opportunity applies only to a qualified population [45]. In our example, this metric requires that, within the qualified population, the expected cumulative reward of receiving the vaccine in the unprivileged group (A = people of color) equals that in the privileged group (A = White). However, equal opportunity does not assess fairness among unqualified individuals.
In summary, fairness through unawareness is the most straightforward metric but fails in many settings. Demographic parity evaluates whether decisions are independent of sensitive attributes within the entire population. Equal opportunity measures whether decisions are independent of sensitive attributes within a qualified subgroup, and thus applies to a smaller population than demographic parity.
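The sketch below, again with hypothetical data and names, contrasts the two metrics. The policy rewards both groups at the same overall rate, so its demographic parity gap is zero, but among qualified patients it rewards only White patients, so its equal opportunity gap is 1.0.

```python
from statistics import mean

def equal_opportunity_gap(rewards, groups, qualified, g1, g2):
    """Mean-reward gap between g1 and g2, computed only over qualified patients."""
    def qualified_mean(target):
        vals = [r for r, g, q in zip(rewards, groups, qualified) if g == target and q]
        return mean(vals)
    return abs(qualified_mean(g1) - qualified_mean(g2))

# Hypothetical data: first four patients are White, last four are people of color.
rewards = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
groups = ["White"] * 4 + ["POC"] * 4
qualified = [True, False, True, False, False, True, False, True]

overall_gap = abs(mean(rewards[:4]) - mean(rewards[4:]))  # demographic parity view
eo_gap = equal_opportunity_gap(rewards, groups, qualified, "White", "POC")
print(overall_gap, eo_gap)
```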
3.3 Bias Mitigation
In this section, we summarize different bias mitigation approaches used across healthcare domains. The methods can be categorized into pre-processing, in-processing, and post-processing. Pre-processing mechanisms clean and manipulate the input data before it is used in decision-making models [46]. In-processing methodologies refer to building unbiased algorithms directly [47]. Post-processing methods calibrate algorithmic outcomes to achieve fairness [48].
3.3.1 Pre-Processing
Datasets may be biased, which can cause skewed decisions. For instance, when a dataset is imbalanced, decisions may be biased against subpopulations with smaller sizes [49]. Pre-processing methods can help circumvent such biases. There are five commonly used pre-processing approaches: reweighting underrepresented populations, resampling, natural language processing, post-survey analysis, and fair data transformers. The identified pre-processing methods, along with their references, areas of application, fairness metrics, and targeted bias categories, are summarized in Table 2.
Reweighting
Reweighting assigns greater weights to underrepresented instances [50]. Biases may be introduced into decision-making tasks if the data of underrepresented populations are not handled appropriately. Skewed data can also lead to harmful consequences for underrepresented populations. For example, African Americans and Asians are underrepresented in genome studies, which gives rise to higher misclassification rates for these two subgroups in clinical research [11]. Classification methods use the features of each data instance to decide which category or label it belongs to. Since physicians rely on classification methods for diagnosis and treatment design, differing misclassification rates across subgroups can introduce bias into decision making. One remedy is to assign greater weights to underrepresented instances so that data from all populations play an equal role in the modeling process [50]. Kumar and coauthors have shown that assigning more weight to underrepresented groups can improve fairness in medical image classification by 8% [51]. However, some reweighting methods, such as inverse propensity score weighting, can potentially increase bias because they calibrate the distributions of all variables simultaneously [57].
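As an illustration, one common reweighting scheme assigns each instance a weight inversely proportional to its group's size, so that every group carries the same total weight. This is a sketch with hypothetical data and function names, not necessarily the scheme used by Kumar and coauthors [51].

```python
from collections import Counter

def group_balancing_weights(groups):
    """Inverse-frequency weights N / (G * n_g): every group gets equal total weight."""
    counts = Counter(groups)
    n, n_groups = len(groups), len(counts)
    return [n / (n_groups * counts[g]) for g in groups]

def weighted_error_rate(errors, weights):
    """Weighted misclassification rate; errors are 0/1 indicators."""
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)

groups = ["majority"] * 6 + ["minority"] * 2
errors = [0, 0, 0, 0, 0, 0, 1, 1]  # hypothetical: the model misses both minority cases
weights = group_balancing_weights(groups)

unweighted = sum(errors) / len(errors)           # 0.25
weighted = weighted_error_rate(errors, weights)  # about 0.5
print(unweighted, weighted)
```

With these weights, errors on the two minority cases count as much as all six majority cases combined, so a training loss built this way cannot look good while ignoring the minority group.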
Resampling
Resampling is a technique to balance the data (i.e., to obtain a near-equal number of instances from each subgroup) by repeatedly drawing samples from the same data [49]. In practice, majority groups may greatly outnumber the remaining groups. Using imbalanced data directly may favor the majority groups while disregarding the minority ones. To avoid this concern, we can resample from the minority groups so that the majority and minority groups have approximately the same size. For example, Chawla and collaborators deploy a synthetic minority oversampling technique to decide whether a patient needs diabetes treatment. Their method successfully narrows the gap in true positive rates between majority and minority groups [53]. However, medical data (such as EHRs) are typically complex, and resampling may lead to overfitting [57]. Researchers and practitioners can detect overfitting by using cross-validation, which provides a more accurate estimate of a model’s performance on unseen data; resampling should then be applied only within the training folds so that duplicated or synthetic records do not leak into the validation data.
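The following sketch shows plain random oversampling with replacement, a simpler relative of the synthetic (interpolation-based) oversampling mentioned above. The record layout and function name are hypothetical. Given the overfitting caution above, it is meant to be applied to the training portion of each cross-validation fold only.

```python
import random
from collections import defaultdict

def oversample_to_balance(records, group_key, seed=0):
    """Randomly duplicate records (with replacement) in smaller groups until
    every group matches the size of the largest group."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for record in records:
        by_group[record[group_key]].append(record)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend(members + extra)
    return balanced

# Hypothetical training fold: 10 majority records, 3 minority records.
train = [{"group": "majority", "x": i} for i in range(10)] + \
        [{"group": "minority", "x": i} for i in range(3)]
balanced = oversample_to_balance(train, "group")
print(len(balanced))
```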
Natural language processing
Natural language processing removes biased information from text data before the data are fed to decision-making algorithms [58]. Given the complexity of healthcare data, natural language processing is becoming increasingly popular in fair pre-processing settings. This step aims to prevent the algorithms from considering sensitive attributes during decision making. For example, Minot and coauthors identify and remove gender-related language from EHRs using bidirectional encoder representations. They then deploy classification algorithms to evaluate health conditions and give clinical suggestions. Their results show that fairness across genders improves with only a mild degradation in performance [59].
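For intuition only, the sketch below masks gendered words using a fixed lexicon and regular expressions. This is a deliberately simple stand-in, not the transformer-based method of Minot and coauthors; the word list and the `[MASK]` token are assumptions, and a usable lexicon would need to be far more complete.

```python
import re

# Hypothetical, deliberately small lexicon. Titles such as "Ms" are left out
# because "MS" is also a common clinical abbreviation (multiple sclerosis).
GENDERED_TERMS = ["he", "she", "him", "her", "his", "hers", "himself", "herself",
                  "man", "woman", "male", "female", "boy", "girl"]
GENDER_PATTERN = re.compile(r"\b(?:" + "|".join(GENDERED_TERMS) + r")\b", re.IGNORECASE)

def mask_gendered_terms(note: str, token: str = "[MASK]") -> str:
    """Replace whole-word gendered terms (any letter case) with a neutral token."""
    return GENDER_PATTERN.sub(token, note)

print(mask_gendered_terms("She reports chest pain; her father had a stroke."))
```

Word boundaries keep substrings such as "the" or "sheet" intact; the masked text could then be passed to a downstream classifier.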
Post-survey analysis
Post-survey analysis for data bias mitigation refers to analyzing survey data after collection to identify, assess, and correct biases that may have been introduced during the data collection phase [27]. For example, some respondents might misremember their health history; decisions based on their inaccurate reports may then yield inferior treatment effects compared with those for respondents who remember correctly. Researchers can mitigate such bias by cross-referencing survey responses with medical records or by shortening the recall period [56].
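A minimal sketch of cross-referencing self-reported answers with medical records, assuming both are keyed by a patient identifier and that the record is more reliable whenever it exists (an assumption that does not always hold). The field name `hypertension`, the function name, and the data are hypothetical. The returned disagreement rate can also be computed per subgroup to check whether recall error is unevenly distributed.

```python
def reconcile_with_records(survey, records, field):
    """Correct self-reported `field` values using medical records when available.

    survey:  patient_id -> self-reported value
    records: patient_id -> dict of record fields
    Returns (corrected answers, disagreement rate among patients with a record).
    """
    corrected, checked, disagreements = {}, 0, 0
    for pid, answer in survey.items():
        recorded = records.get(pid, {}).get(field)
        if recorded is None:       # no record to compare against: keep the self-report
            corrected[pid] = answer
            continue
        checked += 1
        disagreements += answer != recorded
        corrected[pid] = recorded  # assumption: the record is the more reliable source
    rate = disagreements / checked if checked else 0.0
    return corrected, rate

survey = {"p1": False, "p2": True, "p3": False}
records = {"p1": {"hypertension": True}, "p2": {"hypertension": True}}
corrected, rate = reconcile_with_records(survey, records, "hypertension")
print(corrected, rate)
```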
Fair data transformers
Fair data transformers extract features from the input data in a fair way, modifying the input data to achieve fairness [46]. Data transformers, such as principal component analysis, are popular techniques for pre-processing data. In practice, multiple transformers are typically evaluated against a specific fairness metric (e.g., demographic parity), and the transformer achieving the fairest output is used to produce inputs for ML and optimization algorithms. Biswas and collaborators have shown that a proper data transformer can significantly improve the fairness of outcomes [46]. Researchers have also observed that, among data transformers, selecting a subset of features can introduce unfairness [60]. Feature standardization and non-linear transformers are relatively fair, although they can be biased under special conditions, such as when the data contain many outliers. These observations indicate that the appropriate transformer must be selected on a case-by-case basis. Although, to the best of our knowledge, fair data transformers have not been deployed in healthcare, they can readily be extended to healthcare data pre-processing. For example, in the context of vaccine distribution, we can collect data such as age, sex, population density, and the incidence rate in the areas where individuals live. We then apply several data transformers and feed each transformed dataset into the same decision-making algorithm. After the algorithm outputs vaccine distribution decisions, we can evaluate the fairness of the resulting policies and choose the data transformer that produces the fairest output.
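The selection procedure described above can be sketched as follows. The three transformers, the fixed top-k allocation rule, and the data are hypothetical stand-ins; the fairness criterion is the demographic parity gap in allocation rates, and ties are broken by the order in which transformers are listed.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each feature column (constant columns are just centered)."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def min_max(rows):
    """Rescale each feature column to [0, 1]."""
    cols = list(zip(*rows))
    bounds = [(min(c), (max(c) - min(c)) or 1.0) for c in cols]
    return [[(v - lo) / span for v, (lo, span) in zip(row, bounds)] for row in rows]

def first_feature_only(rows):
    """Feature-subset transformer: keep only the first column."""
    return [[row[0]] for row in rows]

def allocate_top_k(rows, k):
    """Fixed downstream rule (hypothetical): vaccinate the k largest feature sums."""
    order = sorted(range(len(rows)), key=lambda i: sum(rows[i]), reverse=True)
    chosen = set(order[:k])
    return [1 if i in chosen else 0 for i in range(len(rows))]

def parity_gap(decisions, groups):
    """Largest difference in allocation rate between any two groups."""
    rates = [mean([d for d, g in zip(decisions, groups) if g == label])
             for label in sorted(set(groups))]
    return max(rates) - min(rates)

def select_fairest(rows, groups, k, transformers):
    """Run the same rule on each transformed dataset; keep the smallest parity gap."""
    gaps = {name: parity_gap(allocate_top_k(f(rows), k), groups)
            for name, f in transformers.items()}
    return min(gaps, key=gaps.get), gaps

# Hypothetical records: [age, population density]; groups encode the sensitive attribute.
rows = [[70, 200], [65, 5000], [30, 8000], [80, 150], [45, 6000], [55, 300]]
groups = ["A", "B", "B", "A", "B", "A"]
transformers = {"raw": lambda r: r, "standardize": standardize,
                "min_max": min_max, "first_feature_only": first_feature_only}
best, gaps = select_fairest(rows, groups, 3, transformers)
print(best, gaps)
```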
3.3.2 In-Processing
In-processing methodologies incorporate one or more fairness metrics directly into the design of algorithms to reduce bias. These bias mitigation techniques are attracting increasing attention in domains such as resource allocation, scheduling, and clinical treatment [26, 61]. Overall, we identify six in-processing methods in the literature: mixed-integer programming, stochastic programming, deep reinforcement learning, survival analysis, multi-objective Markov Decision Processes, and constrained Markov Decision Processes. Table 3 summarizes these methods along with the corresponding references and applications.
Mixed-integer programming
Mixed-integer programming has been widely used to ensure fairness in decision making [62–77]. Mixed-integer programming is a type of constrained optimization problem that allows both integer and continuous variables in its objective and constraints [86]. Emergency department overcrowding has become a nationwide crisis over the last decade [62]. To address overcrowding, researchers have applied mixed-integer programming to build fair medical resource distribution models. For example, Acuna and coauthors added equity constraints to an ambulance allocation model to ensure that the minimal quality of care for every emergency is greater than or equal to a threshold β [62]. These constraints guarantee that patients suffering from uncommon diseases still receive necessary clinical support. Their equity constraints are given below:
where I denotes the set of all possible diseases and i ∈ I denotes disease i. Moreover, J denotes the set of all emergency departments, j ∈ J refers to emergency department j, q_{i,j} is the quality of care for disease i offered by emergency department j, and X_{i,j} is a binary decision variable: X_{i,j} = 1 if department j provides care for disease i, and X_{i,j} = 0 otherwise. Lastly, β ∈ [0,1] is selected based on domain experts’ suggestions, where 0 denotes the worst quality and 1 the best quality.
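The equity constraint itself did not survive in this version of the text. The sketch below assumes one plausible reading of the prose, namely that for every disease i ∈ I, ∑_{j∈J} q_{i,j} X_{i,j} ≥ β, and pairs it with a hypothetical assignment cost; the actual model of Acuna and coauthors may differ. Brute-force enumeration stands in for a MIP solver and is only viable for tiny instances.

```python
from itertools import product

def satisfies_equity(X, q, beta):
    """Assumed constraint form: sum_j q[i][j] * X[i][j] >= beta for every disease i."""
    return all(sum(qij * xij for qij, xij in zip(q[i], X[i])) >= beta
               for i in range(len(q)))

def cheapest_equitable_assignment(q, cost, beta):
    """Enumerate all binary X (tiny instances only) and return the cheapest feasible one."""
    n_i, n_j = len(q), len(q[0])
    best, best_cost = None, float("inf")
    for bits in product((0, 1), repeat=n_i * n_j):
        X = [list(bits[i * n_j:(i + 1) * n_j]) for i in range(n_i)]
        if not satisfies_equity(X, q, beta):
            continue
        c = sum(cost[i][j] * X[i][j] for i in range(n_i) for j in range(n_j))
        if c < best_cost:
            best, best_cost = X, c
    return best, best_cost

# Hypothetical instance: rows are diseases (second one uncommon), columns are departments.
q = [[0.9, 0.4],
     [0.3, 0.8]]
cost = [[1, 1],
        [1, 3]]
X, total_cost = cheapest_equitable_assignment(q, cost, beta=0.7)
print(X, total_cost)
```

With β = 0.7 the cheapest feasible plan must still cover the uncommon disease through the second department, whereas with β = 0 the cost-minimizing plan covers nothing at all.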
Another common way to fulfill fairness requirements in healthcare decision making is to modify the objective function of an optimization model. When medical resources are scarce, people from vulnerable groups may have lower access to them. To ensure fairness toward vulnerable populations, the objective function can be set to maximize the smallest amount of resources allocated to any population subgroup [25]. This max-min objective prioritizes the worst-off subgroup, helping each subgroup receive the medical support it requires.
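For a supply of identical units, the max-min objective can be illustrated with a greedy water-filling rule: give each unit to the subgroup with the smallest current allocation that still has unmet demand. For unit increments, this rule attains the largest possible minimum allocation. The group names, supply, and demand values are hypothetical, and the sketch uses raw counts; a per-capita variant would divide each allocation by group size before comparing.

```python
def max_min_allocation(supply, demand):
    """Allocate `supply` identical units one at a time to the subgroup with the
    smallest current allocation whose demand is not yet met."""
    alloc = {g: 0 for g in demand}
    for _ in range(supply):
        open_groups = [g for g in demand if alloc[g] < demand[g]]
        if not open_groups:          # all demand met; leftover units stay unallocated
            break
        neediest = min(open_groups, key=lambda g: alloc[g])
        alloc[neediest] += 1
    return alloc

# Hypothetical supply of 10 vaccine doses and subgroup demands.
allocation = max_min_allocation(10, {"urban": 8, "suburban": 5, "rural": 2})
print(allocation)
```

The rural group's demand is fully met, and the remaining doses are split evenly between the other two groups.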
Stochastic programming
Researchers have also leveraged stochastic programming to develop in-processing bias mitigation techniques [78, 79]. These techniques optimize an objective function while representing uncertainty through probability distributions [87]. When optimizing patients’ waiting times, we can add fairness constraints to stochastic optimization models that limit the expected difference between the maximum and minimum waiting times. The constraint can be formulated as [78]:
Here, k = 1, 2, …, T denotes the time slots in which decisions are made, n denotes the n-th patient, and α ≥ 0 is a threshold suggested by domain experts. The expected waiting time of the n-th patient scheduled to interval k is represented by . These constraints keep the expected waiting times of different patients from varying drastically.
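Because the constraint's exact form is not recoverable here, the sketch below assumes it bounds the spread of expected waiting times, max_n E[W_n] − min_n E[W_n] ≤ α, and estimates each E[W_n] by sample averaging over simulated scenarios. The single-server clinic, the exponential service times, and all parameter values are hypothetical. It only checks a given schedule; a stochastic program would search over schedules subject to this constraint.

```python
import random
from statistics import mean

def simulate_waits(slot_times, mean_service, rng):
    """One scenario of a single-server clinic: patient n arrives at slot_times[n]
    and waits until the previous patient's (random) service ends."""
    waits, server_free_at = [], 0.0
    for arrival in slot_times:
        start = max(arrival, server_free_at)
        waits.append(start - arrival)
        server_free_at = start + rng.expovariate(1.0 / mean_service)
    return waits

def expected_waits(slot_times, mean_service, n_scenarios=2000, seed=0):
    """Sample-average estimate of each patient's expected waiting time."""
    rng = random.Random(seed)
    scenarios = [simulate_waits(slot_times, mean_service, rng) for _ in range(n_scenarios)]
    return [mean([s[n] for s in scenarios]) for n in range(len(slot_times))]

def satisfies_wait_fairness(slot_times, mean_service, alpha, **kwargs):
    """Assumed constraint form: max_n E[W_n] - min_n E[W_n] <= alpha."""
    ew = expected_waits(slot_times, mean_service, **kwargs)
    return max(ew) - min(ew) <= alpha, ew

# Hypothetical schedules (minutes) with mean service time 15 and alpha = 5.
ok_tight, ew_tight = satisfies_wait_fairness([0, 10, 20, 30], 15.0, alpha=5.0)
ok_spread, ew_spread = satisfies_wait_fairness([0, 60, 120, 180], 15.0, alpha=5.0)
print(ok_tight, ok_spread)
```

Under these assumptions the tightly packed schedule lets later patients accumulate long waits and violates the constraint, while the spread-out schedule satisfies it.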
Deep reinforcement learning
Reinforcement learning and deep learning play a pivotal role in in-processing methods. Reinforcement learning is a type of ML in which algorithms learn to make decisions by performing actions and observing the results in an environment of interest [88]. Deep learning uses multiple neural network layers and activation functions to extract new features from the input data [89], which allows it to capture and model complex patterns in data. Deep reinforcement learning combines deep learning and reinforcement learning and can be used to solve Markov Decision Process (MDP) models. An MDP is a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision-maker [90]. In practice, an MDP may encompass a massive number of system configurations (i.e., states), making it computationally intractable for traditional reinforcement learning methods. Deep reinforcement learning addresses this by using a neural network to represent a policy (i.e., a rule that specifies which decision to make in each state) and learning a policy that optimizes model outcomes (i.e., rewards) [80–83]. Yang et al. redefine the rewards of deep reinforcement learning to achieve fairness [21]. In their approach, the absolute value of the reward for a subgroup is smaller when the group is larger. The reward function is demonstrated below:
Here, s_t denotes the state at time t, a_p denotes the model’s diagnosis for a patient in group p, and l_p denotes the ground-truth disease of that patient. The parameter λ_p is the reward of group p adjusted by the group’s size. Specifically, a positive reward is given if the agent makes the correct diagnosis, and a negative reward is given otherwise. The authors require the absolute reward for minorities to be greater than the absolute reward for majorities. This definition of rewards helps the solution approach give more attention to minority groups.
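A sketch of a size-adjusted reward in the spirit of this description, assuming λ_p is proportional to the inverse of the group size and normalized so that the largest group receives 1.0. This normalization is a hypothetical choice and may differ from Yang et al.'s; the state s_t does not enter this simplified reward, and all names are illustrative.

```python
def group_reward_scales(group_sizes):
    """Hypothetical size adjustment: lambda_p = n_largest / n_p, so smaller
    groups receive rewards (and penalties) of larger magnitude."""
    largest = max(group_sizes.values())
    return {p: largest / n for p, n in group_sizes.items()}

def fair_diagnosis_reward(action, label, group, scales):
    """+lambda_p for a correct diagnosis, -lambda_p otherwise."""
    lam = scales[group]
    return lam if action == label else -lam

scales = group_reward_scales({"majority": 900, "minority": 100})
r_major = fair_diagnosis_reward("diabetes", "diabetes", "majority", scales)
r_minor = fair_diagnosis_reward("healthy", "diabetes", "minority", scales)
print(r_major, r_minor)
```

A missed diagnosis in the minority group costs the agent nine times as much as a correct majority diagnosis earns, which pushes the learned policy to attend to the smaller group.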
Fair survival analysis
Fair survival models provide an additional tool for decision making in healthcare settings [84]. Traditional survival analysis estimates the time until an event of interest occurs [91]. Fair survival models augment this estimation with a penalty for fairness violations. The objective of the fair model is shown below:
where L_x(β) is the log-likelihood of a Cox proportional-hazards model describing the risk of developing a disease over a given period, F_x(β) is the fairness penalty, and λ is the weight of the fairness penalty in the objective. The difference between the highest and lowest probabilities of disease incidence within a cohort is used as the fairness metric. The authors then train the model on the input data (i.e., learn the parameters β that optimize the objective). The model’s output is used to generate a waitlist of patients, which determines the sequence of resource allocation. Their numerical experiments show that the fair survival model can substantially improve fairness as measured by the range of disease risk across groups.
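The following sketch assumes the objective is to maximize L_x(β) − λF_x(β). It implements the Cox partial log-likelihood (assuming no tied event times) and uses, as a hypothetical stand-in for the fairness penalty, the range of the mean relative risk exp(x·β) across groups rather than the incidence probabilities used in [84]. Function names and data are illustrative.

```python
import math

def cox_partial_loglik(beta, X, times, events):
    """Cox partial log-likelihood; assumes no tied event times."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:  # only observed events contribute a term
            risk_set = [math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i]
            ll += eta[i] - math.log(sum(risk_set))
    return ll

def risk_range_penalty(beta, X, groups):
    """Hypothetical penalty: range of mean relative risk exp(x . beta) across groups."""
    by_group = {}
    for row, g in zip(X, groups):
        by_group.setdefault(g, []).append(math.exp(sum(b * x for b, x in zip(beta, row))))
    means = [sum(v) / len(v) for v in by_group.values()]
    return max(means) - min(means)

def fair_cox_objective(beta, X, times, events, groups, lam):
    """Objective to maximize (assumed form): L_x(beta) - lam * F_x(beta)."""
    return (cox_partial_loglik(beta, X, times, events)
            - lam * risk_range_penalty(beta, X, groups))

# Hypothetical cohort: one covariate perfectly aligned with group membership.
X = [[1.0], [0.0], [1.0], [0.0]]
times = [1, 2, 3, 4]
events = [1, 1, 1, 1]
groups = ["a", "b", "a", "b"]
print(fair_cox_objective([0.5], X, times, events, groups, lam=2.0))
```

Because the covariate here tracks group membership, any nonzero β widens the risk gap and is penalized, so larger λ pulls the fitted β toward zero.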
Multi-objective Markov Decision Process
While it has not been applied to healthcare settings to the best of our knowledge, the multi-objective MDP is a promising approach to alleviate the potential effect of bias. This model is an extension of the traditional MDP with the difference that the reward function depends on a utility objective and a fairness objective [92]. Ge and coauthors have applied the Pareto frontier to identify the policy that optimizes both utility and fairness elements [85]. The modified reward function is:
where R(s, a) is a reward vector containing the rewards r for all objectives after taking action a in state s, and w is the vector of weights for the objectives. The authors apply reinforcement learning to learn the weights w. Their results show a trade-off between utility and fairness, and the final policy can be chosen based on user preferences [85]. Such methods could also be deployed to generate fair clinical decisions. For example, if we need to guarantee similar vaccination rates between males and females, we can add this fairness objective to our model. The Pareto frontier then returns optimal policies that consider both vaccine utility and distribution fairness.
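A sketch of the two ingredients described above, with hypothetical policies and scores: linear scalarization w·R of a reward vector, and a filter that keeps only Pareto-optimal (non-dominated) policies. Here the weights are fixed user preferences rather than learned, and the fairness score is assumed to be one minus the gap in vaccination rates between males and females.

```python
def scalarize(reward_vector, weights):
    """Linear scalarization w . R(s, a) of a multi-objective reward."""
    return sum(w * r for w, r in zip(weights, reward_vector))

def pareto_front(scores):
    """Keep policies whose objective vectors (all maximized) are not dominated."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
    return {name: s for name, s in scores.items()
            if not any(dominates(other, s) for o, other in scores.items() if o != name)}

# Hypothetical (vaccine utility, fairness score) pairs for candidate policies.
candidates = {"pi_1": (0.90, 0.70), "pi_2": (0.85, 0.90),
              "pi_3": (0.80, 0.85), "pi_4": (0.70, 0.98)}
front = pareto_front(candidates)
best = max(front, key=lambda name: scalarize(front[name], (0.5, 0.5)))
print(sorted(front), best)
```

Policy pi_3 is dropped because pi_2 beats it on both objectives; with equal weights, pi_2 is the preferred point on the remaining frontier.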
Constrained Markov Decision Process
The Constrained Markov Decision Process (CMDP) is another promising direction. Unlike the traditional MDP, a CMDP can accommodate fairness constraints in decision making [93]. In a CMDP, we can define a cost function that measures unfairness and then restrict attention to policies whose accumulated fairness cost does not exceed a threshold [20]. The constraint can be formulated as:
$$\mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} C_t\right] \le d,$$
where Ct denotes the fairness cost at time t, γ ∈ (0,1) is a discount factor that weights future fairness violations relative to current ones, and d denotes the threshold for the accumulated discounted fairness cost. To the best of our knowledge, this model has not been used for fair decision making in healthcare. However, we can model fairness metrics as constraints and restrict attention to the set of policies that satisfy them. We can then identify the policy in this set that maximizes the expected discounted accumulated reward.
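The two steps above (keep the policies that satisfy the fairness constraint, then maximize discounted reward) can be carried out jointly with the standard occupancy-measure linear program for discounted CMDPs. The sketch below is our own illustration, assuming Python with NumPy and SciPy 1.6 or later (for the HiGHS solver in `scipy.optimize.linprog`); the toy transitions, rewards, fairness costs, threshold d, and function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def solve_cmdp(P, reward, cost, mu, gamma, d):
    """Occupancy-measure LP: maximize E[sum_t gamma^t r_t] s.t. E[sum_t gamma^t C_t] <= d.
    P: (S, A, S) transitions; reward, cost: (S, A); mu: (S,) initial distribution (all > 0).
    Returns a possibly randomized policy pi of shape (S, A), or None if infeasible."""
    S, A = reward.shape
    # Flow constraints: sum_a x(s', a) - gamma * sum_{s, a} P(s' | s, a) x(s, a) = mu(s').
    A_eq = np.zeros((S, S * A))
    for s_next in range(S):
        for s in range(S):
            for a in range(A):
                A_eq[s_next, s * A + a] = float(s == s_next) - gamma * P[s, a, s_next]
    res = linprog(c=-reward.ravel(),                    # linprog minimizes, so negate reward
                  A_ub=cost.reshape(1, -1), b_ub=[d],   # discounted fairness cost <= d
                  A_eq=A_eq, b_eq=mu,
                  bounds=[(0, None)] * (S * A), method="highs")
    if not res.success:
        return None
    x = res.x.reshape(S, A)                    # discounted state-action occupancy
    return x / x.sum(axis=1, keepdims=True)    # pi(a | s) = x(s, a) / sum_a x(s, a)


def evaluate(P, reward, cost, mu, gamma, pi):
    """Exact discounted reward and fairness cost of a stochastic policy pi (S, A)."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    M = np.eye(len(mu)) - gamma * P_pi
    v_r = np.linalg.solve(M, (pi * reward).sum(axis=1))
    v_c = np.linalg.solve(M, (pi * cost).sum(axis=1))
    return mu @ v_r, mu @ v_c


# Hypothetical toy problem: action 0 earns more reward but creates a disparity cost.
P = np.array([[[0.7, 0.3], [0.6, 0.4]],
              [[0.4, 0.6], [0.5, 0.5]]])
reward = np.array([[1.0, 0.6], [2.0, 1.5]])
cost = np.array([[0.5, 0.0], [0.4, 0.0]])
mu, gamma, d = np.array([0.5, 0.5]), 0.9, 1.0

pi = solve_cmdp(P, reward, cost, mu, gamma, d)
value, fairness_cost = evaluate(P, reward, cost, mu, gamma, pi)
print(np.round(pi, 3), round(value, 3), round(fairness_cost, 3))
```

Solving over occupancy measures matters because an optimal constrained policy may need to randomize between actions in some states; restricting the search to deterministic policies can miss it.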
3.3.3 Post-processing
Post-processing methods calibrate algorithmic outcomes to achieve fairness. We identify the following post-processing methods relevant to healthcare applications: Laplacian smoothing, multi-accuracy approaches, and expert systems. The post-processing bias mitigation methods included in this review are shown in Table 4.
Laplacian smoothing method
The Laplacian smoothing method is a technique that reduces noise in the data while preserving the important characteristics of the solution [97]. For instance, we can use this method to ensure that similar individuals receive comparable results while preserving the performance (i.e., keeping the loss acceptably low) of the algorithms [48]. Researchers have reported that this technique can substantially improve outcome consistency. The Laplacian smoothing method can be extended to healthcare. For example, after a reinforcement learning algorithm produces treatment plans, Laplacian smoothing could help ensure that similar patients are assigned comparable treatment plans.
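One common form of this post-processing step, sketched here in Python with NumPy, assumes a Gaussian-kernel similarity graph over patient features and solves min_f ||f − y||² + λ fᵀLf, where y are the model's original scores and L is the graph Laplacian; the closed-form solution is f = (I + λL)⁻¹y. The kernel, bandwidth, λ, patient data, and the function name `laplacian_smooth` are illustrative assumptions rather than the exact method of [48].

```python
import numpy as np


def laplacian_smooth(scores, features, lam=1.0, bandwidth=1.0):
    """Post-process scores so that similar individuals receive similar outputs.
    Solves min_f ||f - scores||^2 + lam * f^T L f, i.e. f = (I + lam L)^{-1} scores."""
    diff = features[:, None, :] - features[None, :, :]
    W = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * bandwidth ** 2))  # pairwise similarity
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                                     # graph Laplacian
    return np.linalg.solve(np.eye(len(scores)) + lam * L, scores)


# Hypothetical standardized patient features and treatment-intensity scores from a model.
features = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 2.9], [-3.0, 2.0]])
scores = np.array([0.9, 0.3, 0.5, 0.55, 0.2])
smoothed = laplacian_smooth(scores, features, lam=1.0)
print(np.round(smoothed, 3))
```

Patients 0 and 1 have nearly identical features, so the gap between their scores shrinks, while the total of all scores is unchanged because L has the all-ones vector in its null space. For large cohorts, a sparse k-nearest-neighbor graph would replace the dense kernel matrix.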
Multi-accuracy approaches
We can also apply multi-accuracy approaches, which combine several weak learners to achieve high accuracy across all subpopulations [94]. In classification, these methods assign larger weights to samples that earlier weak learners misclassified, so that subsequent weak learners pay extra attention to those samples and adjust their results accordingly [98]. This post-processing technique can improve the accuracy of the subgroups with the worst classification error, which makes it promising for complex problems such as population health assessment. With accurate and fair classification of target populations, physicians can use algorithm-based treatment design to achieve desirable medical outcomes.
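A simplified sketch of this boosting-style post-processing in Python with NumPy: an auditor repeatedly checks each predefined subgroup for systematic over- or under-prediction and, when the mean residual exceeds a tolerance α, shifts that subgroup's logits to correct it. Here the "weak learners" are just subgroup indicators, a deliberate simplification of the multi-accuracy idea; the subgroups, α, step size η, synthetic data, and function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_multiaccuracy(logits, y, groups, alpha=0.01, eta=1.0, max_rounds=1000):
    """Audit each subgroup and shift its logits until the mean residual (predicted minus
    observed risk) is at most alpha in every subgroup. Returns one logit shift per group."""
    shifts = {name: 0.0 for name in groups}
    z = logits.astype(float).copy()
    for _ in range(max_rounds):
        worst = 0.0
        for name, mask in groups.items():
            residual = np.mean(sigmoid(z[mask]) - y[mask])  # > 0 means over-prediction
            worst = max(worst, abs(residual))
            if abs(residual) > alpha:
                z[mask] -= eta * residual
                shifts[name] -= eta * residual
        if worst <= alpha:  # a full pass found no violating subgroup
            break
    return shifts


def apply_shifts(logits, groups, shifts):
    """Post-process logits with the learned subgroup corrections."""
    z = logits.astype(float).copy()
    for name, mask in groups.items():
        z[mask] += shifts[name]
    return z


# Hypothetical audit data: the base model ignores a rural-residence effect.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
rural = rng.random(n) < 0.3
female = rng.random(n) < 0.5
y = (rng.random(n) < sigmoid(x - 1.0 + 0.8 * rural)).astype(float)
base_logits = x - 0.8
groups = {"rural": rural, "urban": ~rural, "female": female, "male": ~female}

shifts = fit_multiaccuracy(base_logits, y, groups)
corrected = apply_shifts(base_logits, groups, shifts)
```

The base model itself is never retrained; only its outputs are corrected, which is what makes this a post-processing method. On new patients, `apply_shifts` would be called with subgroup masks computed for those patients.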
Expert systems
Lastly, the clinical expertise of medical practitioners may help increase the fairness of algorithms [96, 99, 100]. For example, reinforcement learning techniques can suggest several near-equivalent actions, and clinicians can then decide which of these actions is likely to lead to the fairest outcome [22]. This approach may help overcome potential algorithmic biases while leveraging practitioners’ experience.
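A minimal sketch of the "near-equivalent actions" step in Python with NumPy, assuming a learned Q-value for each action in the current patient state: every action within a tolerance δ of the best is offered to the clinician. The δ value, the Q-values, and the disparity scores are hypothetical, and `clinician_choice` is only a placeholder for what is, in practice, a human judgment.

```python
import numpy as np


def near_optimal_actions(q_values, delta):
    """Actions whose estimated value is within delta of the best action's value."""
    q = np.asarray(q_values, dtype=float)
    return [int(a) for a in np.flatnonzero(q >= q.max() - delta)]


def clinician_choice(candidates, disparity):
    """Hypothetical stand-in for clinician judgment: pick the candidate with the
    smallest expected disparity. In practice this step is a human decision."""
    return min(candidates, key=lambda a: disparity[a])


# Hypothetical Q-values for four treatment options in one patient state.
q_values = [10.0, 9.8, 7.5, 9.95]
disparity = [0.3, 0.1, 0.0, 0.2]  # hypothetical disparity estimate per action
candidates = near_optimal_actions(q_values, delta=0.25)
action = clinician_choice(candidates, disparity)
print(candidates, action)
```

Action 2 has the lowest disparity but is excluded because its estimated value is far from the best, so the fairness-driven choice stays within a near-optimal set.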
Figure 3 shows the distribution of bias mitigation techniques across the identified papers, and Figure 4 shows the distribution of papers across application areas.
4. Conclusion and Future Research Directions
Compared to traditional decision-making techniques, fair decision-making approaches attempt to yield near-optimal and equitable outcomes. This review summarized state-of-the-art fair decision-making approaches in healthcare settings. We found that even though a plethora of fairness methods has been proposed, most of them focus on prediction rather than decision making; our survey aims to help bridge this gap. First, we presented different categories of bias in data and models. Then, we summarized multiple fairness metrics used in the existing literature and pointed out their use across applications. One of the main contributions of our review is the categorization of the literature on decision making in healthcare into pre-processing, in-processing, and post-processing bias mitigation methods, for which we elaborated on the high-level ideas and provided examples. Lastly, we explored multiple bias mitigation techniques that have not yet been applied in healthcare and illustrated how they may be employed in healthcare settings.
Since most of the research relevant to this review was conducted in the United States, most examples in our paper come from this country. Thus, this review may not sufficiently reflect the reality in other parts of the world. Additionally, our search terms may have missed relevant keywords, leading to the omission of works that used other terms. Finally, we limited our search to articles in English, so we were unable to capture potentially insightful publications in other languages.
Several areas are worth exploring in future research. The first promising field is algorithm explainability. Many decision-making algorithms in healthcare are considered black boxes that are hard to understand. This lack of explainability makes it difficult for practitioners to determine whether a model relies on biased features [89]. Explainable models can help address this concern because they reveal the model’s underlying structure clearly, which can help identify and remove potential decision biases. A related emerging field is the combination of interpretability and fairness. Fair interpretable models aim to ensure that algorithmic outputs align with professionals’ intuition. While increased interpretability can win more trust among practitioners, it may hurt the model’s fairness. Hence, we need to consider how to strike a balance between fairness and interpretability [101].
Another promising field is the study of context-aware fairness metrics. Researchers have found that different fairness metrics can be mutually incompatible, so we cannot expect a model to satisfy all of them at once [98]. It is therefore critical to understand which metric is appropriate in a specific circumstance. Identifying the best fairness metric for a specific problem will likely require cooperation between modelers and domain experts [102]. Exploring combinations of compatible fairness metrics in decision making is also a potential direction, allowing algorithms to satisfy multiple fairness requirements simultaneously.
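A small worked example of this incompatibility, in Python with NumPy: when two groups have different base rates, even a perfectly accurate classifier satisfies equal opportunity and equal odds (identical true-positive and false-positive rates) but violates demographic parity (different selection rates). The cohort and the helper `group_rates` are hypothetical.

```python
import numpy as np


def group_rates(y_true, y_pred, group):
    """Per-group quantities compared by common fairness metrics."""
    rates = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        rates[int(g)] = {
            "selection_rate": yp.mean(),  # demographic parity compares these
            "tpr": yp[yt == 1].mean(),    # equal opportunity compares these
            "fpr": yp[yt == 0].mean(),    # equal odds compares both TPR and FPR
        }
    return rates


# Hypothetical cohort: group 0 has a 50% base rate, group 1 has a 20% base rate.
group = np.array([0] * 10 + [1] * 10)
y_true = np.array([1] * 5 + [0] * 5 + [1] * 2 + [0] * 8)
y_pred = y_true.copy()  # a perfectly accurate classifier
rates = group_rates(y_true, y_pred, group)
print(rates)
```

Forcing equal selection rates here would require misclassifying some patients in at least one group, which would in turn break equal odds.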
It is also worthwhile to bridge the gap between prediction and fair decision making. Current research usually follows a “predict-then-optimize” pipeline, but innovative approaches can be explored that incorporate the decision error induced by prediction into the objective function of the optimization [103]. These approaches have the potential to achieve fair prediction and optimization simultaneously.
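A toy illustration, in Python with NumPy, of why decision error differs from prediction error in a predict-then-optimize pipeline: two hypothetical predictors have the same mean squared error on the costs of three care options, yet only one of them leads to a suboptimal choice. The regret measured here is the kind of decision error such approaches would fold into the objective; the costs, predictors, and the function name `decision_regret` are our own assumptions.

```python
import numpy as np


def decision_regret(true_costs, predicted_costs):
    """Predict-then-optimize: choose the option with the lowest predicted cost, then
    measure how much worse that choice is than the truly best option."""
    chosen = int(np.argmin(predicted_costs))
    return float(true_costs[chosen] - true_costs.min())


# Hypothetical true costs of three care options and two predictors' estimates.
true_costs = np.array([1.0, 1.2, 3.0])
pred_a = np.array([1.3, 0.9, 3.0])
pred_b = np.array([0.7, 1.5, 3.0])
for name, pred in [("A", pred_a), ("B", pred_b)]:
    mse = np.mean((pred - true_costs) ** 2)
    print(name, "MSE:", round(mse, 4), "regret:", round(decision_regret(true_costs, pred), 4))
```

Predictor A flips the ranking of the two cheapest options and incurs a regret of 0.2, while predictor B, equally accurate in squared error, preserves the ranking and incurs none.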
Given the growing importance of decision-making approaches in healthcare, fairness considerations and bias-mitigation approaches are increasingly vital. Our survey may aid practitioners in 1) understanding potential sources of biases in decision making; 2) choosing the appropriate fairness metric to evaluate decision-making models; and 3) selecting the appropriate pre-processing, in-processing, and post-processing techniques to reduce bias. In conclusion, this survey sheds light on the current state and challenges of fair decision-making in healthcare, highlighting the crucial need for continuous improvement in policies and practices to ensure equitable healthcare outcomes for all individuals.
Data Availability
All data produced are available online at Google Scholar.
Footnotes
Zequn Chen, M.S.; zequn.chen.th{at}dartmouth.edu; (603) 646-3457; 15 Thayer Dr, Hanover, NH 03755; Wesley J. Marrero, Ph.D.; wesley.marrero{at}dartmouth.edu; (603) 646-3457; 15 Thayer Dr, Hanover, NH 03755
We updated the fairness metrics and bias mitigation sections by adding formulas and application examples to both.
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.
- 100.
- 101.
- 102.
- 103.