Generative AI Mitigates Representation Bias and Improves Model Fairness Through Synthetic Health Data

Nicolo Micheletti; Raffaele Marchesi; Nicholas I-Hsien Kuo; Sebastiano Barbieri; Giuseppe Jurman; Venet Osmani

doi:10.1101/2023.09.26.23296163

Abstract

Representation bias in health data can lead to unfair decisions, compromising the generalisability of research findings. As a consequence, underrepresented subpopulations, such as those from specific ethnic backgrounds or genders, do not benefit equally from clinical discoveries. Several approaches have been developed to mitigate representation bias, ranging from simple resampling methods, such as SMOTE, to recent approaches based on generative adversarial networks (GAN). However, generating high-dimensional time-series synthetic health data remains a significant challenge. In this work we propose a novel CA-GAN architecture that synthesises authentic, high-dimensional time-series data. CA-GAN outperforms state-of-the-art methods in a qualitative and a quantitative evaluation while avoiding mode collapse, a serious GAN failure. We evaluate our CA-GAN’s generalisability in mitigating representation bias and improving model fairness for Black patients, as well as female patients. We perform the evaluation using two diverse, real-world clinical datasets, comprising 7535 patients with hypotension and sepsis. Finally, we show that CA-GAN generates authentic data of the minority class while faithfully maintaining the original distribution of data, resulting in improved performance in a downstream predictive task.

1 Introduction

Clinical practice is poised to benefit from developments in machine learning as data-driven digital health technologies transform health care [1]. Digital health can catalyse the World Health Organisation’s (WHO) vision of promoting equitable, affordable, and universal access to health and care [2]. However, as machine learning methods increasingly weave themselves into the societal fabric, critical issues related to fairness and algorithmic bias in decision-making are coming to light. Algorithmic bias can originate from diverse sources, including socio-economic factors, where income disparities between ethno-racial groups are reflected in algorithms deciding which patients need care [3]. Bias can also originate from the underrepresentation of particular demographics, such as ethnicity, gender, and age in the datasets used to develop machine learning models, known as health data poverty [4]. Health data poverty impedes underrepresented subpopulations from benefiting from clinical discoveries, compromising the generalisability of research findings and leading to representation bias that can compound health disparities.

The machine learning community has developed several approaches to mitigate representation bias, with data resampling being the most widely used. Oversampling generates representative synthetic data from the underrepresented subpopulation (minority class), resulting in a similar or equal representation. Synthetic Minority Over-sampling TEchnique (SMOTE) [5] is a representative example of this method, where synthetic samples lie between a randomly selected data sample and its randomly selected neighbour (using k-nearest neighbour algorithm [6]). SMOTE and related methods [7–9] are popular approaches due to their simplicity and computational efficiency.
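The interpolation step that SMOTE relies on can be illustrated with a minimal sketch. The function name `smote_sample`, the toy `minority` points, and the choice of Euclidean distance are assumptions made for illustration; this is not the reference SMOTE implementation.

```python
import math
import random


def smote_sample(minority, k=5, rng=None):
    """Create one synthetic point by interpolating between a randomly chosen
    minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random()
    base = rng.choice(minority)
    # Rank the other minority samples by Euclidean distance to the base sample.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: math.dist(base, p))
    neighbour = rng.choice(others[:k])
    # Pick a random position on the segment between base and neighbour.
    gap = rng.random()
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


# Toy two-dimensional minority class (illustrative values only).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
synthetic = [smote_sample(minority, k=2, rng=random.Random(seed)) for seed in range(100)]
```

Because every synthetic point is a convex combination of two real points, it always lies on a segment between existing samples, which is why this style of oversampling cannot move beyond the region already spanned by the minority class.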

However, SMOTE, when used with high-dimensional time-series data, may decrease data variability and introduce correlation between samples [10–12]. In response, alternative approaches based on Generative Adversarial Networks (GAN) are gaining ground [13–17]. GANs have shown incredible results in generating realistic images [18], text [19], and speech [20] in addition to improving privacy [21]. However, while GANs address some of the issues of SMOTE-based approaches, the generation of high-dimensional time-series data remains a significant research challenge [22–24].

To address this challenge, we propose a new generative architecture called Conditional Augmentation GAN (CA-GAN). Our CA-GAN extends Wasserstein GAN with Gradient Penalty [25, 26], presented in the Health Gym study [27] (referred to in this paper as WGAN-GP*). However, our work has a different objective. Instead of generating new synthetic datasets, we condition our GAN to augment the minority class only, while maintaining correlations between the variables and correlations over time, in contrast to the recent work [28]. As a result, CA-GAN captures the distribution of the overall dataset, including the majority class. We compare the performance of our CA-GAN with WGAN-GP* and SMOTE in generating synthetic data of patients of an under-represented ethnicity (Black patients in our case) as well as gender (female). We use two critical care datasets comprising acute hypotension (n=3343) and sepsis (n=4192), resulting in 7535 patients overall.

Our datasets include both categorical and continuous variables with diverse distributions and are derived from the well-studied MIMIC-III critical care database [29]. While these datasets were chosen as a proof of concept, our architecture is disease agnostic.

Our work makes the following contributions: (1) We propose a new CA-GAN architecture that addresses the shortcomings of traditional and state-of-the-art methods in generating high-dimensional, time-series, synthetic data, using two real-world datasets. (2) Our multi-metric evaluation using qualitative and quantitative methods demonstrates superior performance of CA-GAN with respect to the state-of-the-art architecture, while avoiding mode collapse, a significant GAN failure. (3) We evaluate our CA-GAN against SMOTE, a naive but effective and popular resampling method that is nonetheless limited in the synthesis of authentic data. (4) We show the impact of synthetic data augmentation in improving model performance for the minority (underrepresented) class, resulting in a fairer model between Black and White ethnicities. (5) We also show that CA-GAN can synthesise realistic data of diverse ethnicities and genders to augment the real data, improving the performance in a downstream predictive task.

2 Results

We primarily focus on multi-metric evaluation of synthetic data generated by our CA-GAN architecture in comparison to the data generated by the state-of-the-art WGAN-GP* architecture and the popular SMOTE approach. We provide a separate analysis of the impact of synthetic data generated by our architecture on mitigating representation bias and improving model fairness in Section 2.5. Considering the significant challenges in evaluating generative models in general [30], and high-dimensional time-series data in particular [23], we adopted a holistic approach to evaluating our work based on both qualitative and quantitative methods. We present the results by comparing the performance of the three methods in augmenting the underrepresented (minority) class, namely Black ethnicity (shown in this section) and female gender (shown in Appendix G).

2.1 Qualitative evaluation

To gain initial insights into the obtained results, we conduct a qualitative evaluation employing visual representation methods that show the coverage of synthetic data with respect to the real data. We use Principal Component Analysis (PCA) to project the real and synthetic data onto a two-dimensional space. We also use t-distributed Stochastic Neighbor Embedding (t-SNE) [31] to plot both real and synthetic datasets in a two-dimensional latent space while preserving the local neighbourhood relationships between data points. To compare the performance between the methods and ensure a consistent visualisation of the real data, we have computed a common t-SNE embedding.

In Appendix H we present the results of Uniform Manifold Approximation and Projection (UMAP) [32], which offers better preservation of the global structure of the dataset when compared to t-SNE. The parameters of t-SNE and UMAP are the same for all three methods as shown in Appendix C.

The results are illustrated in Figure 1 (acute hypotension) and Figure 2 (sepsis). Synthetic data generated by CA-GAN exhibits significant overlap with the real data, indicating our model’s ability to accurately capture the underlying structure of real data. This is especially evident in PCA, where the representations reveal that the synthetic data generated by CA-GAN provides the best overall coverage of the real data distribution. Further evidence is provided from the marginal distributions. For the acute hypotension dataset (Figure 1), both WGAN-GP* and SMOTE show evidence of mode collapse (evident also in the t-SNE plots), where synthetic data is generated from a limited space. Similarly, for the sepsis dataset (Figure 2), CA-GAN covers more of the real data distribution compared to SMOTE, while WGAN-GP* again tends towards mode collapse.

Fig. 1:

Two-dimensional representations of the acute hypotension dataset for Black patients, including marginal distributions of the principal components. Top panels: PCA two-dimensional representation of real (red) and synthetic (blue) data, where CA-GAN provides the best overall coverage of real data distribution, while SMOTE and WGAN-GP* show evidence of reduced coverage and mode collapse. Bottom panels: t-SNE two-dimensional representation of real data (red) and synthetic data (blue) for the three methods SMOTE, WGAN-GP*, and CA-GAN. It can be seen that CA-GAN more uniformly covers the real distribution, while SMOTE does not cover a significant part of it (top right in the panel) and WGAN-GP* coverage is almost completely separated from the real data.

Fig. 2:

Two-dimensional representations of the sepsis dataset for Black patients, including marginal distributions of the principal components. Top panels: PCA two-dimensional representation of real (red) and synthetic (blue) data, where CA-GAN provides more coverage than SMOTE (especially in the top right and bottom left part of the panel), while WGAN-GP* provides the lowest coverage. Bottom panels: t-SNE two-dimensional representation of real data (red) and synthetic data (blue) for the three methods SMOTE, WGAN-GP*, and CA-GAN. It can be seen that SMOTE follows an interpolation pattern, while CA-GAN expands into latent space, generating authentic data points while remaining within the clusters identified by t-SNE. Data generated by WGAN-GP* fall outside of the real data.

The bottom panels of Figures 1 and 2 show the t-SNE representations of the real and synthetic data. For the acute hypotension dataset, CA-GAN more uniformly covers the distribution of real data, while SMOTE does not cover a significant part of it (top right in the panel). This is also evident from the marginal distributions. WGAN-GP* coverage is almost completely separated from the real data. For the sepsis dataset, t-SNE shows that SMOTE follows an interpolation pattern, failing to expand into the latent space. In contrast, CA-GAN successfully expands the distribution into the latent space, generating authentic data points while remaining within the clusters identified by t-SNE. Data generated by WGAN-GP* fall outside of the real data.

Figures 1 and 2 provide evidence that the state-of-the-art WGAN-GP* appears to suffer from mode collapse, a significant limitation of GANs [33]. Mode collapse occurs when the generator produces a limited variety of samples despite being trained on a diverse dataset. The generator cannot fully capture the complexity of the target distribution, which limits the variety of generated samples and results in repetitive output. This is because the generator can get stuck in a local minimum where a few outputs are repeatedly generated, even though the training data contains more modes that can be learned. Mode collapse therefore presents a significant challenge in generating high-quality, authentic samples; our CA-GAN model overcomes this limitation.

The evidence that CA-GAN captures accurately the underlying structure of real data is further reinforced based on joint distribution of variables, which we show in Appendix D. In Appendix H we also present the UMAP latent representation of the data, which preserves the global structure in Figures H6 (acute hypotension) and H7 (sepsis).

Finally, we show the distribution of individual variables of synthetic data overlaid on the distribution of the real data. We use this to compare the performance of our method with the state-of-the-art WGAN-GP* as well as SMOTE as the baseline method, using the acute hypotension dataset in Figure 3 and the sepsis dataset in Appendix A. (Joint distributions are shown in Appendix D.) The distribution of synthetic data generated by our CA-GAN exhibits the closest match to that of the real data. This close alignment is particularly evident in variables related to blood pressure, including MAP, diastolic, and systolic measurements. However, certain variables, such as urine and ALT/AST, pose challenges for all three methods. These variables have highly skewed, non-Gaussian distributions with long tails, making them difficult to transform effectively using power or logarithmic transformations. In contrast, our CA-GAN and WGAN-GP* effectively capture the distribution of categorical variables.

Fig. 3:

Distribution plots of each variable, overlaying real and synthetic data for acute hypotension dataset. Distribution of variables related to blood pressure (MAP, diastolic and systolic) is captured well by our method in comparison to WGAN-GP* and SMOTE. CA-GAN performs better also for categorical variables, while all the three methods struggle with variables with long tail, non-normal distributions.

Conversely, SMOTE encounters difficulties with several variables, including both the numeric variable of urine and the categorical variable of the Glasgow Coma Score (GCS). These observations are also reflected in the quantitative evaluation in Section 2.2. The sepsis dataset not only has more than twice as many variables as the acute hypotension dataset, but its variables also have more complex distributions. Variables such as SGOT, SGPT, total bilirubin, maximum dose of vasopressors, and others have extremely long tails. The three methods struggle to generate these kinds of distributions and show a tendency to converge to the median value. In contrast, for categorical variables and for normally distributed numerical variables, the behaviour is similar to that observed for acute hypotension.

2.2 Quantitative evaluation

We used Kullback-Leibler (KL) divergence [34] to measure the similarity between the discrete density function of the real data and that of the synthetic data. For each variable v of the dataset, we calculate:

$$D_{KL}(P_v \,\|\, Q_v) = \sum_{i} P_v(i) \log \frac{P_v(i)}{Q_v(i)}$$

where Q_v is the true distribution of the variable and P_v is the generated distribution. The smaller the divergence, the more similar the distributions; zero indicates identical distributions. The left half of Tables 1a and 1b shows the results of the KL divergence for each variable. Our CA-GAN method has the lowest median across all variables for acute hypotension and sepsis data compared to WGAN-GP* and SMOTE. This is despite the fact that SMOTE is specifically designed to maintain the distribution of the original variables.
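As a worked illustration of the per-variable KL divergence, the sketch below bins one variable into a discrete density and evaluates the usual discrete form, sum of P(i) log(P(i)/Q(i)), with P the synthetic and Q the real density. The bin edges, the `eps` smoothing term, the helper names, and the toy values are assumptions for illustration and are not taken from the study.

```python
import math


def histogram(values, edges):
    """Discrete density over the bins defined by `edges` (last bin is closed)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-10):
    """D_KL(P || Q) for two discrete densities; eps guards against empty bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Toy values for a single variable (illustrative only, not study data).
real = [60, 62, 65, 70, 71, 75, 80, 82]
generated = [61, 64, 66, 69, 72, 76, 79, 85]
edges = [60, 70, 80, 90]

q_real = histogram(real, edges)            # [0.375, 0.375, 0.25]
p_generated = histogram(generated, edges)  # [0.5, 0.375, 0.125]
divergence = kl_divergence(p_generated, q_real)
```

For these toy densities the divergence is about 0.057, and it is exactly zero when a density is compared with itself.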

Table 1: KL-Divergence and Maximum Mean Discrepancy between the distribution of real and synthetic data for each variable of the datasets.

In addition, we used Maximum Mean Discrepancy (MMD) [35] to calculate the distance between the distributions based on kernel embeddings, that is, the distance of the distributions represented as elements of a reproducing kernel Hilbert space (RKHS). We used a Radial Basis Function (RBF) Kernel:

$$K(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^2}{2\sigma^2}\right)$$

with σ = 1. The right half of Tables 1a and 1b shows the MMD results for SMOTE, WGAN-GP* and our CA-GAN. Again, our model has the best median performance across all the variables for acute hypotension data, while for sepsis data, the median performance of SMOTE differs by only 0.00028. In summary, CA-GAN performs best in the acute hypotension dataset by a wide margin while showing comparable performance with SMOTE in the sepsis dataset.
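A minimal sketch of an MMD computation with an RBF kernel is given below. It assumes the common kernel form exp(-||x - x'||² / (2σ²)) and the biased estimator of the squared MMD; the exact estimator used in the study is not stated in the text, so both choices, along with the function names and toy samples, are assumptions.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel between two feature vectors (assumed form, sigma = 1 by default)."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        return sum(rbf_kernel(u, v, sigma) for u in a for v in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


# Toy one-dimensional samples, each wrapped as a vector (illustrative only).
real = [[0.0], [0.5], [1.0]]
near = [[0.1], [0.6], [0.9]]
far = [[3.0], [3.5], [4.0]]

mmd_near = mmd_squared(real, near)
mmd_far = mmd_squared(real, far)
```

A sample drawn close to the real data yields a smaller value than one drawn far from it, and comparing a sample with itself gives zero.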

2.3 Variable correlations

We used the Kendall rank correlation coefficient τ [36] to investigate whether synthetic data maintained the original correlations between variables found in the real data of the acute hypotension and sepsis datasets. This choice is motivated by the fact that the τ coefficient does not assume normally distributed data, and some of our variables, in the sepsis dataset in particular, are not normally distributed (as shown in Figure 3 and Appendix A). Figure 4 shows the results of Kendall’s rank correlation coefficients. For the acute hypotension dataset (Figure 4a), CA-GAN captures the original variable correlations, as does SMOTE, with the former having the closest results on categorical variables and the latter on numerical ones. WGAN-GP* shows the worst performance, accentuating correlations that do not exist in real data. Similar patterns are also obtained for the variables of patients with sepsis in Figure 4b.
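The rank-based nature of Kendall's τ can be seen in a short sketch. The version below is the simple tau-a variant, which ignores tied pairs; statistical libraries usually report the tie-corrected tau-b, and the text does not say which variant was used, so treat this as an assumption. The toy inputs are illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.
    Tied pairs count as neither concordant nor discordant."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


tau_same_order = kendall_tau([1, 2, 3, 4], [10, 20, 30, 40])    # 1.0
tau_reversed = kendall_tau([1, 2, 3, 4], [40, 30, 20, 10])      # -1.0
tau_one_swap = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])          # 4 / 6
```

Only the ordering of values matters, not their magnitudes, which is why the coefficient makes no assumption about the shape of the underlying distributions.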

Fig. 4:

Kendall’s rank correlation coefficients for the real data and the data generated with CA-GAN, WGAN-GP*, and SMOTE.

2.4 Synthetic data authenticity

When generating synthetic data, the output must be a realistic representation of the original data. However, we also need to verify that the model has not merely learned to copy the real data. GANs are prone to overfitting by memorising the real data [37]; therefore, we use Euclidean Distance (L₂ Norm) to evaluate the originality of our model’s output. Our analysis shows that the smallest distance between a synthetic and a real sample is 52.6 for acute hypotension and 44.2 for sepsis, indicating that the generated synthetic data are not a mere copy of the real data. This result, coupled with the visual representation of CA-GAN (shown in Figures 1 and 2), illustrates the ability of our model to generate authentic data. SMOTE, which by design interpolates the original data points, is unable to explore the underlying multidimensional space. Therefore its generated data samples are much closer to the real ones, with a minimum Euclidean distance of 0.0023 for acute hypotension and 0.033 for sepsis. A summary of distance metrics, including median, mean, standard deviation, maximum and minimum, is shown in Appendix E, Tables E3a and E3b.
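The authenticity check amounts to a nearest-neighbour search between the synthetic and real samples. A minimal sketch follows, assuming each patient's time series has been flattened into a single numeric vector; the function name and toy data are illustrative and not taken from the study.

```python
import math


def min_distance_to_real(synthetic, real):
    """Smallest Euclidean (L2) distance between any synthetic and any real sample.
    Each sample is a flat list of numbers (e.g. a flattened patient time series)."""
    return min(math.dist(s, r) for s in synthetic for r in real)


# Toy data (illustrative only).
real = [[0.0, 0.0], [3.0, 4.0]]
synthetic = [[0.0, 0.5], [6.0, 8.0]]
copied = [[3.0, 4.0]]

closest = min_distance_to_real(synthetic, real)     # 0.5
closest_copy = min_distance_to_real(copied, real)   # 0.0
```

A minimum distance of zero would indicate that at least one synthetic sample is an exact copy of a real one; larger values indicate that no synthetic sample coincides with the training data.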

2.5 Improving model fairness

Having evaluated the quality of synthetic data generated by our CA-GAN, we now focus on evaluating the potential of synthetic data in improving model fairness. Machine learning literature has highlighted many examples where datasets with class imbalance lead to unfair decisions for the minority class [38]. This is especially important when those decisions significantly affect the health of patients and the society at large, as is the case with our study. We measure fairness by comparing model performance across subgroups, based on the approach described in [39]. In this respect we have carried out an analysis to understand the performance of a predictive model within Black patients and White patients separately and the impact of synthetic data. For this purpose we chose the task of predicting lactate within each ethnic subgroup, based on our previous work [40]. Lactate is essential in guiding clinical decision making for patients with sepsis and ultimately affects patients’ survival [41, 42]. Therefore, any differences between Black and White patients in lactate prediction performance would result in potentially unfair treatment decisions due to data representation bias. Based on our work in [40] we trained an LSTM classifier to predict whether the outcome of the last lactate value in the time series of the patients was above a critical threshold, using as input the previous observations of the clinical variables in our dataset. The model was validated on real data only, using a stratified 3-fold cross validation with 10 repetitions. Initially we used only the real (original) sepsis dataset containing around 11% of Black patients. The predictive performance of an LSTM model within the Black patient cohort was AUC of 0.65. This is in contrast to AUC of 0.70 within the White patient cohort. For comparison the overall performance of the model when including both ethnicities was 0.70. 
Then we augmented the original Black patient cohort with synthetic data (conditioned on Black patients only) generated by our CA-GAN method, such that the representation between Black and White patients was equal. As a consequence, the predictive performance within Black patient cohort increased to AUC of 0.69, while maintaining the AUC of 0.70 for the White patients cohort and the full dataset with both ethnicities. Synthetic data augmentation resulted in a fairer model between ethnicities, with a statistically significant difference between non-augmented and augmented datasets.
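The subgroup comparison described above can be sketched as computing the AUC separately within each cohort and inspecting the gap. The labels, scores, and group assignments below are invented toy values, not results from the study, and the pairwise AUC formula is a standard equivalent of the area under the ROC curve rather than the study's own code.

```python
def auc(labels, scores):
    """AUC as the probability that a positive case is scored above a negative
    one, counting ties as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subgroup_auc(labels, scores, groups):
    """AUC computed separately within each subgroup."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        result[g] = auc([labels[i] for i in idx], [scores[i] for i in idx])
    return result


# Toy predictions (illustrative only).
groups = ["White"] * 4 + ["Black"] * 4
labels = [1, 1, 0, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.2, 0.9, 0.4, 0.6, 0.1]

per_group = subgroup_auc(labels, scores, groups)
gap = per_group["White"] - per_group["Black"]
```

In this toy case the model separates the classes perfectly in one cohort (AUC 1.0) but not in the other (AUC 0.75); a gap of this kind is what augmentation of the minority cohort aims to narrow.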

2.6 Downstream regression task

Finally, we also sought to evaluate the ability of CA-GAN to maintain the temporal properties of time series data, considering ethnicity (shown here) and gender (shown in Appendix G). Since our objective is to augment the minority class to mitigate representation bias, we wanted to verify that the datasets augmented with synthetic data generated by our model can maintain or improve the predictive performance of the original data on a downstream task. Initially, we trained only a Bidirectional Long Short-Term Memory (BiLSTM) with real data as the baseline. Later, we trained the BiLSTM with synthetic and augmented datasets separately, considering ethnicity and gender diversity. They were used to evaluate the performance in a regression task. To address the inherent randomness in the models, we used CA-GAN to create five different synthetic datasets. For each of these datasets, we trained five BiLSTM models, resulting in a total of 25 BiLSTM models (5 datasets × 5 models per dataset). A complete description of how we performed these experiments is provided in Appendix F.
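The aggregation over the 25 models can be sketched as follows. The error values are placeholders rather than results from the study, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def mean_absolute_error(actual, predicted):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def summarise_runs(errors_by_dataset):
    """Pool the per-model errors of every synthetic dataset and report mean and SD."""
    pooled = [e for errors in errors_by_dataset for e in errors]
    return statistics.mean(pooled), statistics.stdev(pooled)


example_mae = mean_absolute_error([1.0, 2.0, 3.0], [1.0, 3.0, 5.0])  # 1.0

# 5 synthetic datasets x 5 models each (placeholder errors, not study results).
errors_by_dataset = [[1.0, 2.0, 3.0, 4.0, 5.0] for _ in range(5)]
mean_error, sd_error = summarise_runs(errors_by_dataset)
```

Pooling across both sources of randomness (dataset generation and model training) gives a single mean and spread per variable, which is the form in which such results are typically tabulated.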

Table 2a and Table 2b show the mean absolute errors between the BiLSTM prediction and the actual acute hypotension and sepsis observations, respectively. In the first column, we show the results achieved using only the real data to make the predictions; in the second column, the results using only the synthetic data; and finally, in the third column, the results achieved by predicting with the augmented dataset, that is, with both the real and synthetic data together, considering both ethnic and gender diversity in the augmentation process. Overall, adding the synthetic data reduces the predictive error. This indicates that the temporal characteristics of the data generated by our CA-GAN model are close enough to those of the real data to maintain the original predictive performance. Thus, the augmented dataset could be used in a downstream task, mitigating the representation bias. Similar findings are also shown for the gender-augmented dataset in Appendix G.

Table 2: Mean prediction errors of a BiLSTM trained on real, synthetic, and augmented data for a downstream prediction task. The numbers in parentheses represent the standard deviation

It should be noted that the errors in Input Fluids Total and Total Urine Output are exceptionally high compared to the other variables. This is because predicting these variables is generally challenging, stemming partly from how they are collected and recorded rather than an issue inherent to synthetic data generation.

3 Discussion

As machine intelligence scales upwards in clinical decision-making, the risk of perpetuating existing health inequities increases significantly. This is because biased decision-making can continuously feed back into the data used to train the models, creating a vicious circle that further ingrains discrimination towards underrepresented groups. Representation bias, in particular, frequently occurs in health data, leading to decisions that may not be in the best interests of all patients, favouring specific subpopulations while treating underrepresented subpopulations, such as those with standard set characteristics including ethnicity, gender, and disability, unfavourably.

To address these issues, representation must be improved before algorithmic decision-making becomes integral to clinical practice. While unequal representation is a multifaceted challenge involving diverse factors such as socio-economic, cultural, systemic, and data, our work represents a step towards addressing one significant facet of this challenge: mitigating existing representation bias in health data.

We have shown that our work can generate high-quality synthetic data when evaluated against state-of-the-art architectures and traditional approaches such as SMOTE. SMOTE has notable advantages over other data generation techniques as it requires no training and can work with smaller datasets. It can mirror non-normal distributions even if it tends to overestimate the median in long-tail distributions. This is in contrast to GANs, which struggle with these types of distributions. However, generating authentic data remains a significant challenge for SMOTE, especially important when considering confidentiality of data and patients’ privacy.

Through qualitative and quantitative evaluation, we have shown that CA-GAN can generate authentic data samples with high distribution coverage, avoiding mode collapse failure, while ensuring that the generated data are not copies of the real data. We have also shown that augmenting the dataset with the synthetic data generated by CA-GAN leads to lower errors in the downstream regression task. This indicates that our model can generalise well from the original data.

A notable advantage of our approach is that it uses the overall dataset, and not only the minority class, as is the case with WGAN-GP* and SMOTE. This means that CA-GAN can be applied in smaller datasets and those with highly imbalanced classes, such as rare diseases.

Furthermore, we evaluated our method on two datasets with diverse characteristics and found that our CA-GAN performed better on the acute hypotension dataset. This may be because some of the numerical variables in the sepsis dataset have long-tailed distributions, presenting a modelling challenge for all the methods. Similar challenges are observed for variables with non-normal distributions. Additionally, the sepsis dataset contained fewer data points per patient than the acute hypotension dataset (15 versus 48 observations). A shorter input sequence, coupled with a higher number of variables (twice as many) in the sepsis dataset, may have made it difficult for the BiLSTM modules to learn the underlying structure of the original data effectively. We also note that the lower number of patients in the acute hypotension dataset does not impact the generative performance of our method. This ability to work with fewer data points (patients) is encouraging, given our overall objective of augmenting representation.

The task of generalising to unseen data categories is particularly challenging due to the inherent unknowns these categories represent. While a definitive strategy is yet to be developed, we believe that integrating conditional generation with metric learning, as seen in prototypical networks [43], could provide a dual advantage. This integration could not only facilitate the generation of novel data points but also offer a quantifiable framework to assess their similarity or dissimilarity to known data categories. Such an approach could extend the capabilities of CA-GAN beyond data augmentation, potentially improving the interpretability and applicability of the synthetic data generated.

In terms of evaluation metrics, our current approach primarily focuses on statistical properties of the generated data. However, to ascertain the practical utility and accuracy of the synthetic data produced by CA-GAN, we recognise the importance of domain-specific validations, including performance in subpopulations [44, 45]. Inspired by the collaborative efforts outlined in [46], we are committed to exploring partnerships with experts in relevant fields such as healthcare and clinical practice to guide the development of evaluation metrics. Their input will ensure that our synthetic datasets can meet the rigorous demands of real-world applications and contribute meaningfully to the domains they are intended to serve.

Our architecture can provide a solid basis to generate privacy-preserving synthetic data and mitigate barriers to accessing clinical data. This is because we have ensured that the synthetic data generated by our CA-GAN are not a mere copy of the real data, while still reflecting the distributions of the real data. However, privacy-preserving aspects will require additional analysis, such as reconstruction attacks, which are beyond the scope of the current work.

While the CA-GAN architecture showed superior performance with respect to the state-of-the-art method as well as computationally inexpensive approaches, some limitations are present. Namely, CA-GAN may require additional optimisations to further increase performance on datasets with variables with non-Gaussian distributions and those with long-tailed distributions. Furthermore, additional analysis will be required to evaluate the generalisation capability of our architecture with datasets of different characteristics. In this respect we aim to refine CA-GAN while exploring alternative architectures (such as Diffusion Models) in addressing some of these limitations. One promising direction might be to use convolutional neural networks (CNNs) or Temporal Convolutional Networks (TCNs) to potentially improve the efficiency of our model. The work of Bai et al. [47] provides an empirical foundation for this approach, indicating that such network structures can rival the performance of recurrent networks for sequence modelling tasks. Additionally, prior research [48] suggests that simplifying the internal mechanisms of these recurrent units can lead to improvements in both performance and computational efficiency. These insights provide a strong impetus for our future work, where we aim to refine the CA-GAN model to harness the benefits of these alternative architectures without compromising its ability to perform conditional generation.

Finally, we are aware that the use of synthetic data may generate several ethical and policy implications including the fact that synthetic data cannot fully address the historical biases and discriminatory practices which are often reflected in the data [49]. While our work can mitigate existing representation biases, we must ensure that this does not come at the risk of disincentivising participation of underrepresented groups or perpetuating other types of data biases [50, 51]. Finally, while we showed the utility of synthetic data, we also note that the findings should always be confirmed using real data.

4 Methods

We begin by formally formulating the problem we are addressing. Then we discuss the data sources we used to train our models and compare and contrast Generative Adversarial Networks (GANs) and Conditional Generative Adversarial Networks (CGANs). We also provide an in-depth analysis of the baseline model for this work, WGAN-GP*. Finally, we present the architecture of our proposed Conditional Augmentation GAN (CA-GAN) and discuss its advantages over other methods.

4.1 Problem Formulation

Let A be a vector space of features and let a ∈ A represent a feature vector. Let L = {0, 1} be a binary distribution modifier, and let l be a binary mask extracted from L. We consider a data set D₀ with l = 0, where individual samples are indexed by n ∈ {1, …, N}, and a data set D₁ with l = 1, where individual samples are indexed by m ∈ {N + 1, …, N + M}, with N > M. We define the training data set D as D = D₀ ∪ D₁.

Our goal is to learn a density function that approximates the true distribution d{A} of D. We also define as with l = 1 applied.

To balance the number of samples in D, we draw random variables X from and add them to D₁ until N = M. Thus, we balance out D.
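The balancing step can be sketched in a few lines of Python. This is a minimal illustration only: `balance_with_synthetic` and `toy_sampler` are hypothetical names, and the sampler is a stand-in for drawing from the learned minority-class density, not the CA-GAN generator itself.

```python
import random


def balance_with_synthetic(d0, d1, sample_minority):
    """Extend the minority set d1 with synthetic draws until it matches len(d0).

    sample_minority is any zero-argument callable returning one synthetic
    minority-class sample (an assumed stand-in for the trained generator).
    """
    if len(d1) > len(d0):
        raise ValueError("d1 is expected to be the minority set (N > M)")
    augmented = list(d1)
    while len(augmented) < len(d0):
        augmented.append(sample_minority())
    return augmented


rng = random.Random(0)
d0 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(10)]  # majority, N = 10
d1 = [[rng.gauss(1.0, 1.0) for _ in range(3)] for _ in range(4)]   # minority, M = 4


def toy_sampler():
    # Placeholder for a draw from the learned minority-class density.
    return [rng.gauss(1.0, 1.0) for _ in range(3)]


balanced_d1 = balance_with_synthetic(d0, d1, toy_sampler)
print(len(d0), len(balanced_d1))
```

The real minority samples are kept unchanged at the front of the list, so the synthetic portion can be separated again if needed.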

4.2 Data sources, variables and patient population

Our analysis uses two datasets extracted from the MIMIC-III database. The detailed data pre-processing steps are outlined in our previous publication [27], in Section 1.2 and Section 1.3 of the supplementary material. We chose these two datasets as they have already been used in the study describing WGAN-GP* [27] making the comparison with our method fairer. We decided to test the methods for the oversampling of only one minority class, thus including only patients that belonged to the White (coded as Caucasian) or Black ethnic groups. We used a similar approach for gender, shown in Appendix G.

The acute hypotension dataset comprises 3343 patients admitted to critical care; the patients were either of Black (395) or White (2948) ethnicity. Each patient is represented by 48 data points, corresponding to the first 48 hours after the admission, and 20 variables, namely nine numeric, four categorical, and seven binary variables. Details of this dataset are presented in Appendix B, Table B1.

The Sepsis dataset comprises 4192 patients admitted to critical care of either Black (461) or White (3731) ethnicity. Each patient is represented by 15 data points, corresponding to observations taken every four hours from admission, and 44 variables, namely 35 numeric, six categorical, and three binary variables. Details of this dataset are presented in Appendix B, Table B2.
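As a quick check on the figures above, the following sketch computes the cohort sizes, the per-dataset array shapes, and the number of synthetic minority samples that would be needed to equalise the two classes. The dictionary layout and the helper name are illustrative, and the source does not state that exactly this many samples were generated.

```python
# Class counts and dimensions as reported for the two MIMIC-III cohorts.
DATASETS = {
    "acute_hypotension": {"black": 395, "white": 2948, "time_steps": 48, "variables": 20},
    "sepsis": {"black": 461, "white": 3731, "time_steps": 15, "variables": 44},
}


def synthetic_needed(counts):
    """Number of synthetic minority samples required to match the majority class."""
    return counts["white"] - counts["black"]


summary = {}
for name, counts in DATASETS.items():
    total = counts["black"] + counts["white"]
    summary[name] = {
        "patients": total,
        "shape": (total, counts["time_steps"], counts["variables"]),
        "minority_share": counts["black"] / total,
        "synthetic_needed": synthetic_needed(counts),
    }

for name, row in summary.items():
    print(name, row)
```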

4.3 GAN vs CGAN

The Generative Adversarial Network (GAN) [52] entails two components: a generator and a discriminator. The generator G is fed a noise vector z taken from a latent distribution p_z and outputs a sample of synthetic data. The discriminator D receives either fake samples created by the generator or real samples x taken from the true data distribution p_data. Hence, the GAN can be represented by the following minimax loss function:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

The goal of the discriminator is to maximise the probability of discerning fake from real data, whilst the purpose of the generator is to make samples realistic enough to fool the discriminator, i.e., to minimise log(1 − D(G(z))). As a result of the reciprocal competition, both the generator and discriminator improve during training.
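To make the competition concrete, the sketch below evaluates the standard GAN value function of [52], E[log D(x)] + E[log(1 − D(G(z)))], from lists of discriminator outputs. The function name and the toy probabilities are assumptions chosen for illustration; no network is trained here.

```python
import math


def gan_value(d_real, d_fake):
    """Monte Carlo estimate of the GAN value function V(D, G).

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from fake well yields a value close to 0.
confident = gan_value(d_real=[0.90, 0.95], d_fake=[0.10, 0.05])

# A fully fooled discriminator outputs 0.5 everywhere, giving 2 * log(0.5).
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

print(confident, fooled)
```

The discriminator pushes the value up towards 0, while the generator pushes it down by making D(G(z)) large.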

The limitations of vanilla GAN models become evident when working with highly imbalanced datasets, where there might not be sufficient samples to train the models to generate minority-class samples. A modified version of GAN, the Conditional GAN [53], addresses this problem by using labels y in both the generator and the discriminator. The additional information y splits generation and discrimination by class. Hence, the model can be trained on the whole dataset and then used to generate only minority-class samples. Thus, the loss function is modified as follows:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z \mid y)))]$$

Overall, GAN and CGAN share the same significant weaknesses during training, namely mode collapse and vanishing gradients [33]. In addition, as GANs were initially designed to generate images, they have been shown to be unsuitable for generating time series [54] and discrete data samples [55].

4.4 WGAN-GP*

The WGAN-GP* introduced by Kuo et al. [27] addressed many of the limitations of vanilla GANs. The model was a modified version of WGAN-GP [25, 26]; thus, it applied the Earth Mover (EM) distance [56] to the distributions, which had been shown to mitigate both vanishing gradients and mode collapse [57]. In addition, the model applied the Gradient Penalty during training, which helped to enforce the Lipschitz constraint on the discriminator efficiently. In contrast with vanilla WGAN-GP, WGAN-GP* employed soft embeddings [58, 59], which allowed the model to represent binary and categorical variables as numeric vectors, and a Bidirectional LSTM layer [60, 61], which allowed for the generation of time-series samples. While L_D was kept the same, L_G was modified by Kuo et al. [27] by introducing an alignment loss, which helped the model to better capture correlations among variables over time. Hence, the loss functions of WGAN-GP* are the following:

$$L_D = \mathbb{E}_{z \sim p_z}[D(G(z))] - \mathbb{E}_{x \sim p_{data}}[D(x)] + \lambda_{GP}\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right]$$

$$L_G = -\mathbb{E}_{z \sim p_z}[D(G(z))] + \lambda_{corr} \sum_{i < j} \left\lVert r_{syn}^{(i,j)} - r_{real}^{(i,j)} \right\rVert_{L_1}$$

Here x̂ denotes points interpolated between real and generated samples, and λ_GP weights the gradient penalty. To calculate the alignment loss, we computed Pearson’s r correlation [62] for every unique pair of variables X⁽ⁱ⁾ and X⁽ʲ⁾. We then applied the L₁ loss to the differences between the correlations r_syn and r_real, with λ_corr representing a constant that regulates the strength of the loss.
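The alignment loss described above can be sketched in plain Python. This is an illustrative sketch, not the implementation of [27]: each variable is assumed to be a flat list of values, `lambda_corr` defaults to an arbitrary placeholder of 1.0, and the helper names are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def alignment_loss(real, syn, lambda_corr=1.0):
    """L1 distance between pairwise correlations of real and synthetic variables.

    real, syn: lists of variables, each variable being a list of values.
    """
    total = 0.0
    for i, j in combinations(range(len(real)), 2):  # every unique pair i < j
        total += abs(pearson_r(syn[i], syn[j]) - pearson_r(real[i], real[j]))
    return lambda_corr * total


real = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
syn_same = [list(col) for col in real]
# Third variable reversed: its correlation with the other two flips from -1 to +1.
syn_flipped = [real[0], real[1], [1.0, 2.0, 3.0, 4.0]]

loss_same = alignment_loss(real, syn_same)
loss_flipped = alignment_loss(real, syn_flipped)
print(loss_same, loss_flipped)
```

Synthetic data that reproduce the real correlation structure incur no penalty, while each pair whose correlation flips sign contributes up to 2 to the sum.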

In their follow-up papers, Kuo et al. noted that the simulated data produced by their proposed WGAN-GP* lacked diversity. In [63], the authors found that WGAN-GP* continued to suffer from mode collapse, like the vanilla GAN. Similar to our own CA-GAN, the authors extended the WGAN-GP setup with a conditional element, where they externally stored features of the real data during training and replayed them to the generator sub-network at test time. In [64], the same authors also experimented with diffusion models [65] and found that diffusion models better represent binary and categorical variables. Nonetheless, they demonstrated that GAN-based models encoded less bias (in the means and variances) in the numeric variable distributions.

4.5 CA-GAN

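We built our CA-GAN by conditioning the generator and the discriminator on static labels y. Hence, the updated loss functions used by our model are as follows:

$$L_D = \mathbb{E}_{z \sim p_z}[D(G(z \mid y))] - \mathbb{E}_{x \sim p_{data}}[D(x \mid y)] + \lambda_{GP}\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x} \mid y) \rVert_2 - 1\right)^2\right]$$

$$L_G = -\mathbb{E}_{z \sim p_z}[D(G(z \mid y))] + \lambda_{corr} \sum_{i < j} \left\lVert r_{syn}^{(i,j)} - r_{real}^{(i,j)} \right\rVert_{L_1}$$

where y can be any categorical label, and the remaining terms are the gradient penalty and the alignment loss of WGAN-GP*. During training, the label y was used to differentiate the minority from the majority class; during generation, it was used to create fake samples of the minority class.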
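One common way to condition a generator on a static label is to append a one-hot encoding of y to the noise vector. The sketch below illustrates that idea only; the source does not specify how CA-GAN injects y, and the names `conditioned_input`, `sample_noise`, `MAJORITY` and `MINORITY` are hypothetical.

```python
import random

MAJORITY, MINORITY = 0, 1  # illustrative encoding of the binary label y


def conditioned_input(vector, label):
    """Append a one-hot encoding of the class label to an input vector."""
    one_hot = [1.0 if label == k else 0.0 for k in (MAJORITY, MINORITY)]
    return list(vector) + one_hot


rng = random.Random(0)


def sample_noise(dim=4):
    # Latent noise z drawn from a standard normal distribution.
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


# Training: labels come from the data, so both classes are seen.
training_inputs = [conditioned_input(sample_noise(), y) for y in (MAJORITY, MINORITY, MAJORITY)]

# Generation: the label is fixed to the minority class.
generator_inputs = [conditioned_input(sample_noise(), MINORITY) for _ in range(3)]
print(generator_inputs[0])
```

Fixing the label at generation time is what lets a model trained on the whole dataset produce only minority-class samples.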

Compared to WGAN-GP*, we also increased the number of BiLSTM layers from one to three in both the generator and the discriminator, as stacked BiLSTMs have been shown to capture complex time series better [66]. In addition, we decreased the learning rate and batch size during training. An overview of the CA-GAN architecture is shown in Figure 5.

Fig. 5: Proposed architecture of our CA-GAN.

Data Availability

The data underlying this article are freely available in the MIMIC-III repository: https://mimic.physionet.org/

Declarations

4.6 Funding

None

4.7 Competing interests

None

4.8 Ethics approval

The data in MIMIC-III was previously de-identified, and the institutional review boards of the Massachusetts Institute of Technology (No. 0403000206) and Beth Israel Deaconess Medical Center (2001-P-001699/14) both approved the use of the database for research.

4.9 Availability of data and materials

The data underlying this article are publicly available in the MIMIC-III repository (https://mimic.physionet.org/).

4.10 Code availability

The source code will be made available upon publication in the GitHub repository: https://github.com/nic-olo/CA-GAN

Appendix A Distribution Plots for Sepsis

Fig. A1: Overlaid distribution plots of real data and CA-GAN synthetic data for each variable in the sepsis dataset.

Fig. A2: Overlaid distribution plots of real data and WGAN-GP* synthetic data for each variable in the sepsis dataset.

Fig. A3: Overlaid distribution plots of real data and SMOTE synthetic data for each variable in the sepsis dataset.

Appendix B Datasets

Table B1: Variables in the acute hypotension dataset. For each variable, the data type, the unit in which it is expressed, and the distribution statistics are presented.

B.1 Data preprocessing

A detailed description of the data preprocessing steps is available in Section 1 of the supplementary material of our previous publication [27] ¹. For completeness, we highlight the main steps here. For the acute hypotension dataset, we included adult patients (aged 18 or over) in the MIMIC-III dataset with at least 24 hours of data, aggregating 48 hours of clinical variables from patients with seven or more mean arterial pressure (MAP) values of 65 mmHg or less, indicating acute hypotension. Missing values were replaced with the last available value, while an indicator variable denoted whether a value was measured or not. For the sepsis dataset, we included only adult patients with a suspected infection, based on a history of antibiotic administration. We included all the variables from at least 44 hours before the suspected infection and up to 28 hours after. Missing data were imputed using the nearest neighbour method. More detailed information is available at https://static-content.springer.com/esm/art%3A10.1038%2Fs41597-022-01784-7/MediaObjects/41597_2022_1784_MOESM4_ESM.pdf
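The last-value imputation with a measurement indicator used for the acute hypotension dataset can be sketched as follows. This is a minimal illustration with a hypothetical function name and toy values; missing entries are represented as `None`, and values missing before the first measurement are left unfilled, which is an assumption rather than a documented choice.

```python
def forward_fill_with_indicator(series):
    """Carry the last observed value forward and flag which entries were measured.

    Returns (filled, measured), where measured[t] is 1 if series[t] was
    observed and 0 if it was imputed (or still missing at the start).
    """
    filled, measured = [], []
    last = None
    for value in series:
        if value is None:
            filled.append(last)   # stays None until a first measurement exists
            measured.append(0)
        else:
            last = value
            filled.append(value)
            measured.append(1)
    return filled, measured


map_values = [70.0, None, None, 62.0, None]  # toy hourly MAP readings in mmHg
filled, measured = forward_fill_with_indicator(map_values)
print(filled)
print(measured)
```

Keeping the indicator alongside the filled series lets a downstream model distinguish measured values from carried-forward ones.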

Table B2: Variables in the sepsis dataset. For each variable, the data type, the unit in which it is expressed, and the distribution statistics are presented.

Appendix C UMAP and t-SNE parameters

In this study, we used t-SNE and UMAP algorithms to perform dimensionality reduction on our datasets and highlight the differences between the results of the three methods under analysis. The following parameters were used for each algorithm:

t-SNE:

Library: scikit-learn version 1.2.2

Parameters for sepsis: n_components = 2, n_iter = 500, learning_rate = 100, perplexity = 50

Parameters for acute hypotension: n_components = 2, n_iter = 100, learning_rate = 1000, perplexity = 30

UMAP:

Library: umap-learn version 0.5.3

Parameters: n_neighbors = 5, spread = 5, min_dist = 0.5

Appendix D Joint distributions of variables

We have carried out an analysis of a set of variables that are clinically known to follow a joint distribution, namely systolic, diastolic and mean arterial blood pressure. This was to investigate whether CA-GAN can capture joint distributions of variables and whether synthetic data are clinically meaningful, where we know that systolic blood pressure is always higher than diastolic blood pressure. Using a scatter plot in Figure D4 we show that the joint distribution of real data is similar to that of synthetic data.

Following from this, we have also implemented several sanity checks based on clinical knowledge to ensure that the generated synthetic data are clinically meaningful. In this respect, we checked whether systolic BP values are always higher than diastolic values. In the real sepsis dataset, 99.94% of values of these variables were correct; that is, the systolic value was higher than the diastolic value. In the synthetic sepsis dataset, this figure was 99.93%, a difference of only 0.01 percentage points between the real and the synthetic dataset. For the hypotension dataset, 99.86% of systolic and diastolic values were correct in the real data versus 99.84% in the synthetic dataset, a difference of 0.02 percentage points. This analysis suggests that our architecture captures the structure of the real data well.
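A check of this kind reduces to counting the share of observations in which the systolic value exceeds the diastolic value. The sketch below uses a hypothetical function name and toy readings; it is not the authors' code.

```python
def fraction_plausible(systolic, diastolic):
    """Share of paired readings where systolic BP is strictly above diastolic BP."""
    pairs = list(zip(systolic, diastolic))
    plausible = sum(1 for sbp, dbp in pairs if sbp > dbp)
    return plausible / len(pairs)


# Toy readings in mmHg; the last pair violates the clinical constraint.
sbp = [120.0, 95.0, 110.0, 60.0]
dbp = [80.0, 60.0, 70.0, 65.0]

share = fraction_plausible(sbp, dbp)
print(f"{100 * share:.2f}% of readings are plausible")
```

Running the same function on the real and the synthetic data and comparing the two shares gives the kind of difference reported above.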

Fig. D4: Joint distribution plot of real sepsis data and CA-GAN synthetic data for the variables Systolic blood pressure, Diastolic blood pressure and Mean arterial pressure.

Appendix E Summary of distance metrics

Table E3: Statistics for KL-Divergence and Maximum Mean Discrepancy between the distribution of real and synthetic data.

Appendix F Description of Downstream Regression Task

For this task, the BiLSTM is trained on the first 20 hours (hypotension) or 10 hours (sepsis) of the patients’ values to predict the next hour, using a sliding window approach. To ensure the fairness of our results, 15% of the time-series data points of Black patients, with a proportional representation of genders, were set apart as a test set. To account for the stochasticity of the models, we generated five synthetic datasets with CA-GAN. Then, we trained five BiLSTM models on each dataset, for a total of 25 BiLSTM models (5 datasets × 5 models per dataset). The predictions of the trained models were then compared with the test dataset, and the resulting error was averaged first across the five models and then across the five datasets.
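One plausible reading of the sliding window approach is that each training example pairs a fixed-length window of consecutive time steps with the value at the next step. The sketch below follows that reading for an hourly series; the function name and the toy series are assumptions, and the exact windowing used in the study is not specified here.

```python
def sliding_windows(series, window):
    """Build (inputs, target) pairs: `window` consecutive steps predict the next step."""
    return [
        (series[t:t + window], series[t + window])
        for t in range(len(series) - window)
    ]


# Toy hourly series with 48 time steps, as in the acute hypotension dataset.
hourly_values = [float(t) for t in range(48)]
pairs = sliding_windows(hourly_values, window=20)

print(len(pairs))        # number of training examples from one patient
print(pairs[0][1])       # target of the first window
```

With 48 time steps and a window of 20, each patient contributes 28 input-target pairs under this reading.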

Fig. F5: Diagram of BiLSTM Model Training for Continuous Variable Prediction.

Appendix G Downstream regression task on gender-conditioned data

We have devised an additional dataset of patient subpopulations with gender as the underrepresented category. Specifically, we have created two datasets: one with the original data (Real) and the other (the Real surrogate) from which we have removed 80% of the data from female patients. We have replaced the removed portion of the dataset with the synthetic data generated from our architecture. Finally, we used a downstream regression task in the same manner as we used it with the ethnicity data to evaluate the performance of our approach.
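The construction of the Real surrogate and its synthetic replacement can be sketched as follows. This is an illustration under stated assumptions: patients are dictionaries with a `gender` field coded `"F"` or `"M"`, the removed patients are chosen at random, and `build_surrogate` is a hypothetical helper rather than the authors' code.

```python
import random


def build_surrogate(patients, synthetic_female, removal_share=0.8, seed=0):
    """Drop a share of female patients, then refill with synthetic female records.

    Returns (surrogate, augmented): the reduced real dataset and the same
    dataset with the removed portion replaced by synthetic records.
    """
    rng = random.Random(seed)
    female = [p for p in patients if p["gender"] == "F"]
    others = [p for p in patients if p["gender"] != "F"]
    n_remove = round(removal_share * len(female))
    kept_female = rng.sample(female, len(female) - n_remove)
    surrogate = others + kept_female
    augmented = surrogate + synthetic_female[:n_remove]
    return surrogate, augmented


patients = [{"id": i, "gender": "F"} for i in range(10)]
patients += [{"id": i, "gender": "M"} for i in range(10, 20)]
synthetic_female = [{"id": f"syn-{i}", "gender": "F"} for i in range(8)]

surrogate, augmented = build_surrogate(patients, synthetic_female)
print(len(surrogate), len(augmented))
```

The augmented dataset has the same size and gender balance as the original, which is what makes the downstream comparison between the two meaningful.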

Our results show that the performance of both algorithms is comparable: using the original dataset, we obtain a Mean Absolute Error (MAE) of 4.784 (±0.393), while we obtain an MAE of 3.193 (±0.052) using the synthetic dataset, indicating that CA-GAN generates faithful synthetic data. Notably, combining synthetic data with the real data further reduced the MAE to 2.870 (±0.220), which is lower than that of the real data alone (MAE of 3.294 (±0.241)).

Table G4: Results of downstream regression task on gender-conditioned data, based on Mean Absolute Error (MAE), with standard deviation shown in brackets. Real surrogate represents the real dataset from which we have removed 80% of the data from female patients.

Appendix H UMAP Plots

Fig. H6: UMAP two-dimensional representations of the acute hypotension dataset for Black patients.

Fig. H7: UMAP two-dimensional representations of the sepsis dataset for Black patients.

Footnotes

Introduced the concept of fairness, updated figures
¹ https://static-content.springer.com/esm/art%3A10.1038%2Fs41597-022-01784-7/MediaObjects/41597_2022_1784_MOESM4_ESM.pdf

References

[1] Topol, E.J.: High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25(1), 44–56 (2019). doi:10.1038/s41591-018-0300-7
[2] World Health Organization (ed.): Global Strategy on Digital Health 2020-2025. World Health Organization, Genève, Switzerland (2021)
[3] Obermeyer, Z., Powers, B., Vogeli, C., Mullainathan, S.: Dissecting racial bias in an algorithm used to manage the health of populations. Science 366(6464), 447–453 (2019). doi:10.1126/science.aax2342
[4] Ibrahim, H., Liu, X., Zariffa, N., Morris, A.D., Denniston, A.K.: Health data poverty: an assailable barrier to equitable digital health care. The Lancet Digital Health 3(4), 260–265 (2021). doi:10.1016/S2589-7500(20)30317-4
[5] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16, 321–357 (2002). doi:10.1613/jair.953
[6] Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13(1), 21–27 (1967). doi:10.1109/TIT.1967.1053964
[7] Han, H., Wang, W.-Y., Mao, B.-H.: Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In: Lecture Notes in Computer Science, pp. 878–887. Springer, Berlin, Heidelberg (2005). doi:10.1007/11538059_91
[8] Gosain, A., Sardana, S.: Farthest SMOTE: A modified SMOTE approach. In: Advances in Intelligent Systems and Computing, pp. 309–320. Springer, Singapore (2019). doi:10.1007/978-981-10-8055-5_28
[9] Nguyen, H.M., Cooper, E.W., Kamei, K.: Borderline over-sampling for imbalanced data classification. Int. J. Knowl. Eng. Soft Data Paradig. 3(1), 4 (2011). doi:10.1504/IJKESDP.2011.039875
[10] Blagus, R., Lusa, L.: SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics 14(1), 106 (2013). doi:10.1186/1471-2105-14-106
[11] Fernandez, A., Garcia, S., Herrera, F., Chawla, N.V.: SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary. Journal of Artificial Intelligence Research 61, 863–905 (2018). doi:10.1613/jair.1.11192
[12] He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering 21(9), 1263–1284 (2009). doi:10.1109/TKDE.2008.239
[13] Lu, C., Reddy, C.K., Wang, P., Nie, D., Ning, Y.: Multi-Label Clinical Time-Series Generation via Conditional GAN. arXiv (2022). doi:10.48550/ARXIV.2204.04797. https://arxiv.org/abs/2204.04797
[14] Engelmann, J., Lessmann, S.: Conditional wasserstein gan-based over-sampling of tabular data for imbalanced learning. Expert Systems with Applications 174, 114582 (2021). doi:10.1016/j.eswa.2021.114582
[15] Zheng, M., Li, T., Zhu, R., Tang, Y., Tang, M., Lin, L., Ma, Z.: Conditional wasserstein generative adversarial network-gradient penalty-based approach to alleviating imbalanced data classification. Information Sciences 512, 1009–1023 (2020). doi:10.1016/j.ins.2019.10.014
[16] Seibold, M., Hoch, A., Farshad, M., Navab, N., Fürnstahl, P.: Conditional Generative Data Augmentation for Clinical Audio Datasets. arXiv (2022). doi:10.48550/ARXIV.2203.11570. https://arxiv.org/abs/2203.11570
[17] Gao, X., Deng, F., Yue, X.: Data augmentation in fault diagnosis based on the wasserstein generative adversarial network with gradient penalty. Neurocomputing 396, 487–494 (2020). doi:10.1016/j.neucom.2018.10.109
[18] Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4396–4405 (2019). doi:10.1109/CVPR.2019.00453
[19] de Rosa, G.H., Papa, J.P.: A survey on text generation using generative adversarial networks. Pattern Recognit. 119, 108098 (2021). doi:10.1016/j.patcog.2021.108098
[20] Kong, J., Kim, J., Bae, J.: Hifi-gan: Generative adversarial networks for efficient and high fidelity speech synthesis. Advances in Neural Information Processing Systems 33, 17022–17033 (2020)
[21] Savage, N.: Synthetic data could be better than real data. Nature (2023). doi:10.1038/d41586-023-01445-8
[22] Brophy, E., Wang, Z., She, Q., Ward, T.: Generative adversarial networks in time series: A systematic literature review. ACM Comput. Surv. 55(10) (2023). doi:10.1145/3559540
[23] Alaa, A., Van Breugel, B., Saveliev, E.S., van der Schaar, M.: How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models. In: International Conference on Machine Learning, pp. 290–306 (2022). PMLR
[24] Ghosheh, G., Li, J., Zhu, T.: A review of Generative Adversarial Networks for Electronic Health Records: applications, evaluation measures and data sources. arXiv (2022). doi:10.48550/ARXIV.2203.07018. https://arxiv.org/abs/2203.07018
[25] Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International Conference on Machine Learning, pp. 214–223 (2017). PMLR
[26] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. Advances in neural information processing systems 30 (2017)
[27] Kuo, N.I.-H., Polizzotto, M.N., Finfer, S., Garcia, F., Sönnerborg, A., Zazzi, M., Böhm, M., Kaiser, R., Jorm, L., Barbieri, S.: The health gym: synthetic health-related datasets for the development of reinforcement learning algorithms. Scientific Data 9(1) (2022). doi:10.1038/s41597-022-01784-7
[28] Juwara, L., El-Hussuna, A., El Emam, K.: An evaluation of synthetic data augmentation for mitigating covariate bias in health data. Patterns 5(4) (2024)
[29] Johnson, A.E.W., Pollard, T.J., Shen, L., Lehman, L.-w.H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L.A., Mark, R.G.: MIMIC-III, a freely accessible critical care database. Scientific Data 3(1) (2016). doi:10.1038/sdata.2016.35
[30] Theis, L., van den Oord, A., Bethge, M.: A note on the evaluation of generative models. In: International Conference on Learning Representations (2016). http://arxiv.org/abs/1511.01844
[31] van der Maaten, L., Hinton, G.: Visualizing data using t-sne. Journal of Machine Learning Research 9(86), 2579–2605 (2008)
[32] McInnes, L., Healy, J., Melville, J.: UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction (2020)
[33] Goodfellow, I.J.: NIPS 2016 tutorial: Generative adversarial networks. CoRR abs/1701.00160 (2017). https://arxiv.org/abs/1701.00160
[34] Kullback, S., Leibler, R.A.: On information and sufficiency. The Annals of Mathematical Statistics 22(1), 79–86 (1951). Accessed 2022-09-14
[35] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. Journal of Machine Learning Research 13(25), 723–773 (2012)
[36] Kendall, M.G.: The treatment of ties in ranking problems. Biometrika 33(3), 239–251 (1945). doi:10.1093/biomet/33.3.239
[37] Yazici, Y., Foo, C.-S., Winkler, S., Yap, K.-H., Chandrasekhar, V.: Empirical analysis of overfitting and mode drop in gan training. In: 2020 IEEE International Conference on Image Processing (ICIP), pp. 1651–1655 (2020). doi:10.1109/ICIP40778.2020.9191083
[38] Caton, S., Haas, C.: Fairness in machine learning: A survey. ACM Computing Surveys 56(7), 1–38 (2024). doi:10.1145/3616865
[39] Ricci Lara, M.A., Echeveste, R., Ferrante, E.: Addressing fairness in artificial intelligence for medical imaging. Nature Communications 13(1) (2022). doi:10.1038/s41467-022-32186-3
[40] Mamandipoor, B., Yeung, W., Agha-Mir-Salim, L., Stone, D.J., Osmani, V., Celi, L.A.: Prediction of blood lactate values in critically ill patients: a retrospective multi-center cohort study. Journal of clinical monitoring and computing, 1–11 (2022). doi:10.1007/s10877-021-00739-4
[41] Rezar, R., Mamandipoor, B., Seelmaier, C., Jung, C., Lichtenauer, M., Hoppe, U.C., Kaufmann, R., Osmani, V., Wernly, B.: Hyperlactatemia and altered lactate kinetics are associated with excess mortality in sepsis: A multicenter retrospective observational study. Wiener klinische Wochenschrift 135(3), 80–88 (2023). doi:10.1007/s00508-022-02130-y
[42] Bruno, R.R., Wernly, B., Binneboessel, S., Baldia, P., Duse, D.A., Erkens, R., Kelm, M., Mamandipoor, B., Osmani, V., Jung, C.: Failure of lactate clearance predicts the outcome of critically ill septic patients. Diagnostics 10(12) (2020). doi:10.3390/diagnostics10121105
[43] Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. Advances in neural information processing systems 30 (2017)
[44] Carrington, A.M., Manuel, D.G., Fieguth, P.W., Ramsay, T., Osmani, V., Wernly, B., Bennett, C., Hawken, S., McInnes, M., Magwood, O., et al: Deep roc analysis and auc as balanced average accuracy to improve model selection, understanding and interpretation. arXiv preprint arXiv:2103.11357 (2021). doi:10.48550/arXiv.2103.11357
[45] Carrington, A.M., Manuel, D.G., Fieguth, P.W., Ramsay, T., Osmani, V., Wernly, B., Bennett, C., Hawken, S., Magwood, O., Sheikh, Y., McInnes, M., Holzinger, A.: Deep roc analysis and auc as balanced average accuracy, for improved classifier selection, audit and explanation. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 329–341 (2023). doi:10.1109/tpami.2022.3145392
[46] Kuo, N.I.-H., Perez-Concha, O., Hanly, M., Mnatzaganian, E., Hao, B., Di Sipio, M., Yu, G., Vanjara, J., Valerie, I.C., de Oliveira Costa, J., Churches, T., Lujic, S., Hegarty, J., Jorm, L., Barbieri, S.: Enriching Data Science and Healthcare Education: Application and Impact of Synthetic Datasets through the Health Gym Project. JMIR Medical Education. forthcoming/in press (2023). doi:10.2196/51388. https://preprints.jmir.org/preprint/51388
[47] Bai, S., Kolter, J.Z., Koltun, V.: An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271 (2018)
[48] Kuo, N.I., Harandi, M., Fourrier, N., Walder, C., Ferraro, G., Suominen, H.: An input residual connection for simplifying gated recurrent neural networks. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2020). IEEE
[49] Johansson, P., Bright, J., Krishna, S., Fischer, C., Leslie, D.: Exploring responsible applications of Synthetic Data to advance Online Safety Research and Development (2024). https://arxiv.org/abs/2402.04910
[50] Velichkovska, B., Gjoreski, H., Denkovski, D., Kalendar, M., Mullan, I.D., Gichoya, J.W., Martinez, N., Celi, L.A., Osmani, V.: AI learns racial information from the values of vital signs. medRxiv (2023). doi:10.1101/2023.12.11.23299819. https://www.medrxiv.org/content/early/2023/12/11/2023.12.11.23299819.full.pdf
[51] Velichkovska, B., Gjoreski, H., Denkovski, D., Kalendar, M., Mamandipoor, B., Celi, L.A., Osmani, V.: Vital signs as a source of racial bias. medRxiv (2022). doi:10.1101/2022.02.03.22270291
[52] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv (2014). doi:10.48550/ARXIV.1406.2661. https://arxiv.org/abs/1406.2661
[53] Mirza, M., Osindero, S.: Conditional Generative Adversarial Nets (2014)
[54] Yoon, J., Jarrett, D., Van der Schaar, M.: Time-series generative adversarial networks. Advances in neural information processing systems 32 (2019)
[55] Yu, L., Zhang, W., Wang, J., Yu, Y.: Seqgan: Sequence generative adversarial nets with policy gradient. CoRR abs/1609.05473 (2016). https://arxiv.org/abs/1609.05473
[56] Levina, E., Bickel, P.: The earth mover’s distance is the mallows distance: some insights from statistics. In: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, vol. 2, pp. 251–256 (2001). doi:10.1109/ICCV.2001.937632
[57] Arjovsky, M., Bottou, L.: Towards Principled Methods for Training Generative Adversarial Networks. arXiv (2017). doi:10.48550/ARXIV.1701.04862. https://arxiv.org/abs/1701.04862
[58] Landauer, T.K., Foltz, P.W., Laham, D.: An introduction to latent semantic analysis. Discourse Processes 25(2-3), 259–284 (1998). doi:10.1080/01638539809545028
[59] Mottini, A., Lheritier, A., Acuna-Agost, R.: Airline passenger name record generation using generative adversarial networks. CoRR abs/1807.06657 (2018). https://arxiv.org/abs/1807.06657
[60] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). doi:10.1162/neco.1997.9.8.1735
[61] Graves, A., Fernández, S., Schmidhuber, J.: Bidirectional lstm networks for improved phoneme classification and recognition. In: Proceedings of the 15th International Conference on Artificial Neural Networks: Formal Models and Their Applications - Volume Part II. ICANN’05, pp. 799–804. Springer, Berlin, Heidelberg (2005)
[62] Mukaka, M.: Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi medical journal : the journal of Medical Association of Malawi 24, 69–71 (2012)
[63] Kuo, N.I., Jorm, L., Barbieri, S., et al: Generating synthetic clinical data that capture class imbalanced distributions with generative adversarial networks: Example using antiretroviral therapy for hiv. arXiv preprint arXiv:2208.08655 (2022)
[64] Kuo, N.I., Jorm, L., Barbieri, S., et al: Synthetic health-related longitudinal data with mixed-type variables generated using diffusion models. arXiv preprint arXiv:2303.12281 (2023)
[65] Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: International Conference on Machine Learning, pp. 2256–2265 (2015). PMLR
[66] Althelaya, K.A., El-Alfy, E.-S.M., Mohammed, S.: Evaluation of bidirectional lstm for short- and long-term stock market prediction. In: 2018 9th International Conference on Information and Communication Systems (ICICS), pp. 151–156 (2018). doi:10.1109/IACS.2018.8355458

Posted August 19, 2024.

Subject Area

Intensive Care and Critical Care Medicine

Subject Areas

All Articles

Addiction Medicine (399)
Allergy and Immunology (708)
Anesthesia (201)
Cardiovascular Medicine (2942)
Dentistry and Oral Medicine (334)
Dermatology (249)
Emergency Medicine (439)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1036)
Epidemiology (12743)
Forensic Medicine (12)
Gastroenterology (828)
Genetic and Genomic Medicine (4583)
Geriatric Medicine (417)
Health Economics (729)
Health Informatics (2916)
Health Policy (1069)
Health Systems and Quality Improvement (1077)
Hematology (389)
HIV/AIDS (924)
Infectious Diseases (except HIV/AIDS) (14098)
Intensive Care and Critical Care Medicine (846)
Medical Education (425)
Medical Ethics (115)
Nephrology (469)
Neurology (4353)
Nursing (236)
Nutrition (639)
Obstetrics and Gynecology (805)
Occupational and Environmental Health (735)
Oncology (2268)
Ophthalmology (646)
Orthopedics (258)
Otolaryngology (325)
Pain Medicine (279)
Palliative Medicine (83)
Pathology (501)
Pediatrics (1196)
Pharmacology and Therapeutics (504)
Primary Care Research (496)
Psychiatry and Clinical Psychology (3755)
Public and Global Health (6937)
Radiology and Imaging (1527)
Rehabilitation Medicine and Physical Therapy (905)
Respiratory Medicine (915)
Rheumatology (437)
Sexual and Reproductive Health (443)
Sports Medicine (385)
Surgery (488)
Toxicology (60)
Transplantation (212)
Urology (180)

[1] [1].↵
Topol, E.J.: High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25(1), 44–56 (2019). doi:10.1038/s41591-018-0300-7
OpenUrl CrossRef PubMed

[2] [2].↵
Organization, W.H. (ed
.): Global Strategy on Digital Health 2020-2025. World Health Organization, Genève, Switzerland (2021)

[3] Organization, W.H. (ed

[4] [3].↵
Obermeyer, Z., Powers, B., Vogeli, C., Mullainathan, S.: Dissecting racial bias in an algorithm used to manage the health of populations. Science 366(6464), 447–453 (2019). doi:10.1126/science.aax2342
OpenUrl Abstract/FREE Full Text

[5] [4].↵
Ibrahim, H., Liu, X., Zariffa, N., Morris, A.D., Denniston, A.K.: Health data poverty: an assailable barrier to equitable digital health care. The Lancet Digital Health 3(4), 260–265 (2021). doi:10.1016/S2589-7500(20)30317-4
OpenUrl CrossRef

[6] [5].↵
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16, 321–357 (2002). doi:10.1613/jair.953
OpenUrl CrossRef PubMed

[7] [6].↵
Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans-actions on Information Theory 13(1), 21–27 (1967). doi:10.1109/TIT.1967.1053964
OpenUrl CrossRef

[8] [7].↵
Han, H., Wang, W.-Y., Mao, B.-H.: Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In: Lecture Notes in Computer Science. Lecture notes in computer science, pp. 878–887. Springer, Berlin, Heidelberg (2005). doi:10.1007/11538059_91
OpenUrl CrossRef

[8] Gosain, A., Sardana, S.: Farthest SMOTE: A modified SMOTE approach. In: Advances in Intelligent Systems and Computing, pp. 309–320. Springer, Singapore (2019). doi:10.1007/978-981-10-8055-5_28

[9] Nguyen, H.M., Cooper, E.W., Kamei, K.: Borderline over-sampling for imbalanced data classification. Int. J. Knowl. Eng. Soft Data Paradig. 3(1), 4 (2011). doi:10.1504/IJKESDP.2011.039875

[10] Blagus, R., Lusa, L.: SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics 14(1), 106 (2013). doi:10.1186/1471-2105-14-106

[11] Fernandez, A., Garcia, S., Herrera, F., Chawla, N.V.: SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary. Journal of Artificial Intelligence Research 61, 863–905 (2018). doi:10.1613/jair.1.11192

[12] He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering 21(9), 1263–1284 (2009). doi:10.1109/TKDE.2008.239

[13] Lu, C., Reddy, C.K., Wang, P., Nie, D., Ning, Y.: Multi-Label Clinical Time-Series Generation via Conditional GAN. arXiv (2022). doi:10.48550/ARXIV.2204.04797. https://arxiv.org/abs/2204.04797

[14] Engelmann, J., Lessmann, S.: Conditional wasserstein gan-based over-sampling of tabular data for imbalanced learning. Expert Systems with Applications 174, 114582 (2021). doi:10.1016/j.eswa.2021.114582

[15] Zheng, M., Li, T., Zhu, R., Tang, Y., Tang, M., Lin, L., Ma, Z.: Conditional wasserstein generative adversarial network-gradient penalty-based approach to alleviating imbalanced data classification. Information Sciences 512, 1009–1023 (2020). doi:10.1016/j.ins.2019.10.014

[16] Seibold, M., Hoch, A., Farshad, M., Navab, N., Fürnstahl, P.: Conditional Generative Data Augmentation for Clinical Audio Datasets. arXiv (2022). doi:10.48550/ARXIV.2203.11570. https://arxiv.org/abs/2203.11570

[17] Gao, X., Deng, F., Yue, X.: Data augmentation in fault diagnosis based on the wasserstein generative adversarial network with gradient penalty. Neurocomputing 396, 487–494 (2020). doi:10.1016/j.neucom.2018.10.109

[18] Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4396–4405 (2019). doi:10.1109/CVPR.2019.00453

[19] de Rosa, G.H., Papa, J.P.: A survey on text generation using generative adversarial networks. Pattern Recognit. 119, 108098 (2021). doi:10.1016/j.patcog.2021.108098

[20] Kong, J., Kim, J., Bae, J.: Hifi-gan: Generative adversarial networks for efficient and high fidelity speech synthesis. Advances in Neural Information Processing Systems 33, 17022–17033 (2020)

[21] Savage, N.: Synthetic data could be better than real data. Nature (2023). doi:10.1038/d41586-023-01445-8

[22] Brophy, E., Wang, Z., She, Q., Ward, T.: Generative adversarial networks in time series: A systematic literature review. ACM Comput. Surv. 55(10) (2023). doi:10.1145/3559540

[23] Alaa, A., Van Breugel, B., Saveliev, E.S., van der Schaar, M.: How faithful is your synthetic data? sample-level metrics for evaluating and auditing generative models. In: International Conference on Machine Learning, pp. 290–306 (2022). PMLR

[24] Ghosheh, G., Li, J., Zhu, T.: A review of Generative Adversarial Networks for Electronic Health Records: applications, evaluation measures and data sources. arXiv (2022). doi:10.48550/ARXIV.2203.07018. https://arxiv.org/abs/2203.07018

[25] Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International Conference on Machine Learning, pp. 214–223 (2017). PMLR

[26] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. Advances in neural information processing systems 30 (2017)

[27] Kuo, N.I.-H., Polizzotto, M.N., Finfer, S., Garcia, F., Sönnerborg, A., Zazzi, M., Böhm, M., Kaiser, R., Jorm, L., Barbieri, S.: The health gym: synthetic health-related datasets for the development of reinforcement learning algorithms. Scientific Data 9(1) (2022). doi:10.1038/s41597-022-01784-7

[28] Juwara, L., El-Hussuna, A., El Emam, K.: An evaluation of synthetic data augmentation for mitigating covariate bias in health data. Patterns 5(4) (2024)

[29] Johnson, A.E.W., Pollard, T.J., Shen, L., Lehman, L.-w.H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L.A., Mark, R.G.: MIMIC-III, a freely accessible critical care database. Scientific Data 3(1) (2016). doi:10.1038/sdata.2016.35

[30] Theis, L., van den Oord, A., Bethge, M.: A note on the evaluation of generative models. In: International Conference on Learning Representations (2016). http://arxiv.org/abs/1511.01844

[31] van der Maaten, L., Hinton, G.: Visualizing data using t-sne. Journal of Machine Learning Research 9(86), 2579–2605 (2008)

[32] McInnes, L., Healy, J., Melville, J.: UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction (2020)

[33] Goodfellow, I.J.: NIPS 2016 tutorial: Generative adversarial networks. CoRR abs/1701.00160 (2017). https://arxiv.org/abs/1701.00160

[34] Kullback, S., Leibler, R.A.: On information and sufficiency. The Annals of Mathematical Statistics 22(1), 79–86 (1951). Accessed 2022-09-14

[35] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. Journal of Machine Learning Research 13(25), 723–773 (2012)

[36] Kendall, M.G.: The treatment of ties in ranking problems. Biometrika 33(3), 239–251 (1945). doi:10.1093/biomet/33.3.239

[37] Yazici, Y., Foo, C.-S., Winkler, S., Yap, K.-H., Chandrasekhar, V.: Empirical analysis of overfitting and mode drop in gan training. In: 2020 IEEE International Conference on Image Processing (ICIP), pp. 1651–1655 (2020). doi:10.1109/ICIP40778.2020.9191083

[38] Caton, S., Haas, C.: Fairness in machine learning: A survey. ACM Computing Surveys 56(7), 1–38 (2024). doi:10.1145/3616865

[39] Ricci Lara, M.A., Echeveste, R., Ferrante, E.: Addressing fairness in artificial intelligence for medical imaging. Nature Communications 13(1) (2022). doi:10.1038/s41467-022-32186-3

[40] Mamandipoor, B., Yeung, W., Agha-Mir-Salim, L., Stone, D.J., Osmani, V., Celi, L.A.: Prediction of blood lactate values in critically ill patients: a retrospective multi-center cohort study. Journal of clinical monitoring and computing, 1–11 (2022). doi:10.1007/s10877-021-00739-4

[41] Rezar, R., Mamandipoor, B., Seelmaier, C., Jung, C., Lichtenauer, M., Hoppe, U.C., Kaufmann, R., Osmani, V., Wernly, B.: Hyperlactatemia and altered lactate kinetics are associated with excess mortality in sepsis: A multicenter retrospective observational study. Wiener klinische Wochenschrift 135(3), 80–88 (2023). doi:10.1007/s00508-022-02130-y

[42] Bruno, R.R., Wernly, B., Binneboessel, S., Baldia, P., Duse, D.A., Erkens, R., Kelm, M., Mamandipoor, B., Osmani, V., Jung, C.: Failure of lactate clearance predicts the outcome of critically ill septic patients. Diagnostics 10(12) (2020). doi:10.3390/diagnostics10121105

[43] Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. Advances in neural information processing systems 30 (2017)

[44] Carrington, A.M., Manuel, D.G., Fieguth, P.W., Ramsay, T., Osmani, V., Wernly, B., Bennett, C., Hawken, S., McInnes, M., Magwood, O., et al: Deep roc analysis and auc as balanced average accuracy to improve model selection, understanding and interpretation. arXiv preprint arXiv:2103.11357 (2021). doi:10.48550/arXiv.2103.11357

[45] Carrington, A.M., Manuel, D.G., Fieguth, P.W., Ramsay, T., Osmani, V., Wernly, B., Bennett, C., Hawken, S., Magwood, O., Sheikh, Y., McInnes, M., Holzinger, A.: Deep roc analysis and auc as balanced average accuracy, for improved classifier selection, audit and explanation. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(1), 329–341 (2023). doi:10.1109/tpami.2022.3145392

[46] Kuo, N.I.-H., Perez-Concha, O., Hanly, M., Mnatzaganian, E., Hao, B., Di Sipio, M., Yu, G., Vanjara, J., Valerie, I.C., de Oliveira Costa, J., Churches, T., Lujic, S., Hegarty, J., Jorm, L., Barbieri, S.: Enriching Data Science and Healthcare Education: Application and Impact of Synthetic Datasets through the Health Gym Project. JMIR Medical Education. forthcoming/in press (2023). doi:10.2196/51388. https://preprints.jmir.org/preprint/51388

[47] Bai, S., Kolter, J.Z., Koltun, V.: An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271 (2018)

[48] Kuo, N.I., Harandi, M., Fourrier, N., Walder, C., Ferraro, G., Suominen, H.: An input residual connection for simplifying gated recurrent neural networks. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2020). IEEE

[49] Johansson, P., Bright, J., Krishna, S., Fischer, C., Leslie, D.: Exploring responsible applications of Synthetic Data to advance Online Safety Research and Development (2024). https://arxiv.org/abs/2402.04910

[50] Velichkovska, B., Gjoreski, H., Denkovski, D., Kalendar, M., Mullan, I.D., Gichoya, J.W., Martinez, N., Celi, L.A., Osmani, V.: AI learns racial information from the values of vital signs. medRxiv (2023). doi:10.1101/2023.12.11.23299819. https://www.medrxiv.org/content/early/2023/12/11/2023.12.11.23299819.full.pdf

[51] Velichkovska, B., Gjoreski, H., Denkovski, D., Kalendar, M., Mamandipoor, B., Celi, L.A., Osmani, V.: Vital signs as a source of racial bias. medRxiv (2022). doi:10.1101/2022.02.03.22270291

[52] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv (2014). doi:10.48550/ARXIV.1406.2661. https://arxiv.org/abs/1406.2661

[53] Mirza, M., Osindero, S.: Conditional Generative Adversarial Nets (2014)

[54] Yoon, J., Jarrett, D., Van der Schaar, M.: Time-series generative adversarial networks. Advances in neural information processing systems 32 (2019)

[55] Yu, L., Zhang, W., Wang, J., Yu, Y.: Seqgan: Sequence generative adversarial nets with policy gradient. CoRR abs/1609.05473 (2016). https://arxiv.org/abs/1609.05473

[56] Levina, E., Bickel, P.: The earth mover’s distance is the mallows distance: some insights from statistics. In: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, vol. 2, pp. 251–256 (2001). doi:10.1109/ICCV.2001.937632

[57] Arjovsky, M., Bottou, L.: Towards Principled Methods for Training Generative Adversarial Networks. arXiv (2017). doi:10.48550/ARXIV.1701.04862. https://arxiv.org/abs/1701.04862

[58] Landauer, T.K., Foltz, P.W., Laham, D.: An introduction to latent semantic analysis. Discourse Processes 25(2-3), 259–284 (1998). doi:10.1080/01638539809545028

[59] Mottini, A., Lheritier, A., Acuna-Agost, R.: Airline passenger name record generation using generative adversarial networks. CoRR abs/1807.06657 (2018). https://arxiv.org/abs/1807.06657

[60] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). doi:10.1162/neco.1997.9.8.1735

[61] Graves, A., Fernández, S., Schmidhuber, J.: Bidirectional lstm networks for improved phoneme classification and recognition. In: Proceedings of the 15th International Conference on Artificial Neural Networks: Formal Models and Their Applications - Volume Part II. ICANN’05, pp. 799–804. Springer, Berlin, Heidelberg (2005)

[62] Mukaka, M.: Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi medical journal: the journal of Medical Association of Malawi 24, 69–71 (2012)

[63] Kuo, N.I., Jorm, L., Barbieri, S., et al: Generating synthetic clinical data that capture class imbalanced distributions with generative adversarial networks: Example using antiretroviral therapy for hiv. arXiv preprint arXiv:2208.08655 (2022)

[64] Kuo, N.I., Jorm, L., Barbieri, S., et al: Synthetic health-related longitudinal data with mixed-type variables generated using diffusion models. arXiv preprint arXiv:2303.12281 (2023)

[65] Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: International Conference on Machine Learning, pp. 2256–2265 (2015). PMLR

[66] Althelaya, K.A., El-Alfy, E.-S.M., Mohammed, S.: Evaluation of bidirectional lstm for short-and long-term stock market prediction. In: 2018 9th International Conference on Information and Communication Systems (ICICS), pp. 151–156 (2018). doi:10.1109/IACS.2018.8355458
