Abstract
Biases in sample creation can arise at any study phase, including initial patient recruitment, application of exclusion criteria, input-level exclusion, and outcome-level exclusion, and often reflect the underrepresentation or exclusion of demographic groups historically disadvantaged in medical research. The use of non-representative samples to construct clinical algorithms in artificial intelligence (AI) and machine learning (ML) applications may further amplify this selection bias. Building on the “Data Cards” initiative for transparency in AI research, we advocate for the addition of a detailed participant flow diagram to AI studies, one that reports the demographic characteristics of excluded participants at every study phase. Tracking excluded participants in this way improves understanding of potential algorithmic biases before an algorithm is implemented clinically, and thus deserves to be detailed in any medical AI study. We include both a model for this flow diagram and a brief case study explaining how it could be implemented in practice. Through standardized reporting of participant flow diagrams, we can better gauge the potential inequity embedded in AI applications, facilitating more reliable and equitable clinical algorithms.
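The bookkeeping behind such a flow diagram can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in the linked repository); the function name `flow_table`, the record fields (`age`, `sex`, `lactate`), and the two exclusion steps are hypothetical, and the five records are toy values made up for illustration, not MIMIC-IV data. The idea is simply to apply each exclusion step in order and record the demographic makeup of who is dropped and who remains.

```python
from collections import Counter


def flow_table(participants, steps, attribute):
    """Apply exclusion steps in order and summarize, for each step,
    the demographic makeup of excluded and remaining participants."""
    remaining = list(participants)
    rows = []
    for name, keep in steps:
        kept = [p for p in remaining if keep(p)]
        excluded = [p for p in remaining if not keep(p)]
        rows.append({
            "step": name,
            "n_excluded": len(excluded),
            "excluded_by_group": dict(Counter(p[attribute] for p in excluded)),
            "n_remaining": len(kept),
            "remaining_by_group": dict(Counter(p[attribute] for p in kept)),
        })
        remaining = kept  # the next step only sees participants kept so far
    return rows


# Toy records, invented for illustration only.
cohort = [
    {"id": 1, "age": 70, "sex": "F", "lactate": 2.1},
    {"id": 2, "age": 16, "sex": "M", "lactate": 1.0},
    {"id": 3, "age": 55, "sex": "F", "lactate": None},
    {"id": 4, "age": 48, "sex": "M", "lactate": 3.4},
    {"id": 5, "age": 81, "sex": "F", "lactate": None},
]

# Hypothetical exclusion steps: each pairs a label with a "keep" predicate.
steps = [
    ("adults only", lambda p: p["age"] >= 18),
    ("lactate measured", lambda p: p["lactate"] is not None),
]

table = flow_table(cohort, steps, "sex")
for row in table:
    print(row)
```

In this toy cohort the second step (an input-level exclusion for a missing measurement) removes only female participants, which is the kind of pattern a per-step demographic breakdown is meant to surface.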
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study used ONLY openly available human data that were originally located at PhysioNet.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Code Availability: The code that produces the example case study is open source and available on GitHub: https://github.com/joamats/mit-flow-diagram.
Data Availability: The data that support the example case study come from MIMIC-IV (identifier doi.org/10.1093/jamia/ocx084), which is publicly available on PhysioNet (https://physionet.org/content/mimiciv/2.2/).
Conflicts of Interest: All authors report no conflicts of interest.