Abstract
Background The use of big data and large language models in healthcare can play a key role in improving patient treatment and healthcare management, especially when applied to large-scale administrative data. A major challenge is ensuring that patient confidentiality and personal information are protected. One way to address this is to augment clinical data through linkage to administrative laboratory datasets in a way that avoids the use of demographic information.
Methods We explored an alternative method to examine patient files from a large administrative dataset in South Africa (the National Health Laboratory Services, or NHLS), by linking external data to the NHLS database using specimen barcodes associated with laboratory tests. This offers a deterministic way of performing data linkage without accessing demographic information. In this paper, we quantify the performance of this approach.
Results Linkage of the large NHLS dataset to external hospital data using specimen barcodes achieved a 95% success rate. Of the 1200 records in the validation sample, 87% were exact matches and 9% were matched after typographic correction. The remaining 5% were either complete mismatches or attributable to duplicates in the administrative data.
Conclusions The high success rate indicates the reliability of using barcodes for linking data without demographic identifiers. Specimen barcodes are an effective tool for deterministic linking in health data, and may provide a method of creating large, linked data sets without compromising patient confidentiality.
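As an illustration of the barcode-based deterministic linkage summarized above, the following is a minimal sketch and not the authors' implementation. The record layout (dicts with a `barcode` key), the `normalize` clean-up rule, and the status labels are all assumptions introduced here; the abstract does not specify how typographic correction was performed.

```python
from collections import defaultdict


def normalize(barcode: str) -> str:
    """Hypothetical typographic clean-up: drop non-alphanumerics, uppercase."""
    return "".join(ch for ch in barcode if ch.isalnum()).upper()


def link_by_barcode(external, lab):
    """Link external records to laboratory records on specimen barcode alone.

    Both arguments are lists of dicts with a 'barcode' key (assumed layout).
    No demographic fields are read. Returns a list of
    (external_record, lab_record_or_None, status) tuples.
    """
    exact = defaultdict(list)    # raw barcode -> lab records
    cleaned = defaultdict(list)  # normalized barcode -> lab records
    for rec in lab:
        exact[rec["barcode"]].append(rec)
        cleaned[normalize(rec["barcode"])].append(rec)

    results = []
    for rec in external:
        # First try an exact match on the barcode as recorded.
        hits = exact.get(rec["barcode"])
        status = "exact"
        if not hits:
            # Fall back to a match after typographic clean-up.
            hits = cleaned.get(normalize(rec["barcode"]))
            status = "corrected"
        if not hits:
            results.append((rec, None, "unmatched"))
        elif len(hits) > 1:
            # Ambiguous: the barcode appears more than once in the lab data.
            results.append((rec, None, "duplicate"))
        else:
            results.append((rec, hits[0], status))
    return results


# Made-up example data, for illustration only.
lab = [
    {"barcode": "AB1234", "result": "r1"},
    {"barcode": "CD5678", "result": "r2"},
    {"barcode": "EF9012", "result": "r3"},
    {"barcode": "EF9012", "result": "r3"},
]
external = [
    {"barcode": "AB1234"},
    {"barcode": "cd-5678"},
    {"barcode": "EF9012"},
    {"barcode": "ZZ0000"},
]
statuses = [status for _, _, status in link_by_barcode(external, lab)]
print(statuses)  # ['exact', 'corrected', 'duplicate', 'unmatched']
```

Counting the status labels over a validation sample gives the kind of breakdown reported in the Results (exact matches, matches after correction, and mismatches or duplicates).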
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was funded by the US National Institutes of Health (NIH) Eunice Kennedy Shriver National Institute of Child Health and Human Development and the National Institute of Allergy and Infectious Diseases under grants R01 HD103466 and R01 HD103466-04S1. The cohort was also supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development and the National Institute of Allergy and Infectious Diseases, National Institutes of Health, under grants U01HD080441 and U01AI069924, by USAID/PEPFAR, and by the South African National HIV Programme.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Human Research Ethics Committee (Medical) of the University of the Witwatersrand gave ethical approval for this work under protocol M200237.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The external data underlying this article were provided with permission by the data gatekeeper for Rahima Moosa Mother and Child Hospital and the Empilweni Services and Research Unit. Cohort participants provided written consent for their data to be used for research purposes, and requests for access can be directed to the Empilweni Services and Research Unit, Johannesburg (email: Karl-Gunter.Technau@wits.ac.za). The linked laboratory data are owned by the National Health Laboratory Services, and access is governed by policies and procedures in response to requests made directly to the NHLS Office of Academic Affairs and Research.