Abstract
Objective Administrative healthcare data are an attractive source for secondary analysis because of their potential to answer population-health questions. Although these datasets are known to be susceptible to bias, the degree to which they can distort measurements such as cancer screening rates is not widely appreciated, nor are the causes of these biases and their possible solutions.
Methods Using a billing code database derived from our institution’s electronic health records (EHR), we estimated the colorectal cancer screening rate of average-risk patients aged 50-74 seen in a primary care or gastroenterology clinic in 2016-2017. We sampled 200 records (150 unscreened, 50 screened) to quantify the accuracy of the billing data against manual review.
Results Out of 4,611 patients, an analysis of billing data suggested a 61% screening rate. Manual review revealed a positive predictive value of 96% (86-100%), a negative predictive value of 21% (15-29%), and a corrected screening rate of 85% (81-90%). Most false negatives occurred due to exams performed outside the scope of the database – both within and outside of our institution – but 21% of false negatives fell within the database’s scope. False positives occurred due to incomplete exams and inadequate bowel preparation. Reasons for screening failure included ordered but incomplete exams (48%), missing or incorrect documentation by primary care (29%) including incorrect screening intervals (13%), and patients declining screening (13%).
Conclusions Although analytics on administrative data are commonly ‘validated’ by comparison to independent datasets, comparing our naïve estimate to the CDC estimate (∼60%) would have been misleading. Therefore, regular data audits using the complete EHR are critical to improve screening rates and measure improvement.
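As an illustration of how the predictive values reported above are defined, the following is a minimal sketch rather than the authors' analysis code. The raw counts are assumptions chosen only to be consistent with the reported sample sizes (50 screened, 150 unscreened) and percentages; the abstract does not state them. The Wilson score interval is likewise an assumed choice, since the abstract does not say how the intervals were computed.

```python
import math

def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV: share of records flagged as screened that truly were screened.
    NPV: share of records flagged as unscreened that truly were unscreened.
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical counts (NOT given in the abstract), picked to match the
# reported 96% PPV in 50 screened records and 21% NPV in 150 unscreened ones.
tp, fp = 48, 2      # billing says screened: confirmed / not confirmed
tn, fn = 32, 118    # billing says unscreened: confirmed / actually screened

ppv, npv = predictive_values(tp, fp, tn, fn)
ppv_ci = wilson_interval(tp, tp + fp)
npv_ci = wilson_interval(tn, tn + fn)

print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

A low negative predictive value of this kind means that most patients the billing data label as unscreened were in fact screened, which is why the naïve rate understates the corrected one.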
WHAT IS KNOWN
Medical billing data might be useful for measuring colon cancer screening rates but are bias-prone and difficult to validate.
The degree to which these biases may skew the results of simple population-level analytics is not widely appreciated, nor are their causes and possible solutions.
WHAT IS NEW HERE
Billing data from the health record do not accurately identify unscreened patients. Some reasons were predictable (screening outside the system or prior to software implementation) but others were not.
The common practice of external validation would have been falsely reassuring for these data. The naïve estimate of the screening rate (61%) closely matches the CDC estimate (∼60%); the corrected rate after manual review was 85%.
Periodic data audits using the full EHR are critical to continue to improve screening rates and to monitor improvements accurately and at scale.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
UCSF Bakar Computational Health Sciences Institute and the National Center for Advancing Translational Sciences of the National Institutes of Health under award number UL1 TR001872. VAR was supported by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health under award number T32 DK007007-42.
Author Declarations
All relevant ethical guidelines have been followed and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
Any clinical trials involved have been registered with an ICMJE-approved registry such as ClinicalTrials.gov and the trial ID is included in the manuscript.
Not Applicable
I have followed all appropriate research reporting guidelines and uploaded the relevant Equator, ICMJE or other checklist(s) as supplementary files, if applicable.
Yes
Footnotes
Conflicts of Interest: None relevant to this publication
Financial Support: UCSF Bakar Computational Health Sciences Institute and the National Center for Advancing Translational Sciences of the National Institutes of Health under award number UL1 TR001872. VAR was supported by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health under award number T32 DK007007-42.
Ethics: Approved by the University of California, San Francisco Institutional Review Board (#18-25166)