PT - JOURNAL ARTICLE
AU - Khera, Rohan
AU - Mortazavi, Bobak J.
AU - Sangha, Veer
AU - Warner, Frederick
AU - Young, H. Patrick
AU - Ross, Joseph S.
AU - Shah, Nilay D.
AU - Theel, Elitza S.
AU - Jenkinson, William G.
AU - Knepper, Camille
AU - Wang, Karen
AU - Peaper, David
AU - Martinello, Richard A
AU - Brandt, Cynthia A.
AU - Lin, Zhenqiu
AU - Ko, Albert I.
AU - Krumholz, Harlan M.
AU - Pollock, Benjamin D.
AU - Schulz, Wade L.
TI - Accuracy of Computable Phenotyping Approaches for SARS-CoV-2 Infection and COVID-19 Hospitalizations from the Electronic Health Record
AID - 10.1101/2021.03.16.21253770
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.03.16.21253770
4099 - http://medrxiv.org/content/early/2021/05/13/2021.03.16.21253770.short
4100 - http://medrxiv.org/content/early/2021/05/13/2021.03.16.21253770.full
AB - Objective: Real-world data have been critical for rapid-knowledge generation throughout the COVID-19 pandemic. To ensure high-quality results are delivered to guide clinical decision making and the public health response, as well as characterize the response to interventions, it is essential to establish the accuracy of COVID-19 case definitions derived from administrative data to identify infections and hospitalizations. Methods: Electronic Health Record (EHR) data were obtained from the clinical data warehouse of the Yale New Haven Health System (Yale, primary site) and 3 hospital systems of the Mayo Clinic (validation site). Detailed characteristics on demographics, diagnoses, and laboratory results were obtained for all patients with either a positive SARS-CoV-2 PCR or antigen test or ICD-10 diagnosis of COVID-19 (U07.1) between April 1, 2020 and March 1, 2021.
Various computable phenotype definitions were evaluated for their accuracy to identify SARS-CoV-2 infection and COVID-19 hospitalizations. Results: Of the 69,423 individuals with either a diagnosis code or a laboratory diagnosis of a SARS-CoV-2 infection at Yale, 61,023 had a principal or a secondary diagnosis code for COVID-19 and 50,355 had a positive SARS-CoV-2 test. Among those with a positive laboratory test, 38,506 (76.5%) and 3,449 (6.8%) had a principal and secondary diagnosis code of COVID-19, respectively, while 8,400 (16.7%) had no COVID-19 diagnosis. Moreover, of the 61,023 patients with a COVID-19 diagnosis code, 19,068 (31.2%) did not have a positive laboratory test for SARS-CoV-2 in the EHR. Of the 20 cases randomly sampled from this latter group for manual review, all had a COVID-19 diagnosis code related to asymptomatic testing with negative subsequent test results. The positive predictive value (precision) and sensitivity (recall) of a COVID-19 diagnosis in the medical record for a documented positive SARS-CoV-2 test were 68.8% and 83.3%, respectively. Among 5,109 patients who were hospitalized with a principal diagnosis of COVID-19, 4,843 (94.8%) had a positive SARS-CoV-2 test within the 2 weeks preceding hospital admission or during hospitalization. In addition, 789 hospitalizations had a secondary diagnosis of COVID-19, of which 446 (56.5%) had a principal diagnosis consistent with severe clinical manifestation of COVID-19 (e.g., sepsis or respiratory failure). Compared with the cohort that had a principal diagnosis of COVID-19, those with a secondary diagnosis had a more than 2-fold higher in-hospital mortality rate (13.2% vs 28.0%, P<0.001). In the validation sample at Mayo Clinic, diagnosis codes more consistently identified SARS-CoV-2 infection (precision of 95%) but had lower recall (63.5%) with substantial variation across the 3 Mayo Clinic sites.
Similar to Yale, diagnosis codes consistently identified COVID-19 hospitalizations at Mayo, and hospitalizations defined by a secondary diagnosis code had 2-fold higher in-hospital mortality than those with a primary diagnosis of COVID-19. Conclusions: COVID-19 diagnosis codes misclassified the SARS-CoV-2 infection status of many people, with implications for clinical research and epidemiological surveillance. Moreover, the codes had different performance across two academic health systems and identified groups with different risks of mortality. Real-world data from the EHR can be used in conjunction with diagnosis codes to improve the identification of people infected with SARS-CoV-2. Competing Interest Statement: H.M.K. works under contract with the Centers for Medicare & Medicaid Services to support quality measurement programs; was a recipient of a research grant from Johnson & Johnson, through Yale University, to support clinical trial data sharing; was a recipient of a research agreement, through Yale University, from the Shenzhen Center for Health Information for work to advance intelligent disease prevention and health promotion; collaborates with the National Center for Cardiovascular Diseases in Beijing; receives payment from the Arnold & Porter Law Firm for work related to the Sanofi clopidogrel litigation, from the Martin Baughman Law Firm for work related to the Cook Celect IVC filter litigation, and from the Siegfried and Jensen Law Firm for work related to Vioxx litigation; chairs a Cardiac Scientific Advisory Board for UnitedHealth; was a member of the IBM Watson Health Life Sciences Board; is a member of the Advisory Board for Element Science, the Advisory Board for Facebook, and the Physician Advisory Board for Aetna; and is the co-founder of Hugo Health, a personal health information platform, and co-founder of Refactor Health, a healthcare AI-augmented data management company. W.L.S.
was an investigator for a research agreement, through Yale University, from the Shenzhen Center for Health Information for work to advance intelligent disease prevention and health promotion; collaborates with the National Center for Cardiovascular Diseases in Beijing; is a technical consultant to Hugo Health, a personal health information platform, and co-founder of Refactor Health, an AI-augmented data management platform for healthcare; is a consultant for Interpace Diagnostics Group, a molecular diagnostics company. J.S.R. currently receives research support through Yale University from Johnson and Johnson to develop methods of clinical trial data sharing, from the Medical Device Innovation Consortium as part of the National Evaluation System for Health Technology (NEST), from the Food and Drug Administration for the Yale-Mayo Clinic Center for Excellence in Regulatory Science and Innovation (CERSI) program (U01FD005938); from the Agency for Healthcare Research and Quality (R01HS022882), from the National Heart, Lung and Blood Institute of the National Institutes of Health (NIH) (R01HS025164, R01HL144644), and from the Laura and John Arnold Foundation to establish the Good Pharma Scorecard at Bioethics International. 
In the past 36 months, N.D.S. has received research support through Mayo Clinic from the Food and Drug Administration to establish the Yale-Mayo Clinic Center for Excellence in Regulatory Science and Innovation (CERSI) program (U01FD005938); the Centers of Medicare and Medicaid Innovation under the Transforming Clinical Practice Initiative (TCPI); the Agency for Healthcare Research and Quality (R01HS025164; R01HS025402; R03HS025517; K12HS026379); the National Heart, Lung and Blood Institute of the National Institutes of Health (NIH) (R56HL130496; R01HL131535; R01HL151662); the National Science Foundation; the Medical Device Innovation Consortium as part of the National Evaluation System for Health Technology (NEST); and the Patient Centered Outcomes Research Institute (PCORI) to develop a Clinical Data Research Network (LHSNet). E.S.T. serves on the advisory board of Roche Diagnostics and Serimmune. R.M. serves on the data safety and monitoring board for a phase 1 trial of a COVID therapeutic being investigated by Noveome. D.R.P. serves on the advisory board of Tangen Biosciences and is a co-founder of M/Z Diagnostics. The other authors do not report any relevant disclosures. Funding Statement: Dr. Khera received support from the National Heart, Lung, and Blood Institute of the National Institutes of Health under the award K23HL153775-01A1.
The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Yale and Mayo Institutional Review Boards. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. Patient-level data represented protected health information and cannot be shared. Summary data for the information presented in the figures are available upon request.
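
The precision and recall reported for the Yale cohort can be reproduced from the counts given in the abstract. The following is a minimal Python sketch, assuming that a positive SARS-CoV-2 laboratory test is treated as the reference standard and a COVID-19 diagnosis code as the classification being evaluated. The function and variable names are illustrative and are not taken from the study's own code.

```python
# Counts for the Yale cohort, as stated in the abstract.
dx_total = 61_023          # patients with a principal or secondary COVID-19 diagnosis code
test_positive = 50_355     # patients with a positive SARS-CoV-2 test
pos_principal_dx = 38_506  # test-positive with a principal COVID-19 diagnosis code
pos_secondary_dx = 3_449   # test-positive with a secondary COVID-19 diagnosis code
pos_no_dx = 8_400          # test-positive with no COVID-19 diagnosis code


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) as fractions between 0 and 1."""
    precision = tp / (tp + fp)  # positive predictive value
    recall = tp / (tp + fn)     # sensitivity
    return precision, recall


# Assumption: the laboratory test is the reference standard.
tp = pos_principal_dx + pos_secondary_dx  # diagnosis code and positive test
fp = dx_total - tp                        # diagnosis code without a positive test
fn = pos_no_dx                            # positive test without a diagnosis code

precision, recall = precision_recall(tp, fp, fn)

print(f"code without positive test: {fp}")
print(f"cohort size: {tp + fp + fn}")
print(f"precision (PPV): {100 * precision:.1f}%")
print(f"recall (sensitivity): {100 * recall:.1f}%")
```

Under these assumptions the derived counts agree with the abstract: 19,068 patients with a diagnosis code but no positive test, and a cohort of 69,423. True negatives do not enter either metric, which is why neither requires a count of patients who had no diagnosis code and no positive test.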