RT Journal Article SR Electronic T1 Identifying Cases of Shoulder Injury Related to Vaccine Administration (SIRVA) Using Natural Language Processing JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2021.05.05.21256555 DO 10.1101/2021.05.05.21256555 A1 Zheng, Chengyi A1 Duffy, Jonathan A1 Liu, In-Lu Amy A1 Sy, Lina S. A1 Navarro, Ronald A. A1 Kim, Sunhea S. A1 Ryan, Denison A1 Chen, Wansu A1 Qian, Lei A1 Mercado, Cheryl A1 Jacobsen, Steven J. YR 2021 UL http://medrxiv.org/content/early/2021/05/07/2021.05.05.21256555.abstract AB Background: Shoulder injury related to vaccine administration (SIRVA) accounts for more than half of all claims received by the National Vaccine Injury Compensation Program. However, there is a lack of population-based studies due to the challenge of identifying SIRVA cases in large health care databases. Objective: To develop a natural language processing (NLP) method to identify SIRVA cases from clinical notes. Methods: We conducted the study among members of a large integrated health care organization who were vaccinated between 04/01/2016 and 12/31/2017 and had subsequent diagnosis codes indicative of shoulder injury. Based on a training dataset with a chart review reference standard of 164 individuals, we developed an NLP algorithm to extract shoulder disorder information, including prior vaccination, anatomic location, temporality and causality. The algorithm identified three groups of positive SIRVA cases (definite, probable and possible) based on the strength of evidence. We compared NLP results to a chart review reference standard of 100 vaccinated individuals. 
We then applied the final automated NLP algorithm to a broader cohort of vaccinated individuals with a shoulder injury diagnosis code and performed manual chart confirmation on a random sample of NLP-identified definite cases and all NLP-identified probable and possible cases. Results: In the validation sample, the NLP algorithm had 100% accuracy for identifying 4 SIRVA cases and 96 individuals without SIRVA. In the broader cohort of 53,585 individuals, the NLP algorithm identified 291 definite, 124 probable, and 52 possible SIRVA cases. The chart-confirmation rates for these groups were 95.3%, 67.7% and 18.9%, respectively. Conclusions: The algorithm performed with high sensitivity and reasonable specificity in identifying positive SIRVA cases. The NLP algorithm can potentially be used in future population-based studies to identify this rare adverse event, avoiding labor-intensive chart review validation. Competing Interest Statement: Lina Sy received research support from GlaxoSmithKline, Dynavax, Seqirus, and Novavax for studies unrelated to this paper. All other authors report no conflicts of interest related to the submitted work. Funding Statement: This study was funded through the Vaccine Safety Datalink under contract 200-2012-53580 from the Centers for Disease Control and Prevention (CDC). 
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The Institutional Review Board at KPSC approved this study. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The datasets generated and/or analyzed during the current study are not publicly available due to ethical standards. The authors do not have permission to share data. Abbreviations: CI, confidence interval; EHR, electronic health records; ICD-10-CM, International Classification of Diseases 10th Revision Clinical Modification; KPSC, Kaiser Permanente Southern California; NLP, natural language processing; NPV, negative predictive value; PPV, positive predictive value; SIRVA, shoulder injury related to vaccine administration; VICP, National Vaccine Injury Compensation Program.