Abstract
Adverse Drug Events (ADEs) extracted from social media, such as Twitter data, have grown in significance, which has led to the development of various end-to-end resolution methodologies. Despite recent advancements, a substantial gap remains in normalizing ADE entities from social media, particularly when symptoms are expressed informally and in diverse ways; closing this gap is crucial for accurate ADE identification and reporting. To address this challenge, we introduce a novel end-to-end solution called CONORM: Context-Aware Entity Normalization. CONORM is a two-step pipeline. The first component is a transformer encoder fine-tuned for entity recognition. The second component is a context-aware entity normalization algorithm that uses a dynamic context refining mechanism to adjust entity embeddings, aiming to align ADE mentions with their respective concepts in medical terminology. An integral feature of CONORM is its compatibility with vector databases, which enables efficient querying and scalable parallel processing. Evaluated on the SMM4H 2023 ADE normalization shared task dataset, CONORM achieved an F1-score of 50.20% overall and 39.40% for out-of-distribution samples. These results improve on the median shared task results by 18.00% and 19.90%, on the best model in the shared task by 7.60% and 10.20%, and on the existing state-of-the-art ADE mining algorithm by 5.00% and 3.10%, respectively. CONORM’s ability to provide context-aware entity normalization paves the way for enhanced end-to-end ADE resolution methods. Our findings and methodologies shed light on potential advancements in the broader realm of pharmacovigilance using social media data.
The model architectures are publicly available at https://github.com/anthonyyazdani/CONORM.
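As a rough illustration of the normalization step described in the abstract, the sketch below blends a mention embedding with a context embedding and retrieves the closest concept by cosine similarity, which is the kind of lookup a vector database performs. It is not the CONORM implementation: the function names, the fixed blending weight `alpha` (a stand-in for the dynamic context refining mechanism), the toy vectors, and the placeholder concept identifiers are all assumptions made for this example.

```python
import numpy as np


def l2_normalize(vectors):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def refine_with_context(mention_vec, context_vec, alpha=0.5):
    """Hypothetical stand-in for context refinement: a fixed convex blend.

    The abstract describes a *dynamic* mechanism; a constant alpha is an
    assumption used here only to keep the sketch small.
    """
    return alpha * mention_vec + (1.0 - alpha) * context_vec


def nearest_concepts(query_vecs, concept_vecs, concept_ids, k=1):
    """Return the k most similar concepts (id, cosine similarity) per query."""
    q = l2_normalize(np.asarray(query_vecs, dtype=float))
    c = l2_normalize(np.asarray(concept_vecs, dtype=float))
    sims = q @ c.T                              # shape: (n_queries, n_concepts)
    top = np.argsort(-sims, axis=1)[:, :k]      # indices of best matches first
    return [
        [(concept_ids[j], float(sims[i, j])) for j in row]
        for i, row in enumerate(top)
    ]


# Toy example: placeholder identifiers, not real terminology codes.
concept_ids = ["C_HEADACHE", "C_NAUSEA", "C_INSOMNIA"]
concept_vecs = np.eye(3)

mention = np.array([0.6, 0.5, 0.0])   # ambiguous informal mention
context = np.array([1.0, 0.0, 0.0])   # surrounding text points to one concept
refined = refine_with_context(mention, context)

matches = nearest_concepts(refined[None, :], concept_vecs, concept_ids, k=2)
print(matches[0])
```

In a deployment along the lines the abstract suggests, the brute-force matrix product would be replaced by a query against a vector database index, and the embeddings would come from the fine-tuned transformer encoder rather than hand-written vectors.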
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
anthony.yazdani{at}etu.unige.ch
Data Availability
All data are available online at https://codalab.lisn.upsaclay.fr/competitions/12941