Artificial intelligence-enabled event adjudication: estimating delayed cardiovascular effects of respiratory viruses
====================================================================================================================

* Shinichi Goto
* Max Homilius
* Jenine E. John
* James G. Truslow
* Andreas A. Werdich
* Alexander J. Blood
* Brian H. Park
* Calum A. MacRae
* Rahul C. Deo

## Abstract

Healthcare systems ideally should be able to draw lessons from historical data, including whether common exposures are associated with adverse clinical outcomes. Unfortunately, structured clinical data, such as encounter diagnostic codes in electronic health records, suffer from multiple limitations and biases, limiting effective learning. We hypothesized that a machine learning approach to automate ascertainment of clinical events and disease history from medical notes would improve upon using structured data and enable the estimation of real-world risks. We sought to test this approach to address a timely goal: estimating the delayed risk of adverse cardiovascular events (i.e. after the index infection) in patients infected with respiratory viruses. Using 4,151 cardiologist-labeled notes as gold standard, we trained a series of neural network models to automate event adjudication for heart failure hospitalization, acute coronary syndrome, stroke, and coronary revascularization and to identify past medical history for heart failure. Though performance varied by task, in nearly all cases, our models surpassed the use of structured data in terms of sensitivity for a given specificity level and enabled principled evaluation of classification thresholds, which is typically impossible to do with diagnostic codes. Deploying our models on more than 17 million notes for 267,596 patients across an extensive integrated delivery network, we found that patients infected with respiratory syncytial virus had a 23% increased risk of delayed heart failure hospitalization over a subsequent 4-year period compared with propensity-score matched patients who had the same test but with negative results (p = 0.003, log-rank). In contrast, we found no such increased risk in patients with a positive influenza viral test compared with a negative test (rate ratio 0.98, p = 0.71). We conclude that convolutional neural network-based models enable accurate clinical labeling at scale, thereby unlocking timely insights from unstructured clinical data.

## Introduction

The elusive goal of a learning healthcare system requires accurate clinical information. Structured healthcare data, although increasingly accessible, suffers from biases ubiquitous in observational data, including misclassification bias. In particular, diagnostic codes, widely used for billing and research, typically have low sensitivity and specificity1, 2. When used to establish an association of exposures to a clinical outcome, such limitations can lead to misclassified outcomes and interfere with adjustment procedures for confounding and selection bias3.

Nonetheless, the detection of such associations is a priority and often requires analysis of large- scale real-world data 4. Furthermore, such observational data is often the only choice for urgent clinical questions with potential public health implications. One example of such a pressing topic is the cardiovascular consequences of common respiratory viruses. The immediate adverse impact of respiratory viruses such as influenza, respiratory syncytial virus (RSV), and most recently, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is well recognized and includes acute heart failure5, 6, myocardial infarction7, 8, myocarditis9–11, and stroke12. However, one unresolved question is whether there is a delayed increased risk in such patients well after the incident infection.

We sought to address these questions, first developing machine learning methods to ascertain and adjudicate clinical events and past medical history at scale, and then applying these to patient data across multiple institutions within a large integrated delivery network.

## Results

### Building a large compendium of cardiologist-labeled notes

The overview of our approach is shown in **Figure 1**. To train an accurate model for event adjudication using medical notes, we labeled a large compendium of discharge summaries and progress notes. A total of 2,967 discharge summaries (1,372 for derivation set, 592 for validation set and 1,003 for test set) and 1,184 progress notes (588 for derivation set, 231 for validation set and 365 for test set) were obtained from the Massachusetts General Brigham (MGB) Enterprise Data Warehouse (EDW) database, which represents multiple institutions across an integrated delivery network. Each discharge summary was manually annotated by cardiologists following previously published guidelines for 4 possible diagnoses13: acute coronary syndrome (ACS), heart failure (HF) as the primary cause of hospitalization, percutaneous coronary intervention or coronary artery bypass graft surgery (PCI/CABG), and ischemic stroke. Progress notes were labeled according to a 4-class system for heart failure history (see Methods). Discharge summaries were derived from 17 different institutions while progress notes were from 46 different institutions with differing levels of specialized care **(Table S1 and S2)**. The labeled discharge summaries were then used to train the event models and the progress notes were used to train the note-level past-history model **(Figure S1)**. Because single notes are not exhaustive in documenting the past medical history of a patient, we developed a model that integrated the output from multiple notes to estimate past-history of a specific disease at a given point in time. To this end, all notes for an additional 863 patients (403 for derivation, 156 for validation and 304 for test) were used to label past history at a given timepoint, and were used in combination with the note-level past-history model to train a patient level past-history model **(Figure S2)**.

![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F1)

Figure 1. 
Schematic illustration of the study flow. A large compendium of cardiologist-labeled discharge summaries and progress notes were used to train four cardiovascular event adjudication models and a heart failure history model using neural networks. The models were deployed on 17 million notes for 267,596 patients to evaluate the impact of respiratory virus infections on delayed cardiovascular events.

View this table:
[Table 1.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T1)

Table 1. Sensitivity and specificity of ICD-10 code

### Neural network-based models have high accuracy for event adjudication

We trained four event adjudication models based on a neural network architecture using the labeled discharge summaries as training and testing data **(Figure S3)**. The neural network-based classifiers had excellent accuracy for detecting events with area under the curve of receiver operating characteristics (AUC-ROC) of 0.97 (95%CI 0.95-0.98) for heart failure hospitalization, 1.0 (0.99-1.00) for PCI/CABG, 0.97 (0.95-0.98) for ACS and 0.98 (0.97-0.99) for ischemic stroke (**Figure 2**). In comparison, structured diagnostic data (ICD-10 codes for ACS/HF/IS and Current Procedural Terminology (CPT) codes for PCI/CABG) had excellent specificity but only modest sensitivity with the exception of ischemic stroke, for which diagnostic codes had excellent sensitivity for the notes we examined **(Table 1)**. The good performance of the ICD-10 codes for stroke may be attributable to the fact that most of the stroke cases we considered were admitted to neurology intensive care units, which may not be the case at other institutions/settings. Overall, our models were able to provide increased sensitivity at a 90% specificity threshold for events **(Table 2).** Based on these results, we predicted events based on our event adjudication models for HF hospitalization, PCI/CABG and ACS and used ICD-10 codes to define ischemic stroke events.

![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F2)

Figure 2. 
Accuracy of event adjudication model to classify hospitalized events. Receiver operating characteristics curves (left) and precision-recall curves (right) for event adjudication models for PCI/CABG, ACS, heart failure hospitalization and ischemic stroke in the test dataset. Light blue area indicates 95% confidence interval and the pink diamond indicates the sensitivity and specificities of the ICD-10 code-based diagnosis.

View this table:
[Table 2.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T2)

Table 2. Cutoffs, sensitivity and specificity of the event adjudication model

### Patients positive for respiratory syncytial virus infection have a higher risk of subsequent heart failure hospitalization compared with those with negative results

A scalable method of accurate event adjudication enables a breadth of downstream applications of clinical and economic value to hospital systems and public health measures. We focused on a timely area, the association of respiratory virus infection with remote cardiovascular events. Using data from the MGB EDW, we identified 267,596 patients with diagnostic test results for a respiratory virus. Of these, we first focused on the 57,828 patients with respiratory syncytial virus (RSV) diagnostic test results, out of which 35,056, 35,828 and 35,778 patients were included in the final cohort for predicting heart failure hospitalization, urgent revascularization and ischemic stroke analysis respectively **(Figure S4-S6)**.

By selecting patients based on having undergone a viral test, we introduced possible selection (collider) bias14, given that clinical suspicion for a respiratory virus implies an increased likelihood of a cardiopulmonary etiology **(Figure S7)**. We thus used propensity-score matching to identify patients with a negative test result but with a similar probability of having a positive result for viral infection, as judged by values of other clinical (confounding) variables15. Propensity-score matching with 1:16 ratio resulted in well-balanced baseline characteristics between RSV positive and negative patients for all 3 outcomes (standardized mean difference <0.1 for all characteristics, **Tables S3-S8**). To focus on delayed outcomes rather than acute outcomes, we defined events as occurring at least 30 days after the test result was obtained (and matched patients by comorbidities up to that point). RSV positive patients had an event rate of 10.5/100 person-years of heart failure hospitalization which was significantly higher compared to those with negative results who had an event rate of 8.6/100 person-years (n=26,027, median follow-up 368 days, p = 0.0032, **Figure 3 upper panel**). There was no significant difference between RSV positive patients and negative patients for urgent revascularization, defined as the conjunction of a diagnosis of acute coronary syndrome and either PCI or CABG (n=26,724, median follow-up 388 days, 0.8/100 person-years for test positive patients and 0.7/100 person-years for test negative patients, p = 0.49) and ischemic stroke (n=26,690, median follow-up 389 days, 1.1/100 person- years for test positive and 0.9/100 person-years for test negative, p = 0.46). To evaluate if this analysis was driven mostly by early events, we restricted events to those occurring at least 90 days after the test result (patient selection and baseline characteristics shown in **Figures S8-S10** and **Tables S9-14**). This sensitivity analysis showed similar results with a longer median follow- up period of 418, 444 and 442 days for heart failure hospitalization, urgent revascularization and ischemic stroke, respectively, suggesting that our results are driven by delayed events rather than acute events **(Figure S11 upper panel)**.

![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F3)

Figure 3. 
Survival analysis for RSV and influenza infection on cardiovascular events. Kaplan-Meier plot comparing event rates of heart failure hospitalization, urgent revascularization and ischemic stroke in virus positive and negative population for RSV and influenza. Abbreviations, RSV: respiratory syncytial virus, INF: influenza virus.

### Patients positive for influenza virus infection do not have an increased risk of delayed cardiovascular events compared to those with negative results

We identified 81,250 patients with influenza test results, of which 58,522, 59,594 and 59,523 patients were included in the final cohort for heart failure hospitalization, urgent revascularization, and ischemic stroke, respectively **(Figure S12-S14).** As for RSV, we used propensity-score matching to adjust for differences in baseline characteristics between the virus positive and negative population. This matching procedure resulted in 1:5 matching between influenza positive and negative patients for all 3 outcomes (standardized mean difference <0.1, **Tables S15-S20**). In contrast to the RSV virus, there were no significant differences between influenza positive patients and negative patients for delayed heart failure hospitalization (n=30,828, median follow- up 614 days, 2.8/100 person-years for test positive and 2.9/100 person-years for test negative, p = 0.71), urgent revascularization (n=31,056, median follow-up 643 days, 0.3/100 person-years for test positive and 0.3/100 person-years for test negative, p = 0.62) or ischemic stroke (n=31,032, median follow-up 653 days, 0.5/100 person-years for test positive and 0.5/100 person-years for test negative, p = 0.62, **Figure 3 lower panel**). As with RSV, we also performed an analysis by using 90 days as the early event exclusion period (patient selection and baselines shown in **Figures S15-S17** and **Tables 21-26**). The results were similar to the main analysis with median follow up period of 670, 682 and 653 days for heart failure hospitalization, urgent revascularization and ischemic stroke, respectively **(Figure S11 lower panel).**

### Sensitivity analysis using a neural network-based model for heart failure history shows robustness of results

The past medical history of heart failure used for propensity-score matching was derived from an aggregate use of ICD-10 codes from problem lists, encounter diagnoses, medical history, and admission diagnoses. Given concern for inaccuracy of structured data, our finding that RSV infection was associated with higher rate of heart failure hospitalization may have been affected by the definition of HF history (which has large impact on this outcome). To overcome this concern, we first sought to evaluate the accuracy of the ICD-10 code-based approach to define HF history. To this end, we performed a systematic effort of labeling 863 patients and found that the ICD-10 approach nonetheless had a good accuracy with sensitivity 0.80 and specificity 0.97 (**Figure 4A**). Although the performance was good, the use of structured data typically leaves no flexibility to explore robustness of the result to different thresholds. We were particularly concerned about the impact of a reduced sensitivity. Therefore, we developed a neural network-based approach for learning past medical history at a given point in time using a neural network-based strategy. We trained three separate models: a model for note-informativeness, a model for heart failure history classification, and finally a time series model that used an aggregate of note scores across time. The final model had an excellent discriminative ability with ROC-AUC of 0.97 (**Figure 4A**). Although the accuracy at the mandatory point of comparison was not superior to the ICD-10 code- based definition, the model enabled evaluating robustness of the results to the selected cutoff point. We assessed conclusions using cutoffs with high sensitivity, high specificity and a balanced sensitivity/specificity ratio. Different cutoffs gave different absolute rates of heart failure history **(Table S26)**. Nonetheless, propensity score matching using model-based heart failure history was successful with SMD<0.1 for all baseline values **(Table S27-32)** and the overall results were unchanged for the association of heart failure hospitalization with viral infection, with p-values ranging from 0.0030 to 0.0043 for RSV and 0.28 to 0.70 for influenza (**Figure 4B**).

![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F4.medium.gif)

[Figure 4.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F4)

Figure 4. 
Sensitivity analysis evaluating different thresholds for a past medical history of heart failure on the risk of heart failure hospitalization with viral infection. (A) Receiver operating characteristics curves for the heart failure past medical history model. (B) Kaplan-Meier plot comparing event rates of heart failure hospitalization in virus positive and negative populations for RSV and influenza with various cutoffs for heart failure history. Sensitivity and specificity for detecting heart failure histories with the corresponding cutoffs are shown. Abbreviations, RSV: respiratory syncytial virus, INF: influenza virus.

## Discussion

Here we outline two primary contributions; 1) machine learning models for automated event adjudication for cardiovascular events and past medical history and 2) application of these models to estimate the risk of delayed heart failure hospitalizations in individuals with respiratory virus infections. In this context, we observed an estimated 23% higher risk of delayed heart failure hospitalizations in individuals infected with RSV compared to those who tested negative but no change in the risk for such events following influenza infection.

Our work labeling a large collection of notes across over 17 institutions within an integrated delivery network adds further support to the limitations of diagnostic codes for describing cardiovascular events2. The accuracy of diagnostic codes varied strongly with the type of event: at one extreme, ischemic stroke was very well captured whereas there were no clear diagnostic codes for coronary revascularization, and, at least within our database, the use of CPT codes had very low sensitivity. Reimbursement incentives are known to shape the completeness of diagnostic codes16 and there is also further variation from center to center based on provider and institutional coding practices. Nonetheless, a systematic labeling approach can provide local estimates of the adequacy of diagnostic code for any specific application, and further simulation- based modeling can be used to understand the impact of misclassification bias on any downstream uses of the data.

Although we found that a historical compilation encompassing encounter diagnoses, admission diagnoses, problem lists, and medical history is highly accurate, at least for heart failure, the sensitivity was relatively low compared to specificity, which is in agreement with previous studies showing that ICD-10 codes for past medical history suffers primarily from problems of sensitivity17. The ICD-10 code-based approach forces a single sensitivity/specificity and does not allow adjustment or optimization. In comparison, our model-based definition can provide various sensitivity/specificity ratio by adjusting the threshold. This feature enabled us to perform a sensitivity analysis showing that our results are robust to the sensitivity/specificity of the past medical history detection. Furthermore, we anticipate that our approach should generalize well to capturing conditions for which there are no diagnostic codes (such as inadequate response to a medication).

In terms of prior work, most of the focus on EHR-based automated diagnoses has been on the use of heuristic combinations of diagnostic codes, medications, labs, and regular expression matching within unstructured data2. Recent advances in convolutional neural network-based text processing have motivated their application to the problem of note classification18, 19. Our own experience with this process emphasized the need for both domain (i.e., cardiovascular disease) and data science expertise, with an iterative examination of failure cases, augmented labeling, and retraining of models.

Our models enabled scalable analysis of millions of discharge summaries and progress notes. We see multiple applications of such an approach, whether it be for pharmacovigilance, quality control, or patient selection for a clinical trial according to criteria that may be poorly reflected in structured data. For our specific application, we were able to conclude that influenza virus infection does not appear to contribute to any delayed increase in heart failure hospitalization, while RSV infection contributes a 23% increase in risk that is apparent across an entire four-year period.

Prior work focusing on respiratory viruses have emphasized the acute cardiovascular manifestations of viral infections in a limited subset of patients5, 20. In the case of RSV, it is well recognized that patients with RSV infections have a higher burden of cardiovascular disease, although this may reflect selection bias, given that cardiovascular consequences of viral infection may add to pre-existing disease to drive presentation to healthcare systems. However, our results are striking in that even controlling for heart failure diagnoses prior to and up to 90 days after the incident event, we still see an increased risk of future events. While previous reports show increased risk of cardiovascular events in patients with influenza infections, they focus entirely on acute events in the context of the infection6–8. Our analysis excludes acute events and thus, does not contradict these findings. It is entirely plausible that influenza virus increases heart failure hospitalization in the acute phase, either through direct myocardial involvement21 or indirectly through a systemic inflammatory response as with many other infections22, but patients may not have prolonged damage to the heart.

The primary limitation of our work is the unknown contribution of unmeasured confounders with significant impact on both heart failure hospitalization outcomes and probability of positive test for viral infection. Since our analysis focused only on those who had been tested for the virus, these population likely had symptoms suggestive of a cardiopulmonary illness. Thus, the negative results for the virus results may be consistent with other reasons for having respiratory symptoms. In the case of influenza virus, we observed a classic example of collider bias requiring propensity score matching approaches23. Nonetheless, such approaches inevitably can suffer from model misspecification. However, it should be noted that possible approaches to address these limitations such as randomizing patients to be infected by a virus or performing a cohort study testing every individual regardless of the symptoms are not feasible.

An open question is whether our work has relevance to other respiratory viruses, such as SARS- COV2. Although SARS-COV2, like influenza21 and RSV, can cause myocarditis9–11, 24, there is no clarity on the prevalence of this effect given the selection bias in published reports. Nonetheless, it is reasonable to assume the possibility of long-term cardiac consequences for SARS-COV2, that will only be addressed with multi-year surveillance and methodologic approaches such as our own.

## Methods

### Ethics statement

Institutional review board approval was obtained for all aspects of this study,

### Selection and labeling of clinical notes for event model training and evaluation

The Massachusetts General Brigham (MGB) Electronic Data Warehouse (EDW), which includes notes from >20 provider locations across a large integrated delivery network, was the source of notes and structured clinical data used in this study. Two sets of discharge summaries were used for model training and testing. The training set, consisting of 1964 discharge summaries, were partially enriched with the use of ICD10 codes for stroke, myocardial infarction, and heart failure. The test set, consisting of 1,003 discharge summaries, was selected randomly from patients above who had undergone electrocardiograms within the centers participating in the EDW.

Following previously published guideline13, one of 4 cardiologists labeled each discharge summary for 4 possible diagnoses: acute coronary syndrome, heart failure as the primary cause of hospitalization, percutaneous coronary intervention or coronary artery bypass surgery (PCI/CABG), or ischemic stroke.

### Processing of clinical notes

We used SpaCy v2.2.425 to tokenize medical notes. We extended the English language model (en-core-web-large) tokenization rules by adding a comprehensive set of abbreviations that are commonly used in medical records. We used fasttext v0.9.226 to train 100-dimensional bigram word embeddings with the parameters ‘fasttext skipgram -wordNgrams 2’. Word embeddings were trained on a compendium of more than 10 million clinical notes ranging from 2002 to 2020, based on more than 330,000 subjects, with up to 50 notes per subject.

### Training of event adjudication models

The event adjudication model was constructed with the combination of 1D-CNN and long-short- term memory (LSTM). It consisted of a word embedding layer, which converts words into word vectors, followed by a 1D-CNN and LSTM combination network (schematic shown in **Figure S1**). The model was trained using data from derivation dataset. Each discharge summary was labeled as positive or negative for 4 outcomes of interest (heart failure hospitalization, PCI/CABG, ACS and ischemic stroke) and 4 separate models were trained for each outcome label. The model was trained to minimize the binary cross entropy between model prediction and the label using RMSprop optimizer with initial learning rate of 0.0001. The model was trained for 150 epochs. At the end of each epoch, ROC-AUC on the validation dataset was calculated. The final model was chosen as the model with highest ROC-AUC on the validation cohort across all 150 epochs.

### Selection of test results for respiratory virus infection

Using a data dictionary, we identified identifiers for diagnostic tests for influenza A and B, parainfluenza, respiratory syncytial virus (RSV), coronavirus, and adenovirus and downloaded all relevant viral test results within the MGB-EDW. We limited analysis to tests having at least 5000 patients. The test results were reported as free text. While most results were reported simply as positive or negative, in some cases, the descriptions of the viral test results were complex. Therefore, a physician (SG) reviewed all possible texts in the report and compiled a conversion table from the texts to positive/negative results. Since comprehensive reporting in the EDW started on Feb 2016, we included only results after that date.

### Survival analysis for cardiovascular events

To evaluate the influence of viral infection on 3 cardiovascular events (heart failure hospitalization, urgent revascularization and ischemic stroke), we performed a survival analysis for these outcomes. Due to limited numbers or follow-up length for other viruses, we focused our analysis on influenza and RSV. To avoid possible bias caused by the COVID-19 outbreak in Boston area, all the follow up were censored at February 2020. All viral test results between February 2016 and February 2020 were identified. Positive patients were identified as those having at least one positive result for the virus of interest within the study period and the date of first positive result was used as the index date. Patients who had the test for the virus of interest but did not have any positive results were identified as a control population within that cohort. Index dates were identified as the date of first viral test for control patients. All patients who had other viral infections before the index date were excluded.

Outcome events were defined bases on the event text model predictions for heart failure hospitalization and urgent revascularization and ICD-10 codes for ischemic stroke. The cutoffs for the events were selected as the value that gave the best sensitivity at a specificity over 90%. Urgent revascularization was defined as positive results for PCI/CABG and ACS for the same discharge summary. All 3 outcome events were analyzed independently. Patients were censored at the first event, death, last encounter in the system or the end of study period, whichever occurred first. To account for confounders, we used a propensity-score matching method15. A propensity score for a positive test was calculated with age, sex, race, smoking status, history of heart failure, history of coronary artery disease, history of atrial fibrillation, history of type 2 diabetes mellitus, history of COPD, history of hypertension, history of dyslipidemia, use of loop diuretic, use of aspirin, use of P2Y12 inhibitors and the method for viral test using logistic regression. In this analysis, the past histories were identified using ICD-10 codes. The between group difference in baseline characteristics was assessed by the standardized mean difference (SMD) where a value under 0.1 is considered negligible. Patients with positive viral results were matched to controls (without replacement) using a nearest neighbor method, as implemented in the *MatchIt* package (version 3.0.2) in *R* (version 3.5.1)27. The matching ratio was set to the highest number that did not introduce substantial difference in any baseline characteristics (SMD<0.1).

### Training and testing of a past medical history model

Since past medical history is defined using a cumulative record and not on a single note, we took a 2-step approach for training the history model. Our first step focused on classifying the note. Since some notes are totally unrelated to the history of disease (such as nutrition notes), we trained a model to classify if the note is informative for a specific history or not and a second model to actually classify if the history is positive or negative. To this end, each note was initially labeled with 3 labels (positive/negative and uninformative). During labeling, we also noticed that there were some notes that stated an ambiguous condition (e.g. extremely low ejection fraction without noting if the patient have symptoms or not for heart failure). We labeled these by a 4th label (ambiguous) and excluded these from the training data. Two sets of notes were used for model training and testing from the MGB-EDW database where the training set were partially enriched with the use of ICD10 codes heart failure and the test set selected randomly. The final numbers of notes were 819 for training set and 365 for the test set. The note-level models resulted in note-level accuracy of AUC 0.93 for informativeness and 0.93 for the presence of heart failure history **(Figure S2A).**

The second step focused on combining all the note-level output to identify the past medical history status for each patient for a specific date. To this end, we built a time-series neural network-model that takes note-level prediction for informativeness and past medical history status along with the dates of the note relative to the index date as input. The model produces a single binary output reflecting the probability of the patient having the disease history at the index date **(Figure S2B)**. For this model, we labeled a separate patient dataset (with no overlap with the note level history model patients) consisting of 561 patient-date pairs for training and 305 patient-date pairs for testing. The patients in the training datasets were partially enriched for past medical history using the event adjudication model to select patients who had an event before the index date. In contrast, the patients in test set were randomly selected from a group of patients 60 years and above who had received an electrocardiogram at MGB. The labeling was done by reading all the notes associated with that patient to identify if the patient had a positive history of the disease at the index date. The input data was constructed as time-series data of the note-level output for all the notes within ± 2500 days of index date. Each output value was inserted into a vector of length 5000 (each element corresponding to the date relative to index date). The neural network was trained to take this vector as input and a binary 1/0 value as output.

### Sensitivity analysis with various thresholds for heart failure history

To evaluate if the results of our survival analysis was sensitive to the sensitivity/specificity of the past medical history detection method, repeated the survival analysis for heart failure hospitalization using the heart failure history defined based on the time-series history model with various cutoff points. We used 3 cutoff points for this analysis, each of which giving (1) highest specificity with sensitivity ≥ 0.95, (2) sensitivity and specificity both over > 0.90, (3) highest sensitivity with specificity ≥ 0.95. The cutoffs were 0.24 (sensitivity=0.96, specificity=0.83), 0.41 (sensitivity=0.90, specificity=0.93) and 0.43 (sensitivity=0.68, specificity=0.98). The propensity score matching was re-done using the model-based heart failure history instead of ICD-10 code- based heart failure history.

## Data Availability

The data that support the findings of this study are available on request from the corresponding author R.C.D. upon approval of the data sharing committees of the respective institutions. The data are not publicly available due to the presence of information that could compromise research participant privacy.

## Supplementary Figures

![Figure S1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F5.medium.gif)

[Figure S1:](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F5)

Figure S1: 
Schematic drawing of the network structure of the event adjudication model. (A) Structure of the multi-1D-CNN module and (B) the full network structure of the model

![Figure S2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F6.medium.gif)

[Figure S2:](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F6)

Figure S2: 
Past medical history model. (A) ROC curves showing performance of note-level past medical history and uninformativeness model. (B) Schematic drawing for time-series model for past medical history.

![Figure S3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F7.medium.gif)

[Figure S3.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F7)

Figure S3. 
Flow diagram of notes selection used for training and testing event adjudication models.

![Figure S4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F8.medium.gif)

[Figure S4.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F8)

Figure S4. 
Flow diagram of patient selection for survival analysis for heart failure hospitalization in patients who had an RSV viral test between February 1, 2016 and February 1, 2020.

![Figure S5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F9.medium.gif)

[Figure S5.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F9)

Figure S5. 
Flow diagram of patient selection for survival analysis on urgent revascularization in patients undergoing RSV viral tests.

![Figure S6.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F10.medium.gif)

[Figure S6.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F10)

Figure S6. 
Flow diagram of patient selection for survival analysis on ischemic stroke in atients undergoing RSV viral tests.

![Figure S7.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F11.medium.gif)

[Figure S7.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F11)

Figure S7. Graphical model illustrating potential selection bias in patients who underwent testing for a respiratory virus.
Selecting patients based on having undergone a viral test induces an inverse association between viral infection and other causes of respiratory symptoms (collider bias). The latter may also influence the probability of delayed cardiovascular events, and thus were handled by adjustment.

![Figure S8.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F12.medium.gif)

[Figure S8.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F12)

Figure S8. 
Flow diagram of patient selection for survival analysis on heart failure in patients undergoing RSV viral tests (excluding those who had event within the first 90 days).

![Figure S9.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F13.medium.gif)

[Figure S9.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F13)

Figure S9. 
Flow diagram of patient selection for survival analysis on urgent revascularization in patients undergoing RSV viral tests (excluding those who had event within the first 90 days).

![Figure S10.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F14.medium.gif)

[Figure S10.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F14)

Figure S10. 
Flow diagram of patient selection for survival analysis on ischemic stroke in patients undergoing RSV viral tests (excluding those who had event within the first 90 days).

![Figure S11.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F15.medium.gif)

[Figure S11.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F15)

Figure S11. 
Survival analysis for RSV and influenza infection on cardiovascular events (excluding those who had event within the first 90 days). Kaplan-Meier plot comparing event rates of heart failure hospitalization, urgent revascularization and ischemic stroke in virus positive and negative population for RSV and influenza. Abbreviations, RSV: respiratory syncytial virus, INF: influenza virus.

![Figure S12.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F16.medium.gif)

[Figure S12.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F16)

Figure S12. 
Flow diagram of patient selection for survival analysis on heart failure hospitalization in patients undergoing influenza viral tests.

![Figure S13.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F17.medium.gif)

[Figure S13.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F17)

Figure S13. 
Flow diagram of patient selection for survival analysis on urgent revascularization in patients undergoing influenza viral tests.

![Figure S14.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F18.medium.gif)

[Figure S14.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F18)

Figure S14. 
Flow diagram of patient selection for survival analysis on ischemic stroke in patients undergoing influenza viral tests.

![Figure S15.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F19.medium.gif)

[Figure S15.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F19)

Figure S15. 
Flow diagram of patient selection for survival analysis on heart failure in patients undergoing influenza viral tests (excluding those who had event within the first 90 days).

![Figure S16.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F20.medium.gif)

[Figure S16.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F20)

Figure S16. 
Flow diagram of patient selection for survival analysis on urgent revascularization in patients undergoing influenza viral tests (excluding those who had event within the first 90 days).

![Figure S17.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2020/11/16/2020.11.12.20230706/F21.medium.gif)

[Figure S17.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/F21)

Figure S17. 
Flow diagram of patient selection for survival analysis on ischemic stroke in patients undergoing influenza viral tests (excluding those who had event within the first 90 days).

## Supplementary Tables

View this table:
[Table S1.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T3)

Table S1. 
Demographics for the notes used for training and testing event models

View this table:
[Table S2.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T4)

Table S2. 
Demographics for the notes used for training and testing history model

View this table:
[Table S3.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T5)

Table S3. 
Baseline characteristics for survival analysis of heart failure hospitalization stratified by RSV infection (unmatched)

View this table:
[Table S4.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T6)

Table S4. 
Baseline characteristics for survival analysis of urgent revascularization stratified by RSV infection (unmatched)

View this table:
[Table S5.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T7)

Table S5. 
Baseline characteristics for survival analysis of stroke stratified by RSV infection (unmatched)

View this table:
[Table S6.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T8)

Table S6. 
Baseline characteristics for survival analysis of heart failure hospitalization stratified by RSV infection (matched)

View this table:
[Table S7.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T9)

Table S7. 
Baseline characteristics for survival analysis of urgent revascularization stratified by RSV infection (matched)

View this table:
[Table S8.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T10)

Table S8. 
Baseline characteristics for survival analysis of stroke stratified by RSV infection (matched)

View this table:
[Table S9.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T11)

Table S9. 
Baseline characteristics for survival analysis of heart failure stratified by RSV infection (with 90 days exclusion window, unmatched)

View this table:
[Table S10.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T12)

Table S10. 
Baseline characteristics for survival analysis of urgent revascularization stratified by RSV infection (with 90 days exclusion window, unmatched)

View this table:
[Table S11.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T13)

Table S11. 
Baseline characteristics for survival analysis of ischemic stroke stratified by RSV infection (with 90 days exclusion window, unmatched)

View this table:
[Table S12.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T14)

Table S12. 
Baseline characteristics for survival analysis of heart failure stratified by RSV infection (with 90 days exclusion window, matched)

View this table:
[Table S13.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T15)

Table S13. 
Baseline characteristics for survival analysis of urgent revascularization stratified by RSV infection (with 90 days exclusion window, matched)

View this table:
[Table S14.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T16)

Table S14. 
Baseline characteristics for survival analysis of ischemic stroke stratified by RSV infection (with 90 days exclusion window, matched)

View this table:
[Table S15.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T17)

Table S15. 
Baseline characteristics for survival analysis of heart failure hospitalization stratified by influenza infection (unmatched)

View this table:
[Table S16.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T18)

Table S16. 
Baseline characteristics for survival analysis of urgent revascularization stratified by influenza infection (unmatched)

View this table:
[Table S17.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T19)

Table S17. 
Baseline characteristics for survival analysis of stroke stratified by influenza infection (unmatched)

View this table:
[Table S18.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T20)

Table S18. 
Baseline characteristics for survival analysis of heart failure hospitalization stratified by influenza infection (matched)

View this table:
[Table S19.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T21)

Table S19. 
Baseline characteristics for survival analysis of urgent revascularization stratified by influenza infection (matched)

View this table:
[Table S20.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T22)

Table S20. 
Baseline characteristics for survival analysis of stroke stratified by influenza infection (matched)

View this table:
[Table S21.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T23)

Table S21. 
Baseline characteristics for survival analysis of heart failure stratified by influenza infection (with 90 days exclusion window, unmatched)

View this table:
[Table S22.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T24)

Table S22. 
Baseline characteristics for survival analysis of urgent revascularization stratified by influenza infection (with 90 days exclusion window, unmatched)

View this table:
[Table S23.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T25)

Table S23. 
Baseline characteristics for survival analysis of ischemic stroke stratified by influenza infection (with 90 days exclusion window, unmatched)

View this table:
[Table S24.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T26)

Table S24. 
Baseline characteristics for survival analysis of heart failure stratified by influenza infection (with 90 days exclusion window, matched)

View this table:
[Table S25.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T27)

Table S25. 
Baseline characteristics for survival analysis of urgent revascularization stratified by influenza infection (with 90 days exclusion window, matched)

View this table:
[Table S26.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T28)

Table S26. 
Baseline characteristics for survival analysis of ischemic stroke stratified by influenza infection (with 90 days exclusion window, matched)

View this table:
[Table S26.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T29)

Table S26. 
History of heart failure based on the model with various cutoffs for heart failure hospitalization analysis on RSV infection (unmatched)

View this table:
[Table S27.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T30)

Table S27. 
Baseline characteristics for survival analysis of heart failure hospitalization stratified by RSV infection (matched, model-based heart failure history with cutoff 0.24)

View this table:
[Table S28.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T31)

Table S28. 
Baseline characteristics for survival analysis of heart failure hospitalization stratified by RSV infection (matched, model-based heart failure history with cutoff 0.41)

View this table:
[Table S29.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T32)

Table S29. 
Baseline characteristics for survival analysis of heart failure hospitalization stratified by RSV infection (matched, model-based heart failure history with cutoff 0.43)

View this table:
[Table S30.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T33)

Table S30. 
Baseline characteristics for survival analysis of heart failure hospitalization stratified by influenza infection (matched, model- based heart failure history with cutoff 0.24)

View this table:
[Table S31.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T34)

Table S31. 
Baseline characteristics for survival analysis of heart failure hospitalization stratified by influenza infection (matched, model- based heart failure history with cutoff 0.41)

View this table:
[Table S32.](http://medrxiv.org/content/early/2020/11/16/2020.11.12.20230706/T35)

Table S32. 
Baseline characteristics for survival analysis of heart failure hospitalization stratified by influenza infection (matched, model- based heart failure history with cutoff 0.43)

*   Received November 12, 2020.
*   Revision received November 12, 2020.
*   Accepted November 16, 2020.


*   © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1.  1.O’Malley KJ, Cook KF, Price MD, Wildes KR, Hurdle JF, Ashton CM. Measuring Diagnoses: ICD Code Accuracy. Health Serv Res. 2005;40:1620–1639.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1475-6773.2005.00444.x&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16178999&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F16%2F2020.11.12.20230706.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000231908400005&link_type=ISI) 

2.  2.Wei W-Q, Teixeira PL, Mo H, Cronin RM, Warner JL, Denny JC. Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance. J Am Med Inform Assn. 2016;23:e20–e27.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/jamia/ocv130&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=26338219&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F16%2F2020.11.12.20230706.atom) 

3.  3.Lash TL, Fox MP, MacLehose RF, Maldonado G, McCandless LC, Greenland S. Good practices for quantitative bias analysis. Int J Epidemiol. 2014;43:1969–1985.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dyu149&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25080530&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F16%2F2020.11.12.20230706.atom) 

4.  4.Patadia VK, Coloma P, Schuemie MJ, Herings R, Gini R, Mazzaglia G, Picelli G, Fornari C, Pedersen L, Lei J van der, Sturkenboom M, Trifirò G, consortium E-A. Using real-world healthcare data for pharmacovigilance signal detection – the experience of the EU-ADR project. Expert Rev Clin Phar. 2014;8:95–102.
    
    
5.  5.Falsey AR, Hennessey PA, Formica MA, Cox C, Walsh EE. Respiratory Syncytial Virus Infection in Elderly and High-Risk Adults. New Engl J Med. 2005;352:1749–1759.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa043951&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15858184&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F16%2F2020.11.12.20230706.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000228689000005&link_type=ISI) 

6.  6.Kytömaa S, Hegde S, Claggett B, Udell JA, Rosamond W, Temte J, Nichol K, Wright JD, Solomon SD, Vardeny O. Association of Influenza-like Illness Activity With Hospitalizations for Heart Failure. Jama Cardiol. 2019;4:363–369.
    
    
7.  7.Kwong JC, Schwartz KL, Campitelli MA, Chung H, Crowcroft NS, Karnauchow T, Katz K, Ko DT, McGeer AJ, McNally D, Richardson DC, Rosella LC, Simor A, Smieja M, Zahariadis G, Gubbay JB. Acute Myocardial Infarction after Laboratory-Confirmed Influenza Infection. New Engl J Medicine. 2018;378:345–353.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa1702090&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29365305&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F16%2F2020.11.12.20230706.atom) 

8.  8.Barnes M, Heywood AE, Mahimbo A, Rahman B, Newall AT, Macintyre CR. Acute myocardial infarction and influenza: a meta-analysis of case–control studies. Heart. 2015;101:1738.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiaGVhcnRqbmwiO3M6NToicmVzaWQiO3M6MTE6IjEwMS8yMS8xNzM4IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMTEvMTYvMjAyMC4xMS4xMi4yMDIzMDcwNi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

9.  9.Inciardi RM, Lupi L, Zaccone G, Italia L, Raffo M, Tomasoni D, Cani DS, Cerini M, Farina D, Gavazzi E, Maroldi R, Adamo M, Ammirati E, Sinagra G, Lombardi CM, Metra M. Cardiac Involvement in a Patient With Coronavirus Disease 2019 (COVID-19). Jama Cardiol. 2020;5:819–824.
    
    
10. 10.Doyen D, Moceri P, Ducreux D, Dellamonica J. Myocarditis in a patient with COVID-19: a cause of raised troponin and ECG changes. Lancet. 2020;395:1516.
    
    
11. 11.Madjid M, Safavi-Naeini P, Solomon SD, Vardeny O. Potential Effects of Coronaviruses on the Cardiovascular System. Jama Cardiol. 2020;5:831–840.
    
    
12. 12.Boehme AK, Luna J, Kulick ER, Kamel H, Elkind MSV. Influenza-like illness as a trigger for ischemic stroke. Ann Clin Transl Neur. 2018;5:456–463.
    
    
13. 13.Hicks KA, Mahaffey KW, Mehran R, Nissen SE, Wiviott SD, Dunn B, Solomon SD, Marler JR, Teerlink JR, Farb A, Morrow DA, Targum SL, Sila CA, Hai MTT, Jaff MR, Joffe HV, Cutlip DE, Desai AS, Lewis EF, Gibson CM, Landray MJ, Lincoff AM, White CJ, Brooks SS, Rosenfield K, Domanski MJ, Lansky AJ, McMurray JJV, Tcheng JE, Steinhubl SR, Burton P, Mauri L, O’Connor CM, Pfeffer MA, Hung HMJ, Stockbridge NL, Chaitman BR, Temple RJ, (SCTI) SDC for CTI. 2017 Cardiovascular and Stroke Endpoint Definitions for Clinical Trials. Circulation. 2018;137:961–972.
    
    [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTQ6ImNpcmN1bGF0aW9uYWhhIjtzOjU6InJlc2lkIjtzOjk6IjEzNy85Lzk2MSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIwLzExLzE2LzIwMjAuMTEuMTIuMjAyMzA3MDYuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

14. 14.Berkson J. Limitations of the Application of Fourfold Table Analysis to Hospital Data.,. Int J Epidemiol. 2014;43:511–515.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/ije/dyu022&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24585734&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F16%2F2020.11.12.20230706.atom) 

15. 15.Rosenbaum PR, Rubin DB. Assessing Sensitivity to an Unobserved Binary Covariate in an Observational Study with Binary Outcome. J Royal Statistical Soc Ser B Methodol. 1983;45:212–218.
    
    
16. 16.Dafny LS. How Do Hospitals Respond to Price Changes? Am Econ Rev. 2005;95:1525–1547.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1257/000282805775014236&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000234305300010&link_type=ISI) 

17. 17.Hripcsak G, Albers DJ. Next-generation phenotyping of electronic health records. J Am Med Inform Assn. 2012;20:117–121.
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22955496&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F16%2F2020.11.12.20230706.atom) 

18. 18.Usama M, Ahmad B, Wan J, Hossain MS, Alhamid MF, Hossain MA. Deep Feature Learning for Disease Risk Assessment Based on Convolutional Neural Network With Intra-Layer Recurrent Connection by Using Hospital Big Data. Ieee Access. 2018;6:67927–67939.
    
    
19. 19.Magna AAR, Allende-Cid H, Taramasco C, Becerra C, Figueroa RL. Application of Machine Learning and Word Embeddings in the Classification of Cancer Diagnosis Using Patient Anamnesis. Ieee Access. 2020;8:106198–106213.
    
    
20. 20.Ivey KS, Edwards KM, Talbot HK. Respiratory Syncytial Virus and Associations With Cardiovascular Disease in Adults. J Am Coll Cardiol. 2018;71:1574–1583.
    
    [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6MzoiUERGIjtzOjExOiJqb3VybmFsQ29kZSI7czo0OiJhY2NqIjtzOjU6InJlc2lkIjtzOjEwOiI3MS8xNC8xNTc0IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMTEvMTYvMjAyMC4xMS4xMi4yMDIzMDcwNi5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

21. 21.Ukimura A, Satomi H, Ooi Y, Kanzaki Y. Myocarditis Associated with Influenza A H1N1pdm2009. Influ Res Treat. 2012;2012:351979.
    
    
22. 22.Drozd M, Garland E, Walker AMN, Slater TA, Koshy A, Straw S, Gierula J, Paton M, Lowry J, Sapsford R, Witte KK, Kearney MT, Cubbon RM. Infection-Related Hospitalization in Heart Failure With Reduced Ejection Fraction. Circulation Hear Fail. 2020;13:e006746.
    
    
23. 23.Hernán MA, Hernández-Díaz S, Robins JM. A Structural Approach to Selection Bias. Epidemiology. 2004;15:615–625.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/01.ede.0000135174.63482.43&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15308962&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F11%2F16%2F2020.11.12.20230706.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000223382600020&link_type=ISI) 

24. 24.Shi S, Qin M, Shen B, Cai Y, Liu T, Yang F, Gong W, Liu X, Liang J, Zhao Q, Huang H, Yang B, Huang C. Association of Cardiac Injury With Mortality in Hospitalized Patients With COVID-19 in Wuhan, China. Jama Cardiol. 2020;5:802–810.
    
    
25. 25.Honnibal M, Montani I. SpaCy 2: Natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing. To appear. 2017;7.
    
    
26. 26.Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics. 2017;5:135–146.
    
    
27. 27.Ho DE, Imai K, King G, Stuart EA. Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Polit Anal. 2007;15:199–236.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/pan/mpl013&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000248544000002&link_type=ISI)