Development of Predictive Risk Models for All-cause Mortality in Pulmonary Hypertension using Machine Learning ============================================================================================================== * Jiandong Zhou * Ka Hei Gabriel Wong * Sharen Lee * Tong Liu * Keith Sai Kit Leung * Kamalan Jeevaratnam * Bernard Man Yung Cheung * Ian Chi Kei Wong * Qingpeng Zhang * Gary Tse ## Abstract **Background** Pulmonary hypertension, a progressive lung disorder with symptoms such as breathlessness and loss of exercise capacity, is highly debilitating and has a negative impact on the quality of life. In this study, we examined whether a multi-parametric approach using machine learning can improve mortality prediction. **Methods** A population-based territory-wide cohort of pulmonary hypertension patients from January 1, 2000 to December 31, 2017 were retrospectively analyzed. Significant predictors of all-cause mortality were identified. Easy-to-use frailty indexes predicting primary and secondary pulmonary hypertension were derived and stratification performances of the derived scores were compared. A factorization machine model was used for the development of an accurate predictive risk model and the results were compared to multivariate logistic regression, support vector machine, random forests, and multilayer perceptron. **Results** The cohorts consist of 2562 patients with either primary (n=1009) or secondary (n=1553) pulmonary hypertension. Multivariate Cox regression showed that age, prior cardiovascular, respiratory and kidney diseases, hypertension, number of emergency readmissions within 28 days of discharge were all predictors of all-cause mortality. Easy-to-use frailty scores were developed from Cox regression. A factorization machine model demonstrates superior risk prediction improvements for both primary (precision: 0.90, recall: 0.89, F1-score: 0.91, AUC: 0.91) and secondary pulmonary hypertension (precision: 0.87, recall: 0.86, F1-score: 0.89, AUC: 0.88) patients. **Conclusion** We derived easy-to-use frailty scores predicting mortality in primary and secondary pulmonary hypertension. A machine learning model incorporating multi-modality clinical data significantly improves risk stratification performance. Key words * pulmonary hypertension * risk stratification * frailty score * all-cause mortality * factorization machine ## Introduction Pulmonary hypertension is a progressive lung disorder characterized by elevated pulmonary arterial pressure, which can have different etiologies 1. Patients may experience symptoms like breathlessness and loss of exercise capacity that can be highly debilitating and adversely affect their quality of life. This is exacerbated by complications, such as cavitation and infection of the lungs, alveolar hemorrhage and heart failure, which can result in premature mortality 2. However, mortality risk differs between etiologies 3, and more accurate risk stratification strategies could potentially improve clinical management. To this end, several studies have developed predictive risk models based on different variables. For example, the first prognostic equation, which was based on pulmonary haemodynamics (right atrial pressure, mean pulmonary artery pressure and cardiac index at diagnosis), was derived from the National Institutes of Health (NIH) registry study of 194 patients with primary pulmonary hypertension from 32 centers 4. This model was applied in a contemporary cohort from the Pulmonary Hypertension Connection (PHC) registry, which showed better survival 5. From the Risk Evaluation and Education for Alzheimer’s Disease (REVEAL) study using a registry of 2716 patients, the model included pulmonary vascular resistance, portal hypertension, modified New York Heart Association/World Health Organization functional class IV, men >60 years of age and family history of pulmonary arterial hypertension 6. Recently, the Scottish composite score developed using a United Kingdom cohort of 182 pulmonary arterial hypertension patients was based on age, sex, right atrial pressure, cardiac output and 6-min walk distance 7. However, to date, there have been no predictive model that could be wholly derived from data obtained from administrative databases, which would provide the opportunity to develop models based on routinely collected data to examine long-term outcomes and have the advantage over registry-based data in terms of reduced bias 8. Such administrative databases have been used to estimate healthcare resource utilization and costs associated with pulmonary arterial hypertension 9, but not for development of accurate predictive risk models. In this territory-wide cohort study, we examined the predictors of all-cause mortality, derived frailty scores predicting adverse events, and tested the hypothesis that a multi-parametric approach can improve risk prediction in patients with primary and secondary pulmonary hypertension. ## Research design and methods ### Study design The study was approved by The Joint Chinese University of Hong Kong – New Territories East Cluster Clinical Research Ethics Committee. This population-based territory-wide cohort study included patients with a diagnosis of primary and secondary pulmonary hypertension, that were managed in hospitals under the Hong Kong Hospital Authority over the period between 1st January 2000 and 31st December 2017. The patients were identified from the Clinical Data Analysis and Reporting System (CDARS), a healthcare database that integrates patient information across all 43 publicly funded hospitals and their associated ambulatory and primary care facilities in Hong Kong to establish comprehensive medical records. The available information includes demographics, clinical characteristics, disease diagnoses, laboratory examinations, drug prescription details, and admission statistics. The system has been previously used by both our team and other teams in Hong Kong 10-13. ### Data extraction Patients with primary pulmonary hypertension and secondary pulmonary hypertension were identified by their respective International Classification of Diseases Nineth Edition (ICD-9) coding of 416.0 and 416.8(3), respectively. Prior comorbidities of cardiovascular, respiratory, renal, gastrointestinal and endocrine diseases and obesity were extracted with corresponding ICD-9 diagnosis codes. Hypertension and diabetes mellitus were extracted separately. We extracted drug prescription of ten commonly prescribed drug classes: cardiac glycosides, phosphodiesterase type-3 inhibitors, thiazide diuretics, loop diuretics, potassium-sparing diuretics and, anti-arrhythmic drugs, beta blockers, vasodilative antihypertensive drugs, centrally acting antihypertensive drugs, and alpha blockers for pulmonary hypertension treatment. The mean daily dose of each drug class was reported, which is derived from multiplying the daily dose frequency against the drug dose during the study period, then averaged by the drug prescription duration. Details about the codes for identifying prior comorbidities and the specific drugs in each drug category prescribed for the study cohort are provided in the **Supplementary Tables 1 and 2**, respectively. View this table: [Table 1.](http://medrxiv.org/content/early/2021/01/20/2021.01.16.21249934/T1) Table 1. Descriptive statistics of patients with primary and secondary pulmonary hypertension * Mean daily drug dosage (mg/per day); * for p≤ 0.05, ** for p ≤ 0.01, \***| for p≤ 0.001 View this table: [Table 2.](http://medrxiv.org/content/early/2021/01/20/2021.01.16.21249934/T2) Table 2. Univariate analysis to predict mortality of patients with primary and secondary pulmonary hypertension * for p≤ 0.05, ** for p≤ 0.01, \***| for p≤ 0.001 ### Primary outcome and statistical analysis The primary outcome was all-cause mortality. Descriptive statistics were used to summarize patients’ characteristics of each primary diagnosis and mortality outcome. Continuous variables were presented as median (95% confidence interval [CI] or interquartile range [IQR]) and categorical variables were presented as count (%). The Mann-Whitney U test was used to compare continuous variables. The χ2 test with Yates’ correction was used for 2×2 contingency data, and Pearson’s χ2 test was used for contingency data for variables with more than two categories. To evaluate the significant prognostic risk factors and the effects of drug therapies associated with disease group status and primary outcomes, univariate logistic regression model was used with adjustments based on baseline characteristics. Multivariate logistic regression was conducted further to identify the important mortality factors with significant univariable predictors as input (**Figure 1**). Frailty scores without medications were derived to predict adverse events of primary and secondary pulmonary hypertension, Odds ratios (ORs) with corresponding 95% CIs and P values were reported accordingly. All significance tests were two-tailed and considered statistically significant if P values were <0.05. Data analyses were performed using RStudio software (Version: 1.1.456) and Python (Version: 3.6). Experiments were simulated on a 15-inch MacBook Pro with 2.2 GHz Intel Core i7 Processor and 16 GB RAM. ![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2021/01/20/2021.01.16.21249934/F1.medium.gif) [Figure 1.](http://medrxiv.org/content/early/2021/01/20/2021.01.16.21249934/F1) Figure 1. Framework of factorization machine model to predict mortality risk ### Development of a machine learning model In this study, we develop a factorization machine (FM) model 14 for pulmonary hypertension mortality risk prediction based on baseline characteristics. We observed that some categorical variables, such as comorbidities and drug prescriptions, are sparse after one-hot encoding, even leading to some missing pairs of (**x***j*, **x***j*′) in the training data (**x***j* and **x***j*′ denotes the *j*th and *j*′ th variables). In this situation, traditional polynomial mapping approaches to capture nonlinear interactions cannot handle this data sparsity issue, while FM is able to learn interactions among variable pairs by factorizing weight matrix **W**= (**w**1,**w**2, …, **w**|*D*|)T into **vv**T where v= (***v***l, ***v***2,…, ***v***|*D*|) T where each row vector ***v******j*** ∈**v** represents the latent vector with regard to variable **x***j*. The embedding technique of the FM model can handle the hidden nonlinear and interaction patterns within the data and still demonstrate high accuracy and effectiveness when there are missing values in some features (i.e., the case of data sparsity). This motivates us to use a FM model for accurate mortality risk evaluation of patients with pulmonary hypertension based on patient’s baseline characteristics. Model performance evaluation metrics include precision, recall, F1-score and area under the receiver operating characteristic curve (AUC), and the FM model was compared to benchmark models of multivariate logistic regression, support vector machine (SVM), random forests, and multilayer perceptron (MLP). ## Results ### Baseline characteristics The study cohort has 2562 patients, of whom 1009 and 1553 had primary and secondary pulmonary hypertension, respectively (**Table 1**). Amongst the primary pulmonary hypertension patients, 574 deaths occurred during follow-up until the end of 2019. Those who passed away were significantly older (median: 75.3, IQR:[56.2-83.8] vs. median: 48, IQR:[17.0-68.3] years old, p value <0.0001), more likely to have pre-existing diabetes, cardiovascular and renal diseases and had a shorter readmission intervals between discharges (median: 198.38 days, IQR: [95% CI: 102.37-355] vs. median 271.72 days, IQR: [111.79-564.4]; p value =0.036). However, those who survived had a similar cumulative number of hospital admissions (median: 15.0, IQR:[8.0-24.0] days vs. median: 8.0, IQR:[4.0-19.0] days; p value =0.241), number of emergency readmissions within 28 days of discharge (median: 2.0, IQR: [0.0-5.0] vs. median: 0.0 IQR: [0.0-2.0]; p value =0.6313), and cumulative length of hospital stay over the lifetime (median: 101 days, IQR:[53.5-186] vs. median: 36 days, IQR:[15-109], p value =0.1241) with those who were deceased. For secondary hypertension, 574 deaths occurred on follow-up. Those who died were significantly older and more frequently suffered from diabetes, but had a similar number of hospital admissions, number of emergency readmissions, average readmission interval but significantly longer length-of-stay. Further details on the comparisons and statistics of other baseline characteristics are shown in **Table 1**. Drug prescription characteristics of mean daily dosages are also summarized with ten most common drug categories for both primary and secondary pulmonary hypertension patients. For primary hypertension, those who died were prescribed with a higher dosage of cardiac glycosides (median: 108.14, IQR:[62.5-241.34] mg/per day), loop diuretics (median: 34.3493, IQR: [17.6381-66.15] mg/per day), antiarrhythmics (median: 1039, IQR: [232-4227] mg/per day), beta-adrenoreceptor blockers (median: 35, IQR: [19-84] mg/per day), vasodilative antihypertensives (median: 51, IQR: [32-75] mg/per day), centrally acting antihypertensives(median: 723, IQR: [500-1750] mg/per day), alpha-adrenoceptor blocker (median: 1.7, IQR: [0.9-2.9] mg/per day) and endothelin receptor antagonists (median: 125, IQR: [67-231] mg/per day). Similar patterns were observed for patients with secondary pulmonary hypertension. ### Predictors of mortality in pulmonary hypertension Univariate logistics regression analysis identified significant predictors of primary pulmonary hypertension mortality (**Table 2**): 1. Age (OR: 1.003, 95% CI: [1.002, 1.003], p value<0.001); 2. Prior comorbidities of cardiovascular disease (OR: 4.85, 95% CI: [3.63, 6.49], p value<0.001), respiratory disease (OR: 5.26, 95% CI: [2.06, 8.20], p value <0.001), renal disease (OR: 2.88, 95% CI: [2.13, 3.90], p value<0.001), diabetes mellitus (OR: 2.55, 95% CI: [1.67, 3.91], p value<0.001), hypertension (OR: 23.12, 95% CI: [15.24, 34.25], p value <0.0001), gastrointestinal disease (OR: 1.582, 95% CI: [1.22, 2.05], p value<0.001); 3. Healthcare utilization metrics including the number of emergency readmissions within 28 days after discharge (OR: 1.04, 95% CI: [1.03, 1.05], p value < 0.001) and average readmission interval (OR: 1.09, 95% CI: [1.00, 1.20], p value <0.001); 4. Drugs for cardiac glycosides (OR: 3.6, 95% CI: [2.6,5.0], p value<0.001), thiazide diuretics (OR: 2.5, 95% CI: [1.8,3.5], p value<0.001), loop diuretics (OR: 6.0, 95% CI: [4.5,8.2], p value<0.001), potassium-sparing diuretics (OR: 2.0, 95% CI: [1.5,2.6], p value<0.001), anti-arrhythmic drugs (OR: 3.5, 95% CI: [2.3,5.3], p value<0.001), beta adrenoceptor blockers (OR: 2.6, 95% CI: [2.0,3.4], p value<0.001), centrally antihypertensive drugs (OR: 4.0, 95% CI: [2.2,7.4], p value<0.001) and alpha adrenoceptor blockers(OR: 2.8, 95% CI: [1.8,4.4], p value<0.001). The significant predictors of all-cause mortality for secondary pulmonary hypertension patients were largely similar to those for primary pulmonary hypertension patients (**Table 2**). Multivariate logistics regression analysis was performed by including variables with p value<0.10 (**Table 3**). After adjustment, the significant predictors were: older age (OR: 2.34, 95% CI: [1.34,3.43], p value<0.0001), prior comorbidities of cardiovascular diseases (OR: 1.79, 95% CI: [1.56,3.03], p value<0.0001), respiratory diseases (OR: 1.62, 95% CI: [1.11,2.07], p value<0.001), kidney diseases (OR: 1.21, 95% CI: [1.03, 1.45], p value=0.0001), diabetes mellitus (OR: 1.45, 95% CI: [1.02,2.16], p value=0.0003), hypertension (OR: 17.34, 95% CI: [10.43,31.27], p value<0.0001); number of emergency readmissions within 28 days discharge (OR: 1.25, 95% CI: [1.43,1.53], p value<0.0001), average readmission interval (OR: 1.13, 95% CI: [1.46,1.78], p value<0.0001). There predictors are also significant for predicting the mortality risk of the secondary pulmonary hypertension patients. A summary of the different predictors of mortality are shown in **Supplementary Table 3**. Based on the significant multivariate predictors, easy-to-use score systems were derived to predict the adverse events of primary and secondary pulmonary hypertension as shown in **Table 4**. The characteristics of patients with/without primary and secondary pulmonary hypertension using the derived score systems are further described in **Table 5**. Median risk score for identifying primary pulmonary hypertension is 11 (95% CI: [6-18], max: 27), while the median for secondary pulmonary hypertension is 12 (95% CI: [6-17], max: 28). Further, the stratification performances of score and dichotomized score systems in predicting primary and secondary pulmonary hypertension were shown in **Table 6**. Derived frailty scores demonstrated significant stratification performance in predicting primary pulmonary hypertension (OR: 1.45, 95% CI: 1.14-2.16, p value<0.0001) with cutoff 11.75 and secondary pulmonary hypertension (OR: 1.34, 95% CI: 1.2-2.01, p value<0.0001) with a cut-off value of 13.23. Dichotomized frailty scores also provide significant strengths in predicting primary pulmonary hypertension (OR: 15.32, 95% CI: 10.34-27.45, p value<0.0001) and secondary pulmonary hypertension (OR: 19.23, 95% CI: 11.88-37.21, p value<0.0001). View this table: [Table 3.](http://medrxiv.org/content/early/2021/01/20/2021.01.16.21249934/T3) Table 3. Multivariate analysis to predict mortality of patients with pulmonary hypertension * for p≤ 0.05, ** for p≤ 0.01, \***| for p≤ 0.001 View this table: [Table 4.](http://medrxiv.org/content/early/2021/01/20/2021.01.16.21249934/T4) Table 4. Easy-to-use score system for early prediction of primary and secondary pulmonary hypertension View this table: [Table 5.](http://medrxiv.org/content/early/2021/01/20/2021.01.16.21249934/T5) Table 5. Derived score characteristics of patients with/without primary and secondary pulmonary hypertension * for p≤ 0.05, ** for p≤ 0.01, \***| for p≤ 0.001 View this table: [Table 6.](http://medrxiv.org/content/early/2021/01/20/2021.01.16.21249934/T6) Table 6. Stratification performance of score and dichotomized score system in predicting primary and secondary pulmonary hypertension * for p≤ 0.05, ** for p≤ 0.01, \***| for p≤ 0.001 ### Machine learning results The performance of FM model was compared to that of multivariate logistic regression, SVM, random forests, and multilayer perceptron, using significant univariable characteristics (without medications) as input to avoid overfitting (**Figure 1**). All of the models were trained with a randomly selected 80% (n=2048) of patients and tested with five-fold cross-validation approach using the remaining 20% (n=512) patients. The comparative performance evaluation results with metrics of recall, precision, F1-score and AUC were reported in **Table 7**. View this table: [Table 7.](http://medrxiv.org/content/early/2021/01/20/2021.01.16.21249934/T7) Table 7. Performance comparison of Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP) in predicting primary/secondary pulmonary hypertension mortality risk with five-fold cross-validation approach (without medication predictors). The best metrics are underlined. In the cross-validation, the FM model demonstrates significant improvement in pulmonary hypertension mortality risk prediction compared to other baselines. We can see that FM model outperforms baseline models with superior improvements for mortality prediction amongst both primary (precision=0.8966, recall=0.8876, F1-score=0.9106, AUC=0.9093) and secondary pulmonary hypertension patients (precision= 0.8693, recall= 0.8564, F1-score= 0.8864, AUC= 0.8887). As the important hyperparameters in the baseline models to improve prediction performance, for the SVM model the radial kernel parameters gamma and cost of constraints violation were tuned to 0.01, and 11, respectively. The number of trees and tree depth were tuned to be 732 and 9, respectively, for the random forest model. For the multilayer perceptron model, the number of units in the hidden layer was set to 5, and the decay was set to 0.073. The hyperparameter settings are finished with widely used grid and randomized search approach 15. The observations about model performance are consistent with previous studies that FM produced the most accurate predictions when compared to other models 14, 16. ## Discussion In this study, we developed a predictive risk model for all-cause mortality in primary and secondary pulmonary hypertension patients incorporating baseline demographics, healthcare utilization metrics, comorbidities and drug prescription records using a population-based administrative database. A FM model was introduced as a multi-parametric mortality risk evaluation approach, which significantly improved risk prediction when compared to several baseline models. The use of administrative databases for data mining in healthcare research has been a recent focus over the last decades. Specific to pulmonary hypertension, existing predictive risk models have largely relied on registry data. Whilst this can provide important insights by incorporating clinical parameters, such an approach can be difficult in the case of large patient numbers. In our study, we examined a territory-wide study of patients with both primary and secondary pulmonary hypertension, and used a multi-parametric approach incorporating data from different domains. We found common predictors of all-cause mortality for both primary and secondary causes, despite important differences in their aetiology, physiological basis and disease life course. The implications are that even without specific haemodynamic or physiological data, accurate predictions can be made. ### Pharmacotherapy and association with all-cause mortality Regarding pharmacological treatment, diuretics are used in secondary pulmonary hypertension to reduce afterload 17. This is supported by their use to lower pulmonary arterial pressure in patients with right heart failure 18. In our study, the use of loop diuretics was significantly associated with a lower all-cause mortality. Moreover, whilst the efficacy of cardiac glycosides has not been extensively studied in pulmonary hypertension, one study found that the short-term use of digoxin can increase cardiac output for pulmonary hypertension patients with right ventricular failure 19 but not mortality 20. However, in contrary, our study found that their use was associated with higher mortality, which may be attributed to greater disease severity for patients who are prescribed these medications. Interestingly, the use of anti-arrhythmic drugs was also associated with higher mortality. This may suggest that patients with pulmonary hypertension die from causes other than cardiac arrhythmias. Unlike other drug groups in this study, the use of vasodilator antihypertensive drugs was not statistically significant in the prediction of all-cause mortality. A study found that patient response, i.e. pulmonary vasodilation, to the use of different vasodilators was highly variable between individuals 21. Furthermore, vasodilators such as calcium channel blockers are not considered empiric treatment for pulmonary hypertension and may only be effective as long-term treatment for a minority of patients who demonstrate an acute vasodilatory response 22. The highly individualized responsiveness and efficacy of vasodilators are hence a likely explanation as to why vasodilator antihypertensives were not a predictor of mortality. ### Factorization machine model for risk prediction FM 14 as an efficient machine learning model has the main advantage stemming from its generality: a generic classifier working with any real-valued variable vector for supervised learning, in contract to matrix factorization 23 that only models the relation of two entities and over traditional machine learning models such as SVM 24, random forests 25, multilayer perceptron 26 that are quite difficult to capture the hidden interactions among characteristics in latent space. The main reason is that FM can learn meaningful embedding vectors for each variable as long as the variable itself appears enough times in the data, allowing the dot product a good estimator of pair-wise interaction effects even if two variables never or seldom co-occur. Previous experimental results demonstrated the better discrimination superiority of FM model in significantly improving prediction accuracy to be applied for multi-parametric mortality risk stratification in clinical practice. ### Strengths and limitations The main strength of this study is the inclusion of a cohort of patients with pulmonary hypertension over an 18-year period with comprehensive laboratory, comorbidity and drug, healthcare utilization and follow-up data. This was complemented by machine learning analysis using characteristics of demographics, hospitalization, comorbidities and drug prescription data. Important informative indicators to predict mortality risk are detected using FM model, which demonstrate superior predictive performance over several baseline models. However, several limitations should be noted. Firstly, long-term pulmonary hypertension mortality related comorbidity and disease onset sequence patterns are not uncovered. Secondly, clinical parameters from echocardiography, 6-minute walk tests and other physiological tests were not available in the administrative database and these variables could not be incorporated into our predictive risk models. These remain our future investigations to be explored. ## Conclusion A machine learning model incorporating multi-modality clinical data significantly improves risk stratification performance and identify important indicators in predicting mortality in pulmonary hypertension. ## Supporting information Supplementary Appendix [[supplements/249934_file02.docx]](pending:yes) ## Data Availability Data available upon request. ## Conflicts of Interest None. ## Funding None. ## Footnotes * * joint first authors * Received January 16, 2021. * Revision received January 16, 2021. * Accepted January 20, 2021. * © 2021, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/) ## References 1. 1.Hambly N, Alawfi F, Mehta S. Pulmonary hypertension: Diagnostic approach and optimal management. CMAJ. 2016;188:804–812 [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiRlVMTCI7czoxMToiam91cm5hbENvZGUiO3M6NDoiY21haiI7czo1OiJyZXNpZCI7czoxMDoiMTg4LzExLzgwNCI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzAxLzIwLzIwMjEuMDEuMTYuMjEyNDk5MzQuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 2. 2.Mak SM, Strickland N, Gopalan D. Complications of pulmonary hypertension: A pictorial review. Br J Radiol. 2017;90:20160745 3. 3.Hoeper MM, Kramer T, Pan Z, Eichstaedt CA, Spiesshoefer J, Benjamin N, Olsson KM, Meyer K, Vizza CD, Vonk-Noordegraaf A, et al. Mortality in pulmonary arterial hypertension: Prediction by the 2015 european pulmonary hypertension guidelines risk stratification model. European Respiratory Journal. 2017;50:1700740 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiZXJqIjtzOjU6InJlc2lkIjtzOjEyOiI1MC8yLzE3MDA3NDAiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMS8wMS8yMC8yMDIxLjAxLjE2LjIxMjQ5OTM0LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 4. 4.D’Alonzo GE, Barst RJ, Ayres SM, Bergofsky EH, Brundage BH, Detre KM, Fishman AP, Goldring RM, Groves BM, Kernis JT, et al. Survival in patients with primary pulmonary hypertension. Results from a national prospective registry. Ann Intern Med. 1991;115:343–349 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.7326/0003-4819-115-5-343&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=1863023&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F20%2F2021.01.16.21249934.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1991GC06500002&link_type=ISI) 5. 5.Thenappan T, Shah SJ, Rich S, Tian L, Archer SL, Gomberg-Maitland M. Survival in pulmonary arterial hypertension: A reappraisal of the nih risk stratification equation. European Respiratory Journal. 2010;35:1079–1087 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiZXJqIjtzOjU6InJlc2lkIjtzOjk6IjM1LzUvMTA3OSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzAxLzIwLzIwMjEuMDEuMTYuMjEyNDk5MzQuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 6. 6.Benza Raymond L, Miller Dave P, Gomberg-Maitland M, Frantz Robert P, Foreman Aimee J, Coffey Christopher S, Frost A, Barst Robyn J, Badesch David B, Elliott CG, et al. Predicting survival in pulmonary arterial hypertension. Circulation. 2010;122:164–172 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTQ6ImNpcmN1bGF0aW9uYWhhIjtzOjU6InJlc2lkIjtzOjk6IjEyMi8yLzE2NCI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzAxLzIwLzIwMjEuMDEuMTYuMjEyNDk5MzQuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 7. 7. Lee W-TN, Ling Y, Sheares KK, Pepke-Zaba J, Peacock AJ, Johnson MK. Predicting survival in pulmonary arterial hypertension in the uk. European Respiratory Journal. 2012;40:604–611 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiZXJqIjtzOjU6InJlc2lkIjtzOjg6IjQwLzMvNjA0IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDEvMjAvMjAyMS4wMS4xNi4yMTI0OTkzNC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 8. 8.Aktuerk D, McNulty D, Ray D, Begaj I, Howell N, Freemantle N, Pagano D. National administrative data produces an accurate and stable risk prediction model for short-term and 1-year mortality following cardiac surgery. Int J Cardiol. 2016;203:196–203 9. 9.Dufour R, Pruett J, Hu N, Lickert C, Stemkowski S, Tsang Y, Lane D, Drake W. Healthcare resource utilization and costs for patients with pulmonary arterial hypertension: Real-world documentation of functional class. Journal of Medical Economics. 2017;20:1178–1186 10. 10.Lau WC, Chan EW, Cheung CL, Sing CW, Man KK, Lip GY, Siu CW, Lam JK, Lee AC, Wong IC. Association between dabigatran vs warfarin and risk of osteoporotic fractures among patients with nonvalvular atrial fibrillation. JAMA. 2017;317:1151–1158 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1001/jama.2017.1363&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F20%2F2021.01.16.21249934.atom) 11. 11.Man KKC, Chan EW, Ip P, Coghill D, Simonoff E, Chan PKL, Lau WCY, Schuemie MJ, Sturkenboom M, Wong ICK. Prenatal antidepressant use and risk of attention-deficit/hyperactivity disorder in offspring: Population based cohort study. BMJ. 2017;357:j2350 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjE3OiIzNTcvbWF5MzFfNC9qMjM1MCI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzAxLzIwLzIwMjEuMDEuMTYuMjEyNDk5MzQuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 12. 12.Ju C, Lai RWC, Li KHC, Hung JKF, Lai JCL, Ho J, Liu Y, Tsoi MF, Liu T, Cheung BMY, et al. Comparative cardiovascular risk in users versus non-users of xanthine oxidase inhibitors and febuxostat versus allopurinol users. Rheumatology (Oxford). 2019 13. 13.Law SWY, Lau WCY, Wong ICK, Lip GYH, Mok MT, Siu CW, Chan EW. Sex-based differences in outcomes of oral anticoagulation in patients with atrial fibrillation. J Am Coll Cardiol. 2018;72:271–282 [FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6MzoiUERGIjtzOjExOiJqb3VybmFsQ29kZSI7czo0OiJhY2NqIjtzOjU6InJlc2lkIjtzOjg6IjcyLzMvMjcxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjEvMDEvMjAvMjAyMS4wMS4xNi4yMTI0OTkzNC5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 14. 14.Rendle S. Factorization machines. 2010 IEEE International Conference on Data Mining. 2010:995–1000 15. 15.Bergstra J, Bengio Y. Random search for hyper-parameter optimization. The Journal of Machine Learning Research. 2012;13:281–305 16. 16.Knoll J, Stübinger J, Grottke M. Exploiting social media with higher-order factorization machines: Statistical arbitrage on high-frequency data of the s&p 500. Quantitative Finance. 2019;19:571–585 17. 17.Hansen L, Burks M, Kingman M, Stewart T. Volume management in pulmonary arterial hypertension patients: An expert pulmonary hypertension clinician perspective. Pulmonary Therapy. 2018;4:13–27 18. 18.Heinemann HO. Right-sided heart failure and the use of diuretics. Am J Med. 1978;64:367–370 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/0002-9343(78)90213-9&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=637051&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F20%2F2021.01.16.21249934.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1978ER51400001&link_type=ISI) 19. 19.Rich S, Seidlitz M, Dodin E, Osimani D, Judd D, Genthner D, McLaughlin V, Francis G. The short-term effects of digoxin in patients with right ventricular dysfunction from pulmonary hypertension. Chest. 1998;114:787–792 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1378/chest.114.3.787&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=9743167&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F20%2F2021.01.16.21249934.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000075898500025&link_type=ISI) 20. 20.Saucedo H, Zayas-Hernandez NG, López-Flores JC, Pulido-Zamudio T. Digoxin effect in mortality associated to right ventricular dysfunction in patients with pulmonary hypertension. European Respiratory Journal. 2019;54:PA4751 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1183/13993003.congress-2019.PA4751&link_type=DOI) 21. 21.Palevsky HI, Schloo BL, Pietra GG, Weber KT, Janicki JS, Rubin E, Fishman AP. Primary pulmonary hypertension. Vascular structure, morphometry, and responsiveness to vasodilator agents. Circulation. 1989;80:1207–1221 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTQ6ImNpcmN1bGF0aW9uYWhhIjtzOjU6InJlc2lkIjtzOjk6IjgwLzUvMTIwNyI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIxLzAxLzIwLzIwMjEuMDEuMTYuMjEyNDk5MzQuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 22. 22.Rich S, Kaufmann E, Levy PS. The effect of high doses of calcium-channel blockers on survival in primary pulmonary hypertension. N Engl J Med. 1992;327:76–81 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJM199207093270203&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=1603139&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F20%2F2021.01.16.21249934.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1992JC21400003&link_type=ISI) 23. 23.Lee D, Seung H. Algorithms for non-negative matrix factorization. Adv. Neural Inform. Process. Syst. 2001;13 24. 24.Suykens JAK, Vandewalle J. Least squares support vector machine classifiers. Neural Processing Letters. 1999;9:293–300 25. 25.Breiman L. Random forests. Machine Learning. 2001;45:5–32 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1017/CBO9781107415324.004&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=WOS:00017048&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2021%2F01%2F20%2F2021.01.16.21249934.atom) 26. 26.Ruck D, Rogers S, Kabrisky M, Oxley M, Suter B. The multilayer perceptron as an approximation to a bayes optimal discriminant function. IEEE transactions on neural networks / a publication of the IEEE Neural Networks Council. 1990;1:296–298