Neural Networks Accurately Predict Precise Metrics of Hospital Resource Utilization for Total Hip Arthroplasty: A Retrospective Database Study

Aazad Abbas; Johnathan R. Lex; Jay Toor; Elias B. Khalil; Bheeshma Ravi; Cari Whyne

doi:10.1101/2025.02.11.25322104

Abstract

Background Total hip and knee arthroplasties (THAs and TKAs) are some of the most common and successful surgeries. Predicting their duration of surgery (DOS) and length of stay (LOS) has massive implications for costs and resource management. The purpose of this study was to predict the DOS and LOS of THAs using machine learning models (MLMs) based on preoperative factors.

Methods The American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) database was queried for elective unilateral THA procedures. Multiple MLMs were constructed to predict DOS and LOS. Models were evaluated according to mean squared error (MSE), buffer accuracy, and classification accuracy. To ensure useful predictions, the results of the models were compared to a mean regressor and previous MLM predictions for primary TKAs.

Results A total of 196,987 patients were included. The neural network had the best test-set MSE, buffer, and classification accuracies for both DOS and LOS. For DOS testing, the neural network MSE was 0.916, with the 30-minute buffer and ≤120 min, >120 min accuracies being 75.4% and 88.5%. For LOS testing, the neural network MSE was 0.567, with the 1-day buffer and ≤2 days, >2 days accuracies being 70.3% and 80.9%. Slightly reduced performance was found for THA compared to TKA for DOS and LOS (3 to 5%), with similar important features identified.

Conclusion MLMs based on preoperative factors successfully predicted the DOS and LOS of elective unilateral THAs, with similar performance to TKA. Future work should include operational factors to apply these models to real world resource optimization.

1 Introduction

Total hip arthroplasty (THA) is the fourth most performed procedure in the United States, after cesarean sections, perineal muscle laceration repair and total knee arthroplasty (TKA)[1]. Notably, THA procedures are incredibly dependable and have provided significant health and quality of life benefits to patients for decades[2–4]. Over 600,000 THA procedures are performed annually in the United States (US), with each procedure conferring significant associated costs secondary to implants, provider charges and resource utilization[5]. The demand for THA will continue to grow due to an aging population and ongoing obesity epidemic[6–9].

As surgical care already contributes to half of all hospital expenditure, this will significantly burden healthcare systems globally[10–13]. Accordingly, there are ongoing efforts to address these issues by developing clinical strategies to reduce patient length of stay (LOS) and healthcare spending[14,15]. However, despite increases in spending, demand for these procedures outweighs the supply and surgical wait times continue to increase[16]. This stresses the need for clinicians and healthcare systems to develop and leverage innovative solutions to prepare resources and ultimately aim to meet the growing demand.

Arthroplasty procedures specifically are a major contributor to these increased costs and growing waitlist[17,18]. The demand has skyrocketed for these procedures, with the ubiquity of these procedures correlating strongly with their burden on healthcare systems[17,18]. In the US, approximately 5% of the gross domestic product (GDP) is spent on care for musculoskeletal conditions[19,20]. As such, elective arthroplasty operations are the best candidates on which to focus efficiency improvement activities due to their reproducibility, prevalence, and resource-intensive nature. Despite this, little has been done to address a basic issue that drives up these costs: forecasting their demand on an individual patient basis in healthcare systems with limited resources[21,22].

A viable solution to the problem is machine learning (ML), a form of artificial intelligence (AI). In comparison to traditional statistics, ML methods are typically predictive rather than descriptive and aim at representing nonlinear relationships between features and outputs. By accurately predicting patient-specific resource requirements, such as a patient’s duration of surgery (DOS) or length of stay (LOS), hospitals can improve efficiency and reduce the expenditure related to providing care. ML has been utilized for prediction of DOS and LOS; however, most of the current work focuses on identifying individual predictors, rather than generating comprehensive and applicable ML models[23–25]. Due to the nature of the prediction and the granularity of the datasets, these models are frequently neither relevant nor generalizable.

Furthermore, these studies frequently fail to assess the accuracy of their forecasts against the current standard of care, potentially overstating or obscuring the real effectiveness of machine learning models (MLMs). Incorporating generic patient variables into MLMs for DOS and LOS prediction can address these drawbacks and be critical towards maximizing healthcare resource allocation at the institutional and system levels, especially if built using large databases. With over 600 hospitals and millions of cases reported, the National Surgical Quality Improvement Program (NSQIP) of the American College of Surgeons (ACS) offers a premier database for solving this problem[26].

In a previous study using the NSQIP database to compare DOS and LOS predictions using preoperative factors for total knee arthroplasties (TKAs), we demonstrated improved performance of MLMs over the mean estimates for DOS and LOS[27]. The primary objective of this study was to use conventional and deep MLMs to predict the DOS and postoperative LOS of patients undergoing primary elective total hip arthroplasty based on preoperative factors using the NSQIP database. The secondary objectives were to compare the efficacy of these models regarding the accuracy of these predictions and identify key features that may be used within these models, including those generalizable for both THA and TKA.

2 Materials and Methods

Institutional research ethics approval was obtained before commencement of this study (Sunnybrook Health Sciences Centre, Project ID #4899). Data are fully anonymized as per the ACS NSQIP database[26].

2.1 Study Population

The overall approach used is represented in Figure 1. All unilateral THA procedures completed from 2014 to 2019 were sampled from ACS NSQIP, an anonymized, international, prospectively collated surgical database (date accessed: June 11, 2021). See Supplemental Table 1 for current procedural terminology (CPT) codes for THAs and appropriate procedures that were performed concurrently with THAs.

Figure 1.

Flow diagram representing overall analysis approach utilized in this study. Data was accessed from ACS NSQIP, processed, and split, with models subsequently trained, tuned, and tested. Figure made with images from the streamline resources (https://www.streamlinehq.com/license-free) under the CC BY 4.0 license.

Excluded patients were those with an unassigned American Society of Anesthesiologists (ASA) score and cases with a missing or incorrectly coded DOS/LOS. Continuous variables were summarized with mean and standard deviation (SD) and categorical variables were summarized with counts (N) and percentages (%). The database was split into training, validation, and testing sets by year (2014-2017 training, 2018 validation, 2019 testing).
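The year-based split described above can be sketched in a few lines of Python. This is only an illustration: the record layout (a list of dictionaries with a `year` key and a `dos_min` value) and the function name `split_by_year` are hypothetical and are not taken from the study's code. Only the year-to-split mapping comes from the text.

```python
# Year-to-split mapping taken from the text; the record layout is hypothetical.
SPLIT_FOR_YEAR = {
    2014: "train", 2015: "train", 2016: "train", 2017: "train",
    2018: "validation",
    2019: "test",
}


def split_by_year(records, year_key="year"):
    """Group records into train/validation/test lists by operation year."""
    splits = {"train": [], "validation": [], "test": []}
    for record in records:
        split_name = SPLIT_FOR_YEAR.get(record[year_key])
        if split_name is not None:  # years outside 2014-2019 are ignored
            splits[split_name].append(record)
    return splits


# Toy records with made-up values, only to show the call.
records = [
    {"year": 2014, "dos_min": 95},
    {"year": 2017, "dos_min": 88},
    {"year": 2018, "dos_min": 91},
    {"year": 2019, "dos_min": 86},
]
splits = split_by_year(records)
```

Splitting by calendar year rather than at random means the test set is always later in time than the training set, which mirrors how a deployed model would be used on future patients.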

2.2 Data Preprocessing

Features used to construct the models are found in Table 1. Features that were missing for more than 50% of the patients were removed, with the remaining features utilized for model inputs. To fill in missing data, a preprocessor was trained on the training set and applied to the training, validation, and test datasets. Features were appropriately scaled and adjusted. The correlation between feature pairs was determined using Pearson’s correlation coefficient.
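A minimal sketch of this preprocessing step follows. The 50% missingness threshold and the rule of fitting on the training set only come from the text; the use of mean imputation, the column-dictionary layout, and the names `fit_preprocessor` and `apply_preprocessor` are assumptions made for illustration, since the imputation strategy is not stated here.

```python
from statistics import fmean


def fit_preprocessor(train_columns, max_missing=0.5):
    """Learn, from the training set only, which features to keep and how to fill them.

    Features missing for more than `max_missing` of patients are dropped.
    Mean imputation is an assumption; the text does not name the strategy.
    """
    fill_values = {}
    for name, values in train_columns.items():
        observed = [v for v in values if v is not None]
        missing_fraction = 1 - len(observed) / len(values)
        if missing_fraction > max_missing or not observed:
            continue  # feature removed from the model inputs
        fill_values[name] = fmean(observed)
    return fill_values


def apply_preprocessor(columns, fill_values):
    """Apply the training-set fill values to any split (train, validation, or test)."""
    return {
        name: [fill if v is None else v for v in columns[name]]
        for name, fill in fill_values.items()
    }


# Toy data with made-up values; None marks a missing entry.
train = {"ALB": [4.0, None, 5.0], "INR": [None, None, 1.0]}
test = {"ALB": [None, 3.0], "INR": [1.1, None]}
fill_values = fit_preprocessor(train)
test_filled = apply_preprocessor(test, fill_values)
```

Fitting the fill values on the training set and reusing them for the validation and test sets keeps information from later years out of the model inputs.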

Table 1. Select demographic variables of patients included in this study.

DOS and LOS for each dataset were normalized to adjust for the change by year, with a regression model based on the training data used to adjust the validation and test data. A log-normal distribution transformation was applied to DOS and LOS to allow for optimal MLM results[28,29]. Additional details on these methods are provided in Abbas et al 2022[27].
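One way such a year adjustment could look is sketched below. The exact procedure is given in the cited earlier study and is not reproduced here, so every detail of this sketch is an assumption: a simple linear trend in the log-transformed target is fitted on the training years and the fitted drift is then removed from a later year. The function names and the toy values are hypothetical.

```python
import math


def fit_year_trend(years, values):
    """Ordinary least squares fit of value = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def adjust_for_year(years, values, trend, reference_year):
    """Remove the fitted yearly drift relative to a reference year."""
    _, slope = trend
    return [v - slope * (yr - reference_year) for yr, v in zip(years, values)]


# Toy yearly DOS values in minutes (made up), log-transformed before fitting.
train_years = [2014, 2015, 2016, 2017]
train_log_dos = [math.log(v) for v in (100.0, 96.0, 92.0, 88.0)]

# The trend is fitted on training data only, then applied to a later year.
trend = fit_year_trend(train_years, train_log_dos)
test_log_dos = adjust_for_year([2019], [math.log(85.0)], trend, reference_year=2014)
```

Working on the log scale reflects the right-skewed shape of durations and stays, and `math.exp` maps an adjusted value back to minutes or days.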

2.3 MLM Construction

A multi-layer perceptron (MLP) was used as the deep learning model[30]. An MLP is an artificial neural network (ANN) that contains an input layer, hidden layers, and an output layer. The features are described in the input layer, the hidden layer(s) capture the nonlinear interactions between features, and the output layer is the model’s prediction. Conventional MLMs used were linear regression, stochastic gradient descent (SGD) linear regression, K-nearest neighbors (KNN), decision trees, random forest, AdaBoost, elastic net, and linear support vector machine (SVM) regression. A mean regressor was built to compare the MLMs to a non-predictive model, i.e., a model that uses the mean of the training data as the result for the test data.
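The mean regressor baseline is simple enough to write out in full. The sketch below is an illustration of the idea rather than the study's implementation; the class name `MeanRegressor`, the helper `mean_squared_error`, and the toy values are hypothetical.

```python
from statistics import fmean


class MeanRegressor:
    """Non-predictive baseline: always predicts the mean of the training targets."""

    def fit(self, y_train):
        self.mean_ = fmean(y_train)
        return self

    def predict(self, n_samples):
        # Patient features are ignored on purpose; every patient gets the same value.
        return [self.mean_] * n_samples


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Toy targets (made up), only to show the comparison a real model must beat.
y_train = [1.0, 2.0, 3.0]
y_test = [2.0, 4.0]
baseline = MeanRegressor().fit(y_train)
baseline_mse = mean_squared_error(y_test, baseline.predict(len(y_test)))
```

A model is only useful for scheduling if its test-set error is lower than `baseline_mse`, which is why the baseline is evaluated on the same test data as the MLMs.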

Models were trained and hyperparameters tuned to minimize the mean squared error (MSE), with the analysis done on the Mist supercomputer at the SciNet HPC Consortium using the Tune package (Supplemental Table 2) [31–33]. When possible, feature importance was extracted from the models and normalized with respect to the most important feature for each model in percentages.
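The normalization of feature importance described above can be illustrated as follows. The use of absolute values (so that negative linear coefficients can be compared with positive ones) is an assumption, and the function name and the raw scores are hypothetical.

```python
def normalize_importance(raw_importance):
    """Express each feature's importance as a percentage of the top feature.

    Absolute values are used so that signed linear coefficients are comparable;
    this choice is an assumption, not a detail stated in the text.
    """
    top = max(abs(v) for v in raw_importance.values())
    ranked = sorted(raw_importance.items(), key=lambda item: abs(item[1]), reverse=True)
    return {name: 100.0 * abs(value) / top for name, value in ranked}


# Made-up raw scores for three features.
normalized = normalize_importance({"WT": 2.0, "AGE": -4.0, "HT": 1.0})
```

Scaling to the top feature puts the linear and tree-based models on a common 0 to 100 scale, although the percentages remain relative within each model rather than comparable in absolute terms across models.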

2.4 Outcome Metrics

MSE, buffer accuracy, and classification accuracy were used to evaluate each model as interpretable outcome metrics for the predictions, with the quality of the final models evaluated on the test set[27]. Buffer accuracy is defined as how often the predicted DOS/LOS was within a predesignated value of the actual DOS/LOS, while classification accuracy is defined as the percentage of times the correct classification group value was predicted by the models. Buffer accuracy for DOS was measured at 15-minute intervals ranging from 15 minutes to 60 minutes, while classification accuracy for DOS was measured in three groups: 1) ≤ 90 minutes, > 90 minutes, 2) ≤120 minutes, >120 minutes, and 3) <60 minutes, 60 to 90 minutes, > 90 minutes.

Buffer accuracy for LOS was measured at 1 day, 1.5 days, and 2 days while classification accuracy was measured in three groups: 1) ≤ 1 day, >1 day, 2) ≤2 days, >2 days, and 3) <1 day, 1 to 3 days, > 3 days. A description of these metrics is provided in Abbas et al[27]. Statistical analysis was conducted using Python (Python Software Foundation, www.python.org)[34,35].
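Both accuracy metrics can be computed directly from paired actual and predicted values. The sketch below follows the definitions in this section, with two assumptions: a prediction exactly on the buffer boundary counts as a hit, and predictions have already been converted back to minutes or days. The function names and toy values are hypothetical.

```python
def buffer_accuracy(y_true, y_pred, buffer):
    """Percentage of predictions within +/- `buffer` of the actual value."""
    hits = sum(abs(t - p) <= buffer for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


def classification_accuracy(y_true, y_pred, threshold):
    """Percentage of cases where actual and predicted values fall in the
    same group of a two-group scheme (<= threshold versus > threshold)."""
    hits = sum((t <= threshold) == (p <= threshold) for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


# Toy DOS values in minutes (made up).
actual = [80, 100, 130, 150]
predicted = [95, 125, 118, 140]

acc_15 = buffer_accuracy(actual, predicted, buffer=15)
acc_30 = buffer_accuracy(actual, predicted, buffer=30)
acc_120 = classification_accuracy(actual, predicted, threshold=120)
```

In this toy example the 15-minute buffer accuracy is 75%, the 30-minute buffer accuracy is 100%, and the ≤120 minutes, >120 minutes classification accuracy is 50%, because two predictions land on the wrong side of the 120-minute cut-off even though they are within 30 minutes of the actual value.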

3 Results

3.1 Patient Demographics

A total of 196,987 patients were included, with 117,460 in the training set, 37,955 in the validation set, and 41,572 in the testing set. Thirty-three features were used to construct the models. The mean DOS and LOS were 90.2 (standard deviation (SD) 36.8) minutes and 2.3 (SD 2.4) days. Figure 2 displays a network plot of the correlation between all the features used to train the models. Table 1 provides a breakdown of patient demographics.

Figure 2.

Network plot representing the correlation between features in the training set. Pearson correlation coefficients with an absolute value of greater than 0.05 are shown. Line thickness and proximity represent strength of correlation. Colour represents sign of correlation. *Categorical features. ALB: albumin (g/dL), ANES: primary anesthesia technique, ANES2: secondary anesthesia technique, ASA: American Society of Anesthesiologists classification, BLEEDIS: bleeding disorder, BUN: blood urea nitrogen (mg/dL), CANCR: disseminated cancer, CHF: congestive heart failure status, COPD: chronic obstructive pulmonary disease status, CREAT: creatinine (mg/dL), DIAL: currently on dialysis, DM: diabetes status, DYSP: dyspnea status, ETHN: ethnicity (Hispanic), FNS: functional status, HCT: hematocrit (mg/dL), HT: height, HTN: hypertension status, INOUT: inpatient/outpatient status, INR: international normalized ratio, PLT: platelets (per μL), RFAIL: renal failure status, SMOKE: smoking status, SODM: sodium (mEq/L), STER: steroid use, TRANS: preoperative transfusion within 72 h, WBC: white blood cells count (per μL), WT: weight (kg), WTLSS: >10% weight loss in last 6 months.
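The edges of such a network plot can be derived by computing Pearson's correlation coefficient for every feature pair and keeping those above the 0.05 threshold in absolute value. The sketch below is illustrative only; the function names and the toy columns are hypothetical, and the plotting step itself is not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(columns, min_abs_r=0.05):
    """Return {(feature_a, feature_b): r} for pairs with |r| above the threshold."""
    edges = {}
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > min_abs_r:
            edges[(a, b)] = r
    return edges


# Toy columns (made up): WT and HT move together, X is unrelated to both.
columns = {
    "WT": [60, 70, 80, 90],
    "HT": [150, 160, 170, 180],
    "X": [1, -1, -1, 1],
}
edges = correlated_pairs(columns)
```

The sign of each retained coefficient would set the edge colour and its magnitude the line thickness, following the caption above.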

3.2 Model Results

Results of training the DOS and LOS models are shown in Tables 2 and 3, while results of validation and testing of the DOS and LOS models are shown in Tables 4 and 5.

Table 2. Results of the training set for the duration of surgery (DOS) predictions. Accuracies represented in percentages.

Table 3. Results of the training sets for the length of stay (LOS) predictions. Accuracies represented in percentages.

Table 4. Results of the validation and testing sets for the duration of surgery (DOS) predictions. Accuracies represented in percentages.

Table 5. Results of the validation and testing sets for the length of stay (LOS) predictions. Accuracies represented in percentages.

3.2.1 Model Training

For both DOS and LOS, the decision tree model resulted in the lowest training MSE of 0.011 and 0.012, respectively. Similarly, the decision tree model had near 100% accuracy for both DOS and LOS in training (Tables 2 and 3). For both DOS and LOS, the random forest model was second to decision trees by MSE, buffer and classification accuracies (Tables 2 and 3). For DOS and LOS, the KNN model had the third lowest MSE and third highest buffer and classification accuracies (Tables 2 and 3). All models outperformed the mean regressor for both buffer and classification accuracies during training.

3.2.2 Model Validation and Testing

For DOS validation, the neural network produced the lowest MSE of 0.910, as well as the highest 15- and 30-minute buffer accuracies of 45.2% and 75.0%, respectively. In addition, all the models outperformed the mean regressor by 2-8% across all outcome metrics. The MSEs of the KNN, decision tree, and elastic net models were higher than that of the mean regressor, i.e., these were worse models, which was reflected in their corresponding accuracy metrics. Similarly for DOS testing, the neural network was the superior model, with an MSE of 0.916 and higher buffer and classification accuracies (Table 4). The 15- and 30-minute buffer accuracies were 45.1% and 75.4% with the classification accuracy group 2 being 88.5% (Table 4). Again, the KNN, decision tree, and elastic net models performed worse than the mean regressor via MSE metrics, with the decision tree model consistently performing worse than the mean regressor according to accuracy metrics.

For LOS validation, the neural network produced the lowest MSE of 0.501 with the linear regression model resulting in similar 1- and 2-day buffer accuracies of 71% and 91.8% (Table 5). Compared to the mean regressor, the accuracies of the models were 2% to 48% better. The MSEs of the decision tree and elastic net models were worse than that of the mean regressor (Table 5). Similarly for LOS testing, the neural network was the best performing model with an MSE of 0.567 and higher buffer and classification accuracies (Table 5). Namely, the 1- and 2-day buffer accuracies were 70.3% and 91.9% with the classification accuracy group 2 being 80.9% (Table 5).

3.3 Feature Importance

The top ten features for select models for DOS and LOS are displayed in Figures 3 and 4, respectively.

Figure 3.

Tree plots representing the top ten most important features for linear and tree-based models for the duration of surgery prediction. Numbers indicate normalized feature importance in percentages, with legends below each plot corresponding to respective features. ALBUM: albumin (g/dL), ANES: primary anesthesia technique, ANES2: secondary anesthesia technique, BUN: blood urea nitrogen (mg/dL), CHF: congestive heart failure status, CREAT: creatinine (mg/dL), DISCANCR: disseminated cancer, ETHNIC: ethnicity (Hispanic), FNSTATUS: functional status, HCT: hematocrit (mg/dL), INOUT: inpatient/outpatient status, INR: international normalized ratio, PLATE: platelets (per μL), SODM: sodium (mEq/L), TRANSFUS: preoperative transfusion within 72 h, WBC: white blood cells count (per μL), WTLSS: >10% weight loss in last 6 months.

Figure 4.

Tree plots representing the top ten most important features for linear and tree-based models for the length of stay predictions. Numbers indicate normalized feature importance in percentages, with legends below each plot corresponding to respective features. ALBUM: albumin (g/dL), ANES: primary anesthesia technique, ANES2: secondary anesthesia technique, ASA: American Society of Anesthesiologists classification, BUN: blood urea nitrogen (mg/dL), CHF: congestive heart failure status, COPD: chronic obstructive pulmonary disease status, CREAT: creatinine (mg/dL), DIAL: currently on dialysis, DISCANCR: disseminated cancer, ETHNIC: ethnicity (Hispanic), FNSTATUS: functional status, HCT: hematocrit (mg/dL), INOUT: inpatient/outpatient status, INR: international normalized ratio, PLATE: platelets (per μL), RENAFAIL: renal failure status, SODM: sodium (mEq/L), TRANSFUS: preoperative transfusion within 72 h, WBC: white blood cells count (per μL), WTLSS: >10% weight loss in last 6 months.

Linear models for both DOS and LOS placed greater importance on features such as pre-existing conditions (e.g., renal failure, CHF status, disseminated cancer), patient demographic characteristics like race and sex, and preoperative factors including type of anesthesia used and inpatient/outpatient status. While including some of the features shared with the linear models, the tree-based models placed a larger emphasis on numerical features, namely preoperative lab values, weight, height, and age.

The top features for the best conventional MLM for DOS and LOS prediction, linear regression, were transfusion given within 72 hours preoperatively, weight loss in the last 6 months, CHF status, ethnicity, and currently receiving dialysis.

3.4 Performance and Feature Comparison to TKA Models

Performance of conventional and deep MLMs in predicting DOS and LOS was comparable for THA to that previously found for TKA (Tables 4, 5) [27]. In particular, the best DOS 30-minute buffer accuracy (THA 75.4% vs TKAs 78.8%) and LOS 1-day buffer accuracy (THA 70.3% vs TKAs 75.2%) were similar, with performance superior for TKA by 3-5% for both outcomes. In considering the fit of the models, THA neural networks performed worse than TKA for DOS (MSE 0.916 vs 0.896) but better for LOS (MSE 0.567 vs 0.690).

Features considered important to the models were consistent between both THA and TKA. Linear models considered pre-existing conditions, patient demographics and preoperative factors the most important while tree-based models emphasized factors such as preoperative lab values, patient weight, height, and age.

4 Discussion

This study used MLMs to successfully predict the DOS and postoperative LOS for primary elective unilateral THA, with neural networks shown to be the best performing model. The degree of detail generated in this paper offers significant insights into the viability of MLMs to predict outcomes that dictate healthcare resource usage.

As expected, the neural network was the best overall model for both DOS and LOS predictions. This was echoed in the various accuracy metrics of the models, with the DOS 30-minute buffer and ≤120 minutes, >120 minutes classification accuracies being 75.4% and 88.5% and the LOS 2-day buffer and the ≤2 days, >2 days classification accuracies being 91.8% and 80.9%, respectively. The neural network developed is a predictive model composed of multiple fully connected layers. However, more traditional ML models such as the KNN, decision tree and elastic net performed worse than the mean regressor for DOS predictions, with the decision tree and elastic net models also performing worse for the LOS predictions. This is of interest, as it is a common misconception that MLMs are superior to traditional predictive statistical methods and would consistently outperform a mean value estimate. Construction of MLMs is resource intensive, so this result highlights the importance of knowing which MLMs may be most useful for a particular problem.

The implications of this work are significant. Firstly, being able to accurately predict the DOS and LOS with this level of granularity allows one to schedule surgeries more accurately. This work goes beyond previous studies whereby postoperative LOS was predicted using arbitrary cut-offs, as predicting DOS and LOS as continuous targets allows for scheduling surgery to the minute and assigning bed capacity to the day[24,25,36–38]. Secondly, this gives one the capacity to appropriately allocate staffing requirements needed in the operating room and the ward.

Applying models which better predict DOS and LOS to local hospital scheduling has the potential for massive savings by reducing staff overtime hours and facilitating appropriate resource allocation. Thirdly, by efficiently scheduling the surgeries according to what neural networks predict, hospitals can reduce overtime and/or increase throughput and thereby potentially reduce surgical waitlists. Improvements in scheduling and optimization have the potential to reduce healthcare costs, on a local institutional and health systems level. Due to the frequency and cost of arthroplasty procedures, minor improvements in the efficiency of care can have large realizable benefits for hospitals and patients.

In an era of increasing patient demand for THA and pressure for hospitals and healthcare systems to reduce the cost of delivered care, leveraging novel strategies and technology such as neural networks will be important[1,6,10,39]. Neural network models trained on large datasets have the advantage of being generalizable, as they have been exposed to many examples.

Moreover, neural networks also maintain the ability to generate highly specific predictions at the individual level due to their high level of complexity. The features that the models identified as most important for predicting both DOS and LOS were those such as patient demographics and laboratory tests that are routinely collected preoperatively (e.g., electrolytes, prothrombin time, albumin etc.). This is important to help inform variables that should be included when generating institution-specific models or implementing these models.

The construction of these models is resource intensive and mathematically challenging. This study and the previous study related to TKAs were conducted using a large, internationally renowned surgical database while utilizing a supercomputer with cutting-edge machine learning software[26,27,34,35]. Despite the advantages of this dataset, utilizing data generated from hundreds of institutions of varying sizes and clinical practices increases the complexity and heterogeneity of the dataset and its analysis. Primary joint replacements such as THAs and TKAs are very homogeneous surgeries with respect to approach, surgical technique, and operating room practices. This helps make these procedures more efficient but makes it more challenging to accurately predict their DOS or LOS based on patient features. This may highlight the importance of including certain operational factors available from institutional databases and not necessarily captured in national databases such as NSQIP. Other procedures with larger variance of DOS and LOS, such as revision arthroplasty or trauma cases, may be more amenable to such predictions based on these patient-related features.

Compared to the previous study which focused on using conventional and deep MLMs for TKAs, this study showed similar outcome results with slightly reduced performance for THAs. This may be explained by a few factors. TKAs have less variability in factors that are not captured in the NSQIP database, such as surgical approach, patient positioning, and implant types, which may allow for better fitting models for DOS. Additionally, hospital-specific factors such as surgical team structure and experience may have a larger effect on the DOS of THAs, making their outcomes harder to predict. As the MSE of the neural network MLM for LOS was lower for THA, and the NSQIP data primarily pertained to patient and anaesthetic factors, this may suggest these are more strongly correlated with LOS following THA compared to TKA.

Despite the detailed analysis, this study has some limitations. Firstly, only patient factors were used in this study. This may greatly limit the maximum accuracy of the DOS and LOS predictions. Incorporating institutional factors, such as hospital infrastructure, surgical team structure and surgeon training, into the modeling may further improve DOS and LOS predictions. Secondly, despite the ACS NSQIP database being large and detailed, there is variability in the quality and accuracy of data recording[40]. As such, this may affect the quality of the models generated and subsequently the accuracy of their predictions. For example, the poor validation and testing performance of the tree-based models, despite near-perfect training accuracy, was likely due to overfitting of the models. This occurred despite the branch and depth limitations set during hyperparameter tuning on the validation set.

In conclusion, this study has utilized conventional and deep MLMs to predict DOS and LOS for unilateral THAs using preoperative factors. Multiple statistical and forecasting practices were compared, and these outcomes were predicted most accurately using neural networks. Notably, with these factors, LOS prediction is superior to DOS prediction, highlighting the importance of feature and dataset selection when aiming to predict specific outcomes. Studies in the future should aim to incorporate institutional factors and test the real-world efficacy of these models at improving care efficiency through prospective clinical trials.

5 Conflicts of Interests and Funding Statement

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Funding statement

No funding was used to conduct this study.

Acknowledgements:

Database: The American College of Surgeons National Surgical Quality Improvement Program and the hospitals participating in the ACS NSQIP are the source of the data used herein; they have not verified and are not responsible for the statistical validity of the data analysis or the conclusions derived by the authors.

Computations were performed on the Mist supercomputer at the SciNet HPC Consortium. SciNet is funded by: the Canada Foundation for Innovation; the Government of Ontario; Ontario Research Fund - Research Excellence; and the University of Toronto.

Footnotes

First author: Aazad Abbas, MD. Role: Co-investigator. Ideation. Data analysis. Manuscript preparation. Email: aazad.abbas{at}mail.utoronto.ca College Street Room 508-A, Toronto, ON, M5T 1P5, Canada Phone: 1-613-407-6991
Co-authors:
Johnathan Robert Lex, MB ChB (Hons), MASc. Role: Co-investigator. Ideation. Manuscript preparation. Email: johnathanlex{at}gmail.com College Street Room 508-A, Toronto, ON, M5T 1P5, Canada Phone: 1-647-376-3537
Jay Toor, MD, MBA, FRCSC. Role: Co-investigator. Ideation. Manuscript preparation. Email: jay.toor{at}mail.utoronto.ca Sherbrook Street, Winnipeg, MB R3A 1R9, Canada Phone: 1-416-918-9519
Elias B. Khalil, PhD. Role: Co-investigator. Ideation. Project supervision. Manuscript preparation. Email: khalil{at}mie.utoronto.ca 40 St George St Room BA8110, Toronto, ON, M5S 2E4, Canada Phone: 1-416-978-4025
Bheeshma Ravi, MD, PhD, FRCSC. Role: Co-investigator. Ideation. Project supervision. Manuscript preparation. Email: bheeshma.ravi{at}sunnybrook.ca 43 Wellesley St. E., Room 315, Toronto, ON, M4Y 1H1, Canada Phone: 1-416-967-8730
Cari Whyne, PhD, FIOR. Role: Principal Investigator. Ideation. Project supervision. Manuscript preparation. Email: cari.whyne{at}sunnybrook.ca Sunnybrook Research Institute, Orthopaedics Biomechanics Lab, 2075 Bayview Avenue, Toronto, ON, M4N 3M5, Canada. Phone: 1-416-480-6100, ext. 5056.

References

1.↵
McDermott K, Liang L. Statistical Brief #281. Healthcare Cost and Utilization Project (HCUP). Agency for Healthcare Research and Quality. 2021.
Google Scholar
2.↵
Towheed TE, Hochberg MC. Health-related quality of life after total hip replacement. Seminars in Arthritis and Rheumatism. 1996;26: 483–491. doi: 10.1016/S0049-0172(96)80029-1.
OpenUrl CrossRef PubMed Web of Science Google Scholar
3.
Liu X, Zi Y, Xiang L, Wang Y. Total hip arthroplasty: areview of advances, advantages and limitations. Int J Clin Exp Med. 2015;8: 27–36.
OpenUrl PubMed Google Scholar
4.↵
Ng CY, Ballantyne JA, Brenkel IJ. Quality of life and functional outcome after primary total hip replacement. The Journal of Bone and Joint Surgery. British volume. 2007;89-B: 868-873. doi: 10.1302/0301-620X.89B7.18482.
OpenUrl Abstract/FREE Full Text Google Scholar
5.↵
Blue Cross Blue Shield Association. Blue Cross Blue Shield Association Study Reveals Extreme Cost Variations for Knee and Hip Replacement Surgeries. . 2015. Available: https://www.bcbs.com/news/press-releases/blue-cross-blue-shield-association-study-reveals-extreme-cost-variations-knee.
Google Scholar
6.↵
Dunlop DD, Manheim LM, Yelin EH, Song J, Chang RW. The costs of arthritis. Arthritis Rheum. 2003;49: 101–113. doi: 10.1002/art.10913.
OpenUrl CrossRef PubMed Web of Science Google Scholar
7.
Hill JO, Peters JC. Environmental Contributions to the Obesity Epidemic. Science. 1998. doi: 10.1126/science.280.5368.1371.
OpenUrl Abstract/FREE Full Text Google Scholar
8.
Abelson P, Kennedy D. The Obesity Epidemic. Science. 2004. doi: 10.1126/science.304.5676.1413.
OpenUrl Abstract Google Scholar
9.↵
Catenacci VA, Hill JO, Wyatt HR. The Obesity Epidemic. Clinics in Chest Medicine. 2009;30: 415–444. doi: 10.1016/j.ccm.2009.05.001.
OpenUrl CrossRef PubMed Web of Science Google Scholar
10.↵
Canadian Institute for Health Information. National Health Expenditure Trends, 2020. Canadian Institute for Health Information. 2021.
Google Scholar
11.
Kaye DR, Luckenbaugh AN, Oerline M, Hollenbeck BK, Herrel LA, Dimick JB, et al. Understanding the Costs Associated With Surgical Care Delivery in the Medicare Population. Ann Surg. 2020;271: 23–28. doi: 10.1097/SLA.0000000000003165.
OpenUrl CrossRef PubMed Google Scholar
12.
Tikkanen R. Multinational Comparisons of Health Systems Data, 2019. Commonwealth Fund. 2020.
Google Scholar
13.↵
[Anonymous]. Trends in health care spending. American Medical Association.
Google Scholar
14.↵
Lex JR, Abbas A, Oitment C, Wolfstadt J, Wong P, Abouali J, et al. A Dedicated Orthopedic Trauma Room Improves Efficiency While Remaining Financially Net Positive. J Orthop Trauma. 2022. doi: 10.1097/BOT.0000000000002461.
OpenUrl CrossRef Google Scholar
15.↵
Toor J, Saleh I, Abbas A, Abouali J, Wong P, Chan TCY, et al. An Anesthesia Block Room Is Financially Net Positive for a Hospital Performing Arthroplasty. J Am Acad Orthop Surg. 2022;30: e1058–e1065. doi: 10.5435/JAAOS-D-21-01217.
OpenUrl CrossRef PubMed Google Scholar
16.↵
Viberg N, Forsberg BC, Borowitz M, Molin R. International comparisons of waiting times in health care--limitations and prospects. Health Policy. 2013;112: 53–61. doi: 10.1016/j.healthpol.2013.06.013.
OpenUrl CrossRef PubMed Google Scholar
17.↵
Yelin E, Weinstein S, King T. The burden of musculoskeletal diseases in the United States. Semin Arthritis Rheum. 2016;46: 259–260. doi: 10.1016/j.semarthrit.2016.07.013.
OpenUrl CrossRef PubMed Google Scholar
18.↵
Losina E, Thornhill TS, Rome BN, Wright J, Katz JN. The dramatic increase in total knee replacement utilization rates in the United States cannot be fully explained by growth in population size and the obesity epidemic. The Journal of Bone and Joint Surgery.American volume. 2012;94: 201.
19.
[Anonymous]. United States Bone and Joint Decade: The Burden of Musculoskeletal Diseases in the United States. 2nd ed. Rosemont, IL; 2010.
20.
Muñoz E, Muñoz W, Wise L. National and surgical health care expenditures, 2005–2025. Ann Surg. 2010;251: 195–200. doi: 10.1097/SLA.0b013e3181cbcc9a.
21.
Cram P, Lu X, Kates SL, Singh JA, Li Y, Wolf BR. Total knee arthroplasty volume, utilization, and outcomes among Medicare beneficiaries, 1991–2010. JAMA. 2012;308: 1227–1236. doi: 10.1001/2012.jama.11153.
22.
Girardi FM, Liu J, Guo Z, Valle AGD, MacLean C, Memtsoudis SG. The impact of obesity on resource utilization among patients undergoing total joint arthroplasty. Int Orthop. 2019;43: 269–274. doi: 10.1007/s00264-018-4059-8.
23.
Daghistani TA, Elshawi R, Sakr S, Ahmed AM, Al-Thwayee A, Al-Mallah MH. Predictors of in-hospital length of stay among cardiac patients: A machine learning approach. International Journal of Cardiology. 2019;288: 140–147. doi: 10.1016/j.ijcard.2019.01.046.
24.
Ramkumar PN, Karnuta JM, Navarro SM, Haeberle HS, Iorio R, Mont MA, et al. Preoperative Prediction of Value Metrics and a Patient-Specific Payment Model for Primary Total Hip Arthroplasty: Development and Validation of a Deep Learning Model. The Journal of Arthroplasty. 2019;34: 2228–2234.e1. doi: 10.1016/j.arth.2019.04.055.
25.
Navarro SM, Wang EY, Haeberle HS, Mont MA, Krebs VE, Patterson BM, et al. Machine Learning and Primary Total Knee Arthroplasty: Patient Forecasting for a Patient-Specific Payment Model. The Journal of Arthroplasty. 2018;33: 3617–3623. doi: 10.1016/j.arth.2018.08.028.
26.
American College of Surgeons. ACS National Surgical Quality Improvement Program. Available: http://www.facs.org/quality-programs/acs-nsqip.
27.
Abbas A, Mosseri J, Lex JR, Toor J, Ravi B, Khalil EB, et al. Machine learning using preoperative patient factors can predict duration of surgery and length of stay for total knee arthroplasty. Int J Med Inform. 2022;158: 104670. doi: 10.1016/j.ijmedinf.2021.104670.
28.
Strum D, May J, Vargas L. Modeling the Uncertainty of Surgical Procedure Times: Comparison of Log-normal and Normal Models. Anesthesiology. 2000;92: 1160–1167. doi: 10.1097/00000542-200004000-00035.
29.
Yeo I, Johnson RA. A new family of power transformations to improve normality or symmetry. Biometrika. 2000;87: 954–959. doi: 10.1093/biomet/87.4.954.
30.
Hinton GE. Connectionist learning procedures. In: Machine learning. Elsevier; 1990. pp. 555–610.
31.
Loken C, Gruner D, Groer L, Peltier R, Bunn N, Craig M, et al. SciNet: Lessons Learned from Building a Power-efficient Top-20 System and Data Centre. J Phys: Conf Ser. 2010;256: 012026. doi: 10.1088/1742-6596/256/1/012026.
32.
Ponce M, van Zon R, Northrup S, Gruner D, Chen J, Ertinaz F, et al. Deploying a Top-100 Supercomputer for Large Parallel Workloads: The Niagara Supercomputer. 2019.
33.
Liaw R, Liang E, Nishihara R, Moritz P, Gonzalez JE, Stoica I. Tune: A Research Platform for Distributed Model Selection and Training. arXiv preprint. 2018.
34.
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in neural information processing systems. 2019: 8026–8037.
35.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12: 2825–2830.
36.
Mekhaldi RN, Caulier P, Chaabane S, Chraibi A, Piechowiak S. Using Machine Learning Models to Predict the Length of Stay in a Hospital Setting. 2020: 202–211.
37.
Han C, Liu J, Wu Y, Chong Y, Chai X, Weng X. To Predict the Length of Hospital Stay After Total Knee Arthroplasty in an Orthopedic Center in China: The Use of Machine Learning Algorithms. Front Surg. 2021;8. doi: 10.3389/fsurg.2021.606038.
38.
Ramkumar PN, Karnuta JM, Navarro SM, Haeberle HS, Scuderi GR, Mont MA, et al. Deep Learning Preoperatively Predicts Value Metrics for Primary Total Knee Arthroplasty: Development and Validation of an Artificial Neural Network Model. The Journal of Arthroplasty. 2019;34: 2220–2227.e1. doi: 10.1016/j.arth.2019.05.034.
39.
McDermott KW, Freeman WJ, Elixhauser A. Overview of Operating Room Procedures During Inpatient Stays in U.S. Hospitals, 2014: Statistical Brief #233. In: Healthcare Cost and Utilization Project (HCUP) Statistical Briefs. Rockville (MD): Agency for Healthcare Research and Quality (US); 2006.
40.
Simon VC, Tucker NJ, Balabanova A, Parry JA. The accuracy of hip fracture data entered into the national surgical quality improvement program (NSQIP) database. Eur J Orthop Surg Traumatol. 2022. doi: 10.1007/s00590-022-03341-9.

Posted February 13, 2025.


