Use of Artificial Intelligence for Acquisition of Limited Echocardiograms: A Randomized Controlled Trial for Educational Outcomes

Evan Baum; Megha D. Tandel; Casey Ren; Yingjie Weng; Matthew Pascucci; John Kugler; Kathryn Cardoza; Andre Kumar

doi:10.1101/2023.04.12.23288497

Abstract

Background Point-of-care ultrasound (POCUS) machines may utilize artificial intelligence (AI) to enhance image interpretation and acquisition. This study investigates whether AI-enabled devices improve competency among POCUS novices.

Methods We conducted a randomized controlled trial at a single academic institution from 2021-2022. Internal medicine trainees (N=43) with limited POCUS experience were randomized to receive a POCUS device with (Echonous, N=22) or without (Butterfly, N=21) AI-functionality for two weeks while on an inpatient rotation. The AI-device provided automatic labeling of cardiac structures, guidance for optimal probe placement to acquire cardiac views, and ejection fraction estimations. Participants were allowed to use the devices at their discretion for patient-related care.

The primary outcome was the time to acquire an apical 4-chamber (A4C) image. Secondary outcomes included A4C image quality using the modified Rapid Assessment of Competency in Echocardiography (RACE) scale, correct identification of pathology, and participant attitudes. Measurements were performed at the time of randomization and at two-week follow-up. All scanning assessments were performed on the same standardized patient.

Results Both AI and non-AI groups had similar scan times and image quality scores at baseline. At follow-up, the AI group had faster scan times (72 seconds [IQR 38-85] vs. 85 seconds [IQR 54-166]; p=0.01), higher image quality scores (4.5 [IQR 2-5.5] vs. 2 [IQR 1-3]; p<0.01) and correctly identified reduced systolic function more often (85% vs 50%; p=0.02) compared to the non-AI group. Trust in the AI features did not differ between the groups pre- or post-intervention. The AI group did not report increased confidence in their abilities to obtain or interpret cardiac images.

Conclusions POCUS devices with AI features may improve image acquisition and interpretation by novices. Future studies are needed to determine the extent to which AI impacts POCUS learning.

Introduction

Point-of-care ultrasonography (POCUS) describes the use of ultrasound by clinicians at the bedside to provide real-time diagnoses and assist with procedural interventions.¹ It has been shown to improve diagnostic accuracy, reduce procedural complications, reduce direct costs for healthcare organizations, and improve patient satisfaction by encouraging the clinician to be present at the bedside.^1–5

In response to the growing evidence supporting the use of POCUS for patient care, medical schools, residency programs, and professional societies have developed training programs for its safe and effective usage.^5–7 In addition, several organizations have established guidelines in an effort to standardize POCUS training.^3,8 Barriers to implementing more universal training measures include time constraints within training programs, a paucity of faculty credentialed for supervision, limited funding, the need for established quality assurance protocols, and a lack of standardized assessments.^1,6,7,9 Due to these barriers, novice users continue to use POCUS with minimal training and oversight,^5,10 underscoring the need for alternative training methods for this ever-growing technology.

POCUS device manufacturers have begun to employ artificial intelligence (AI) to aid in image acquisition and interpretation, which may help address the current training barriers faced by clinicians.¹¹ This technology could become complementary to traditional teaching methods by offering augmented guidance of probe placement to improve a user’s manual dexterity and labeled images to aid in the rapid identification of pathology.¹² Previous investigations have shown that AI-assisted ultrasounds can aid novices in obtaining high quality cardiac images, accurately assessing ventricular function, and identifying non-trivial pericardial effusions.^13,14 One prospective study among emergency medicine trainees demonstrated that AI-augmented POCUS may increase the efficiency and accuracy of diagnosing pneumonia in pediatric populations.¹⁵ There are currently no randomized studies comparing POCUS image acquisition and interpretation with AI vs. non-AI equipped devices among inexperienced users.

This randomized, controlled trial aims to address several ongoing gaps in knowledge related to POCUS learning. First, we hypothesize that POCUS novices randomized to AI-enabled devices that aid in image acquisition and interpretation will be more efficient and proficient at these tasks than those without AI-enabled devices. Secondly, we hypothesize that those who used the AI-devices will feel more confident in their abilities due to the additional assistance this technology may provide during scanning.

Methods

Study Participants & Setting

We conducted a randomized controlled trial at a single academic institution from June 2021 to January 2022. All eligible residents were recruited via an email explaining the general nature of the study. We included internal medicine residents rotating on the general inpatient wards service. We excluded residents who had taken an ultrasound elective offered by our residency program. At the time of the study, there was no formal POCUS credentialing pathway within the residency program. The Stanford University Institutional Review Board approved this investigation.

Study Design

Recruited residents (N=43) were randomized 1:1 to receive a POCUS device with AI-functionality (Echonous, N=22) or without (Butterfly, N=21) for two weeks (Figure 1). Outcomes were assessed at baseline and at two-week follow-up (see Outcomes below). Participants were allowed to use the devices at their discretion for patient-related care or self-directed learning. For privacy reasons, any saved patient images were not reviewed by the researchers and feedback was not provided on participants’ scans. At the time of randomization, all participants received written and verbal instructions on their device’s functionality and access to online learning modules regarding POCUS acquisition and interpretation (Appendix).

Figure 1.

Overview of Study. A4C, apical-4-chamber view.

Participants were instructed to use an electronic log each time they used their devices for any type of scan to track how frequently they were being used.

Devices

Two handheld POCUS devices were utilized in this study: Echonous^™ (AI) and Butterfly^™ (non-AI). These handheld devices were chosen for this study due to budgetary constraints and the unavailability of other machines. The AI device provided automatic labeling of anatomic structures, real-time guidance for optimal probe placement to acquire an apical-4-chamber view (A4C), and automatic left ventricular systolic function estimation using the apical windows (Appendix). The non-AI device did not provide these features (Appendix).

Outcomes

Our primary outcome was the time to acquire an A4C image. Secondary outcomes included the quality of captured A4C images, correct interpretation of pathological images, correct identification of anatomic structures, trainee confidence in POCUS, and their trust in the AI system.

Assessments/Surveys

Measurements were performed at the time of randomization (baseline) and at two-week follow-up. All of the scanning assessments were performed on the same standardized patient with the same probe (Butterfly), regardless of study arm. A study author was present for the scanning assessments to provide instruction and to set up the device, but they did not directly observe or comment on the images being acquired. For the primary outcome of scanning time, participants were instructed to notify the proctor when they had acquired an optimal A4C image for evaluation (which was saved for analysis). Participants were timed from the moment the probe touched the patient’s torso until proctor notification. For the secondary outcome of scan quality, we utilized the modified Rapid Assessment of Competency in Echocardiography (RACE) scale, which has excellent interrater reliability (α = 0.87) and has been previously validated as an assessment tool for image acquisition and quality with POCUS.^16–18 Two reviewers (AK and JK), who were blinded to the study arms, independently reviewed the baseline and follow-up assessment scans to assign RACE scores. The average scores between the two reviewers were used to create composite RACE scores for analyses.
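As a minimal sketch of the composite scoring described above, the snippet below averages two reviewers' RACE scores for each scan. The scan IDs and score values are hypothetical placeholders rather than study data, and the function name `composite_race_scores` is introduced only for illustration.

```python
from statistics import mean


def composite_race_scores(reviewer_a, reviewer_b):
    """Average two reviewers' scores for each scan ID.

    Both arguments map a scan ID to that reviewer's RACE score.
    """
    if reviewer_a.keys() != reviewer_b.keys():
        raise ValueError("both reviewers must score the same scans")
    return {
        scan_id: mean([reviewer_a[scan_id], reviewer_b[scan_id]])
        for scan_id in reviewer_a
    }


# Hypothetical scores (illustrative only, not study data).
reviewer_1 = {"P01": 4, "P02": 2, "P03": 5}
reviewer_2 = {"P01": 5, "P02": 2, "P03": 4}

composite = composite_race_scores(reviewer_1, reviewer_2)
print(composite)
```

Averaging keeps half-point values, which is in line with composite medians such as 4.5 appearing in the Results.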

Assessments of anatomic identification and pathological image interpretation were performed utilizing a HIPAA-compliant online survey platform (Qualtrics, Provo, UT) that was sent pre- and post-intervention. This assessment has been previously described by our study team and consists of short video clips with multiple choice answers.⁵ Surveys were administered to the participants alongside these assessments. These surveys assessed trainee attitudes toward POCUS, trust in the AI system, and their own confidence in acquiring images. Attitudes were measured using 5-point Likert scales. These surveys were based on previously described assessments.^5,10

Statistical Analysis

Baseline performance and attitudes were compared between participants in the AI group and those in the non-AI group.

Distributions of scanning time were visualized with box/violin plots. Medians and interquartile ranges (IQR) were reported and compared between the two randomized groups. We used ANOVA (or Kruskal-Wallis) tests to compare the distributions between the two randomized groups at the two-week follow-up visit. Similar analyses were performed for all secondary outcomes. To explore potential changes in the outcomes over time, the Wilcoxon signed-rank test was applied within each randomized group to compare secondary outcomes between baseline and two-week follow-up. Medians and IQRs were reported for baseline and two-week follow-up. Chi-square tests were performed to compare participants’ confidence levels.
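The between-group comparison described above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the authors' analysis code (the study used SAS): the scan times are hypothetical, the helper names are introduced here, the quartile convention is one of several in common use, and the Kruskal-Wallis statistic omits the tie correction.

```python
import math
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, Q1, Q3). Quartile conventions differ between packages."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Kruskal-Wallis H for two groups (no tie correction) with the
    chi-square (1 degree of freedom) approximation for the p-value."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(a)])
    rank_sum_b = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (rank_sum_a**2 / len(a) + rank_sum_b**2 / len(b))
    h = max(h - 3 * (n + 1), 0.0)
    # For 1 degree of freedom, P(chi-square > h) = erfc(sqrt(h / 2)).
    return h, math.erfc(math.sqrt(h / 2))


# Hypothetical scan times in seconds (illustrative only, not study data).
ai_times = [40, 55, 60, 72, 90]
non_ai_times = [65, 85, 100, 130, 170]

print("AI median, Q1, Q3:", median_iqr(ai_times))
print("non-AI median, Q1, Q3:", median_iqr(non_ai_times))
h_stat, p_value = kruskal_wallis_two_groups(ai_times, non_ai_times)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```

With samples this small, the chi-square approximation is rough and an exact rank test would be preferable. The sketch only shows how medians, IQRs, and a rank-based comparison fit together.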

All statistical tests were conducted using SAS 9.4 (Cary, NC), and a p-value <0.05 was considered statistically significant.

Results

Baseline Characteristics

There were a total of N=105 residents eligible for participation, of which N=43 responded to the email invitation to participate. No residents were excluded. Among the N=43 residents, N=22 were randomized to the AI device, while N=21 were randomized to the non-AI device.

Completion rates at follow-up for the scanning assessments were 77% (N=17) for the AI group and 95% (N=20) for the non-AI group (Table 1). Survey and image quiz completion rates at follow-up were 91% (N=20) for the AI group and 85% (N=18) for the non-AI group. Participant demographics and their prior POCUS experience are shown in Table 1.

Table 1.

Participant Demographics and Completion Rates. AI, artificial intelligence; PGY, post-graduate year.

Primary Outcome: Time to Scan

For our primary outcome of time to acquire an A4C image at follow-up, scanning times were significantly faster in the AI group (median 57s [IQR: 32-75]) than in the non-AI group (median 85.0s [IQR: 50-172]; p=0.01; Table 2). Both groups had similar median scanning times at baseline (AI 146s [IQR: 98-220] and non-AI 119s [IQR: 64-175]; p=0.21). On sub-analysis, the AI group significantly improved in their median scanning times pre- vs. post-intervention (p<0.01), while the non-AI group did not (p=0.26).

Table 2.

Study Outcomes at Two-Week Follow-Up. A4C, Apical 4-Chamber View; IQR, Interquartile Range; LVSF, Left-Ventricular Systolic Function.

Secondary Outcomes

A. Image Quality

For the secondary outcome of A4C image quality at follow-up, the median RACE scores were significantly higher in the AI (4.5 points [IQR 2-5.5]) vs. the non-AI group (2 points [IQR 1-3]; p<0.01; Table 2). Both groups had statistically similar median RACE scores at baseline (AI: 3 points [IQR 2-4]; non-AI: 2 points [IQR: 1-3]; p=0.08).

B. Identification of Pathology and Anatomy

Overall, the AI and non-AI groups performed similarly on the follow-up image assessment for left ventricular systolic function and anatomic identification in the A4C view (AI median score: 100% [IQR: 90-100%] vs. non-AI median score: 90% [IQR: 80-100%]; p=0.09; Table 2). Notably, a greater proportion of the AI group correctly identified reduced left ventricular systolic function in the A4C view compared to the non-AI group on the two-week follow-up assessment (85% vs. 50%; p=0.02).
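To illustrate how a comparison of two proportions such as 85% vs. 50% can be tested, the sketch below computes a Pearson chi-square statistic for a 2x2 table. The counts are assumptions: 17 of 20 and 9 of 18 are inferred from the reported percentages and the follow-up survey completion numbers, and the text does not state which test produced the reported p-value. The function name `chi_square_2x2` is introduced here for illustration.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], with the p-value for 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# ASSUMED counts (not reported in the text):
# AI group: 17 correct, 3 incorrect; non-AI group: 9 correct, 9 incorrect.
chi2, p = chi_square_2x2(17, 3, 9, 9)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

Under these assumed counts the p-value comes out close to 0.02. With cell counts this small, Fisher's exact test would be a common alternative.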

C. Device Usage

Participants in both arms recorded how frequently they used their devices in an electronic log (see Methods). Participants randomized to the AI device reported using their devices twice as frequently as those randomized to the non-AI device (mean 6.6 times [SD 4.1] vs. 3.3 times [SD 2.3]; p<0.01; Table 2).

D. Survey Results

The AI and non-AI participants reported similar confidence in their ability to obtain A4C images at two-week follow-up (Table 2), despite the AI group having significantly faster scan times and higher image quality scores. Similarly, both groups reported similar confidence in identifying normal and reduced LV systolic function (Table 2). When pre- vs. post-intervention surveys were compared, the AI group did not report an increase in trust in the AI features (anatomic labeling, calculations of ejection fraction, and real-time guidance for optimal probe placement; Appendix). Both groups reported low to moderate trust in the AI system on the post-intervention surveys, including for features directly exhibited by the AI device (Table 2).

Discussion

As POCUS usage continues to expand,^7,19 AI-enabled devices represent a possible means to enhance competency among novice users and aid in interpretation.^13,14,20 In this randomized, controlled trial, we observed that internal medicine residents randomized to carry an AI-POCUS device for two weeks without feedback were able to obtain A4C views more quickly, had higher A4C image quality scores, and were more likely to identify reduced systolic function compared to residents who carried non-AI devices. Interestingly, the general trust in the AI system remained low to moderate in both groups, and the AI group did not report higher confidence in their skills despite outperforming the non-AI group. To our knowledge, this is the first randomized study that demonstrates that AI can improve scanning efficiency, acquired image quality, and pathological image interpretation.

The use of AI-enabled technologies to enhance proficiency in performance is gaining attention as an alternative to traditional teaching methods.^21–23 Previous non-randomized studies have shown that AI-assisted ultrasounds can aid in the acquisition of cardiac images and interpretation of reduced systolic function.^13,14 While our results support the hypothesis that AI-enabled devices can improve POCUS learning among novices, it is important to note the AI group in this study performed more scans overall. It could be argued this alone led to more deliberate practice and improved competency in the AI group. In support of this, previous investigations have shown that trainees can become proficient in acquiring cardiac and abdominal POCUS images in as few as 20-30 examinations.^16,18,24,25 However, it is important to consider whether such trainees would be considered “competent” with POCUS.¹⁸ Some authors have argued that the mastery of skills requiring manual dexterity may take substantially longer and require years of deliberate practice.^26,27 Furthermore, others have demonstrated that the degree of improvement for cardiac images is nominal between 0-10 scans (our groups had a mean difference of roughly three scans).^16–18

In this study, we found the participants had low to moderate levels of trust in the AI system, and the AI-group did not report increased trust in the system at follow-up. This lack of trust despite improved performance with AI is well described outside of POCUS,^22,28,29 which underscores the need for effective curricula on how to optimally integrate AI with clinical care.³⁰ Moreover, the AI group did not report higher confidence levels in their own skills despite significantly higher levels of performance. This finding is consistent with previous investigations that have demonstrated a novice’s actual ability to acquire ultrasound images does not correlate with their expressed confidence.¹⁰ Our results reinforce a concern that novices may have difficulty assessing their own competency with POCUS, which underscores the need for stringent training requirements and oversight as the technology and its usage expands.^18,31,32

There are several limitations of this study. It was conducted at a single academic site, which limits its generalizability. Due to the study design, participants were not blinded to the study outcomes or their study arm. Moreover, the relatively short follow-up period limits any conclusions regarding skill retention or overall improvement in competency. No formal feedback was provided to either study arm on image quality or acquisition techniques, even though this is thought to be an effective means of teaching POCUS.⁶ The scanning assessments were performed on a standardized patient with adequate cardiac windows and the image assessments were performed using idealized POCUS images. Therefore, these findings may not reflect real-world practice, wherein clinicians obtain images and evaluate them at the bedside, often in patients with difficult anatomy. Nevertheless, these results represent an intriguing implementation of AI-enabled POCUS, and future studies are warranted to investigate its applications in medical training and cardiac image acquisition.

In conclusion, POCUS novices randomized to carry AI-enabled devices for two weeks for patient care were able to obtain cardiac images more quickly, had higher image quality scores, and more accurately identified reduced systolic function at two-week follow-up. However, they continued to have low to moderate trust in the device’s AI features despite superior performance. Future studies should focus on how AI impacts long-term POCUS learning and the retention of skills.

Data Availability

All data produced in the present study are available upon reasonable request to the authors.

Acknowledgements

None

Footnotes

Conflict of Interests: Dr. Kumar reports receiving consultant fees from Vave Health, which is unrelated to this body of work. The other authors do not have any financial interests to disclose. This study was an investigator-initiated study. None of the device manufacturers used in this study oversaw, influenced, or reviewed the data presented prior to submission. The authors vouch for the independent nature by which this investigation was conducted and disseminated, without influence from outside organizations.

Abbreviations

A4C: (Apical-4-chamber)
AI: (artificial intelligence)
POCUS: (point-of-care ultrasound)
RACE: (rapid assessment of competency in echocardiography)

References

1. Díaz-Gómez JL, Mayo PH, Koenig SJ. Point-of-Care Ultrasonography. N Engl J Med 2021;385(17):1593–602.
2. Kumar A, Liu G, Chi J, Kugler J. The Role of Technology in the Bedside Encounter. Med Clin North Am 2018;102(3):443–51.
3. Qaseem A, Etxeandia-Ikobaltzeta I, Mustafa RA, et al. Appropriate Use of Point-of-Care Ultrasonography in Patients With Acute Dyspnea in Emergency Department or Inpatient Settings: A Clinical Guideline From the American College of Physicians. Ann Intern Med 2021;174(7):985–93.
4. Baribeau Y, Sharkey A, Chaudhary O, et al. Handheld Point-of-Care Ultrasound Probes: The New Generation of POCUS. J Cardiothorac Vasc Anesth 2020;34(11):3139–45.
5. Kumar A, Weng Y, Wang L, et al. Portable Ultrasound Device Usage and Learning Outcomes Among Internal Medicine Trainees: A Parallel-Group Randomized Trial. J Hosp Med 2020;15(2):e1–6.
6. Kumar A, Kugler J, Jensen T. Evaluation of Trainee Competency with Point-of-Care Ultrasonography (POCUS): a Conceptual Framework and Review of Existing Assessments. J Gen Intern Med 2019; http://dx.doi.org/10.1007/s11606-019-04945-4. doi:10.1007/s11606-019-04945-4
7. Williams JP, Nathanson R, LoPresti CM, et al. Current use, training, and barriers in point-of-care ultrasound in hospital medicine: A national survey of VA hospitals. J Hosp Med 2022;17(8):601–8.
8. Soni NJ, Tierney DM, Jensen TP, Lucas BP. Certification of Point-of-Care Ultrasound Competency. J Hosp Med 2017;12(9):775–6.
9. Díaz-Gómez JL, Frankel HL, Hernandez A. National Certification in Critical Care Echocardiography: Its Time Has Come. Crit Care Med 2017;45(11):1801–4.
10. Buesing J, Weng Y, Kugler J, et al. Handheld Ultrasound Device Usage and Image Acquisition Ability Among Internal Medicine Trainees: A Randomized Trial. J Grad Med Educ 2021;13(1):76–82.
11. Wang H, Uraco AM, Hughes J. Artificial Intelligence Application on Point-of-Care Ultrasound. J Cardiothorac Vasc Anesth 2021;35(11):3451–2.
12. Sonko ML, Arnold TC, Kuznetsov IA. Machine Learning in Point of Care Ultrasound. POCUS J 2022;7(Kidney):78–87.
13. Narang A, Bae R, Hong H, et al. Utility of a Deep-Learning Algorithm to Guide Novices to Acquire Echocardiograms for Limited Diagnostic Use. JAMA Cardiol 2021;6(6):624–32.
14. Cheema BS, Walter J, Narang A, Thomas JD. Artificial Intelligence-Enabled POCUS in the COVID-19 ICU: A New Spin on Cardiac Ultrasound. JACC Case Rep 2021;3(2):258–63.
15. Nti B, Lehmann AS, Haddad A, Kennedy SK, Russell FM. Artificial Intelligence-Augmented Pediatric Lung POCUS: A Pilot Study of Novice Learners. J Ultrasound Med 2022;41(12):2965–72.
16. Millington SJ, Arntfield RT, Hewak M, et al. The Rapid Assessment of Competency in Echocardiography Scale: Validation of a Tool for Point-of-Care Ultrasound. J Ultrasound Med 2016;35(7):1457–63.
17. Bahner DP, Adkins EJ, Nagel R, Way D, Werman HA, Royall NA. Brightness mode quality ultrasound imaging examination technique (B-QUIET): quantifying quality in ultrasound imaging. J Ultrasound Med 2011;30(12):1649–55.
18. Kumar A, Jensen T, Kugler J. Evaluation of trainee competency with point-of-care ultrasonography (POCUS): a conceptual framework and review of existing assessments. J Gen Intern Med 2019;[Epub ahead of print].
19. Nathanson R, Williams JP, Gupta N, et al. Current Use and Barriers to Point-of-Care Ultrasound in Primary Care: A National Survey of VA Medical Centers. Am J Med 2023; https://doi.org/10.1016/j.amjmed.2023.01.038. doi:10.1016/j.amjmed.2023.01.038
20. Shokoohi H, LeSaux MA, Roohani YH, Liteplo A, Huang C, Blaivas M. Enhanced Point-of-Care Ultrasound Applications by Integrating Automated Feature-Learning Systems Using Deep Learning. J Ultrasound Med 2019;38(7):1887–97.
21. Kumar A, Aikens RC, Hom J, et al. OrderRex clinical user testing: a randomized trial of recommender system decision support on simulated cases. J Am Med Inform Assoc 2020;27(12):1850–9.
22. Chiang J, Kumar A, Morales D, et al. Physician Usage and Acceptance of a Machine Learning Recommender System for Simulated Clinical Order Entry. AMIA Summits Transl Sci Proc 2020;2020:89.
23. Helm JM, Swiergosz AM, Haeberle HS, et al. Machine Learning and Artificial Intelligence: Definitions, Applications, and Future Directions. Curr Rev Musculoskelet Med 2020;13(1):69–76.
24. Millington SJ, Hewak M, Arntfield RT, et al. Outcomes from extensive training in critical care echocardiography: Identifying the optimal number of practice studies required to achieve competency. J Crit Care 2017;40:99–102.
25. Millington SJ, Arntfield RT, Guo RJ, et al. The Assessment of Competency in Thoracic Sonography (ACTS) scale: validation of a tool for point-of-care ultrasound. Crit Ultrasound J 2017;9(1):25.
26. Anders Ericsson K, Hoffman RR, Kozbelt A, Mark Williams A. The Cambridge Handbook of Expertise and Expert Performance. Cambridge University Press; 2018.
27. Chi M. The nature of expertise. Psychology Press; 2014.
28. Chen M, Zhang B, Cai Z, et al. Acceptance of clinical artificial intelligence among physicians and medical students: A systematic review with cross-sectional survey. Front Med 2022;9:990604.
29. Nundy S, Montgomery T, Wachter RM. Promoting Trust Between Patients and Physicians in the Era of Artificial Intelligence. JAMA 2019;322(6):497–8.
30. Çaliskan SA, Demir K, Karaca O. Artificial intelligence in medical education curriculum: An e-Delphi study for competencies. PLoS One 2022;17(7):e0271872.
31. Lucas BP, Tierney DM, Jensen TP, et al. Credentialing of Hospitalists in Ultrasound-Guided Bedside Procedures: A Position Statement of the Society of Hospital Medicine. J Hosp Med 2018;13(2):117–25.
32. Bahner DP, Goldman E, Way D, Royall NA, Liu YT. The state of ultrasound education in U.S. medical schools: results of a national survey. Acad Med 2014;89(12):1681–6.