PT - JOURNAL ARTICLE
AU - Martin, Sam
AU - Beecham, Emma
AU - Kursumovic, Emira
AU - Armstrong, Richard A.
AU - Cook, Tim M.
AU - Déom, Noémie
AU - Kane, Andrew D.
AU - Moniz, Sophie
AU - Soar, Jasmeet
AU - Vindrola-Padros, Cecilia
AU - collaborators
TI - Comparing human vs. machine-assisted analysis to develop a new approach for Big Qualitative Data Analysis
AID - 10.1101/2024.07.16.24310275
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.07.16.24310275
4099 - http://medrxiv.org/content/early/2024/07/17/2024.07.16.24310275.short
4100 - http://medrxiv.org/content/early/2024/07/17/2024.07.16.24310275.full
AB - Background: Analysing large qualitative datasets can present significant challenges, including the time and resources required for manual analysis and the potential for missing nuanced insights. This paper aims to address these challenges by exploring the application of Big Qualitative (Big Qual) and artificial intelligence (AI) methods to efficiently analyse Big Qual data while retaining the depth and complexity of human understanding. The free-text responses from the Royal College of Anaesthetists’ 7th National Audit Project (NAP7) baseline survey on peri-operative cardiac arrest experiences serve as a case study to test and validate this approach.
Methodology/Principal Findings: Quantitative analysis segmented the data and identified keywords using AI methods. In-depth sentiment and thematic analysis combined natural language processing (NLP) and machine learning (ML) with human input: researchers assigned topic/theme labels and sentiments to responses, while discourse analysis explored sub-topics and thematic diversity. Human annotation refined the machine-generated sentiments, leading to an additional “ambiguous” category to capture nuanced, mixed responses. Comparative analysis was used to evaluate the concordance between human and machine-assisted sentiment labelling.
While ML reduced analysis time significantly, human input was crucial for refining sentiment categories and capturing nuances.
Conclusions/Significance: The application of AI-assisted data analysis tools, combined with human expertise, offers a powerful approach to efficiently analyse large-scale qualitative datasets while preserving the nuance and complexity of the data. This study demonstrates the potential of this novel methodology to streamline the analysis process, reduce resource requirements, and generate meaningful insights from Big Qual data. The integration of NLP, ML, and human input allows for a more comprehensive understanding of the themes, sentiments, and experiences captured in free-text responses. This study underscores the importance of continued interdisciplinary collaboration among domain experts, data scientists, and AI specialists to optimise these methods, ensuring their reliability, validity, and ethical application in real-world contexts.
Author Summary: The use of artificial intelligence (AI) in health research has grown over recent years. However, the use of AI to analyse large qualitative datasets in public health, known as Big Qualitative Data, is a relatively new area of research. Here, we use novel machine learning and natural language processing techniques, in which computers learn to handle and interpret human language, to analyse a large national survey. The Royal College of Anaesthetists’ 7th National Audit Project is a large UK-wide initiative examining peri-operative cardiac arrest. We use the free-text data from this survey to test and validate our novel methods and to compare analysing the data by hand (human) with combined human and machine learning analysis, also known as ‘machine-assisted’ analysis. Using two AI tools to conduct the analysis, we found that the machine-assisted analysis significantly reduced the time needed to analyse the dataset. Extra human input, however, was required to provide topic expertise and nuance to the analysis.
The AI tools reduced the sentiment analysis to positive, negative or neutral, but the human input introduced a fourth ‘ambiguous’ category. The insights gained from this approach present ways that AI can help inform targeted interventions and quality improvement initiatives to enhance patient safety, in this case, in peri-operative cardiac arrest management.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: The project infrastructure was supported financially and with staffing by the Royal College of Anaesthetists. The NAP7 fellows’ salaries were supported by: South Tees Hospitals NHS Foundation Trust (AK); Royal United Hospitals Bath NHS Foundation Trust (EK); NIHR Academic Clinical Fellowship (RA). The employers of JS and TC receive backfill for their time on the project (4 hours per week). NAP7 panel members were not paid for their role. EB, SM and CVP were supported by the NIHR Central London Patient Safety Research Collaboration (CL PSRC), reference number NIHR204297.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: All parts of the NAP7 are classified as a service evaluation, as there is no intervention, no randomisation of patients and no change to standard patient care or treatment.
The project is observational and does not require research ethics committee approval, in line with the Health Research Agency’s decision tools.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes
All data for our study are based on secondary analysis of the data available in the online Supporting Information of the following publication: Kane et al. (2022) Methods of the 7th National Audit Project (NAP7) of the Royal College of Anaesthetists: peri-operative cardiac arrest. Anaesthesia 77: 1376-1385.