Abstract
An exponential growth of literature about novel coronavirus disease 19 (COVID-19) has been observed in the last few months. This textual analysis of 5,780 publications extracted from the Web of Science, Medline, and Scopus databases was performed to explore the current research focuses and propose further research agenda. The Latent Dirichlet allocation was used for topic modeling. Regression analysis was conducted to examine country variations in the research focuses. Results indicated that publications were mainly contributed by the United States, China, and European countries. Guidelines for emergency care and surgical, viral pathogenesis, and global responses in the COVID-19 pandemic were the most common topics. There was variation in the research approaches to mitigate COVID-19 problems in countries with different income and transmission levels. Findings highlighted the need for global research collaboration among high- and low/middle-income countries in the different stages of prevention and control the pandemic.
Introduction
Novel coronavirus disease 19 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is currently threatening millions of lives in the world. After being introduced at the end of 2019, this disease was officially declared as a global pandemic by the World Health Organization (WHO) on March 11, 2020.1 Until April 30, 2020, 185 countries/territories reported 3·2 million confirmed cases with 227,847 total deaths,2 and the highest-burden has been placed in European and American countries.1 Serious health, social and economic consequences of COVID-19 have been well-recognized,3-7 especially among the elderly with comorbidities, homeless individuals, and also residents who face financial, mental, and physical hardships due to social distancing policies.8
Given that COVID-19 is a new threat without any antiviral therapies or vaccines, current measures to mitigate this crisis depend heavily on the national and regional preparedness and responses.9 However, optimal strategies to cope with the complexity of this pandemic demands substantial scientific evidence. Recently, the WHO has issued technical guidance for countries/regions and research institutions, as well as worked closely with global researchers to update the empirical evidence.10,11 Efforts have been made around the globe to enhancethe understanding of the dynamic transmission of COVID-19, appropriate vaccine research strategies effective treatments, as well as the impacts of current responses on population health and well-being.12 As a result, in the last four months, the number of publications about COVID-19 has been increased dramatically, including articles, reviews, letters to editors, or preprint documents.13 These contributions have proven the importance of scientific research in pandemic preparedness as well as helping the governments to respond rapidly and effectively to the crisis.14
Cumulatively, current research evidence has partly shaped our knowledge about COVID-19, but there has been still raising more questions to address, which require sharing information and providing scientific expertise and leadership of all countries to accelerate research efforts.15 Yet, there has been a the lack of studies attempted to identify the research focus in different countries and regions. Previous systematic reviews have been conducted to explore in detail the clinical characteristics of COVID-19,16,17 or the effectiveness of specific COVID-19 treatment,18,19 and policies.20 However, these studies were conducted on a small volume of articles. One potential approach to delineate the focus of the academic community is bibliometric analysis. This method has been previously applied to identify the rapid development of research output, the most prolific countries, journal information, and major concerns of research, but not provide a deep analysis of publications.21-23 In this study, by using text visualization and topic modeling approaches as a part of natural language processing and machine learning, we aimed to explore the research focus in general and in countries with different levels of income and COVID-19 transmission features. Findings might provide a knowledge gap in scientific research related to COVID-19 and future direction to inform research focus and policy responses better.
Materials and Methods
Searching strategy and selection criteria
Information on COVID-19 and SARS-CoV-2-related documents published until 23 April 2020 were extracted from the Medline, Scopus, and Web of Science (WoS) databases. The search terms and search queries for each online database were developed according to the WHO naming process for the virus and the disease it causes,24 and are presented in the Supplementary 1. Any English-language publications containing COVID-19 disease or SARS-CoV-2 virus published from December 2019 to 23 April 2020 were included. Document types such as corrections, data papers, reprints, or conference papers were excluded because they might be duplicated to peer-review papers. Datasets of three databases were merged, and duplications were screened independently and removed by two researchers. A final dataset of 5,780 papers was used for further analysis. The searching process was presented in Figure 1.
Data analysis
In this paper, we extracted the documents’ title, abstract, keywords, citation, and affiliation of authors for analysis. As a document could be authored by scholars from different countries, we considered that all these countries contributed to the document preparation. Moreover, we decided to include both documents with and without abstracts for text analysis since the title of the document could partly reflect the document’s topic. We first descriptively analyzed the number of publications in each country and presented these data by using Microsoft Excel’s Map function. Then, we exported the top ten most cited publications for a detailed analysis of the content of the most cited papers.
We used the VOSviewer textual analysis software tool (version 1·6·15, Centre for Science and Technology Studies, Leiden University, the Netherlands) to measure the co-occurrence of keywords and most frequent terms in title/abstract.25,26 Then, we employed Latent Dirichlet allocation (LDA) to discover fifteen latent topics from the titles and abstracts of documents. This Bayesian model treats each document as a set of topics, and topics are probability distributed over a set of words and their co-occurrence.27 Thus, the LDA technique can produce two outputs: 1) Probability distributions of different topics per document (to acknowledge how many topics are created based on the given publications), and 2) Probability distributions of unique words per topic (to define the topics).27 Because each title/abstract may contain a mixture of topics, the LDA outputs may not reflect a specific research field or discipline. However, experience for previous work suggested that documents that focused on particular themes would be more likely to be categorized in the same group. To ensure robust results in naming the topic, we also checked at least ten documents per topic to ensure that the theme could fit the content of documents.
Multivariable linear regression models were performed to examine the research focus of countries with different income classification (low, low-middle, high-middle, and high income – according to World Bank classification),28 and different transmission classification (Pending, Sporadic case, Clusters of cases, Community transmission – according to WHO).29 The dependent variable was the share of publications in specific topic out of total publications in each country (%), while the independent variables were income classifications and transmission classifications. The models were adjusted to the natural logarithm of gross domestic product (GDP) per capita, the number of COVID-19 cases, and the number of COVID-19 deaths per country. The latest data on GDP per capita and income classifications were collected from the World Bank databases, while data on cases deaths were extracted from WHO reports on April 24, 2020. A p-value of less than 0.05 was used to detect statistical significance.
Role of the funding source
Research is supported by Vingroup Innovation Foundation (VINIF) in project code VINIF.2020.COVID-19.DA03
Results
Figure 2 shows the research productivity of each country. 115 countries produced 5,780 publications in the searching period. It appears that scientific publications were mainly driven by the research hubs such as China, the United States, Canada, France, Italy, the United Kingdom, and India, which were also heavily hit by COVID-19. In contrast, the majority of African countries had no more than ten studies about COVID-19.
The list of ten most cited publications about SARS-CoV-2 and COVID-19 and their main findings are presented in Table 1. Reports on the clinical and laboratory characteristics of the confirmed cases are of most interest, with six out of ten papers in the list. The most cited paper was a descriptive study about epidemiological and clinical features of 99 cases from Wuhan, China, which was believed to be the genesisof SARS-CoV-2.
Top ten most cited papers
Figure 3 presents the network of 200 keywords with a co-occurrence of at least 20 times. The keywords were assigned to three major clusters. Cluster 1 (blue) reveals some basic imaging techniques for the diagnosis of lung function impairments (tomography and thorax radiograph) in children, adolescents, and adults. Cluster 2 (red) refers to the major concerns of the world regarding COVID-19, such as prevention, medicine, and public health response. Cluster 3 (green) focuses on the biology of SARS-CoV-2, including the origin, the phylogenetic network, and the genomic, proteomic, and metabolomic characterization of the virus.
Thematic analysis of 250 most frequent terms is presented in Figure 4. Major themes of current research on COVID-19 are (1) promising therapies for COVID-19 prevention and treatment, and their mechanisms (blue), (2) hot spots of the pandemic and the governments’ responses (red), and (3) clinical patterns, such as ground-glass opacities (GGO), and complications of COVID-19 (green).
Figure 5 shows the clustering of research areas in the WOS database. The research landscapes in this research field was the combination of several research areas. First cluster was Infectious diseases and Pharmacology. This cluster had a close connection with Surgery and Gastroenterology (second cluster). The third cluster related to treatment and diagnosis (such as Radioogy, Hematology, Virology, Psychiatry, Gerontology, or Metanolism). The other clusters in COVID-19 research areas, including 1) Critical car and Respiratory System (the fourth cluster); 2) Health care service and health policy (the fifth cluster); 3) Microbiology and Immunology (the sixth cluster); 4) Oncology and Experimental Research (the seven cluster); and 5) Biology.
The LDA results are presented in Table 2. Since the beginning of the pandemic, researchers have devoted special attention to the biology of SARS-CoV-2 (Topic 3 and 4) and made an enormous effort on various aspects of clinical investigation, such as diagnostic tests for detection of the virus, clinical examination of hospitalized patients and treatment for the disease (Topic 5, 7, 8, 9, 10, 11, and 15). Meanwhile, research on global and national responses to COVID-19 accounted for nearly a quarter of available literature volume (Topic 2, 12, and 13). Epidemiological characterization of COVID-19 and psychological disorders during the epidemic are also of great interest (Topic 1, 6, and 14).
Fifteen topics about COVID-19 according to topic modeling
The research focuses on countries with different income level and epidemic characteristics are shown in Table 3. High-income countries (HICs) were significantly more likely to report fewer studies focused on research in epidemiological characteristics and interventions of Psychological disorders in COVID-19 pandemic (Topic 6) compared with countries with other income levels. Low-middle income countries were found to publish less on diagnostic values of SARS-CoV-2 tests and improvement strategies (Topic 10). Treatment interventions for COVID-19 (Topic 15) attracted the interest of scientists among countries at all income levels, especially in HICs.
Regression models to identify the research trend among countries with different income level and epidemic characteristics
Regarding transmission classification, comorbidities in patients with COVID-19 (Topic 8) were found to receive less attention among countries with sporadic cases in comparison with countries having “pending” transmission classification. Treatment interventions were significantly less likely to be the research focused in countries having sporadic cases, a cluster of cases, and community transmission compared with those with “pending” transmission classification.
Discussion
By using the natural language processing approach with the Latent Dirichlet allocation, this study was able to capture the focus of COVID-19 related publications in different settings. This paper informed the rapid growth of research publications, the global variation in research productivity, and research interests. Moreover, the findings of this study suggest research gaps.
In this study, we found a greater number of publications regarding COVID-19 and SARS-CoV-2 in comparison with previous bibliometric studies.21-23 For example, Lou et al. used the Medline database and only found 183 publications through February 29, 2020.22 This disparity could be justified that our search was far more comprehensive than these studies by usingthree major databases, including the Medline, Scopus, and Web of Science. In addition, we included other document types such as letters, commentaries, or notes rather than concentrating only on original articles. As original papers require a long period for peer-reviewed,30 scientists tended to publish their ideas in those document types first for receiving rapid feedbacks from others,31 Therefore, we believed that our approach was appropriate given that these documents might partly reflect the research focus in each country.
The thematic maps of authors’ keywords and terms reveal that major research themes included virological and molecular analysis of the virus; clinical, laboratory and radiology examinations; and global and public health responses. Our findings are in line with a previous bibliometric study, which showed that virology, clinical characteristics, and epidemiology of COVID-19 were found to be the research focus with the highest volume of papers.22 Indeed, these research areas are essential components for controlling the pandemic. While understanding the biology of SARS-CoV-2 is critical for the development of effective and safe screening tests, drugs and vaccines. Investigations into clinical characteristics of COVID-19, along with results of laboratory and radiology, could inform a fundamental for appropriate patient management in the clinical settings. Research on public health responses could illustrate the effectiveness of different policies and strategies to mitigate the consequences of COVID-19 pandemic.32-34
The results of topic modeling offer more penetrating insights into the emerging research themes. Of all identified topics, clinical aspects, particularly guidelines for emergency care and surgical management during the COVID-19 pandemic (Topic 11), were most frequent. Along with the rapid increase in the number of confirmed cases, the heavy demand for health facilities and health workers, as well as the lack of effective treatment regimens, place a heavy burden and prevent the healthcare systems from operating efficiently. Without guidelines for prompt responses in emergency care, the burden caused by COVID-19 would go beyond the capacity of most health systems, especially for ICU care.35 In addition, a number of SARS-CoV-2 infections emerged from operations were reported in China, suggesting the risk of virus exposure despite strict hygienic requirements and aseptic techniques during the surgical process.36 Research for clinical guidelines, therefore, plays a critical role in mitigating the impact of COVID-19 on the healthcare system.
The origin and pathophysiology of the virus have attracted a great deal of attention since the beginning of the outbreak.37-39 The interest on this topic has continued to rise as the virus has gone beyond China, where the first infection was reported, and positive cases have been found in most countries and territories.40 On the other hand, the information that SARS-CoV-2 is a laboratory derived virus, albeit has been confirmed to be a false claim, gave rise to considerable controversy and also facilitated research on the nature of the virus 41. Another topic that should be mentioned is national public health responses and actions against COVID-19, especially at the beginning of the pandemic, when there was a wide difference in policies introduced by different governments. In particular, some countries have been advocating achieving herd immunity, whereas low- and middle-income countries (LMICs) have implemented strict actions, including quarantine, isolation, social distancing, and community containment as soon as the outbreak occurred 42-45. Although such measures have sdemonstrated their effectiveness, for optimal public health as well as economic outcomes, further investigation into their implementation within specific contextual factors should be prioritized.46 Moreover, continued medical training for healthcare workers 47 and preventive measures for the workforce,48 along with frequent transparent communication and educational interventions for the public is essential to strengthen the preventive capacity of each individual and thus, contribute to the global fight against COVID-19. Meanwhile, since COVID-19 has been reported to have no noticeable effect on pregnancy, research on COVID-19 among pregnant women received relatively slight interest.49
Regarding the research trend and level of interest of different country groups, it appeared that the share of publications regarding the topic about psychological health and related interventions was negatively associated with the income levels. This finding might imply that this topic might not be the priority of the countries, or in other words, developed nations even show less interest than the ones having lower-income.50 However, we do believe that most of the studies about this topic were cross-sectional surveys in the community, which were more affordable for low-income countries to perform compared to other topics. Nevertheless, COVID-19 caused a significantly psychiatric impact,51 and this impact maintained when the total number of COVID-19 cases continues to rise in a country.52 Developed nations are not immune from mental health issues and mental health services are often disrupted during COVID-19 pandemic.53 In terms of treatment interventions, although all countries are making an effort to develop an effective treatment regimen, high-income countries, with their vast financial resources, greater expertise, and infrastructure, demonstrated their bold attempt at this research area.54,55 Meanwhile, compared to low-income nations, we observed a lower share of SARS-CoV-2 test-related publications among low-middle income countries, suggesting that these countries prioritized to other research fields such as treatment interventions given the resource-constrained.56
Findings also suggested that research on comorbidities associated with COVID-19 is relatively underdeveloped in countries with sporadic cases, in contrast with the extensive understanding and research on the effects of comorbidities on COVID-19 of those countries with a high number of infections.57,58 On the other hand, treatment interventions were not prioritized, while the level of transmission increases. Although some high-income countries such as the United States, Canada or the United Kingdom were classified as “community transmission” and greatly contributed to the progress of finding treatment interventions, most of the nations in this categories were low-middle income countries (e.g. South American and African countries) and the governments tends to focus on preventive methods to prevent the pandemic from getting worse.2
This study has several implications. To begin with, while rapid transmission of COVID-19 has been triggering a strong need for the development of an effective vaccine, our results show minimalresearch on this research topic. In addition, since there have been anecdotalevidence that promising drugs for COVID-19, such as Lopinavir/ritonavir (LPV/RTV). Chloroquine (CQ) and hydroxychloroquine (H0), show no significant benefits to health outcomes of patients, hence,eveloping an effective and safe medication specific for the treatment of COVID-19 is of utmost importance 59-61. We also found a lack of research on the social stigma caused by COVID-19. Due to the rapid contagion of the virus, fear and anxiety about being infected can give rise to stigma and discrimination toward people, places, or things. For instance, people associated with the disease, such as being in the neighborhood of high risk or being civils of nations with a high rate of COVID-19, are often stigmatized.62,63 Stigma can also arise when a person is released from quarantine, even though they have been confirmed to be negative and are no longera risk. Although there have been several published guidelines for reducing social stigma related to COVID-19, further investigation into the detrimental effects of social stigma and development of interventions for this problem should be carefully considered.62,63
To our knowledge, this is the first analysis using text mining and text modeling to investigate the topics of the worldwide COVID-19 publications. However, some limitations should be notedd. The restriction of the search strategy to the English language might not reflect globalized practices and the research priority of a country, such as articles in the local language. Analyses of keywords, titles, and abstracts may not fully reflect the content of articles. However, with the combination of three large datasets and various techniques of text mining, this study is useful for an overview of the research direction.
Conclusions
This study showed that COVID-19 related publications were primarily contributed by major research hubs such as the United States, China, and European countries. Global researchers were currently focusing on clinical management, viral pathogenesis, and public health responses in combating against COVID-19. We alsofound country variation in research priorities. Findings suggest the need for global research collaboration among high- and low/middle-income countries in the different stages of prevention and control the pandemic.
Data Availability
Data is available from corresponding author upon reasonable request
Authors’ contributions
Bach Xuan Tran: conceptualization, revision of manuscript; Giang Hai Ha: conceptualization, literature search, and writing; Hoang Long Nguyen: data analysis, literature search and writing; Giang Thu Vu: literature search, data analysis and writing; Hai Thanh Phan: literature search and writing; Huong Thi Le: revision of manuscript; Carl A. Latkin: revision of manuscript; Roger C.M. Ho: revision of manuscript; Cyrus S.H. Ho: revision of manuscript.
Declaration of interests
The authors declare no conflict of interest.