How Have Global Scientists Responded to Tackling COVID-19? ========================================================== * Longbing Cao * Wenfeng Hou ## Abstract Since the outbreak of COVID-19, the global scientific communities across almost all countries have made urgent, intensive, and continuous effort on understanding, fighting and modeling the COVID-19 pandemic. COVID-19 research turns out to be the first overwhelming global scientific reaction to significant global crises and threats. This literature analysis report collects and summarizes the profiles, trends, quality and impact of this global scientific response. It collects and analyzes 346,267 scientific references in English, involving researchers from 189 countries and regions in 27 subject areas. The report generates a picture of how global scientists have responded to COVID-19 between Jan 2020 and Mar 2022 in terms of their publication quantity, impact, focused major problems, and research areas and methods over country/region, discipline and time and collaboratively. The report also captures broad-reaching distributions and trends of modeling COVID-19 by AI, data science, analytics, shallow and deep machine learning, epidemic modeling, applied mathematics, and social science methods, etc. We further show the correlations between publication quality and quantity and economic status and COVID-19 infections globally and in major countries and regions. The literature analysis results of this global scientific response to COVID-19 present a comprehensive global, regional and subject-specific picture of the significant cross-disciplinary, cross-country, cross-problem, and cross-technique profiles and differences of the COVID-19 publication quantity and quality. The report also discloses significant imbalances in the COVID-19 research across countries/regions, subject areas, problems, topics, methods, research collaborations, and economic statuses. We share the source and analytical data of this global literature analysis for further research on this unprecedented and future crises. **The COVID-19 global scientists response dataset** The COVID-19 global scientists response dataset supporting this report can be found in Kaggle at: [https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response](https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response), Dataset doi: 10.34740/kaggle/dsv/4077649. **More information about COVID-19 modeling** Interested readers may refer to the following review on modeling COVID-19 which complements this global literature analysis: Longbing Cao and Qing Liu, COVID-19 Modeling: A Review, 1-103, 2022 (version 2). Accessible at medRxiv: [https://www.medrxiv.org/content/10.1101/2022.08.22.22279022v1](https://www.medrxiv.org/content/10.1101/2022.08.22.22279022v1). More information about COVID-19 modeling, including the analytical results of this report and data and review on COVID-19 can be found at: [https://datasciences.org/covid19-modeling](https://datasciences.org/covid19-modeling). **Citation of this report** Longbing Cao and Wenfeng Hou. How have global scientists responded to tackling COVID-19? Full technical report, [https://doi.org/10.1101/2022.08.16.22278871](https://doi.org/10.1101/2022.08.16.22278871), Data Science Lab, University of Technology Sydney, 2022. ## 1 Introduction The COVID-19[**Elsken19**] pandemic12 has caused unprecedented impact on almost every aspect of our life, work, study, travel, and entertainment and has reshaped our economy, society, and globalization. Perhaps, for the first time in the human scientific history, almost all countries, institutions, disciplines and scientists globally have been involved more or less in reacting to this emergent, unprecedented, significant threat and crisis to humans in a very unprecedented, immediate, intensive and continuous manner. An important research agenda is to review this mostly intensive global scientific effort ([**Cao-Liu’22**]) and to create a COVID-19 publishing profile of this globalized COVID-19 research. In particular, as quantifying COVID-19 is essential for understanding and managing the pandemic, it is important to understand what modeling techniques including in mathematics, epidemiology, AI, data science, machine learning, deep learning, and social science etc. have been applied and where they have played their part in modeling COVID-193 ([**Cao-Liu’22, Cao’22**]). ### 1.1 Review objectives and questions Accordingly, we set up four major review objectives in this project to understand how global scientists have responded to fighting the COVID-19 pandemic. By collecting and analyzing all literature on COVID-19, our first objective is to draw a global COVID-19 publishing picture of their quantity, quality (in terms of publication impact metrics), mainly concerned problems, and mostly applied techniques, etc. The second objective is to understand the differences between countries, disciplines, problems, and techniques in both publishing quantity and quality. The third objective is to explore how major countries including the US, China, G20 and OECD countries and regions perform and differ from each other in terms of their research interests and publication impact. Lastly, we aim to explore the correlation between COVID-19 publication and GDP per capita of a country and between publication and infected and death case numbers in major countries. We then design various research questions for this exploration. First, to generate an overview of the global COVID-19 research quantity and quality, we aim to identify answers from the literature analysis to the following questions. * What is the global statistics of the COVID-19 publication quantity and quality (impact)? * What is the global statistics of the COVID-19 publication quality across major disciplines? * How has each country published on COVID-19 quantitatively and what is their impact in terms of major publication impact metrics? * What are the mostly concerned keywords globally? * What is the global distribution of the cumulative publication impact in terms of combining major publication impact metrics? * What is the global publication impact distribution in terms of various impact metrics? * Who are the top-10 published countries and their research impact? * What is the disciplinary research impact of the top-10 published countries? * How have global scientists collaborate with each other in fighting the pandemic? These are expected to capture the overall profile and trends of this global scientific exploration and provide statistics about how the COVID-19 publication quantity and quality differ between countries and their high-performing countries in studying the COVID-19. Second, to quantify how G20 and OECD countries and regions have performed in tackling COVID-19, we explore the indication to the following questions: * What are the publication number and impact of G20 countries and regions? * What is the disciplinary publication impact of G20 countries and regions? * What is the publication-averaged mean publication impact of G20 countries and regions? * What are the publication number and mean publication impact of OECD countries and regions? Third, we aim to understand how various modeling techniques ([**Cao-Liu’22**]), including applied mathematics, AI and data science (including shallow and deep learning and analytics) ([**Cao’22**]), epidemic modeling, and other methods such as simulation, have contributed to quantifying COVID-19 in terms of their trending effects. We thus explore the following questions: * What are the main keywords concerned in modeling COVID-19 globally? * What are the main modeling keywords concerned in different disciplines? * What are the main keywords concerned in modeling COVID-19 in the two major economic and technical powers USA and China respectively? * What are the top-50 concerned problems and their publication impact? * What are the top-50 concerned problems and their modeling techniques? * What are the top-50 modeling techniques and their publication impact? * What are top-10 monthly concerned problems in modeling COVID-19? * What are the top-10 problems concerned in the top-10 published countries? * What are the top-10 modeling techniques applied by the top-10 published countries? * What are the top-10 problems and techniques concerned by the top-10 published countries? Lastly, we aim to understand the relations between the COVID-19 research outcomes and a country’s coronavirus infections, deaths, and their economy by exploring the following questions: * How does global first-authored countries’ GDP per capita correlate with their publications and cumulative publication impact? * How does a G20 country’s GDP correlate with their publication quantity and impact? * How does an OECD country’s GDP correlate with their publication quantity and impact? * How does global monthly publications correlate with the global monthly infections and deaths in major disciplines? * How does a G20 country’s coronavirus infections and deaths correlate with their publications? * How does an OECD country’s coronavirus infections and deaths correlate with their publications? ### 1.2 Review methods To obtain answers and indications to the above research questions, we developed crawling programs to acquire, preprocess, and analyze the literature on COVID-19 from major digital libraries and archival systems. Below, we explain how we extracted and processed the COVID-19 publications. First, the references, with mostly published ones and partially preprints, were collected from the COVID-19 Open Research Dataset (CORD-19) in SemanticScholar from Jan 2020 to Mar 2022, as discussed in Section 2.1. We further collected the supplementary information about the collected publications, including their publishing dates, and author country information. This information is collected and verified through mapping the CORD-19 publications with those listed in Web of Science, Researchgate, WHO, Crossref, medRxiv, arXiv, and the US National Library of Medicine. More details are in Section 2.2. In addition, we collected the publication impact metrics of journals and conferences which have published the collected publications from Google Scholar, Web of Science’s Journal Citation Reports, and Scopus. More details are available in Section 2.3. These materials are matched and processed by programs and verified manually to form the global COVID-19 publication source data. Then, we preprocessed the publication data including filtering all publications before 2020 and not in English, and extracted country information and publication date available from their publications. More details are in Sections 3.2 and 3.3. In all publications, we extracted keywords related to COVID-19 problems and techniques of handling such problems from their titles and abstracts by Apache OpenNLP4. To extract the COVID-19 modeling keywords of each publication, after generating the embeddings of the extracted keywords and a list of predefined domain-specified keywords on modeling by SciBert, those extracted keywords with highly similar embeddings to that of the domain-specified modeling keywords were selected as the final modeling keywords of each publication. More details are in Section 3.5. To extract the COVID-19 modelled problem keywords of each publication, we applied PositionRank based on Pagerank and domain-specified synonym and expression mapping to extract the modelled problems in each publication. More details are in Section 3.4. Consequently, we obtained the keywords reflecting the COVID-19 modeling techniques and modelled problems of all publications to highlight the publication profile of COVID-19 modeling. In addition, we categorize each publication to its mostly relevant disciplines by focusing on medical science, computer science (broadly refer to IT, computing, informatics, applied statistics), and social science. The disciplinary categorization is generated by matching each publication’s publishing venue to the disciplinary information available in the Web of Science, SJR5, and [Research.com](https://Research.com)6. The disciplinary information about social science is mapped to that in Social Science Research Network (SSRN)7. We thus obtained the disciplinary categorization of each modeling publication. More details are in Section 3.6. Lastly, we further extracted all publications with modeling as their focus. This was done by mapping the extracted publication keywords to a list of domain-specified modeling-sensitive words and expressions. More details are in Section 3.7. As a result, we identified 346,267 publications, including 26,420 preprints. 186,650 are from medical science publications, 13,959 from computer science, and 32,128 from social science. 51,071 publications including 7,150 focus on and are related to general COVID-19 modeling research, where 21,785 are from medical science, 6,142 from computer science, and 3,890 from social science. ### 1.3 Publication analysis methods Our focus in this report to undertake the publication analysis driven by the aforementioned review questions in terms of four aspects: (1) global publication profile in Chapter 5, (2) G20 and OECD publication profile in Chapter 6, (3) publication profile on modeling COVID-19 in Chapter 7, and (4) the correlations between publications, economy and infections in Chapter 8. First, we applied and created various evaluation measures to quantify the target variables and their analytical results. We involve major publication impact metrics to quantify the impact of publications. They are H5-index, Impact Factor, CiteScore, SNIP, and SJR, introduced in Section 4.1. We further generated several measures, including the composite indicator in Section 4.2 that integrates H5-index, Impact Factor, CiteScore, SNIP, and SJR, the correlation coefficient between publications and GDP per capita of a country in Section 4.4. Second, the above four aspects of the COVID-19 publishing profiles are generated by descriptive, contrastive, multi-dimensional, ranking, linkage, sequential, and correlative analytics of the COVID-19 publication quantity, quality and trending effects across countries and regions, disciplines, concerned problems, and applied techniques, in particular on COVID-19 modeling. * *Descriptive analytics* collects the statistics of publications in terms of factors such as publication number and mean, the statistics of publication impact metrics such as mean, maximum, and minimum values, for example, the statistics and distributions of global publications in Chapter 5. * *Contrastive analysis* provides comparison between target publication variables, such as first-authored country, publication quantity, publication impact metrics, and publication disciplines, e.g., G20 country’s publication impact in computer science in Section 6.2. * *Multi-dimensional analysis* presents the results in terms of multiple variables, e.g., producing a joint view of relations between top-50 problems and their publication impact, and between top-10 modeling methods, top-10 problems in top-10 published countries as in Section 7.14, and G20 country’s mean publication impact in Section 6.5. * *Ranking analysis* is used to generate top-K list in terms of target variables, for example, top-10 published countries in Section 5.10, and top-50 problems in Section 7.8. * *Linkage analysis* is applied to identify the connectivity between target variables, e.g., the research collaborations between countries in Section 5.12. * *Sequential analysis* captures temporal patterns and dynamics of a target variable, for example, top-10 monthly concerned problems in modeling COVID-19 in Section 7.11. * *Correlation analysis* explores the relationships and coefficients between two to multiple factors, for example, the correlations between publications and GDP per capita in G20 countries in Section 8.2. Lastly, various visualization techniques are applied to present the analytics and their results. They include choropleth world map8 (e.g., Figure 2), multi-dimensional visualization9 (e.g., Figure 15), radar chart10 (e.g., Figure 19), and contour map11 (e.g., Figure 28). ![Figure 1:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F1.medium.gif) [Figure 1:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F1) Figure 1: Disciplinary distribution of COVID-19 publications ![Figure 2:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F2.medium.gif) [Figure 2:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F2) Figure 2: Global publication distribution In addition to the visualization tools applied above, we present the results in Python or R programming. These include: word cloud (e.g., Figure 3), linkage analysis (e.g., Figure 12), and heat map (e.g., Figure 27) ![Figure 3:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F3.medium.gif) [Figure 3:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F3) Figure 3: Global top-200 word cloud of all publications ### 1.4 Findings In particular, this project identifies the following publishing profiles and findings from analyzing the global scientific response to the COVID-19 pandemic: * Scientists from 189 countries and regions and 175 first-authored countries and regions have contributed to the research on COVID-19, although their research capacity and quality are highly divided. * 27 subject areas have been involved in studying COVID-19 globally, where medicine, biochemistry/genetics/molecular biology, social science, immunology/microbiology, and environmental science rank the top five subject areas contributing the most publications. * The US, China, Italy, the UK, and India lead the number of publications, and their cumulative publication composite indices of H5-index, impact factor, CiteScore, SNIP and SJR also rank top-5 globally. However, of the top-20 publishing countries, Netherlands, the UK, France, China and Switzerland lead the worlds in terms of their publication’s mean composite indices. * The US, China, the UK, Italy and India are the top-5 mostly collaborative countries, where the US appears to be at the center point of global collaboration on COVID-19 research. The US has the highest collaborations with China, the vice versa. * Of all publications, pandemic, patient, infection, risk, treatment are the top-5 keywords mostly concerned globally. * Mental health, second wave, lock down, vaccine, and anxiety are the five mostly concerned problems concerned in global research. * Regression, machine learning and deep learning, simulation, multivariate statistics, artificial intelligence, and statistical model are the mostly applied techniques in quantifying and modeling COVID-19. * In terms of quantifying COVID-19, in medical science, regression, multivariate statistics, simulation, statistical modeling, and machine learning are mostly applied. In computer science, machine learning, simulation, deep learning, artificial intelligence, and regression are mostly applied. In contrast, social science favors regression, structural equation models, machine learning, simulation, and artificial intelligence. * In G20 countries and regions, the US, EU, China, the UK, and India rank top-5 in terms of publication number in computer science. In social science, the US, EU, the UK, China and Australia take the lead. In contrast, the US, EU, China, the UK, and India take the lead in medical science. * The top-5 countries with the highest coefficient between the total publication number and GDP per capita and between the collective composite indicator and GDP per capita are Germany, France, Canada, Spain, and the UK. In contrast, India and China rank the lowest of the top-10 countries, indicating most productive in the COVID-19 research in the context of their GDP per capita. * In G20 countries and regions, Argentina, South Korea, Japan, Australia and Germany rank the top-5 in terms of their correlation between publication number and GDP per capita; while Argentina, Russia, Japan, South Korea and Australia rank the top-5 in terms of their publication number and collective composite indicator. In contrast, India, China and the US mark the lowest three, showing their higher productivity in terms of their GDP per capita. * In OECD countries and regions, Iceland, Luxembourg, Estonia rank top 3 in terms of their publication number and GDP per capita; in contrast, US, Turkey and Italy rank have the lowest correlation between their publication number and GDP per capita, i.e., showing relatively higher productivity. The same trends hold for the correlation between publication number and collective composite indicator. * In the top-20 countries, the top-5 countries with the highest log-normalized correlations between their publication number and new infection case number are Netherlands, Poland, Brazil, France, and Switzerland. * With regard to the log-normalized correlation between publication number and death number, the top-5 countries are regions are Brazil, Poland, Iran, India and France. There are significant imbalances in the global COVID-19 research. Mostly developed countries and regions may not necessarily contribute to the most impactful research outcomes, instead countries with low economic status may be more productive. Many underdeveloped countries and regions, including in the African region, suffer from less concentration and productivity in studying COVID-19. In addition, the subject area imbalance and topical imbalance appear across countries and disciplines. With regard to modeling COVID-19, classic methods still play a dominant role in particular in medical and social science. More advanced AI and data science have not been widely applied across disciplines and countries. ## 2 Publication collection Here, we briefly introduce the collection of COVID-19 research publications, their supplementary information, and their publication impact metrics. ### 2.1 COVID-19 open research data acquisition The COVID-19 Open Research Dataset (CORD-19) ([**Wang’20**]) was downloaded from SemanticScholar 12. The latest CORD-19 data was collected up to 9th March 2022. CORD-19 is a growing resource of scientific publications on COVID-19 and its related historical research on coronavirus13. ### 2.2 Supplemental information collection As the author’s country, affiliation, and the publication date of each publication in SemanticScholar are missing in the CORD-19 dataset, we crawled this additional information from the websites below, cross-checked and refined them for each publication: * Clarivate’s Web of Science14 * ResearchGate15 * The US National Library of Medicine 16, called PubMed or PMC source below. * The World Health Organization’s COVID-19 information source17 * Crossref18 * medRxiv19 * arXiv20 The Python source codes for crawling the above data are in the folders crawler/WOS\_crawler.py, WHO\_crawler.py, RG\_crawler.py, PMD\_keyword\_crawler.py, medrxiv\_crawler.py, and cross-ref\_crawler\_by_id.py, respectively. ### 2.3 Collection of publication impact metrics For each publication, we collected the research impact metrics of their publication venue (journal or conference), including H5-index, Impact Factor, CiteScore, SNIP, and SJR for those available from the following repositories: * H5-index: each journal or conference publication’s H5-index was crawled from the Google Scholar website21, the crawl date is 8th March 2022. * Impact Factor: was obtained from the Web of Science’s Journal Citation Reports22, the data was lastly updated on 1st July 2021 (i.e., the 2020 journal impact factor). * CiteScore, SNIP and SJR: were obtained from the Scopus website23, updated to October 2021. The collected publication impact data of each publication in this COVID-19 global scientist response dataset can be found in Kaggle file at [https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response](https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response). ## 3 Publication processing Here, we briefly introduce the processing of the collected publications. The processing consists of preprocessing the collected publications, extracting author’s country information, extracting publication date, extracting COVID-19 problem keywords, extracting COVID-19 modeling technique keywords, categorizing the disciplines of each publication, and extracting those publications focusing on modeling COVID-19. The keywords describing each publication were extracted from the title and abstract of each publication. We categorize keywords into two groups: one on COVID-19 problems or topics of interest from a domain perspective (e.g., epidemiological, medical and social perspective, called *problem keywords* in this report), and the other on the techniques for modeling COVID-19 problems (we call them *modeling keywords* in this report). The problem keywords and modeling keywords were extracted by different approaches, explained in Sections 3.4 and 3.5, respectively. ### 3.1 Preprocessing publication data In the CORD-19 dataset, some publications may be somehow related to COVID-19 but were published before 2020, we thus removed them. In addition, we focus on the publications in English and thus removed those non-English publications collected from the CORD-19 dataset. Finally, as the CORD-19 publications were collected from multiple sources, there are duplicated ones, which were removed as well. The source code for the data preprocessing of the global publications can be found in data\_processing/pre\_process.py. ### 3.2 Extracting country information Here, we collect the author’s country information related to each publication if it is available. The author’s country information of each publication (if available) was extracted from the data crawled from the Web of Science, ResearchGate, the US National Library of Medicine, WHO, Crossref, and medRxiv. However, the country information extracted from these websites is inconsistent and has quality issues. For example, some publications only have affiliations or city information, which have to be transformed to obtain their country information. Hence, an institution-country mapping file is created to map an institution to its country. The source code for processing author’s country information can be found in the file: *data processing/country_process*.*py*. The extracted country information of each publication in this COVID-19 global scientist response dataset can be found in Kaggle file at [https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response](https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response). ### 3.3 Extracting publication date The publishing date of a publication was extracted from the data crawled from the Web of Science, ResearchGate, the US National Library of Medicine, WHO, Crossref, and medRxiv. All dates were transformed to the format of ‘yyyy-mm-dd’. The source code for extracting and processing publication date can be found in the file: *data processing/publish date process*.*py*. The extracted publication date of each publica-tion in this COVID-19 global scientist response dataset can be found in Kaggle file at [https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response](https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response). ### 3.4 Extracting problem keywords Here, we collect the keywords describing the main COVID-19 problems and topics addressed in all publications. We call them COVID-19 “problem keywords.” We first collected the candidate COVID-19 problems-related keywords from three sources: * **Keywords provided by publications** In the Web of Science (WOS) and PubMed (PMD) websites, some publications have keywords provided by authors or other tools (e.g., publishers). We crawled these keywords as well as each publication’s supplementary information (including author’s affiliation, country, and publication date). The keywords are regarded as the candidate problem keywords. The related source code for collecting these keywords can be found in crawler/PMD\_keyword\_crawler.py, WOS_crawler.py. * **Keywords extracted by PositionRank** COVID-19 problems-related words were further extracted from the titles and abstracts of all publications. We used PositionRank 24 ([**Corina’17**]) to conduct the text segmentation of titles and abstracts. The position rank captures both the positions of words in the title and abstract of each publication and the word occurrence frequencies in the title and abstract. Then, the PositionRank is calculated by the position-biased PageRank, which captures the co-occurrence word’s position information. As the topic of a publication usually occurs in the very beginning of its title/abstract, PositionRank is thus suitable to extract problem-related keywords. We selected top-5 keywords as the candidate problem keywords for each publication. The source code can be in /keyword\_extraction/keyword\_position\_rank/main\_process.py. * **Domain-driven keywords** In some publications, no specific keywords were provided, where their synonyms and other expressions were used instead. For example, for a publication with the title “A novel model to predict severe COVID-19 and mortality using an artificial intelligence algorithm,” we know it is about “mortality prediction’ although the phrase “mortality prediction” does not appear in the title. To find out such specific topics of those publications, we used a domain-driven method to work out their relevant topics. First, we created a mapping list of expressions/patterns to a specific topic, for example, two lists for the problem-related keyword “mortality prediction” as follows: *{forecasting, forecast, forecasts, predict, prediction, estimate, estimation, time series, modeling, modelling*, and *model}* and *{death rate, mortality rate, fatality rate, death case, death, fatality*, and *mortality}*, If a publication’s title and abstract contain one of the keywords in both lists, then the phrase “mortality prediction” would be added as the domain-driven problem keyword to this publication. The source code can be found in data\_processing/domain\_process.py. After we obtained the keywords generated by the above three approaches, all keywords for each publication were combined to form this publication’s candidate problem keywords. The file containing the problem keywords of each publication in this COVID-19 global scientist response dataset can be found in Kaggle file at [https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response](https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response). ### 3.5 Extracting modeling keywords Here, we explain how the modeling-specific keywords of all publications were generated in this project. First, we extracted the keywords and phrases from the title and abstract of each extracted publication. However, those extracted terms are not all related to modeling. To select modeling-specific terms, we calculate the cosine similarity between each term and the embedding (mean value) of all predefined modeling keywords in Appendix 10.2. We then sorted the similarity scores of all terms in the descending order. We further manually selected those terms with high scores. Below, we introduce them in detail. * We used Apache OpenNLP to extract keyword and key phrases from the titles and abstracts of all publications. To narrow down the search scope, we only chose those terms with their length no greater than three. Those extracted terms are regarded as the basic keywords of each publication. * To obtain the embedding of a keyword, we first collected a list of predefined modeling-related keywords per domain knowledge and prior information, such as machine learning, deep learning, statistical model, compartmental model, epidemic modeling, and other modeling techniques and methods. These predefined modeling keywords can be found in Appendix 10.2: List of predefined modeling keywords. Then we applied SciBert ([**Beltagy’19**]) to generate the embeddings of these predefined keywords. SciBert25 is a pretrained language model based on BERT, which was trained on a large corpus of scientific publications. We then obtain the embedding of each keyword by SciBert. After that, an average embedding value was calculated upon the embeddings of all keywords, which is called *predefined modeling keyword embedding*. * For each term extracted from a publication, we further generated its embedding by SciBert, forming the *extracted keyword embedding*. * Then, for each extracted keyword embedding from its publication, we applied the cosine similarity function to calculate the similarity between this embedding and the predefined modeling keyword embedding. We further sorted the extracted keywords based on this similarity score. Those extracted keywords with high similarity score are chosen as the modeling keywords of each publication. The source code can be found in “data\_processig\keyword\_process.py.” The file collecting the modeling technical keywords of each publication in this COVID-19 global scientist response dataset can be found in Kaggle file at [https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response](https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response). ### 3.6 Extracting disciplinary categorization Here, in all publications, we extract those publications associated with three major disciplines: medical science, computer science, and social science. Here, *medical science* refers to anything related to medicine, medical science, health science, and epidemiology, etc. *Computer science* groups any computing-related areas, including applied mathematics, IT, computing, and informatics. *Social science* publications were collected from SSRN. The definition of each discipline is based on the categorization collected from three websites: Web of Science, SJR26, and [Research.com](http://Research.com)27. This disciplinary categorization can be found in Appendix 10.1: List of disciplinary categorization. In each category, there are a list of journals (and conferences). Based on this categorization, each publication is assigned to its related disciplines. As a journal may involve two disciplines, some publications may thus appear in two disciplines. As the disciplinary sources are from different websites, below, we introduce how we categorized a publication per the above different sources. * **Web of Science**: we crawled a publication’s disciplinary category from the WOS website, based on Appendix 10.1: List of disciplinary categorization. We added the publication to its related disciplinary categories. For example, if a publication’s category falls in Cell Biology based on the categorization, we then add this publication to the discipline of medical science. * **SJR**: In the SJR website, a journal (or conference) is labeled by one to multiple categories. Each category has a collection of journals and conferences. We crawled the sources (i.e., the names of journals and conferences) of publications from Research-Gate, the US National Library of Medicine, WHO, Crossref, and medRxiv. According to these sources of each publication, we added the publication to its categorized disciplines. * **[Research.com](http://Research.com)**: previously also called Guide2Research, was a research portal initially for computer science and is now for broad disciplines. It lists a large number of computer science sources (journals and conferences). Based on a publication’s sources obtained from ResearchGate, the US National Library of Medicine, WHO, Crossref and medRxiv, we added the publication to its mostly relevant disciplines. Based on the disciplinary files generated per the above three sources, we combined the data from these three sources to generate the unified disciplinary files of publications. The source code can be found in “data\_processing\domain\_process.py.” The generated disciplinary information of each publication in this COVID-19 global scientist response dataset can be found in Kaggle file at [https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response](https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response). ### 3.7 Extracting modeling publications Here, we extracted those modeling-specific publications from the entire publication set. The modeling publications are those containing at least one of the modeling keywords and expressions. The file “Modeling related words and expressions.txt” collected the modeling words and expressions. ### 3.8 Extracting global GDP and population data Here, we collected the Gross Domestic Product (GDP) and population data of each country globally from the World Population Review28. Accordingly, we calculate the GDP per Capita for each country, as shown in Section 4.3. ### 3.9 Extracting COVID-19 infection cases Here, we extract the number of infected cases and the number of deaths in each country. The case numbers of all countries were collected from Our World in Data29. This case data will be used to analyze the correlation between COVID-19 publications and the case numbers for each country or region. ## 4 Measurement Here, we introduce the measures and metrics applied for analyzing or evaluating the COVID-19 publications. They include publication impact metrics, composite indicator, and correlation coefficient. ### 4.1 Publication impact metrics To quantify the quality and impact of each publication and the publications associated with a target variable, e.g., a country, we consider the following publication impact metrics: H5-index, Impact Factor, CiteScore, SNIP, and SJR. We consider their original values, mean values, and normalized values, respectively. #### Original impact metrics Here, we introduce the five original impact metrics. * H5-index: created by Google, “is the largest H-index for articles published in a publication journal or conference in the last 5 complete years, which had at least h citations for each article”, available at Google Scholar30. * Impact Factor: i.e., journal impact factor (JIF or IF), created by the Web of Science announced in its annual Journal Citation Reports (JCR), “is calculated by dividing the number of current year citations to the source items published in that journal during the previous two years”31, available at annual JCR reports32. * CiteScore: created by Elsevier, “calculates the citations from all documents in year one to all documents indexed on Scopus published in the prior three years for a publication”33, available from the Scopus Journal Metrics34. * Source Normalized Impact per Paper: i.e, SNIP, “measures the contextual citation impact by weighting citations based on the total number of citations in a subject field”35, available from the Scopus Journal Metrics36. * SCImago Journal Rank: i.e., SJR, “is based on the concept of a transfer of prestige between journals via their citation links. Drawing on a similar approach to the Google PageRank algorithm - which assumes that important websites are linked to from other important websites - SJR weights each incoming citation to a journal by the SJR of the citing journal, with a citation from a high-SJR source counting for more than a citation from a low-SJR source”37, available from the Scopus Journal Metrics38. #### Mean publication impact metrics We further calculate the *mean publication impact metrics* of each publication in terms of the five original impact metrics: H5-index, Impact Factor, CiteScore, SNIP, and SJR, respectively. The *mean publication impact metrics* measures the publication-averaged impact metric of all publications in terms of a target variable, e.g., a discipline, or a country. For example, Section 5.1.5 reports the mean publication impact in three major disciplines, respectively. Sections 6.5 and 6.6 report the mean publication impact of G20 and OECD countries and regions, respectively. #### Normalized publication impact metrics To better compare the relative publication impact of all publications in terms of an impact metric, we also generate its relative publication impact metric by normalizing the values of publications. As the impact values from different countries are usually nonlinear and highly skewed, we take the following log normalization function scaled to the maximum log normalization for better comparison and visualization effects, which generates *log-normalized impact metrics*. For example, to log-scale H5-index (H5) to obtain its log-normalized relative H5-index ![Graphic][1] of each publication *i*, ![Formula][2] where *max* is to extract the maximum log value of H5-index over all publications. Further, to obtain the log-scaled H5-index for a country, we sum its values of all publications from that country to obtain its cumulative log-scaled H5-index. This result is then applied to compare the publication impact between countries, as shown in Section 5.10. Similarly, we log normalize the Impact Factor, CiteScore, SNIP, SJR, and CI (to-be-introduced in Section 4.2) values of all publications to obtain their relative log-normalized impact metrics. In addition, to analyze the overall impact of all publications (e.g., of a country or discipline), we further introduce the *log-normalized cumulative impact metrics*. For example, to calculate the log-normalized cumulative H5-index (*sum*(*H*5*i*)) of a country *i* as follows: ![Formula][3] These log-normalized cumulative impact metrics are applied to analyze the publication impact in Sections 6.1, 6.2, 6.3, and 6.4. We also introduce the *log-normalized publication number* as follows for a target variable, e.g., a country or discipline. ![Formula][4] Here, *N**i* refers to the number of publications of the target variable, e.g., country *i*. This log-normalized cumulative publication number is applied in Sections 6.1, 6.2, 6.3, and 6.4 to analyze the relative impact of G20 and OECD countries on the COVID-19 research. ### 4.2 Composite indicator (CI) Here, we propose three collective impact metrics to measure the overall impact of each publication, the mean collective impact of a publication, and the cumulative impact of all publications selected per a criterion (e.g., a country or a discipline), respectively. They are called *publication composite indicator, cumulative composite indicator*, and *mean composite indicator*. #### Publication composite indicator In our work, we create the *Publication Composite Indicator* (CI, *C**i*, which is also generally called composite indicator for simplicity), which is defined below, as one of the research impact metrics to measure the collective impact of a publication in terms of five vendor-created individual major impact metrics: H5-index (H5), Impact Factor (IF), CiteScore (CS), SNIP (SN), and SJR (SJR). ![Formula][5] where *i* refers to each publication, *max* is the operation of extracting the maximum value. The H5-index data was obtained from Google Scholar, impact factor values were from the Web of Science, the CiteScore, SNIP and SJR were obtained from the Scopus sources. #### Cumulative composite indicator Given a grouping criterion such as a country or a discipline, all of their publications can be selected. After calculating the composite indicator of each of their publications, we can then sum the publication composite indicator to obtain the cumulative composite indicator *𝒞**i*. ![Formula][6] Here, *i* refers to a given target, e.g., a country *i, N**i* refers to the number of total publications with the information about composite indicator available from the target variable, *C**j* is the composite indicator of each publication *j* of *N**i*. The cumulative composite indicator is applied in analyzing the global publication impact such as in Section 5.4 (see Figure 4), and analyzing the correlation between publication composite impact and GDP for a country as in Section 8.1 (see Figures 32 and 33). ![Figure 4:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F4.medium.gif) [Figure 4:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F4) Figure 4: Global publication impact distribution per cumulative Composite Indicator (CI) #### Mean composite indicator With the cumulative composite indicator *𝒞**i* and the number of total publications *N**i* from a target variable e.g., a country *i* with the information for calculating the composite indicator, we can then calculate the *mean composite indicator* ![Graphic][7] as follows: ![Formula][8] Examples of using the mean composite indicator are the analysis of the top-10 published country’s research impact in Section 5.10 (as in Figure 10) and the top-10 published country’s disciplinary research impact in Section 5.11 (as in Figure 11), and the analysis of the G20 country’s mean publication impact in Section 6.5 (see Figure 19). ### 4.3 GDP per capita Here, we calculate the Gross Domestic Product (GDP) per Capita (GDPPC, ![Graphic][9]) as follows, which measures a country’s economic output per person. ![Formula][10] Here, *G**i* is the gross domestic product of country or region *i*, and *P**i* is the number of total population of that country or region. *G**i* and *P**i* are collected per the tool introduced in Section 3.8. Examples of using GDPPC are the analysis of the correlation between COVID-19 publications and the GDP per capita of of all countries in Section 8.1 (see Figures 32 and 33) and the correlation between publications and GDP per capita in G20 countries and regions in Section 8.2 (see Table 28). ### 4.4 Publication-GDP correlation coefficient We further calculate the correlation coefficient between a country’s publication metrics (e.g., publication number, and CI) and their economic metrics (e.g., GDP per capita) or COVID-19 infection metrics (e.g., infected case number, death number) of the country. Specifically, the *publication correlation coefficient* is defined for a first-authored country in terms of their total publication number and the cumulative composite indicator of the publications with another variable such as GDP per capita in that country. Further, three publication-GDP coefficients are calculated to measure the correlations between the publication number (N) and the GDP per capita (G) and between the composite indicator (CI) and the GDP per capita (G) for each country, respectively. First, the *absolute similarity* between a publication metric and the GDP per capita of a country is measured by *absolute publication-GDP coefficient* (*Ab_corr*) *ρ*. Accordingly, we define the *absolute publication number-GDP coefficient ρ**n* below: ![Formula][11] where *N**i* refers to the publication number of country *i*, and ![Graphic][12] refers to the GDP per capita of the country. Similarly, we define the *absolute publication composite impact-GDP coefficient* (*Ab_corr*) *ρ**c* as follows: ![Formula][13] where *𝒞**i* refers to the cumulative composite indicator value of all publications from country *i*, and ![Graphic][14] refers to the GDP per capita of the country. Both *ρ**n* and *ρ**c* fall in the value range [*−*1, 1] symmetric about 0 with no correlation. The magnitude of values reflect more positive or negative correlation between the comparison objects, i.e., more strongly co-move in the same or different direction. Second, the maximum-normalized coefficient between the publication metrics and GDP per capita is measured by the *relative publication-GDP coefficient* (*Re_corr*) ![Graphic][15], which normalizes the publications and GDP per their respective maximum values. Accordingly, we define the *relative publication number-GDP coefficient* ![Graphic][16] below: ![Formula][17] where *N**i* refers to the publication number of country *i*, ![Graphic][18] refers to the GDP per capital of the country, *Max*(*N**i*) refers to the maximum value of publication number over all countries, *Max* ![Graphic][19] refers to the maximum value of GDP per capita over all countries. Similarly, we define the *relative publication composite impact-GDP coefficient* (*Re_corr*) ![Graphic][20] as follows: ![Formula][21] where *𝒞**i* refers to the cumulative composite indicator of country *i*, ![Graphic][22] refers to the GDP per capita of the country, *Max*(*𝒞**i*) refers to the maximum value of composite indicator over all countries, *Max* ![Graphic][23] refers to the maximum value of GDP per capita over all countries. Both ![Graphic][24] and ![Graphic][25] fall in the value range [*−*1, 1] symmetric about 0 with no correlation. The magnitude of values reflect more positive or negative correlation between the comparison objects, i.e., more strongly co-move in the same or different direction. Third, we further measure the *dispersion-deviation coefficient De_corr* between two comparison objects in terms of their dispersion/deviation from the mean of the cohort. Accordingly, we define the *publication-GDP dispersion coefficient* ![Graphic][26] between publication number *N**i* and GDP per capita *G**i* as follows: ![Formula][27] where ![Graphic][28] refers to the mean of publication number of all *J* countries and regions; ![Graphic][29] refers to the mean of GDP per capita of all countries and regions; and *N**j* and *G**j* refer to the publication number and GDP per capita of country/region *j*, respectively. Similarly, we define the *dispersion coefficient between publication composite impact and GDP* ![Graphic][30] as follows: ![Formula][31] where ![Graphic][32] refers to the mean of publication number of all *J* countries and regions; ![Graphic][33] refers to the mean of composite indicator of all countries and regions; and *N**j* and *C**j* refer to the publication number and composite indicator of country/region *j*, respectively. Both ![Graphic][34] and ![Graphic][35] fall in the value range [*−∞*, +*∞*] symmetric about 0 with no correlation. The magnitude of values reflect more positive or negative correlation between the comparison objects, i.e., more strongly co-move in the same or different direction. ### 4.5 Publication-COVID-19 infection correlation coefficient Here, we introduce measurement to quantify the correlation between the COVID-19 publications and the COVID-19 infections. Since the infections and deaths are very high for many countries, the value ranges of publications and infections and deaths are very divided, and their values across countries and regions are also highly divided and incomparable, it would be difficult to directly analyze their correlation on the original values. It is thus unsuitable to follow the same approach as in Section 4.4 (i.e., Equations (8) to (11)) to measure the publication-infection correlations. Instead, we natural-log-normalize the number of publications and the number of infections and then apply them to Equation (8) to measure the correlation between COVID-19 publications and infections and deaths respectively. We thus obtain the *log-normalized publication-infection coefficient ρ**i*: ![Formula][36] where ln *N**i* is the log-normalized value of publications of a target variable *i* (e.g., for a country, a month, or a discipline), ln *I**i* refers to the log-normalized value of the infections of a target variable *i*. Accordingly, we also obtain the *log-normalized publication-death coefficient ρ**d*: ![Formula][37] where ln *N**i* is the log-normalized value of publications of a target variable *i* (e.g., for a country, a month, or a discipline), ln *D**i* refers to the log-normalized value of the deaths of a target variable *i*. Lastly, we further measure the *dispersion-deviation coefficient De_corr* between publications and infections/deaths for each country/region in terms of their dispersion/deviation from the mean of the cohort. Accordingly, we define the *log-normalized publication-infection dispersion coefficient* ![Graphic][38] between publication number *N**i* and log-normalized infection number ln *N**i* + 1) as follows: ![Formula][39] where ![Graphic][40] refers to the mean of publication number of all *J* countries and regions; ![Graphic][41] refers to the mean of infections of all countries and regions; and *N**j* and *I**j* refer to the publication number and infections of country/region *j*, respectively. Similarly, the *log-normalized publication-death dispersion coefficient* ![Graphic][42] between publication number *N**i* and log-normalized infection number ln(*D**i* + 1) as follows: ![Formula][43] where ![Graphic][44] refers to the mean of publication number of all *J* countries and regions; ![Graphic][45] refers to the mean of deaths of all countries and regions; and *N**j* and *D**j* refer to the publication number and deaths of country/region *j*, respectively. Both ![Graphic][46] and ![Graphic][47] fall in the value range [*−∞*, +*∞*] symmetric about 0 with no correlation. The magnitude of values reflect more positive or negative correlation between the comparison objects, i.e., more strongly co-move in the same or different direction. ## 5 Global publication profile Following the review objectives and questions and the first review objective of generating an overview of the global COVID-19 research profile, as discussed in Section 1.1, here we present the results of identified global COVID-19 publication profile. ### 5.1 Global publication overview Here, we present an overview of the global publications on COVID-19, produced by all first-authored countries, including the results of total publications, disciplinary publications, publication statistics with publication dates and author countries, the statistics of publication impact, and the mean publication impact of major disciplines. #### 5.1.1 Total publications Here, we summarize the count of all publications on COVID-19 from all disciplines and by all published countries. The collected publications comprise official publications and preprints. We further extracted those on COVID-19 modeling, called *modeling publications* in this report. Modeling publications are those with modeling keywords, as shown in Appendix 10.1, appearing in their titles and abstracts. Table 1 shows the overall statistics of COVID-19 publications. In total, up to 9 March 2022, there have been 319,847 published references and 26,420 preprints available online, amounting to 346,267 references collected on COVID-19. These publications involve authors from 189 countries. View this table: [Table 1:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T1) Table 1: Overall statistics of publications #### 5.1.2 Disciplinary publications Here, we report the statistics of the disciplinary publications. All publications are segmented into disciplinary areas (subjects) by using the subject categorization listed in the Scimago Journal Country Rank39. There are 233,387 publications in 27 subject areas contributed to the COVID-19 research, which include ‘Agricultural and Biological Sciences’, ‘Arts and Humanities’, ‘Biochemistry, Genetics and Molecular Biology’, ‘Business, Management and Accounting’, ‘Chemical Engineering’, ‘Chemistry’, ‘Computer Science’, ‘Decision Sciences’, ‘Dentistry’, ‘Earth and Planetary Sciences’, ‘Economics, Econometrics and Finance’, ‘Energy’, ‘Engineering’, ‘Environmental Science’, ‘Health Professions’, ‘Immunology and Microbiology’, ‘Materials Science’, ‘Mathematics’, ‘Medicine’, ‘Multidisciplinary’, ‘Neuroscience’, ‘Nursing’, ‘Pharmacology, Toxicology and Pharmaceutics’, ‘Physics and Astronomy’, ‘Psychology’, ‘Social Sciences’ and ‘Veterinary.’ There are 86,460 publications without clear alignment with their subject areas. Figure 1 shows the subject distribution of the COVID-19 publications. According to the pie chart, ‘Medicine’ occupies the largest percentage in all the publications, followed by subjects ‘Biochemistry, Genetics and Molecular Biology’, ‘Social Sciences’ and ‘Immunology and Microbiology.’ Table 2 shows the detailed data of the subject distribution of COVID-19 publications. View this table: [Table 2:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T2) Table 2: Disciplinary distribution of COVID-19 publications Further, we categorize those publications in terms of their relevance to the three major disciplines: computer science, medical science, and social sciences. Table 3 shows the results of all collected publications labelled to these three major disciplines: computer science, medical science, and social sciences. Of all 346,267 publications from all disciplines, there are 13,959 associated with computer science, 186,650 associated with medical science, and 32,128 associated with social science. View this table: [Table 3:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T3) Table 3: Distribution of COVID-19 publications in major disciplines #### 5.1.3 Publications with publication date and first-authored country Here, we report the publications with publication dates and first-authored country information, which were collected per the methods introduced in Sections 3.3 and 3.2. Of 189 countries and regions associated with the tot publications, there are 175 first-authored countries appearing in the publications, which spread all five continents. Of 319,847 published references, there are 272,060 with publication dates. Table 4 shows the number of publications with publication dates and first-author’s country information in those published literature (i.e., excluding those preprints), respectively. View this table: [Table 4:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T4) Table 4: Publications with publication date and first-authored country #### 5.1.4 Statistics of publication impact Here, we present the statistics of the publication impact in terms of major publication impact metrics: H5-index, Impact Factor, CiteScore, SNIP, and SJR. The values of these impact metrics for the collected publications were collected per the introduction in Section 2.3. Over all 346,267 published and preprint publications, 243,597 are with H5-index, 199,284 are with Impact Factor, 250,998 are with CiteScore, 246,640 are with SNIP, and 247,403 are with SJR. There are 50,792 references without any of the impact metrics H5-index, Impact Factor, CiteScore, SNIP, and SJR. Table 5 shows the statistics of these impact metrics of all collected publications with the impact metrics issued on their publication venues. Num_pub refer to the number of publications, max, min, mean and median refer to the statistics of original impact metrics values associated with each publication. View this table: [Table 5:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T5) Table 5: Statistics of publication impact metrics of all publications #### 5.1.5 Mean publication impact in major disciplines Here, we present the mean publication impact of publications from the three main disciplines: computer science, medical science, and social science. The disciplinary publications were collected in Section 5.1.2, their impact metrics were collected by the method introduced in Section 2.3. Here, we only count those references with at least one of the five impact metrics available on their publication venues. Table 6 shows the paper-averaged score of the five publication impact metrics: H5-index, Impact Factor, CiteScore, SNIP, and SJR, in three disciplines: CS - computer science, MS - medical science, and SS - social science. We call the paper-averaged score *mean publication impact*. In the table, *Num_pub* refer to the number of publications, *Mean_H*5 refers to the mean H5-index of each publication, *Mean_IF* refers to the mean Impact Factor of each publication, *Mean_CS* refers to the mean CiteScore of each publication, *Mean_SN* refers to the mean SNIP of each publication, and *Mean_SJR* refers to the mean SJR of each publication. View this table: [Table 6:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T6) Table 6: Mean publication impact score in three disciplines ### 5.2 Global publication distribution Here, we report the distribution of global publications on COVID-19. The *global publication distribution* refers to the publication number distribution of first-authored countries of all publications with country information. The first-authored countries were crawled from the Web of Science, ResearchGate, WHO, Crossref, and the US National Library of Medicine, as explained in Section 3.2. Figure 2 shows the publication distribution of 175 first-authored countries, where the publication of each first-authored country is plotted to the World Map. We further extracted the top-20 first-authored countries in terms of their publication number in Table 7. The top-20 countries are the US, China, Italy, the UK, India, Spain, Canada, Germany, France, Brazil, Iran, Australia, Turkey, Japan, Saudi Arabia, South Korea, Netherlands, Poland, Switzerland, and Pakistan. View this table: [Table 7:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T7) Table 7: Global publication distribution The data of the global publication distribution can be found in the file named “Global publication distribution.xlsx.” The US alone contributed to 49,203 references, amounting to 15.38% of the total published COVID-19 scientific references. China made the second largest contribution, amounting to 21,693 i.e., 6.78%, while EU together contributed to 48,140 publications, amounting to 15.05%. G20 countries and regions contributed to 182,097 publications, amounting to 56.93%. OECD countries and regions contributed to 139,588 publications, amounting to 43.64%. ### 5.3 Word cloud of global publications Here, we present the most frequent keywords extracted from the NLP analysis of all global publications and present them in word cloud40. Figure 3 shows the word cloud of the top-200 words appeared in all publications. The source code for generating the word cloud can be found in the data processing/word cloud.py. The full results can be found in “Word cloud of global publications.xlsx.” The top-20 keywords appearing in the global publications are: ‘pandemic’, ‘patient’, ‘infection’, ‘risk’, ‘treatment’, ‘hospital’, ‘measure’, ‘symptom’, ‘management’, ‘development’, ‘outbreak’, ‘mortality’, ‘countries’, ‘survey’, ‘march’, ‘spread’, ‘participant’, ‘effect’, ‘evidence’ and ‘healthcare.’ ### 5.4 Global publication impact distribution per Composite Indicator Here, we present the global publication impact distribution by world map in terms of the cumulative composite indicator value of each first-authored country. The definition of composite indicator is given in Section 4.2. Only those publications with at least one metric value available from the five metrics: H5-index, Impact Factor, CiteScore, SNIP and SJR are counted here. Accordingly, for each country, at least one of their first-authored publications must have the CI value in order to be included in the world map visualization. Figure 4 shows the sum of each publication’s composite indicator (CI) value of each first-authored country for all countries with first-authored publications. 174 countries are with CI values, while one country (Vanuatu) missing the CI value as there is no CI with some of their publications. The definition of Composite Indicator can be found in Section 4.2. The full results of all countries can be found in “Global research impact distribution per Composite Indicator (CI).xlsx.” Further, the top 20 countries in terms of the sum of their publication’s CI values are listed in Table 8, where the US, China, the UK, Italy and India mark the top 5. View this table: [Table 8:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T8) Table 8: Top-20 countries with the highest cumulative Composite Indicator (CI) ### 5.5 Global publication impact distribution per H5-index Here, we present the distribution of the global publication impact in terms of the cumulative H5-index values of each first-authored country’s publications in world map. Only those publications with H5-index values are counted here. Figure 5 shows the sum of the H5-index values of all publications for each first-authored country over all countries with the H5-index values on their publications. The H5-index values of journal and conference publications were extracted from Google Scholar, as shown in Section 2.3. 173 countries except Guinea-bissau and Vanuatu are included in this result. The full results can be found in “Global research impact distribution per H5-index.xlsx.” ![Figure 5:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F5.medium.gif) [Figure 5:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F5) Figure 5: Global publication impact distribution per cumulative H5-index The top-10 countries in terms of their cumulative H5-index are the US, China, the UK, Italy, India, Germany, Spain, Canada, France and Australia. ### 5.6 Global publication impact distribution per Impact Factor Here, the distribution of the global publication impact in terms of the cumulative impact factor values of all first-authored publications from each country is presented in world map. Only those publications with the impact factor values on their publication venues are counted here. Figure 6 shows the sum of the impact factor values of all publications for each first-authored country over all countries with their data available. The impact factor values of journal publications were obtained from Web of Science, as shown in Section 2.3. There are 173 countries except S. Sudan and Vanuatu displayed in the figure. The full results can be found in “Global research impact distribution per Impact Factor.xlsx.” ![Figure 6:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F6.medium.gif) [Figure 6:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F6) Figure 6: Global publication impact distribution per cumulative Impact Factor The top-10 countries in terms of their cumulative impact factor are the US, China, the UK, Italy, Germany, Canada, France, India, Spain and Australia. ### 5.7 Global publication impact distribution per CiteScore Here, the distribution of the global publication impact in terms of the cumulative CiteScore values of all first-authored publications from each country is presented in world map. Only those publications with the CiteScore values on their publication venues are counted here. Figure 7 presents the sum of the CiteScore values of all publications for each first-authored country over all countries. The CiteScore values of journal and conference publications were obtained from Scopus, as shown in Section 2.3. 174 countries except Vanuatu are displayed here. The full results can be found in “Global research impact distribution per CiteScore.xlsx.” ![Figure 7:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F7.medium.gif) [Figure 7:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F7) Figure 7: Global publication impact distribution per cumulative CiteScore The top-10 countries in terms of their cumulative CiteScore are the US, China, the UK, Italy, India, Germany, France, Canada, Spain and Australia. ### 5.8 Global publication impact distribution per SNIP Here, the distribution of the global publication impact in terms of the cumulative SNIP values of all first-authored publications from each country is presented in world map. Only those publications with the SNIP values on their publication venues are counted here. Figure 8 presents the sum of the SNIP values of all publications for each first-authored country over all countries with data available. The SNIP values of journal and conference publications were obtained from Scopus, as shown in Section 2.3. 174 countries except Vanuatu are displayed in the figure. The full results can be found in “Global publication impact distribution per SNIP.xlsx.” ![Figure 8:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F8.medium.gif) [Figure 8:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F8) Figure 8: Global publication impact distribution per cumulative SNIP The top-10 countries in terms of their cumulative SNIP are the US, China, the UK, Italy, India, Canada, Germany, Spain, France and Australia. ### 5.9 Global publication impact distribution per SJR Here, the distribution of the global publication impact in terms of the cumulative SJR values of all first-authored publications from each country is presented in world map. Only those publications with the SJR values on their publication venues are counted here. Figure 9 displays the sum of the SJR values of all publications for each first-authored country over all countries in world map. The SJR values of journal and conference publications were obtained from Scopus, as shown in Section 2.3. 173 countries except Andorra and Vanuatu are displayed in this result. The full results can be found in “Global publication impact distribution per SJR.xlsx.” ![Figure 9:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F9.medium.gif) [Figure 9:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F9) Figure 9: Global publication impact distribution per cumulative SJR The top-10 countries in terms of their cumulative SJR are the US, China, the UK, Italy, Germany, India, France, Canada, Spain and Australia. ### 5.10 Top-10 published country’s research impact Here, we present the research impact of the top-10 first-authored countries with the highest number of publications. The publication impact metrics are H5-index, Impact Factor, CiteScore, SNIP, SJR, and CI. We calculate their cumulative values for each country. To make the impact metrics of publications comparable between countries, we normalize each impact metric and then sum its values of all publications from each first-authored country as the overall publication impact of this country per the impact metric. The log normalization of these impact metrics is introduced in Section 4.1. Figure 10 shows the top-10 mostly published country’s research impact. There are two Y axes, the right Y axis with line graphs stands for the publication-averaged mean of the log-normalized impact metrics of the top-10 countries, the log-normalized impact metrics of each impact metric is calculated per Equation (1). The left Y axis refers to the cumulative log-normalized impact metric of each country, presented in six colored bar charts for the sum of log-normalization of H5-index, Impact Factor, CiteScore, SNIP, SJR and CI, respectively. ![Figure 10:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F10.medium.gif) [Figure 10:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F10) Figure 10: Top-10 mostly published country’s research impact The full details can be found in “Global research impact.xlsx.” The top-10 mostly published countries are the US, China, Italy, the UK, India, Spain, Canada, Germany, France, and Brazil. In terms of the cumulative (sum of) H5-index, the US, China, the UK, Italy and India rank the top-5 respectively; the US, China, the UK, Italy and Germany rank the top-5 in terms of sum and mean of Impact Factor; the US, China, the UK, Italy and India rank top-5 in terms of sum and mean of CiteScore; the US, China, the UK, Italy and India rank top-5 in terms of sum and mean of SNIP; and the US, China, the UK, Italy and Germany rank top-5 in terms of sum and mean of SJR. With regard to the cumulative composite indicator (CI), the US, China, the UK, Italy and India rank top 5, in contrast, the UK, France, China, the US and Germany rank top 5 in terms of mean CI. ### 5.11 Top-10 published country’s disciplinary research impact Similar to the presentation in Section 5.10, here, we present the results of the top-10 published country’s publication impact in terms of three major disciplines: computer science, social science, and medical science. The visualization settings are the same as that in Section 5.10. Figure 11 shows the publication research impact of Top-10 countries in terms of their disciplinary publications on COVID-19. The full details can be found in “Global disciplinary research impact.xlsx.” ![Figure 11:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F11.medium.gif) [Figure 11:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F11) Figure 11: Top-10 mostly published country’s disciplinary research impact ### 5.12 Global research collaborations Here, we discuss the research collaborations between countries on COVID-19. Centered by the first-authored country of each publication, each two-author pairwise collaboration shows a two-author pair from different countries. We thus obtain the country-collaboration pair of N countries in the form of 1-2, 1-3, …, 1-N, if a publication has authors from N different countries. The country of the first author’s affiliation is regarded as the publication’s host country, which must be available from the publication data. In the global publications, there are 189 countries formed research collaborations on COVID-19. Figures 12, 13 and 14 show some snapshots of such research collaborations with collaboration times greater than 50, 200, and 500, respectively. Here, “greater than N” (N = 50, 200, 500) refers to those jointly authored publications with at least N co-author pairs from different countries, i.e., N collaborative publications. ![Figure 12:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F12.medium.gif) [Figure 12:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F12) Figure 12: Cross-country COVID-19 research collaboration with joint publications >= 50 (the networking graph shows 67 countries have collaborative publications >= 50.) ![Figure 13:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F13.medium.gif) [Figure 13:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F13) Figure 13: Cross-country COVID-19 research collaborations >= 200 (the networking graph shows 35 countries have collaboration publications >= 200.) ![Figure 14:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F14.medium.gif) [Figure 14:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F14) Figure 14: Cross-country COVID-19 research collaborations >= 500 (the networking graph shows 17 countries have collaboration publications >= 500. In the network graph, different colors stand for countries from different continents. We only consider the pairwise collaborations between the first author and another author from a different country in a publication. Coauthoring from the same country is ignored, we also ignore the pairwise collaborations between non-first authors, such as the second-third author pair, the third-fourth author pair even if they are from different countries. The edge thickness represents the joint publication number (i.e., collaboration times) between countries. The node size stands for a country’s collaborative publication number. The research collaboration results show that the US, China, the UK, Italy and India are the top-5 mostly collaborative countries on COVID-19. The US appears to be the center point of the global COVID-19 research collaboration network among their top-5 collaboration countries. Of all publications, the US had 3,356 collaborations with China, 2,570 with UK, 1,788 with Italy and 1,397 with India, respectively. The results can be found in sheet named ‘Collab. without author’s order’ in the file ‘Global research collaborations.xlsx.’ Similarly, of the 1397 publications with UK as the first authored country, 1200 collaborations with the Canada, 1,044 with China and 718 with Italy, as the US’ top-5 collaborative countries. Of all publications, China had 2,312 collaborations with the US, 868 with UK, 538 with Australia and 368 with Canada, respectively, as China’s top-5 research collaborators on COVID-19. The results can be found in the sheet named ‘Collab. with author’s order’ of the file ‘Global research collaborations.xlsx.’ ## 6 G20 and OECD publication profile G20 countries and regions41 contributed 182,097 publications to studying the COVID-19 pandemic in total. In contrast, 38 OECD countries and regions42 contributed 139,588 publications. Here, we analyze their publication impact in terms of all of their publications, disciplinary publications, and publication-averaged impact, respectively. ### 6.1 G20 country’s overall publication impact Here, we analyze the G20 country’s COVID-19 research quality in terms of the five impact metrics of all of their publications in each country. We collected the publications of each G20 country, and then calculated their impact metrics for each country. Figure 15 shows the log-normalized cumulative values of H5-index, Impact Factor, and CiteScore of each country in terms of three dimensions. The X axis stands for the logarithm of a country *i*’s sum of CiteScore of all of their publications *N**i*, which is calculated by *ln*(*sum*(*CiteScore**i* +1)). Similarly, Y axis stands for the logarithm of a country’s sum of Impact Factor values of all publications, which is represented by *ln*(*sum*(*Impactfactor**i* + 1)). Z axis stands for the logarithm of a country’s sum of H5-index values of all publications, which is represented by *ln*(*sum*(*H*5 *− index**i* + 1)). The ball size stands for the logarithm value of a country’s publication number *N**i*, which is represented by *ln*(*sum*(*N**i* + 1)). The data can be found in “G20 country’s overall research impact.xlsx.” ![Figure 15:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F15.medium.gif) [Figure 15:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F15) Figure 15: G20 country’s overall COVID-19 research impact The results show that the US, EU, China, Italy and the UK made the top-5 impact in the COVID-19 research. The top-5 countries in terms of H5-index are the US, EU, China, the UK, and Italy. The top-5 countries in terms of Impact Factor are the US, EU, China, the UK and Italy. The US, EU, China, the UK and Italy rank the top-5 countries in terms of CiteScore. ### 6.2 G20 country’s publication impact in computer science Here, we analyze the research impact of G20 country’s publications in computer science. Taking the same analysis approach in Section 6.1, we collected those computer science-related publications for each country and then calculated their log-normalized cumulative impact metrics. Figure 16 shows the results of the G20 country’s COVID-19 research impact in terms of the three impact metrics: CiteScore, Impact Factor, and H5-index in the discipline of computer science. The meaning of each axis is the same as in Section 6.1. The data can be found in “G20 country’s research impact in computer science.xlsx.” ![Figure 16:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F16.medium.gif) [Figure 16:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F16) Figure 16: G20 country’s research impact of computer science publications Table 9 further lists the details of the data shown in Figure 16. *Num_pub* refers to the number of publications in each country. View this table: [Table 9:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T9) Table 9: G20 country’s research impact of computer science publications In computer science, the top-5 countries in terms of H5-index are EU, the US, China, India and Italy. The top-5 countries in terms of Impact Factor are EU, China, the US, India and Italy. EU, China, the US, India and Italy rank the top-5 countries in terms of CiteScore. ### 6.3 G20 country’s publication impact in medical science Similar to the analysis undertaken in Section 6.2, here, we analyze the research impact of each G20 country’s publications in medical science. Figure 17 shows the statistics of the G20 country’s COVID-19 research quality in terms of CiteScore, Impact Factor, and H5-index of their publications in the category of medical science. The meaning of each axis is the same as in Figure 15. Table 10 further lists the data details of Figure 17. The data can be found in “G20 country’s research impact in medical science.xlsx.” ![Figure 17:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F17.medium.gif) [Figure 17:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F17) Figure 17: G20 country’s research impact of medical science publications View this table: [Table 10:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T10) Table 10: G20 country’s research impact of medical science publications In medical science, the top-5 countries in terms of H5-index are the US, the EU, China, the UK and Italy. The top-5 countries in terms of Impact Factor are the US, the EU, the UK, China and Italy. The US, the EU, China, the UK and Italy rank the top-5 countries in terms of CiteScore. ### 6.4 G20 country’s publication impact in social science Similar to the analysis undertaken in Section 6.2, here, we analyze the research impact of each G20 country’s publications in social science. Figure 18 shows the statistics of the G20 country’s COVID-19 research quality in terms of CiteScore, Impact Factor, and H5-index of their publications in the category of social science. The meaning of each axis is the same as in Figure 15. Table 11 further lists the data details of Figure 18. The data can be found in “G20 country’s research impact of social science.xlsx.” ![Figure 18:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F18.medium.gif) [Figure 18:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F18) Figure 18: G20 country’s research impact of social science publications View this table: [Table 11:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T11) Table 11: G20 country’s research impact of social science publications In social science, the top-5 countries in terms of H5-index are the US, the EU, China, the UK and Italy. The top-5 countries in terms of Impact Factor are the US, the EU, China, the UK and Italy. The US, the EU, China, the UK and Italy rank the top-5 countries in terms of CiteScore. ### 6.5 G20 country’s mean publication impact Here, we further analyze the publication-averaged mean impact of each G20 country’s total publications in terms of the five major impact metrics: H5-index, Impact Factor, CiteScore, SNIP, and SJR. The mean publication impact of each impact metric is calculated per the method introduced in Section 4.1. Figure 19 shows the statistics of the G20 countries’ research impact per publication in terms of the five impact metrics. We here apply the original impact metrics values of each publication to estimate the mean publication impact. The impact metrics value were crawled from different websites, as introduced in Section 2.3 on the research impact metrics collection. The data can be found in “G20 paper-averaged research impact metrics.xlsx.” ![Figure 19:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F19.medium.gif) [Figure 19:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F19) Figure 19: G20 publication-averaged mean publication impact The results in Figure 19 show both the individual mean impact metrics and their accumulation for each country. Table 12 further shows the details of Figure 19 in terms of the mean impact metrics of the G20 countries globally. View this table: [Table 12:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T12) Table 12: G20 publication-averaged mean publication impact As shown in Figure 19, overall, UK, Germany, France, US and China make the top-5 cumulative mean publication impact of the COVID-19 research. Specifically, as shown in Table 12, US achieves the highest number of publications and the highest mean SJR, UK makes the highest mean Impact Factor, CiteScore and SNIP, Germany makes the highest mean H5-index. ### 6.6 OECD country’s mean publication impact Similar to the analysis of G20 country’s mean publication impact in Section 6.5, here, we analyze the mean publication impact of OECD countries and regions. Table 13 shows the publication-averaged mean values of different impact metrics in the 38 OECD countries and regions. Here, the US achieves the highest number of publications, then Iceland makes the highest mean H5-index, impact factor, CiteScore, SNIP and SJR uniformly. Italy, Switzerland, the UK, the UK, the UK and the US make the second in terms of publication number, H5-index, impact factor, CiteScore, SNIP and SJR, respectively. View this table: [Table 13:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T13) Table 13: OECD country’s publication-averaged mean publication impact ## 7 Publication profile on modeling COVID-19 In this chapter, we analyze the publication profile of those publications on modeling COVID-19. We focus on two perspectives. On one hand, we explore the word cloud of keywords in all modeling publications collected in Section 3.7 with modeling keywords collected in Section 3.5 as well as in three major disciplines and the two mostly published countries the US and China. On the other hand, we extract the major trends of the modeling publications in terms of top-K concerned problems per the COVID-19 problem-related keywords collected in Section 3.4, top-K modeling keywords defined in Section 3.5, top-K published countries, and top-K monthly concerned problems, etc. ### 7.1 Keyword cloud of modeling publications Here, we present the word cloud of all modeling publications. Modeling publications are those publications extracted in Section 3.7 with modeling keywords defined in Section 3.5. There are 43,921 publications on COVID-19 modeling. We count the frequency of each modeling keyword in these publications, and then extract the top-K keywords to draw the word cloud by the word cloud drawing tool. Figure 20 shows the word cloud of the top-200 modeling words appeared in all modeling publications. The results show that the modeling methods are diversified, covering classic mathematical and statistical methods, epidemic modeling methods, simulation, and classic machine learning methods. ![Figure 20:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F20.medium.gif) [Figure 20:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F20) Figure 20: Modeling keyword cloud in modeling publications Further, Table 14 extracts the top-10 modeling keywords mostly appearing in the modeling publications: regression model, machine learning, simulation, linear regression, multivariate statistics, artificial intelligence, logistic regression, statistical model, deep learning, and CNN. They show that classic regression methods dominate the publications in modeling COVID-19, statistical models and machine learning follow that. Deep learning, although emerged only in a rather shorter period than those classic methods, makes a strong presence in modeling COVID-19, in particular, by the most fundamental network CNN. View this table: [Table 14:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T14) Table 14: Top-10 modeling keywords in modeling publications The source code for generating the word cloud can be found in the file: data\_processing/word\_cloud.py. The data can be found in “Modeling keyword cloud in modeling publications.xlsx” ### 7.2 Keyword cloud of modeling publications in computer science Here, we further analyze the word cloud of the keywords in computer science publications on COVID-19. We take the similar method to that in Section 7.1. Figure 21 shows the word cloud of the top-200 modeling words appeared in modeling publications which belong to the computer science category, categorized per the method discussed in Section 3.6 on the disciplinary categorization, with the list of disciplines presented in Appendix 10.1. ![Figure 21:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F21.medium.gif) [Figure 21:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F21) Figure 21: Modeling keyword cloud of modeling publications in computer science The results show that, in contrast to that of all modeling publications in Figure 20, the modeling publications from the computer science discipline mainly applied machine learning, deep learning and AI methods, although mathematical models also appeared highly frequently in the publications. Contrast to the top-10 keywords of all modeling publications in Table 14, here, as shown in Table 15, classic machine learning methods including neural networks and SVM, simulation, and deep learning networks including CNN dominate the publications, although their frequencies are much lower than that in the overall modeling publications. View this table: [Table 15:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T15) Table 15: Top-10 keywords in modeling publications in computer science The source code for generating the word cloud can be found in the file: data\_processing/word\_cloud.py. The data can be found in “Modeling word cloud of modeling publications in computer science.xlsx.” ### 7.3 Keyword cloud of modeling publications in medical science Here, we take the similar method as in Section 7.1 to further analyze the word cloud of the keywords in medical science publications on COVID-19. Figure 22 shows the word cloud of the top-200 modeling keywords appeared in modeling publications which belong to the medical science category. These publications are categorized into medical science per the method introduced in Section 3.6 and the list of disciplinary categorization in Appendix 10.1. Table 16 further lists the top-10 keywords in medical science publications. View this table: [Table 16:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T16) Table 16: Top-10 keywords in modeling publications in medical science ![Figure 22:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F22.medium.gif) [Figure 22:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F22) Figure 22: Modeling word cloud of modeling publications in medical science In contrast to the results presented in Figure 20 and Table 14 for all modeling publications and Figure 21 and Table 15 for modeling publications in computer science, the top-200 keywords shown in Figure 22 and the top-10 keywords in Table 16 largely overlap those in the overall and computer science-based publications. However, publications related to medical science seem to favor classic mathematical and statistical methods than those modern methods such as deep learning and machine learning methods, and their frequencies are also much higher than those in computer science. The source code for generating the word cloud can be found in the file: data\_processing/word\_cloud.py. The data can be found in “Modeling word cloud of modeling publications in medical science.xlsx”. ### 7.4 Keyword cloud of modeling publications in social science Similarly, here we further analyze the word cloud of the keywords in social science publications on COVID-19. Figure 23 shows the keyword cloud of the top-200 modeling words appeared in modeling publications which were published in SSRN, as collected in the method introduced in Section 3.6 and the list of disciplinary categorization in Appendix 10.1. Table 17 further lists their top-10 keywords. ![Figure 23:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F23.medium.gif) [Figure 23:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F23) Figure 23: Keyword cloud of modeling publications in social science View this table: [Table 17:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T17) Table 17: Top-10 keywords of modeling publications in social science In comparison to the results presented in Figure 20 and Table 14 for all modeling publications, Figure 21 and Table 15 for modeling publications in computer science, Figure 22 and Table 16 for medical science, here, social science publications on COVID-19 favor classic mathematical tools such as regression models and structural equation models, although machine learning and AI methods also play a significant role. It is interesting to see big data and blockchain emerge as top concerns in social science publications. The source code for generating the word cloud can be found in the file: data\_processing/word\_cloud.py. The data can be found in “Modeling word cloud of modeling publications in in social science.xlsx.” ### 7.5 Keyword cloud of modeling publications from the US Since the US and China are the top-2 countries with most of the publications on COVID-19, we analyze their mostly favored modeling methods in this section and Section 7.6. We extracted those modeling publications with the first authors from the US for this analysis. Figure 24 shows the word cloud of the top-200 modeling keywords appeared in the modeling publications whose first authors are from the US. Table 18 lists their top-10 modeling keywords. View this table: [Table 18:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T18) Table 18: Top-10 keywords in modeling publications from US ![Figure 24:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F24.medium.gif) [Figure 24:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F24) Figure 24: Keyword cloud of modeling publications from the US In comparison to the results presented in Figure 20 and Table 14 for all modeling publications, publications from the US seem to overwhelmingly favor classic modeling methods from statistical and mathematical families, including regression models, multivariate and Bayesian methods. It is realized that deep learning and CNN, favored in the global modeling communities, did not mark their top-10 roles here in the US publications. The collection and categorization of first-author countries are introduced in Section 3.2. The source code for generating the word cloud can be found in the data\_processing/word\_cloud.py. The data can be found in “Modeling word cloud of modeling publications from the US.xlsx.” ### 7.6 Keyword cloud of modeling publications from China Similar to the analysis in Section 7.5 on the keywords of modeling publications in the US, here, we analyze those mostly applied by the first authors from China. Figure 25 shows the word cloud of the top-200 modeling keywords appeared in modeling publications whose first authors are from China mainland. Table 19 shows the top-10 keywords mostly concerned by Chinese researchers. ![Figure 25:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F25.medium.gif) [Figure 25:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F25) Figure 25: Keyword cloud of modeling publications from China View this table: [Table 19:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T19) Table 19: Top-10 keywords in modeling publications from China In comparison to the results presented in Figure 24 and Table 18 for the US, the results in Figure 25 and Table 19 show major similarity in favoring classic regression and statistical models but also significant priority difference, where Chinese researchers favored structural equation models, deep learning, and epidemic models like the susceptible-exposed-infectious-removed (SEIR) model. It is interesting to see, although both countries appreciated the importance of machine learning and simulation, their priorities (i.e., frequencies appearing in respective modeling publications) are exactly opposite, Chinese researchers like simulation, while the American ones prefer machine learning as their no. 2 favorite tool. It is also interesting to see Chinese drops ‘artificial intelligence’ and ‘Bayesian’ in their publications in the top-10 list, which were favored by Americans instead. The collection and categorization of author countries are introduced in Section 3.2. The source code for generating the word cloud can be found in the data\_processing/word\_cloud.py. The data can be found in “Modeling word cloud of modeling publications from China.xlsx.” ### 7.7 Keyword cloud of modeling publications from EU Similar to the analysis in Section 7.5 on the keywords of modeling publications in the US, here, we analyze those mostly applied by the first authors from the EU. Figure 26 shows the word cloud of the top-200 modeling keywords appeared in modeling publications whose first authors are from China mainland. Table 20 lists their top-10 modeling keywords. ![Figure 26:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F26.medium.gif) [Figure 26:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F26) Figure 26: Keyword cloud of modeling publications from the EU View this table: [Table 20:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T20) Table 20: Top-10 keywords in modeling publications from the EU ### 7.8 Top-50 modeling problems and their publication impact Here, we analyze the mostly concerned COVID-19 problems in modeling publications and their publication impact. The problem-related keywords are selected per the method introduced in Section 3.4 from three resources. We then select the top-50 problem-related keywords, and extract their associated modeling publications. We further calculate their mean publication impact metrics in terms of H5-index, Impact Factor, CiteScore, SNIP, and SJR. The mean publication impact is calculated per the method introduced in Section 4.1. Table 21 shows the mean of publication research impact in terms of metrics H5-index, Impact Factor, CiteScore, SNIP and SJR for the publications associated with the top-50 problem keywords. View this table: [Table 21:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T21) Table 21: Mean publication impact of top-50 modeling problems The top-10 concerned problem keywords in the literature are mental health, second wave, lockdown, vaccine, anxiety, risk factor, vaccination, X-ray, public health, and depression. Further, of the top-50 problem keywords, the keywords can be grouped into the following major aspects and categories: * various issues related to mental health are mostly concerned, including anxiety, depression, stress, distress, sentiment analysis, and sleep, forming the mostly concern issues in the literature (5193). * non-pharmaceutical interventions and policies are widely concerned, including lockdown, social distancing, policy, quarantine, surveillance, physical activity, and isolation. * issues related to public health, vaccine, vaccination, testing, prevention, telehealth, screening, monitoring, and contact tracing, and concerns about children, cancer, diabetes, older adult, sleep, and obesity are widely concerned in the literature. * many literature concerns about risk factor, X-ray, CT, classification, treatment, hospitalization, and prognosis. * many references are on the prediction of spread, cases, mortality, and reproduction number. ### 7.9 Top-50 problems and their modeling methods Here, we analyze the popularly concerned problems in COVID-19 modeling and their modeling methods. We extract the problem keywords per the method introduced in Section 3.4. Those modeling publications associated with the top-50 selected problem keywords are extracted per the method introduced in Section 3.7. The modeling keywords are then extracted from these problems-related publications by the method introduced in Section 3.5. Figure 27 shows the mapping relationship between the top-50 problem keywords and their modeling methods. Each cell in the diagram refers to the interaction between a COVID-19 problem and its related modeling methods. The vertical columns on the top and at the bottom respectively refer to the categories of problems and specific problems of each category. The top-50 specific problems are grouped into seven major categories: epidemic, detection, impact, mental health, NIP, vaccine, and other. The horizontal rows correspond to the families of modeling methods and their specific methods. The families of modeling methods are machine learning, deep learning, statistical modeling, epidemic modeling, and others, labelled on the left of the diagram. Their specific methods are labelled on the left of the diagram respectively. ![Figure 27:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F27.medium.gif) [Figure 27:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F27) Figure 27: Mapping between top-50 problems and their modeling methods The color of each cell measures the percentage is equal to the publications of the business problems and their modeling methods/top-50 business problems-related publications. The color of cell stands for the percentage, dark-to-light colors refer to high-to-low percentages correspondingly. The data can be found in “Mapping between top-50 business problems and their modelling methods.xlsx.” ### 7.10 Top-50 modeling methods and their publication impact Here, we analyze the popular modeling methods applied in modeling COVID-19 and then analyze their publication impact. We extract the modeling keywords per the method introduced in Section 3.5. Then, the publications association with top-K modeling keywords are selected. We further calculate the publication impact of these publications. Table 23 shows the top-50 modeling keywords concerned in modeling publications on COVID-19 and their mean publication impact metrics in terms of H5-index, Impact Factor, CiteScore, SNIP, and SJR. The mean publication impact is calculated per the method introduced in Section 4.1. View this table: [Table 22:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T22) Table 22: Cross-disciplinary/method modeling of COVID-19 View this table: [Table 23:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T23) Table 23: Top-50 modeling methods in modeling publications and their publication impact The top-50 modeling keywords can be grouped into a few major specific modeling techniques: time-series analysis, statistical modeling, shallow machine learning, deep learning, epidemic modeling, and medical analytics. Typical modeling tasks include estimation and forecasting, classification, prediction, sentiment analysis. * time-series analysis, including regression, linear regression, logistic regression, Cox regression, Poisson regression, linear model, ARIMA, multiple linear regression, and multivariate regression. * statistical modeling, including multivariate statistics, statistical model, Bayesian, univariate statistics, and mathematical models such as structural equation model, fore-casting model, and bivariate analysis. * shallow machine learning, including neural networks and ANN, support vector machine, random forest, transfer learning, decision tree, classification model, predictive model, cluster analysis, sentiment analysis, and natural language processing. * deep learning, including deep learning, CNN, LSTM, and ResNet. * epidemic modeling, including epidemiological model, SIR, SEIR, and transmission model. * medical and biomedical modeling, health belief model, and molecular dynamic. Table 22 shows the detailed data of the distributions of cross-disciplinary/method modeling on COVID-19. ### 7.11 Top-10 monthly concerned problems in modeling COVID-19 Here, we analyze the temporal trend of mostly concerned problems in modeling COVID-19. We first extract the modeling keywords per the method introduced in Section 3.4, where we choose the top-10 mostly concerned problems. Then, per the publication date of each publication with the date available, we categorize their publications into each month from Jan 2020 to Mar 2022, respectively. This forms the monthly publications associated with top-10 modeling problems. Figure 28 shows the monthly top-10 problems concerned in modeling publications on COVID-19 over months between Jan 2020 and Mar 2022. The top-10 problems are mental health, second wave, lockdown, vaccine, anxiety, risk factor, vaccination, X-ray, public health, and depression, where mental and public health dominate the publications with significant concern increase over the whole period, the same trend holds for second wave, vaccine and vaccination. In contrast, concerns on X-ray seem to decline over the recent year. Table 24 further shows the number of publications in modeling publications of monthly top-10 problems over the 27 months. ![Figure 28:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F28.medium.gif) [Figure 28:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F28) Figure 28: Top-10 monthly concerned problems in modeling publications View this table: [Table 24:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T24) Table 24: Top-10 monthly concerned problems in modeling publications The visualization tool for generating this result is 3D Waterfall43. The data can be found in “Monthly top-10 business problems in modeling publications.xlsx” ### 7.12 Top-10 problems in top-10 published countries In this section, we analyze the top-10 problems concerned by top-10 mostly published countries on modeling COVID-19. Similar to the approach taken in Section 7.11, we firstly extract the top-10 problem keywords from all extracted problem keywords in modeling publications following the problem keyword extraction method in Section 3.4. On the other hand, we count the number of total publications of each first-authored country per the author country extraction method in Section 3.2 and then extract the top-10 mostly published countries in all modeling publications. We then count the number of publications on each of the top-10 problems by each country. Figure 29 shows the number of publications associated with top-10 problems concerned by each of the top-10 published countries. Table 25 further lists their publication numbers in detail. The mostly concerned top-10 problems are mental health, second wave, lockdown, vaccine, anxiety, risk factor, vaccination, X-ray, public health, and depression. The top-10 mostly published first-authored countries are the US, China, Italy, the UK, India, Spain, Canada, Germany, France, and Brazil. ![Figure 29:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F29.medium.gif) [Figure 29:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F29) Figure 29: Top-10 problems in top-10 published countries in modeling publications View this table: [Table 25:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T25) Table 25: Top-10 problems in top-10 published countries This result shows the difference and similarity of prioritized problems between countries. For example, vaccine/vaccination, mental health/anxiety/depression were on the list of top concerned problems by the American scientists. In China, more priority was placed on understanding the risk factors of COVID-19, with much less intensity on vaccine, although similar interest to the US is on mental health/depression. In contrast, Italy, UK, Spain, and France paid highest attention to lockdown. Interestingly, Indian researchers favored X-ray and then lockdown in their COVID-19 research. Specifically, the US produced 1,433 publications to address the top-10 problems, in contrast to 1,549 by China. The data can be found in “Top-10 business problems in top-10 countries in modeling publications.xlsx.” ### 7.13 Top-10 modeling methods in top-10 published countries Similar to Section 7.12, here, we are concerned about the top-10 modeling methods mostly applied by the top-10 mostly published countries. We firstly extract the modeling publications as in Section 3.7 per the modeling keyword extraction method introduced in Section 3.5. We then extract the top-10 modeling keywords from the extracted modeling keyword list. Further, we extract the top-10 mostly published countries on modeling COVID-19, as shown in Sections 2.2 and 3.2. Figure 30 shows the number of publications associated with the top-10 modeling methods mostly concerned by the top-10 countries with the highest number of modeling publications. Table 26 further lists their publication numbers in detail. The top-10 modeling methods are regression model, machine learning, simulation, linear regression, multivariate statistics, artificial intelligence, logistic regression, statistical model, deep learning, and CNN. The top-10 mostly published first-authored countries are the US, China, Italy, the UK, India, Spain, Canada, Germany, France, and Brazil. ![Figure 30:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F30.medium.gif) [Figure 30:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F30) Figure 30: Top-10 modeling methods in top-10 published countries in modeling publications View this table: [Table 26:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T26) Table 26: Top-10 modeling methods in top-10 published countries in modeling publications The results show some interesting preference difference and similarity across the top-10 countries. Almost all countries favor and heavily rely on regression models in studying COVID-19. Not all of the top-10 modeling methods selected from all of the modeling publications appear in the top-10 of all countries, for example, deep learning including CNNs were mainly applied in India, China, and the US. Both American and Chinese researchers heavily applied regression models in comparison with other techniques. Indians seem to like simulation much more than other countries, although they all made simulation in their mostly applied technique. Classic machine learning and AI methods were widely applied by all countries as a major tool of studying COVID-19, which seem to be particularly favored by Indians and Americans. The full details can be found in “Top-10 modeling methods globally in modeling publications.xlsx.” ### 7.14 Top-10 problems and top-10 modeling methods by top-10 published countries Here, we study the interactions between top-10 modeling methods, top-10 modeled problems, and the top-10 published countries, and measure their publication impact. Taken the modeling keyword extraction method in Section 3.5, problem keyword extraction method in Section 3.4, modeling publication extraction method in Section 3.7, and first-authored country extraction in Section 3.2, we generate the top-10 modeling keywords, top-10 problem keywords, and top-10 published countries respectively from the modeling publications. Figure 31 shows the results of the top-10 modeling methods interacted with top-10 problems concerned by top-10 published countries with the highest number of their modeling publications. The mean of the collective impact metrics CI measures the mean publication impact of each country in terms of integrating H5-index, Impact Factor, CiteScore, SNIP, and SJR. CI is calculated per the method in Section 4.2. There are two pairwise bars for each country, where the left bars refer to the number of publications associated with each of the top-10 problems concerned by each country, the right bars refer to the number of publications corresponding to each of the top-10 modeling methods applied by each country. There are two Y axes in the diagram. The left Y axis refers to two pairwise bars, showing the number of publications corresponding to each of top-10 problems and each of the top-10 modeling methods concerned by each country, we stack the publications over all 10 problems and 10 modeling methods respectively for each country to obtain their total number of publications across all top-10 problems and top-10 modeling methods for each country. The right Y axis refers to the mean CI, showing the publication-averaged composite indicator value of each country on those top-10 problems and top-10 modeling methods jointly. ![Figure 31:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F31.medium.gif) [Figure 31:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F31) Figure 31: Top-10 problems and top-10 modeling methods by top-10 published countries with mean composite indicator The mostly concerned top-10 problems are mental health, second wave, lockdown, vaccine, anxiety, risk factor, vaccination, X-ray, public health, and depression. The top-10 modeling methods are regression model, machine learning, simulation, linear regression, multivariate statistics, artificial intelligence, logistic regression, statistical model, deep learning, and CNN. The top-10 mostly published first-authored countries are the US, China, Italy, the UK, India, Spain, Canada, Germany, France, and Brazil. In addition to the findings discussed in Section 7.12 (e.g., Figure 29) on top-10 problems by top-10 countries, and that in Section 7.13 (e.g., Figure 30) on top-10 modeling methods by top-10 countries, here, we can further find out the significant inconsistency between publication numbers and their mean publication impact in terms of mean CI across countries. The UK did not make the most contributions in terms of the number of modeling publications, while they mean CI is the highest across the top-10 published countries. India unfortunately marks the lowest impact although their overall number of publications ranks the third. France made the second lowest number of publications among the top-10 countries, however, their mean CI ranks the second. There seems to be a strong correlation between the quality and quantity of COVID-19 modeling publications and their economic status, as further investigated in Section 4.4 on the correlation between the COVID-19 publications and GDP per capita. The results for the top-10 problems concerned by all countries is in the file “Top-10 business problems globally in modeling publications.xlsx;” and the results for the top-10 modeling methods applied by all countries is in the file “Top-10 modeling methods globally in modeling publications.xlsx.” ## 8 Correlation between publications, economy and infection This chapter fulfils the fourth aims and objectives discussed in Section 1.1 on understanding the relations between COVID-19 research and a country’s COVID-19 development and their economic status. We thus explore a few questions here, including the correlations between publications and GDP, the correlations between coronavirus infections and resultant deaths and publications globally and in G20 and OECD countries and regions, respectively. ### 8.1 Correlation between global publications and GDP per capita Here, we analyze the correlation between the number of publications and GDP per capita, and between the cumulative composite indicator and GDP per capita for each first-authored country. First, we extract all publications for each first-authored country per the method introduced in Section 3.2. Then, we calculate the cumulative composite indicator value of all publications from each first-authored country per the method in Section 4.2. In addition, we obtain the GDP and population data for each country per the method introduced in Section 3.2 and then calculate the GDP per capita (GDPPC) per the measurement in Section 4.3. Further, we calculate the correlation between the number of publications and the GDPPC for each country per the absolute and relative publication-GDP correlation coefficients, and the absolute and relative publication composite impact-GDP coefficients between the cumulative composite indicator and the GDPPC of a country, respectively, as introduced in Section 4.4. As a result, Figure 32 shows the results of the absolute publication composite impact-GDP coefficient, Figure 33 shows the results of the relative publication composite impact-GDP coefficient, Figure 34 shows the results of the dispersion publication composite impact-GDP coefficient of the global COVID-19 research, visualized by world map. Table 27 further lists the top-10 mostly correlated countries in terms of absolute, relative and dispersion publication composite impact-GDP correlation coefficients, respectively. The results can be found in file ‘Global countries’ publications and GDP per capita.xlsx.’ ![Figure 32:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F32.medium.gif) [Figure 32:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F32) Figure 32: Global absolute publication composite impact-GDP coefficient ![Figure 33:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F33.medium.gif) [Figure 33:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F33) Figure 33: Global relative publication composite impact-GDP coefficient ![Figure 34:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F34.medium.gif) [Figure 34:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F34) Figure 34: Global dispersion publication composite impact-GDP coefficient View this table: [Table 27:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T27) Table 27: Correlation between top 10 countries’ publications and GDP per capita These results show that * Germany, France and Canada show the lowest correlation coefficient between their publication numbers and their GDP per capita and their publication impact in terms of cumulative composite indicator (CI). This shows that these countries have the lowest productivity in terms of COVID-19 publication number and impact in the context of their much greater GDP per capita. * In contrast, India shows the highest correlation coefficient between their publication number and GDP per capita and publication impact, while China ranks the second highest among the top-10 mostly published countries. This means India and China show the highest productivity of producing COVID-19 research outcomes in the context of their lower GDP per capita. ### 8.2 Correlation between G20 publications and GDP per capita Specifically, here, we analyze the correlation between the COVID-19 publications and their GDP per capita for those G20 countries and regions. We take the same approach as in Section 8.1 to obtain the total publications, the cumulative composite indicator of all publications, and the GDP per capita for each G20 country. Table 28 shows the absolute and relative publication-GDP coefficients, and the absolute and relative publication composite impact-GDP coefficients over the G20 countries. The measures and their calculations in the table are introduced in Section 4.4. The full data can be found in “G20 publications and GDP per capita.xlsx.” View this table: [Table 28:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T28) Table 28: Correlation between G20 publications and GDP per capita The results show that * Argentina, South Korea and Japan rank the top-3 with the lowest correlation between their publication number and GDP per capita. This indicates that these countries have the lowest productivity of producing COVID-19 publication number in the context of their GDP per capita. * Argentina, Russia and Japan rank the top-3 with the lowest correlation between their overall publication impact (i.e., CI) and GDP per capita (GDPPC). This shows that these countries have the lowest COVID-19 research impact in terms of their GDP per capita. * In contrast, India, China and the US rank top-3 in terms of their correlation coefficient between their publication number and GDP per capita, and their overall publication impact (i.e., CI) and their GDP per capita (GDPPC). This shows that these countries have the highest productivity and impact of COVID-19 research in terms of their GDP per capita. ### 8.3 Correlation between OECD publications and GDP per capita Similar to the analysis in Section 8.2 for G20 countries and regions, here, we further analyze the correlation between OECD country’s publications and their GDP per capita for each country and region. Table 29 shows the absolute and relative publication-GDP coefficients, and the absolute and relative publication composite impact-GDP coefficients over the OECD countries. The symbols and their calculations in the table are introduced in Section 4.4. The details can be found in “OECD publications and GDP per capita.xlsx.” View this table: [Table 29:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T29) Table 29: Correlation between OECD publications and GDP per capita The results show that * Luxembourg, Iceland and Estonia rank the lowest correlation between their publication number and GDP per capita. These countries show the lowest productivity of COVID-19 publications in terms of their GDP per capita. * In contrast, the US, Turkey and Italy rank top 3 in terms of of their correlation coefficient between their publication number and GDP per capita; showing their highest productivity in terms of their economic status. * With regard to the correlation between cumulative composite indicator (CI) and their GDP per capita (GDPPC), Luxembourg, Iceland and Estonia rank the lowest cor relation, showing the lowest productivity in COVID-19 publications; while the US, Turkey and Italy rank the highest three, showing their highest productivity in terms of COVID-19 research impact in terms of their economic conditions. ### 8.4 Global correlation between publications and infections Here, we analyze the correlation between the COVID-19 publications and the number of infected cases in each country. The publications of each first-authored country are collected per the method introduced in Section 3.2, the collection of COVID-19 infected cases is done per the method introduced in Section 3.9. The correlation between publications and infections for each country is calculated per the correlation measures introduced in Section 4.5, i.e., Equation (14). Figure 35 shows the correlation coefficient between publication number and case number in all the countries. Table 30 shows the detailed data of the correlation coefficient in top-20 countries. The results can be found in file ‘Global correlation between publications, infections and deaths.xlsx.’ ![Figure 35:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F35.medium.gif) [Figure 35:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F35) Figure 35: Correlation coefficient between publication number and new case number in all the countries View this table: [Table 30:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T30) Table 30: Correlation between publications and infections in top-20 countries The correlation between COVID-19 publications and infections show that, of the top-20 publishing countries: * China shows the highest log-correlation between their COVID-19 publication number and infections, showing their highest productivity in the context of their infections; * Netherlands shows the lowest log-correlation between their COVID-19 publication number and infections, showing their lowest productivity in the context of their infections. ### 8.5 Global correlation between publications and deaths Here, we analyze the correlation between the COVID-19 publications and the number of death cases in each country. The publications of each first-authored country are collected per the method introduced in Section 3.2, the collection of COVID-19 death cases is done per the method introduced in Section 3.9. The correlation between publications and deaths for each country is calculated per the correlation measures introduced in Section 4.5, i.e., Equation (14). Figure 36 shows the correlation coefficient between publication number and death number in all the countries. Table 31 shows the detailed data of the correlation coefficient in top-20 countries. The results can be found in file ‘Global correlation between publications, infections and deaths.xlsx.’ ![Figure 36:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F36.medium.gif) [Figure 36:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F36) Figure 36: Correlation coefficient between publication number and death number in all the countries View this table: [Table 31:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T31) Table 31: Correlation between publications and deaths in top-20 countries The correlation between COVID-19 publications and deaths show that, of the top-20 publishing countries: * China shows the highest log-correlation between their COVID-19 publication number and deaths, showing their highest productivity in the context of their deceased cases; * Brazil shows the lowest log-correlation between their COVID-19 publication number and deaths, showing their lowest productivity in the context of their COVID-19 deaths. ### 8.6 Correlation between monthly disciplinary publications and infections Here, we further analyze the correlation between the global publications and global infections for every month between Jan 2020 and Mar 2022 and the global publication-infection correlation in the three major disciplines: computer science, medical science, and social science, respectively. The monthly publications are collected per the method introduced in Section 3.3 to group the publications to each month per their publication dates. We only count those publications with publication dates available. The categorization of monthly publications to the three major disciplines is done by the method introduced in Section 3.6. The monthly infections and deaths are collected per the method introduced in Section 3.9 globally. Table 32 shows the number of the monthly global publications in computer science, medical science and social science and the number of global monthly new infection cases and deaths. It further shows the correlation coefficient between monthly publication number and infections and deaths, respectively. View this table: [Table 32:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T32) Table 32: Monthly disciplinary publications and infection Figure 37 shows the correlation between each of the disciplinary publications and the infections and deaths over each month. The correlations between publications and infected and death numbers are calculated per the method introduced in Section 4.5. ![Figure 37:](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2022/09/06/2022.08.16.22278871/F37.medium.gif) [Figure 37:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/F37) Figure 37: Correlation between monthly disciplinary publications and infections The full details of the global monthly publications and infections are in the file “Correlation between monthly disciplinary publications and infections.xlsx.” ### 8.7 Correlation between G20 publications and infections Table 33 shows the publications and case numbers in G20 countries and regions and their correlations with the publication number in each of their countries, respectively. View this table: [Table 33:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T33) Table 33: Correlation between publications and infections in G20 countries The data can be found in ‘G20 publications and infections.xlsx.’ ### 8.8 Correlation between G20 publications and deaths Table 34 shows the publications and death case numbers in G20 countries and regions and their correlations in each of their countries, respectively. View this table: [Table 34:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T34) Table 34: Correlation between publications and deaths in G20 countries The data can be found in ‘G20 publications and deaths.xlsx.’ ### 8.9 Correlation between OECD publications and infections Table 35 shows the publications and case numbers in OECD countries and regions and their correlations with the publication number in each of their countries, respectively. View this table: [Table 35:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T35) Table 35: Correlation between publications and infections in OECD countries The data can be found in ‘OECD publications and infections.xlsx.’ ### 8.10 Correlation between OECD publications and deaths Here, we take the similar approach as in Section 8.5 to analyze the correlation between COVID-19 publications and the number of deaths in each of the G20 and OECD countries and regions. Table 36 shows the numbers of publications and deaths and their correlation in G20 and OECD countries. View this table: [Table 36:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T36) Table 36: Correlation between publications and deaths in OECD countries The data can be found in ‘OECD publications and deaths.xlsx.’ ## 9 Concluding remarks and discussion In this report, we summarize our literature analyses of the global scientist’s responses to tackling COVID-19. We have collected about 350k scientific references on COVID-19 in English between Jan 2020 and Mar 2022, produced by researchers from 189 countries, including 175 first-authored countries, and in 27 subject areas. Various literature analyses of this paramount global scientific effort have been made, including * the descriptive statistics of the global publications in terms of publication quality and impact over countries and regions, subject areas and major disciplines, and main impact metrics; * the profile and trends of COVID-19 publication quality and impact in G20 and OECD countries and regions; * the publication profile, patterns and trends of publications on COVID-19 modeling; * the correlations between COVID-19 publications and economic status and COVID-19 infections globally and in major countries and regions; * the imbalances of COVID-19 research across subject areas, countries and regions, and quantity and quality. The above analyses have been conducted on the information of title, abstract, author affiliations, and keywords extracted from global publications. This work complements our other comprehensive review on COVID-19 modeling in [**Cao-Liu’22**]. However, further work includes but is not limited to: * more extraction and analyses of full text of these global publications; * more analyses of the COVID-19 global scientists response data; * specific review analyses of subject areas, for example, multi-disciplinary modeling of COVID-19, and modeling inference and impact of COVID-19; * updating the publication impact analyses per the recently released 2021 impact metrics of journals and conferences. ## Data Availability The data of global scientific response to COVID-19 is available at [https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response](https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response) and [https://datasciences.org/covid19-modeling/](https://datasciences.org/covid19-modeling/). [https://datasciences.org/covid19-modeling/](https://datasciences.org/covid19-modeling/) ## 10 Appendices Here, we list a few appendices. We first include the domain-driven list of disciplinary categorization, the list of modeling keywords predefined by domain knowledge. Then, we introduce the collected metadata of COVID-19 publications, the list of files corresponding to the various results, and the list of files corresponding to source codes for processing the publications in this report. ### 10.1 Appendix 1. List of disciplinary categorization Here, we collect the list of disciplinary keywords and terminologies collected from Web of Science, SJR, and [Research.com](http://Research.com), which are used to categorize the disciplines of each collected publication on COVID-19. Table 37 shows the categories of subject areas in three major disciplines: medical science, computer science, and social science. View this table: [Table 37:](http://medrxiv.org/content/early/2022/09/06/2022.08.16.22278871/T37) Table 37: List of disciplinary categorization ### 10.2 Appendix 2. List of predefined modeling keywords The following includes the list of keywords we have collected from different sources and by our domain knowledge. They are categorized into machine learning, deep learning, mathematical modeling, epidemic modeling, and other general methods. * Machine learning: * Machine learning * Sentiment analysis * Support vector machine * Transfer learning * Random forest * Decision tree * Natural language processing * Artificial neural network Deep learning: Deep learning * Convolutional neural network * Neural network * Long short-term memory * Deep neural network Mathematical modeling: * Mathematical modeling * Regression model linear regression * Multivariate statistics * Logistic regression * Statistical model * Bayesian * Mathematical model * Cox regression * Univariate analysis * Autoregressive integrated moving average * Poisson regression * Time series analysis * Linear model * Bivariate analysis * Multiple linear regression * Multivariate regression Epidemic modeling: * Compartmental modeling * Susceptible-exposed-infectious-removed * Susceptible-infectious-recovered * Transmission model * Epidemiological model General and other methods and keywords: * Compartmental modeling * Predictive model * Forecasting model * Simulation * Artificial intelligence * Big data * Decision making * Data mining * Data analytics ### 10.3 Appendix 3. List of COVID-19 publication metadata Here, we list the metadata corresponding to the crawled publications and their supplementary information and impact metrics associated with each publication on COVID-19. The metadata of this aggregated source data of COVID-19 publications are as follows in terms of reference information of publications, supplementary information of publications, and the impact metrics of the publication venues. There are 27 fields describing the COVID-19 publications. They are listed below. * Pub_ID: the unique publication reference number; * Pub_DOI: Publication DOI * Title: Title of publication * Abstract: Abstract of publication * Authors: the authors of a publication * First_author: First author of publication * Author_affiliations: the affiliations of authors * First\_author\_country: Country or region of the first author * Coauthor_countries: the Countries of co-authors * Pub_date: Publishing date of a publication * Pub_source: Publication source * Pub\_source\_type: Type of publication source * Source\_H5\_format: H5-index format of publication source * Source\_Scopus\_format: Scopus format of publication source * Source\_WOS\_format: WoS format of publication source * Keyword\_from\_pub: Keywords listed in the publication * Keyword\_by\_OpenNLP: Keywords provided by OpenNLP * Keyword\_by\_PositionRank: Keywords by PositionRank * Source_discipline: Discipline or subject area of the publication * Modeling_related: Is the publication related to AI, data science, machine learning, and other modeling methods? * Modeling\_technical\_keywords: Modeling method keywords * Source\_Impact\_Factor: Impact factor of the publication source * Source\_H5\_Index: H5-index of the publication source * Source_CiteScore: CiteScore of the publication source * Source_SNIP: SNIP of the publication source * Source_SJR: SJR of the publication source * Source\_Composite\_Indicator: Composite Indicator of the publication source Readers may also refer to the attributes of publication references from the following source: * The COVID-19 Open Research Dataset (CORD-19) ([**Wang’20**]) The full publication metadata of this COVID-19 global scientist response dataset can be found in Kaggle at [https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response](https://www.kaggle.com/datasets/datascienceslab/covid19-global-scientists-response). ### 10.4 Appendix 4. List of COVID-19 publication analysis results Here, we list the files storing the analytical results of processing COVID-19 publications in the respective sections in this report. The files are as follows: * Correlation between monthly disciplinary publications and infections.xlsx * G20 country’s overall research impact.xlsx * G20 country’s research impact in computer science.xlsx * G20 country’s research impact in medical science.xlsx * G20 country’s research impact of social science.xlsx * G20 paper-averaged research impact metrics.xlsx * G20 publications and deaths.xlsx * G20 publications and GDP per capita.xlsx * G20 publications and infections.xlsx * Global correlation between publications, infections and deaths.xlsx * Global countries’ publications and GDP per capita.xlsx * Global country’s research impact.xlsx * Global disciplinary research impact.xlsx * Global publication distribution.xlsx * Global research collaborations.xlsx * Global research impact distribution per Composite Indicator (CI).xlsx * Global research impact distribution per SJR.xlsx * Global research impact distribution per CiteScore.xlsx * Global research impact distribution per H5-index.xlsx * Global research impact distribution per Impact Factor.xlsx * Global research impact distribution per SNIP.xlsx * Mapping between top-50 business problems and their modelling methods.xlsx * Modeling keyword cloud in modeling publications.xlsx * Modeling word cloud of modeling publications from China.xlsx * Modeling word cloud of modeling publications from the US.xlsx * Modeling word cloud of modeling publications in computer science.xlsx * Modeling word cloud of modeling publications in in social science.xlsx * Modeling word cloud of modeling publications in medical science.xlsx * Monthly top-10 business problems in modeling publications.xlsx * OECD publications and deaths.xlsx * OECD publications and GDP per capita.xlsx * OECD publications and infections.xlsx * Top-10 business problems globally in modeling publications.xlsx * Top-10 modeling methods globally in modeling publications.xlsx * Top-10 mostly published country’s disciplinary research impact.xlsx * Word cloud of global publications.xlsx ### 10.5 Appendix 5. List of source codes for processing COVID-19 publications * Here, we list the files storing the source codes of processing COVID-19 publications in the respective sections in this report. * crawler/WOS_crawler.py, * crawler/WHO_crawler.py, * crawler/RG_crawler.py, * crawler/PMD\_keyword\_crawler.py, * crawler/medrxiv_crawler.py, * crawler/crossref\_crawler\_by_id.py * crawler/WOS_crawler.py, * crawler/WHO_crawler.py, * crawler/RG_crawler.py, * crawler/PMD\_keyword\_crawler.py, * crawler/medrxiv_crawler.py, * crawler/crossref\_crawler\_by_id.py * dataprocessing/pre process.py * data_processing/country process.py * data_processing/publish date process.py * keyword_extraction/keyword position rank/main process.py * data_processing/domain process.py * data\_processing/keyword\_process.py * data_processing/word cloud.py ## Footnotes * This new version has incorporated substantial new and updated results on the correlation analysis and the updated data, and fixed some minor errors. * 1 WHO Coronavirus (COVID-19) Dashboard: [https://covid19.who.int/](https://covid19.who.int/). * 2 COVID-19 Coronavirus Pandemic: [https://www.worldometers.info/coronavirus/#countries](https://www.worldometers.info/coronavirus/#countries) * 3 We would suggest interested readers to read this technical report together with our systematic review on COVID-19 modeling in [**Cao-Liu’22, Covid19-modeling**]. * 4 [https://opennlp.apache.org/](https://opennlp.apache.org/) * 5 [scimagojr.com](http://scimagojr.com) * 6 [Research.com](http://Research.com) * 7 [https://www.ssrn.com/index.cfm/en/](https://www.ssrn.com/index.cfm/en/) * 8 The tool for creating world map is available at [https://pyecharts.org/#/en-us/](https://pyecharts.org/#/en-us/). * 9 The tool for creating multi-dimensional visualization is available at [https://www.originlab.com/](https://www.originlab.com/). * 10 The tool for creating radar chart is available at [https://online.visual-paradigm.com/](https://online.visual-paradigm.com/). * 11 The tool for creating contour map is available at [https://www.originlab.com/](https://www.originlab.com/). * 12 [https://ai2-semanticscholar-cord-19.s3-us-west-2.amazonaws.com/historical\_releases.html](https://ai2-semanticscholar-cord-19.s3-us-west-2.amazonaws.com/historical_releases.html) * 13 Note, COVID-19 global literature on coronavirus disease between 2019 and 2023 is available at: [https://search.bvsalud.org/global-literature-on-novel-coronavirus-2019-ncov/](https://search.bvsalud.org/global-literature-on-novel-coronavirus-2019-ncov/), this resource includes literature from different countries, languages, and disciplinary venues. * 14 [https://apps.webofknowledge.com](https://apps.webofknowledge.com) * 15 [https://www.researchgate.net/](https://www.researchgate.net/) * 16 [https://www.ncbi.nlm.nih.gov/pmc/](https://www.ncbi.nlm.nih.gov/pmc/) and [https://pubmed.ncbi.nlm.nih.gov/](https://pubmed.ncbi.nlm.nih.gov/) * 17 [https://www.who.int/emergencies/diseases/novel-coronavirus-2019/global-research-on-novel-coronavirus-2019-ncov](https://www.who.int/emergencies/diseases/novel-coronavirus-2019/global-research-on-novel-coronavirus-2019-ncov) * 18 [https://www.crossref.org/](https://www.crossref.org/) * 19 [https://www.medrxiv.org/](https://www.medrxiv.org/) * 20 [https://arxiv.org/](https://arxiv.org/), since many publications here may not be associated with author country or publication date, we do not use the preprints from arXiv in this report. * 21 [https://scholar.google.com/citations?view\_op=top\_venues](https://scholar.google.com/citations?view_op=top_venues) * 22 [https://clarivate.com/webofsciencegroup/solutions/journal-citation-reports/](https://clarivate.com/webofsciencegroup/solutions/journal-citation-reports/) * 23 [https://www.scopus.com/sources.uri?RN\_AG\_Sourced\_400000654](https://www.scopus.com/sources.uri?RN_AG_Sourced_400000654) * 24 [https://github.com/ymym3412/position-rank](https://github.com/ymym3412/position-rank) * 25 [https://github.com/allenai/scibert](https://github.com/allenai/scibert) * 26 [scimagojr.com](http://scimagojr.com) * 27 [Research.com](http://Research.com) * 28 [https://worldpopulationreview.com/](https://worldpopulationreview.com/) * 29 [https://ourworldindata.org/covid-cases](https://ourworldindata.org/covid-cases) * 30 [https://scholar.google.com/citations?view\_op=top\_venues](https://scholar.google.com/citations?view_op=top_venues) * 31 [https://clarivate.com/webofsciencegroup/essays/impact-factor/#:~:text=Thus%2C%20the%20impact%20factor%20of,(or%20total)%20citation%20frequencies](https://clarivate.com/webofsciencegroup/essays/impact-factor/#:~:text=Thus%2C%20the%20impact%20factor%20of,(or%20total)%20citation%20frequencies). * 32 [https://jcr.clarivate.com/](https://jcr.clarivate.com/) * 33 [https://www.elsevier.com/connect/editors-update/citescore-a-new-metric-to-help-you-choose-the-right-journal](https://www.elsevier.com/connect/editors-update/citescore-a-new-metric-to-help-you-choose-the-right-journal) * 34 [https://www.scopus.com/sources](https://www.scopus.com/sources) * 35 [https://www.elsevier.com/connect/editors-update/citescore-a-new-metric-to-help-you-choose-the-right-journal](https://www.elsevier.com/connect/editors-update/citescore-a-new-metric-to-help-you-choose-the-right-journal) * 36 [https://www.scopus.com/sources](https://www.scopus.com/sources) * 37 [https://www.elsevier.com/connect/editors-update/citescore-a-new-metric-to-help-you-choose-the-right-journal](https://www.elsevier.com/connect/editors-update/citescore-a-new-metric-to-help-you-choose-the-right-journal) * 38 [https://www.scopus.com/sources](https://www.scopus.com/sources) * 39 [https://www.scimagojr.com/journalrank.php](https://www.scimagojr.com/journalrank.php) * 40 Word cloud is a visual representation of words with greater prominence to those more frequently-occurring words. * 41 The G20 countries and regions can be found in [https://g20.org/about-the-g20/](https://g20.org/about-the-g20/). * 42 The OECD countries and regions can be found in [https://www.oecd.org/about/members-and-partners/](https://www.oecd.org/about/members-and-partners/). * 43 [https://www.originlab.com/](https://www.originlab.com/) * Received August 16, 2022. * Revision received September 5, 2022. * Accepted September 6, 2022. * © 2022, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/) [1]: /embed/inline-graphic-1.gif [2]: /embed/graphic-4.gif [3]: /embed/graphic-5.gif [4]: /embed/graphic-6.gif [5]: /embed/graphic-7.gif [6]: /embed/graphic-8.gif [7]: /embed/inline-graphic-2.gif [8]: /embed/graphic-10.gif [9]: /embed/inline-graphic-3.gif [10]: /embed/graphic-11.gif [11]: /embed/graphic-12.gif [12]: /embed/inline-graphic-4.gif [13]: /embed/graphic-13.gif [14]: /embed/inline-graphic-5.gif [15]: /embed/inline-graphic-6.gif [16]: /embed/inline-graphic-7.gif [17]: /embed/graphic-14.gif [18]: /embed/inline-graphic-8.gif [19]: /embed/inline-graphic-9.gif [20]: /embed/inline-graphic-10.gif [21]: /embed/graphic-15.gif [22]: /embed/inline-graphic-11.gif [23]: /embed/inline-graphic-12.gif [24]: /embed/inline-graphic-13.gif [25]: /embed/inline-graphic-14.gif [26]: /embed/inline-graphic-15.gif [27]: /embed/graphic-16.gif [28]: /embed/inline-graphic-16.gif [29]: /embed/inline-graphic-17.gif [30]: /embed/inline-graphic-18.gif [31]: /embed/graphic-17.gif [32]: /embed/inline-graphic-19.gif [33]: /embed/inline-graphic-20.gif [34]: /embed/inline-graphic-21.gif [35]: /embed/inline-graphic-22.gif [36]: /embed/graphic-18.gif [37]: /embed/graphic-19.gif [38]: /embed/inline-graphic-23.gif [39]: /embed/graphic-20.gif [40]: /embed/inline-graphic-24.gif [41]: /embed/inline-graphic-25.gif [42]: /embed/inline-graphic-26.gif [43]: /embed/graphic-21.gif [44]: /embed/inline-graphic-27.gif [45]: /embed/inline-graphic-28.gif [46]: /embed/inline-graphic-29.gif [47]: /embed/inline-graphic-30.gif