PT  - JOURNAL ARTICLE
AU  - Samuel, Jim
AU  - Ali, G. G. Md. Nawaz
AU  - Rahman, Md. Mokhlesur
AU  - Esawi, Ek
AU  - Samuel, Yana
TI  - COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification
AID  - 10.1101/2020.06.01.20119347
DP  - 2020 Jan 01
TA  - medRxiv
PG  - 2020.06.01.20119347
4099  - http://medrxiv.org/content/early/2020/06/03/2020.06.01.20119347.short
4100  - http://medrxiv.org/content/early/2020/06/03/2020.06.01.20119347.full
AB  - Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fueled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19’s informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus-specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning (ML) classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets with the Naïve Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and that both methods showed relatively weaker performance for longer Tweets.
This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: No funding.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: There was no third party involved in this study. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
Data will be available upon request.
COVID-19: Coronavirus Disease 2019; ML: Machine Learning; NLP: Natural Language Processing