RT Journal Article SR Electronic T1 Bayesian Shrinkage Priors in Zero-Inflated and Negative Binomial Regression models with Real World Data Applications of COVID-19 Vaccine, and RNA-Seq JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2022.07.13.22277610 DO 10.1101/2022.07.13.22277610 A1 Bhattacharyya, Arinjita A1 Mitra, Riten A1 Rai, Shesh A1 Pal, Subhadip YR 2022 UL http://medrxiv.org/content/early/2022/07/15/2022.07.13.22277610.abstract AB Background Count data regression modeling has received much attention in several scientific fields, where Poisson, negative binomial, and zero-inflated models are among the primary regression techniques. Negative binomial regression is applied to count variables, usually when they are overdispersed. The Poisson distribution is also used for count data, but it assumes that the mean equals the variance. This assumption is often unrealistic, since the variance of counts usually differs from the mean. Modeling such data as Poisson distributed ignores under- or overdispersion, depending on whether the variance is smaller or larger than the mean. In addition, outcomes with a large number of zeros, such as RNA-Seq data, require zero-inflated models. Variable selection through shrinkage priors has been a popular method to address the curse of dimensionality and identify significant variables. Methods We present a unified Bayesian hierarchical framework that implements and compares shrinkage priors in negative binomial and zero-inflated negative binomial regression models. The key feature is the representation of the likelihood by a Polya-Gamma data augmentation, which integrates naturally with a family of shrinkage priors. We specifically focus on the Horseshoe, Dirichlet Laplace, and Double Pareto priors. Extensive simulation studies address the efficiency of the model, and mean squared errors are reported.
Further, the models are applied to data sets including COVID-19 vaccine data and COVID-19 RNA-Seq data, among others. Results The models are robust enough to address variable selection, and MSE decreases as the sample size increases, with lower errors in p > n cases. Notably, adverse events of COVID-19 vaccines depended on age, recovery, medical history, and prior vaccination, with a remarkable reduction in the MSE of the fitted values. The number of publications by Ph.D. students depended on the number of children and the number of articles in the last three years. Conclusions The models are robust enough to both perform variable selection and produce an effective fit because of their high shrinkage property, and they are applicable to a broad range of high-dimensional biometric and public health problems. Competing Interest Statement The authors have declared no competing interest. Funding Statement This work was supported by the National Institutes of Health grant P42 ES023716 to principal investigator Dr. S. Srivastava and the National Institutes of Health grant 1P20 GM113226 to principal investigator Dr. C. McClain. Dr. Shesh Rai was also partially supported by the Wendell Cherry Chair in Clinical Trial Research. Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The datasets used and/or analysed are publicly available, and information about them is included in this article. SSVS: Stochastic Search Variable Selection; GL: global-local; HS: Horseshoe; DL: Dirichlet Laplace; DP: Double Pareto; PG: Polya-Gamma; DA: Data Augmentation; MCMC: Markov Chain Monte Carlo; MSE: Mean Squared Error; VS: variable selection; BZINB: Bayesian Zero-Inflated Negative Binomial; BNB: Bayesian Negative Binomial; NB: Negative Binomial; ZINB: Zero-Inflated Negative Binomial;