An evaluation of reproducibility and errors in published sample size calculations performed using G*Power

Robert T. Thibault; Emmanuel A. Zavalis; Mario Malički; Hugo Pedder

doi:10.1101/2024.07.15.24310458

Abstract

Background Published studies in the life and health sciences often employ sample sizes that are too small to detect realistic effect sizes. This shortcoming increases the rate of false positives and false negatives, giving rise to a potentially misleading scientific record. To address this shortcoming, many researchers now use point-and-click software to run sample size calculations.

Objective We aimed to (1) estimate how many published articles report using the G*Power sample size calculation software; (2) assess whether these calculations are reproducible; (3) assess whether they are error-free; and (4) assess how often these calculations use G*Power’s default option for mixed-design ANOVAs, which can be misleading and can output sample sizes that are too small for a researcher’s intended purpose.

Method We randomly sampled open access articles from PubMed Central published between 2017 and 2022 and used a coding form to manually assess 95 sample size calculations for reproducibility and errors.

Results We estimate that more than 48,000 articles published between 2017 and 2022 and indexed in PubMed Central or PubMed report using G*Power (i.e., 0.65% [95% CI: 0.62% - 0.67%] of articles). We could reproduce 2% (2/95) of the sample size calculations without making any assumptions, and likely reproduce another 28% (27/95) after making assumptions. Many calculations were not reported transparently enough to assess whether an error was present (75%; 71/95) or whether the sample size calculation was for a statistical test that appeared in the results section of the publication (48%; 46/95). Few articles that performed a calculation for a mixed-design ANOVA unambiguously selected the non-default option (8%; 3/36).
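The percentages in the Results can be checked directly against the counts given in parentheses. The following is a minimal sketch in Python, not part of the authors' analysis scripts; the labels and the helper names `percent` and `check_reported` are illustrative assumptions, and only the counts and rounded percentages are taken from the abstract.

```python
# Counts and rounded percentages as reported in the abstract:
# label -> (numerator, denominator, reported percent).
# The labels are shorthand chosen for this sketch, not the authors' variable names.
reported = {
    "reproduced without assumptions": (2, 95, 2),
    "likely reproduced after assumptions": (27, 95, 28),
    "too opaque to assess for errors": (71, 95, 75),
    "unclear whether test appears in results": (46, 95, 48),
    "mixed-design ANOVA, non-default option": (3, 36, 8),
}


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage."""
    return 100 * numerator / denominator


def check_reported(table):
    """Map each label to True if the rounded percentage matches the reported one."""
    return {
        label: round(percent(n, d)) == pct
        for label, (n, d, pct) in table.items()
    }


for label, (n, d, pct) in reported.items():
    print(f"{label}: {n}/{d} = {percent(n, d):.1f}% (reported {pct}%)")
```

Each reported percentage matches its count when rounded to the nearest whole number, so the abstract's figures are internally consistent.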

Conclusion Published sample size calculations that use G*Power are not transparently reported and may not be well-informed. Given the popularity of software packages like G*Power, they present an intervention point to increase the prevalence of informative sample size calculations.

Competing Interest Statement

The authors have declared no competing interest.

Clinical Protocols

https://doi.org/10.17605/OSF.IO/UJXHW

Funding Statement

Robert Thibault was supported by a general support grant awarded to METRICS from Arnold Ventures and a postdoctoral fellowship from the Canadian Institutes of Health Research. Robert Thibault will serve as guarantor for the contents of this paper. Hugo Pedder was supported by the UK National Institute for Health and Care Excellence (NICE) via the Bristol Technology Assessment Group and the NICE Technical Support Unit. The funders had no role in the preparation of this manuscript or the decision to publish.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

The calculations presented in Supplementary Figures 1 and 2 contained an error. They were calculated using sample sizes of n=22 and n=23, rather than n=27 and n=28. We thank Alexis Makin for bringing this issue to our attention.

Data availability

Data, data dictionaries, analysis scripts, and other materials related to this study are publicly available at https://osf.io/msz24/. The study protocol was registered on 31 May 2022 at https://doi.org/10.17605/OSF.IO/UJXHW. Discrepancies between this manuscript and the registered protocol are outlined in Supplementary Material A. The analysis script can be rerun by selecting “Reproducible Run” in the Code Ocean container available at https://doi.org/10.24433/CO.4349082.v1.

The copyright holder has placed this preprint in the Public Domain. It is no longer restricted by copyright. Anyone can legally share, reuse, remix, or adapt this material for any purpose without crediting the original authors.