Abstract
Background Transparency in research is crucial as it allows for the scrutiny and replication of findings, fosters confidence in scientific outcomes, and ultimately contributes to the advancement of knowledge and the betterment of society.
Aim We aimed to assess adherence to five practices promoting transparency in scientific publications (data availability, code availability, protocol registration, conflicts of interest (COI) and funding disclosures) from open-access articles published in medical journals.
Methods We searched and exported all open-access articles from Science Citation Index Expanded (SCIE)-indexed journals through the Europe PubMed Central database published until March 16, 2024. Basic journal- and article-related information was retrieved from the database. We used R to produce descriptive statistics.
Results The analysis included 2,189,542 open-access articles from SCIE-indexed medical journals. Of these, 87.5% (95% CI: 87.4%-87.5%) disclosed COI and 80.1% (95% CI: 80.0%-80.1%) disclosed funding. Protocol registration was present in 6.6% (95% CI: 6.6%-6.6%), data sharing in 7.6% (95% CI: 7.6%-7.6%), and code sharing in 1.4% (95% CI: 1.4%-1.4%) of the articles. More than 76.0% adhered to at least two transparency practices, while fewer than 0.02% adhered to all five. The data showed an increasing trend in adherence to transparency practices since the late 2000s. COI and funding were disclosed more often in lower impact factor journals, whereas protocol registration and data and code sharing were more prevalent in higher impact factor journals (all P-values <0.001). Also, articles that did not disclose their COI had higher median citations. Among all fields, Rheumatology (97.2%), Neuroimaging (94.6%), Anesthesiology (32.4%), Genetics & Heredity (36.7%), and Neuroimaging (12.5%) showed the highest adherence to COI disclosure, funding disclosure, protocol registration, data sharing, and code sharing, respectively, whereas Medicine, Legal (61.5%), Andrology (59.0%), Materials Science, Biomaterials (0.3%), Surgery (1.5%), and Nursing (<0.01%) showed the lowest.
Conclusion While most articles and fields had a COI disclosure, adherence to other transparent practices was far from acceptable. To increase protocol registration, data, and code sharing, much stronger commitment is needed from all stakeholders.
Background
Recent recognition of health research transparency, essential for accountability, has resulted in stringent disclosure requirements from academic and medical institutions and in voluntary industry and publisher codes (1,2). Although global regulatory bodies, funding agencies, and ethics boards supervise medical research, transparency and disclosure practices are still inconsistent and incomplete (3,4). Research conducted by industry, by academia, or through their collaboration has similar deficiencies in transparency (2,5). Despite publishers’ policies, ethical mandates, and mission statements (6), academic medical centers show poor performance and significant variation in disseminating clinical trial results following transparency practices (7). Transparency in research is pivotal for accountability and trust in results and upholds the ethical responsibilities of researchers, editors, publishers, and funders (8–12). Inefficient use of primarily public or non-profit research funding significantly disadvantages patients and society.
Key indicators of research transparency include data sharing, code sharing, disclosures of conflicts of interest, funding acknowledgments, and protocol registration (13). Data sharing is more common in non-COVID-19 articles (12%) than in COVID-19 studies (4%) (14). A systematic review of 105 meta-research studies, analyzing 2,121,580 articles across 31 specialties, uncovered substantial transparency challenges (15). Issues include low declared (8%) and actual (2%) public data availability, minimal public code sharing (<0.5%), and inconsistent journal data-sharing policy adherence. The review also noted discrepancies between declared and actual data sharing practices and challenges in privately obtaining data and code from authors (15). These findings highlight the pressing need to enhance transparency, particularly during public health crises.
Our meta-research study assessed research transparency regarding data and code sharing, conflict of interest (COI) and funding disclosure, and protocol registration across all medical specialties. Through programmatic and comprehensive analysis, we aimed to identify patterns and areas most needing improvement.
Methods
The protocol of this descriptive study was published on the Open Science Framework (OSF) website (https://doi.org/10.17605/OSF.IO/J57BG). All the code and data associated with the study were shared through both its OSF repository (https://osf.io/zbc6p/) and GitHub (https://github.com/choxos/medical-transparency). To ensure transparency and facilitate the reproducibility of our analyses, a PDF document containing the code and corresponding outputs is provided in Appendix 1.
Data sources and study selection
Initially, we searched records within journals listed in the 59 fields of the “Clinical Medicine” section in the Science Citation Index Expanded (SCIE) version 2020. This search was performed using the Europe PubMed Central (EPMC) database until February 28, 2022. We updated the search on March 16, 2024. The EPMC database encompasses all PubMed and PubMed Central records and allows automated retrieval of the full texts of EPMC open-access records. Notably, the so-called “EPMC open-access articles” do not include all articles labeled “open access” (e.g., by journals or publishers), because some of those articles remain subject to traditional copyright restrictions. Thus, their full texts cannot be accessed via EPMC.
Using the metareadr package (16), we retrieved the full texts of all identified open-access records in XML format from the EPMC database. Concurrently, we extracted descriptive details for each journal and article, including publisher, publication year, and citations linked to the article and journal, directly from the EPMC database. We categorized articles using the EPMC variable “publication type”.
Data extraction and synthesis
We used the rtransparent package (17), a validated and automated programmatic tool (13), to identify five transparent practices from the full texts we were able to download from EPMC:
Data sharing: The accessibility of data or metadata obtained during a research study, typically through public repositories or inclusion as supplementary materials accompanying the published work.
Code sharing: The disclosure of computer code or scripts employed for data analysis in research, which facilitates the replication of results and enables broader use of the study’s methodologies.
Conflict of interest (COI) disclosures: The public acknowledgment of potential conflicts of interest that could impact the research, commonly presented within a designated publication section.
Funding disclosures: The disclosure of the sources of financial support for the research, promoting transparency regarding the possible influence of funding organizations.
Protocol registration: The public disclosure of research protocols before a study is conducted, intended to reduce bias and increase transparency in the research process.
The package uses a standardized vocabulary to identify transparency indicators in EPMC XML files. It detects keywords related to COI disclosure, such as “conflicts of interest,” “competing interests,” or “nothing to disclose,” in article section titles or bodies. The tool recognizes all mentions of COI and funding disclosures and treats “nothing to disclose” statements as an indication of transparency, similar to actual conflict disclosures.
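As a rough illustration of this kind of keyword matching, the Python sketch below flags a text as containing a COI statement when any of the three phrases quoted above appears. It is not the rtransparent implementation (an R package with a much larger vocabulary); the regular expression and the function name `has_coi_statement` are assumptions made for this example.

```python
import re

# Only the three phrases quoted in the text; the real tool's vocabulary
# is assumed to be far more extensive than this illustrative pattern.
COI_PATTERN = re.compile(
    r"conflicts?\s+of\s+interests?"
    r"|competing\s+interests?"
    r"|nothing\s+to\s+disclose",
    re.IGNORECASE,
)


def has_coi_statement(text: str) -> bool:
    """Return True if the text contains any of the COI-related phrases."""
    return COI_PATTERN.search(text) is not None


# A "nothing to disclose" statement counts as a disclosure, as in the text.
print(has_coi_statement("The authors have nothing to disclose."))
print(has_coi_statement("Blood pressure was measured twice."))
```

Under this rule, a statement that no conflict exists is treated the same way as a statement that lists conflicts, since both show that the authors addressed the question.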
To assess data and code sharing, the rtransparent tool detects materials that are shared either as supplemental content, in general repositories (e.g., figshare, OSF, GitHub), or in field-specific repositories (e.g., dbSNP, ProteomeXchange, GenomeRNAi). Statements such as “data available upon request” are not counted as data sharing because the data are unlikely to be obtained in practice (18). The rtransparent tool has been validated, showing high sensitivity and specificity for most transparency indicators: conflict of interest disclosure (sensitivity 99.2%, specificity 99.5%), funding disclosure (sensitivity 99.7%, specificity 98.1%), protocol registration (sensitivity 95.5%, specificity 99.7%), data sharing (sensitivity 75.8%, specificity 98.6%), and code sharing (sensitivity 58.7%, specificity 99.7%) (13).
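Imperfect sensitivity and specificity mean that the share of articles flagged by a tool can differ from the true share. One standard way to relate the two is the Rogan-Gladen estimator, sketched below in Python. The text does not state that this particular correction was applied, so the function `rogan_gladen` and the choice to treat the 1.4% code-sharing figure from the Results as a raw flagged proportion are assumptions for illustration only.

```python
def rogan_gladen(p_flagged: float, sensitivity: float, specificity: float) -> float:
    """Estimate true prevalence from the proportion flagged by an imperfect tool.

    Inverts p_flagged = sens * prev + (1 - spec) * (1 - prev) for prev.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (p_flagged + specificity - 1.0) / denom
    # Clamp, because sampling noise can push the estimate outside [0, 1].
    return min(1.0, max(0.0, estimate))


# Code-sharing validation figures quoted above: sensitivity 58.7%, specificity 99.7%.
# Assumption: 1.4% is treated here as an unadjusted flagged proportion.
adjusted = rogan_gladen(0.014, 0.587, 0.997)
print(f"{adjusted:.4f}")
```

With a sensitivity as low as 58.7%, the corrected value is noticeably above the flagged proportion, which shows why sensitivity matters most for the rarest indicators.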
Data analysis
First, we computed the percentage of articles with full texts available via EPMC (EPMC open-access records) out of the total number of articles within the database. Alongside providing descriptive statistics for the obtained sample, we reported adherence to transparency practices categorized by publication type. We also determined and reported the number of transparency practices that articles with available full text adhered to within each publication type, ranging from 0 to 5 practices. Furthermore, we charted differences in transparency practices between 59 distinct fields over time. The sensitivity and specificity of the rtransparent tool (17) were used to generate 95% confidence intervals (CIs) for the estimates of the transparency practices. We used visual presentation and Pearson’s product-moment correlation to analyze the yearly adherence trend to transparency practices. We used the Wilcoxon rank-sum test to test the statistical significance of the relationship between transparency indicators and journal impact factor or received citations. We also used a random-intercept generalized linear mixed-effects logistic model to investigate the trend of adherence to transparency practices among different fields.
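The count of practices per article (0 to 5) can be sketched in a few lines of Python. The record layout, the key names in `PRACTICES`, and the four toy articles are hypothetical and serve only to show the tallying step; the study's own analysis was carried out in R.

```python
from collections import Counter

# Hypothetical keys for the five indicators; not the tool's actual column names.
PRACTICES = ("coi", "funding", "registration", "data", "code")


def count_practices(flags: dict) -> int:
    """Number of the five practices an article adheres to (0 to 5)."""
    return sum(1 for name in PRACTICES if flags.get(name, False))


def tally_by_count(articles: list) -> Counter:
    """Map each possible practice count to the number of articles with it."""
    return Counter(count_practices(article) for article in articles)


def share_with_at_least(articles: list, k: int) -> float:
    """Fraction of articles adhering to at least k practices."""
    if not articles:
        return 0.0
    return sum(count_practices(a) >= k for a in articles) / len(articles)


# Made-up records, for demonstration only.
toy_articles = [
    {"coi": True, "funding": True},
    {"coi": True},
    {"coi": True, "funding": True, "registration": True, "data": True, "code": True},
    {},
]
print(sorted(tally_by_count(toy_articles).items()))
print(share_with_at_least(toy_articles, 2))
```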
Results
General characteristics
As of March 16, 2024, EPMC contained 17,694,287 articles. Of these, 2,189,542 (12.4%) records had full text available for download. Of those, 41,335 were published before 2000, 894,500 in the 2010s, and 1,157,514 after 2020. The articles came from 3,475 journals, led by the International Journal of Environmental Research and Public Health (n=61,041), Medicine (Baltimore) (n=35,6825), and Frontiers in Immunology (n=34,181).
Field-specific characteristics
On average, 89.6% (SD=8.82%) of the journals had at least one downloadable full-text in EPMC (Appendix 2). For full-text availability in EPMC, Tropical Medicine (39.8%, n=34,032), Medicine, Research & Experimental (33.2%, n=186,337), and Medicine, General & Internal (31.6%, n=233,385) led, while Audiology & Speech-Language Pathology (2.6%, 1,646 articles), Transplantation (3.3%, 4,987 articles), and Medicine, Legal (3.5%, 1,417 articles) had the lowest availability. See Appendix 2 for details.
Overall adherence to transparency practices
Of the analyzed full-text articles, 87.5% (95% CI: 87.4%-87.5%) disclosed a COI. Funding disclosures were present in 80.1% (95% CI: 80.0%-80.1%) of articles. Pre-publication registration occurred in 6.6% (95% CI: 6.6%-6.6%) and data sharing in 7.6% (95% CI: 7.6%-7.6%) of articles. Code sharing occurred in 1.4% (95% CI: 1.4%-1.4%). More than 76.0% complied with at least two transparency indicators, while less than 0.02% complied with all five.
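The very narrow intervals follow largely from the sample size. The Python sketch below computes a plain Wald interval for a proportion of 87.5% among 2,189,542 articles. It covers sampling error only and is not the procedure described in the Methods, which also draws on the tool's sensitivity and specificity; the function name `wald_ci` is an assumption for this example.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(p_hat: float, n: int, level: float = 0.95) -> tuple:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# COI disclosure: 87.5% of 2,189,542 full-text articles.
low, high = wald_ci(0.875, 2_189_542)
print(f"{100 * low:.2f}% to {100 * high:.2f}%")
```

At this sample size the interval is less than a tenth of a percentage point wide, so uncertainty in the reported estimates is dominated by classification error rather than by sampling error.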
Figure 1A shows the overall adherence to each transparency indicator. A visual analysis of the data since the end of the 2000s shows a steady increase in the proportion of articles that adhere to transparency practices (Figure 1B). Pearson’s product-moment correlation between publication year and adherence to transparency practices was highest for funding disclosure (0.716) and lowest for code sharing (0.521). All the P-values were <0.001 (Appendix 1).
COI and funding were disclosed more often in lower impact factor journals whereas protocol registration and data and code sharing were more prevalent in higher impact factor journals (all had P-values<0.001). Also, articles that did not disclose their COI had higher median citations (Table 1).
Transparency practices by fields
Conflict of Interest Disclosure
Rheumatology, with a 97.2% rate of COI disclosures, leads alongside Primary Health Care (96.4%) and Nutrition & Dietetics (95.8%). On the lower end, Medicine, Legal shows 61.5%, Toxicology 66.4%, and Neuroimaging 71.2% (Appendix 3). The 2010s marked a period of increasing COI disclosures across most fields (Appendix 4).
Funding Disclosure
The highest rates of funding disclosure are found in Neuroimaging (94.6%), Materials Science, Biomaterials (93.9%), and Audiology & Speech-Language Pathology (92.6%). In contrast, Andrology reports 59.0%, Medical Laboratory Technology 59.5%, and Emergency Medicine 62.9% (Appendix 3). Funding disclosures, akin to COI disclosures, have shown an overall increase during the 2010s across various fields (Appendix 4).
Protocol Registration
Anesthesiology (32.4%) leads in protocol registration, followed by Rehabilitation (16.5%) and Rheumatology (14.1%), whereas Materials Science, Biomaterials (0.3%), Genetics & Heredity (0.9%), and Medical Laboratory Technology (1.0%) show the lowest rates (Appendix 3). Anesthesiology, for example, has seen a steady increase, reaching up to 40%, in contrast to consistently low rates in fields like Immunology and Toxicology (Appendix 4).
Data Sharing
Leading in data sharing are Genetics & Heredity (36.7%), Neuroimaging (24.5%), and Virology (22.9%), while Surgery (1.5%), Orthopedics (1.5%), and Primary Health Care (1.6%) show the lowest rates (Appendix 3). Although data sharing has increased in certain fields, such as Neuroimaging (approximately 80% in 2022), rates remain lower than those for COI and funding disclosures (Appendix 4).
Code Sharing
In code sharing, Neuroimaging (12.5%), Genetics & Heredity (7.5%), and Medical Informatics (6.3%) report the highest rates, in contrast to negligible rates in Nursing, Dermatology, and Integrative & Complementary Medicine (Appendix 3). Code sharing remains relatively low compared to the higher prevalence of COI and funding disclosures, though an increasing trend is observed in specific fields (Appendix 4).
The random-intercept generalized linear mixed-effects logistic models examining transparency indicators over time among different fields showed the highest random effects variance for code sharing (1.347) and the lowest for funding disclosure (0.345). Full details are available in Appendices 1 and 5.
Discussion
Our analyses of over two million full-text articles published in SCIE-indexed journals revealed high compliance with COI and funding disclosure in medical research since the early 2010s. However, data sharing, protocol registration, and code sharing remained low. A recent study highlights the critical role of best practices, such as pre-registration, in research (19). It demonstrates that adherence to these practices correlates with an 86% success rate in replication studies, significantly higher than the 50% success rate observed in some earlier replication efforts. Given the observed low rates of protocol registration, code sharing, and data sharing in our analysis, it is plausible that current research in these fields may face challenges in replicability, potentially falling short of the higher success rates associated with rigorous adherence to best practices.
To our knowledge, differences in transparency practices across a wide range of medical fields have not been studied before in this level of detail. Instead of focusing on a single transparency practice, such as data sharing, within one field of medicine or, e.g., its few highest-impact journals, the applied programmatic approach allowed us to estimate adherence to five transparency practices across a high number of studies across the range of medical fields (15). Previous research on data and code sharing has often taken a more generalized approach, focusing on overarching trends within the biomedical literature or narrowing their scope to specific fields or even individual journals (15). In contrast, our study delved into a detailed analysis, revealing substantial disparities, particularly in protocol registration and data or code sharing. These discrepancies likely mirror variations in research methodologies, reporting standards, publishing norms, peer review processes, and editorial practices prevalent across different medical domains. Conversely, our study indicates that COI and funding disclosures have been relatively prevalent since the mid-2000s. This suggests that achieving a high level of adherence to transparency practices could be attainable with concerted and universally applied efforts that have been behind promoting COI and funding disclosures (20). For instance, journals could adhere to Transparency and Openness Promotion Guidelines and guide authors accordingly (21). Recently, it was shown that it is possible to increase adherence to data and code sharing with such efforts, and thus subsequently increase the reproducibility of research substantially (22).
Even though we were able to use a large sample of articles and validated methods to estimate the prevalence of five transparency practices, our study has some limitations. The analyzed articles represent, to some extent, a biased subset of all medical literature, even though earlier research has shown no clear differences in these five transparency practices between articles with and without available full text in the EPMC (23). The applied methodology has not been validated for the whole study period; the original validation covered the years 2015-2019 (13). We cannot be sure that it is equally accurate within each of the 59 fields, because there may be systematically different ways of registering protocols and sharing data or code, e.g., via smaller and thus uncoded repositories or platforms. Furthermore, we did not manually evaluate the validity or accuracy of the disclosures, protocols, or data/code availability statements, although other studies have found these to be frequently suboptimal (24,25). The articles in our sample included a wide range of research, some of which may not have necessitated using any data or code. We therefore acknowledge that achieving a 100% adherence rate in data and code sharing would not be a realistic expectation. For instance, fields such as “Medical Ethics” or “Medicine, Legal” may predominantly comprise qualitative research papers, which may not inherently involve creating or utilizing datasets or code. In contrast, in fields like “Genetics & Heredity”, data-intensive research is likely more common. Additionally, the issue of protocol registration introduces some complexity because there is no consensus regarding the necessity of pre-registration for certain study types, such as explorative (as opposed to hypothesis-testing) or qualitative research. These differences likely drive the observed variations across the 59 distinct fields.
Consequently, it is essential to interpret the results with caution and make comparisons with an understanding of these inherent variations across the diverse spectrum of medical research.
Compared to all articles with available full text in the EPMC database (13), the analyzed subsample of articles from SCIE-indexed journals showed higher adherence to COI and funding disclosures and protocol registration, but equal adherence to data and code sharing. These disparities seem to stem primarily from historical trends: the SCIE-indexed articles exhibited higher levels of compliance with these practices in earlier years. Interestingly, a shift in adherence patterns becomes apparent when we examine articles from the 2020s. In 2020, the sample comprising all articles with available full text in the EPMC database demonstrated greater adherence to data sharing, code sharing, COI disclosures, and funding disclosures, but lower adherence to protocol registration, when compared to our findings for the SCIE-indexed articles in 2022 (in parentheses): 15% (9%) for data sharing, 3% (2%) for code sharing, 90% (84.0%) for COI disclosures, 85% (77.1%) for funding disclosures, and 5% (9%) for protocol registration. Menke et al. investigated the presence of protocols and data and code sharing in all articles with full text in the EPMC database using another automated tool (26). They found a constant trend in code-sharing adherence similar to ours. However, they found a more pronounced increase in, and higher adherence to, protocol registration (19%) and data sharing (17%) in 2020 than our findings (9% and 9%) indicate, potentially attributable to differences between these automated methods.
In conclusion, our study reveals that key transparency practices such as data sharing, protocol registration, and especially code sharing continue to be notably scarce in medical research. While adherence to COI and funding disclosures is commendable, the limited adoption of these crucial practices across diverse medical fields remains a significant concern. The recent findings highlighting a high replication success rate with rigorous transparency underscore the vital need for the universal adoption of such practices (19,22). We urge researchers, journal editors, and policymakers to advance these practices by advocating for the standardization of protocol registration and open data/code sharing across all medical fields. Such a collective commitment is essential to enhance the integrity, reliability, and impact of medical research, ultimately benefiting the global health community.
Data Availability
All the code and data associated with the study were shared through both its OSF repository (https://osf.io/zbc6p/) and GitHub (https://github.com/choxos/medical-transparency) when the manuscript was submitted.
Acknowledgments
The computational analyses were performed on servers provided by UEF Bioinformatics Center, University of Eastern Finland, Finland. Uribe was supported by the European Union’s Horizon 2020 grant 857287 for the Baltic Biomaterials Centre of Excellence, Headquarters at Riga Technical University, Riga, Latvia.
Footnotes
Conflict of interest disclosure: The authors have no conflicts of interest to disclose.
Funding disclosure: This study did not receive any funding.