medRxiv provides free and unrestricted access to all articles posted on the server. We believe this should apply not only to human readers but also to machine analysis of the content. A growing variety of resources have been created to facilitate this access.
medRxiv metadata are made available via a number of dedicated RSS feeds and APIs. Simplified summary statistics covering the content and usage are also available.
Bulk access to the full text of medRxiv articles for the purposes of text and data mining (TDM) is available via a dedicated Amazon S3 resource. This is intended for bulk TDM, which authors explicitly consent to during submission to medRxiv and which is consistent with the fair use doctrine under US copyright law. The TDM repository is not intended as a source for further redistribution of articles posted on medRxiv, or their derivatives, nor does it grant others permission to re-host content posted on medRxiv. For most articles submitted to medRxiv, authors retain copyright and reuse rights. If you build indexing services or tools based on the full text of articles, you must therefore link back to the text hosted at medRxiv rather than re-host content. For reuse/redistribution of individual articles or their derivatives, please consult the licensing terms applied by the authors, which are provided in the metadata. In most cases, this will require you to contact the copyright holder in advance to obtain permission.
Full-text access is provided through an Amazon S3 Requester Pays bucket. The charges are not a source of revenue for medRxiv; they ensure that costs are borne by the user and that medRxiv does not incur unpredictable expense as a consequence of errant or abusive use of the service. In the vast majority of cases, the fees levied by Amazon on users will be minimal. medRxiv reserves the right to restrict access to individuals/services who breach the licensing terms selected by authors.
The full set of processed PDF and XML files from medRxiv is deposited each month, with delivery typically completing a few days into the new month. An AWS account is needed to access the files. You may define an IAM user within your AWS account and use the access key and secret access key for that user, or alternatively define an access key and secret access key for the root AWS account itself. The medRxiv AWS bucket is located in the US East (N. Virginia) us-east-1 region and is accessible at s3://medrxiv-src-monthly.
Using, for example, the s3cmd utility, the following should list the bucket contents:
s3cmd ls s3://medrxiv-src-monthly --requester-pays
The --requester-pays parameter is required to gain access. This setting assigns the AWS charges associated with the request and any download activity to the account making the request. Details about Requester Pays buckets are available in the Amazon S3 guide. Please consult the AWS S3 pricing guide for information about Amazon's rates for data retrieval.
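For scripted access, the same listing can be done from Python. The following is a minimal sketch, assuming the third-party boto3 library is installed and AWS credentials are already configured. The function names list_keys and download are illustrative and not part of any medRxiv tooling. The RequestPayer="requester" argument plays the role of --requester-pays.

```python
from typing import Iterator

BUCKET = "medrxiv-src-monthly"  # bucket name given in the text above
REGION = "us-east-1"            # region given in the text above


def list_keys(client, prefix: str = "") -> Iterator[str]:
    """Yield object keys under `prefix`, billing the requests to the caller."""
    paginator = client.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket=BUCKET,
        Prefix=prefix,
        RequestPayer="requester",  # counterpart of s3cmd's --requester-pays
    )
    for page in pages:
        # "Contents" is missing from a page when nothing matches the prefix
        for obj in page.get("Contents", []):
            yield obj["Key"]


def download(client, key: str, filename: str) -> None:
    """Download one object to a local file, billing the transfer to the caller."""
    client.download_file(
        BUCKET, key, filename, ExtraArgs={"RequestPayer": "requester"}
    )
```

With a real client, created for example as boto3.client("s3", region_name=REGION), calling list_keys(client, prefix="Back_Content/") would be expected to list the back-content batches. The exact prefix string is an assumption based on the folder names described on this page, so check it against an actual listing before relying on it.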
The bucket contains two virtual folders, one containing “Back_Content” and the other “Current Content.” The Back_Content folder comprises a set of folders (“Batch_[nn]”) that contain batches of past manuscript files loaded to the bucket in late 2020. The Current Content folder contains folders of content loaded since then and named with the respective month (“February_2021,” etc.). The monthly folders contain either manuscript files or numbered batch files containing manuscript files. The manuscript files, although named with extension “.meca,” are zip files. Each zip package includes a set of informational files and a folder named “content.” The informational files include “manifest.xml,” which has tagged entries for the manuscript title and for each of the files contained in the “content” folder, and three extraneous files (the “directives” and “transfer” XML files and a mime-type file). The “content” folder has files containing the manuscript content: a PDF file, a full-text XML file, and image and supplementary files if those were supplied by the authors.
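Because the .meca packages are zip files, they can be inspected with standard zip tooling. The following is a minimal sketch using only the Python standard library. The function names summarize_meca and read_manifest are illustrative, and the code assumes that “manifest.xml” and the “content” folder sit at the top level of the archive, as the description above suggests. It does not assume anything about the internal schema of the manifest.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath
from typing import BinaryIO, Dict, List, Union

MecaSource = Union[str, BinaryIO]  # a file path or an open binary file object


def summarize_meca(source: MecaSource) -> Dict[str, List[str]]:
    """Group the files under "content/" in a .meca package by file extension."""
    by_ext: Dict[str, List[str]] = defaultdict(list)
    with zipfile.ZipFile(source) as package:  # .meca files are ordinary zips
        for name in package.namelist():
            path = PurePosixPath(name)
            # Skip directory entries and the top-level informational files
            if name.endswith("/") or path.parts[0] != "content":
                continue
            by_ext[path.suffix.lower()].append(name)
    return dict(by_ext)


def read_manifest(source: MecaSource) -> bytes:
    """Return the raw bytes of manifest.xml (assumed to be at the top level)."""
    with zipfile.ZipFile(source) as package:
        return package.read("manifest.xml")
```

For a downloaded package, summarize_meca("example.meca") (a hypothetical filename) would return a dictionary keyed by extension, such as ".pdf" and ".xml", which makes it easy to pick out the full-text XML for mining while ignoring the informational files.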