PT - JOURNAL ARTICLE
AU - Filiot, Alexandre
AU - Ghermi, Ridouane
AU - Olivier, Antoine
AU - Jacob, Paul
AU - Fidon, Lucas
AU - Mac Kain, Alice
AU - Saillard, Charlie
AU - Schiratti, Jean-Baptiste
TI - Scaling Self-Supervised Learning for Histopathology with Masked Image Modeling
AID - 10.1101/2023.07.21.23292757
DP - 2023 Jan 01
TA - medRxiv
PG - 2023.07.21.23292757
4099 - http://medrxiv.org/content/early/2023/09/14/2023.07.21.23292757.short
4100 - http://medrxiv.org/content/early/2023/09/14/2023.07.21.23292757.full
AB - Computational pathology is revolutionizing the field of pathology by integrating advanced computer vision and machine learning technologies into diagnostic workflows. It offers unprecedented opportunities to make treatment decisions more efficient by allowing pathologists to achieve higher precision and objectivity in disease classification, tumor microenvironment description and identification of new biomarkers. However, the potential of computational pathology in personalized medicine comes with significant challenges, particularly in annotating whole slide images (WSI), which is time-consuming, costly and subject to inter-observer variability. To address these challenges, Self-Supervised Learning (SSL) has emerged as a promising solution to learn representations from histology patches and leverage large volumes of unlabelled WSI. Recently, Masked Image Modeling (MIM) has gained traction as an SSL framework and is now considered to outperform purely contrastive learning paradigms. In this work, we therefore explore the application of MIM to histology using iBOT, a self-supervised transformer-based framework. Across 17 downstream tasks spanning seven cancer indications, at both the slide and patch levels, we provide recommendations on the pre-training of large models for histology data using MIM.
First, we demonstrate that in-domain pre-training with iBOT outperforms both ImageNet pre-training and a model pre-trained with a purely contrastive learning objective, MoCo v2. Second, we show that Vision Transformer (ViT) models, when scaled appropriately, can learn pan-cancer representations that benefit a wide variety of downstream tasks. Finally, our iBOT ViT-Base model (80 million parameters), pre-trained on more than 40 million histology images from 16 different cancer types, achieves state-of-the-art performance on most weakly-supervised WSI classification tasks compared with other SSL frameworks available in the literature. This paves the way for the development of a foundation model for histopathology. Our code, models and features are publicly available at https://github.com/owkin/HistoSSLscaling. Competing Interest Statement: All authors are employees of Owkin, Inc., New York, NY, USA. Funding Statement: This work was granted access to the HPC resources of IDRIS under the allocations 2022-AD011012519 and 2023-AD011012519R1 made by GENCI. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. The results published in this work are partly based upon data generated by the TCGA Research Network. All images and the associated clinical outcomes for the TCGA cohorts used in this study are publicly available at https://portal.gdc.cancer.gov/ and on cBioPortal at https://www.cbioportal.org/. Regarding the PAIP dataset, de-identified pathology images and annotations used in this research were prepared and provided by the Seoul National University Hospital by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI18C0316). PAIP: http://www.wisepaip.org/paip