PT - JOURNAL ARTICLE
AU - Mallon, Ann-Marie
AU - Häring, Dieter A.
AU - Dahlke, Frank
AU - Aarden, Piet
AU - Afyouni, Soroosh
AU - Delbarre, Daniel
AU - El Emam, Khaled
AU - Ganjgahi, Habib
AU - Gardiner, Stephen
AU - Kwok, Chun Hei
AU - West, Dominique M.
AU - Straiton, Ewan
AU - Haemmerle, Sibylle
AU - Huffman, Adam
AU - Hofmann, Tom
AU - Kelly, Luke J.
AU - Krusche, Peter
AU - Laramee, Marie-Claude
AU - Lheritier, Karine
AU - Ligozio, Greg
AU - Readie, Aimee
AU - Santos, Luis
AU - Nichols, Thomas E.
AU - Branson, Janice
AU - Holmes, Chris
TI - Advancing data science in drug development through an innovative computational framework for data sharing and statistical analysis
AID - 10.1101/2021.02.16.21251799
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.02.16.21251799
4099 - http://medrxiv.org/content/early/2021/02/23/2021.02.16.21251799.short
4100 - http://medrxiv.org/content/early/2021/02/23/2021.02.16.21251799.full
AB - Background: Novartis and the University of Oxford’s Big Data Institute (BDI) have established a research alliance that aims to make health care and drug development more efficient and targeted. By combining the latest statistical machine learning technology with an innovative IT platform built to manage large volumes of anonymised data from numerous sources and of numerous types, we plan to identify clinically relevant patterns that humans alone cannot detect, and from them phenotypes and early predictors of patient disease activity and progression.
Method: The collaboration focuses on highly complex autoimmune diseases and develops a computational framework to assemble a research-ready dataset across numerous modalities. For the Multiple Sclerosis (MS) project, the collaboration has anonymised and integrated phase II to phase IV clinical and imaging trial data from ≈35,000 patients across all clinical phenotypes, collected in more than 2,200 centres worldwide.
For the “IL-17” project, the collaboration has anonymised and integrated clinical and imaging data from over 30 phase II and III Cosentyx clinical trials including more than 15,000 patients with four autoimmune disorders: psoriasis (PsO), axial spondyloarthritis (axSpA), psoriatic arthritis (PsA) and rheumatoid arthritis (RA).
Results: A fundamental component of successful data analysis, and of the collaborative development of novel machine learning methods on these rich data sets, has been the construction of a research informatics framework. The framework captures the data at regular intervals, anonymises the images and integrates them with the de-identified clinical data, applies quality control, and compiles the result into a research-ready relational database that is available to multi-disciplinary analysts. Collaborative development by a group of software developers, data wranglers, statisticians, clinicians, and domain scientists across both organisations has been key. The framework is innovative because it facilitates collaborative data management and makes a complicated clinical trial data set from a pharmaceutical company available to academic researchers who become associated with the project.
Conclusions: An informatics framework has been developed to capture clinical trial data into a pipeline of anonymisation, quality control, data exploration, and subsequent integration into a database. Establishing this framework has been integral to the development of analytical tools.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This paper is the output from the Novartis-funded alliance with Oxford Big Data Institute.
Novartis funded the design of the study; the collection, analysis and interpretation of data; and the writing of the manuscript.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Not applicable
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
Not applicable
Abbreviations
BDI: Oxford’s Big Data Institute
MS: Multiple sclerosis
PsA: Psoriatic arthritis
RA: Rheumatoid arthritis
PsO: Psoriasis
axSpA: Axial spondyloarthritis
CNS: Central nervous system
RMS: Relapsing multiple sclerosis
CIS: Clinically isolated syndrome
RRMS: Relapsing-remitting multiple sclerosis
SPMS: Secondary progressive multiple sclerosis
PPMS: Primary progressive multiple sclerosis
EDSS: Expanded Disability Status Scale
FLAIR: Fluid-attenuated inversion recovery
MRI: Magnetic Resonance Imaging
DMZ: Demilitarized zone
GDPR: General Data Protection Regulation
EMA: European Medicines Agency
DICOM: Digital Imaging and Communications in Medicine
CROs: Clinical Research Organisations
JSON: JavaScript Object Notation
NIfTI: Neuroimaging Informatics Technology Initiative
BIDS: Brain Imaging Data Structure
ETL: Extract, Transform and Load