A data management system for precision medicine
===============================================

* John J. L. Jacobs
* Inés Beekers
* Inge Verkouter
* Levi B. Richards
* Alexandra Vegelien
* Lizan D. Bloemsma
* Vera A. M. C. Bongaerts
* Jacqueline Cloos
* Frederik Erkens
* Patrycja Gradowska
* Simon Hort
* Michael Hudecek
* Manel Juan
* Anke H. Maitland-van der Zee
* Sergio Navarro Velázquez
* Lok Lam Ngai
* Qasim A Rafiq
* Carmen Sanges
* Jesse Tettero
* Hendrikus J. A. van Os
* Rimke C. Vos
* Yolanda de Wit
* Steven van Dijk

## Abstract

**Introduction** Precision, or personalised, medicine places advanced requirements on medical data management systems (MedDMSs). A MedDMS for precision medicine should be able to process hundreds of parameters from multiple sites, be adaptable while remaining in sync at multiple locations, sync to analytics in real time, and comply with international privacy legislation. This paper describes the LogiqSuite software solution, which aims to support precision medicine at the level of patient care (LogiqCare), research (LogiqScience) and data science (LogiqAnalytics). LogiqSuite is certified and compliant with international medical data and privacy legislation.

**Method** This paper evaluates a MedDMS in five types of use cases for precision medicine, ranging from data collection to algorithm development and from implementation to integration with real-world data. The MedDMS is evaluated in seven precision medicine data science projects in prehospital triage, cardiovascular disease, pulmonology, and oncology.

**Results** The P4O2 consortium uses the MedDMS as an electronic case report form (eCRF) that allows real-time data management and analytics in long covid and pulmonary diseases. In an acute myeloid leukaemia study data from different sources were integrated to facilitate easy descriptive analytics for various research questions. In the AIDPATH project, LogiqCare is used to process patient data, while LogiqScience is used for pseudonymous CAR-T cell production for cancer treatment. In both these oncological projects the data in LogiqAnalytics is also used to facilitate machine learning to develop new prediction models for clinical-decision support (CDS). The MedDMS is also evaluated for real-time recording of CDS data from U-Prevent for cardiovascular risk management and from the Stroke Triage App for prehospital triage.

**Discussion** The MedDMS is discussed in relation to other solutions for privacy-by-design, integrated data stewardship and real-time data analytics in precision medicine.

**Conclusion** LogiqSuite is used for multi-centre research study data registrations and monitoring, data analytics in interdisciplinary consortia, design of new machine learning / artificial intelligence (AI) algorithms, development of new or updated prediction models, integration of care with advanced therapy production, and real-world data monitoring in using CDS tools. The integrated MedDMS application supports data management for care and research in precision medicine.

MESH
* precision medicine
* personalized medicine
* personalised medicine
* P4 medicine
* medical data management system
* LogiqSuite

## Introduction

Standardisation of medical practice has yielded great progress for modern medicine, but the number of new drugs approved per billion US dollars spent on research and development has halved roughly every nine years since 1950 in inflation-adjusted terms, implying that the efficiency of medical progress has gone down by a factor of 80 (1–4). Animal models of disease lack human disease variations (5,6). Precision, or personalised, medicine (7) promises significant therapeutic improvements by prediction, prevention, personalisation or stratification, and participation of patients (8,9). It depends on distinguishing different disease mechanisms at a level of detail where animal models have limited predictive value.

### Medical data management for precision medicine

Precision medicine studies clinically relevant, i.e. large, effects, allowing limited clinical studies (2,10) and stratification of patients into small groups (11,12). Medical data science projects increase in complexity due to the evaluation of different biomarkers, either separately or in numerous combinations (13,14). While clinical trials in conventional medicine often limit their criteria by exclusion of potential comorbidities, precision medicine aims to include real-world data as relevant cases of human disease (15–18). Statistical analyses combining collections of real-world data (19,20) and clinical trial recordings (21) challenge the prerequisites for medical data management systems (medical DMSs; MedDMSs). Appropriate tools, like MedDMSs for medical data science on stratified interventions, are needed for precision medicine (22,23), and the lack of these tools slows down the transition to precision medicine (24). MedDMSs are also crucial in data collection for the mathematical development of artificial intelligence (AI) algorithms (25).

Precision medicine defines new requirements for MedDMSs. In traditional medicine, a MedDMS has dozens of parameters from a single site. MedDMSs for precision medicine should be able to process hundreds or thousands of parameters to distinguish between various forms of disease with different mechanisms and/or requiring different interventions. It is unlikely that a single site will have relevant numbers for all stratified subgroups, thus the MedDMS should be able to integrate data from multiple sites, while remaining compliant with privacy legislation, e.g. the EU General Data Protection Regulation (GDPR). The integration should include data validation and coordination for progressive insights.

These requirements together point towards a centrally governed cloud solution that integrates data from multiple sites, while maintaining the appropriate privacy level and access on a need-to-know basis for each subset of data.

### Requirements for a MedDMS for precision medicine

Privacy protection legislations, like the GDPR, call for privacy-by-design and fine-tuned data access (26). Distinct medical conditions and research areas need different data models. Data models should be plastic for progressive insights and robust for long term data storage and management. Medical data are inherently complex with static data (e.g. sexes), data with a defined start and/or end date (e.g. diagnosis), repeated lab test results (e.g. haemoglobin), different kinds of sequence variations (e.g. mutations, indels, translocations), immunological cytokine profiles (27), and systems biology (28,29).

We are on the eve of the implementation of precision medicine (30), but enhanced data management systems (DMSs) are needed for precision medicine (31,32). Real-world data are important both for initial model development and for real-world feedback in the process of continuous learning for continual improvement of the models. Most drugs and other interventions lack data for real-world evidence beyond the controlled clinical trials, as such data are hard to capture in current medical practice (19). Clinical studies are mostly performed under ideal circumstances with patients that fit the inclusion and exclusion criteria, which often implies a single disease without comorbidities, while real patients mostly have multiple comorbidities. Real-world data without tight inclusion and exclusion criteria increase the complexity of data.

When data become more complex, statistical rules require more data, making data sharing crucial for precision medicine. The GDPR allows the use of anonymous data in research, and pseudonymous data when patients give their consent. Medical data should be sharable between studies, implying that data should be findable, accessible, interoperable, and reusable (FAIR) (33,34). LogiqSuite allows FAIR data dictionaries to be added to the data. In some European countries, patients should give informed consent to every scientific study that is performed with their data. Based on either opt-in or opt-out, it should be possible to include or exclude data from analytics.

Data quality is the heart of data science (35). A developed mathematical prediction model is unlikely to be of better quality than its data input. Handling big and complex medical data is a challenge for MedDMSs and thus for the implementation of precision medicine (36–39). Data quality starts with input validation (40). The MedDMS should also have dynamic forms to minimise redundant questions (41). It should have reports for user feedback, and be integrated in care settings for real-world data as well as in research settings for trial data. Moreover, continuous improvement requires data integration with clinical-decision support (CDS) tools for precision medicine.

In medical practice and research settings, responsible physicians and investigators, respectively, are assigned to cases, but their departments are also involved. Medical specialists often have authorised crosstalk between different cases of their patients, with the rights for data viewing regulated at the case level to avoid unnecessary complexity.

### Validation of the MedDMS for precision medicine

From these requirements, we set up LogiqSuite, a MedDMS to facilitate precision medicine in multi-centre research studies, data analytics, AI development of prediction models, integration of care and research, and real-world data monitoring in using CDS tools. We describe five different use cases of LogiqSuite in seven topics of oncology, cardiovascular medicine, pulmonology, and pre-hospital triage. The use cases included fusing datasets, study monitoring, integrating care and research, development, and the implementation of a MedDMS. In different use cases, the usability of LogiqSuite as a MedDMS for precision medicine is evaluated.

## Materials & methods

### Concept

Our MedDMS is an integrated solution dubbed LogiqSuite, consisting of LOGIQ applications built on logic to LOG data Intelligently and Quantitatively. The concept is to record medical case data at the source to enable medical data science for precision medicine at the levels of patients, subjects, and data analytics. LogiqCare uses directly identifiable patient data to avoid patients being mixed up in clinical care. LogiqScience uses pseudonymised subject data for scientific research, laboratory, and other activities, where sample identity is crucial but direct identification of patients is undesirable. LogiqSuite allows seamless integration of LogiqCare and LogiqScience to aid collaboration between physicians and researchers. The data of LogiqCare and LogiqScience are synced to LogiqAnalytics for data analytics in real time, while the data are deidentified and depersonalised by controlling and minimising identifiable information. Separating LogiqAnalytics from the LogiqCare and LogiqScience database makes it possible to filter out personal identifiers, as well as data that should not be used for analytics (e.g., records with unreliable or unconsented data) or data that do not conform to the need-to-know basis for analytics.

### Generic design

All data handling and storage is encrypted in LogiqSuite according to the GDPR, ISO27001, and Dutch healthcare regulatory guidelines, i.e. NEN7510 (42). LogiqSuite is a robust cloud-native solution in the Microsoft Azure cloud. Data are restricted to be handled and stored in the European Economic Area. The solutions have a web interface accessible with current versions of Google Chrome, Mozilla Firefox, Microsoft Edge, and (limited) Safari for iOS and macOS. LogiqSuite is built using modern software development best practices (continuous integration and delivery; test, pilot, and production deployments; automated tests; and various checks and balances on quality). New developments are delivered under a feature toggle, which allows new functionality or security fixes to be provided quickly.

Data isolation can be achieved by running in a multitenant environment (logical separation) or by having a dedicated environment per project (physical isolation). Backups are made automatically for both point-in-time restore (seven days into the past) and long-term full backups (weekly backups are retained for 30 days, monthly backups for 365 days). Scalability is possible in the compute, data, and web layers to cope with varying workloads. Specific details can be found in the technical documentation (43).
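The retention windows stated above can be written down as a small lookup, which makes the policy easy to check. The following is only a sketch: the names `RETENTION_DAYS` and `is_retained` are hypothetical and do not describe how backups are configured in Azure or LogiqSuite.

```python
from datetime import date, timedelta

# Retention windows in days, taken from the text above.
RETENTION_DAYS = {
    "point_in_time": 7,   # point-in-time restore
    "weekly": 30,         # weekly full backups
    "monthly": 365,       # monthly full backups
}


def is_retained(kind: str, taken_on: date, today: date) -> bool:
    """Return True if a backup of this kind is still inside its retention window."""
    age_in_days = (today - taken_on).days
    return 0 <= age_in_days <= RETENTION_DAYS[kind]


today = date(2024, 2, 11)
print(is_retained("weekly", today - timedelta(days=21), today))  # True
print(is_retained("weekly", today - timedelta(days=45), today))  # False
```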

### Technical details

LogiqSuite uses standard protocols: REST API for querying data and data ingress (migration), CSV/JSON serialisation for import/export of data, TLS 1.2 for encryption of communication channels, and OIDC or SAML for integrations with Identity Providers (e.g. Active Directory) (Figure 1). Full details can be found in the technical paper (44).
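To illustrate what CSV/JSON serialisation for import and export means in practice, the sketch below round-trips a few flat records using only the Python standard library. The record layout and field names are invented for the example and are not the LogiqSuite export format.

```python
import csv
import io
import json

# Hypothetical flat case records; the field names are illustrative only.
records = [
    {"subject_id": "S-001", "case_id": "C-01", "haemoglobin_mmol_l": 8.4},
    {"subject_id": "S-002", "case_id": "C-02", "haemoglobin_mmol_l": 7.9},
]


def to_json(rows):
    """Serialise rows to a JSON string; value types are preserved."""
    return json.dumps(rows, indent=2, sort_keys=True)


def to_csv(rows):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into dictionaries; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))


print(to_csv(records))
print(json.loads(to_json(records)) == records)  # True
```

JSON keeps numeric types, whereas CSV returns every value as text, so an import step has to convert numbers and dates explicitly.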

![Figure 1.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/02/11/2024.02.09.24302600/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2024/02/11/2024.02.09.24302600/F1)

Figure 1. Data flows in LogiqSuite.
In brief, data are stored in read- and write-model databases, where the write model is the single source of truth and the read model is a fast cache of aggregated data.

Data are stored in LogiqSuite as a stream of immutable events, which allows historical reconstruction. These events are communicated over a shared message bus that any internal service can subscribe to. For example, the service that is responsible for the web site maps the events to a database that is optimised for querying data. A service from LogiqAnalytics can take the same events and map them to a database specially tailored for a specific use case. Notably, sensitive personally identifiable information (PII) can be removed during this mapping, ensuring that research data are always anonymous. All data streams in LogiqSuite are evaluated in real time and remain continually in sync.
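The pattern described here (an append-only event log, a bus that services subscribe to, and per-service projections) can be sketched in a few lines. This is a minimal in-memory illustration, not the LogiqSuite implementation: the class names, the event layout, and the `PII_FIELDS` set are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an event cannot be changed after creation
class Event:
    sequence: int
    case_id: str
    field: str
    value: object


class MessageBus:
    """Append-only event log that forwards every event to all subscribers."""

    def __init__(self):
        self._log = []
        self._subscribers = []

    def subscribe(self, handler):
        # Replay the history first, so a late subscriber can rebuild its state.
        for event in self._log:
            handler(event)
        self._subscribers.append(handler)

    def publish(self, case_id, field, value):
        event = Event(len(self._log) + 1, case_id, field, value)
        self._log.append(event)
        for handler in self._subscribers:
            handler(event)
        return event


class ReadModel:
    """Projection of the event stream into a query-friendly dictionary."""

    def __init__(self, drop_fields=frozenset()):
        self.cases = {}
        self._drop_fields = drop_fields

    def apply(self, event):
        if event.field in self._drop_fields:
            return  # identifying fields never reach this projection
        self.cases.setdefault(event.case_id, {})[event.field] = event.value


PII_FIELDS = frozenset({"name", "date_of_birth"})  # assumed set of identifying fields

bus = MessageBus()
web = ReadModel()                  # projection for the web site
analytics = ReadModel(PII_FIELDS)  # projection for analytics, without PII
bus.subscribe(web.apply)
bus.subscribe(analytics.apply)

bus.publish("C-01", "name", "Jane Doe")
bus.publish("C-01", "diagnosis", "AML")
bus.publish("C-01", "diagnosis", "AML, relapsed")

print(web.cases["C-01"])        # {'name': 'Jane Doe', 'diagnosis': 'AML, relapsed'}
print(analytics.cases["C-01"])  # {'diagnosis': 'AML, relapsed'}
```

Because both projections consume the same events, they stay in sync by construction, and a new projection can be added later by replaying the log.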

LogiqSuite is developed according to privacy-by-design principles and data sharing on a need-to-know basis, using the cloud-native capabilities of Azure (App Services, Azure SQL, Cosmos DB, Vault, Service Bus, etc). LogiqSuite relies on Auth0 for authentication-as-a-service (45) i.e., to establish connections with customer identity providers in a reliable and secure manner. Customers can also connect using just their own account, without the need for an enterprise connection. Multifactor authentication (MFA) is enforced for individual users.

LogiqAnalytics leverages the Microsoft Power BI platform to allow users to collaborate on datasets and create insightful dashboards. Users of the same project can share dashboards using the Microsoft Teams environment.

Projects that use the Sciencrew platform for publishing content enjoy seamless integration in LogiqSuite. LogiqSuite is continuously improved (46). LogiqCare & LogiqScience provide a set of REST APIs for integration with third party systems for data accessibility (47).

### Case-centred data in LogiqCare and LogiqScience

LogiqCare and LogiqScience provide different access to case data of patients and subjects, respectively. Depending on the users’ roles, directly identifiable or pseudonymous data are hidden, viewable, or editable. Depending on the implementation use case, patients and subjects can be coupled, e.g. for data science collaboration between care and research projects. The central concept is the *case*, which can be fully customised for a certain diagnosis. Users have a worklist of the tasks to be done in their group or department.

Generally, writing and viewing access to data is authorised at the *case* level, analogous to medical practice. The *patient* or the *subject* is the identifier for the person to whom the *case* belongs. Organising data in *cases* allows collaborations between medical departments, like pulmonology and oncology for cases with comorbidity of asthma and cancer, respectively. *Cases* may contain *consultations* to structure data in anamnesis, direct measurements, and questionnaires sent (in)directly to the patient. *Cases* may also contain *test cascades* to collaborate between departments, like a pulmonology department requesting blood tests from haematology. These test cascades orchestrate the appropriate rights to view and edit, as well as the workload list of the various departments. *Test requests* are generated by the requesting departments, and the *test results* and/or *test conclusions* are entered by the performing departments. Data management can be organised across the boundaries of any single organisation, as LogiqSuite is fully cloud based and can allow multiple organisations to collaborate under precise restrictions. This collaboration could consist of sharing data at the *case* level or within *test cascades*.
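The case structure described above can be pictured as a small set of nested records. The sketch below is an assumption-laden illustration, not the LogiqSuite schema: the class and attribute names are hypothetical, and the rule that only the performing department may enter results is a simplified reading of the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Consultation:
    kind: str     # e.g. "anamnesis", "measurement", "questionnaire"
    values: dict


@dataclass
class Cascade:
    """A test cascade: request, results and conclusion shared by two departments."""
    requesting_department: str
    performing_department: str
    request: str
    results: dict = field(default_factory=dict)
    conclusion: Optional[str] = None

    def can_enter_results(self, department: str) -> bool:
        return department == self.performing_department


@dataclass
class Case:
    person_id: str   # patient (LogiqCare) or subject (LogiqScience) identifier
    diagnosis: str
    consultations: list = field(default_factory=list)
    test_cascades: list = field(default_factory=list)

    def worklist(self, department: str) -> list:
        """Open test requests that the given department still has to perform."""
        return [
            cascade.request
            for cascade in self.test_cascades
            if cascade.performing_department == department and not cascade.results
        ]


case = Case(person_id="S-001", diagnosis="asthma")
case.consultations.append(Consultation("measurement", {"fev1_l": 3.1}))
case.test_cascades.append(Cascade("pulmonology", "haematology", "full blood count"))

print(case.worklist("haematology"))  # ['full blood count']
print(case.worklist("pulmonology"))  # []
```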

Descriptions in the user interface and data fields can be translated to the user’s preferred language, e.g. to avoid translation of physician-patient interactions, and to the patient’s language, e.g. in the case of questionnaires sent directly to patients. When choice options are entered, the user interface is switched to the user’s language (e.g. Dutch), while the appropriate English translation is synced to LogiqAnalytics for data science purposes.
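One way to obtain this behaviour is to store a language-independent code for each choice option and keep the labels per language separately. The following sketch assumes such a design; the option set, the codes, and the function names are invented for illustration.

```python
# Hypothetical choice options: one stable code with a label per language.
SMOKING_STATUS = {
    "never":   {"en": "Never smoked", "nl": "Nooit gerookt"},
    "former":  {"en": "Former smoker", "nl": "Ex-roker"},
    "current": {"en": "Current smoker", "nl": "Roker"},
}


def labels_for(options: dict, language: str) -> dict:
    """Labels shown in the data-entry screen for the user's language."""
    return {code: labels[language] for code, labels in options.items()}


def analytics_value(options: dict, code: str) -> str:
    """Value synced to analytics: always the English label of the stored code."""
    return options[code]["en"]


print(labels_for(SMOKING_STATUS, "nl")["former"])  # Ex-roker
print(analytics_value(SMOKING_STATUS, "former"))   # Former smoker
```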

### Solution design

The LogiqSuite application is open to interactions with other applications, like clinical decision support tools (e.g., U-Prevent, which is also built by ORTEC), and has an open API for third-party apps (Figure 2). Data communication can be done using *patient*, *subject*, *case*, or *test* identifiers.

![Figure 2.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/02/11/2024.02.09.24302600/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2024/02/11/2024.02.09.24302600/F2)

Figure 2. LogiqSuite landscape.
LogiqSuite (blue) with interactions to other solutions from ORTEC (orange) and external sources (green). In brief, LogiqSuite allows recording of (a) Patient data in LogiqCare, and (b) Subject data in LogiqScience, with (c) Cases as a central database. Case data are (d) synced to LogiqAnalytics for real-time descriptive statistics, and (e) ready-to-use for medical math, using AI to develop new prediction models. Prediction models can be (f) productised for risk predictions for clinical-decision support, which are (g) coupled with the central cases. Central cases also allow (h) open API communication with third-party apps.

### Data transfer access

Data from other databases can be mapped and imported into LogiqSuite by extract – transform – load (ETL) procedures. The ETL process allows verifiable, reliable, and repeatable data import from different sources into LogiqSuite. During the Transform step of ETL, data curation was also performed on the data from the various sources.
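A minimal ETL run can be sketched as three small functions. The example below is written in Python purely for illustration (the caption of Figure 4 states that the transformations in the AML project were stored as F# code); the source layouts, the mapping tables, and the in-memory target are all assumptions.

```python
import csv
import io

# Two hypothetical source extracts that share a patient identifier.
SOURCE_A = "patient,sex,dx_date\nP1,F,2019-03-02\nP2,m,2020-11-15\n"
SOURCE_B = "patient,relapse\nP1,yes\nP2,no\n"

# Curation rules: map source spellings to the target's predefined options.
SEX_MAP = {"f": "female", "m": "male"}
RELAPSE_MAP = {"yes": True, "no": False}


def extract(text):
    """Extract: read one source into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows_a, rows_b):
    """Transform: couple the sources on the identifier and curate the values."""
    relapse_by_patient = {row["patient"]: row["relapse"] for row in rows_b}
    curated = []
    for row in rows_a:
        curated.append({
            "subject_id": row["patient"],
            "sex": SEX_MAP[row["sex"].strip().lower()],
            "diagnosis_date": row["dx_date"],
            "relapse": RELAPSE_MAP.get(relapse_by_patient.get(row["patient"])),
        })
    return curated


def load(rows, target):
    """Load: upsert by identifier, so repeating the run gives the same result."""
    for row in rows:
        target[row["subject_id"]] = row
    return len(rows)


target = {}
rows = transform(extract(SOURCE_A), extract(SOURCE_B))
load(rows, target)
load(rows, target)  # a second run does not create duplicates
print(len(target))  # 2
print(target["P2"]["sex"], target["P1"]["relapse"])  # male True
```

Keeping the mapping tables in code is what makes the import verifiable and repeatable: the same script applied to the same extracts always yields the same records.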

### Data science in LogiqAnalytics

LogiqAnalytics consists of a SQL database with numerical data and predefined choice options. All relevant data are synced to LogiqAnalytics in real time. Some data might enhance personal identifiability, like dates of birth and visits, and free text. In analytics, the dates are converted to numerical data like age at diagnosis or duration of disease. Free texts are not suitable for analytics, so data intended to be analysed should be grouped into options prior to data entry. The database structure in LogiqAnalytics can be configured to the needs of the project using SQL view tables, e.g. mapped into a relevant data model structure. This table structure also allows different parties to see different domains of data.

Additionally, it supports proper metadata documentation by an integrated data dictionary. The deidentified data are available in real-time for descriptive analytics in generated reports with Microsoft Power BI, Excel, SPSS, and other applications. The data can also be used for advanced analytics and AI-powered model development.
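The conversion of identifying dates into derived numbers, and the exclusion of free text and unconsented records, can be illustrated as follows. This is a sketch under stated assumptions: the record layout, the `consent` flag, and the function names are hypothetical and do not describe the actual sync to LogiqAnalytics.

```python
from datetime import date


def age_in_years(birth: date, on: date) -> int:
    """Completed years between the date of birth and a reference date."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


def to_analytics_row(record: dict):
    """Map a source record to a deidentified row, or None if it must be excluded."""
    if not record.get("consent", False):
        return None  # unconsented data are not synced
    return {
        "subject_id": record["subject_id"],
        "sex": record["sex"],
        # Dates are replaced by derived numerical values.
        "age_at_diagnosis": age_in_years(record["date_of_birth"], record["diagnosis_date"]),
        "disease_duration_days": (record["last_visit"] - record["diagnosis_date"]).days,
        # Free text ("notes") and the raw dates are deliberately left out.
    }


record = {
    "subject_id": "S-001",
    "sex": "female",
    "date_of_birth": date(1960, 6, 15),
    "diagnosis_date": date(2020, 6, 14),
    "last_visit": date(2020, 7, 14),
    "notes": "free text that could identify the patient",
    "consent": True,
}
print(to_analytics_row(record))
# {'subject_id': 'S-001', 'sex': 'female', 'age_at_diagnosis': 59, 'disease_duration_days': 30}
```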

## Results

Although biomedical research has some generic principles, it also has an almost infinite number of putative solutions. Implementation in different projects is preceded by customisation and regulation of user access.

### Customisation process

DMSs for personalised medicine should be flexible to allow the rapid creation of complex databases for collaboration between many scientific groups. Database design should be plastic and robust for progressive insights and historical consistency. Upon saving data, templates are used to build entities that warrant data integrity independent of later database changes. The databases are configured using data templates for *cases*, *consultations*, *test requests*, *test results*, and *test conclusions*. These are fully designable using standard building blocks in LogiqCare and LogiqScience.

Users lead the customisation process by defining their templates in an Excel file, supported by a medical data scientist of the LogiqSuite team (Figure 3). The medical data scientists provide crucial input for data stewardship before data collection to facilitate real-time data analytics. After verification by a medical data scientist, the Excel template is converted to a Logiq template on the pilot environment. The user interface of templates can be translated to any language, e.g. for international studies. The templates function as data-entry screens in the pilot environment. Next, template validation is done by an appointed user. After customer approval, the template is made available on the production environment. Changes can be implemented rapidly, e.g. within a day if needed, while maintaining the careful multistep process.

![Figure 3.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/02/11/2024.02.09.24302600/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2024/02/11/2024.02.09.24302600/F3)

Figure 3. Template customisation process.
In brief, (a) a medical data scientist (MDS) provides a template to a designated (b) medical researcher (MRes), who designs the desired data flows with MDS advice. When finalised, (c) the MDS runs the Python script to upload the template to the pilot environment, where (d) the medical researcher validates the template, either (e) for improvements or (f) to instruct the MDS to (g) transfer the template to the production environment.
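The multistep process of Figure 3 can be read as a small state machine in which a template only reaches production through verification, pilot, and approval. The state and action names below are hypothetical labels for the steps in the caption, not identifiers used by LogiqSuite.

```python
# Allowed transitions: (current state, action) -> next state.
TRANSITIONS = {
    ("draft", "verify"): "verified",            # check by the medical data scientist
    ("verified", "upload_to_pilot"): "pilot",   # script uploads the template to pilot
    ("pilot", "request_changes"): "draft",      # validation found improvements
    ("pilot", "approve"): "approved",           # validation by the appointed user
    ("approved", "release"): "production",      # transfer to production
}


def advance(state: str, action: str) -> str:
    """Return the next state, or raise if the step would skip the process."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"action {action!r} is not allowed in state {state!r}") from None


state = "draft"
for action in ["verify", "upload_to_pilot", "approve", "release"]:
    state = advance(state, action)
print(state)  # production
```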

In the LogiqAnalytics database data could be regrouped if desired for certain analytics, but in general the structure of the LogiqCare and LogiqScience template is used. Data can be enriched with a user-defined data dictionary, to give meaning to values for analysis, and to facilitate FAIR data exchange.

### Regulation of user access

Access control is supervised using a documented four-eyes principle, i.e. the person granting access is not the same person as the one executing it. Project leaders assign the persons who decide on role-based (RBAC) and attribute-based access control (ABAC), with reading and writing rights granted for a maximum of one year per request. The requests are logged automatically for traceability and executed manually by ORTEC to avoid abuse.
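The two rules in this paragraph, separation of approver and executor and a maximum validity of one year, are simple to express as checks on an access grant. The sketch below is illustrative only; the `AccessGrant` record and its fields are assumptions, and in LogiqSuite the execution step is performed manually by ORTEC.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_GRANT_DAYS = 365  # rights are granted for at most one year per request


@dataclass(frozen=True)
class AccessGrant:
    user: str
    role: str
    approved_by: str   # person granting access (e.g. the project lead)
    executed_by: str   # person executing the request
    valid_from: date
    valid_until: date


def create_grant(user, role, approved_by, executed_by, valid_from, days):
    """Create a grant only if it satisfies the four-eyes and one-year rules."""
    if approved_by == executed_by:
        raise ValueError("four-eyes principle: approver and executor must differ")
    if not 0 < days <= MAX_GRANT_DAYS:
        raise ValueError("access is granted for at most one year per request")
    return AccessGrant(user, role, approved_by, executed_by,
                       valid_from, valid_from + timedelta(days=days))


def is_active(grant: AccessGrant, on: date) -> bool:
    return grant.valid_from <= on <= grant.valid_until


grant = create_grant("researcher1", "User", "project_lead", "ortec_admin",
                     date(2024, 1, 1), 180)
print(is_active(grant, date(2024, 3, 1)))  # True
print(is_active(grant, date(2024, 9, 1)))  # False
```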

LogiqSuite provides its own Azure AD as Identity Provider (IdP), to which members of another organisation can be invited using federated identity. This implies that company policies for data access apply. Authenticating to the LogiqSuite Azure AD requires an additional MFA, which might be merged into the external Azure AD’s authentication policy. Single sign-on is supported. For individual users, LogiqSuite can provide its own Azure AD as IdP, equipped with MFA that can be enforced.

User access is granted by the project lead and executed by ORTEC to obey the four-eyes principle. Only specific ORTEC users have roles with exclusive rights, like Admin, to manage accounts, departments, and groups, and Configurator, to edit studies and templates for data, e-mails, and reports. Roles for customer users are constrained to User and Viewer, limited by attributes for accounts, departments, and Care and/or Science. To avoid unintended interactions between functions, distinct RBAC and ABAC are separated for each task. After login, LogiqSuite users with multiple functions can switch between their active functions with distinct RBAC and ABAC, to avoid unintended control due to combining functions, e.g. an editing right in function A should not change a viewing right in function B.

### Evaluation of precision medicine use cases

LogiqSuite has three different privacy levels, i.e. directly identifiable, pseudonymous, and (fully) anonymous, for care, research (science), and analytics, respectively. This allows users to select their desired balance between protection of the patient and data safety.

In seven different biomedical projects we could identify five different use cases for precision medicine: (I) reorganisation and integration of data analytics for descriptive statistics in oncology, (II) real-time monitoring of a multicentre clinical study for pulmonary disease, (III) integration of clinical patient data with GMP data in oncology, (IV) AI development of clinical decision support (CDS) models in oncology, and (V) integration of real-world data with CDS in cardiovascular risk management and prehospital triage.

#### I. Integration of databases for analyses

Different databases contained parts of the clinical follow-up data on minimal residual disease (MRD) from Acute Myeloid Leukaemia (AML) (48). In this research project, data was available in different databases from haemato-oncology of Adults in the Netherlands (HOVON) (49) and local databases built by the Amsterdam UMC, which had data on patient-survival follow-up and follow-up leukaemia-aberrant immune phenotypes (LAIPs), respectively (Figure 4). A common data model was crafted combining the structure of these databases in a subject-case structure with consults for direct analysis, and test cascades with test-requests for samples, test-results for direct analyses, and test-conclusions for LAIP-specific data.

![Figure 4.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/02/11/2024.02.09.24302600/F4.medium.gif)

[Figure 4.](http://medrxiv.org/content/early/2024/02/11/2024.02.09.24302600/F4)

Figure 4. Extract-Transform-Load integration for AML.
In brief, (a) data were extracted from different databases and coupled with unique identifiers, (b) data transformation consisted of mapping to LogiqSuite’s data structure and values, which were translated and curated before loading into LogiqSuite, (c) the transformations are written and stored as F# code to allow tracking and reuse of the procedure, and (d) data are loaded in LogiqSuite and organised in Patient or Subject – Case – Consults & Test cascade structures.

A research MedDMS was built in LogiqScience, and the data were synced to LogiqAnalytics. The data consist of subjects, the genetic characterisation of their AML, the characterisation, classification, and quantification of LAIPs, and therapy and survival data. Data were loaded using ETL from a CSV file obtained by a query of the HOVON database and from multiple local data management systems at Amsterdam UMC and GIMEMA in Rome.

#### II. Real-time monitoring of a multilocation clinical study

Clinical studies in precision medicine are often performed at multiple locations in different languages, while they ideally should be monitored, orchestrated, and managed at a central location in real time. The results are synced in real time for data analytics. LogiqScience orchestrates the clinical study as a central electronic case report form (eCRF). Different attributes were used for ABAC to limit data access on a need-to-know basis, e.g. by only providing access to subjects of a user’s centre. Two additional institutes contributed data on lifestyle intervention and air pollution exposure, and had different roles limiting which subset of the Subject’s data could be accessed. Additionally, Test cascades allowed cooperation between departments while properly controlling data access on a need-to-know basis.
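A need-to-know check of this kind combines a role with attributes such as the user's centre and the data domains the user may see. The sketch below is a simplified illustration: the attribute names (`centres`, `domains`), the example users, and the data layout are assumptions, while the role names User and Viewer follow the section on regulation of user access.

```python
def visible_data(user: dict, subject: dict) -> dict:
    """Return the part of a subject's data that this user is allowed to see."""
    # Role check (RBAC): only known customer roles get any access.
    if user["role"] not in ("User", "Viewer"):
        return {}
    # Attribute check 1 (ABAC): restrict to subjects of the user's own centre(s).
    # None means that no centre restriction applies to this user.
    if user["centres"] is not None and subject["centre"] not in user["centres"]:
        return {}
    # Attribute check 2 (ABAC): restrict to the data domains the user needs.
    return {domain: values for domain, values in subject["data"].items()
            if domain in user["domains"]}


def can_edit(user: dict) -> bool:
    """In this sketch only the User role may write; Viewer is read-only."""
    return user["role"] == "User"


subject = {
    "subject_id": "S-001",
    "centre": "Centre A",
    "data": {
        "clinical": {"fev1_l": 3.1},
        "lifestyle": {"steps_per_day": 6400},
        "air_pollution": {"pm25_ug_m3": 11.2},
    },
}
centre_a_researcher = {"role": "User", "centres": {"Centre A"},
                       "domains": {"clinical", "lifestyle", "air_pollution"}}
centre_b_researcher = {"role": "User", "centres": {"Centre B"},
                       "domains": {"clinical", "lifestyle", "air_pollution"}}
exposure_institute = {"role": "Viewer", "centres": None,
                      "domains": {"air_pollution"}}

print(sorted(visible_data(centre_a_researcher, subject)))  # ['air_pollution', 'clinical', 'lifestyle']
print(visible_data(centre_b_researcher, subject))          # {}
print(visible_data(exposure_institute, subject))           # {'air_pollution': {'pm25_ug_m3': 11.2}}
```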

The Long Covid use case (50) belongs to the precision medicine for lung diseases (P4O2) research project, a multi-site clinical precision medicine study for lung diseases that includes a Long Covid study. Data were entered by five different inclusion centres directly into LogiqScience as an eCRF. The eCRF templates used by the researchers were designed in English, but the questionnaire templates for the patients were available in Dutch. The multilingual properties of LogiqCare and LogiqScience offer these possibilities, but LogiqAnalytics is synced in English only. Choices, dates, and numerical values were synced to LogiqAnalytics, allowing real-time monitoring of the progress in subject inclusion and data completeness by Power BI dashboards (Figure 5). Data on air pollution, collected with an ultrasonic personal aerosol sampler, were transferred into the database by ETL. Researchers could monitor the data availability dashboard and provide real-time feedback to users, such that data could be completed in time. As a result of the real-time sync to LogiqAnalytics, the analytics database was ready for analysis as soon as the data of the last subject were entered into the eCRF.

![Figure 5.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/02/11/2024.02.09.24302600/F5.medium.gif)

[Figure 5.](http://medrxiv.org/content/early/2024/02/11/2024.02.09.24302600/F5)

Figure 5. Early study dashboard P4O2 Covid showing trial progress state in April 2022.
In brief, data are synced in analytics, showing real-time study overview. Arrows point to missing data, which are indicated in grey. This information was used for real-time identification of missing data, and early corrective actions.
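The kind of completeness overview shown in the dashboard can be derived from the synced records with a few aggregations. The project used Power BI for this; the Python sketch below only illustrates the calculation, with invented field names and example records.

```python
REQUIRED_FIELDS = ["fev1_l", "bmi"]  # hypothetical required parameters


def is_missing(value) -> bool:
    return value is None or value == ""


def completeness_per_centre(records, required):
    """Fraction of required values that are filled in, per inclusion centre."""
    totals = {}
    for record in records:
        stats = totals.setdefault(record["centre"], {"subjects": 0, "filled": 0})
        stats["subjects"] += 1
        stats["filled"] += sum(1 for name in required if not is_missing(record.get(name)))
    return {centre: stats["filled"] / (stats["subjects"] * len(required))
            for centre, stats in totals.items()}


def missing_per_subject(records, required):
    """Fields still to be completed, per subject, for feedback to the centres."""
    result = {}
    for record in records:
        missing = [name for name in required if is_missing(record.get(name))]
        if missing:
            result[record["subject_id"]] = missing
    return result


records = [
    {"subject_id": "S-001", "centre": "A", "fev1_l": 3.1, "bmi": 24.0},
    {"subject_id": "S-002", "centre": "A", "fev1_l": None, "bmi": 27.5},
    {"subject_id": "S-003", "centre": "B", "fev1_l": 2.8, "bmi": ""},
]
print(completeness_per_centre(records, REQUIRED_FIELDS))  # {'A': 0.75, 'B': 0.5}
print(missing_per_subject(records, REQUIRED_FIELDS))      # {'S-002': ['fev1_l'], 'S-003': ['bmi']}
```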

#### III. Integration of care and precision medicine production

The AIDPATH project aims to set up AI-powered decentralised production of advanced (CAR-T cell) therapies in hospitals all over Europe. Production of precision medicine or advanced therapeutic medicinal products, like the production of autologous CAR-T cells, requires integration of clinical care and good manufacturing practice (GMP) production data for advanced therapies. Using LogiqSuite, it is possible to integrate a patient’s clinical care data in LogiqCare with pseudonymised CAR-T cell production data in LogiqScience (Figure 6).

![Figure 6.](http://medrxiv.org/https://www.medrxiv.org/content/medrxiv/early/2024/02/11/2024.02.09.24302600/F6.medium.gif)

[Figure 6.](http://medrxiv.org/content/early/2024/02/11/2024.02.09.24302600/F6)

Figure 6. Integration of care and GMP processes in AIDPATH.
In brief, UMC 1 has (a) care and (b) CAR-T cell productions of (c) autologous cells, coupling the products and (d) their quality control to the (e) patient treatment. Care will be (f) recorded in LogiqCare and (g) ATMP / GMP CAR-T cell production and (h) quality control in LogiqScience, which are (i) interconnected. If a clinical-decision tool is developed this can be (j) used to determine a desired CAR-T cell product from personal clinical data, and (k) provide this information to adapt the production process. Data within a case (l) contains consults and test cascades (Figure 4), (m) data from other institutes (UMC 2) is added to LogiqAnalytics and all data are gathered for (n) continual learning to improve the clinical-decision support tool.

The AIDPATH project consortium aims to have AI-guided production of autologous chimeric antigen receptor T-cells (CAR-T cells) distributed over various UMCs (51). In the AIDPATH project, two flows are combined: a clinical flow with patient data and a GMP production flow with subject data, allowing direct patient identification where needed and preserving patient privacy where possible. Specific pretreatment patient data are crucial for CAR-T cell production and are thus integrated in shared cases between patients and subjects. Some details are only needed for care or manufacturing purposes, so case data might be shared while underlying consultations and test cascades remain unshared. This allows seamless collaboration between care and manufacturing, and ensures that the pseudonymised production data are unambiguously coupled to the patient without sacrificing patient privacy (52).

The decentralised AIDPATH production system uses laboratory devices to automatically process the CAR-T cells. The devices are digitally connected to the COPE (Control Optimise Plan Execute) software. COPE acts as a manufacturing execution system and connects to the devices using their respective software interfaces. This enables detailed live data logging, process control, and online modification of production parameters. The COPE software also acts as a user interface for the automation environment, showing the pseudonymised patient data from the LogiqSuite platform. Using the LogiqSuite platform, data from all decentralised production sites will be registered in a standardised manner. The resulting LogiqAnalytics database will uniformly combine data from all sites and be crucial for continuous learning and model development.

#### IV. AI development from a database

In two oncological projects, AML-MRD and AIDPATH, data from LogiqAnalytics are used by machine learning algorithms to develop predictive models for survival analysis with competing risks, which can be used as CDS tools (53). The data were imported from other software solutions (e.g., Access, SPSS, and Excel databases) using ETL procedures and integrated in LogiqScience. The data from AML-MRD have been described above (e.g. Figure 4). The data for the AIDPATH use case 5 were derived from the multicentre clinical trials ARI-0001 (54,55) and ARI-0002h (56).

In the AML-MRD case, the competing risks were graft versus host disease (GvHD), toxicity (chemotherapy-related risk), and tumour relapse. In the CAR-T cell case, the competing risks were cytokine-release syndrome, immune suppression, and tumour relapse.
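To make the notion of competing risks concrete, the sketch below computes the nonparametric cumulative incidence of one event type while other event types compete with it. This is the standard cumulative incidence estimator, not the regression model of reference (53) and not the modelling pipeline used in these projects. The function name `cumulative_incidence`, the event coding (0 = censored, 1 = relapse, 2 = any competing event), and the toy data are assumptions for illustration.

```python
from collections import Counter


def cumulative_incidence(times, events, cause):
    """Cumulative incidence of `cause` in the presence of competing events.

    times  : follow-up time per subject
    events : 0 for censored, a positive integer for the observed event type
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    event_free = 1.0   # probability of no event of any type before time t
    incidence = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        counts = Counter()
        j = i
        while j < len(data) and data[j][0] == t:   # group ties at time t
            counts[data[j][1]] += 1
            j += 1
        n_events = sum(c for e, c in counts.items() if e != 0)
        if n_events > 0:
            # Chance of reaching t event-free, then having `cause` at t.
            incidence += event_free * counts[cause] / n_at_risk
            event_free *= 1.0 - n_events / n_at_risk
            curve.append((t, incidence))
        n_at_risk -= j - i   # events and censored subjects leave the risk set
        i = j
    return curve


# Toy data, invented for the example (time in months).
times = [1, 2, 3, 4, 5, 6]
events = [1, 0, 2, 1, 0, 1]
relapse_curve = cumulative_incidence(times, events, cause=1)
competing_curve = cumulative_incidence(times, events, cause=2)
```

Treating competing events as ordinary censoring would overestimate the relapse probability. In this estimator the incidences of all event types add up to at most one, which is why the event-free probability is shared between causes.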

Data from LogiqScience were synced to LogiqAnalytics. Available and putatively relevant biomedical parameters were selected based on their potential mechanistic relevance and their timely availability for prognosis in practice. Machine learning was employed directly on data in LogiqAnalytics (57). All these activities can be performed directly on the real-time synced data in LogiqAnalytics, allowing the CDS prediction model to be updated without having to rebuild the machine learning flow. We are currently working on the parameter selection and data analytics in these projects to develop a prediction model.

#### V. Real-world data integration with clinical-decision support tools

ORTEC has developed two CDS tools, U-Prevent and the Stroke Triage App, and connected them to LogiqScience to collect data directly after CDS use. Since LogiqAnalytics is fed in real time from LogiqCare and LogiqScience, this allows continuous learning of the CDS prediction model. After expert evaluation for MDR compliance, this results in continual learning to improve the AI CDS tools.

U-Prevent is an MDD-certified clinical decision support solution with multiple prediction models for cardiovascular risk management. Practical use is greatly enhanced by using data directly from electronic health records (58). U-Prevent allows users to enter missing data or to use imputation (59). In the recent PROSPERA project (unpublished results), we have coupled LogiqSuite to U-Prevent (see Figure 2), allowing U-Prevent to be prefilled with data from the primary care electronic health records. After data revision and updates by the physician, the U-Prevent CDS tool can be used in patient care to support shared decision making and the allocation of appropriate care. At the end of the U-Prevent session, the updated data and CDS outcomes are stored in LogiqScience. In this way, LogiqSuite facilitates efficient use of a clinical decision support tool and automatic registration of follow-up data, in this case pseudonymously owing to local interpretations of the GDPR. Moreover, the analytics environment yields a real-time dashboard showing included patients, genders, risk profiles, prevention strategies, and more, which can easily be selected per primary care centre. These key performance indicators (KPIs) facilitate benchmarking of various practitioners.

For prehospital triage of patients with suspected acute stroke, we developed the Stroke Triage App, an MDR class I CDS tool (unpublished results), based on the risk predictions and time tables of the PRESTO-2 implementation study (60,61) and on disease history and routing information. In brief, the Stroke Triage App supports routing decisions for emergency care by advising on the adequate hospital level for stroke care in relation to the patient’s prehospital triage assessment. Pseudonymous patient identifiers, data from prehospital triage, GPS location, routing information, and advised outcome are stored in LogiqScience. During the study, the data can be enriched with clinical follow-up data. This will allow continual evaluation of the Stroke Triage App and facilitate continuous improvement of the underlying CDS prediction model for prognosis used in routing decisions.

## Discussion

Precision medicine stratifies diagnosis-treatment combinations between similar diseases, using large and complex datasets. This implies the need for large, good-quality databases of complex medical data that are GDPR compliant, collect real-world data, integrate care and research, allow FAIR data exchange with multiple parties, communicate with clinical-decision support tools, are easy to adapt to progressive insight, facilitate real-time analysis, and provide ready-to-use data for AI methods to develop new prediction models for prognostic, diagnostic and therapeutic purposes. In different use cases we have described a MedDMS that fulfils these basic criteria. The different approach of the integrated solution does not close the discussion but opens important subjects for it.

### Data protection

MedDMSs for precision medicine go beyond data sharing in the cloud and demand a higher level of traceable procedures. This includes documentation of implementation, data stewardship for the traceability of design modifications, data validation steps, and privacy protection by a fine-grained authorisation network. Automatically recorded and verified process documentation becomes more important as research complexity grows in number of parameters and inclusion centres. The plan for data management is crucial, so a MedDMS cannot be seen independently from its procedures to maintain confidentiality of data and consistency of the database.

LogiqSuite allows integration of medical care and clinical research with real-time available descriptive statistics and machine learning in one solution, with different and appropriate privacy levels for each use case. LogiqSuite distinguishes itself from other MedDMSs by its interactions with CDS prediction models for precision medicine, as shown in the two double use cases. LogiqSuite prepares real-world data in real time for prediction model development. Data from LogiqSuite are directly available in CDS tools, facilitating their easy use. This is linked to the feedback of the (modified) input and results into LogiqSuite. Syncing these data in real time to LogiqAnalytics facilitates the cycle of continual learning for precision medicine.

Sharing data from different study sites requires applying standard and interoperable solutions for implementing and managing medical registries at each site (62). LogiqSuite allows this with the same data templates, while ABAC restricts data access to those authorised for the relevant attributes, e.g. institutes. Input validation is an important asset of data stewardship (40). The medical data scientists guide the use of these and other features in the MedDMS implementation.
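How attribute-based access control can restrict records by institute may be sketched as follows. This is a minimal illustration under assumed names, not the LogiqSuite authorisation model: the `User` and `Record` classes, the `care_provider` role, and the two sensitivity levels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    institutes: frozenset   # institutes the user is authorised for
    roles: frozenset


@dataclass(frozen=True)
class Record:
    record_id: str
    institute: str
    sensitivity: str        # "pseudonymous" or "identifiable"


def can_read(user: User, record: Record) -> bool:
    """Grant access only when every attribute rule is satisfied."""
    if record.institute not in user.institutes:
        return False
    if record.sensitivity == "identifiable" and "care_provider" not in user.roles:
        return False
    return True


def visible_records(user: User, records: list) -> list:
    """Filter a shared data set down to what this user may see."""
    return [r for r in records if can_read(user, r)]


# Toy data, invented for the example.
records = [
    Record("r1", "UMC 1", "identifiable"),
    Record("r2", "UMC 1", "pseudonymous"),
    Record("r3", "UMC 2", "pseudonymous"),
]
researcher = User("researcher", frozenset({"UMC 1"}), frozenset({"researcher"}))
physician = User("physician", frozenset({"UMC 1"}), frozenset({"care_provider"}))

researcher_ids = [r.record_id for r in visible_records(researcher, records)]
physician_ids = [r.record_id for r in visible_records(physician, records)]
```

All sites can use the same record template in such a design, because access is decided by the attributes attached to users and records rather than by separate databases per site.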

### Comparing MedDMSs

LogiqSuite is not the first cloud MedDMS. REDCap has existed since 2009 (63) and is used for multilocation eCRFs (64). Castor EDC (65) and FIMED (66) also support storage of medical research data. Castor EDC has a signable integrated informed consent for research purposes, which should technically be a trusted third party (TTP), i.e. with separate access authentication. LogiqSuite distinguishes itself from other MedDMSs by using four-eyes principles to guard data structure, quality, and access. Since this process keeps the medical data scientists in the loop, they can also give advice on dynamic templates and live analytics, which are new for many medical researchers. LogiqSuite has also added template translations, sending of questionnaires to patients or subjects, connectivity through an open API, and support for data transfer from other sources by ETL. LogiqSuite users appreciate the real-time study monitoring, which is suitable for monitoring study completeness and taking the needed corrective actions.
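A four-eyes principle for template changes can be sketched in a few lines of Python. The `TemplateChange` class and the user names are hypothetical and only illustrate the rule that a second person must approve a change before it takes effect. How LogiqSuite enforces this internally is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateChange:
    """Hypothetical change request for a data template."""
    description: str
    author: str
    approvals: set = field(default_factory=set)

    def approve(self, reviewer: str) -> None:
        """Register an approval; authors cannot approve their own change."""
        if reviewer == self.author:
            raise PermissionError("author cannot approve their own change")
        self.approvals.add(reviewer)

    @property
    def can_be_applied(self) -> bool:
        """A change needs at least one approval from a second person."""
        return len(self.approvals) >= 1


# Toy example with invented user names.
change = TemplateChange(description="Add smoking status field",
                        author="data_manager_1")
applied_before = change.can_be_applied    # False: nobody has reviewed yet
change.approve("data_scientist_2")
applied_after = change.can_be_applied     # True: a second person approved
```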

### Integrated versus federated MedDMS

Not all parties involved in precision medicine choose to integrate MedDMSs in a single solution. Some propose blockchain with communication tools between physically separate MedDMSs (67). Separate MedDMSs offer some easy advantages for privacy protection, but ABAC and RBAC strategies on a need-to-know basis could also achieve this. Federated learning methods in healthcare are being developed to allow machine learning on data that remain in separate systems. A major challenge in machine learning is poor data quality, which yields poor CDS models. While federated learning is being pioneered in various projects, methods for data stewardship and curation in federated learning still need to be explored.

Separate MedDMSs will have different interpretations of parameters and different data quality. Poor-quality data can be curated, which is a critical step in machine learning (68). Input validation and data curation techniques might be suboptimal for federated approaches (69,70). Options are to perform curation either centrally (71) or by general rules (e.g. x times the standard deviation) (72). In our experience, these methods have limited efficacy for curating complex, large, divergent medical data with missing properties. Other methods for data curation are needed for efficient use of federated learning techniques. In particular, systematic shifts due to confounders between data subgroups should be detected and understood. Ideally, data stewardship assesses a complete anonymised data set, since this facilitates more appropriate application of the complex rules for data stewardship.
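The limitation of a general rule such as "x times the standard deviation" can be shown with a small Python sketch. The data are invented: two sites report haemoglobin, one in g/dL and one in mmol/L, which is an assumed example of a different interpretation of the same parameter. The function names `flag_outliers` and `standardised_gap` are hypothetical.

```python
from statistics import mean, stdev, variance


def flag_outliers(values, k=3.0):
    """Flag values further than k standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [False] * len(values)
    return [abs(v - m) > k * s for v in values]


def standardised_gap(group_a, group_b):
    """Difference between group means in units of the within-group SD."""
    pooled_sd = ((variance(group_a) + variance(group_b)) / 2) ** 0.5
    return abs(mean(group_a) - mean(group_b)) / pooled_sd


# Toy data: the same kind of measurement, recorded in different units.
site_a = [13.1, 14.0, 12.8, 13.5, 14.2]   # g/dL
site_b = [8.1, 8.7, 7.9, 8.4, 8.8]        # mmol/L

flags_site_a = flag_outliers(site_a)
flags_site_b = flag_outliers(site_b)
flags_pooled = flag_outliers(site_a + site_b)
gap = standardised_gap(site_a, site_b)
```

In this toy case the rule flags nothing, neither per site nor on the pooled data, although the two sites are clearly not comparable. A comparison between subgroups exposes the shift, but it requires seeing the data of both sites together, which is the point made above about assessing a complete data set.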

### Beyond the MedDMS

Even if many medical research groups collaborate in a single database, there will always be medical data beyond the current collaborations that also need to be integrated for scientific research. MedDMSs should therefore be prepared for FAIR data exchange. Medical data are complex by nature: they consist of various domains (e.g. anamnesis, follow-up, disease classifications, lab tests, medication) and include various types of data (e.g. dates, values, repeated measurements), which use various methods for data classification. After early and continued initiatives for the Observational Medical Outcomes Partnership (OMOP) common data model (73,74), scientists expanded this to collaborate in different settings for FAIR data exchange (75). Common data models are important for various purposes and crucial for AI (76). Systems are being developed to add FAIR standards de novo to MedDMSs (77). Multiple initiatives exist to draw up the data details of FAIR sharing (78), many in national or regional settings. Combining the complexity of medical data with the regional approach, we foresee that there will be multiple relevant common data models and corresponding data dictionaries. The LogiqAnalytics data can be provided in all relevant predefined common data model structures, using SQL views. In LogiqSuite, multiple data dictionaries can be integrated for FAIR data exchange.
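As an illustration of exposing the same data in a common data model through an SQL view, the sketch below uses an in-memory SQLite database from Python. The source table `lab_results`, the `data_dictionary` table, and the concept identifier 900001 are hypothetical. The view borrows a few column names from the OMOP measurement table, but it is not a complete or validated OMOP mapping, and the actual LogiqAnalytics schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical local analytics table.
    CREATE TABLE lab_results (
        subject_id  TEXT,
        parameter   TEXT,
        measured_on TEXT,
        value       REAL
    );

    -- Hypothetical data dictionary: local parameter name to concept id.
    CREATE TABLE data_dictionary (
        parameter  TEXT PRIMARY KEY,
        concept_id INTEGER
    );

    -- The view presents local data in common-data-model column names.
    CREATE VIEW measurement_cdm AS
    SELECT r.subject_id  AS person_id,
           d.concept_id  AS measurement_concept_id,
           r.measured_on AS measurement_date,
           r.value       AS value_as_number
    FROM lab_results AS r
    JOIN data_dictionary AS d ON d.parameter = r.parameter;
""")

# Toy data, invented for the example.
conn.executemany(
    "INSERT INTO lab_results VALUES (?, ?, ?, ?)",
    [
        ("S-001", "haemoglobin", "2023-01-10", 8.4),
        ("S-001", "local_score", "2023-01-10", 3.0),   # no dictionary entry
        ("S-002", "haemoglobin", "2023-02-01", 7.9),
    ],
)
conn.execute("INSERT INTO data_dictionary VALUES (?, ?)", ("haemoglobin", 900001))

rows = conn.execute(
    "SELECT person_id, measurement_concept_id, measurement_date, value_as_number "
    "FROM measurement_cdm ORDER BY person_id, measurement_date"
).fetchall()
```

Because a view stores no data of its own, several such views, one per common data model or data dictionary, can coexist on the same synced tables. Parameters without a dictionary entry are simply absent from the view in this sketch.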

## Conclusions

LogiqSuite is a MedDMS that integrates directly identifiable care data, pseudonymous science data, and minimally identifiable to anonymous analytic data. LogiqSuite was evaluated in five different use cases for precision medicine, including data research, multilocation study monitoring, integration of research and production data with care data, real-time use of data for prediction model development, and input and registration of data from CDS tools. The value of this MedDMS was shown in different biomedical fields, including oncology, cardiovascular risk management, pulmonology, and prehospital triage.

LogiqSuite is unique in supporting real-time data analysis in care settings, which is also a valued feature for scientific researchers. Moreover, LogiqSuite supports collaborative clinical care and research/laboratory workflows. The data can originate from manual entry in clinical care, eCRFs of research studies, clinical-decision support tools, import via ETL from third-party sources, and a FAIR-compliant open API. Available data can be monitored in real time, providing tools for data monitoring and a continuous feedback loop for data analytics and prediction model development. In clinical practice this facilitates continual learning with real-world data, since MDR legislation requires a clinical evaluation of the validity of updated prediction models in every cycle.

Appropriate tools are crucial for KPI monitoring and progress in care and science. The LogiqSuite MedDMS application supports data management and science for advancement to precision medicine by data collection and collaboration, facilitating analytics and machine learning, and implementation of CDS tools.

## Data Availability

N.A.

## Conflicts of interest

JJLJ, IB, IV, LBR, & SD are employees of ORTEC B.V. which has a commercial interest in LogiqSuite.

## Author contributions

Authors contributed to *medical data science* (JJLJ, IB, IV, LBR), *LogiqSuite design* (JJLJ, IB, IV, LBR, TS) and the use cases *AML-MRD* (JC, LLN, JT, PG, JJLJ, IB, IV, LBR), *P4O2* (AHMZ, LDB, IB, IV, JJLJ, LBR), *AIDPATH database* (JJLJ, IV, IB, SH, FE, CS, SNV, MJ, MH), *AIDPATH AI* (JJLJ, IV, AV, IB, SNV, MJ, CS, MH), *AML-MRD AI* (AV, JJLJ, IB, IV, JC, LLN, JT), *Stroke Triage App* (IB, JJLJ), *Prospera* (VAMCB, HJAO, RCV, JJLJ, IB, IV), and *Idea & initial draft writing* (JJLJ). All authors contributed to the paper and agreed with its contents.

## Acknowledgements

The authors would like to thank Bob Roozenbeek and Ruben van Wijdeven (Erasmus MC, Rotterdam, The Netherlands) for their invaluable contribution to the Stroke Triage App. The authors would like to thank Sandy Pratama, Thom Steenhuis, Marco Koeleman, Christian Hutter, Vlad Constantin Lipan and Alina Bratosin for their excellent technical assistance on the development of LogiqSuite.

## Abbreviation list

ABAC
: Attribute-Based Access Control
AI
: Artificial Intelligence
AML
: Acute Myeloid Leukaemia
CAR-T
: Chimeric Antigen Receptor T-Cells
CDS
: Clinical Decision Support
DMS
: Data Management System
eCRF
: Electronic Case Report Form
ETL
: Extract Transform Load
FAIR
: Findable, Accessible, Interoperable, and Reusable
GDPR
: General Data Protection Regulation
GMP
: Good Manufacturing Practice
IdP
: Identity Provider
LAIP
: Leukaemia-Aberrant Immune Phenotype
MedDMS
: Medical Data Management System
MDR
: Medical Device Regulation
MFA
: Multi-Factor Authentication
MRD
: Minimal Residual Disease
OMOP
: Observational Medical Outcomes Partnership
PII
: Personal Identifiable Information
RBAC
: Role-Based Access Control
TTP
: Trusted Third Party

* Received February 9, 2024.
* Revision received February 9, 2024.
* Accepted February 11, 2024.

This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/)

## References

1. DiMasi JA, Hansen RW, Grabowski HG. The price of innovation: new estimates of drug development costs. J Health Econ. 2003; 22: p. 151.

2. Stewart DJ, Kurzrock R. Cancer: The Road to Amiens. J Clin Oncol. 2009; 27: p. 328–33.

3. Scannell J, Blackley A, Boldon H, Warrington B. Diagnosing the decline in pharmaceutical R&D efficiency. Nat Rev Drug Discov. 2012; 11: p. 191–200.

4. Lo AW. Can Financial Economics Cure Cancer? Atl Econ J. 2021; 49: p. 3–21.

5. Den Otter W, Steerenberg PA, Van der Laan JW. Testing therapeutic potency of anticancer drugs in animal studies: a commentary. Regul Toxicol Pharmacol. 2002; 35: p. 266–272.

6. Fosse V, Oldoni E, Bietrix F, Budillon S, Daskalopoulos EP, Fratelli M, et al. Recommendations for robust and reproducible preclinical research in personalised medicine. BMC Medicine. 2023; 21: p. 14.

7. Delpierre C, Lefèvre T. Precision and personalized medicine: What their current definition says and silences about the model of health they promote. Implication for the development of personalized health. Front Sociol. 2023; 8: p. 1112159.

8. Chan IS, Ginsburg GS. Personalised medicine: progress and promise. Ann Rev Genomics Hum Genet. 2011; 12: p. 217–44.

9. Snyderman R. Personalised health care: from theory to practice. Biotechnol J. 2012; 7: p. 973.

10. Jacobs JJL, Characiejus D, Scheper RJ, Stewart RJE, Tan JFV, Tomova R, et al. The Amiens Strategy: small phase III trials for clinically relevant progress in the war against cancer. J Clin Oncol. 2009; 27: p. 3062–3.

11. Shaban-Nejad A, Michalowski M, Buckeridge DL. Health intelligence: how artificial intelligence transforms population and personalised health. NPJ Digit Med. 2018; 1: p. 53.

12. Ollier W, Muir KR, Lophatananon A, Verma A, Yuille M. Risk biomarkers enable precision in public health. Per Med. 2018; 15: p. 329–342.

13. Kamb A. What’s wrong with our cancer models? Nat Rev Drug Discov. 2005; 4: p. 161–5.

14. Characiejus D, Hodzic J, Jacobs JJL. “First do no harm” and the importance of prediction in oncology. EPMA J. 2010; 1: p. 369–375.

15. Ginsburg GS, Philips JA. Precision Medicine: From Science To Value. Health Affairs. 2018; 37: p. 694–701.

16. Kalra D. The importance of real-world data to precision medicine. Pers Med. 2019; 16: p. 79–82.

17. Mc Cord KA. The personalized medicine challenge: shifting to population health through real-world data. Int J Public Health. 2019; 64: p. 1255–1256.

18. Christopoulos P, Schlenk R, Kazdal D, Blasi M, Lennerz J, Shah R, et al. Real-world data for precision cancer medicine-A European perspective. Genes Chromosomes Cancer. 2023; 62: p. 557–563.

19. Arondekar B, Duh MS, Bhak RH, DerSarkissian M, Huynh L, Wang K, et al. Real-World Evidence in Support of Oncology Product Registration: A Systematic Review of New Drug Application and Biologics License Application Approvals from 2015-2020. Clin Cancer Res. 2022; 28: p. 27–35.

20. Douglas MP, Kumar A. Analyzing Precision Medicine Utilization with Real-World Data: A Scoping Review. J Pers Med. 2022; 12: p. 557.

21. Fountzilas E, Tsimberidou AM, Vo HH, Kurzrock R. Clinical trial design in the era of precision medicine. Genome Med. 2022; 14: p. 101.

22. Sheldon J, Ou W. The real informatics challenges of personalised medicine: not just about the number of central processing units. Per Med. 2013; 10: p. 639–45.

23. Müller H, Dagher G, Loibner M, Stumptner C, Kungl P, Zatloukal K. Biobanks for life sciences and personalised medicine: importance of standardization, biosafety, biosecurity, and data management. Curr Opin Biotechnol. 2020; 64: p. 45–51.

24. Li C. Personalised medicine - the promised land: are we there yet? Clin Genet. 2011; 79: p. 403–12.

25. Hallack JA, Azar DT. The AI Revolution and How to Prepare for It. Transl Vis Sci Technol. 2020; 9: p. 16.

26. Vlahou A, Hallinan D, Apweiler R, Argiles A, Beige J, Benigni A, et al. Data Sharing Under the General Data Protection Regulation: Time to Harmonize Law and Research Ethics? Hypertension. 2021; 77: p. 1029–1035.

27. Delhalle S, Bode SFN, Balling R, Ollert M, He FQ. A roadmap towards personalised immunology. NPJ Syst Biol Appl. 2018; 4: p. 9.

28. Loscalzo J. Systems Biology and Personalised Medicine A Network Approach to Human Disease. Proc Am Thorac Soc. 2011; 8: p. 196–8.

29. Samuels S, Balint B, von der Leyen H, Hupé P, de Koning L, Kamoun C, et al. Precision medicine in cancer: challenges and recommendations from an EU-funded cervical cancer biobanking study. Br J Cancer. 2016; 115: p. 1575–1583.

30. Yeatman TJ, Mule J, Dalton WS, Sullivan D. On the eve of personalised medicine in oncology. Cancer Res. 2008; 68: p. 7250–7252.

31. Agur Z. Biomathematics in the development of personalised medicine in oncology. Future Oncol. 2006; 2: p. 39–42.

32. You K, Wang P, Ho D. N-of-1 Healthcare: Challenges and Prospects for the Future of Personalised Medicine. Front Digit Health. 2022; 4: p. 830656.

33. Tracz V, Lawrence R. Towards an open science publishing platform. F1000Res. 2016; 5: p. 130.

34. Noor AM, Holmberg L, Gillett C, Grigoriadis A. Big Data: the challenge for small research groups in the era of cancer genomics. Br J Cancer. 2015; 113: p. 1405–1412.

35. Modzelewska D, Sole-Navais P, Sandstrom A, Zhang G, Muglia LJ, Flatley C, et al. Changes in data management contribute to temporal variation in gestational duration distribution in the Swedish Medical Birth Registry. PLoS One. 2020; 15: p. e0241911.

36. Limaye N. Data management Redefined. Perspect Clin Res. 2010; 1: p. 110–112.

37. Yu KH, Hart SN, Goldfeder E, Zhang QC, Parker SCJ, Snyder M. Harnessing big data for precision medicine: infrastructures and applications. Pac Symp Biocomput. 2017; 22: p. 635–639.

38. Fahr P, Buchanan J, Wordsworth S. A Review of the Challenges of Using Biomedical Big Data for Economic Evaluations of Precision Medicine. Appl Health Econ Health Policy. 2019; 17: p. 443–452.

39. Viceconti M, Hunter P, Hose R. Big data, big knowledge: big data for personalised healthcare. IEEE J Biomed Health Inform. 2015; 19: p. 1209–15.

40. Data clinic. Online.; 2021. Available from: [https://elevatehealth.eu/courses/data-stewardship/](https://elevatehealth.eu/courses/data-stewardship/).

41. Tran VA, Johnson N, Redline S, Zhang GQ. OnWARD: Ontology-driven Web-based Framework for Multi-center Clinical Studies. J Biomed Inform. 2011; 44: p. S48–S53.

42. Information security in healthcare - register NEN 7510. Online. Available from: [https://www.nen.nl/en/certificatie-en-keurmerken-nen-7510](https://www.nen.nl/en/certificatie-en-keurmerken-nen-7510).

43. ScienCrew. Online. Available from: [https://www.sciencrew.com/c/10446?title=LogiqSuite](https://www.sciencrew.com/c/10446?title=LogiqSuite).

44. Beekers I, van Dijk S, Pratama S, Jacobs JJL. ScienCrew. Online.; 2023. LogiqSuite. Architecture, Security & System requirement. Available from: [https://storage.imgzine.com/public/432/LogiqSuite+Architecture+and+Security+Documentation.pdf](https://storage.imgzine.com/public/432/LogiqSuite+Architecture+and+Security+Documentation.pdf).

45. Auth0. Online.; 2023. Available from: [https://auth0.com/](https://auth0.com/).

46. ScienCrew. Online.; 2023. Available from: [https://www.sciencrew.com/](https://www.sciencrew.com/).

47. Barker M, Chue Hong NP, Katz DS, Lamprecht AL, Martinez-Ortiz C, Psomopoulos F, et al. Introducing the FAIR Principles for research software. Scientific Data. 2022; 9: p. 622.

48. Heuser M, Freeman SD, Ossenkoppele GJ, Buccisano F, Hourigan CS, Ngai LL, et al. 2021 Update on MRD in acute myeloid leukemia: a consensus document from the European LeukemiaNet MRD Working Party. Blood. 2021; 138: p. 2753–2767.

49. Van Solinge TS, Zeijlemaker W, Ossenkoppele GJ, Cloos J, Schuurhuis GJ. The interference of genetic associations in establishing the prognostic value of the immunophenotype in acute myeloid leukemia. Cytometry B Clin Cytom. 2018; 94: p. 151–158.

50. Baalbaki N, Blankestijn JM, Abdel-Aziz MI, de Backer J, Bazdar S, Beekers I, et al. Precision Medicine for More Oxygen (P4O2)-Study Design and First Results of the Long COVID-19 Extension. J Pers Med. 2023; 13: p. 1060.

51. Hort S, Herbst L, Bäckel N, Erkens F, Niessing B, Frye M, et al. Toward Rapid, Widely Available Autologous CAR-T Cell Therapy - Artificial Intelligence and Automation Enabling the Smart Manufacturing Hospital. Front Med (Lausanne). 2022; 9: p. 913287.

52. Bäckel N, Hort S, Kis T, Nettleton D, Egan J, Jacobs JJL, et al. The potential of Artificial Intelligence in automated CAR-T cell manufacturing. Front Mol Med Sec Cell therapy. 2023; 3: p. in press.

53. Fine JP, Gray RJ. A Proportional Hazards Model for the Subdistribution of a Competing Risk. J Am Stat Assoc. 1999; 94: p. 496–509.

54. Ortíz-Maldonado V, Rives S, Castellà M, Alonso-Saladrigues A, Benítez-Ribas D, Caballero-Baños M, et al. CART19-BE-01: A Multicenter Trial of ARI-0001 Cell Therapy in Patients with CD19+ Relapsed/Refractory Malignancies. Mol Ther. 2021; 29: p. 636–644.

55. Ortiz-Maldonado V, Alonso-Saladrigues A, Español-Rego M, Martínez-Cibrián N, Faura A, Magnano L, et al. Results of ARI-0001 CART19 cell therapy in patients with relapsed/refractory CD19-positive acute lymphoblastic leukemia with isolated extramedullary disease. Am J Hematol. 2022; 97: p. 731–739.

56. Oliver-Caldés A, González-Calle V, Cabañas V, Español-Rego M, Rodríguez-Otero P, Reguera JL, et al. Fractionated initial infusion and booster dose of ARI0002h, a humanised, BCMA-directed CAR T-cell therapy, for patients with relapsed or refractory multiple myeloma (CARTBCMA-HCB-01): a single-arm, multicentre, academic pilot study. Lancet Oncol. 2023; 24: p. 913–924.

57. Verkouter I, Vegelien A, Beekers I, Navarro Velázquez S, Juan M, Sanges C, et al. Mathematical approach towards personalised prediction of the most efficient CAR-T cell product using survival analysis with competing risks. In 5th European CAR T-cell meeting, EHA-EBMT; 2023; Rotterdam, The Netherlands.

58. Groenhof TKJ, Rittersma ZH, Bots ML, Brandjes M, Jacobs JJL, Grobbee DE, et al. A computerised decision support system for cardiovascular risk management ‘live’ in the electronic health record environment: development, validation and implementation – the Utrecht Cardiovascular Cohort Initiative. Neth Heart J. 2019; 27: p. 435–442.

59. Nijman SWJ, Hoogland J, Groenhof TKJ, Brandjes M, Jacobs JJL, Bots ML, et al. Real-time imputation of missing predictor values in clinical practice. Eur Heart J Digit Health. 2020; 19: p. 154–164.

60. 60.Duvekot MHC, Venema E, Rozeman AD, Moudrous W, Vermeij FH, Biekart M, et al. Comparison of eight prehospital stroke scales to detect intracranial large-vessel occlusion in suspected stroke (PRESTO): a prospective observational study. Lancet Neurol. 2021; 20: p. 213–221.

[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S1474-4422(20)30439-7&link_type=DOI)

[PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F02%2F11%2F2024.02.09.24302600.atom)

61. Nguyen TTM, van den Wijngaard IR, Bosch J, van Belle E, van Zwet EW, Dofferhoff-Vermeulen T, et al. Comparison of Prehospital Scales for Predicting Large Anterior Vessel Occlusion in the Ambulance Setting. JAMA Neurol. 2021; 78: p. 157–164.

62. Da Silva KR, Costa R, Crevelari ES, Lacerda MS, de Moraes Albertini CM, Filho MM, et al. Glocal Clinical Registries: Pacemaker Registry Design and Implementation for Global and Local Integration – Methodology and Case Study. PLoS ONE. 2013; 8: p. e71090.

63. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap): a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009; 42: p. 377–381.

64. Obeid JS, McGraw CA, Minor BL, Conde JG, Pawluk R, Lin M, et al. Procurement of shared data instruments for Research Electronic Data Capture (REDCap). J Biomed Inform. 2013; 46: p. 259–265.

65. Ottenhoff MC, Ramos LA, Potters W, Janssen MLF, Hubers D, Hu S, et al. Predicting mortality of individual patients with COVID-19: a multicentre Dutch cohort. BMJ Open. 2021; 11: p. e047347.

66. Hurtado S, García-Nieto J, Navas-Delgado I, Aldana-Montes JF. FIMED: Flexible management of biomedical data. Comput Methods Programs Biomed. 2021; 212: p. 106496.

67. Pericàs-Gornals R, Mut-Puigserver M, Payeras-Capel MM. Highly private blockchain-based management system for digital COVID-19 certificates. Int J Inf Secur. 2022; 21: p. 1069–1090.

68. Alves VM, Auerbach SS, Kleinstreuer N, Rooney JP, Muratov EN, Rusyn I, et al. Curated Data In - Trustworthy In Silico Models Out: The Impact of Data Quality on the Reliability of Artificial Intelligence Models as Alternatives to Animal Testing. Altern Lab Anim. 2021; 49: p. 73–82.

69. Gu X, Sabrina F, Fan Z, Sohail S. A Review of Privacy Enhancement Methods for Federated Learning in Healthcare Systems. Int J Environ Res Public Health. 2023; 20: p. 6539.

70. Hirano T, Motohashi T, Okumura K, Takajo K, Kuroki T, Ichikawa D, et al. Data Validation and Verification Using Blockchain in a Clinical Trial for Breast Cancer: Regulatory Sandbox. J Med Internet Res. 2020; 22: p. e18938.

71. Tacconelli E, Gorska A, Carrara E, Davis RJ, Bonten M, Friedrich AW, et al. Challenges of data sharing in European Covid-19 projects: A learning opportunity for advancing pandemic preparedness and response. Lancet Reg Health Eur. 2022; 21: p. 100467.

72. Appelbaum L, Kaplan ID, Palchuk MD, Kundrot S, Winer-Jones JP, Rinard M. Development and Experience with Cancer Risk Prediction Models Using Federated Databases and Electronic Health Records. In Linwood SL, editor. Digital Health. Brisbane (AU): Exon Publications; 2022. p. Chapter 2.

73. OHDSI. Online; 2023. Available from: [http://www.ohdsi.org/data-standardization/](http://www.ohdsi.org/data-standardization/).

74. Belenkaya R, Gurley MJ, Golozar A, Dymshyts D, Miller RT, Williams AE, et al. Extending the OMOP Common Data Model and Standardized Vocabularies to Support Observational Cancer Research. JCO Clin Cancer Inform. 2021; 5: p. 12–20.

75. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016; 3: p. 160018.

76. Huerta EA, Blaiszik B, Brinson LC, Bouchard KE, Diaz D, Doglioni C, et al. FAIR for AI: An interdisciplinary and international community building perspective. Sci Data. 2023; 10: p. 487.

77. Groenen KHJ, Jacobsen A, Kersloot MG, Dos Santos Vieira B, van Enckevort E, Kaliyaperumal R, et al. The de novo FAIRification process of a registry for vascular anomalies. Orphanet J Rare Dis. 2021; 16: p. 376.

78. Inau ET, Sack J, Waltemath D, Zeleke AA. Initiatives, Concepts, and Implementation Practices of the Findable, Accessible, Interoperable, and Reusable Data Principles in Health Data Stewardship: Scoping Review. J Med Internet Res. 2023; 25: p. e45013.