PT - JOURNAL ARTICLE
AU - Handels, Ron
AU - Jönsson, Linus
AU - Raket, Lars Lau
AU - the Alzheimer’s Disease Neuroimaging Initiative
TI - Generate Synthetic Data in R for a Hypothetical Alzheimer’s Disease Trial
AID - 10.1101/2024.02.05.24302140
DP - 2024 Jan 01
TA - medRxiv
PG - 2024.02.05.24302140
4099 - http://medrxiv.org/content/early/2024/02/06/2024.02.05.24302140.short
4100 - http://medrxiv.org/content/early/2024/02/06/2024.02.05.24302140.full
AB - INTRODUCTION Representative data from recent Alzheimer’s Disease (AD) trials are difficult to obtain. We aimed to generate a synthetic version of an original real-world observational dataset, subsequently apply a plausible AD treatment effect, and release our method as open source. METHODS Synthetic data were generated in the following steps: (1) Obtain real-world data from the ADNI study on demographic (age, sex, education), clinical (cognition: MMSE and ADAS; function: FAQ; composite cognition/function: CDR, ADCOMS) and biological (genetics: APOE4; cerebrospinal fluid: ABeta, Tau; imaging: PET-SUVR-centiloid) outcomes at baseline and at 6-, 12- and/or 18-month follow-up (35 variables), with missing data multiply imputed to obtain 10 datasets of 537 individuals each. (2) Estimate the (theoretical) minimum and maximum (all continuous variables) and proportions (all categorical variables). (3) Rescale to the 0-1 range (continuous). (4) Estimate beta distribution shape parameters (method of moments; continuous). (5) Transform to cumulative probabilities using the beta cumulative distribution function (with the estimated shape parameters; continuous) and to cumulative probabilities based on the estimated proportions (categorical). (6) Transform to a normal distribution (inverse normal cumulative distribution function). (7) Estimate the variance-covariance matrix. (8) Generate random correlated normal data using the Cholesky decomposition of the variance-covariance matrix. (9) Transform back to cumulative probabilities (normal cumulative distribution function). (10) Transform to the beta distribution (inverse beta cumulative distribution function with the estimated shape parameters; continuous). (11) Rescale to the original range.
(12) Assign half of the synthetic individuals to the control arm and half to the intervention arm, and estimate the change from baseline. (13) Multiply the intervention arm’s change from baseline by a self-defined hypothetical relative treatment effect. We assumed that correlations on the normalized scale were similar to correlations on the original scale. R code is available on GitHub: https://github.com/ronhandels/synthetic-correlated-data. RESULTS The synthetic distributions and means over time closely resembled the original data (assessed visually). The median absolute difference in pairwise correlations between the original and synthetic data was 0.02 (95th percentile 0.11, maximum 0.18). CONCLUSION We judged our method sufficiently valid to generate plausible, correlated, synthetic hypothetical trial results.
Competing Interest Statement
LLR is an employee of Eli Lilly and Company. RH received research grants outside this study from JPND, ZonMW, IMI and H2020 (paid to institution); received consulting fees outside this study in the past 3 years from Lilly Nederland (2023), iMTA (2023) and Biogen (2021) (paid to institution); and is a member of IPECAD and of the ISPOR special interest group on open-source models (unpaid).
Clinical Protocols
https://github.com/ronhandels/synthetic-correlated-data
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
We requested the data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) for the following specific aim: "describe the natural progression over a short-term period by mimicking/emulating data typically obtained from AD drug treatment randomized trials", using the method described in our manuscript (in short: select data from the ADNI, fit a variance-covariance matrix, and use the variance-covariance matrix to generate synthetic data (mimic/emulate the data)). We received the following reply from ADNI: "Your request for access to the Alzheimer's Disease Neuroimaging Initiative (ADNI) Data has been approved."
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Part of the data are available online at https://github.com/ronhandels/synthetic-correlated-data. All data produced in the present study are available upon reasonable request to the authors.
https://github.com/ronhandels/synthetic-correlated-data
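Steps (2) to (4) of the METHODS, for a single continuous variable, can be sketched in Python as follows. This is not the authors' R code, which is in the GitHub repository. The function names are hypothetical, the theoretical range is passed in by hand (0 to 30, as for the MMSE), and the scores are made up. The method-of-moments estimates match the beta mean m = α/(α+β) and variance v = αβ/((α+β)²(α+β+1)), which gives α+β = m(1-m)/v - 1. They are only valid when 0 < v < m(1-m).

```python
from statistics import fmean, pvariance


def rescale_to_unit(values, lo, hi):
    """Step (3): map values from the theoretical range [lo, hi] onto [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def beta_method_of_moments(unit_values):
    """Step (4): beta shape parameters (alpha, beta) matching the sample mean and variance.

    Uses the population (divide-by-n) variance; the (n - 1) sample variance
    is an equally common choice.
    """
    m = fmean(unit_values)
    v = pvariance(unit_values, mu=m)
    if not 0 < v < m * (1 - m):
        raise ValueError("need 0 < variance < mean * (1 - mean) for a beta fit")
    common = m * (1 - m) / v - 1  # equals alpha + beta
    return m * common, (1 - m) * common


# Illustrative MMSE-like scores (theoretical range 0-30); made-up values, not ADNI data.
mmse = [28, 26, 24, 29, 22, 25, 27, 23]
alpha, beta = beta_method_of_moments(rescale_to_unit(mmse, 0, 30))
```

Values lying exactly on a theoretical bound map to 0 or 1. The moment estimates tolerate this, but such values receive a cumulative probability of exactly 0 or 1 in step (5).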
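Steps (5), (6) and (9) to (11) of the METHODS form a round trip between a continuous variable and a normal score. The sketch below makes three simplifications:

- It assumes the beta shape parameters are already estimated (the values 6 and 2 are purely illustrative).
- It covers continuous variables only; the categorical branch of step (5) is not shown.
- To stay dependency-free, it evaluates the beta cumulative distribution function with a Numerical Recipes-style continued fraction and inverts it by bisection. In R, the corresponding built-ins are `pbeta`, `qbeta`, `pnorm` and `qnorm`.

All function names are hypothetical, and this is not the authors' implementation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1
TINY = 1e-300              # guards against division by zero in the continued fraction


def _beta_cont_frac(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > TINY else TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > TINY else TINY
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cont_frac(a, b, x) / a
    return 1.0 - front * _beta_cont_frac(b, a, 1.0 - x) / b  # symmetry I_x(a,b) = 1 - I_{1-x}(b,a)


def beta_ppf(p, a, b, tol=1e-13):
    """Inverse Beta(a, b) CDF by bisection (the CDF is increasing on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def to_normal_score(x, a, b, lo, hi):
    """Steps (3), (5) and (6): rescale to [0, 1], apply the beta CDF, then the normal quantile."""
    u = beta_cdf((x - lo) / (hi - lo), a, b)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf is undefined at exactly 0 or 1
    return STD_NORMAL.inv_cdf(u)


def from_normal_score(z, a, b, lo, hi):
    """Steps (9) to (11): normal CDF, inverse beta CDF, then rescale to [lo, hi]."""
    return lo + (hi - lo) * beta_ppf(STD_NORMAL.cdf(z), a, b)


# Illustrative shape parameters for an MMSE-like score on 0-30 (made up, not fitted to ADNI).
z = to_normal_score(26, 6.0, 2.0, 0, 30)       # normal score for one observation
back = from_normal_score(z, 6.0, 2.0, 0, 30)   # recovers 26 up to rounding
```

Clamping the cumulative probability away from 0 and 1 before taking the normal quantile is a practical choice for values on the bounds of the range. How the authors handle such values is not stated in the abstract.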
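Steps (7) and (8) of the METHODS can be sketched in plain Python. The sketch estimates the variance-covariance matrix of the normal scores, factors it as Σ = LLᵀ (Cholesky), and multiplies independent standard normal draws by L so that the draws have covariance Σ. It assumes the normal scores are centred at zero, which holds approximately because they are normal quantiles of cumulative probabilities. The seed is a hypothetical parameter for reproducibility, and the code illustrates the technique rather than reproducing the authors' implementation. Note that R's `chol()` returns the upper-triangular factor, i.e. the transpose of L here.

```python
import math
import random
from statistics import fmean


def covariance_matrix(columns):
    """Step (7): sample (n - 1) variance-covariance matrix of equally long columns."""
    n, k = len(columns[0]), len(columns)
    means = [fmean(col) for col in columns]
    return [[sum((columns[i][t] - means[i]) * (columns[j][t] - means[j]) for t in range(n)) / (n - 1)
             for j in range(k)]
            for i in range(k)]


def cholesky(sigma):
    """Lower-triangular L with L @ L.T == sigma; sigma must be positive definite."""
    k = len(sigma)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][p] * L[j][p] for p in range(j))
            if i == j:
                d = sigma[i][i] - s
                if d <= 0:
                    raise ValueError("variance-covariance matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def correlated_normals(sigma, n, seed=None):
    """Step (8): n zero-mean draws z = L e with e ~ N(0, I), so that cov(z) = sigma."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    k = len(sigma)
    draws = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        draws.append([sum(L[i][p] * e[p] for p in range(i + 1)) for i in range(k)])
    return draws


# Usage: two normal scores with correlation 0.6 (illustrative values).
sigma = [[1.0, 0.6], [0.6, 1.0]]
draws = correlated_normals(sigma, 20000, seed=42)
columns = [list(col) for col in zip(*draws)]
estimated = covariance_matrix(columns)  # close to sigma for large n
```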
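Steps (12) and (13) of the METHODS reduce to a split and a multiplication. In this sketch:

- The split is random. This is an assumption; the abstract only says that half of the individuals form each arm.
- `relative_effect` is a hypothetical multiplier on the intervention arm's change from baseline. The example uses 0.75, chosen arbitrarily.
- The scores are made-up ADAS-like values, where higher means worse.

The function name is hypothetical.

```python
import random


def make_trial_arms(baseline, follow_up, relative_effect, seed=None):
    """Steps (12)-(13): split synthetic individuals into two arms and apply a
    hypothetical relative treatment effect to the intervention arm's change.

    relative_effect multiplies the change from baseline, so 0.75 means the
    intervention arm changes 25% less than it would have without treatment.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    ids = list(range(len(baseline)))
    random.Random(seed).shuffle(ids)  # assumption: random allocation to arms
    half = len(ids) // 2
    change = [f - b for b, f in zip(baseline, follow_up)]
    control = [change[i] for i in ids[:half]]
    intervention = [change[i] * relative_effect for i in ids[half:]]
    return control, intervention


# Usage with made-up ADAS-like scores for 6 synthetic individuals.
baseline = [20.0, 22.0, 18.0, 25.0, 21.0, 19.0]
follow_up = [24.0, 25.0, 21.0, 30.0, 23.0, 23.0]
control, intervention = make_trial_arms(baseline, follow_up, relative_effect=0.75, seed=1)
```

With an odd number of individuals, this version puts the extra person in the intervention arm.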
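The agreement statistic reported in RESULTS (median, 95th percentile and maximum of the absolute differences between original and synthetic pairwise correlations) can be computed as below. The sketch assumes Pearson correlations and R's default (type 7) quantile definition; the abstract states neither choice. With 35 variables there would be 35 × 34 / 2 = 595 distinct pairs. The function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import fmean, median


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percentile(values, q):
    """Linear-interpolation quantile, matching R's default quantile() (type 7)."""
    s = sorted(values)
    pos = (len(s) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def correlation_gap_summary(original_cols, synthetic_cols):
    """Median, 95th percentile and maximum of |r_original - r_synthetic| over all variable pairs."""
    gaps = [abs(pearson(original_cols[i], original_cols[j])
                - pearson(synthetic_cols[i], synthetic_cols[j]))
            for i, j in combinations(range(len(original_cols)), 2)]
    return {"median": median(gaps), "p95": percentile(gaps, 0.95), "max": max(gaps)}
```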