Abstract
INTRODUCTION Representative data of recent Alzheimer’s Disease (AD) trials are difficult to obtain. We aimed to generate a synthetic version of an original real-world observational dataset, subsequently apply a plausible AD treatment effect, and make our method open-source available.
METHODS Synthetic data was generated in the following steps: (1) Obtain real-world data from the ADNI study on demographic (age, sex, education), clinical (cognition: MMSE and ADAS; function: FAQ; composite cognition/function: CDR, ADCOMS) and biological (genetics: APOE4; cerebrospinal fluid: ABeta, Tau; imaging: PET-SUVR-centiloid) outcomes at baseline, 6, 12 and/or 18-month follow-up (35 variables), with missing data multiple-imputed to obtain 10 sets of 537 individuals. (2) Estimate (theoretical) minimum and maximum (all continuous variables) and proportions (all categorical variables). (3) Rescale to 0-1 range (continuous). (4) Estimate beta distribution shape parameters (method of moments; continuous). (5) Transform to cumulative probability distribution function (using shape parameters; continuous) and to cumulative probability (categorical). (6) Transform to a normal distribution. (7) Estimate variance-covariance matrix. (8) Generate random correlated normal data using Cholesky decomposition of variance-covariance. (9) Transform to cumulative probability distribution function. (10) Transform to beta distribution (using shape parameters; continuous). (11) Rescale to original range. (12) Keep half as control arm, and half as intervention arm, and estimate change from baseline. (13) Multiply intervention change from baseline with self-defined hypothetical relative treatment effect. We assumed correlations on normalized scale were similar to correlations on original scale. R code is available on github: https://github.com/ronhandels/synthetic-correlated-data.
RESULTS The synthetic distribution and mean over time showed large similarity to the original data (visually assessed). The absolute difference in pairwise correlations between original and synthetic data median was 0.02 (95th percentile=0.11, max=0.18).
CONCLUSION We judged our method sufficiently valid to generate synthetic correlated plausible hypothetical trial results.
Competing Interest Statement
LLR is an employee of Eli Lilly and Company. RH received outside this study research grants from JPND, ZonMW, IMI, H2020 (paid to institution); received outside this study consulting fees in the past 3 years from Lilly Nederland (2023), iMTA (2023), and Biogen (2021) (paid to institution); is member of IPECAD and member of ISPOR special interest group open-source models (un-paid).
Clinical Protocols
https://github.com/ronhandels/synthetic-correlated-data
Funding Statement
This study did not receive any funding
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
We requested the data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) for the following specific aim: "describe the natural progression over a short-term period by mimicing/emulating data typically obtained from AD drug treatment randomized trials" and method as described in our manuscript (in short: select data from the ADNI, fit variance-covariance matrix, use variance-covariance matrix to generate synthetic data (mimic/emulate the data)). We have received the following reply from ADNI: "Your request for access to the Alzheimer's Disease Neuroimaging Initiative (ADNI) Data has been approved."
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
↵* Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
Data Availability
Part of the data are available online at https://github.com/ronhandels/synthetic-correlated-data. All data produced in the present study are available upon reasonable request to the authors.