PT - JOURNAL ARTICLE
AU - Lusa, Lara
AU - Proust-Lima, Cécile
AU - Schmidt, Carsten O.
AU - Lee, Katherine J.
AU - le Cessie, Saskia
AU - Baillie, Mark
AU - Lawrence, Frank
AU - Huebner, Marianne
TI - Initial data analysis for longitudinal studies to build a solid foundation for reproducible analysis
AID - 10.1101/2023.12.05.23299518
DP - 2023 Jan 01
TA - medRxiv
PG - 2023.12.05.23299518
4099 - http://medrxiv.org/content/early/2023/12/06/2023.12.05.23299518.short
4100 - http://medrxiv.org/content/early/2023/12/06/2023.12.05.23299518.full
AB - Initial data analysis (IDA) is the part of the data pipeline that takes place between the end of data retrieval and the beginning of data analysis that addresses the research question. Systematic IDA and clear reporting of IDA findings are important steps towards reproducible research. A general framework of IDA for observational studies includes data cleaning, data screening, and possible updates of pre-planned statistical analyses. Longitudinal studies, where participants are observed repeatedly over time, pose additional challenges, as they have special features that should be taken into account in the IDA steps before addressing the research question. We propose a systematic approach for examining data properties in longitudinal studies prior to conducting planned statistical analyses. In this paper we focus on the data screening element of IDA, assuming that the research aims are accompanied by an analysis plan, meta-data are well documented, and data cleaning has already been performed. The IDA screening domains are participation profiles over time, missing data, univariate and multivariate descriptions, and longitudinal aspects.
Executing the IDA plan results in an IDA report that informs data analysts about data properties and possible implications for the analysis plan; reporting and updating the analysis plan are other elements of the IDA framework. We illustrate the framework using hand grip strength outcome data collected across several waves of a complex survey. We provide reproducible R code in a public repository, presenting a detailed data screening plan for the investigation of the average rate of age-associated decline of grip strength. With our checklist and reproducible R code, we provide data analysts with a framework for working with longitudinal data in an informed way, enhancing the reproducibility and validity of their work.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: Yes
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Not necessary
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes
Data are available for research purposes at https://share-eric.eu/data/data-access https://stratosida.github.io/longitudinal/