Abstract
Objective This study aimed to build a comprehensive dataset of human genetic polymorphisms associated with nutrition by integrating data from multiple sources, including the LitVar database, PubMed, and the GWAS catalog. Such a resource could facilitate the exploration of genetic polymorphisms associated with nutrition-related traits.
Methods We developed a Python pipeline to streamline the integration and analysis of genetic polymorphism data associated with nutrition. We employed the MeSH ontology as a framework to aggregate relevant genetic data. The pipeline comprises five modules that perform the following steps: data extraction from LitVar and PubMed articles, generation of a joint dataset by merging the extracted data, generation of comprehensive MeSH term lists, filtering of the joint dataset using the selected MeSH sets, and lexical analysis and augmentation of the dataset with data from the GWAS catalog.
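As an illustration only, the merge, filter, and augment steps described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the record layout, and all rsIDs, PMIDs, and risk alleles are hypothetical placeholders, and the MeSH terms are used purely as examples.

```python
# Hypothetical placeholder records; none of these values come from the study.
LITVAR_HITS = [  # (rsID, PMID) pairs, as might be extracted from LitVar
    ("rs0000001", "111"),
    ("rs0000002", "222"),
    ("rs0000003", "333"),
]
PUBMED_MESH = {  # PMID -> MeSH terms annotated on the article
    "111": {"Diet", "Obesity"},
    "222": {"Neoplasms"},
    "333": {"Vitamin D"},
}
NUTRITION_MESH = {"Diet", "Vitamin D"}  # assumed selected MeSH set
GWAS_RISK_ALLELES = {"rs0000001": "rs0000001-A"}  # rsID -> risk allele


def merge(hits, mesh_by_pmid):
    """Join variant-article pairs with the MeSH terms of each article."""
    return [
        {"rsid": rsid, "pmid": pmid, "mesh": mesh_by_pmid.get(pmid, set())}
        for rsid, pmid in hits
    ]


def filter_by_mesh(rows, selected_terms):
    """Keep rows whose article shares at least one term with the selected set."""
    return [row for row in rows if row["mesh"] & selected_terms]


def augment(rows, risk_alleles):
    """Attach a risk allele where one is known; otherwise store None."""
    return [{**row, "risk_allele": risk_alleles.get(row["rsid"])} for row in rows]


def run_pipeline(hits, mesh_by_pmid, selected_terms, risk_alleles):
    """Chain the three steps: merge, filter, augment."""
    joint = merge(hits, mesh_by_pmid)
    filtered = filter_by_mesh(joint, selected_terms)
    return augment(filtered, risk_alleles)


result = run_pipeline(LITVAR_HITS, PUBMED_MESH, NUTRITION_MESH, GWAS_RISK_ALLELES)
for row in result:
    print(row["rsid"], row["pmid"], sorted(row["mesh"]), row["risk_allele"])
```

Keeping each step as a separate function mirrors the modular structure described above, so that an individual step (for example, the MeSH filter) could be swapped or rerun with a different term set without touching the others.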
Results We successfully aggregated a wide range of papers and data on genetic polymorphism and nutrition-related traits into a single dataset. Cross-referencing with the GWAS catalog dataset provided information about possible effects or risk alleles associated with the identified genetic polymorphisms. The nutrigenetic dataset we developed is a tool for nutritionists and researchers, serving as a preliminary benchmark for personalized nutrition interventions based on genetic testing.
Conclusion The pipeline presented here consolidates and organizes information on genetic polymorphisms associated with nutrition, enabling comprehensive analysis and exploration of gene-diet interactions. Overall, the method contributes to advancing personalized nutrition interventions and nutrigenomics research. The flexible nature of the system allows its application to other investigations related to genetic polymorphisms.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Data and code are available on Zenodo and GitHub, respectively. URLs are provided in the "Data Availability Links" section.