Abstract
The analysis of high-quality genomic variant data may offer a more complete understanding of the human genome, enabling researchers to identify novel biomarkers, stratify patients based on disease risk factors, and decipher underlying biological pathways. Although the availability of genomic data has sharply increased in recent years, accessible bioinformatic tools to aid in its preparation are still lacking. The main obstacles to processing genomic data are its large volume, the associated computational and storage costs, and the difficulty of identifying targeted and relevant information. Here, we present VAREANT, an accessible and configurable bioinformatic tool that supports the preparation of variant data into a usable, analysis-ready format. Designed to simplify the data pre-processing workflow, VAREANT enables the curation of targeted variant datasets. VAREANT comprises three standalone modules: (1) Pre-processing, (2) Variant Annotation, and (3) Artificial Intelligence (AI) / Machine Learning (ML) Data Preparation. Pre-processing supports the fine-grained filtering of complex variant datasets to eliminate extraneous data. Variant Annotation adds variant metadata from public annotation databases for subsequent analysis and interpretation. AI/ML Data Preparation helps the user create AI/ML-ready datasets suitable for immediate analysis with minimal further pre-processing. We have tested and validated our tool on numerous datasets of varying size and applied VAREANT in two case studies involving patients with cardiovascular diseases (CVDs). Efficiently extracting relevant variants into an AI/ML-ready format using tools like VAREANT has important scientific implications, such as producing targeted, high-quality datasets and helping to reduce overall computational costs.
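To make the pre-processing and AI/ML data preparation ideas above more concrete, the following is a minimal sketch in Python. It is not VAREANT's code or interface: the function names (`parse_vcf`, `keep_pass`, `encode_gt`, `to_matrix`), the toy VCF text, and the choice to encode each genotype as an alternate-allele count are all assumptions made purely for illustration. The sketch filters a small variant set on the VCF FILTER column and then reshapes the remaining genotypes into a sample-by-variant numeric table.

```python
# Illustrative sketch only; all names and the toy data are hypothetical
# and do not reflect VAREANT's actual implementation.

# Toy VCF content: two samples (S1, S2), three variants, tab-separated.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2
1\t100\trs1\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1
1\t200\trs2\tC\tT\t10\tLowQual\t.\tGT\t0/0\t0/1
2\t300\trs3\tG\tA\t99\tPASS\t.\tGT\t0|0\t./.
"""


def parse_vcf(text):
    """Return (sample_names, records); each record is a list of VCF columns."""
    samples, records = [], []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            continue
        records.append(fields)
    return samples, records


def keep_pass(records):
    """Pre-processing step: keep variants whose FILTER column is PASS."""
    return [r for r in records if r[6] == "PASS"]


def encode_gt(sample_field):
    """Encode a genotype as its alternate-allele count; None if missing."""
    gt = sample_field.split(":")[0]  # GT is the first FORMAT sub-field here
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(a != "0" for a in alleles)


def to_matrix(samples, records):
    """Data preparation step: build one numeric row per sample."""
    variant_ids = [f"{r[0]}:{r[1]}:{r[3]}>{r[4]}" for r in records]
    rows = {
        s: [encode_gt(r[9 + i]) for r in records]
        for i, s in enumerate(samples)
    }
    return variant_ids, rows


samples, records = parse_vcf(VCF_TEXT)
variant_ids, rows = to_matrix(samples, keep_pass(records))
print(variant_ids)
for sample, values in rows.items():
    print(sample, values)
```

In this sketch, missing genotypes are kept as `None` rather than imputed, so that the decision about how to handle them is left to the downstream analysis.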
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
No funding received.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data availability
The source code of VAREANT is available on GitHub: https://github.com/drzeeshanahmed/Gene_VAREANT