In Q. Wang et al. (2023), we developed an end-to-end pipeline to demonstrate the role of epistasis in the genetic control of cardiac hypertrophy. This pipeline consisted of four major phases: (1) the derivation of estimates of left ventricular mass via deep learning, (2) the computational prioritization of epistatic drivers based on data from the UK Biobank (Bycroft et al. 2018), (3) functional interpretation of the hypothesized epistatic genetic loci, and (4) experimental confirmation of epistasis in cardiac transcriptome and cardiac cellular morphology.
Here, we expand upon the computational prioritization of epistatic drivers phase. In this phase, we leveraged large-scale genetic and cardiac MRI data from the UK Biobank and developed the low-signal signed iterative random forest (lo-siRF) to prioritize genetic loci and interactions between loci for downstream investigations. As lo-siRF is heavily rooted in the Predictability, Computability, and Stability (PCS) framework for veridical (trustworthy) data science (Yu and Kumbier 2020), we conducted extensive stability analyses to help minimize the impact of human judgment calls and modeling decisions on our scientific conclusions. In this supplementary PCS documentation, we will:
In what follows, we organize this documentation by step in the lo-siRF pipeline (see Q. Wang et al. (2023)), beginning with the choice of phenotypic data, followed by the dimension reduction step, binarization step, prediction step, and concluding with the prioritization step.
In the first step of lo-siRF, we began with a careful choice of phenotypic data. This first, but often overlooked, choice is critical to the reliability and interpretation of results. If the data quality is poor, conclusions may be unreliable. If the data is far-removed from the domain problem of interest, conclusions may not be relevant. Given the importance of this choice of phenotypic data, we conducted extensive data explorations for different phenotypes in addition obtaining clinician input regarding its clinical relevance.
Initial choice of phenotype: Our overarching goal was to study the role of epistasis in cardiac hypertrophy. One important disease that is closely related to cardiac hypertrophy is hypertrophic cardiomyopathy (HCM). HCM is a common heart disease, impacting around 1 in 500 people, where the heart muscle becomes enlarged or thickened. Given the high prevalence and possible life-threatening consequences of this disease, we first set out to explore this HCM phenotype, defined as a 0-1 indicator variable of whether or not the individual was diagnosed with HCM (i.e., any ICD10 code of I42.1 or I42.2 in the UK Biobank). However, when fitting various machine learning prediction models, including L1 or L2-regularized logistic regression, random forest (RF), iterative RF (iRF), and support vector machines, using single-nucleotide variant (SNV) data from the UK Biobank (details in Q. Wang et al. (2023)) to predict HCM diagnosis, the balanced classification accuracy on the held-out validation data did not consistently exceed 50% when trained on different bootstrapped training samples. In other words, the fitted prediction models were no better than random guessing. This made us wary of any interpretations extracted from these prediction models since it is unclear whether these models are capturing any relevant phenotypic signal or simply noise. Given this incredibly weak prediction signal, we deemed that these models did not pass the prediction check (the “P” in PCS) (Yu and Kumbier 2020). For this reason, we chose not to proceed further with the HCM phenotype and did not conduct any analysis with respect to interpreting the aforementioned models for HCM.
Pivoting to a cardiac MRI-derived phenotype: We instead pivoted to an alternative phenotype for cardiac hypertrophy based upon deep-learning-enabled left ventricular (LV) structural analysis using cardiac MRIs. This choice was heavily motivated by our struggles with HCM. Specifically, when looking back at the HCM diagnosis data, we noticed that the number of HCM cases in the UK Biobank data (only 236 HCM cases out of 337,535 unrelated White British individuals) was much lower than the 1 in 500 ratio that is expected. Although the UK Biobank population is known to be healthier than the general population (Fry et al. 2017), the observed discrepancy is large, leading us to speculate about potential under-diagnosis issues with the HCM phenotype. Consequently, rather than relying on a clinician’s diagnosis which may suffer from issues such as under-diagnosis, we chose to build upon recent advances in deep-learning-enabled phenotyping (Bai et al. 2018). Doing so enables us to extract a measure of left ventricular hypertrophy, specifically, the left ventricular mass (LVM), from anyone with a cardiac MRI.
Arriving at LVMi: However, LVM is known to be heavily influenced by body weight, height, and gender (Engel, Schwartz, and Homma 2016; Mizukoshi et al. 2016). Thus, if we were to proceed solely with LVM in our analysis, we risk identifying genetic loci and interactions between loci that are primarily associated with body weight, height, and gender, rather than left ventricular hypertrophy. Indeed, if we visualize the relationship between LVM, height, and weight within each gender group (Figure 2.1), we see strong positive correlations.
This motivates the need to adjust for height and weight in order to help decouple the effects of height/weight and LVM. As done in Khurshid et al. (2023), one common way to address this is by analyzing LVM indexed by body surface area (LVMi), defined as:
\[\begin{align*} \text{LVMi} = \text{LVM } / \text{ Body surface area}, \end{align*}\] where body surface area is calculated based on height and body weight via the Du Bois formula (Du Bois and Du Bois 1916).
After normalizing by body surface area (and hence, height and weight), the associations between LVMi and height/weight are much weaker within each gender group, exhibiting correlations closer to 0 (Figure 2.1). Thus, by focusing our statistical analysis on LVMi, we lessen the possibility that the identified genetic loci and interactions between loci are only selected because of their association with body height and/or weight. Still, there are differences in LVMi between males and females, with males tending to have higher LVMi measurements. As seen later, we will account for this in the modeling stage by using gender-specific binarization thresholds to binarize the LVMi responses into the high and low LVMi groups.
Study Cohort: Before proceeding, we note that another important human judgment call within our pipeline is the choice of study cohort. Specifically, we restricted our analysis to the unrelated White British individual population with both SNV and cardiac MRI data in the UK Biobank, resulting in a cohort of 29,661 individuals. We chose to focus this analysis on the unrelated White British population, anticipating the very low-signal problem under study (e.g., due to the generally small effect sizes of common genetic variants and the high phenotypic diversity of cardiac hypertrophy). By studying a more homogeneous cohort such as the unrelated White British population, we hope to first establish a proof-of-concept foothold on the role of epistasis in cardiac hypertrophy, and we view the study and generalization to the broader population as a very crucial next step.
Figure 2.1: Pair plot, illustrating the relationships between LVM, LVMi, height, and weight within each gender group (males in green and females in orange). Density distributions are shown along the diagonal.
Having carefully chosen the phenotypic data (i.e., the cardiac MRI-derived LVMi) under study, the next step in lo-siRF is to do dimension reduction to reduce the number of SNVs under consideration using a genome-wide association study (GWAS). This dimension reduction step is necessary to mitigate the computational and statistical challenges of searching across millions of possible SNVs (which in the case of searching for epistasis results in an even larger combinatorial explosion of possibilities).
Why GWAS? We chose to do dimension reduction by running a GWAS as it (1) has been shown to be an incredibly useful tool for better understanding the genetic basis of numerous diseases in this field (Visscher et al. 2017; Pirruccello et al. 2020; Meyer et al. 2020; Harper et al. 2021; Khurshid et al. 2023), (2) is one of the few methods that can handle an input of millions of SNVs in a computationally-tractable manner, and (3) is a common, widely-accepted preprocessing step in other related statistical genomics tasks such as fine-mapping (Schaid, Chen, and Larson 2018).
Limitations: We acknowledge, however, that using a GWAS to reduce the number of SNVs under consideration has limitations. Because a GWAS prioritizes SNVs that have strong marginal associations with the phenotype under study, we may be removing SNVs that are associated with LVMi through an epistatic interaction but not through a marginal association. Alleviating this limitation is certainly interesting and an important direction to pursue in future work. Regardless, in this current work, we use lo-siRF primarily as a way to generate recommendations for experimental validation of epistasis. We thus are most interested in generating reliable recommendations for a limited number of experiments, rather than attempting to find all possible interactions that drive the disease. Given our particular use-case for lo-siRF, we accept the limitations of using a GWAS for dimension reduction and again view this work as a reliable first step (and certainly not the last) for better understanding the role of epistasis in cardiac hypertrophy.
How was the dimension reduction performed? We performed a GWAS on the training data for the rank-based inverse normal-transformed LVMi using two algorithms, PLINK (Purcell et al. 2007) and BOLT-LMM (Loh et al. 2015). For each of the two GWAS runs, we ranked the SNVs by significance (i.e., the GWAS p-value). We then took the union of the top 1000 SNVs (without clumping) from each of the two GWAS runs and proceeded with this resulting set of 1405 SNVs for the remainder of the lo-siRF pipeline. We refer to Q. Wang et al. (2023) for the details on the PLINK and BOLT-LMM implementations and use this document instead to justify several of our human judgment calls in this dimension reduction step.
Organization: Under the GWAS Results Summary
tab, we provide an investigation into the BOLT-LMM and PLINK GWAS results for LVM and LVMi. Under the GWAS-filtered SNVs
, we explore the selected 1405 SNVs that passed the GWAS filter for the LVMi phenotype.
Below, we provide an overview of the genome-wide association study (GWAS) results for both the LVM and LVMi phenotypes. While LVMi is the primary phenotype of interest, we provide the LVM GWAS results for completeness. We further investigate the similarities in findings between the LVM and LVMi GWAS analyses via a stability analysis.
BOLT-LMM
and PLINK
tabs:
Stability Analysis
tab, we investigated the similarities and differences between the BOLT-LMM and PLINK results as well as between the LVM and LVMi results.Figure 3.1: GWAS Manhattan plot for the LVMi phenotype using BOLT-LMM. Red dashed line indicates the genome-wide significance level (p = 5E-8).
Figure 3.2: List of genomic risk loci from FUMA for the LVMi BOLT-LMM GWAS. uniqID = Unique ID of SNVs consists of chr:position:allele1:allele2 where alleles are alphabetically ordered. rsID = rsID of the top lead SNV for the given genomic locus. chr = Chromosome of top lead SNV. pos = Position of top lead SNV on hg19. p = P-value of top lead SNV. start = Start position of the locus. end = End position of the locus. nGWASSNVs = Number of the GWAS-tagged candidate SNVs within the genomic locus. nIndSigSNVs = Number of the independent significant SNVs in the genomic locus. IndSigSNVs = rsID of independent significant SNVs in the genomic locus. nLeadSNVs = The number of lead SNVs in the genomic locus. LeadSNVs = rsID of lead SNVs in the genomic locus.
Figure 3.3: List of lead SNVs from FUMA for the LVMi BOLT-LMM GWAS. GenomicLocus = Index of assigned genomic locus (note: multiple independent lead SNVs can be assigned to the same genomic locus). uniqID = Unique ID of SNVs consists of chr:position:allele1:allele2 where alleles are alphabetically ordered. rsID = rsID of the SNV. chr = Chromosome. pos = Position on hg19. p = P-value from GWAS. nIndSigSNVs = Number of independent significant SNVs which are in LD of the lead SNV at \(r^2\) = 0.1. IndSigSNVs = rsID of independent significant SNVs which are in LD of the lead SNV at \(r^2\) = 0.1.
Figure 3.4: List of independent significant SNVs from FUMA for the LVMi BOLT-LMM GWAS. GenomicLocus = Index of assigned genomic locus (note: multiple independent lead SNVs can be assigned to the same genomic locus). uniqID = Unique ID of SNVs consists of chr:position:allele1:allele2 where alleles are alphabetically ordered. rsID = rsID of the SNV. chr = Chromosome. pos = Position on hg19. p = P-value from GWAS. nGWASSNVs = Number of GWAS-tagged SNVs which are in LD of the independent significant SNV given \(r^2\) = 0.6.
Figure 3.5: List of mapped genes (via positional mapping) from FUMA for the LVMi BOLT-LMM GWAS. ensg = ENSG ID. symbol = Gene Symbol. chr = Chromosome. start = Starting position of the gene. end = Ending position of the gene. strand = Strand of the gene. type = Gene biotype from Ensembl. entrezID = entrez ID (if available). HUGO = HUGO (HGNC) gene symbol. pLI = pLI score (the probability of being loss-of-function intolerant) from ExAC database; the higher the score is, the more intolerant to loss-of-function mutations the gene is. ncRVIS = Non-coding residual variation intolerance score; the higher the score is, the more intolerant to non-coding variation the gene is. posMapSNVs = Number of SNVs mapped to gene based on positional mapping. posMapMaxCADD = The maximum CADD score of mapped SNVs by positional mapping. minGwasP = The minimum P-value of mapped SNVs. IndSigSNVs = rsID of the independent significant SNVs that are in LD with the mapped SNVs. GenomicLocus = Index of genomic loci where mapped SNVs are from.
Figure 3.6: GWAS Manhattan plot for the LVM phenotype using BOLT-LMM. Red dashed line indicates the genome-wide significance level (p = 5E-8).
Figure 3.7: List of genomic risk loci from FUMA for the LVM BOLT-LMM GWAS. uniqID = Unique ID of SNVs consists of chr:position:allele1:allele2 where alleles are alphabetically ordered. rsID = rsID of the top lead SNV for the given genomic locus. chr = Chromosome of top lead SNV. pos = Position of top lead SNV on hg19. p = P-value of top lead SNV. start = Start position of the locus. end = End position of the locus. nGWASSNVs = Number of the GWAS-tagged candidate SNVs within the genomic locus. nIndSigSNVs = Number of the independent significant SNVs in the genomic locus. IndSigSNVs = rsID of independent significant SNVs in the genomic locus. nLeadSNVs = The number of lead SNVs in the genomic locus. LeadSNVs = rsID of lead SNVs in the genomic locus.
Figure 3.8: List of lead SNVs from FUMA for the LVM BOLT-LMM GWAS. GenomicLocus = Index of assigned genomic locus (note: multiple independent lead SNVs can be assigned to the same genomic locus). uniqID = Unique ID of SNVs consists of chr:position:allele1:allele2 where alleles are alphabetically ordered. rsID = rsID of the SNV. chr = Chromosome. pos = Position on hg19. p = P-value from GWAS. nIndSigSNVs = Number of independent significant SNVs which are in LD of the lead SNV at \(r^2\) = 0.1. IndSigSNVs = rsID of independent significant SNVs which are in LD of the lead SNV at \(r^2\) = 0.1.
Figure 3.9: List of independent significant SNVs from FUMA for the LVM BOLT-LMM GWAS. GenomicLocus = Index of assigned genomic locus (note: multiple independent lead SNVs can be assigned to the same genomic locus). uniqID = Unique ID of SNVs consists of chr:position:allele1:allele2 where alleles are alphabetically ordered. rsID = rsID of the SNV. chr = Chromosome. pos = Position on hg19. p = P-value from GWAS. nGWASSNVs = Number of GWAS-tagged SNVs which are in LD of the independent significant SNV given \(r^2\) = 0.6.
Figure 3.10: List of mapped genes (via positional mapping) from FUMA for the LVM BOLT-LMM GWAS. ensg = ENSG ID. symbol = Gene Symbol. chr = Chromosome. start = Starting position of the gene. end = Ending position of the gene. strand = Strand of the gene. type = Gene biotype from Ensembl. entrezID = entrez ID (if available). HUGO = HUGO (HGNC) gene symbol. pLI = pLI score (the probability of being loss-of-function intolerant) from ExAC database; the higher the score is, the more intolerant to loss-of-function mutations the gene is. ncRVIS = Non-coding residual variation intolerance score; the higher the score is, the more intolerant to non-coding variation the gene is. posMapSNVs = Number of SNVs mapped to gene based on positional mapping. posMapMaxCADD = The maximum CADD score of mapped SNVs by positional mapping. minGwasP = The minimum P-value of mapped SNVs. IndSigSNVs = rsID of the independent significant SNVs that are in LD with the mapped SNVs. GenomicLocus = Index of genomic loci where mapped SNVs are from.
Figure 3.11: GWAS Manhattan plot for the LVMi phenotype using Plink. Red dashed line indicates the genome-wide significance level (p = 5E-8).
Figure 3.12: List of genomic risk loci from FUMA for the LVMi Plink GWAS. uniqID = Unique ID of SNVs consists of chr:position:allele1:allele2 where alleles are alphabetically ordered. rsID = rsID of the top lead SNV for the given genomic locus. chr = Chromosome of top lead SNV. pos = Position of top lead SNV on hg19. p = P-value of top lead SNV. start = Start position of the locus. end = End position of the locus. nGWASSNVs = Number of the GWAS-tagged candidate SNVs within the genomic locus. nIndSigSNVs = Number of the independent significant SNVs in the genomic locus. IndSigSNVs = rsID of independent significant SNVs in the genomic locus. nLeadSNVs = The number of lead SNVs in the genomic locus. LeadSNVs = rsID of lead SNVs in the genomic locus.
Figure 3.13: List of lead SNVs from FUMA for the LVMi Plink GWAS. GenomicLocus = Index of assigned genomic locus (note: multiple independent lead SNVs can be assigned to the same genomic locus). uniqID = Unique ID of SNVs consists of chr:position:allele1:allele2 where alleles are alphabetically ordered. rsID = rsID of the SNV. chr = Chromosome. pos = Position on hg19. p = P-value from GWAS. nIndSigSNVs = Number of independent significant SNVs which are in LD of the lead SNV at \(r^2\) = 0.1. IndSigSNVs = rsID of independent significant SNVs which are in LD of the lead SNV at \(r^2\) = 0.1.
Figure 3.14: List of independent significant SNVs from FUMA for the LVMi Plink GWAS. GenomicLocus = Index of assigned genomic locus (note: multiple independent lead SNVs can be assigned to the same genomic locus). uniqID = Unique ID of SNVs consists of chr:position:allele1:allele2 where alleles are alphabetically ordered. rsID = rsID of the SNV. chr = Chromosome. pos = Position on hg19. p = P-value from GWAS. nGWASSNVs = Number of GWAS-tagged SNVs which are in LD of the independent significant SNV given \(r^2\) = 0.6.
Figure 3.15: List of mapped genes (via positional mapping) from FUMA for the LVMi Plink GWAS. ensg = ENSG ID. symbol = Gene Symbol. chr = Chromosome. start = Starting position of the gene. end = Ending position of the gene. strand = Strand of the gene. type = Gene biotype from Ensembl. entrezID = entrez ID (if available). HUGO = HUGO (HGNC) gene symbol. pLI = pLI score (the probability of being loss-of-function intolerant) from ExAC database; the higher the score is, the more intolerant to loss-of-function mutations the gene is. ncRVIS = Non-coding residual variation intolerance score; the higher the score is, the more intolerant to non-coding variation the gene is. posMapSNVs = Number of SNVs mapped to gene based on positional mapping. posMapMaxCADD = The maximum CADD score of mapped SNVs by positional mapping. minGwasP = The minimum P-value of mapped SNVs. IndSigSNVs = rsID of the independent significant SNVs that are in LD with the mapped SNVs. GenomicLocus = Index of genomic loci where mapped SNVs are from.
Figure 3.16: GWAS Manhattan plot for the LVM phenotype using Plink. Red dashed line indicates the genome-wide significance level (p = 5E-8).
Figure 3.17: List of genomic risk loci from FUMA for the LVM Plink GWAS. uniqID = Unique ID of SNVs consists of chr:position:allele1:allele2 where alleles are alphabetically ordered. rsID = rsID of the top lead SNV for the given genomic locus. chr = Chromosome of top lead SNV. pos = Position of top lead SNV on hg19. p = P-value of top lead SNV. start = Start position of the locus. end = End position of the locus. nGWASSNVs = Number of the GWAS-tagged candidate SNVs within the genomic locus. nIndSigSNVs = Number of the independent significant SNVs in the genomic locus. IndSigSNVs = rsID of independent significant SNVs in the genomic locus. nLeadSNVs = The number of lead SNVs in the genomic locus. LeadSNVs = rsID of lead SNVs in the genomic locus.
Figure 3.18: List of lead SNVs from FUMA for the LVM Plink GWAS. GenomicLocus = Index of assigned genomic locus (note: multiple independent lead SNVs can be assigned to the same genomic locus). uniqID = Unique ID of SNVs consists of chr:position:allele1:allele2 where alleles are alphabetically ordered. rsID = rsID of the SNV. chr = Chromosome. pos = Position on hg19. p = P-value from GWAS. nIndSigSNVs = Number of independent significant SNVs which are in LD of the lead SNV at \(r^2\) = 0.1. IndSigSNVs = rsID of independent significant SNVs which are in LD of the lead SNV at \(r^2\) = 0.1.
Figure 3.19: List of independent significant SNVs from FUMA for the LVM Plink GWAS. GenomicLocus = Index of assigned genomic locus (note: multiple independent lead SNVs can be assigned to the same genomic locus). uniqID = Unique ID of SNVs consists of chr:position:allele1:allele2 where alleles are alphabetically ordered. rsID = rsID of the SNV. chr = Chromosome. pos = Position on hg19. p = P-value from GWAS. nGWASSNVs = Number of GWAS-tagged SNVs which are in LD of the independent significant SNV given \(r^2\) = 0.6.
Figure 3.20: List of mapped genes (via positional mapping) from FUMA for the LVM Plink GWAS. ensg = ENSG ID. symbol = Gene Symbol. chr = Chromosome. start = Starting position of the gene. end = Ending position of the gene. strand = Strand of the gene. type = Gene biotype from Ensembl. entrezID = entrez ID (if available). HUGO = HUGO (HGNC) gene symbol. pLI = pLI score (the probability of being loss-of-function intolerant) from ExAC database; the higher the score is, the more intolerant to loss-of-function mutations the gene is. ncRVIS = Non-coding residual variation intolerance score; the higher the score is, the more intolerant to non-coding variation the gene is. posMapSNVs = Number of SNVs mapped to gene based on positional mapping. posMapMaxCADD = The maximum CADD score of mapped SNVs by positional mapping. minGwasP = The minimum P-value of mapped SNVs. IndSigSNVs = rsID of the independent significant SNVs that are in LD with the mapped SNVs. GenomicLocus = Index of genomic loci where mapped SNVs are from.
To further investigate the choice of GWAS method and phenotype, we conducted a deeper exploration into the similarities and differences between the BOLT-LMM and PLINK results as well as between the LVM and LVMi results in the stability analysis below. The primary goal of this stability analysis is to answer the following two questions:
Figure 3.21: Comparison of the BOLT-LMM and PLINK GWAS p-values for each phenotype (LVM left, LVMi right). Each point represents a SNV with its BOLT-LMM p-value on the x-axis and its PLINK p-value on the y-axis.
Figure 3.22: Summary of annotated SNVs from FUMA across different GWAS methods (BOLT-LMM and PLINK) and across different choices of phenotype (LVMi and LVM).
Figure 3.23: Summary of mapped genes from FUMA across different GWAS methods (BOLT-LMM and PLINK) and across different choices of phenotype (LVMi and LVM).
In this section, we briefly explore the selected 1405 SNVs that passed the GWAS filter for the LVMi phenotype. We make two observations that will impact how we interpret the lo-siRF fit downstream:
We will expand upon this locus-level importance score in a later section (see Prioritization Step
).
Additional notes:
GWAS Results Summary
tab)). rs3045696, an intronic TTN SNV, was stably identified to be a top lead SNV in the GWASes, and rs66733621 is the top SNV from the TTN locus in lo-siRF. This is to say that the correlation between the CCDC141 and TTN loci is moderate and may possibly suggest independent roles of CCDC141 and TTN in their detected interaction. A further discussion of this is provided in Q. Wang et al. (2023). [Note: we use the term top SNV from a genetic locus in lo-siRF to mean the SNV in the specified genetic locus with the highest frequency of occurrence across decision paths in the siRF fit.]Figure 3.24: Correlation heatmap for the GWAS-filtered SNVs. We plot the magnitude of the Pearson correlation between all possible pairs of SNVs that passed the GWAS filter. Darker red indicates higher correlation between the two SNVs. SNVs along the axes are ordered according to genomic location. Loci with more than two constituent SNVs are labeled.
Figure 3.25: Number of SNVs per genetic locus that passed the GWAS filter. Note: a semicolon is used to denote the intergenic region between two genes.
In the next step of lo-siRF, we chose to binarize the (continuous) LVMi phenotype measurements into two categories: a low LVMi group and a high LVMi group before fitting the prediction models.
Why did we choose to binarize? One of our main motivating factors to pursue binarization revolved around the prediction check criteria in the PCS framework (Yu and Kumbier 2020). This predictability principle advocates for the use of prediction accuracy as a reality check. This is to say that at a minimum, models should fit the data well, as measured by prediction accuracy, before believing that the model is capturing something relevant to reality or the real-world phenomena under study. In particular, before we thought to binarize the phenotype, we first attempted to train prediction models using the GWAS-filtered SNVs to predict the original (continuous) LVMi phenotype measurements. While this seemed like the most natural next step, these prediction results illuminated an incredibly weak prediction signal, with validation \(R^2\) values hovering very close to 0 (see Prediction Results without Binarization
tab). Given that an \(R^2\) value of 0 can be achieved by simply predicting the mean LVMi response, this raised the question of whether these prediction models are capturing any relevant signal in the data. Consequently, the interpretation of these models may not be an accurate representation of reality. Put differently, these models do not satisfy the prediction check criteria under the PCS framework.
Given these difficulties in accurately predicting LVMi, we turned to an alternative school of thought in order to extract reliable insights from such low-signal data. Drawing inspiration from how scientists often reduce complex problems into simplified versions, we can simplify the problem of trying to understand how SNVs are associated with everyone’s precise LVMi measurements to asking which SNVs are associated with having extremely high or extremely low LVMi measurements. In other words, we can reduce the problem of predicting (continuous) LVMi responses to predicting whether one has an extremely high LVMi or low LVMi. This binarization not only simplifies the problem conceptually, but may also help to denoise since we are only looking at the more extreme LVMi cases to gain understanding. Furthermore, the original regression problem becomes a classification problem after binarization, which can help to facilitate the interpretation of prediction performance with respect to the prediction check (e.g., it is arguably easier to reason about the practical difference between 55% and 50% balanced classification accuracy than \(R^2\) values of 0.05 and 0).
How is the binarization done? In order to binarize the LVMi responses, one must choose some threshold. We chose to binarize the response based on quantiles so that for a given threshold x, we took the top and bottom x% of LVMi values and pooled them into the high and low LVMi groups, respectively, while omitting the individuals in the middle quantile range. However, due to the gender-specific biological variation of LVMi shown earlier in the exploratory data analysis (see Choice of Phenotypic Data
), we performed the binarization for males and females separately. Finally, given that this threshold is artificially introduced, we run the remainder of the lo-siRF pipeline using three different reasonable binarization thresholds (15%, 20%, 25%) that balance the amount of denoising and data lost. In the end, we will consolidate the results that were stable across all three binarization thresholds, described later.
We expand on the prediction modeling with the binarized LVMi phenotype in the next section (see Prediction Step
).
Here, we present a summary of the prediction modeling results, where we use the GWAS-filtered SNVs to predict the raw LVMi phenotype measurements (i.e., without any binarization). Specifically, we measure the validation set prediction accuracy from several popular machine learning regression algorithms (kernel ridge regression, Lasso regression, ridge regression, random forest (RF), and signed iterative random forest (siRF)) across four different prediction performance metrics - the Pearson correlation, mean absolute error (MAE), R-squared, and root-mean-squared error (RMSE) between the observed LVMi response and the predicted LVMi response. We also visualize how the predicted responses differ across the different prediction models in the provided pair plots. Though RF appears to yield the best prediction performance relative to the other methods under consideration, the RF prediction performance is still very poor as both the correlation and R-squared metrics are very close to 0.
Figure 4.1: Summary of prediction performance on the validation set for the LVMi phenotype (without binarization) across various prediction methods and evaluation metrics.
Figure 4.2: Pair plot of the predicted LVMi responses from various prediction models. Each point represents an individual in the validation set, and the coordinates represent the predicted LVMi response from two different prediction methods, which are specified by the x and y-axes labels in each subplot. Density distributions of the predicted LVMi responses are shown along the diagonal.
After choosing to binarize the LVMi phenotype, we next fit various prediction models using the GWAS-filtered SNVs to predict the binarized LVMi phenotype and assess their validation prediction accuracy.
Which models were chosen and why? While there are many methods, each with their unique advantages and disadvantages, for detecting epistasis from data (e.g., see Niel et al. 2015 and references therein), we chose to focus on siRF (Kumbier et al. 2023) as our primary interaction search engine for various reasons.
Though we are most interested in interpreting the siRF prediction model to identify candidate epistatic interactions, we also fit several other popular prediction models, including RF, L1-regularized logistic regression (Lasso), L2-regularized logistic regression (ridge), and support vector machines (SVM). These prediction models serve as baselines to ensure that siRF indeed fits the data well, relative to these other popular prediction models, and that siRF passes the prediction check in accordance with the PCS framework (Yu and Kumbier 2020).
Summary of prediction results: For all of the classification algorithms under investigation, we see that the validation prediction accuracy metrics, measured via classification accuracy, area under the ROC curve (AUROC), and area under the precision-recall curve (AUPRC), are all comfortably above the random-guessing baseline of 0.5 across three different binarization thresholds. This suggests that the prediction models are able to ascertain some phenotypic signal from the SNVs under study. That is, the SNVs used in the prediction models have some predictive power in differentiating those with high LVMi versus those with low LVMi. Moreover, siRF consistently yielded the highest predictive power across the various binarization thresholds and prediction metrics (with one exception being the classification accuracy for the 15% binarization threshold, where ridge is slightly better than siRF). From this, we conclude that siRF passed the prediction check and will proceed to interpret the siRF fit to recommend and prioritize interactions between genetic loci for downstream analyses and experiments.
Beyond the validation prediction accuracies, we also provide the confusion matrices, ROC and precision-recall curves, and pair plots comparing the predicted responses across methods for additional post-hoc exploration.
Figure 5.1: Summary of prediction performance on the validation set for the binarized LVMi phenotype across various prediction methods, binarization thresholds, and evaluation metrics.
Figure 5.2: Confusion matrix from various prediction models for predicting the binarized LVMi phenotype (binarization threshold = 0.15).
Figure 5.3: ROC curve for predicting the binarized LVMi (threshold = 0.15) across various prediction methods. TPR = true positive rate; FPR = false positive rate.
Figure 5.4: Precision-recall curve for predicting the binarized LVMi (threshold = 0.15) across various prediction methods.
Figure 5.5: Pair plots of the predicted binarized LVMi responses (threshold = 0.15) across various prediction methods. Each point represents an individual in the validation set, and the coordinates represent the predicted probability of having high LVMi from two different prediction methods, which are specified by the x and y-axes labels in each subplot. Density distributions of the predicted probabilities are shown along the diagonal.
Figure 5.6: Confusion matrix from various prediction models for predicting the binarized LVMi phenotype (binarization threshold = 0.2).
Figure 5.7: ROC curve for predicting the binarized LVMi (threshold = 0.2) across various prediction methods. TPR = true positive rate; FPR = false positive rate.
Figure 5.8: Precision-recall curve for predicting the binarized LVMi (threshold = 0.2) across various prediction methods.
Figure 5.9: Pair plots of the predicted binarized LVMi responses (threshold = 0.2) across various prediction methods. Each point represents an individual in the validation set, and the coordinates represent the predicted probability of having high LVMi from two different prediction methods, which are specified by the x and y-axes labels in each subplot. Density distributions of the predicted probabilities are shown along the diagonal.
Figure 5.10: Confusion matrix from various prediction models for predicting the binarized LVMi phenotype (binarization threshold = 0.25).
Figure 5.11: ROC curve for predicting the binarized LVMi (threshold = 0.25) across various prediction methods. TPR = true positive rate; FPR = false positive rate.
Figure 5.12: Precision-recall curve for predicting the binarized LVMi (threshold = 0.25) across various prediction methods.
Figure 5.13: Pair plots of the predicted binarized LVMi responses (threshold = 0.25) across various prediction methods. Each point represents an individual in the validation set, and the coordinates represent the predicted probability of having high LVMi from two different prediction methods, which are specified by the x and y-axes labels in each subplot. Density distributions of the predicted probabilities are shown along the diagonal.
Having passed the prediction check, we finally proceed to interpret the siRF fit in the last step of lo-siRF. In this step, we developed a novel stability-driven feature importance score to prioritize both genetic loci and interactions between genetic loci from siRF. We refer to Q. Wang et al. (2023) for the detailed description of this new importance score and here provide brief insights into the motivation and intuition behind this method.
Main idea: At a high-level, this new importance score (1) aggregates weak, unstable SNV-level importances from siRF into stronger, more stable locus-level importances and (2) can be leveraged to evaluate both the marginal importance of a genetic locus in siRF as well as the importance of higher-order interactions between genetic loci in siRF. The need for this locus-level feature importance score stems from two main reasons: the high correlation between SNVs and the weak phenotypic signal. Though the prediction models (including siRF) are all trained using SNV data, it is often very challenging to interpret the importance of an individual SNV from these models due to the high correlation between SNVs (Toloşi and Lengauer 2011; Hooker, Mentch, and Zhou 2021), which we observed in our previous exploratory data analysis (see Choice of Phenotypic Data
tab). To further complicate matters, the weak phenotypic signal increases the instability of existing feature importance methods in that we see the feature rankings change drastically when using different bootstrap training samples. Given these challenges with SNV-level feature importances, it is then no surprise that siRF, out-of-the-box, does not find any stable interactions between SNVs (i.e., the siRF stability score is around 0 for all interactions between SNVs). To address these challenges, our new feature importance score leverages the correlation structure between SNVs, groups SNVs into genetic loci, and aggregates weak, unstable SNV-level importances from siRF into stronger, more stable locus-level importances.
Motivation behind new feature importance score: To better understand the motivation and construction of our new locus-level feature importance score, it can be helpful to first review the original importance score used in siRF (Kumbier et al. 2023). Briefly, in the original version of siRF, the importance of a signed interaction is based upon how stable or frequently it occurs in decisions paths across the random forest – that is, based upon the number of decision paths for which every signed feature in the signed interaction was used in at least one decision split (see Kumbier et al. (2023) for details). Following a similar logic, one natural path forward to calculate a locus-level importance from a forest that was fit on SNV data is as follows:
However, this frequency of occurrence measure is heavily tied to the number of SNVs that belong to each genetic locus. A genetic locus that consists of more SNVs will naturally have a higher frequency of occurrence than a genetic locus with only a few SNVs even though both genetic loci may be equally important (or unimportant). In other words, the original siRF importance score, measuring the frequency of occurrence, is not comparable across genetic loci. This bias is problematic given that we have previously seen some genetic loci may consist of only 1 SNV while other genetic loci have >50 SNVs (see GWAS-filtered SNVs
tab in the Dimension Reduction Step).
Intuition behind new feature importance score: To alleviate this bias, we leverage the idea of a local feature importance score, where we compute a per-individual score that measures the importance of a locus/interaction for making the prediction of each individual. This is in contrast to a global feature importance score, which is a measure of importance for the entire group of individuals under study. While a global feature importance score gives rise to a single score for the entire population under study, we can compute a local feature importance score for each individual in the population. By computing a local feature importance score for each individual (i.e., this is the local stability importance (LSI) score from Q. Wang et al. (2023)), we can compare the importance of the locus/interaction across individuals and avoid the across-loci comparison that was problematic in the aforementioned discussion. This comparison is done via a permutation test in Q. Wang et al. (2023). Moreover, since we are comparing the local feature importances between individuals within or per loci/interaction in this permutation test, the local feature importances are all measured on the same scale and comparable, thereby mitigating the bias towards genetic loci with more SNVs.
How to aggregate SNVs: We mapped each SNV to its corresponding genetic locus using ANNOVAR (K. Wang, Li, and Hakonarson 2010) according to the hg19 refSeq Gene annotations. Given our ultimate goal of recommending/prioritizing genetic loci and interactions between loci for downstream analyses and experimental validation, we believe this choice provides an appropriate starting point to obtain actionable recommendations. However, we note that there are many alternative ways of aggregating SNVs. For example, it is possible to aggregate SNVs according to LD/haplotype blocks or to include additional gene-based or functional annotations in the aggregation scheme.
Organization: Under the SNV-level Importances
tab, we provide the SNV-level feature importances from the fitted prediction algorithms (described in the Prediction Step
section) for completeness. These SNV-level feature importances are very unstable across methods and across bootstrap perturbations for the reasons discussed above. We then summarize the results from the lo-siRF pipeline including the rankings of genetic loci and interaction between genetic loci using our new locus-level importance score under the Summary of lo-siRF Results
tab. A final technical note on the advantages of incorporating signed information from siRF in this feature importance computation is provided under the Advantages of Signed Information
tab.
Having observed that the aforementioned classification algorithms do have some predictive power to predict those with high versus low LVMi, a natural next step is to investigate which SNVs were used in these prediction models. Below, we highlight the top 100 SNVs, ranked by importance in each of the prediction models. For completeness, we provide the SNV importances for the unbinarized LVMi (regression) problem as well as the binarized LVMi (classification) problem across the three different binarization thresholds.
How is importance defined?
A word of caution: When interpreting these SNV importances, it is crucial to keep in mind that there are very large, strongly correlated blocks of SNVs within this data, which often renders the feature rankings unmeaningful and unstable (Toloşi and Lengauer 2011; Hooker, Mentch, and Zhou 2021). In particular, it is well-known that the MDI variable importance metric used in RF/iRF/siRF is biased against correlated features (Hooker, Mentch, and Zhou 2021). In other words, SNVs within large LD blocks tend to have artificially low MDI values. Given these idiosyncrasies, it is unsurprising that SNVs from TTN and CCDC141 – two genetic loci with many SNVs under LD – do not have high importances according to MDI. We provide these SNV importance tables only for completeness and to partially motivate the need to aggregate SNV-level importances into locus-level importances in lo-siRF.
Figure 6.1: Lasso - Top 100 SNVs, ranked by importance, for predicting LVMi.
Figure 6.2: Ridge - Top 100 SNVs, ranked by importance, for predicting LVMi.
Figure 6.3: RF - Top 100 SNVs, ranked by importance, for predicting LVMi.
Figure 6.4: siRF - Top 100 SNVs, ranked by importance, for predicting LVMi.
Figure 6.5: Lasso - Top 100 SNVs, ranked by importance, for predicting the binarized LVMi (threshold = 0.15).
Figure 6.6: Ridge - Top 100 SNVs, ranked by importance, for predicting the binarized LVMi (threshold = 0.15).
Figure 6.7: RF - Top 100 SNVs, ranked by importance, for predicting the binarized LVMi (threshold = 0.15).
Figure 6.8: siRF - Top 100 SNVs, ranked by importance, for predicting the binarized LVMi (threshold = 0.15).
Figure 6.9: Lasso - Top 100 SNVs, ranked by importance, for predicting the binarized LVMi (threshold = 0.2).
Figure 6.10: Ridge - Top 100 SNVs, ranked by importance, for predicting the binarized LVMi (threshold = 0.2).
Figure 6.11: RF - Top 100 SNVs, ranked by importance, for predicting the binarized LVMi (threshold = 0.2).
Figure 6.12: siRF - Top 100 SNVs, ranked by importance, for predicting the binarized LVMi (threshold = 0.2).
Figure 6.13: Lasso - Top 100 SNVs, ranked by importance, for predicting the binarized LVMi (threshold = 0.25).
Figure 6.14: Ridge - Top 100 SNVs, ranked by importance, for predicting the binarized LVMi (threshold = 0.25).
Figure 6.15: RF - Top 100 SNVs, ranked by importance, for predicting the binarized LVMi (threshold = 0.25).
Figure 6.16: siRF - Top 100 SNVs, ranked by importance, for predicting the binarized LVMi (threshold = 0.25).
Finally, we summarize the prioritization results from the lo-siRF pipeline using our new importance score to rank genetic loci and interactions between genetic loci. In this new importance score, we (1) computed a local (i.e., on a per-individual basis) stability importance score that aggregates weak SNV-level importances into stronger locus-level importances using the siRF fit and (2) conducted a permutation test to assess whether the local stability importance scores for a given locus/interaction are different between individuals with high and low LVMi (conditioned on the rest of the fitted forest). The larger the difference in local stability importance scores between those with high versus low LVMi, the greater the importance of the locus/interaction for predicting LVMi. For details on the new importance score, we refer to Q. Wang et al. (2023).
Below, we recap the siRF prediction accuracy for predicting the binarized LVMi across the three binarization thresholds (15%, 20%, 25%) (Figure 6.17). Then, in Figure 6.18, we summarize the prioritization rankings of genetic loci and interaction between loci according to the new importance score. We provide the permutation p-values along with the permutation test statistic, that is, the difference in local feature stability scores between the high and low LVMi groups, with the standard deviation of this permutation distribution in parentheses.
Note that the permutation test was not performed for all loci/interactions (details in Q. Wang et al. (2023)). Given our primary aim of generating reliable hypotheses for follow-up experimental validation, we chose to only test loci/interactions that were predictive and stable in the siRF fit, following the PCS framework for veridical data science (Yu and Kumbier 2020). Specifically, for each binarization threshold, we only tested:
By implementing this additional predictability and stability check, we are requiring that the prioritized loci/interactions meet a higher standard, with the hope that this ultimately generates more reliable (albeit more conservative) experimental recommendations.
In addition to the provided summary of results,
siRF Interaction Table
tab, we provide the full siRF locus-level interaction table output for each binarization threshold (Figures 6.19, 6.21, and 6.23). This table includes several metrics for evaluating the stability and strength of the interaction. For details on these metrics, we refer to Kumbier et al. (2023).
Local Stability Importance Plots - Interactions
and Local Stability Importance Plots - Loci
, we provide visualizations of the density distribution of local stability importance scores between the low and high LVMi groups as well as the frequency distribution of SNVs used in the siRF fit (i.e., which SNVs occurred most frequently in the decision paths across the siRF fit).
Figure 6.17: Summary of the validation prediction performance from siRF across various LVMi binarization thresholds.
Figure 6.18: Summary of the lo-siRF permutation test results across various LVMi binarization thresholds including the test statistic (i.e., difference in local feature stability scores between the high and the low LVMi groups (SD in parentheses) and the resulting p-values. Blank cells indicate that the locus/interaction was not selected for testing for the given binarization threshold. Loci/interactions are ranked according to the mean p-value across all three binarization thresholds. Interactions between genetic loci are highlighted in blue.
Figure 6.19: Summary of the siRF locus-level interaction table output for the binarized LVMi phenotype (binarization threshold = 0.15). For all metrics excluding the lo-siRF mean permutation p-value, higher values indicate a stronger interaction. Details regarding these metrics in the siRF interaction table output can be found in Kumbier et al. (2018).
Figure 6.20: Summary of the siRF SNV-level interaction table output for the binarized LVMi phenotype (binarization threshold = 0.15). Stability scores for all interactions are close to 0, suggesting unreliable SNV-level interactions. Details regarding these metrics in the siRF interaction table output can be found in Kumbier et al. (2018).
Figure 6.21: Summary of the siRF locus-level interaction table output for the binarized LVMi phenotype (binarization threshold = 0.2). For all metrics excluding the lo-siRF mean permutation p-value, higher values indicate a stronger interaction. Details regarding these metrics in the siRF interaction table output can be found in Kumbier et al. (2018).
Figure 6.22: Summary of the siRF SNV-level interaction table output for the binarized LVMi phenotype (binarization threshold = 0.2). Stability scores for all interactions are close to 0, suggesting unreliable SNV-level interactions. Details regarding these metrics in the siRF interaction table output can be found in Kumbier et al. (2018).
Figure 6.23: Summary of the siRF locus-level interaction table output for the binarized LVMi phenotype (binarization threshold = 0.25). For all metrics excluding the lo-siRF mean permutation p-value, higher values indicate a stronger interaction. Details regarding these metrics in the siRF interaction table output can be found in Kumbier et al. (2018).
Figure 6.24: Summary of the siRF SNV-level interaction table output for the binarized LVMi phenotype (binarization threshold = 0.25). Stability scores for all interactions are close to 0, suggesting unreliable SNV-level interactions. Details regarding these metrics in the siRF interaction table output can be found in Kumbier et al. (2018).
Figure 6.25: Distribution of local stability importance scores between the high and low LVMi groups (binarization threshold = 0.15) for each signed interaction between loci (specified by subplot) that passed the siRF stability filter. The permutation test p-value, assessing distributional differences between the high and low LVMi groups, is provided in the top right corner of each subplot.
Figure 6.26: Frequency of occurrence of SNV-SNV combinations in the siRF fit for the binarized LVMi phenotype (binarization threshold = 0.15). SNV-SNV combinations with the highest number of occurrences in the siRF decision paths contribute the most to the overall locus-level interaction importance in the siRF fit.
Figure 6.27: Distribution of local stability importance scores between the high and low LVMi groups (binarization threshold = 0.2) for each signed interaction between loci (specified by subplot) that passed the siRF stability filter. The permutation test p-value, assessing distributional differences between the high and low LVMi groups, is provided in the top right corner of each subplot.
Figure 6.28: Frequency of occurrence of SNV-SNV combinations in the siRF fit for the binarized LVMi phenotype (binarization threshold = 0.2). SNV-SNV combinations with the highest number of occurrences in the siRF decision paths contribute the most to the overall locus-level interaction importance in the siRF fit.
Figure 6.29: Distribution of local stability importance scores between the high and low LVMi groups (binarization threshold = 0.25) for each signed interaction between loci (specified by subplot) that passed the siRF stability filter. The permutation test p-value, assessing distributional differences between the high and low LVMi groups, is provided in the top right corner of each subplot.
Figure 6.30: Frequency of occurrence of SNV-SNV combinations in the siRF fit for the binarized LVMi phenotype (binarization threshold = 0.25). SNV-SNV combinations with the highest number of occurrences in the siRF decision paths contribute the most to the overall locus-level interaction importance in the siRF fit.
Figure 6.31: Distribution of local stability importance scores between the high and low LVMi groups (binarization threshold = 0.15) for each signed locus (specified by subplot). Here, we show only the top 25 signed loci, ranked by their average local stability importance scores. The permutation test p-value, assessing distributional differences between the high and low LVMi groups, is provided in the top right corner of each subplot.
Figure 6.32: Frequency of occurrence of SNVs in the siRF fit for the binarized LVMi phenotype (binarization threshold = 0.15). SNVs with the highest number of occurrences in the siRF decision paths contribute the most to the overall locus-level importance in the siRF fit. SNVs are ordered on the x-axis by genomic location.
Figure 6.33: Distribution of local stability importance scores between the high and low LVMi groups (binarization threshold = 0.2) for each signed locus (specified by subplot). Here, we show only the top 25 signed loci, ranked by their average local stability importance scores. The permutation test p-value, assessing distributional differences between the high and low LVMi groups, is provided in the top right corner of each subplot.
Figure 6.34: Frequency of occurrence of SNVs in the siRF fit for the binarized LVMi phenotype (binarization threshold = 0.2). SNVs with the highest number of occurrences in the siRF decision paths contribute the most to the overall locus-level importance in the siRF fit. SNVs are ordered on the x-axis by genomic location.
Figure 6.35: Distribution of local stability importance scores between the high and low LVMi groups (binarization threshold = 0.25) for each signed locus (specified by subplot). Here, we show only the top 25 signed loci, ranked by their average local stability importance scores. The permutation test p-value, assessing distributional differences between the high and low LVMi groups, is provided in the top right corner of each subplot.
Figure 6.36: Frequency of occurrence of SNVs in the siRF fit for the binarized LVMi phenotype (binarization threshold = 0.25). SNVs with the highest number of occurrences in the siRF decision paths contribute the most to the overall locus-level importance in the siRF fit. SNVs are ordered on the x-axis by genomic location.
Though the signed information is not pertinent to our goal of prioritizing/recommending candidate markers for experiments, the signed information from siRF provides more granular information that can improve our interpretation of the fit. To see this, suppose that we did not keep track of the sign of the feature when splitting (i.e., that we do not keep track of whether we split on low or high values of the feature). Suppose also that we would like to measure the importance of feature \(X_1\), which was used as the first (root) split in tree \(T\). In the first step in our new importance score computation, we compute the local stability importance (LSI) score of \(X_1\) for each individual \(i\). If we do not keep track of the sign of the \(X_1\) split, then \(X_1\) appears on every individual’s decision path so that \(LSI_T(X_1, i) = 1\) for every individual \(i\). In the next step of our new importance score computation, we perform a permutation test to assess whether there is a difference in the LSI scores for individuals with high LVMi and those with low LVMi. Since \(LSI_T(X_1, i) = 1\) for every \(i\), this permutation test will return a p-value of 1. This suggests that \(X_1\) is an unimportant feature in this model. However, this is counterintuitive given that \(X_1\) was split on first in the tree and should be considered one of the most important features in this tree. By keeping track of the signed information of the splits as done in siRF, we can utilize this more granular information to mitigate the issue discussed above and improve our interpretation of the siRF fit.
Moreover, to facilitate the interpretation of this signed information, it can be beneficial to aggregate SNV features that are positively correlated (as opposed to negatively correlated) into a single locus (or group). Since the encoding of SNVs as a value taking on 0, 1, or 2 is rather arbitrary (e.g., a value of 0 can be re-encoded to take on the value of 2 by changing the reference allele), we can choose the encoding of the SNV to try to avoid strong negative correlations within a locus. More specifically, for each genetic locus which consists of SNVs \(X_1, \ldots, X_g\), we choose to “flip” the encoding of the SNV \(X_j\) (\(j = 2, \ldots, g\)) if the Pearson correlation between the observed values of \(X_j\) and \(X_1\) is below -0.15. Here, we use the term “flip” to mean the application of the function that maps 0 to 2, 1 to 1, and 2 to 0. This correlation threshold of -0.15 was chosen by visually inspecting the resulting pairwise correlations between \(X_i\) and \(X_j\) for all \(i, j = 1, \ldots, g\) and checking that these pairwise correlations are not strongly negative. However, while we focus on the lo-siRF prioritization results using this correlation threshold of -0.15, the results appear to be robust to other reasonable correlation threshold choices (e.g., -0.05, -0.25). Further analysis and improvements to this procedure are very interesting for future work.
We reiterate that lo-siRF was originally developed as a first-stage recommendation (or hypothesis generation) tool within a broader pipeline in order to prioritize genetic loci and interactions between loci for downstream analyses and experimental validation. Importantly, the output of lo-siRF is a prioritization/ranking order, not a proper statistical test of significance. We rely on follow-up investigations and in this study rigorous gene-silencing experiments in order to validate the prioritized genetic loci and interaction between loci. With that, many of the modeling decisions and human judgment calls were made with this original use case in mind. For other use cases or problems, it is highly likely that different choices may be better suited. We also acknowledge that though many stability checks were conducted to help ensure that our conclusions are stable across different reasonable data and/or modeling perturbations, these stability checks are inevitably constrained by computational limitations, and additional stability analyses can be conducted. We hope that by documenting our modeling choices, stability analyses, and providing insights into our motivations/reasons, this may help to both clarify the limitations of our current work and to facilitate the trustworthy (veridical) use of lo-siRF (and variants thereof) in other scientific problems.