Abstract
Motivation ANOVA Simultaneous Component Analysis (ASCA) is a popular method for the analysis of multivariate data yielded by designed experiments. Meaningful associations between factors/interactions of the experimental design and measured variables in the data set are typically identified via significance testing, with permutation tests being the standard choice. However, in settings with large numbers of variables, the “holistic” testing approach of ASCA (all variables considered) often overlooks statistically significant effects encoded by only a few variables.
Results We propose Variable-selection ASCA (VASCA), a method that generalizes ASCA through variable selection, augmenting its statistical power without inflating the Type-I error risk. The method is evaluated with simulations and with a real data set from a multi-omic clinical experiment. We show that VASCA is more powerful than both ASCA and the widely adopted False Discovery Rate (FDR) controlling procedure; the latter is used as a benchmark for variable selection based on multiple significance testing. We further illustrate the usefulness of VASCA for exploratory data analysis in comparison with the popular Partial Least Squares Discriminant Analysis (PLS-DA) method and its sparse counterpart (sPLS-DA).
Availability The code for VASCA is available in the MEDA Toolbox at https://github.com/josecamachop/MEDA-Toolbox
Contact josecamacho{at}ugr.es
Supplementary information Supplementary data are available at Bioinformatics online.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work is partly supported by the Agencia Andaluza del Conocimiento, Regional Government of Andalucia, in Spain, and ERDF (European Regional Development Fund) funds through project B-TIC136-UGR20. The work of D. Morales-Jimenez is supported in part by the State Research Agency (AEI) of Spain and the European Social Fund under grant RYC2020-030536-I and by AEI under grant PID2020-118139RB-I00.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study protocol was approved by the local Ethics Committee of Granada (Reference 8/15) and was conducted according to the standards given in the Declaration of Helsinki (Edinburgh 2000 revision), the Good Clinical Practice of the European Union (document 111/3976/88 July 1990) and the Spanish regulations in force governing clinical research in human beings.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
No real data is explicitly generated for this paper. The code for the simulation study is available from the authors upon reasonable request. The code for VASCA is available in the MEDA Toolbox at https://github.com/josecamachop/MEDA-Toolbox