Abstract
There is a growing focus on better understanding the complexity of dietary patterns and how they relate to health and other factors. Approaches that have not traditionally been applied to characterize dietary patterns, such as machine learning algorithms and latent class analysis methods, may offer opportunities to measure and characterize dietary patterns in greater depth than previously considered. However, there has not been a formal examination of how this wide range of approaches has been applied to characterize dietary patterns. This scoping review synthesized literature from 2005-2022 applying methods not traditionally used to characterize dietary patterns, referred to as novel methods. MEDLINE, CINAHL, and Scopus were searched using keywords including machine learning, latent class analysis, and least absolute shrinkage and selection operator (LASSO). Of 5274 records identified, 24 met the inclusion criteria. Twelve of 24 articles were published since 2020. Studies were conducted across 17 countries. Nine studies used approaches that have applications in machine learning to identify dietary patterns. Fourteen studies assessed associations between dietary patterns that were characterized using novel methods and health outcomes, including cancer, cardiovascular disease, and asthma. There was wide variation in the methods applied to characterize dietary patterns and in how these methods were described. The extension of reporting guidelines and quality appraisal tools relevant to nutrition research to consider specific features of novel methods may facilitate complete and consistent reporting and enable evidence synthesis to inform policies and programs aimed at supporting healthy dietary patterns.
Introduction
Dietary intake is among the top risk factors for chronic diseases.1,2 Research examining dietary intake has historically focused on single foods, nutrients, or other dietary constituents.3 As the focus of public health nutrition shifted from the prevention of deficiency to include a focus on the prevention of chronic diseases, research likewise shifted towards the examination of dietary patterns, aiming to capture how foods and beverages are consumed in real life.3–5 Humans typically do not consume foods or nutrients on their own, but in the context of a broader dietary pattern.3,4 Accordingly, food-based dietary guidelines are now typically focused on patterns of intake rather than single dietary components.6 It is likely the synergistic and antagonistic relationships among the multiple foods, beverages, and other dietary components that humans consume that influence health rather than individual components.4 In addition to this multidimensionality, dietary patterns are dynamic, changing from meal to meal, day to day and across the life course.4,7 Further, dietary patterns are shaped by culture, social position, and other contextual factors.8,9 However, incorporating the domains of multidimensionality, dynamism, and contextual factors into dietary patterns analysis is a difficult task.
Traditional approaches to identify dietary patterns, including ‘a priori’ and ‘a posteriori’ approaches, are useful for understanding overall dietary patterns or diet quality of populations and population subgroups.10 For example, ‘a priori’ methods like the Healthy Eating Index-2020 or the Healthy Eating Food Index-2019 are generally investigator driven.11,12 They consider multiple components as inputs, such as fruits and vegetables and whole grains, but typically compress the multidimensional construct of total dietary patterns, condensing inputs to a single unidimensional score reflecting overall diet quality.13,14 ‘A posteriori’ approaches are data-driven and have also been widely used to identify dietary patterns. Commonly applied data-driven approaches include clustering methods (e.g., k-means, Ward’s method), principal component analysis, and factor analysis, providing opportunities to identify dietary patterns through statistical modelling or clustering algorithms rather than relying on researcher hypotheses.15 These approaches compress dietary components to key food groupings typically expressed as single scores.10,16 By reducing the dimensionality of dietary patterns, these methods are limited in their ability to explain the wide variation in dietary intakes.4 Methods traditionally employed to characterize dietary patterns using ‘a priori’ and ‘a posteriori’ approaches thus address multidimensionality to some extent, but do not allow for explorations of dietary patterns in their totality because they miss potential synergistic or antagonistic associations among dietary components.4,14,17
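As an illustration of the dimension reduction described above, the following minimal Python sketch applies principal component analysis to an entirely synthetic intake matrix. The food group names, sample size, and simulated values are assumptions made for illustration only and are not drawn from any study in this review; the sketch is not intended to reproduce a published analysis.

```python
import numpy as np

# Synthetic data only: 200 hypothetical participants, 5 hypothetical food groups.
rng = np.random.default_rng(0)
food_groups = ["vegetables", "fruit", "whole_grains", "processed_meat", "sugary_drinks"]

# Two hidden tendencies each drive intake of several food groups at once.
healthy = rng.normal(size=200)
western = rng.normal(size=200)
intake = np.column_stack([
    healthy + 0.5 * rng.normal(size=200),  # vegetables
    healthy + 0.5 * rng.normal(size=200),  # fruit
    healthy + 0.5 * rng.normal(size=200),  # whole grains
    western + 0.5 * rng.normal(size=200),  # processed meat
    western + 0.5 * rng.normal(size=200),  # sugary drinks
])

# Standardize each food group, then run PCA via singular value decomposition.
z = (intake - intake.mean(axis=0)) / intake.std(axis=0)
u, s, vt = np.linalg.svd(z, full_matrices=False)

explained = s**2 / np.sum(s**2)  # share of total variance per component
loadings = vt[:2]                # first two components (rows) by food group (columns)
scores = z @ loadings.T          # two pattern scores per participant

for name, row in zip(("component 1", "component 2"), loadings):
    print(name, dict(zip(food_groups, np.round(row, 2))))
print("variance explained by two components:", round(float(explained[:2].sum()), 2))
```

Each participant's five food group intakes are reduced to two scores, and any variation not captured by the retained components is discarded, which is the loss of detail noted in the paragraph above.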
Novel methods that have not traditionally been used to identify dietary patterns, such as probabilistic graphical modelling, latent class analysis, and machine learning algorithms (e.g., random forest, neural networks), may capture complexities like dietary synergy. There is no clear delineation between traditional and novel methods, and specifically defining what is novel is challenging given it naturally implies an evolution of methods. Nonetheless, there is a growing interest among nutrition researchers in the application of methods that have not typically been used to capture dietary complexity, with these methods often centered in machine learning.18 To date, there have been perspectives and narrative reviews on the application of machine learning in nutrition,19–21 and a recent systematic review of studies that applied machine learning approaches to assess food consumption.22 However, there has not been an assessment of studies applying novel methods to characterize dietary patterns. Given the rapid adoption of these methods within the field of health,23–26 it is increasingly important for researchers to have a basic understanding of available methods and how they are being applied in the field. This will facilitate the synthesis of evidence from a range of methodological inputs to inform food-based dietary guidelines and other policies and programs that promote health. The objective of this scoping review was therefore to describe the use of novel methods not traditionally used to characterize dietary patterns in the published literature.
Methods
The review was conducted in accordance with the JBI Manual for Evidence Synthesis,27 which was developed using the Arksey and O’Malley framework.28 Reporting follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR).29
Defining novel methods
The novel methods considered were based on a preliminary search of the literature and the expertise of the research team and included systems methods (e.g., agent-based modelling, system dynamics), least absolute shrinkage and selection operator (LASSO), machine learning algorithms, copulas, and data-driven statistical modelling approaches (e.g., treelet transformations, principal balances and coordinates). Novel methods could also include those that have been used previously in nutrition research if applied in new ways to characterize dietary patterns (e.g., linear programming used to model a modified dietary pattern rather than to test scenarios). Methods that were not considered to be novel were those that have been applied to assess dietary patterns in numerous studies and have been considered by prior reviews and commentaries,2,10,30 including regression, ‘a priori’ approaches such as investigator-driven indices, and routinely used data-driven approaches, including factor analysis and cluster analysis.10,31
Identifying relevant studies
Articles were eligible for inclusion if they were primary research articles; focused on dietary intake as an exposure or outcome, including examination of dietary patterns (i.e., multiple dietary components in combination rather than single nutrients, foods, or other dietary components); used at least one novel method to characterize dietary patterns as described above; were published in English; and focused on humans. Ineligible studies included those focused on individual foods or human milk rather than dietary patterns, and commentaries and reviews.
Searches of three research databases, MEDLINE (via PubMed), the Cumulative Index to Nursing and Allied Health Literature (CINAHL), and Scopus, were conducted in March 2022. These health-focused, specialized, and multidisciplinary databases were selected based on consultation with a research librarian (JS) to ensure a range of possibly relevant study types were included. The search strategies were developed in consultation with the research librarian using keywords and subject headings to capture diet-related constructs (e.g., dietary intake, patterns, recommendations, feeding behaviour, food habits) and novel methods to characterize dietary patterns (e.g., machine learning, network science, system dynamics model). No date limits were applied to the searches, and articles were included until the end of the search in March 2022. The search strategies for MEDLINE, CINAHL, and Scopus are available in Supplemental File 1.
Study selection
Two independent reviewers (two of AP, AR, SH, SIK) screened each record at the title and abstract and full-text screening stages using Covidence,32 with one consistent reviewer (AP) participating throughout the entire process. At the title and abstract screening stage, an initial pilot screening (25 records) generated 100% agreement (AR and AP) and 92% agreement (AP and SH). A second pilot screening (100 records) generated 91% agreement (AR and AP) and 93% agreement (AP and SH). When applicable, discrepancies were discussed by reviewers and if needed deferred to a third reviewer (SIK) for decision. Following pilot screening, the reviewers independently reviewed the remaining articles (96% agreement, Kappa = 0.83).
The reviewers were intentionally liberal during the title and abstract screening stage because of the breadth of possible novel methods. This required iteratively revisiting the inclusion criteria. For example, reduced rank regression was initially considered to be novel but was found to be prevalent in the literature based on title and abstract screening and was excluded during full-text review. Further, articles that used ‘a posteriori’ methods to identify dietary intake but did not specify the exact method in the title or abstract were included for full-text review.
Pilot screening at the full-text stage (50 records) generated 82% agreement (AR and AP) and 96% agreement (AP and SIK); after discrepancies were discussed, two reviewers independently screened the remaining full-text articles (93% agreement, Kappa = 0.60). The combination of high agreement between reviewers and a relatively low Cohen’s Kappa is described as Cohen’s paradox and arises here because many more studies were excluded than included.33–35
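The relationship between percent agreement and Cohen's Kappa can be sketched with a short Python example. The counts below are hypothetical and are not the screening data from this review; they were chosen only to show how the same observed agreement can yield a much lower Kappa when most records fall into one category.

```python
def cohen_kappa(both_include, a_only, b_only, both_exclude):
    """Return (observed agreement, Cohen's kappa) for two reviewers' include/exclude decisions."""
    n = both_include + a_only + b_only + both_exclude
    observed = (both_include + both_exclude) / n
    # Marginal proportion of "include" decisions for each reviewer.
    a_include = (both_include + a_only) / n
    b_include = (both_include + b_only) / n
    # Agreement expected by chance if the reviewers decided independently.
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical imbalanced screening: most records are excluded by both reviewers.
observed, kappa = cohen_kappa(both_include=6, a_only=4, b_only=3, both_exclude=87)
print(f"imbalanced: agreement = {observed:.2f}, kappa = {kappa:.2f}")  # 0.93, 0.59

# Hypothetical balanced screening with the same number of disagreements.
observed_bal, kappa_bal = cohen_kappa(both_include=46, a_only=4, b_only=3, both_exclude=47)
print(f"balanced:   agreement = {observed_bal:.2f}, kappa = {kappa_bal:.2f}")  # 0.93, 0.86
```

In both scenarios the reviewers disagree on 7 of 100 records, but chance agreement is much higher when nearly all records are excluded, which lowers Kappa.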
Data extraction
Data extraction was completed by TEW and JMH using a pre-specified Excel template, with all extracted data subsequently verified by LA. Data extraction fields (Supplemental File 2) included information pertaining to authorship, study title, journal, year of publication, funding source, contextual details (e.g., study location), sample size, and participant characteristics (e.g., age). Details relating to study methods (e.g., analysis input variables, measurement of dietary intake, analytic approaches) and results (e.g., findings related to dietary patterns and if applicable, health risk and outcomes) were also extracted.
Results
Summary of search
A total of 5274 unique articles were identified after removing duplicates. Of these, 436 were identified as potentially relevant based on the title and abstract review and underwent full-text screening (Figure 1). Studies excluded during full-text screening included those that did not include methods defined as novel, those that did not focus on dietary patterns, commentaries, narrative reviews, systematic reviews, studies that were not published in English, studies that were not conducted with humans, and theses/dissertations. A final pool of 24 articles describing 24 unique studies met the inclusion criteria.
Characteristics of included studies
Across the 24 included studies, data from 17 countries were represented (Table 1). Half of the studies were published between 2005 and 2019,36–47 and the remaining 12 were published between 2020 and March 2022.48–59 Three studies used data from subsets of the European Prospective Investigation into Cancer and Nutrition,36,45,46 two studies used waves of data from the National Health and Nutrition Examination Survey,54,58 and two studies used data from the ELSA-Brasil cohort study (Table 2).42,57 Sample sizes ranged from 250 to over 73,000 participants. Nineteen studies were conducted using data from cohort or cross-sectional studies and five studies applied a case-control design.
The majority (n=15) of studies used food frequency questionnaires to assess dietary intake.36,37,39–43,45,47–49,52,55–57 Six studies used 24-hour recalls,46,51,53,54,58,59 two studies used food records/diaries,44,50 and one study used a food frequency questionnaire and a 24-hour recall.38 Among the studies using 24-hour recalls and records/diaries, one used data from a single recall that was combined with data from a food frequency questionnaire.38 The remaining studies that used records or recalls averaged or combined data from two or more days of intake. Apart from averaging recalls or records, none of the included studies made substantial efforts to mitigate measurement error present in dietary intake data. Several studies noted potential misreporting as a limitation, and only five studies specifically noted that findings may have been influenced by measurement error present in self-reported dietary assessment instruments.40,43,51,52,59
Novel methods applied to identify dietary patterns
The type of methods used and how they were implemented to identify dietary patterns varied widely (Table 3). Nine studies applied approaches that have applications in machine learning, including classification models, neural networks, and probabilistic graphical models (Table 4).36,38,44–46,49,53,56,59 The earliest study included in this review was published in 2005 and applied neural networks to characterize dietary patterns.38 Fifteen studies applied other novel methods, including latent class analysis, mutual information, and treelet transform.37,39–43,47,48,50–52,54,55,57,58 Two studies identified dietary patterns using more than one novel method.44,53 Five studies included comparisons of different novel methods, though these were typically versions of the same model.44,45,47,49,53 For example, Solans et al. compared three models for compositional data analysis and reported that the best-performing model incorporated both investigator- and data-driven methods.47
In twelve studies, two to eight distinct dietary patterns, such as the ‘prudent’ pattern or ‘Western’ pattern, were identified using methods such as latent class analysis, treelet transform, random forest with classification tree analysis, and multivariate finite mixture models.36–38,41–43,48,50,51,54,57,58 Six studies applied network methods, including probabilistic graphical models and mutual information, to identify networks of dietary patterns among populations.45,46,49,52,55,59
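To give a sense of how a network approach based on mutual information can work in principle, the following Python sketch computes pairwise mutual information between binary consumption indicators and keeps an edge when the value exceeds a threshold. This is a simplified illustration and not the procedure used in the included studies; the food names, the simulated data, and the threshold of 0.05 are all assumptions.

```python
import itertools
import numpy as np

def mutual_information(x, y):
    """Mutual information (in nats) between two binary 0/1 arrays."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))  # joint proportion
            p_a = np.mean(x == a)                # marginal proportions
            p_b = np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

# Synthetic indicators (1 = consumed on the reporting day); all names are hypothetical.
rng = np.random.default_rng(1)
n = 500
coffee = rng.integers(0, 2, n)
sugar = np.where(rng.random(n) < 0.9, coffee, 1 - coffee)  # usually reported together with coffee
rice = rng.integers(0, 2, n)                                # simulated independently of the others
foods = {"coffee": coffee, "sugar": sugar, "rice": rice}

# Keep an edge when mutual information exceeds an arbitrary illustrative threshold.
threshold = 0.05
edges = {}
for (name_a, a), (name_b, b) in itertools.combinations(foods.items(), 2):
    mi = mutual_information(a, b)
    if mi > threshold:
        edges[(name_a, name_b)] = mi

print(edges)
```

In this simulated example only the coffee and sugar pair is retained as an edge, reflecting the co-consumption built into the synthetic data. In practice, the choice of threshold and the handling of continuous intakes would need to follow the methods reported by each study.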
Dynamism, or how dietary patterns vary across time, was incorporated into four studies’ characterization or analysis of dietary patterns. Three studies incorporated stratification by meals to consider dynamism.44,46,59 In two studies using graphical models, separate networks were created for each meal to provide insights into how patterns of intake vary throughout the day.46,59 Hearty and Gibney used decision trees and neural networks and ran models by meals based on 62 food groups to predict diet quality.44 Additionally, one study considered dynamism by using ANOVA and chi-square tests to descriptively show how a variety of characteristics were associated with stable or changing dietary patterns characterized using latent class analysis.43
Fourteen studies examined relationships between dietary patterns characterized using novel methods and variables indicative of health risk or outcomes, such as periodontitis, cardiovascular disease, and metabolic syndrome (Table 3).36–40,48–56 Six studies included longitudinal analysis of the relationship between dietary patterns and health outcomes.36,38,40,50,52,53 Most studies that examined health risk or outcomes first identified dietary patterns using a novel method and then investigated relationships with health outcomes using regression models.36–40,48,50,51,54 In contrast, some studies incorporated variables indicative of health outcomes or risk directly into the machine learning models.49,56 For example, Zhao et al.56 applied Bayesian kernel machine regression, a machine learning model designed to incorporate high dimensional data, to jointly model the relationship between several dietary components and cardiovascular disease risk. Similarly, a paper by Hoang et al.49 included health variables within mixed graphical models, though directionality of diet-health relationships could not be ascertained given the cross-sectional nature of the data. In two case-control studies, dietary patterns were identified using mutual information to estimate dietary pattern networks, with stratification by health outcomes.52,55
Nineteen studies considered sociodemographic characteristics, such as sex, age, race/ethnicity, education, and income.36,37,39–43,45,48–58 In one case, sociodemographic characteristics were included in models used to characterize dietary patterns.49 Two studies stratified by sociodemographic characteristics, examining dietary patterns by sex45 or age groups.42 Studies that used case-control designs typically considered sociodemographic characteristics through matching.52,55 In the remaining studies that considered sociodemographic characteristics, these were incorporated in regression models to explore how dietary patterns characterized using novel methods were associated with health and other characteristics.
Two studies included comparisons of novel methods and traditional statistical approaches.36,57 While there was overlap between the patterns identified through novel and traditional approaches in these studies, the patterns identified differed across approaches. For instance, Biesbroek et al.36 found that dietary patterns identified through reduced rank regression were more strongly associated with coronary artery disease compared to those identified through random forest with classification tree analysis.
Discussion
The application of novel methods to dietary pattern research is rapidly expanding, with the aim of better understanding the complexity of dietary patterns and how they are related to health and other factors. Many studies used methods that characterize distinct dietary patterns based on the population being studied, such as the ‘prudent’ pattern or the ‘Western’ pattern. Most studies used cross-sectional data, limiting opportunities to examine the effect of dietary patterns on health.
Methods newly being applied in this field offer promising capacity to better understand the totality of dietary patterns and synergistic relationships among dietary components when compared with traditional approaches that do not assume synergy.4,14 Given the large variation in how dietary patterns were characterized using novel methods, multidimensionality and potential synergistic relationships between dietary components were considered and presented in a range of ways, from latent classes to networks. Several studies incorporated dynamism into their consideration of dietary patterns, though in most cases this was through stratification, for example, by meal, rather than through direct use of novel methods.43,44,46,59 In these cases, it was a combination of input variables, stratification by time, and the novel method that enabled explorations of dynamism.
The methods highlighted have a range of strengths and limitations for the characterization of dietary patterns. Methods that focused on the classification of distinct patterns allowed for the assessment of relationships between these patterns and health outcomes or other indicators of interest but explored the interrelationships between dietary components to a lesser degree.36–38,41–43,48,50,51,54,57,58 Other methods were designed to better consider synergistic relationships among dietary components, such as compositional data analysis, mutual information, or probabilistic graphical models, but required further analyses, such as the development of a score, to assess relationships with health outcomes.45–47,49,52,55,59
There are trade-offs between novel and traditional methods that should be considered when contemplating the most appropriate methods for a given study. Though potential benefits such as a greater ability to discern multidimensionality may be desirable, these must be weighed against the implications for interpretability and computational costs. The application of novel methods may not always yield insights beyond those gained from traditional approaches. For example, Biesbroek et al.36 found that random forest models did not outperform reduced rank regression when examining associations of dietary patterns with coronary artery disease. Conversely, a study that was not included in this review because it first identified dietary patterns using a traditional method (principal component analysis) found that machine learning algorithms were better able to classify the identified dietary patterns according to cardiometabolic risk compared to traditional approaches.60
Several sociodemographic characteristics are indicators of systemic health inequity and have been shown to be associated with dietary patterns among populations.61–63 The degree to which studies incorporated sociodemographic characteristics into their consideration of dietary patterns or relationships between dietary patterns and health varied, with adjusted regression models applied after dietary patterns were characterized as the most common approach. Consistent with nutrition research more broadly,63–65 there was little consideration of possible interactions among sociodemographic characteristics in relation to dietary patterns. Methods particularly suited to pattern recognition and complexity could be leveraged to simultaneously explore potential joint relationships among facets of social identity and dietary patterns66 and advance our understanding of how broader systems of oppression and intersecting characteristics contribute to dietary patterns.62
Beyond the inclusion of sociodemographic characteristics in models, considering equity from the beginning of study design is critical given potential bias in data and algorithms that can have immense implications for those who already experience inequities, such as structural racism.67–69 The included studies did not explicitly discuss the incorporation of equity into study design, and many conducted secondary analyses of existing datasets. The use of directed acyclic graphs has been identified as a potential solution to mitigate some possible issues with bias through careful model design67 and has been applied in other domains of nutrition research using novel methods.14 Engaging individuals with lived experience and integrating interdisciplinary teams with broad expertise that can combine content knowledge with data-driven approaches can help to mitigate potential bias in algorithms.66
The level of description of methods varied and it was sometimes challenging to decipher the specifics of how novel methods were applied. Although the Strengthening the Reporting of Observational Studies in Epidemiology—Nutritional Epidemiology (STROBE-nut) reporting guidelines provide guidance for transparently reporting nutritional epidemiology and dietary assessment research,70 they were not designed specifically for the methods used in the studies considered in this review and the ways in which these methods are being applied in dietary patterns research. Other reporting guidelines, such as the Consolidated Standards of Reporting Trials (CONSORT), have been extended to consider the application of artificial intelligence (AI).71 Motivations related to the extension of CONSORT included inadequate reporting of studies using AI and the lack of full consideration of potential sources of bias specific to AI within existing reporting guidelines.71 Relevant items added to CONSORT-AI pertain, for example, to the role of AI in the study, the nature of the data used in AI systems, and how humans interacted with AI systems.71 The extension of reporting guidelines such as STROBE-nut to consider applications of AI, including machine learning, and other methods that are becoming more commonly used, may facilitate consistent and complete reporting and improved comparability of studies. Reporting guidelines should continue to emphasize strategies applied to mitigate measurement error in dietary intake data,70 as studies using novel methods are not immune to the effects of error on findings.72 Along with reporting guidelines, the development of tailored quality appraisal tools may facilitate synthesis of high-quality evidence to inform recommendations about dietary patterns and health.
This review provides a snapshot of a rapidly evolving field,73,74 with the involvement of an interdisciplinary team of researchers contributing to a robust consideration of emerging methods in dietary patterns research. While prior reviews have provided perspectives on the potential applications of machine learning within the field of nutrition,19–21 this review focused on dietary patterns in particular and considered approaches beyond machine learning that have not been traditionally used in this area, broadening the scope compared to prior reviews.22,75 The search terms were informed by preliminary searching, though it is unlikely that all relevant articles applying novel methods to characterize dietary patterns were captured. This is partially driven by the wide range of descriptors used for these methods and the lack of reporting standards. As well, determining whether a method is novel is somewhat subjective. Methods such as factor analysis and principal component analysis once revolutionized dietary pattern analysis, providing data-driven approaches to identify patterns.15 Now, they are widely applied and recognized as limited in their capabilities to capture complexity compared to some newer approaches. Further, the search terms skewed toward multidimensionality versus dynamism, potentially overlooking some studies focusing on variation of dietary patterns over time or across eating occasions. Nonetheless, this review documents an acceleration of the application of a range of novel methods to dietary patterns research and captures a broad scope of methods being used to characterize these patterns, highlighting the need for researchers to develop the lexicon and knowledge needed to interpret the emerging literature.
Conclusion
The findings of this review indicate a strong motivation to apply novel methods, including but not limited to machine learning, to improve understanding of dietary patterns and how they relate to health and other factors. The application of these methods may help us to learn about complex relationships that may not be possible to discern through traditional approaches. However, these methods may not be suitable for every question and do not necessarily overcome the limitations of more traditional approaches.
Given the proliferation of these methods, it is becoming increasingly worthwhile for nutrition researchers to have at least a basic understanding of novel methods such as machine learning and latent class analysis, so they can interpret the results of emerging studies. The development and implementation of reporting guidelines and quality appraisal mechanisms for studies that apply novel methods may improve the capacity for synthesis of evidence generated to inform strategies that promote improved population health and well-being.
Data Availability
Extracted metadata for all articles is available upon reasonable request to the authors.
Conflict of Interest
RML is a statistical editor for the British Journal of Nutrition. Other authors have none to declare.
Funding
This review was funded by the Canadian Institutes of Health Research, a University of Waterloo Research Incentive Fund award, an Ontario Ministry of Research and Innovation Early Researcher Award held by SIK, and Microsoft AI for Good. RML is funded by a National Health and Medical Research Council Emerging Leadership Fellowship (APP1175250). LMB was funded by the National Institutes of Health (R01 HD102313, MPI Bodnar LM, Naimi AI).
Contributions
SIK conceived of the review and planned it with the co-authors; AR, AP, SH, SIK conducted the search and screening; JMH and TEW conducted extraction; LA conducted verification; JMH led the first draft of the manuscript with support from AP to write the methods; all co-authors provided critical input to the manuscript and all co-authors read and approved the final manuscript.
Acknowledgements
We thank research librarian Jackie Stapleton (JS) of the University of Waterloo for support with the search strategy.
Footnotes
Author affiliation updated for Dr. Jill Reedy.
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.