Abstract
Background In the era of evidence-based medicine, decision-making about the treatment of individual patients involves the conscious, specific, and reasonable use of the best current evidence. Diagnostic tests usually obey well-established quality standards of reproducibility and validity. In contrast, assessing the validation studies of tests used to diagnose mental and behavioral disorders can be tedious. This work aims to establish a methodological reference framework for the validation process of diagnostic tools for mental disorders. We implemented this framework as part of the protocol for a systematic review of self-reported burnout measures. The objectives of this systematic review are (a) to assess the validation processes used for each of the selected burnout measures, and (b) to grade the evidence of the validity and psychometric quality of each burnout measure. The ultimate goal is to select the most valid measure(s) for use in medical practice and epidemiological research.
Methods The review will consist of systematic searches in the MEDLINE, PsycINFO, and EMBASE databases. Two independent authors will screen the references in two phases. The first phase will be title and abstract screening, and the second phase full-text reading. There will be four inclusion criteria. Studies will have to (a) address the psychometric properties of at least one of the eight validated burnout measures (b) in its original language (c) with sample(s) of working adults (18 to 65 years old) (d) of more than 100 participants. We will assess the risk of bias of each study using the Consensus-based Standards for the selection of health Measurement Instruments checklist. The outcomes of interest will be face validity, response validity, internal structure validity, convergent validity, discriminant validity, predictive validity, internal consistency, test-retest reliability, and alternate form reliability. These outcomes will make it possible to assess the psychometric properties used to validate the eight burnout measures concerned. We will examine the outcomes using the reference framework for validating measures of mental disorders. Results will be synthesized descriptively and, if there are enough homogeneous data, using a meta-analysis.
Ethics and dissemination We will publish this review in a peer-reviewed journal. A report will be prepared for health practitioners and scientists and disseminated through the Network on the Coordination and Harmonization of European Occupational Cohorts (https://www.cost.eu/actions/CA16216, http://omeganetcohorts.eu/) and the Network of scientists from Swiss universities working in different areas of stress (https://www.stressnetwork.ch/).
PROSPERO registration number CRD42019124621
BACKGROUND
Rationale
In the era of evidence-based medicine (EBM), decision-making about the treatment of individual patients involves the conscious, specific, and reasonable use of the best current evidence (1). The purpose of EBM is ultimately to provide patients with the best treatment solutions. Thus, EBM helps avoid mistakes in the course of treatment and raises the quality and cost-effectiveness of health care. Diagnosis and prognosis, two basic aspects of medicine and paramedicine, provide valuable information that enables patients and professionals to make decisions. The results of diagnostic and prognostic processes must be as correct as possible, as they can have far-reaching consequences. The application of EBM methods to the diagnostic and prognostic processes used in healthcare is thus essential (2).
EBM requires the physician to be able to search the medical literature and to interpret epidemiological and statistical results. However, evaluating the quality of a given study can be challenging in some cases, depending on the nature of the diagnostic test, the study design, and the statistics used. For instance, diagnostic tests involving measurable functional, biological, or morphological changes of clinical significance usually obey well-established quality standards of reproducibility and validity and are relatively easy to compare based on their predictive values, sensitivity, and specificity (3). In contrast, validity studies of tests in questionnaire format, commonly used for the diagnosis of mental and behavioral disorders, are more challenging to assess. Diagnostic questionnaires assessing mental disorders should obey a number of methodological standards, such as psychometric properties, as part of their validation process (4). However, the terms that denote psychometric properties have rather broad, sometimes vague definitions, while the statistical methods for their assessment vary widely across publications (4-11). Moreover, the available methodological guidelines are heterogeneous and generally incomplete. Some of them are even contradictory (4, 6, 7). To date, no consensual methodological guideline exists for the whole validation process of mental health questionnaires and rating scales used for screening and diagnosis of mental disorders. The currently available standards focus on the methodological quality of single studies reporting diagnostic accuracy and psychometric properties. Examples of those standards are the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) (12) and the Standards for Reporting of Diagnostic Accuracy Studies (STARD) (13). The Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) (14) is often used for qualitative evidence appraisal in systematic reviews. However, the latter is of limited help from a statistical point of view.
This lack of harmonization regarding acceptable validity standards or criteria for the various mental health questionnaires directly challenges the application of EBM in the diagnosis and subsequent treatment of mental disorders, in particular among non-specialized health professionals. To remedy this situation, we have established a general reference framework for the validation process of diagnostic tools for mental disorders, including self-reported measures of burnout. The burnout syndrome remains ill-defined and nosologically uncharacterized (15). Despite its increasing importance (16), burnout syndrome still has no consensual definition, which makes it difficult to manage. Maslach and Jackson (17) proposed the most prominent definition of burnout: a psychological syndrome that occurs in professionals who work with other people in challenging situations and that is measured through three domains: 1-emotional exhaustion, 2-depersonalization, and 3-personal accomplishment. From this definition, Maslach developed the first burnout measure: the Maslach Burnout Inventory (MBI). Apart from the MBI, a meta-analysis by O’Connor et al. (18) cited seven other validated burnout measures: the Pines Burnout Measure (BM), the Oldenburg Burnout Inventory (OLBI), the Copenhagen Burnout Inventory (CBI), the Professional Quality of Life Scale (ProQOL III), the Psychologists Burnout Inventory (PBI), the Children’s Services Survey (CSS), and the Organizational Social Context Scale (OCS). Considering that measures of psychological syndromes are heterogeneous, a closer look at the validation process of the currently used burnout measures should give insight into their legitimacy in medical practice and research.
Objectives
This article presents our methodological reference framework for the validation process of diagnostic tools for mental disorders as part of the protocol for our systematic review of self-reported burnout measures. The objectives of this systematic review are to assess the validation processes used for each of the selected burnout measures and to grade the evidence of the validity and psychometric quality of each burnout measure, in order to select the most valid one(s) for use in medical practice and epidemiological research.
METHODS AND ANALYSIS
We developed the protocol according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) recommendations. We registered the protocol with the International Prospective Register of Systematic Reviews (PROSPERO; registration number CRD42019124621).
Reference framework for the validation process of diagnostic tools for mental disorders
This framework is provided in Supplementary material Table 1 and is organized in four columns, as follows: 1-psychometric validity criteria, 2-their definitions, 3-the methods commonly used to analyze them, and 4-the resulting statistical estimates and indices as well as the objective criteria for their respective interpretation. To construct this framework, we completed the approach initiated by the French National Research and Safety Institute (INRS) for a comparative analysis of the different scales and tools available in French for assessing psychosocial risks (4). First, we listed as exhaustively as possible the psychometric validity criteria and their definitions, using handbooks and published guidelines (4-11, 14, 15, 17-44). Second, we sorted the validity criteria according to their most consensual denomination and definition and grouped them by sub-types according to Bolarinwa (6). Third, we filled the third and fourth columns of the table with the appropriate analyses and the interpretation of indices for each validity criterion, using handbooks and published methodological guidelines (4-6, 8-11, 19-33, 35-44). Fourth, we submitted the completed table of our framework to two independent experts with strong psychometric skills for critical review of the retained definitions, the completeness of the methods, and the appropriate choice of interpretation criteria. Finally, after discussing the reviewers’ comments and reaching consensus, we produced the current version of the framework. We consider it a methodological reference because it allows non-specialized health professionals and researchers to understand and correctly interpret the overall and specific validity criteria of a diagnostic tool for mental disorders, whatever the study design and statistical method used for its validation. Thanks to its multiple entries, it is possible to sift through validation studies by picking up terms about either validity criteria (20 criteria), analytical methods (21 methods), or the resulting indices and statistics, grouped into 19 categories. Because of its analytical exhaustiveness and completeness for the three elements of the validation of diagnostic tests (i.e., validity, reproducibility, and sensitivity), it constitutes a useful framework for quality appraisal of diagnostic tests for mental disorders.
Eligibility criteria
We will include 1-studies with quantitative methodology; 2-published in the original scientific article format; 3-addressing the psychometric properties of at least one of the above-mentioned burnout measures in its original (not translated) version; 4-with a sample size of at least 100 participants. We will exclude 1-studies that do not meet the inclusion criteria; 2-studies for which no abstract and full text could be found; 3-studies where one of the eight burnout measures was used as a reference against another one, not included in this review; 4-studies where a translated version of a burnout measure was used (e.g., translational validity and cross-cultural studies); 5-studies in which quantitative data on reliability or validity were missing; 6-studies where participants were not professionally employed (e.g., students, medical residents).
Participants
We will include studies with working adult participants aged between 18 and 65 years old. We will exclude studies where participants had no professional occupation (e.g., students, medical residents).
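The eligibility and participant criteria above can be pictured as a simple screening check. The following is a minimal sketch only: the `StudyRecord` fields and the `is_eligible` function are hypothetical names introduced for illustration, the threshold follows the "at least 100 participants" wording, and the exclusion of studies that use a burnout measure only as a reference for another instrument is not modelled.

```python
from dataclasses import dataclass

# The eight burnout measures named in this protocol.
BURNOUT_MEASURES = {"MBI", "BM", "OLBI", "CBI", "ProQOL III", "PBI", "CSS", "OCS"}


@dataclass
class StudyRecord:
    # Hypothetical fields; a real screening form would hold more detail.
    measure: str
    quantitative: bool
    original_article: bool
    original_language_version: bool
    sample_size: int
    min_age: int
    max_age: int
    participants_employed: bool
    reports_psychometric_data: bool


def is_eligible(study: StudyRecord) -> bool:
    """Return True only if the study passes every encoded criterion."""
    return (
        study.measure in BURNOUT_MEASURES
        and study.quantitative
        and study.original_article
        and study.original_language_version
        and study.sample_size >= 100          # assumed reading: at least 100
        and study.min_age >= 18
        and study.max_age <= 65
        and study.participants_employed       # excludes e.g. student samples
        and study.reports_psychometric_data
    )
```

In practice, two reviewers would apply such criteria by hand during screening; the sketch only makes the logical conjunction of the criteria explicit.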
Exposures/Interventions
This review focuses on the psychometric properties and validity of the selected self-reported burnout measures. It will not consider the exposures or predictors of burnout in workers.
Comparators
We will consider measures of depression, anxiety, and somatic disorders as comparators to assess the discriminant validity of burnout measures.
Outcome measures
The outcomes are the psychometric properties used to validate the eight aforementioned burnout measures: Face validity; Response validity; Internal structure validity; Convergent validity; Discriminant validity; Predictive validity; Internal consistency reliability; Test-retest reliability; Alternate form reliability.
Time frame
As we include quantitative studies reporting one of the above-mentioned outcomes, we expect different time frames to be used in the selected studies. Thus, no restriction to any particular time frame will be applied.
Setting
Given that the study population consists of working adults, all occupational settings will be considered. If enough homogeneous data are available per type of occupation, we will perform additional analyses for specific occupational settings (e.g., health care, education).
Language
There will be no language restriction.
Information sources
A systematic literature search will be performed for the period from 1980 to September 2018. This period was chosen because the first validated measure of burnout, the MBI, was published in 1981 (17). We will search three databases through the Ovid interface: the Medical Literature Analysis and Retrieval System Online (MEDLINE); PsycINFO, a database of abstracts and citations of behavioral and social science research; and the Excerpta Medica database (EMBASE). In addition, we will check the reference lists of the articles and reviews retrieved in our electronic search for any additional studies to include.
Search strategy
An experienced librarian will review the search strategy. It will consist of free-text words specifying three search strings: terms focusing on the burnout measure of interest (e.g., MBI), terms related to the validation of the measure, and a combination of the results of the first two search strings. A final step will remove duplicates.
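The structure of this search strategy can be sketched as follows. This is an illustration under stated assumptions, not the actual search: the helper names `or_block` and `build_query` are hypothetical, the output uses generic Boolean notation rather than the syntax of any particular database interface, and the validation terms in the usage note are placeholders rather than the terms of the final strategy.

```python
def or_block(terms):
    """Join free-text terms into one parenthesised OR group, quoting phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(measure_terms, validation_terms):
    """Combine the measure string and the validation string with AND."""
    return or_block(measure_terms) + " AND " + or_block(validation_terms)


def remove_duplicates(records):
    """Drop repeated records while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique
```

For example, `build_query(["MBI", "Maslach Burnout Inventory"], ["validity", "reliability"])` yields one combined string in which the measure terms and the illustrative validation terms are each grouped with OR and the two groups are joined with AND.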
Study records
Data management
We will import the collected studies into the reference management software EndNote X8.
Selection process
Two independent reviewers will screen the references to eliminate any remaining duplicates within each database. They will also eliminate duplicates between databases. They will screen the remaining articles based on their title and abstract and will retain or reject them based on the above-mentioned inclusion and exclusion criteria. The two reviewers will then screen the remaining articles based on full-text reading. They will discuss any discrepancies and, if needed, ask a third reviewer to arbitrate the decision. A reviewer will illustrate the selection process with a flowchart following the PRISMA guidelines.
Data collection process
To develop a standardized data extraction form suitable for all kinds of study designs and methods, we will use our reference framework for the validation process of diagnostic tools for mental disorders (Table 1). Each burnout measure will have its own copy of the data extraction form (MS Excel file), which will be filled with the data of the studies concerning that burnout measure. Two independent reviewers will test the form using articles on different burnout measures. They will discuss any discrepancies and, if needed, ask a third reviewer to arbitrate the decision and add clarification. This process will continue until both reviewers reach complete agreement on the finalized data extraction form. The data of the included studies will be extracted by one of the two reviewers. A second reviewer will crosscheck a random 20% sample of the extracted data. Missing data will be identified by a code depending on the reason why they are missing (e.g., not assessed, not reported). The data extraction process will provide additional validation of the completeness of the reference framework.
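Drawing the random 20% crosscheck sample could be done reproducibly along the following lines. This is a minimal sketch: the function name `crosscheck_sample`, the fixed seed, and the choice to round the sample size up are assumptions made for illustration and are not specified in the protocol.

```python
import math
import random


def crosscheck_sample(record_ids, fraction=0.20, seed=2019):
    """Draw a reproducible random subset of extracted records for a second reviewer."""
    ids = list(record_ids)
    if not ids:
        return []
    # Round up so that small batches still yield at least one record to check.
    n = max(1, math.ceil(fraction * len(ids)))
    rng = random.Random(seed)  # fixed seed makes the draw repeatable
    return sorted(rng.sample(ids, n))
```

Fixing the seed lets both reviewers regenerate the same subset later, which may help when documenting which records were crosschecked.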
Data items
The extracted data will concern the studies’ identification (i.e., authors, year of publication, journal, and title); the samples’ characteristics (i.e., size, gender ratio, age, occupational activity, participation rate, representativeness, distribution of burnout scores); the burnout measures’ characteristics (i.e., name, version, number of items, number of domains, names of domains); and the statistical methods used for assessing the psychometric property outcomes.
Outcomes and prioritization
The outcomes of interest will be face validity, response validity, internal structure validity, convergent validity, discriminant validity, predictive validity, internal consistency, test-retest reliability, and alternate form reliability. These criteria will make it possible to assess the psychometric properties used to validate the eight burnout measures concerned.
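As an illustration of how one of these outcomes is typically quantified, internal consistency is often summarized with Cronbach's alpha. The protocol text does not name a specific statistic at this point, so the choice of alpha here is an assumption, and the function name `cronbach_alpha` and the data layout are hypothetical. With k items, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score).

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("At least two items are needed.")
    items = list(zip(*item_scores))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(respondent) for respondent in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)
```

The sketch uses sample variances and expects complete responses with at least two respondents whose total scores differ; how missing item responses are handled would need to be decided separately.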
Risk of bias in individual studies
Two reviewers will independently assess the quality of each study using the COSMIN checklist (14). They will discuss any discrepancies, and they will resort to the arbitration of a third reviewer if needed.
Data synthesis
Descriptive analyses
We will interpret the quantitative data based on our methodological reference framework. We will create a narrative synthesis of the findings from the included studies. We will structure this synthesis around the burnout measure, the characteristics of the target population, and the type of outcome.
We plan to carry out subgroup analyses on the primary outcomes by grouping studies based on the following:
1-Burnout self-reporting measure: MBI, BM, OLBI, CBI, ProQOL III, PBI, CSS, and OCS.
2-Burnout domain: Emotional exhaustion, Depersonalization, Personal accomplishment (MBI); Physical exhaustion, Mental exhaustion (BM); Disengagement, Exhaustion (cognitive and physical) (OLBI); Professional exhaustion, Personal Exhaustion, Relational Exhaustion (CBI); Compassion fatigue burnout (ProQOL III); Aspects of control, Support in the work setting, Type of negative clientele, Overinvolvement with the client (PBI); Emotional exhaustion (CSS); Culture, Climate, Work attitudes (OCS).
3-Participants’ characteristics: gender, age, and burnout score.
Meta-analyses
The scope for meta-analysis might be limited, because existing studies measure and report a range of different factors and outcomes. However, whenever possible, we will pool summary estimates in the form of multiple logistic regression coefficients. We will do so for studies that overlap in terms of outcome measures for at least one of the burnout domains. Since the participants in the various studies might be construed as coming either from the same population (workers) or from different populations (i.e., according to each study’s inclusion criteria), we will use a fixed-effects model.
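To make the pooling step concrete, a fixed-effect summary is commonly obtained by inverse-variance weighting, where each study estimate is weighted by 1/SE². The sketch below assumes that estimator, which the protocol does not specify; the function name `fixed_effect_pool` is hypothetical, and the planned analyses would be run in dedicated statistical software rather than with this code.

```python
import math


def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    if len(estimates) != len(standard_errors) or not estimates:
        raise ValueError("Provide one standard error per estimate.")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


def confidence_interval(pooled, pooled_se, z=1.96):
    """Approximate 95% confidence interval under a normal approximation."""
    return pooled - z * pooled_se, pooled + z * pooled_se
```

More precise studies (smaller standard errors) receive larger weights, so the pooled estimate lies closer to their results.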
Meta-biases
According to standard practice in meta-analysis, the first step will be to represent the data as forest plots, including the I² statistic, which estimates the percentage of between-study heterogeneity. If the latter is very large, the between-study heterogeneity is much larger than the between-subject heterogeneity, and any attempt to obtain a reference value for individual subjects will not be valid (45).
Assessment of publication bias
We will produce funnel plots to investigate possible publication bias, as recommended in the epidemiological literature.
Assessment of heterogeneity
For each model, heterogeneity will be assessed by quantifying the inconsistency across studies using the I² statistic, with a value greater than 50% as the criterion. If heterogeneity is identified, potential causes will be explored (e.g., clinical and/or methodological diversity). We will try to clarify heterogeneity via subgroup analysis, but if it cannot be explained (i.e., there is considerable variation in the results), a meta-analysis using a random-effects model will be conducted. We will exclude studies with a high risk of bias to determine the extent to which the synthesized results are sensitive to risk of bias. Statistical analysis will be performed using Stata software, version 16.
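The I² statistic is conventionally derived from Cochran's Q as I² = max(0, (Q − df) / Q) × 100%, where df is the number of studies minus one. The sketch below illustrates that calculation and the 50% criterion stated above; the function names are hypothetical, inverse-variance weights are an assumption, and the planned analyses would rely on statistical software rather than on this code.

```python
def cochran_q(estimates, standard_errors):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))


def i_squared(estimates, standard_errors):
    """I-squared as a percentage, truncated at zero."""
    q = cochran_q(estimates, standard_errors)
    df = len(estimates) - 1
    if q <= 0:
        return 0.0  # identical estimates: no observed inconsistency
    return max(0.0, (q - df) / q) * 100.0


def is_heterogeneous(estimates, standard_errors, threshold=50.0):
    """Apply the criterion of an I-squared value greater than 50%."""
    return i_squared(estimates, standard_errors) > threshold
```

With only a few studies, I² is estimated imprecisely, so a value near the threshold would usually be interpreted with caution.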
Confidence in cumulative evidence
The strength of the evidence for the relationship between different risk factors and burnout onset will be assessed using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach. This approach will allow us to rate the certainty of a body of evidence, as suggested by GRADE guidelines 18 (46). We will use a checklist designed by Meader et al. (2014) (47) to improve the consistency and reproducibility of our GRADE assessment. The results will be presented using the GRADE Summary of Findings Tables and Evidence Profiles (48).
Data Availability
All data are provided in Table 1.
Acknowledgements
The authors thank Aline Sager, the Unisanté/DSTE librarian.
Footnotes
sandy.marca{at}unisante.ch
paola.paatz{at}bluewin.ch
gyorkosc{at}gmail.com
felix.cuneo{at}unil.ch
Merete.Bugge{at}stami.no
lode.godderis{at}kuleuven.be
Renzo.bianchi{at}unine.ch
irina.guseva-canu{at}unisante.ch
Declarations Extracted data will be available as supplementary material of the systematic review article.
Funding University of Lausanne and University of Bern BNF – National Qualification Program funded the salary of young researchers (PP and SCM); European Cooperation in Science & Technology (COST Action CA16216), OMEGA-NET: Network on the Coordination and Harmonization of European Occupational Cohorts covered the meetings and travel expenses as well as the open access publication costs.
REFERENCES
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.