Abstract
Accurate assessments of symptoms and diagnoses are essential for health research and clinical practice but face many challenges. The absence of a single error-free measure is currently addressed by assessment methods involving experts reviewing several sources of information to achieve a more accurate or best-estimate assessment. Three bodies of work spanning medicine, psychiatry, and psychology propose similar assessment methods: The Expert Panel, the Best-Estimate Diagnosis, and the Longitudinal Expert All Data (LEAD). However, the quality of such best-estimate assessments is typically very difficult to evaluate due to poor reporting of the assessment methods and when it is reported, the reporting quality varies substantially. Here we tackle this gap by developing reporting guidelines for such studies, using a four-stage approach: 1) drafting reporting standards accompanied by rationales and empirical evidence, which were further developed with a patient organization for depression, 2) incorporating expert feedback through a two-round Delphi procedure, 3) refining the guideline based on an expert consensus meeting, and 4) testing the guideline by i) having two researchers test it and ii) using it to examine the extent previously published articles report the standards. The last step also demonstrates the need for the guideline: 18 to 58% (Mean = 33%) of the standards were not reported across fifteen randomly selected studies. The LEADING guideline comprises 20 reporting standards related to four groups: The Longitudinal design; the Appropriate data; the Evaluation – experts, materials, and procedures; and the Validity group. We hope that the LEADING guideline will be useful in assisting researchers in planning, reporting, and evaluating research aiming to achieve best-estimate assessments.
Open data (Delphi surveys 1 and 2), code (analyses), and material (surveys): https://osf.io/fkv4b/
Introduction
Establishing valid and reliable assessments of symptoms and diagnoses is the foundation of health and clinical sciences. Given that reliable biological markers or specific objective signs for most mental health problems are lacking and many medical conditions only show objective markers in late stages, accurate diagnostic assessments are difficult1,2. Essentially every single measure of a psychological construct has some potential source of bias (e.g., self-report and recall bias) or can be seen as fallible in some respect3,4 – which can result in inaccurate assessments and delayed treatments.
The absence of a single error-free measure can be addressed by involving multiple experts reviewing several sources of information to form a best-estimate assessment or a reference standard5–7. To understand the quality of such an assessment, it is crucial to understand how it was reached (i.e., the quality of the specific assessment method used). However, the quality of best-estimate assessments is typically very difficult to evaluate due to poor reporting of the assessment method, and when the method is reported, the reporting quality varies substantially7. Here we tackle this problem by developing a guideline for how to report assessment methods that aim to achieve such a best-estimate assessment standard, i.e., where experts review several sources of (longitudinal) information to achieve a more accurate assessment than a single, error-prone measure.
Assessment
Assessment includes the evaluation, integration, and interpretation of several sources of information (e.g., outcomes of different measures, tests, or scans) to derive a valid and reliable decision (e.g., a best-estimate diagnosis)8. Accurate assessments are crucial for understanding the prevalence of clinical problems9,10, detecting and starting early treatment11,12, validating measurement tools13,14, and evaluating interventions/therapies15,16. In clinical practice, under- or over-estimation of disorders can have severe negative impacts on people’s lives. In research, they threaten the validity of scientific results. For policy and implementation development, assessments are the basis for guideline development and methods for economic and societal evaluations of interventions. Furthermore, obtaining more accurate assessments has become increasingly important considering that high-accuracy assessments are needed in diverse fields such as Biological Psychiatry (e.g., to find reliable biomarkers linked to reference standard assessments17–19) and Artificial Intelligence (e.g., to train models on reference standard assessments20–22). In addition, technologies such as smartphone sensor data and video calls have made it easier to collect highly relevant, rich, and longitudinal data.
A methodological solution
Through our literature search and based on the expertise of the author team, we identified three bodies of literature that have proposed similar assessment methods: The Expert Panel method in medicine7,23,24 – as well as the Best-Estimate Diagnosis6,25 and the Longitudinal Expert All Data (LEAD)5 methods in psychiatry and clinical psychology. All three use expert panels or consensus teams to establish a more accurate assessment. They share the same goal of attaining best-estimate assessments through similar methodological approaches while accentuating different parts of it. The Best-Estimate Diagnosis method accentuates the use of informants and objective tests next to self-reported data6,25; and the Expert Panel method focuses on the characteristics, constitution, and procedure of the panel7,23. Only the LEAD method requires a longitudinal design5, although longitudinal data are also used in some Expert Panel designs (≈27% of studies7). Herein we collectively refer to these three approaches as the assessment methods.
The result of the assessment methods is a consensually derived criterion (e.g., a best-estimate assessment) that has been used for many different applications where there is no single error-free measure. It has, for example, been used to i) understand the accuracy of a measurement tool or marker through comparison to a best-estimate assessment26–31; ii) establish the prevalence of symptoms and disorders9,10,32; iii) establish the temporal stability or development of symptoms and disorders 33–35; iv) improve (earlier) detection or screening of symptoms or disorders11,12,36; v) study genetics and family history37–39; and vi) examine classification systems or diagnostic criteria40–42. The applications span diverse fields, including medicine, psychiatry, clinical psychology, public health/epidemiology, and artificial intelligence. Box 1 provides more examples of how the assessment methods have been applied in different types of studies across fields.
Overview of applications of the assessment methods across different study designs and fields
The absence of a single error-free measure can be mitigated by involving multiple experts reviewing several sources of information to form a best-estimate assessment. The developed guideline aims to assist users in planning, evaluating, and reporting assessment method procedures to derive such best-estimate assessments. The guideline has been developed with stakeholders from a broad range of fields and backgrounds and is relevant to areas in which the assessment methods are used, such as medicine, psychiatry, clinical psychology, and epidemiology. Below we show use cases and areas where the assessment methods have been applied.
1. To evaluate a measure’s accuracy against a reference standard. To understand the accuracy of a measurement tool, there is a need to compare it to a more accurate or best-estimate assessment. For example, it has been used:
- in psychiatry, for evaluating MINI-KID diagnoses for children and adolescents26 and evaluating DSM diagnoses in patients with psychosis.27
- in clinical psychology, for evaluating MDI depression severity scores.28
- in medicine, for evaluating deep learning models assessing liver cancer29 and evaluating prediction rules for coronary artery disease.30
- in public health/epidemiology, for evaluating electronic health record algorithms for assessing asthma.31
2. To establish the prevalence of symptoms or disorders. For example, it has been used:
- in public health/epidemiology, for assessing the prevalence and familiality of pathological gambling.9
- in psychiatry, for assessing the prevalence of eating disorders in patients with personality disorders.10
- in medicine, for assessing the prevalence of clinically relevant incidental findings when diagnosing pulmonary embolism.32
3. To establish the temporal stability or development of symptoms or disorders. For exmaple, it has been used:
4. To improve (earlier) detection or screening of symptoms or disorders. For example, it has been used:
5. To study genetic history and family heritability. For example, it has been used:
6. To examine classification systems or diagnostic criteria. For example, it has been used:
Description of an example study including a best-estimate assessment
Best-estimate assessments have, for example, been established to evaluate the validity of the Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime version (K-SADS-PL; a semi-structured diagnostic interview used in child and adolescent psychiatry)43. The best-estimate assessments were diagnoses of neurodevelopmental and related disorders made by five experienced child psychiatrists based on the DSM-5 criteria. To achieve best-estimate assessments, patients were followed for at least three months, and the psychiatrists had access to all available data (except for the K-SADS-PL diagnoses), including information from medical records, interviews, questionnaires, laboratory tests, as well as information provided by clinical staff, caregivers and teachers. Criterion validity of the K-SADS-PL was established as the agreement of the diagnoses with the best-estimate assessments.
Reporting issues
The assessment methods possess high potential for achieving best-estimate reference standards in many situations. However, the quality of such proclaimed best-estimate assessments varies substantially and is typically very difficult to evaluate due to poor reporting of the method how they were achieved (e.g., see reviews of expert panels7,23). A systematic review of assessment methods and reporting of expert panels7 has demonstrated that the methods used for panel or consensus diagnoses vary substantially across studies and that many aspects of the procedure are often unclear or not reported at all. Many recent studies fail to report central aspects of the three assessment methods, including the quality, structure, or presentation of the data44, the training and qualifications of the experts45, the method for avoiding biases and achieving consensus13, and the time span of the longitudinal design-component46. The poor operationalization of the assessment methods jeopardizes the goal of achieving best-estimate assessments – where a vaguely described method makes it difficult to evaluate the research. Referring to an assessment as a best-estimate (and sometimes even as a gold standard) while vaguely describing or poorly operationalizing the method for achieving the assessment is alarming47,48.
The degree of validity
These assessment methods aim to achieve high validity (i.e., the degree to which the assessment captures what it aims to measure). Typically, the assessment methods aim to achieve as high validity as possible (i.e., a “leading” assessment), or, depending on resources, at least more accurate than a single error-prone measure. Despite this central aim, research often fails to clearly describe the degree of validity of the attained assessment. Using these assessment methods does not automatically guarantee high validity – it depends on how well the method is executed.
In addition, the assessments are often described with different terms: reference standard is often used in medicine, and criterion standard or best-estimate diagnosis is often used in psychology. We propose that the reporting of these assessment methods benefit from more explicitly describing what was measured and how well it measures up to different standards – whether and how they relate to a state-of-the-art assessment. Whereas reference and criterion standards fail to convey an intention of “nearing” a state-of-the-art assessment, the best-estimate diagnosis narrowly focuses on the classification of a diagnosis and not on symptom severity. Therefore, we here use the term best-estimate assessment in the context of describing a “leading”, state-of-the-art assessment.
Reporting standards
Previous well-established guidelines have focused on the complete reporting of specific study designs, such as the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE)49 for observational studies; the Statement for Reporting for Diagnostic Accuracy (STARD)50 for diagnostic accuracy studies; the Consolidated Standards of Reporting Trials (CONSORT)51 for randomised trials, and the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD)52 for prediction model studies (see the supplementary material [SM] for other relevant guidelines). The STARD guidance is most closely related to the reporting of the assessment methods since best-estimate assessments are often used to evaluate a measure’s (diagnostic) accuracy. However, none of the guidelines are sufficient for complete reporting of the assessment methods, where (multiple) experts review several sources of (longitudinal) information to form a best-estimate assessment. Although an earlier systematic review identified and structured the various choices involved in Expert Panel procedures7, no attempt was made to develop a formal guideline for the reporting of Expert Panel assessments.
Aim
Our aim is to develop reporting standards for comprehensive reporting of these assessment methods – which can help researchers plan, carry out, and report studies employing these assessment methods, as well as help readers evaluate them. We call the reporting guideline the LEADING guideline, emphasizing the methodological components and the importance of describing what was assessed and how well (i.e., how it relates to a “leading” assessment). We further revise the original meanings of LEAD (Longitudinal, Expert, All Data5) to Longitudinal, Evaluation – experts, materials and procedures, and Appropriate Data). In short, the LEADING guideline aims to guide the reporting of assessment methods to improve evaluations of the assessment standard.
Methods
Development stages
We developed the reporting guideline over four stages: 1) drafting reporting standards; 2) incorporating expert feedback; 3) refining the final guideline, and 4) testing the guideline. The development method largely followed Moher and colleagues’ guidance for developing reporting guidelines53 (See Table S1 for elaborations on each recommended step). For organizational purposes, a working group (V.E., K.K., & O.K.) was set up, and a steering group (H.A.S., J.B., E.F., D.K., P.G., I.A., & P.B.) was formed to provide a wide range of expertise. The steering group included seven experts and was selected to cover a diverse range of expertise and fields related to the assessment methods (e.g., psychiatry/clinical psychology, medicine, epidemiology/public health, and Artificial Intelligence). See the SM for information regarding ethics.
Drafting reporting standards
The working group drafted the original reporting standards. First, the working group, with the support of the steering group, identified relevant research using or describing the assessment methods, including the three bodies of literature: Expert Panel7, Best-Estimate Diagnosis6, and LEAD5. Second, relevant reporting guidelines and systematic reviews were identified, including a review of expert panels applications7; the STROBE statement49; and the STARD guidance50 (other complementary reporting guidelines and systematic reviews are presented in the SM). The aim was for the reporting standards in the LEADING guideline to complement rather than repeat these guidelines (i.e., new standards should extend or complement existing standards rather than repeat them53). The use of multiple reporting guidelines may often be appropriate; for example, when reporting a randomised trial that includes best-estimate assessments, one may use CONSORT51 to report the trial design and main results, and the LEADING guideline for describing the specifics for reaching the best-estimate assessment.
Potential standards were drafted by the working group with the objective of encompassing a comprehensive reporting of the assessment methods. The reporting standards were grouped into four groups: Longitudinal design, Appropriate data, Evaluation – experts, materials and procedures, and Validity. Empirical and theoretical inclusion rationales were stated for the groups and the individual standards. Lastly, the standards with inclusion rationales were further developed through a workshop with a patient organization for depression, followed by receiving feedback from the steering group members to receive a wide range of perspectives early in the process.
Incorporating expert feedback
To systematically collect expert feedback from different perspectives, we used a consensus-building procedure called the Delphi technique54. We used an iterative process based on two rounds of questionnaires (i.e., Delphi surveys), enabling feedback from round 1 to feed into round 2. Delphi participants received relevant background research, the reporting guideline aims, as well as the groups and the individual standards with their inclusion rationales. They provided feedback through open- and closed-ended response formats. Through open-ended responses, experts could propose new standards and provide feedback on clarifications and reformulations of existing standards, their inclusion rationales, and evidential support. In addition, two closed-ended questions55 about standard inclusion (This item should be included in the reporting checklist) and perception of study quality (Whether this information is present or not would influence my perceptions of the quality of a study) were answered with rating scales ranging from 1 = Strongly disagree to 7 = Strongly agree.
The first and/or last authors of articles since 2013 (n = 87 articles, n = 124 authors; the search strategy is detailed in the SM) using any of the three assessment methods as well as the seven steering group members, were invited via email to participate in the Delphi Round 1 (n = 131 participants emailed). In total, 27 participants completed the survey (response rate 21%). Only participants from Round 1 who provided their contact details were invited to Round 2 (n = 25). In total, 20 participants completed the survey (response rate 80%). All participants provided their informed consent. Figure 1 presents the research experiences and demographics of the Delphi participants. Participants reported a wide range of academic backgrounds (e.g., Medicine, Psychiatry or Clinical Psychology, Artificial Intelligence, Journal Editors), and an extensive variety of relevant methodological experiences (e.g., Biological Markers, Ecological Momentary Assessments, and Expert Panels; Figure 1), with an age range of 30 – 70 years (M = 51.54, SD = 12.40).
Delphi survey results. In Round 1, the mean ratings for the item inclusion scale ranged from 5.37 - 6.67 (M = 6.06; SD = 0.31; Table S2). The feedback resulted in the removal of one reporting standard and the clarification and reformulation of 20 standards. The standard on Transparency and replicability was rated as relevant but removed because it is achieved by reporting the other reporting standards. Standard 4.2 Validity and Standard needed a major clarification about the meaning of validity as well as standard. Minor clarifications and reformations, such as grammar, or word changes, were made for 19 standards. (see open material). The mean ratings in Round 2 ranged from 5.47 – 6.70 (M = 6.20; SD = 0.37; Table S3), with open feedback resulting in minor clarifications and reformulations of nine standards.
Refining the final guideline
The guideline was finalized by the authors in an expert consensus meeting. The meeting was held online with nine members of the working group and steering group. The content and structure of the consensus meeting were prepared by the working group, and the meeting was led by the last author (O.K.). Participants had access to the guidelines, elaboration and explanation (inclusion) rationales, and the drafted paper before the meeting, where they also had the option to provide comments and feedback in writing. The meeting included going through the findings of Delphi Rounds 1 and 2 and discussing the draft of the paper, including the individual reporting standards and groups. The criterium for including a reporting standard was that the median of Delphi expert responses was at least Agree on the question about its inclusion. We decided to not carry out another Delphi round since i) the median agreement for each reporting standard in both Delphi Rounds 1 and 2 ranged from Agree to Strongly Agree, ii) no new standards were suggested, and iii) only minor changes were needed after Round 2, which taken together suggest consensus.
Testing the guideline
After the Delphi rounds, the guideline was tested i) by researchers with experience of each method piloting the reporting of each standard and ii) by the authors (V.E., K.K.) using it to evaluate published articles. The two test procedures resulted in minor clarifications being added to three standards (2.4 The access to the index measure, 3.3 Blindness and conflict of interest, and 3.4 Instructions and training). Also, a concrete example of how to report the items was added to the general guideline instructions.
Incorporating test-user feedback. Two test users (Ph.D., with experience using the LEAD and Expert Panel method) who had not been involved in the development of the guideline (e.g., in the Delphi procedure) were recruited to test the guideline (see the SM for more details). They were asked to report each standard and/or provide feedback about the formulation of the standards.
Reports of the standards in 2022. Three separate targeted searches (LEAD, Expert-panel, Best-estimate) were conducted, and the first author examined which standards were reported in fifteen randomly selected articles applying the assessment methods in 2022. Out of the fifteen articles, three were randomly selected and examined by the second author. This procedure also provided information about the strengths and shortcomings of contemporary reporting of published articles using these methods (see Results section).
Results
The reporting guideline is presented in Table 1 (see Figure 2 for an overview). It comprises 20 standards for comprehensive reporting of the assessment methods divided into four groups: 1. The Longitudinal design group (4 standards), 2. The Appropriate data group (4 standards), 3. The Evaluation – experts, materials, and procedures group (10 standards), and 4. The Validity group (2 standards). The reporting standards encourage researchers to elaborate on what was done and why – whilst avoiding normative standards, such as a minimum number of experts. Each standard description in Table 1 is accompanied by an example. Further Explanations and Elaborations regarding the individual reporting standards and the four groups are presented in the SM including Table S4 and S5.
Using the LEADING guideline to evaluate published studies
Evaluating a random selection of 15 articles indicates severe heterogeneity in what of the methods is reported and how, suggesting the need for a guideline that enables comprehensive reporting of these assessment methods (Table 3; see the SM for the search strategy). Across the fifteen studies, 18 to 58% of the standards were not reported. Regarding the reporting standards, the access to the index measure (2.4), the expert and panel characteristics (3.1), the number of experts and panels (3.2), and the assessment description (4.1) were mostly reported (green in 73-100% of the studies). However, the data presentation (2.3), the instructions and training (3.4), the data combination method (3.7), the inter-rater and inter-panel reliability (3.9), and the validity and standard (4.2) were not reported at all in the majority of the studies (red in 60-87% of the studies).
Discussion
Our objective was to develop a guideline that supports comprehensive reporting of studies collecting longitudinal, appropriate data that experts evaluate to achieve an assessment that is more accurate than using a single error-prone measure. The aim is to help researchers plan, report, and evaluate the assessment method-related elements of their study design.
The LEADING reporting standards were established through an open process, incorporating relevant empirical evidence and methodological work, complementary reporting guidelines, and comprehensive iterations of expert feedback and patients’ perspectives. As this guideline focuses on the assessment methods, we recommend that researchers also rely on established guidelines for other parts of their research, such as sampling and other epidemiological aspects (e.g., STROBE49, CONSORT51, and STARD50). We encourage knowledge about and adherence to the LEADING guideline via scientific journals, editorials, and the EQUATOR network, as well as inclusion in research method courses in clinical studies.
Limitations
Based on the expertise of the author group and our literature search, we identified three assessment methods with similar approaches from related fields and drafted applicable reporting standards. We presented the rationale for selecting these three methods and each reporting standard with supporting evidence in the Delphi survey for review, which did not bring up additional methods or reporting standards. However, as we did not carry out a systematic literature review of the three identified literature bodies or for each of the reporting standards, we cannot exclude the existence of other assessment methods with similar approaches. We welcome any suggestions about similar methods to which the guideline is applicable.
Although the Delphi survey participants and the author group had a wide range of experiences and backgrounds, psychiatry and clinical psychology (n = 18) were overrepresented as compared to, for example, other areas of medicine (n = 8) in the Delphi and author group. Geographically, Europe and North America were the most common in the Delphi and author group, whereas several areas were not represented. The Delphi participants were the first or last authors of studies employing the assessment methods. However, the quality of the articles, and the education or experience of the authors, were not taken into account as selection criteria (although it was self-reported as presented in Figure 1). Finally, the number of Delphi participants (27 in Round 1, 20 in Round 2) is relatively small compared to some other standard developments (e.g., 73 in the development of STARD69), but it is comparable to others (e.g., 24 for development of the TRIPOD statement52). Even though the response rate in Round 1 (21%) can be considered low, the number of participants was sufficient to cover a broad range of academic backgrounds, methodological experiences, and demographics (Figure 1). The same limitation is applicable to the size of the steering group (n = 7) as well as the test-user group (n = 2). The LEADING guideline should be regarded as an evolving reporting guideline requiring ongoing evaluation, refinement, and revision. Suggestions and recommendations for improvements are welcomed by emailing the corresponding authors.
Conclusions
The LEADING guideline emphasizes the transparent reporting of the methodological components of the assessment method and the importance of reporting what was assessed and how well. Considering the increasing need for high-accuracy assessments in diverse fields, we hope that the LEADING guideline will be useful in assisting researchers in planning, carrying out, reporting, and evaluating research that aims to achieve accurate assessments.
Data Availability
Data sharing statement: Open data (Delphi surveys 1 and 2), code (analyses), and material (surveys) can be found on the Open Science Framework: https://osf.io/fkv4b/
Data sharing statement
Open data (Delphi surveys 1 and 2), code (analyses), and material (surveys) can be found on the Open Science Framework: https://osf.io/fkv4b/
Patient and public involvement statement
Prior to the Delphi procedure, the reporting standards with inclusion rationales were discussed in an online workshop with a patient organization for depression (the chairman and vice chairman from Libra Balans Skåne), followed by receiving feedback from the steering group members to receive a wide range of perspectives early in the process.
Delphi round 1
I Augenstein (Professor, Computer Science, University of Copenhagen)
S Aydin (PhD, Developmental and Educational Psychology, Leiden University)
E Billstedt (Professor, Neuroscience and Physiology, University of Gothenburg)
J R Boehnke (PhD, School of Health Sciences, University of Dundee)
D W Black (MD, Professor, Medicine, University of Iowa Carver College of Medicine)
B Cannell (Associate Professor, Public Health, University of Texas Health Science Center at Houston) G A Carlson (Professor, Psychiatry, Stony Brook University)
K A S Davis (Researcher, Psychiatry Psychology and Neuroscience, King’s College London)
F Dereboy (MD, Psychiatry, Aydın Adnan Menderes University)
E I Fried (Associate Professor, Clinical Psychology, Leiden University)
P Gustafsson (Associate Professor, Child and adolescent psychiatry, Lund University)
R Handels (Assistant Professor, Psychiatry and Neuropsychology, Maastricht University)
K Jenniskens (Assistant Professor, Clinical Epidemiology, Utrecht University)
C Klaiman (Associate Professor, Pediatrics, Emory University)
D N Klein (Professor, Clinical Psychology, Stony Brook University)
M McCloskey (Professor, Cognitive Science and Psychology, Johns Hopkins University)
A C Miers (Associate Professor, Developmental and Educational Psychology, Leiden University)
K G M Moons (Professor, Clinical Epidemiology, Utrecht University)
L Mosqueda (Professor, Medicine, University of Southern California)
H A Schwartz (Associate Professor, Computer Science, Stony Brook University)
M Stein (Professor, Psychiatry and Behavioral Sciences, University of Washington)
J G Tillman (PhD, Clinical Psychology, Yale School of Medicine)
Y P Wang (MD, PhD, Medicine, University of Sao Paulo Medical School)
J Yonashiro-Cho (PhD, Medicine, University of Southern California)
Delphi round 2
I Augenstein (Professor, Computer Science, University of Copenhagen)
S Aydin (PhD, Developmental and Educational Psychology, Leiden University)
J R Boehnke (PhD, School of Health Sciences, University of Dundee)
G A Carlson (Professor, Psychiatry, Stony Brook University)
K A S Davis (Researcher, Psychiatry Psychology and Neuroscience, King’s College London)
E I Fried (Associate Professor, Clinical Psychology, Leiden University)
P Gustafsson (Associate Professor, Child and adolescent psychiatry, Lund University)
R Handels (Assistant Professor, Psychiatry and Neuropsychology, Maastricht University)
K Jenniskens (Assistant Professor, Clinical Epidemiology, Utrecht University)
C Klaiman (Associate Professor, Pediatrics, Emory University)
D N Klein (Professor, Clinical Psychology, Stony Brook University)
M McCloskey (Professor, Cognitive Science and Psychology, Johns Hopkins University)
A C Miers (Associate Professor, Developmental and Educational Psychology, Leiden University)
K G M Moons (Professor, Clinical Epidemiology, Utrecht University)
L Mosqueda (Professor, Medicine, University of Southern California)
Y P Wang (MD, PhD, Medicine, University of Sao Paulo Medical School) J Yonashiro-Cho (PhD, Medicine, University of Southern California)
Test-Users
T Ivarsson (Associate Professor, Neuroscience and Physiology, University of Gothenburg)
P J Snelling (PhD, Medicine, Gold Coast University Hospital and Griffith University)
Competing interests
All authors have completed the ICMJE uniform disclosure form and declare: V C Eijsbroek, K Kjell, and O Kjell received funding from FORTE (2022-01022); and H A Schwartz from the National Institutes of Health (Grant R01 AA028032-01); O. Kjell and K. Kjell have co-founded and hold shares in a start-up using computational language assessments to diagnose mental health problems based on best-estimate assessments; J R Boehnke is as editor part of the International Society for Quality of Life Research; no other relationships or activities that could appear to have influenced the submitted work.
Contributorship statement
The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Contributor Roles
E = Equal; S = Supporting; L = Lead
Working group
V C Eijsbroek (corresponding author, guarantor)
K Kjell (guarantor)
O Kjell (guarantor)
Steering group
H A Schwartz
J R Boehnke
E I Fried
D N Klein
P Gustafsson
I Augenstein
P M M Bossuyt
Transparency declaration
The lead author (the manuscript’s guarantor) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.
Acknowledgment
Patient organization members:
E Sellberg (Chairman of Libra Balans Skåne, board member of Balans National) I Odenbrand (Vice Chairman of Libra Balans Skåne)
Footnotes
Information in Box 1 is extended with an example.