Abstract
Background The Big Data for Quality of Life (BD4QoL) study investigates quality of life (QoL) in head and neck cancer (HNC) survivors, focusing on survivorship and characterizing survivor demographics.
Methods We screened data from 5 studies across Europe (N=7276) and included patients with a diagnosis of squamous cell carcinoma (oral cavity, hypopharynx, larynx, oropharynx, nasal cavity and paranasal sinuses), treated with curative intent, alive after treatment, TNM 7th ed. stages I, II, III, IVa and IVb, with availability of QoL questionnaires.
Results The cohort of 4448 HNC survivors primarily includes men (78%) with median age 61 years. Most received radiotherapy (75%) and had a history of smoking (78%). Survivors’ scores on EORTC QLQ-C30 functioning scales indicated high functioning, with prevalent symptoms of fatigue, pain, and insomnia. Lower rates of missing data were observed in older patients, those with higher education and income levels, nonsmokers, married individuals, and patients not treated with radiotherapy. The odds ratios ranged from 0.47 to 0.99, indicating these factors may predict more consistent QoL data reporting in HNC survivors.
Conclusions These data support the development and validation of clinical prediction models for QoL in HNC survivors in a multicentre randomized controlled trial.
1 Introduction
The incidence of head and neck cancer (HNC) has been increasing worldwide in the last years and reached 1.1 million new diagnoses in 20161,2. Globally, it is the seventh most common type of cancer3,4. At the same time, the overall 5-year survival has improved considerably in the last decades, changing from 55% in 1992-1996 to 66% in 2002-20065, which makes long-term quality of life (QoL) a key concern for patients. These changes in survival can be partially explained by better treatments and a deeper understanding of the disease mechanisms, but also due to an increasing proportion of human papilloma virus (HPV)-induced oropharyngeal tumours which mainly affect a younger population with fewer comorbidities and better prognosis6–10. According to the stage of the disease and patient status (i.e., performance status, comorbidities), HNC can be treated with either single modality or multimodal approaches, including combinations of surgery, radiotherapy, and systemic therapy11. Despite the intention to cure, recurrences in the local and regional areas, as well as distant relapses, are common12,13. Additionally, these treatment methods often lead to significant toxicities and long-term complications14–16. Consequently, the QoL of HNC survivors is frequently compromised. Health related QoL has been found to be associated with clinical endpoints in oncology patients17, particularly studies have shown strong evidence of association between physical functioning and global QoL change with overall survival in individuals with HNC18. However, there is limited research on the long-term changes in QoL and the factors that influence these changes. Existing evidence suggests that global QoL tends to recover within 12 months after HNC treatment, but late complications persist, including declines in physical functioning, fatigue, xerostomia (dry mouth), and sticky saliva19, affecting overall QoL. Furthermore, the available literature on QoL in HNC survivors is relatively limited, particularly concerning long-term changes and determinants of QoL over time20. Most studies have focused on short-term recovery, but there is a lack of information regarding the sustained effects and late sequelae experienced by HNC survivors21,22. Investigating these factors is essential for optimizing patient care, identifying potential interventions to alleviate specific needs, and improving survivorship outcomes.
This study describes the creation of the multi-national Big Data for Quality of Life (BD4QoL) historical cohort, which was established to investigate QoL in HNC survivors. This cohort will be used for research to better understand the QoL trajectory in HNC survivors.
The aim of the study is to define and describe clinical, demographic, quality of life and behavioural characteristics of the patients in the BD4QoL historical cohort.
2 Methods
2.1 Quality of life questionnaires
The EORTC QLQ-C30 is a questionnaire developed by the European Organization for Research and Treatment of Cancer, for assessing the quality of life of cancer patients 23,24. This questionnaire is a Patient Reported Outcome (PRO) instrument which contains 30 questions that compose 10 sub-scales divided in three groups: functional sub-scales (physical function, role function, cognitive function, emotional function and social function), symptom sub-scales (pain, fatigue, nausea/vomiting) and global health status (GHS)/quality of life. In addition, it contains 6 individual items to assess: dyspnoea, insomnia, appetite loss, constipation, diarrhoea and financial difficulties. All scales and single-item measures range in score from 0 to 100. A high scale score represents a higher response level. Thus, a high score for a functional scale represents a high/healthy level of functioning, a high score for GHS/QoL represents a good overall QoL, but a high score for a symptom scale/item represents a high level of symptoms/problems.
The head and neck cancer module (EORTC QLQ-H&N35) incorporates seven multi-item scales that assess pain, swallowing, senses (taste and smell), speech, social eating, social contact and sexuality. There are also eleven single items. For all items and scales, high scores indicate a higher degree of problems, i.e., there are no functioning scales.
The EORTC QLQ-HN43 module is a revised and updated version of the head and neck cancer module EORTC QLQ-HN35. The 43 items can be combined into the following scales: fear of progression, body image, dry mouth and sticky saliva, pain in the mouth, sexuality, problems with senses, problems with shoulder, skin problems, social eating, speech, swallowing, and problems with teeth. Single item scales are coughing, swelling in the neck, neurological problems, trismus, social contact, weight loss, and problems with wound healing.
2.2 Cohort participants
2.2.1 Datasets
We screened data collected in prior research projects at Istituto Nazionale dei Tumori (Italy), University Hospitals Bristol and Weston NHS Foundation Trust, and University of Bristol (UK) and University Medical Centre Mainz (UMM) in Germany to construct this cohort (Table 1). Data collection was performed with the understanding and written consent of patients enrolled in the original studies.
Head and Neck 5000 (HN5000) is a large UK based study of people with head and neck cancer25,26. The study is sponsored by University Hospitals Bristol and Weston NHS Foundation Trust (UHBW) and is run by UHBW and the University of Bristol. Briefly, 5511 people were recruited from 76 UK centres between 2011 and 2014 making it one of the largest prospective cohort studies of people with head and neck cancer in the world. The study collected more than 200 variables at several timepoints (from diagnosis to 3 years follow-up), including clinical and demographic characteristics, standard QoL questionnaires, and information about physical and mental health. The inclusion criteria were25: individuals over the age of 16 with a new head and neck primary cancer seen or discussed at an appropriate multidisciplinary team (MDT) meeting or clinic; People presenting with a cancer of unknown primary (CUP); those without a definitive histological diagnosis were eligible if the MDT decision was that the primary site was likely to be a HNC. The exclusion criteria included: people considered to meet the criteria for mental incapacity or vulnerability set out in the mental capacity/ vulnerable adult act, recurrent HNC, a second head and neck cancer, skin cancer, lymphoma and a histological diagnosis of Carcinoma in Situ with no clear evidence of invasion (these patients were eligible if later upstaged following MDT discussion); Patients who had already commenced their cancer treatment (with the exception of those whose treatment was also their diagnostic procedure) were also excluded.
The second data set (UMM1)27 was a prospective cohort study in patients before and after total laryngectomy (TLE). Further eligibility criteria were written informed consent and age of 18 years or older. The patients were interviewed in a face-to-face setting before the surgery (t1), shortly before discharge from the hospital (t2), at the end of rehabilitation (t3), one year after baseline (t4) as well as two (t5) and three years (t6) after baseline. Participants also completed self-administered questionnaires, including the EORTC QLQ-C30 and the EORTC QLQ-H&N35. A total of 389 patients were enrolled between the years 2001-2011 from 13 hospitals in Germany.
The third data set (UMM2)28 comes from a similar study, but this time in patients who were scheduled for partial laryngectomy (PLE). The study design and data collection were in parallel with the UMM1 study up to t4. Data collection began in 2007 and ended in 2015. A sample of 391 patients were enrolled from 16 hospitals in Germany.
The fourth data set (UMM3)29 comes from an international validation study for the update of the EORTC QLQ-H&N35 questionnaire, the HN43 Phase IV study. In total, 812 patients from 18 countries in Europe, the Americas and Asia were enrolled. Patients with cancer of the larynx (ICD-10 code C32), lip (C00), oral cavity (C01-06), salivary glands (C07-08), oro-hypopharynx (C09-10, C12-14), nasopharynx (C11), nasal cavity (C30), nasal sinuses (C31), sarcoma in the head and neck region (C49), and lymph node metastases from Missing primary in the head and neck area (C77, C80.0).were included. There were no restrictions regarding stage, recurrence status, or treatments planned or performed. Patients with a tumour of the eyes, orbit, thyroid, skin (even if in the head and neck area), or lymphomas in the head and neck region were excluded. Patients completed the questionnaires up to 14 days before start of treatment (t1), three months (t2), and six months thereafter (t3).
The Big Data to Decide (BD2Decide) project was a European multicentre observation study including 1537 HNC patients from Italy, Germany and the Netherlands30. The aim of this project was to develop a multisource database to allow for prognostic prediction modelling in loco-regionally advanced HNC patients. The database was made of two cohorts: retrospective (diagnosis 2008-2014), and prospective (diagnosis 2015-2017). Main inclusion criteria were diagnosis of head and neck squamous cell carcinoma (HNSCC); stage III and IVA/B (based on AJCC/UICC seventh edition); receiving treatments with curative intent; availability of pre-treatment tumour specimen for biological analysis; availability of pre-treatment imaging scans for radiomic analysis; for patients enrolled prospectively, PROs were collected (EORTC C30, EORTC HN35 and EQ-5D-5L). The BD4QoL study included the prospective BD2Decide patients enrolled at one of the Italian cancer centres.
These longitudinal studies enrolled patients at diagnosis and before treatment initiation.
In the BD4QoL study, survivors are defined by the following inclusion and exclusion criteria:
Inclusion criteria
Non-metastatic head and neck cancers from one of the following subsites (ICD in annex 1): oral cavity, hypopharynx, larynx, oropharynx, nasopharynx, major or minor salivary glands, nasal cavity, paranasal sinuses.
Having received and concluded treatments with curative intent at time of study inclusion.
Being alive and disease-free at last post-treatment follow-up.
Stage I, II, III, IVa or IVb according to TNM 7th edition31.
Age ≥18 years.
Exclusion criteria
Histologies other than squamous cell carcinoma and salivary gland carcinomas (e.g., sarcoma, melanoma are excluded). Thyroid cancers, neuroendocrine tumours and non-epithelial HNC (e.g., melanoma, sarcoma, etc.) are excluded.
Distant metastases at the time of study entry.
Any previous HNC unrelated to the primary HNC for which the participant was treated; premalignant lesions (e.g., leukoplakia, erythroplakia, lichen etc.) are allowed.
Subjects with previous malignancies (except localized non-melanoma skin cancers, and the following in situ cancers: bladder, gastric, colon, esophageal endometrial, cervical/dysplasia, melanoma, or breast) unless a complete remission was achieved at least 5 years prior to study entry and no additional therapy is required during the study period.
2.2.2 Baseline definition
HNC survivors were defined as those patients alive and disease free after the end of treatment. However, end of treatment was not recorded in HN5000 and BD2Decide studies, and the QoL measurements were not exactly aligned with end of treatment (Table 2). Therefore, we defined the BD4QoL “baseline” as the first available QoL measurement after the known or inferred end of treatment. The baseline for UMM1 and UMM2 studies is defined as 4 months after diagnosis and for UMM3 is at 3 months, when end of treatment is recorded. For BD2Decide, the baseline is defined at 6 months after diagnosis, when all patients are assumed to have finished treatment in this study. In the case of HN5000, the baseline is established at 12 months after diagnosis, when all patients have finished their curative treatment.
2.3 Data harmonization
We performed data harmonization in a subset of variables to achieve compatible and comparable measurements across the different studies following the BD4QoL ontology32. In total, we harmonized 29 variables, consisting of 14 demographic and clinical variables and 15 QoL domains (Table 3). In addition, survival time and status were recorded in HN5000, BD2Decide and UMM1 studies, while UMM2 and UMM3 presented interval censored data.
Beyond the harmonised data, the studies that contributed data to this project recorded between hundreds to more than 2000 variables, which are therefore available in subsets of the BD4QoL cohort. Two versions of the head and neck module of the EORTC questionnaire were used in the studies, in addition to demographics and several other questionnaires which attempt to capture fear of recurrence, personal costs, hospital anxiety and depression, and general health.
2.4 Statistical analysis
Survival curves, stratified by study, were estimated using the Kaplan-Meier estimator. Survival curves were only made for the BD2Decide, HN5000 and UMM1 studies, as other studies presented interval censored data with few measurements.
To show the trajectory of overall quality of life over time, conditional on survival, we estimated the mean GHS/QoL for each time point with a QoL measurement in each study. A 95% confidence interval for the means were obtain by bootstrap with 1000 bootstrap resamples per time point.
To explore the factors associated with missing quality of life (QoL) measurements, univariate logistic regression analyses were performed. The response variable was a binary indicator for whether the GHS/QoL scale was missing or not. This scale was chosen because it summarises global QoL. The covariates considered in the respective univariate models were age, sex, tumour stage, tumour location, treatment, income, education, smoking status, and alcohol consumption.
For interpretation of the EORTC QLQ-C30 scales we used the thresholds for clinical importance (TCIs) proposed by Giesinger et al., which allow to identify patients with clinically important problems or symptoms33.
3 Results
3.1 Cohort characteristics
In total, 4448 HNC survivors were eligible for inclusion in this study (Figure 1). INT contributed 112 subjects with QoL measurements under BD2Decide semantics30,34,35. UMM contributed 883 subjects across 3 different longitudinal cohort studies (UMM1, UMM2, and UMM3), and Bristol contributed 3587 subjects using Head and Neck 5000 semantics and documentation available online (headandneck5000.org.uk). All datasets contain quality of life scores based on the EORTC QLQ-C30 and EORTC QLQ-H&N35 or EORTC QLQ-HN43 questionnaires.
The patients in the cohort were predominately male (78%), had a median age of 61 years, had primarily low (47%) to medium education level (35%), most were married or lived with a partner (68%), had a history of smoking (78%) and high alcohol consumption (median 84 alcohol units1/month) (Table 3). The education level variable presented distinct definitions across studies due to different educational systems. Thus, the education levels were mapped to years of education and converted into three categories: low (<10 years), medium (from 10 to 12 years), and high (>12 years). Overall, 75% of patients underwent radiotherapy, while 41% were treated with surgery and 38.6% received chemotherapy. Nearly half of the patients (48%) were treated with at least two therapeutic modalities, with the combination of chemotherapy and radiation being the most prevalent at 29%. The pairing of radiotherapy and surgery was the next most common, involving 10% of patients, and a combination of all three treatment approaches was used in 8% of cases. There was considerable heterogeneity between the datasets in some characteristics, particularly tumour stage and tumour region, where some studies had enrolled patients with only specific regions or tumour stages. For instance, in the BD2Decide dataset, only loco-regionally advanced stage tumours were represented (stages III and non-metastatic IV in TNM 7th). The UMM1 dataset predominantly consisted of advanced stage cases as well with 79.2% of stage III and IV subjects, while the remaining datasets demonstrated a more balanced distribution across tumour stages. In terms of tumour region, the UMM3 and HN5000 datasets had a limited number of cases involving the nasopharynx, salivary glands, nasal cavity, and paranasal sinuses. In addition, the UMM1 and UMM2 datasets contained cases with tumours overlapping multiple areas. In the integrated data, the three biggest tumour site groups were oropharynx (37.1%), larynx (31.3%) and oral cavity (23.6%). Tumour stages I, II, III and IV were distributed as 24.4%, 17.9%, 13.8% and 42.0% respectively.
Survivors had an average probability of overall survival for one year after end of treatment of 0.94 [95% CI: 0.93, 0.95] and of 0.88 [0.87, 0.89] after 2 years. For BD2Decide, the 1- and 2-years overall survival were 0.95 [0.91, 0.99] and 0.86 [0.79, 0.94] respectively, for UMM1 0.88 [0.83, 0.93] and 0.76 [0.70, 0.83], and for HN5000 0.94 [0.93, 0.95] and 0.89 [0.88, 0.90] (see Figure 2).
Concerning the GHS/QoL of survivors, we observed that survivors presented some level of QoL impairment at diagnosis and their QoL tends to decrease further during or immediately after treatment (3 and 4 months) (Figure 3). After reaching the minimum QoL, survivors recovered to a higher level (see the mean QoL at 6 and 12 months and later time points). However, according to recently established TCIs33 for the EORTC QLQ-C30 functioning and symptom scales, 40% and 35% of patients presented scores within ranges that indicate clinically important troubles related to physical and cognitive functioning respectively, 36 months post-diagnosis. Additionally, 30% reported fatigue and 39% indicated experiencing pain over the respective TCIs (see Table 4). Table 5 displays the data for the head and neck module, showing that the highest median scores among the EORTC QLQ-H&N35 scales were for dry mouth, coughing, sticky saliva, and reduced sexuality, with all four symptoms sharing a median value of 33.
3.2 Missing data
The proportion of missing values depended on the type of variable. Most clinical variables (e.g., sex, age, tumour stage and location and treatment) presented low missingness of 0-6.4%. Variables collected through interviews (e.g., marital status, education level and tobacco usage) had moderate missingness of 25.9-39.3%, while QoL variables had higher missingness, 39.9-41.8% at baseline. The majority of missing data in QoL variables was related with missing of the whole questionnaire (N=1692), usually because patients did not send them back, whereas 8.8% of subjects (N=393) presented missing values only in specific QoL subscales while others were measured (See Table 6 and figure S1 in the Supplementary Information). It is important to notice that the high proportion of unfilled questionnaires is driven by patients receiving the questionnaires by post and never returning them, while most patients that returned the questionnaire answered all questions. Finally, some data was missing in blocks in the dataset because studies did not collect the exact same variables. Body mass index (BMI) information was solely available for HN5000, and BD2Decide lacked data on education level and income. Furthermore, the UMM3 dataset did not include information on alcohol consumption and smoking status.
There were several associations between the frequency of missing QoL measurements and certain patient features (p-value < .05, Table 6). Specifically, being a female (OR: 0.88), older age (OR: 0.99), possessing a high level of education (OR: 0.48), having a high income (OR: 0.51), never being a smoker (OR: 0.48), being married or living with a partner (OR: 0.74), and treatment without radiotherapy (OR: 0.80) were found to be associated with a lower frequency of missing QoL data (see Table 6 for the full list of odds ratios).
4 Discussion
In this study, we defined and built a cohort of HNC survivors from historical data contributed from studies conducted in Germany, Italy, and the UK, with clear eligibility criteria and survivorship definition. To the best of our knowledge this is the largest cohort of HNC survivors in the world.
We observed large interstudy heterogeneity in global health status GHS/QoL at the survivorship baseline, i.e., at the first QoL measurement after end of treatment. Two important factors that can impact GHS/QoL should be considered: the time at which the baseline GHS/QoL was measured and the amount of missingness. Usually, QoL immediately after treatment is lower than at later timepoints36, which can explain the lower GHS/QoL in some contributing studies compared to others. For example, the UMM studies measured QoL just after treatment completion (3-4 months after diagnosis), while the HN5000 measured QoL 12 months after diagnosis, which may be several months after actual end of treatment for some patients. BD2Decide measured QoL 6 months after diagnosis, but the proportion of missing values is 50.9%. Since the missingness is related to the measurement, e.g., patients with the lowest GHS/QoL might not answer the questionnaire, the bias introduced can lead to a higher QoL on average than expected.
Overall survival 1-2 years after treatment remained high on average, but there were differences between the studies with UMM1 having shorter survival than BD2Decide and HN5000. This may be related to the patient characteristics in the studies but may also be partly explained by how end-of-treatment (baseline) was defined in the studies. In UMM1 the baseline was at 4 months after diagnosis, while the baseline was later in BD2Decide (at 6 months after diagnosis) and in HN5000 (12 months after diagnosis).
Regarding the QoL trajectory over time, we observe a dip in GHS/QoL around treatment, followed by an increase after treatment and a flattening out over time. This pattern has also been observed in previous studies37–39. Based on the EORTC QLQ-H&N35 scales, the most prominent concerns for this cohort are related to physical symptoms such as dry mouth, sticky saliva, and coughing.
We identified being female, older age, high education level, high income, and not smoking as factors associated with lower probability of not completing the GHS/QoL scale. This highlights the importance of proper missing data handling for improving the interpretability and analysis of QoL data in this population, as complete case approaches can introduce selection bias and impact the validity of the results. Furthermore, it is important to acknowledge the presence of data missing in blocks in the dataset, defined as structured missingness (SM)40. SM can be caused by several reasons, but it arises naturally when integrating multisource data due to studies collecting different sets of variables to address different research questions. This is a missing pattern that is becoming more common due to the increase in use of multisource data integration for machine learning models development. In the BD4QoL data SM is also present due to patients who never received the questionnaires or never returned them by post.
The BD4QoL historical cohort has several important strengths. It is to date the largest dataset on HNC survivors with 4448 subjects. The BD4QoL study is composed mainly by HNC survivors from 3 countries in Europe (Italy, Germany, and UK), in addition to smaller groups from many countries in different continents (Americas, Asia and Europe) included in the UMM3 study. This provides a dataset with rich demographics and regional variations. The data contains repeated QoL measurements acquired at various timepoints giving opportunity to unveil the longitudinal course of QoL in HNC survivors. The standardized and validated EORTC QLQ-C30 and the head and neck modules H&N35/43 were applied in all studies integrated in the BD4QoL cohort, which provides rich information about multidomain QoL. This can be used to describe HNC survivors, identify their needs to improve patient care and long-term QoL. The data providers maintained high quality data standards in the original studies, which consequently make the integrated BD4QoL data of excellent quality as well.
Nonetheless, it is important to state weaknesses in this study that can limit the scope of potential future analyses. Most of the survivors in the BD4QoL historical cohort are white Europeans, with the Global South and other ethnicities being underrepresented – though this represents the population where the studies were conducted. The number of questionnaires that were not returned by the patients and the patterns of missing data identified have the potential to introduce biases in future analyses, so missing data handling strategies should be carefully considered. In addition, structured missingness is present which requires specialized methods if the affected variables are kept in the analyses41,42. The heterogeneity in the time of which patients are considered survivors – whether after 3, 4, 6, or 12 months – is also a challenging aspect in this data. All patients are survivors, but those who have survived for longer, like 12 months, may not only have different QoL but also a greater likelihood of surviving even longer.
5 Findings to date
Time is often a neglected variable in clinical studies, but it has a clear impact in PROs43. Instruments measuring PROs are administered at various timepoints across studies due to the different research questions and assumed intervention effects. Quality of life measurements are sensitive to time of collection and impose a challenge for interoperability when different datasets are to be integrated.
However, measuring QoL “as often as possible”, which is sometimes done, is also not a solution for this problem because study participants then lose the motivation to complete the forms properly and completely. As a rule of thumb, about 4 times a year in the first year after diagnosis and once a year after that are usually acceptable by patients.
6 Ethics statement
Our study was conducted in full accordance with ethical principles, including the World Medical Association’s Declaration of Helsinki (2002 version). The study protocol received ethical approval by the Norwegian Regional Committee for Medical and Health Research Ethics (REK) South-East D under application number 154191. The data is stored in compliance with GDPR legislation in the secure server for sensitive data at the University of Oslo (TSD/USIT) and access is granted to authorised collaborators included in the ethical approval. Each individual study that provided data received ethical approval from local authorities in Italy, Germany, the UK, and all further countries involved. The data providers submitted copies of these ethical approvals to the principal investigator.
BD2Decide study was approved by institutional research ethics board with identifier N. INT 65/16.
The HN5000 study was approved by the National Research Ethics Committee (South West Frenchay Ethics Committee, reference number 10/H0107/57, 5 November 2010) and the Research and Development departments of participating NHS Trusts. Informed consent was obtained from all patients recruited to HN5000.
UMM1 and UMM2 received ethical approval from the Leipzig University Ethics Committee. UMM3 was approved by Landesa rztekammer Rheinland-Pfalz ethics committee under approval number 837.281.14 (9520).
7 Conflict of Interest
Marissa LeBlanc reports receiving a speaker fee from MSD unrelated to the content of this work. Susanne Singer reports receiving honoraria for reviewing journal papers for the Quality-of-life-prize of Lilly, outside of this work. Lisa Licitra declares research funds to the institute for clinical studies from Astrazeneca, BMS, Boehringer Ingelheim, Celgene International, Eisai, Exelixis, Debiopharm International SA, Hoffmann-La Roche ltd, IRX Therapeutics, Medpace, Merck-Serono, MSD, Novartis, Pfizer, Roche, Buran, Alentis; occasional fees for participation as a speaker at conferences/congresses or as a scientific consultant for advisory boards from Astrazeneca, Bayer, MSD, Merck-Serono, AccMed, Neutron Therapeutics, Inc.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
8 Author Contributions
MMS, KT, SS, KH, ST, AN, CV, LL-P, MFC-U, GF, LL, ML contributed to the conceptualization of the study. SS, KH, ST, AN, MFC-U, AF, LL, ML were responsible for funding acquisition. KT, SS, KH, ST, MP, AN, SC, LL conducted data sharing. MMS, EIFF, AF, ML participated in the investigation process. MMS, EIFF developed and/or worked with the necessary software. MMS, EIFF, KT, CV, ML crafted the methodology framework of the study. LL-P, AF, LL provided the resources needed for the research. All authors contributed to writing the manuscript.
9 Funding
The BD4QoL project, in the frame of which this work is being conducted, has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 875192. MM-S received funding from the European Union’s Horizon 2020 Research and Innovation program under the Marie Skłodowska-Curie Actions Grant, agreement No. 80113 (Scientia fellowship). This Publication presents data from the Head and Neck 5000 study. The study was a component of independent research funded by the National Institute for Health and Care Research (NIHR) under its programme Grands for applied Research scheme (RP-PG-070-10034). The views expressed in this publication are those of the author(s) and are not necessarily those of the NHS, the NIHR or the department of health and Social Care. Core funding was also provided through awards from Above and Beyond, University Hospitals Bristol and Weston Research Capability Funding and the NIHR Senior Investigator award to Professor Andy Ness. The UMM1 study was funded by the German Federal Ministry of Education and Science (Grant Number: 7DZAIQTX), the UMM1 and UMM2 study by the German Cancer Aid (Grant Numbers #106654, #107440, #108758, #109604), the UMM3 study by the European Organisation for Research and Treatment of Cancer (Grant Number: 001/2014).
11 Data Availability Statement
The BD4QoL historical cohort dataset is hosted by the Services for Sensitive Data (TSD) at the University of Oslo. Access to the data may be granted by the data owners upon application.
The data that support the findings of this study are available from head and neck 5000. Further information may be found on the Head and Neck 5000 website: https://www.headandneck5000.org.uk/information-for-researchers.
10 Acknowledgments
This work was performed on the TSD (Tjeneste for Sensitive Data) facilities, owned by the University of Oslo, operated and developed by the TSD service group at the University of Oslo, IT-Department (USIT). The statistical analysis and data harmonisation were performed on resources provided by Sigma2 - the National Infrastructure for High Performance Computing and Data Storage in Norway. The Authors thank the researchers and clinicians who designed the BD2Decide, HN5000, and Mainz studies, the research, laboratory and clinical staff who supported the conduct of the studies; and the people with head and neck cancer who took part.
Footnotes
↵1 Alcohol unit is a dimensionless measurement unit defined as 10ml or 8g of pure alcohol.