Automatically Identifying Twitter Users for Interventions to Support Dementia Family Caregivers: Annotated Data Set and Benchmark Classification Models

Ari Z. Klein; Arjun Magge; Karen O’Connor; Graciela Gonzalez-Hernandez

doi:10.1101/2022.05.18.22275268

Abstract

Background More than 6 million people in the United States have Alzheimer’s disease and related dementias, receiving help from more than 11 million family or other informal caregivers. A range of traditional interventions have been developed to support family caregivers; however, most of them have not been implemented in practice and remain largely inaccessible. While recent studies have shown that family caregivers of people with dementia use Twitter to discuss their experiences, methods have not been developed to enable the use of Twitter for interventions.

Objective The objective of this study was to develop an annotated data set and benchmark classification models for automatically identifying a cohort of Twitter users who have a family member with dementia.

Methods Between May 4, 2021 and May 20, 2021, we collected 10,733 tweets, posted by 8846 users, that mention a dementia-related keyword, a linguistic marker that potentially indicates a diagnosis, and a select familial relationship. Three annotators annotated one random tweet per user to distinguish those that indicate having a family member with dementia from those that do not. We used the annotated tweets to train and evaluate deep neural network classifiers based on pretrained transformer models. To assess the scalability of our approach, we then deployed automatic classification on tweets that were continuously collected between May 4, 2021 and March 9, 2022.

Results Inter-annotator agreement was 0.82 (Fleiss’ kappa). A classifier based on a BERT model pretrained on tweets achieved the highest F₁-score of 0.962 (precision = 0.946, recall = 0.979) for the class of tweets indicating that the user has a family member with dementia. The classifier detected 128,838 tweets that indicate having a family member with dementia, posted by 74,290 users between May 4, 2021 and March 9, 2022—that is, approximately 7500 users per month.

Conclusions Our annotated data set can be used to automatically identify Twitter users who have a family member with dementia, enabling the use of Twitter on a large scale to not only explore family caregivers’ experiences, but also directly target interventions at these users.

Introduction

More than 6 million people in the United States have Alzheimer’s disease and related dementias, and the burden is projected to double by 2060 [1]. Alzheimer’s disease is the sixth leading cause of death in the United States [2], and only 8% of people with dementia do not receive help from family members or other informal care providers [3], amounting to more than 11 million family or other unpaid caregivers in 2020 [4]. Caregivers of people with dementia are impacted physically, cognitively, socially, mentally, and financially. For instance, compared with non-caregivers, they are more vulnerable to disease due to chronic stress [5] and have shorter sleep duration and poorer sleep quality [6]. Compared with non-dementia caregivers, they are more likely to experience a decline in cognition [7] and social network size [8]. They are also more likely to experience depression than non-caregivers [9] and non-dementia caregivers [10], and depressive symptoms in dementia caregivers are associated with increased health care use and costs [11]. In addition to the increased costs of their personal health care, family caregivers of people with dementia pay for much of the recipient’s total care costs, with the costs being significantly higher for people with dementia than without dementia [12].

A range of traditional interventions have been developed to support family caregivers of people with dementia [13]; however, most of them have not been implemented in practice and remain largely inaccessible [14]. Recent systematic reviews have concluded that internet-based interventions are valued by family caregivers of people with dementia for their easy access [15] and can have beneficial effects on caregivers’ health [16]. While recent studies [17-23] have shown that family caregivers of people with dementia use Twitter to discuss their experiences, as far as we know, methods have not been developed to enable the use of Twitter as a platform for internet-based interventions. Given that nearly one of every four adults in the United States uses Twitter [24], Twitter may present a novel opportunity to reach family caregivers on a large scale, such as through user-targeted advertisements providing information about dementia, caregiving, resources, or services. The objective of this study was to develop an annotated data set and benchmark classification models for automatically identifying a cohort of Twitter users who have a family member with dementia.

Methods

Data Collection and Annotation

Between May 4, 2021 and May 20, 2021, we collected 67,060 publicly available tweets from the Twitter Streaming Application Programming Interface (API) that are in English, are not retweets, and include both a dementia-related keyword (e.g., dementia, youngdementia, #yod, #ftd, alzheimer’s, alz, alzheimersdisease, mild cognitive impairment) and a linguistic marker that potentially indicates a diagnosis (e.g., diagnosed, diagnosis, has, got, developed, with, from). We then searched these tweets for references to select familial relationships (e.g., mom, dad, grandma, grandpa, wife, husband, sister, brother, daughter, son), identifying 10,733 (16%) of the 67,060 tweets. We randomly sampled one tweet per user, yielding 8846 (82%) of the 10,733 tweets, and developed annotation guidelines to help three annotators distinguish tweets that indicate having a family member with dementia from those that do not.
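The three-part filter and the one-tweet-per-user sampling described above can be sketched as follows. This is a minimal illustration, not the collection code used in the study: the term lists are only the examples quoted in the text (the full lists are not given), and the whole-word matching rule and the names `is_candidate` and `sample_one_per_user` are assumptions made for the example.

```python
import random
import re

# Illustrative subsets only: the complete term lists are not given in the text.
DEMENTIA_KEYWORDS = ["dementia", "alzheimer's", "alz", "mild cognitive impairment"]
DIAGNOSIS_MARKERS = ["diagnosed", "diagnosis", "has", "got", "developed", "with", "from"]
FAMILY_TERMS = ["mom", "dad", "grandma", "grandpa", "wife", "husband",
                "sister", "brother", "daughter", "son"]


def _pattern(terms):
    """Compile a case-insensitive pattern matching any term as a whole word or phrase."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


KEYWORD_RE = _pattern(DEMENTIA_KEYWORDS)
MARKER_RE = _pattern(DIAGNOSIS_MARKERS)
FAMILY_RE = _pattern(FAMILY_TERMS)


def is_candidate(text):
    """Return True if a tweet mentions a keyword, a diagnosis marker, and a family term."""
    text = text.replace("\u2019", "'")  # treat curly and straight apostrophes alike
    return all(p.search(text) for p in (KEYWORD_RE, MARKER_RE, FAMILY_RE))


def sample_one_per_user(tweets, seed=0):
    """Pick one random tweet per user from a list of (user_id, text) pairs."""
    rng = random.Random(seed)
    by_user = {}
    for user_id, text in tweets:
        by_user.setdefault(user_id, []).append(text)
    return {user_id: rng.choice(texts) for user_id, texts in by_user.items()}
```

Matching on whole words keeps a term such as son from firing inside person or grandson; whether the original pipeline applied the same rule is not stated.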

Automatic Classification

We performed benchmark supervised machine learning experiments to assess the utility of the annotated data set for automatically identifying Twitter users who have a family member with dementia. We used six deep neural network classifiers based on bidirectional encoder representations from transformers (BERT): the BERT-Base-Uncased [25], DistilBERT-Base-Uncased [26], RoBERTa-Large [27], BioBERT-Large-Cased [28], Bio+ClinicalBERT [29], and BERTweet-Large [30] pretrained models in the Flair Python library. We split the 8846 tweets into 80% (7077 tweets) and 20% (1769 tweets) random sets as training data and held-out test data, respectively, stratified based on the distribution of the binary annotated classes. Prior to automatic classification, we preprocessed the tweets by normalizing URLs and usernames, and lowercasing the text. For training, we used stochastic gradient descent (SGD) optimization, a batch size of 8, 15 epochs, and a learning rate of 0.001. During training, we fine-tuned all layers of the transformer model with our annotated tweets. To optimize performance, the model was evaluated after each epoch on a 5% split of the training set. To assess the scalability of our approach, we then deployed automatic classification on 198,674 tweets, posted by 119,640 users, that were continuously collected from the Twitter Streaming API between May 4, 2021 and March 9, 2022 and mention a select familial relationship. To assess the potential for tailoring interventions, we used Carmen [31] to infer the geolocation of users whom the classifier identified as having a family member with dementia.
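The preprocessing and the stratified 80/20 split described above can be illustrated with a short, standard-library sketch. It is not the authors' code: the placeholder tokens `HTTPURL` and `@USER`, the regular expressions, and the names `preprocess`, `stratified_split`, and `TRAINING_CONFIG` are assumptions made for the example, and the Flair training call itself is omitted. Only the hyperparameter values are taken from the text.

```python
import random
import re

# Hyperparameter values as reported in the text; the dictionary itself is hypothetical.
TRAINING_CONFIG = {
    "optimizer": "SGD",
    "batch_size": 8,
    "epochs": 15,
    "learning_rate": 0.001,
    "dev_fraction": 0.05,  # share of the training set used for per-epoch evaluation
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")


def preprocess(text):
    """Normalize URLs and usernames to placeholder tokens, then lowercase the text."""
    text = URL_RE.sub("HTTPURL", text)  # placeholder token is an assumption
    text = USER_RE.sub("@USER", text)   # placeholder token is an assumption
    return text.lower()


def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (text, label) pairs into train and test sets, preserving label proportions."""
    rng = random.Random(seed)
    by_label = {}
    for example in examples:
        by_label.setdefault(example[1], []).append(example)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

Splitting each class separately is what keeps the share of each annotated class roughly the same in the training and test sets.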

Results

Among the 8846 annotated tweets, 8346 (94%) were dual annotated, and 500 (6%) were annotated by all three annotators. Inter-annotator agreement, based on the 500 tweets annotated by all three annotators, was 0.82 (Fleiss’ kappa). Upon resolving the disagreements, it was determined that 5946 (67%) of the tweets indicate that the user has a family member with dementia (“positive” class), and 2900 (33%) of the tweets do not (“negative” class). Table 1 presents the precision, recall, and F₁-scores of six deep neural network classifiers for the “positive” class, evaluated on a held-out test set of 1769 (20%) of the 8846 annotated tweets. The classifier based on a model pretrained on tweets (BERTweet-Large) achieved the highest F₁-score: 0.962 (precision = 0.946, recall = 0.979). The BERTweet classifier detected 128,838 tweets indicating that the user has a family member with dementia, posted by 74,290 users between May 4, 2021 and March 9, 2022—that is, approximately 7500 users per month. Carmen [31] inferred the geolocation of 31,653 (43%) of these 74,290 users.

Table 1. Precision, recall, and F₁-scores of classifiers for detecting tweets indicating that the user has a family member with dementia.
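For readers who want to reproduce the two summary statistics reported above, the following sketch gives the standard definitions of Fleiss' kappa and the F₁-score. It is a generic implementation rather than the authors' evaluation code, and the function names and input layout (one row of per-category rating counts per tweet) are assumptions made for the example.

```python
def fleiss_kappa(ratings):
    """Compute Fleiss' kappa.

    ratings: one row per item, each row holding the number of raters who chose
    each category, e.g. [[3, 0], [2, 1]] for three raters and two categories.
    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])
    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_items = [
        (sum(count * count for count in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(p_items) / n_items
    # Chance agreement: sum of squared overall category proportions.
    p_categories = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in p_categories)
    # Undefined (division by zero) if every rating falls in a single category.
    return (p_observed - p_expected) / (1 - p_expected)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

With the reported precision of 0.946 and recall of 0.979, `f1_score` returns approximately 0.962, matching the F₁-score given for the BERTweet-Large classifier.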

Table 2 presents examples of false positives and false negatives of the BERTweet classifier in the test set. Among the 68 false positives, 36 (47%) refer to people with dementia who are not or may not be select family members (Tweet 1), 8 (12%) report that a family member has a condition other than dementia (Tweet 2), and 5 (7%) merely speculate that a family member has dementia (Tweet 3). Another 8 (12%) of the 68 false positives were a result of manual annotation errors. Among the 25 false negatives, 14 (56%) use deixis or anaphora, requiring additional context in the tweet to understand that a non-first-person determiner (e.g., their in Tweet 4) actually refers to the user, or that a personal pronoun (e.g., she in Tweet 5) refers to a select family member with dementia. Furthermore, 12 (86%) of these 14 tweets also include references to people who are not family members or do not have dementia. Another 4 (16%) of the 25 false negatives were a result of manual annotation errors.

Table 2. Sample false positives and false negatives of a BERTweet classifier for detecting tweets indicating that the user has a select family member with dementia.

Discussion

The benchmark performance of automatic classification demonstrates that our annotated data set has utility for accurately identifying Twitter users who have a family member with dementia, and deploying automatic classification on unlabeled tweets demonstrates that a large cohort of users can be identified. Although we assumed that having “close” relatives with dementia would more likely imply the users’ involvement in caregiving, the users identified in this study may not be caregivers or may have been caregivers but are no longer. Nonetheless, we believe that limiting our identification of caregivers to users who explicitly state that they are providing ongoing care would underutilize the potential of Twitter for scaling up accessible interventions. Furthermore, our approach to identifying users and our ability to identify the geolocation of nearly half of them would enable opportunities to tailor interventions based on the care recipient and community.

Conclusions

This paper presented an annotated data set and benchmark classification models for automatically identifying Twitter users who have a family member with dementia, enabling the use of Twitter on a large scale to not only explore family caregivers’ experiences in their tweets, but also directly target interventions at these users.

Data Availability

All data produced in the present study are available upon reasonable request to the authors.

Conflicts of Interest

None declared.

Acknowledgments

AZK designed the data collection, edited the annotation guidelines, conducted the error analysis, and wrote the manuscript. AM performed the machine learning experiments, deployed the BERTweet classifier, and edited the manuscript. KO developed the annotation guidelines, annotated the Twitter data, and edited the manuscript. GGH conceptualized the study and edited the manuscript. The authors thank Ivan Flores for contributing to software applications, and Alexis Upshur and Aiden McRobbie-Johnson for contributing to annotating the Twitter data. This work was supported by the National Library of Medicine (grant number R01LM011176).

References

1. Matthews KA, Xu W, Gaglioti AH, Holt JB, Croft JB, Mack D, McGuire LC. Racial and ethnic estimates of Alzheimer’s disease and related dementias in the United States (2015-2060) in adults aged ≥65 years. Alzheimers Dement 2019;15(1):17–24.
2. Kochanek KD, Xu J, Arias E. Mortality in the United States, 2019. NCHS Data Brief 2020;395:1–8.
3. Kasper JD, Freedman VA, Spillman BC, Wolff JL. The disproportionate impact of dementia on family and unpaid caregiving to older adults. Health Aff (Millwood) 2015;34(10):1642–1649.
4. 2021 Alzheimer’s disease facts and figures. Alzheimers Dement 2021;17(3):327–406.
5. Fonareva I, Oken BS. Physiological and functional consequences of caregiving for relatives with dementia. Int Psychogeriatr 2014;26(5):725–747.
6. Gao C, Chapagain NY, Scullin MK. Sleep duration and sleep quality in caregivers of patients with dementia: a systematic review and meta-analysis. JAMA Netw Open 2019;2(8):e199891.
7. Dassel KB, Carr DC, Vitaliano P. Does caring for a spouse with dementia accelerate cognitive decline? Findings from the Health and Retirement Study. Gerontologist 2017;57(2):319–328.
8. Liu C, Fabius CD, Howard VJ, Haley WE, Roth DL. Change in social engagement among incident caregivers and controls: findings from the Caregiving Transitions Study. J Aging Health 2021;33(1-2):114–124.
9. Ma M, Dorstyn D, Ward L, Prentice S. Alzheimer’s disease and caregiving: a meta-analytic review comparing the mental health of primary carers to controls. Aging Ment Health 2018;22(11):1395–1405.
10. Sheehan OC, Haley WE, Howard VJ, Huang J, Rhodes JD, Roth DL. Stress, burden, and well-being in dementia and nondementia caregivers: insights from the Caregiving Transitions Study. Gerontologist 2021;61(5):670–679.
11. Zhu CW, Scarmeas N, Ornstein K, Albert M, Brandt J, Blacker D, Sano M, Stern Y. Health-care use and cost in dementia caregivers: longitudinal results from the Predictors Caregiver Study. Alzheimers Dement 2015;11(4):444–454.
12. Kelley AS, McGarry K, Bollens-Lund E, Rahman OK, Husain M, Ferreira KB, Skinner JS. Residential setting and the cumulative financial burden of dementia in the 7 years before death. J Am Geriatr Soc 2020;68(6):1319–1324.
13. Gaugler JE, Potter T, Pruinelli L. Partnering with caregivers. Clin Geriatr Med 2014;30(3):493–515.
14. Gitlin LN, Marx K, Stanley IH, Hodgson N. Translating evidence-based dementia caregiving interventions into practice: state-of-the-science and next steps. Gerontologist 2015;55(2):210–226.
15. Hopwood J, Walker N, McDonagh L, Rait G, Walters K, Iliffe S, Ross J, Davies N. Internet-based interventions aimed at supporting family caregivers of people with dementia: systematic review. J Med Internet Res 2018;20(6):e216.
16. Leng M, Zhao Y, Xiao H, Li C, Wang Z. Internet-based supportive interventions for family caregivers of people with dementia: systematic review and meta-analysis. J Med Internet Res 2020;22(9):e19468.
17. Yoon S. What can we learn about mental health needs from tweets mentioning dementia on World Alzheimer’s Day? J Am Psychiatr Nurses Assoc 2016;22(6):498–503.
18. Danilovich MK, Tsay J, Al-Bahrani R, Choudhary A, Agrawal A. #Alzheimer’s and dementia: expressions of memory loss on Twitter. Topics in Geriatric Rehabilitation 2018;34(1):48–53.
19. Cheng TY, Liu L, Woo BK. Analyzing Twitter as a platform for Alzheimer-related dementia awareness: thematic analyses of tweets. JMIR Aging 2018;1(2):e11542.
20. Yoon S, Lucero R, Mittelman MS, Luchsinger JA, Bakken S. Mining Twitter to inform the design of online interventions for Hispanic Alzheimer’s disease and related dementias caregivers. Hisp Health Care Int 2020;18(3):138–143.
21. Mehta N, Zhu L, Lam K, Stall NM, Savage R, Read SH, Wu W, Pop P, Faulkner C, Bronskill SE, Rochon PA. Health forums and Twitter for dementia research: opportunities and considerations. J Am Geriatr Soc 2020;68(12):2881–2889.
22. Bacsu JD, O’Connell ME, Cammer A, Azizi M, Grewal K, Poole L, Green S, Sivananthan S, Spiteri RJ. Using Twitter to understand the COVID-19 experiences of people with dementia: infodemiology study. J Med Internet Res 2021;23(2):e26254.
23. Yoon S, Broadwell P, Alcantara C, Davis N, Lee H, Bristol A, Tipiani D, Nho JY, Mittelman M. Analyzing topics and sentiments from Twitter to gain insights to refine interventions for family caregivers of persons with Alzheimer’s disease and related dementias (ADRD) during COVID-19 pandemic. Stud Health Technol Inform 2022;289:170–173.
24. Auxier B, Anderson M. Social media use in 2021. Pew Research Center. 2021 Apr 07. URL: https://www.pewresearch.org/internet/2021/04/07/social-media-use-in-2021/ [accessed 2022-02-25]
25. Devlin J, Chang M, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. 2019 Presented at: 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT); June 2-7, 2019; Minneapolis, MN p. 4171–4186.
26. Sanh V, Debut L, Chaumond J, Wolf T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. 2019 Presented at: 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing; December 13, 2019; Vancouver, Canada.
27. Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V. RoBERTa: a robustly optimized BERT pretraining approach. arXiv Preprint posted online on July 26, 2019.
28. Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020;36(4):1234–1240.
29. Alsentzer E, Murphy J, Boag W, Weng WH, Jindi D, Naumann T, McDermott M. Publicly available clinical BERT embeddings. 2019 Presented at: 2nd Clinical Natural Language Processing Workshop; June 7, 2019; Minneapolis, MN p. 72–78.
30. Nguyen DQ, Vu T, Nguyen AT. BERTweet: a pre-trained language model for English tweets. 2020 Presented at: Conference on Empirical Methods in Natural Language Processing: System Demonstrations; November 16, 2020; Online p. 9–14.
31. Dredze M, Paul MJ, Bergsma S, Tran H. Carmen: a Twitter geo-location system with applications to public health. 2013 Presented at: Association for the Advancement of Artificial Intelligence Workshop Expanding the Boundaries of Health Informatics Using AI; Jul 14–15, 2013; Bellevue, WA.