RT Journal Article SR Electronic T1 A classification model to predict specialty drug use JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2021.06.30.21259718 DO 10.1101/2021.06.30.21259718 A1 Ni, Xianglian A1 Fairless, Andrew A1 McCammon, Jasmine M. A1 Rahmanian, Farbod A1 Lavoie, Heather YR 2021 UL http://medrxiv.org/content/early/2021/07/05/2021.06.30.21259718.abstract AB Objective: Predicting who is likely to become a user of specialty drugs allows care managers to intervene early and payers to prepare financially for the upcoming spending. Our administrative claims-based model predicts which members might use specialty drugs. Materials and Methods: A national database* and commercial health plan claims data were used to select a total of 6.5 million people who were not taking any specialty drugs before the target prediction window. About 136,700 members in the study were older than 65. We extracted 81 features from medical claims history, pharmacy claims history, and demographic data to predict specialty drug use in the following year. Members were eligible if they had at least three months of continuous enrollment under either a medical or a pharmacy plan in the year immediately before the start of the target prediction window and had no history of specialty drug use. We trained and tuned an extreme gradient boosting binary classifier on 75% of the data. We used the remaining 25% of the data to predict the outcomes and evaluate performance. We also recorded performance for members older than 65. Results: In the study cohort, 3% of members used specialty drugs. The important features for prediction included age, monthly pharmacy payment, monthly medical payment, and disease-, procedure-, or drug-related codes. 
On the test data with members of all ages, the area under the receiver operating characteristic curve (AUROC) was 78.6%. On the test set of members older than 65 (prevalence rate 3.6%), the AUROC was 79.3%. Discussion: There is no similar machine learning model in the field to predict specialty drug use. Our model provides an unparalleled opportunity for early intervention for people who might develop diseases that require specialty drugs. It is also important for health plans and providers to know which members of their covered population might use specialty drugs and to predict the increased cost in the next year. Conclusion: A predictive model of specialty drug use can help both payers and providers prepare for a spending spike or intervene early. In turn, this helps to improve patients’ overall satisfaction. Competing Interest Statement: All authors are employees of Geneia LLC. Funding Statement: No external funding was received. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The Geneia Ethics Committee has determined based upon the information provided below that 'ethics review is not applicable'. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. 
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The data used for this manuscript are commercially licensed from the CCAE database and are not publicly available. The independent dataset contains sensitive personal health information and is therefore also not publicly available.
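
The evaluation protocol summarized in the abstract (a 75%/25% train/test split, scored by AUROC) can be illustrated with a minimal Python sketch. This is not the authors' code: it does not train the gradient boosting classifier, and the function names `split_75_25` and `auroc` are hypothetical names chosen for illustration. It only shows how a random 75/25 split and a rank-based AUROC (the Mann-Whitney formulation) might be computed with the standard library.

```python
import random


def split_75_25(rows, seed=0):
    """Shuffle rows and return (train, test) with a 75%/25% split.

    Illustrative only; the abstract does not say how the split was drawn.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic, with average ranks for ties.

    labels: 0/1 outcomes (1 = used a specialty drug in the target window).
    scores: predicted probabilities or any monotone risk score.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied scores starting at i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC is undefined unless both classes are present")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

With a low-prevalence outcome such as the 3% reported here, AUROC is a natural headline metric because it depends only on how positives are ranked relative to negatives, not on the class ratio. In practice the scores passed to `auroc` would come from the trained classifier's predictions on the held-out 25%.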