Abstract
Background This study aimed to investigate model-based reasoning (MBR) algorithms for diagnosis in integrative medicine based on electronic medical records (EMRs) and natural language processing.
Methods A total of 14,075 medical records of clinical cases were extracted from the EMRs as the development dataset, and an external test dataset of 1,000 medical records of clinical cases was extracted from independent EMRs. MBR methods based on word embedding, machine learning, and deep learning algorithms were developed for the automatic diagnosis of syndrome patterns in integrative medicine. MBR algorithms combined with rule-based reasoning (RBR) were also developed. Standard evaluation metrics (accuracy, precision, recall, and F1 score) were used to estimate the performance of the methods. Association analyses were conducted between the performance of the best algorithms and the sample size, the number of syndrome pattern types, and the diagnosis of lung diseases.
Results The Word2Vec CNN MBR algorithms showed high performance in syndrome pattern diagnosis (accuracy of 0.9586 in the test dataset). The Word2Vec CNN MBR combined with RBR also showed high performance (accuracy of 0.9229 in the test dataset). The diagnosis of lung diseases could enhance the performance of the Word2Vec CNN MBR algorithms. The sample size of each group and the syndrome pattern type also affected the performance of these algorithms.
Conclusion The MBR methods based on Word2Vec and CNN showed high performance in syndrome pattern diagnosis in integrative medicine for lung diseases. The sample size of each group, the syndrome pattern type, and the diagnosis of lung diseases were associated with the performance of the methods.
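As an illustration of the four evaluation metrics named in the Methods, the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the function name `classification_metrics`, the example labels, and the choice of macro-averaging across syndrome pattern classes are all assumptions, since the abstract does not state how precision, recall, and F1 score were averaged.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption made for this sketch; the source
    does not say which averaging scheme was used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class received a false positive
            fn[truth] += 1   # true class was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n_labels = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_labels,
        "recall": sum(recalls) / n_labels,
        "f1": sum(f1s) / n_labels,
    }


# Hypothetical syndrome pattern labels, for illustration only.
example_true = ["pattern_A", "pattern_A", "pattern_B", "pattern_B"]
example_pred = ["pattern_A", "pattern_B", "pattern_B", "pattern_B"]
example_metrics = classification_metrics(example_true, example_pred)
```

With many syndrome pattern types of unequal size, macro-averaged scores can differ noticeably from accuracy, which is one reason to report all four metrics rather than accuracy alone.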
Competing Interest Statement
The authors have declared no competing interest.
Clinical Trial
NCT03274908
Funding Statement
Grants from the Institutes of Integrative Medicine of Fudan University. ClinicalTrials.gov Identifier: NCT02461472; and China Postdoctoral Science Foundation funded project (2017M611461).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
N/A
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Funding sources: grants from the Institutes of Integrative Medicine of Fudan University. ClinicalTrials.gov Identifier: NCT03274908; and China Postdoctoral Science Foundation funded project (2017M611461).
Authors' emails:
W.G: drug{at}fudan.edu.cn
X.Q: qinxuanfeng777{at}163.com
Z.W: shwzhuo{at}cn.ibm.cn or flezze{at}163.com
Q.K: kq2016829{at}163.com
Z.T: dr_zhtang{at}yeah.net
L.J: jianglinhappy{at}126.com
Data Availability
The datasets generated and/or analyzed during the current study are not publicly available because they contain private information, but they are available from the corresponding author on reasonable request. The datasets are from the study whose authors may be contacted at the Center of Bioinformatics and Biostatistics, Institutes of Integrative Medicine, Fudan University. The external test dataset and an example of the development dataset are available at https://github.com/zihuitang/clincial_decision_support_system_im.
Abbreviations
- ANN: Artificial neural network
- CI: Confidence interval
- CNN: Convolutional neural network
- EMRs: Electronic medical records
- XGBoost: Extreme gradient boosting
- KNN: K-nearest neighbor
- MBR: Model-based reasoning
- MLP: Multilayer perceptron
- NLP: Natural language processing
- RF: Random forest
- RBR: Rule-based reasoning
- SVM: Support vector machines
- TCM: Traditional Chinese medicine