Abstract
Recently, image-based diagnostic technology has made encouraging and astonishing development. Modern medical care and imaging technology are increasingly inseparable. However, the current diagnosis pattern of Signal-to-Image-to-Knowledge inevitably leads to information distortion and noise introduction in the procedure of image reconstruction (Signal-to-Image). Artificial intelligence (AI) technologies that can mine knowledge from vast amounts of data offer opportunities to disrupt established workflows. In this prospective study, for the first time, we developed an AI-based Signal-to-Knowledge diagnostic scheme for lung nodule classification directly from the CT rawdata (the signal). We found that the rawdata achieved almost comparable performance with CT indicating that we can diagnose diseases without reconstructing images. Meanwhile, the introduction of rawdata could greatly promote the performance of CT, demonstrating that rawdata contains some diagnostic information that CT does not have. Our results break new ground and demonstrate the potential for direct Signal-to-Knowledge domain analysis.
Introduction
The discovery of X-rays in 1895 ushered in a new era in the use of imaging for medical diagnostic purposes. Since then, the non-invasive medical imaging technology subverts the palpation and cut-and-see scheme [1]. The technological advances in medical imaging have been astounding over the past 120 years, and modern medical care is increasingly inseparable from imaging technology. Medical imaging is essential for humans, to allow clinicians to observe from the images and diagnose diseases. This process can be defined as a path of image-to-knowledge. However, recently, it is found that the human ability has become a bottleneck in this path hindering the accurate diagnosis and treatment of diseases [2][3].
The emergence of Artificial Intelligence (AI) technology partially solves the problem of the limited ability of humans in the diagnosis process [4][5][6][7]. AI could automatically mine the radiographic patterns that related to the occurrence and progression of diseases from the imaging data, and it has been shown to match and even surpass human abilities in many clinical applications [8][9][10][11][12]. The essential reason why AI could surpass humans may be that AI treats images as data rather than the visual image and extracts huge amounts of features for analysis [13][14]. However, the medical image is compressed or filtered data to fit the human eye, which may be insufficient or imperfect for diagnosis. Take computed tomography (CT) for example, the CT system first collects rawdata (signal) from the patient, then the reconstruction method converts rawdata to images (signal-to-image) [15]. Therefore, both AI-based and human-based diagnosis are processes of signal-to-image-to-knowledge. Medical images suffer from information distortion in both acquisition and reconstruction processes. The current high sampling frequency greatly compresses the influence of factors such as motion artifacts in the acquisition process, so the main reason for the loss of resolution is concentrated in the operations such as interpolation and sub-optimal statistical weighting in the reconstruction process [16]. Meanwhile, the unprocessed data size of rawdata is about 10 to 20 times larger than that of CT images (2GB compare with 180MB). The huge amount of information inside the rawdata is not optimally mined in current signal-to-image-to-knowledge process, and how to analysis rawdata is of great scientific interest.
Skipping the image process and going directly from signal to knowledge, will hopefully bring new breakthroughs in disease diagnosis. Inspired by this idea, several previous studies have talked about the potential value of analysis of rawdata [17][18][19], directly from signal to knowledge. De Man Q et.al. conducted a simulation experiment to detect and estimate the vessel centerline from rawdata in the sinogram domain [19]. They achieved encouraging initial results showing the feasibility of rawdata analysis for clinical CT analysis tasks. We have also reported our simulation results about lung cancer at American Association for Cancer Research (AACR) conference [20]. However, there is no study about signal-to-knowledge analysis in real clinical tasks of patients.
In this prospective study, for the first time, we developed an AI-based signal-to-knowledge diagnostic scheme for lung nodule classification directly from the CT rawdata (The flowchart was shown in Fig. 1). The value of rawdata alone (Discussion), as well as its added value to CT, are studied on 276 patients. We found that the rawdata achieved almost comparable performance with CT indicating that we can diagnose diseases without reconstructed images. Meanwhile, the introduction of rawdata could greatly promote the performance of CT, demonstrating that rawdata contains some diagnostic information that CT does not have. This research breaks the routinely used circle of image-based diagnosis, which may open up a new pathway of signal-to-knowledge for disease diagnosis.
Results
Clinical characteristics
The clinical characteristics was summarized in Table S1. A total of 276 patients were included and the number of patients in the training cohort, validation cohort and test cohort were 166, 55 and 55, respectively. Fifty percent (n = 138) patients were female and the mean of age in the entire dataset was 58.48 years. Furthermore, there were 21 (8%) small cell carcinoma, 35 (13%) squamous cell carcinomas and 149 (54%) adenocarcinomas. With respect to lesion location, most patients were identified as right upper lobe (n = 89, 32%), followed by left lower lobe (n = 67, 24%) and left upper lobe (n = 64, 23%) in all patients. For lung cancer diagnosis, most patients (n = 225, 82%) were evaluated as malignancy.
Performance of CT model and rawdata gain model
This experiment explores the performance improvement that the residual fusion model (Methods) based on both rawdata and CT images can bring to the model based on CT images only. For further explore the repeatability and stability of this gain, we tested four different CT models (Abbreviation for CTM1∼CTM4; Methods) and adopted three backbone network architectures for rawdata feature extraction, namely Densenet121 (DN) [21], Resnet18 (RE) [22] and Resnext18 (RX) [23]. For each CT model (CTM), three rawdata gain models (RGM) based on different backbone feature extraction networks were constructed. The performance of each RGM was compared with the original CTM. The receiver operating characteristic curves (ROC) and its area under the curve (AUC) of four CTMs and the corresponding RGMs based on difference backbone networks is shown in Fig. 2.
For each CTM, the residual fusion models based on different backbone networks can obtain better classification performance on training, validation and test cohorts. For CTM1 model, the fusion model that produced the maximum performance improvement for the training cohort is RGM-RX1, and its AUC improvement can reach 0.051 (from 0.757 to 0.808). The fusion model that produced the maximum performance improvement for the validation cohort is RGM-RE1 and RGM-RX1, and its AUC improvement can reach 0.033 (from 0.756 to 0.789). The fusion model that produced the maximum performance improvement for the test cohort is RGM-RE1, and its AUC improvement can reach 0.046 (from 0.807 to 0.853). For CTM2 model, the fusion model that produced the maximum performance improvement for the training cohort is RGM-RX2, and its AUC improvement can reach 0.109 (from 0.745 to 0.854). The fusion model that produced the maximum performance improvement for the validation cohort is RGM-DM2, and its AUC improvement can reach 0.124 (from 0.698 to 0.822). The fusion model that produced the maximum performance improvement for the test cohort is RGM-DM2, and its AUC improvement can reach 0.022 (from 0.760 to 0.782). For CTM3 model, the fusion model that produced the maximum performance improvement for the training cohort is RGM-RX3, and its AUC improvement can reach 0.083 (from 0.765 to 0.848). The fusion model that produced the maximum performance improvement for the validation cohort is RGM-RX3, and its AUC improvement can reach 0.093 (from 0.760 to 0.853). The fusion model that produced the maximum performance improvement for the test cohort is RGM-RE3, and its AUC improvement can reach 0.027 (from 0.773 to 0.800). For CTM4 model, the fusion model that produced the maximum performance improvement for the training cohort is RGM-RX4, and its AUC improvement can reach 0.035 (from 0.832 to 0.867). The fusion model that produced the maximum performance improvement for the validation cohort is RGM-DM4, and its AUC improvement can reach 0.026 (from 0.756 to 0.782). The fusion model that produced the maximum performance improvement for the test cohort is RGM-RE4, and its AUC improvement can reach 0.034 (from 0.833 to 0.867). Overall, using Resnext18 as the backbone network of rawdata feature extraction can obtain the maximum average performance improvement on the three cohorts.
Image feature distribution of the RGMs and gain stability analysis
We performed t-SNE dimensionality reduction on all deep learning features obtained by different feature extraction networks and counted the true positives, false positives, true negatives and false negatives of each patient (Table 1). Besides, we assigned different colors and markers to visualize them in the same coordinate system (Fig. 3). It can be observed from the Fig. 3 that the results of the various RGMs within each CTM are relatively similar, even though they use different feature extraction networks. Table 1 also shows the same situation. The gain of the RGMS inside each CTM is approaching the same trend, such as improving malignant or benign detectable rates. Meanwhile, RGM-RE1, RGM-DN2, RGM-RE2, RGM-DN3, RGM-DN4, RGM-RE4, and RGM-RX4 could achieve a significant increase in the detectable rate of one category at the expense of a small number of the other category detectable rates.
Therefore, we calculated the optimization rate and error rate of each RGM for the CTM, and also calculated the proportion of at least 2 model optimizations to all optimization samples, which can reflect the stability of rawdata’s gain. All results were summarized in Table S2.
The results show that the analysis method incorporating the rawdata has a high optimization rate for the CTM 1∼3 and is greater than the error rate, which is also reflected in the improvement of AUC. In addition, although different feature extraction networks were used to analysis the rawdata, the proportion of at least two networks that can be optimized in each CTM is about 80%. Finally, we found that 7 samples were mispredicted within 4 CTMs. For these 7 samples, the income of rawdata can correct the prediction results of the 6 CTMs, and the corrected model exists in each CTM. In summary, the gain of the rawdata for the CTM is very stable.
Visual statistics and analysis of the RGMs
To better explain the prediction process of RGM, we visualized the region of most interest in the RGM by using Gradient-weighted Class Activation Mapping (Grad-CAM). The predictive results of RGM were most dependent on the information of the RGM-discovered suspicious areas. Fig. 4 illustrated the lesion masks and corresponding attention maps from different views of the rawdata. From Fig. 4, we can see that the RGMs can always focus on areas of lesion for prediction although the input data includes some non-lesion areas. We also calculated the average attention score of each voxel in lesion and non-lesion in rawdata (Method), and the result showed that the attention score of the lesion area was 1-2 times as high as the non-lesion area.
Stratified analysis of different malignant subgroups
The results of the subgroup analysis for age, sex and lesion size were shown in the Table S3 and Table 2.
In the subgroup with age of <= 60, RGM-RX 4 and CTM 4 achieved the similar highest model performance, with AUC of 0.837 (0.746-0.924) and 0.831 (0.749-0.904), respectively; In the subgroup with age of > 60, RGM-RX 4 achieved the highest model performance with an AUC of 0.845 (0.713-0.949), which outperformed the best CTM (CTM4 with an AUC of 0.790). In the male subgroup, CTM4 and RGM-RX 4 performed best, with a similar AUC of 0.804 (0.706-0.882) and 0.810 (0.707-0.897), respectively; In the female subgroup, RGM-RX 3 achieved the highest performance with an AUC of 0.885 (0.818-0.945), far exceeding the best CTM (CTM4 with an AUC of 0.823 (0.720-0.920)). In the subgroup with lesion size <=23mm, RGM-RX 3 achieved the highest model performance with an AUC of 0.847 (0.781-0.916), far exceeding the best CTM (CTM4 with an AUC of 0.806); In the subgroup of lesion size >23mm, CTM 4 and RGM-RE 4 showed the similar highest model performance, with AUC of 0.819 (0.719-0.906) and 0.833 (0.703-0.925), respectively. As for the lesion location subgroups, in addition to the similar performance of CTM 4 and RGM-RX 4 in the subgroup of superior lobe of left lung, RGM outperformed the CTM, with AUC of 0.840 vs. 0.812 in the subgroup of inferior lobe of left lung, 0.849 vs. 0.807 in the subgroup of superior lobe of right lung, and 0.872 vs. 0.843 in the subgroup of inferior lobe of right lung.
Discussion
In this prospective study, for the first time, we validated the potential value of rawdata in real clinical practice. Interestingly, the rawdata analysis showed comparable performance with CT images, which indicates that leveraging non-image information holds promise as an alternative to image-based methods. Moreover, the add value of rawdata to CT images was also confirmed in this study, which means that the combination of non-image and image data will further promote the advance of disease diagnosis. This study proposed and validated a feasible method for diagnosis without image reconstruction, and it has the potential to change existing imaging-based diagnosis and treatment strategies.
The classification of benign and malignant pulmonary nodules is a matter of great clinical concern[24][25][26]. This study explores the feasibility of rawdata analysis in classifying indeterminate lung nodules greater than 2 cm in size. The results indicated that rawdata can well discriminate malignant nodules from benign nodules. Meanwhile, the AUCs of the rawdata in the training cohort, validation cohort and test cohort are 0.768 (95%CI: 0.681∼0.851), 0.760 (95%CI: 0.558∼0.922) and 0.782 (95%CI: 0.592∼0.924), respectively (Extended Data Fig. 1), and there is no statistical difference between the performances of rawdata and CT, which means that the classification of lung nodules may not need image reconstruction and clinician participation. Think further, the rawdata model could be applied to the majority of grassroots hospitals, who have mainstream CT systems but lack technical personnel and clinicians.
Our study showed that the introduction of rawdata to CT had an overall improvement over different CTMs, no matter which backbone network was used. This indicates that rawdata has unique information which may be lost during the reconstruction processing. Moreover, the compared with CTM, RGMs showed better stability on the training cohort, validation cohort, and test cohort. The combination of both non-image and image data could make the model robust. In addition, we also performed intra-CT and inter-CT analyses. For intra-CTM, fused rawdata prediction has a higher optimization rate than the error rate which shows a similar gain trend, and about 80% of the optimized patients appear in at least 2 feature extraction networks. For inter-CTMs, eighty-five percent of the patients that all CTMs predicted incorrectly have optimizable RGMs within each CTM. The results also proved that the gain of rawdata is stable across different convolutional networks and different CTM approaches. Therefore, exploring the drawbacks of post-reconstructed CT image analysis and developing models for direct diagnosis from rawdata are the keys to future research. The results of the subgroup analysis showed that the RGMs performed better than the CTM in most subgroups, especially in the subgroup of older, female, and smaller lesion size, indicating that the rawdata could provide more valuable information that brings model gains in subgroups, while this information may have been lost in the process of CT reconstruction.
Our study has some limitations. First, this study involved a small number of patients and the proportion of positive and negative samples is unbalanced. Further study on large-scale multicenter datasets should be performed. Second, only patients with single nodule are included in this study, further validation of our method on patients with multiple nodules should be further studied. Third, although the rawdata had a comparable performance with CT, it still had a certain gap with the best CT diagnosis. There is an urgent need to develop novel AI methods specifically for rawdata.
Meanwhile, strong computing power is a problem that cannot be ignored when calculating rawdata. It is not realistic to read the complete high-frequency scanning data directly to the computing device. Designing appropriate pre -processing algorithms and building deep networks in combination with characteristics of rawdata are the potential breakthrough points in the future. Finally, the CT scan scheme is designed for image reconstruction and it may be not suitable for rawdata analysis. Therefore, novel scan strategies, e.g., scanning for specific diagnostic purposes, should be developed to maximize the gain of rawdata.
In conclusion, for the first time, we validated the potential value of rawdata in real clinical practice. The rawdata analysis showed comparable performance with CT images, which indicates that leveraging non-image information holds promise as an alternative to image-based methods. Moreover, the added value of rawdata to CT images was also confirmed in this study, which means that the combination of non-image and image will further promote the advance of disease diagnosis. This study proposed and validated a new feasible direction for diagnosis without image reconstruction, and it may facilitate the development of fully automated scanning and diagnostic processes.
Methods
Patients
In this prospective study, 626 consecutive patients who had a chest CT scan in the First Hospital of Jilin University from November 2019 to May 2021 were recruited. Eligible patients were included according to the following inclusion criteria: (i) patients who had a pulmonary lesion more than 2 cm with contrast enhanced chest CT scan, (ii) rawdata obtained from CT machine after the imaging examination, (iii)pathological diagnosis of pulmonary lesion with two weeks interval from CT scan. Patients were excluded on the basis of the following: (i) previous systemic antineoplastic treatments, (ii) CT images with poor image quality or unreadable scan. After exclusion, a total of 276 patients were included for modeling experiments.
The methods were performed in accordance with Standards for Reporting Diagnostic accuracy studies (STARD) and approved by the Ethics Committee of the First Hospital of Jilin University (AF-IRB-032-05).
Collection of CT image and rawdata
Both the CT images and rawdata were consecutively collected from the First Hospital of Ji Lin University and were acquired with a NeuViz Prime CT system (Neusoft Medical Systems Co., Ltd., Shenyang, China). The system parameters of CT scanner included source-to-isocenter distance of 570 mm, source-to-detector distance of 1040mm, and scanning FOV of 500 mm. The imaging protocol included contrast-enhanced CT of the chest with variable imaging parameters. Contrast-enhanced CT scans were performed at a spiral scan mode using 324 mA tube current, 100 kVp tube voltage, 0.5 s ration time, and 0.9 spiral pitch. CT rawdata were reconstructed using a kernel F20 at slice thickness of 1.0 mm with image pixel range from 0.59 mm to 0.98 mm and image matrix of 512 by 512. In addition, we also acquired the initial height and initial view angle of the CT detector each time the patient underwent a scan. Finally, CT images and rawdata from each scanner were randomly stratified into one of three cohorts in a 6:2:2 ratio: a training cohort, a validation cohort and a test cohort. All in all, Table S4 describes CT scanner information, system parameters and imaging parameters.
lesion segmentations in CT images
The segmentations of primary lesion were manually delineated across all the sections in the axial view using annotation tool in ISD (IntelliSpace Discovery, Philips, German). The regions of interest were annotated and reviewed by four radiologists with 8 to 25 years’ chest CT experience. All radiologists were blinded to any clinical or histopathologic information. The annotation was labeled as five common categories according lesions’ pulmonary lobe.
Realization of typical CT models
There are many studies on benign-malignant lung nodule classification on chest CT. We selected four typical papers from the major journals, including IEEE Transactions on medical imaging, Medical image analysis, and Nature medicine, which refer to multi-scale ensemble method (CTM1) [27], global and local information fusion method (CTM2) [26], loss function-based method (CTM3) [28], and multi-view fusion (CTM4) method [29]. We further performed experiments on four typical models with our dataset, and all the realization details are described in Supplementary 1.
Extraction of lesion region from raw data
After acquiring 4 CTMs, we proceeded to perform rawdata gain experiments. The first step of the experiment is to select projection surface containing lesions in the rawdata. The rawdata of CT scans contains three dimensions: 1) the index dimension representing the acquisition order; 2) the projection surface which is the detector receives the x-ray attenuation, where the channel and row directions are defined as x and y, respectively. All lesion segmentation regions of rawdata were derived from the binarized segmentation of CT image after being represented in a unified coordinate system. The complete derivation can be condensed into three steps: orientation, querying and mapping.
1) Orientation
On the derivation of localization, we took the segmented regions in the CT image as the research object, and the localization calculation mainly includes cross-sectional localization and height localization. For cross-sectional positioning, we set the center point between the CT source and detector as the coordinate origin (which is also the rotation center of the CT gantry), parallel to the cross-section of the CT image. Next, the motion trajectory can be characterized by the scan index t, the rotation radius r and the angle θ. In order to obtain the above parameters, we first read the origin coordinates (xorigin, yorigin and zorigin), and the offset values (xoffset and yoffset) can be calculated through voxel spacing and image size (xsize and ysize). Next, with the help of the offset values and the voxel coordinates xCT and yCT in the CT image, the distance from the coordinate origin (xlength and ylength) can be calculated as: Otherwise, the radius r, and the starting angle θ0 can be obtained. Meanwhile, by introducing the scanning period of the machine, we got the angle change Δθ with the following relationship: For height positioning, we directly obtained the initial height hstart of the voxel through the coordinate z, slice thickness and origin of the voxel point in CT images.
2) Querying
Our purpose in this step is to determine the interval of index dimension t in which tumor voxel appears in the raw data. Since there is a cone beam in the projection, we first calculated the change function harea of the voxel, where d is the distance from the voxel to the X-ray focal spot on the x-axis; l is the distance from the focal spot to the detector. The number of detector rows is ny ; the channel spacing along y-axis is Δny. Next, we determined the index (t) range of the voxel in the rawdata by the following inequality, where h0 is the initial height at which the detector start to scan; Δh is the height change in a scan. To reduce computational complexity, we first extracted the highest and lowest masks in the segmentation images, and calculated the start and end indices of two voxels. Then, we initially located the range of index dimensions. Within this interval, we computed the mapping result of voxels within the layer.
3) Mapping
Through the above calculation, we have obtained the index interval corresponding to the voxel, then the voxel appearing in index is obtained by calculating the projection data of the index layer by layer. The coordinates of each voxel on the projection surface are defined as xraw and yraw, respectively. Yraw is related to the height ht, harea at the t index and the number of detector rows ny in the detector, so we determined its height difference relative to the detector by the following formula, and then calculated its coordinates in the projection. Since the x-axis of the projection plane is equiangularly sampled, xraw can be acquired through the angle at the t index θt, the view angle θd of the detector, and the number of channels in the detector nx. After obtaining the segmentation files of lesions in the raw data, we saved the raw data segment through the initial index interval, and used this as the training data for this gain experiment. It should be added that there are different directions in the actual retrieval of raw data (From head to foot or foot to head). We used the same spatial relationship to modify the inequality for different directions and then located the lesion.
Construction of RGM based on CT images
To explore whether the rawdata contained unique information, we built residual fusion models through the rawdata and fused it with CTMs’ output to determine whether the rawdata could bring benefits. First, we built three feature extraction networks using the rawdata. Based on the memory need of calculation, we sampled the index dimension of rawdata fragments containing lesions to one-eighth, and the same size was resampling based on the average value by equal interval sampling. For the Channel dimension, we directly removed the data outside the reconstruction area from both sides, and resampling with the row dimension into half of the size. For model building, we did not modify Densenet121 [21], Resnet18 [22] and Resnext18 [23] in 3D with the purpose of directing the direct gain of nude data as much as possible. The training settings and parameters are detailed in Supplementary 2.
The core of the residual fusion model is to obtain the correction of the CT model output, and the origin of the idea is that the learning residual is easier which is mentioned in Resnet. The probabilities of predicting the patients as positive by the CTMs were fused with the predicted probabilities of the rawdata models. This fusion is performed during the training process. Specifically, the probability of predicting one patient as positive was calculated as: The probability of predicting one patient as negative was calculated as: After the output fusion of the CTM and rawdata model, the loss function was used to calculate the loss and optimize the model. The three feature extraction networks built with rawdata were fused with the four representative CTMs described above to obtain four raw gain models respectively, which were: raw gain model-Densenet121 (RGM-DN 1/2/3/4), raw gain model-Resnet18 (RGM-RE 1/2/3/4), raw gain model-Resnext18 (RGM-RX 1/2/3/4). Therefore, we obtain 12 raw gain models. Then the RGMs were compared with the CTMs to evaluate the benefits of the rawdata.
The calculation of the average attention score
For calculating the average attention score of each voxel, we first used the segmentation data of lesion in rawdata to obtain non-lesion area by unary complement. Next, we dotted and summed the segmentation data of lesion and non-lesion areas with the attention matrix. Finally, the average attention score was obtained by dividing the total amount of attention in the two areas by the number of voxels in the segmented regions, respectively. It should be noted that we also normalized the average attention score of the lesion area and the non-lesion area in each rawdata, so as to obtain a more intuitive comparison result.
Data Availability
The image features and output values associated with the CTMs and the RGMs are stored on GitHub (https://github.com/CASIAMI/rawdata_gain). The original data that support the findings of this study are available from the corresponding author upon reasonable request.
Data availability
The image features and output values associated with the CTMs and the RGMs are stored on GitHub (https://github.com/CASIAMI/rawdata_gain). The original data that support the findings of this study are available from the corresponding author upon reasonable request.
Code availability
Source code for related CT and raw data methods can be found from GitHub (https://github.com/CASIAMI/rawdata_gain).
Author contributions
B.X.H., Y.B.Z., C.X.S., T.T., K.S. and H.L.L. developed the network architecture and data/modeling infrastructure, training and testing setup. B.X.H., Y.B.Z., C.X.S., T.T., K.S. and H.L.L. wrote the methods. B.X.H., C.X.S. and T.T. created the figures. B.X.H. and M.J.F. performed statistical analysis. J.T., D.D., Z.Y.L., K.W., Z.Q.W., W.M., S.W., Z.C.T., S.T.Z., J.W.W. and L.Z.S. advised on the modeling techniques. L.X.T., L.W.W., S.P.Z. and J.T.L. provided raw data structure information. D.D., B.X.H., Y.B.Z., C.X.S., T.T., K.S. H.L.L., L.W.W. and Y.G. wrote the manuscript. H.M.Z., Y.G., M.W., F.Y.M., L.D., L.L.Z. and S.W. provided clinical expertise and guidance on the study design. H.M.Z., Y.G., M.W., F.Y.M. and L.D. created the clinical datasets, interpreted the data and defined the clinical labels. L.X.T., W.L.W., and F.H. created the rawdata sets. J.T., D.D., F.H. and H.M.Z. initiated the project and provided guidance on the concept and design. J.T., F.H. and H.M.Z. supervised the project.
Competing interests
The authors declare that they have no competing interests.
Figures and Tables
Supplement 1. Implementation and optimization details of typical CT models
Considering the differences in experimental design between our study and the typical models, we need to make appropriate modifications to make the typical models have the best performances on our dataset. The difference is mainly reflected in data volume and data imbalance. The amount of data in previous articles ranged from 1018 to more than 20,000, much larger than that in this experiment (276). Therefore, we applied small data learning related techniques (including pre-trained model and sharing network weight) for four typical CT models. Hence, in our study, a pre-trained 3D-resnet18 was used for 3D input, which was pre-trained using eight medical datasets [30], a pre-trained 2D-resnet50 was used for 2D input. To train the subnets effectively using our small dataset, only the last layer (layer 4) and full connected network were trained in this study. In addition, for data imbalance, we used resampling strategy during training.
Specific to each of these models, we describe them in detail below. CTM1 was a multi-scale ensemble model, which ensembled three subnets whose networks were all the same, except for the input size. The input data were cropped from CT images using three different sizes, 32×32×32, 48×48×48, and 64×64×64 pixels. The outputs of three subnets were weighted to obtain the final ensemble results, and a grid search was used to tune the weight values. In addition, AUC loss proposed in original paper was used for dealing with data imbalance. CTM2 combined the features of both the entire CT volume and the region of interest cropped from CT volume, so that all predictions relied on both nodule-level local information and global context from the entire CT volume. In addition, CTM2 was trained with focal loss to mitigate the data imbalance. CTM3 designed a deep network with a margin ranking loss to enhance the discrimination capability on ambiguous nodule cases. In our study, 3D input was used to preserve the spatial information of pulmonary nodule. CTM4 was a multi-view deep network, which ensembled the outputs of nine 2D-view subnets to characterize the 3D nodule. Each subnet combined three types of image patches, and each patch was input into a pre-trained ResNet-50 network. In this study, all networks shared weights. Finally, nine subnets were used jointly to classify nodules with an average weighting scheme. For model training, the batch size was set as 32. The cross-entropy loss was used as the loss function, and the SGD optimizer was applied. The start learning rates were set as 0.001, and the models were trained for 100 epochs.
Supplement 2. Model training and statistical analysis methods
For model building, the batch size was set as 64. The Cross-Entropy loss was used as the loss function, and the Adam optimizer was applied. The start learning rates were set as 1 × 10−5, 1 × 10−6 separately, and the learning rate decay was set as 0.001. Meanwhile, we also added weight attenuation. The weight decay was set as 0.01, 0.05, and 0.1 separately. The model was trained for 50 epoches and select the best parameters with the lowest loss of validation cohort. The three RGMs built based on the three structures were all trained with the above setting.
For statistical analysis, discrete variables and continuous variables are calculated in the chi-squared test and Man-Whitney U test, respectively. All models are implemented in Python 3.7.3 (https://www.python.org/) with Numpy (version ≥ 1.16.4), SciPy (version ≥ 1.3.0), Matplotlib (version ≥ 3.1.1), Scikit-Learn (version ≥ 1.10.1), Statsmodels (version ≥ 0.12.2) and Pandas (version ≥ 0.25.0). All models were trained in python package named Pytorch (version ≥ 1.10.1; https://pytorch.org/) with 4 Graphics Processing Units of NVIDIA TITAN RTX (24G).
Acknowledgements
This work was supported by the National Key R&D Program of China (2017YFA0205200), National Natural Science Foundation of China (82022036, 91959130, 81971776, 62027901, 81930053, 81771924), the Beijing Natural Science Foundation (Z20J00105), Strategic Priority Research Program of Chinese Academy of Sciences (XDB38040200), Chinese Academy of Sciences under Grant No. GJJSTD20170004 and QYZDJ-SSW-JSC005, the Project of High-Level Talents Team Introduction in Zhuhai City (Zhuhai HLHPTP201703), the Youth Innovation Promotion Association CAS (Y2021049) and the China Postdoctoral Science Foundation (2021M700341). The authors would like to acknowledge the instrumental and technical support of Multi-modal biomedical imaging experimental platform, Institute of Automation, Chinese Academy of Sciences.