From signal to knowledge: The diagnostic value of rawdata in artificial intelligence prediction of human data for the first time

Bingxi He; Yu Guo; Yongbei Zhu; Lixia Tong; Boyu Kong; Kun Wang; Caixia Sun; Hailin Li; Feng Huang; Liwei Wu; Meng Wang; Fanyang Meng; Le Dou; Kai Sun; Tong Tong; Zhenyu Liu; Ziqi Wei; Wei Mu; Shuo Wang; Zhenchao Tang; Shuaitong Zhang; Jingwei Wei; Lizhi Shao; Mengjie Fang; Juntao Li; Shouping Zhu; Lili Zhou; Shuo Wang; Di Dong; Huimao Zhang; Jie Tian

doi:10.1101/2022.08.01.22278299

Abstract

Recently, image-based diagnostic technology has made encouraging and astonishing development. Modern medical care and imaging technology are increasingly inseparable. However, the current diagnosis pattern of Signal-to-Image-to-Knowledge inevitably leads to information distortion and noise introduction in the procedure of image reconstruction (Signal-to-Image). Artificial intelligence (AI) technologies that can mine knowledge from vast amounts of data offer opportunities to disrupt established workflows. In this prospective study, for the first time, we developed an AI-based Signal-to-Knowledge diagnostic scheme for lung nodule classification directly from the CT rawdata (the signal). We found that the rawdata achieved almost comparable performance with CT indicating that we can diagnose diseases without reconstructing images. Meanwhile, the introduction of rawdata could greatly promote the performance of CT, demonstrating that rawdata contains some diagnostic information that CT does not have. Our results break new ground and demonstrate the potential for direct Signal-to-Knowledge domain analysis.

Introduction

The discovery of X-rays in 1895 ushered in a new era in the use of imaging for medical diagnostic purposes. Since then, the non-invasive medical imaging technology subverts the palpation and cut-and-see scheme [1]. The technological advances in medical imaging have been astounding over the past 120 years, and modern medical care is increasingly inseparable from imaging technology. Medical imaging is essential for humans, to allow clinicians to observe from the images and diagnose diseases. This process can be defined as a path of image-to-knowledge. However, recently, it is found that the human ability has become a bottleneck in this path hindering the accurate diagnosis and treatment of diseases [2][3].

The emergence of Artificial Intelligence (AI) technology partially solves the problem of the limited ability of humans in the diagnosis process [4][5][6][7]. AI could automatically mine the radiographic patterns that related to the occurrence and progression of diseases from the imaging data, and it has been shown to match and even surpass human abilities in many clinical applications [8][9][10][11][12]. The essential reason why AI could surpass humans may be that AI treats images as data rather than the visual image and extracts huge amounts of features for analysis [13][14]. However, the medical image is compressed or filtered data to fit the human eye, which may be insufficient or imperfect for diagnosis. Take computed tomography (CT) for example, the CT system first collects rawdata (signal) from the patient, then the reconstruction method converts rawdata to images (signal-to-image) [15]. Therefore, both AI-based and human-based diagnosis are processes of signal-to-image-to-knowledge. Medical images suffer from information distortion in both acquisition and reconstruction processes. The current high sampling frequency greatly compresses the influence of factors such as motion artifacts in the acquisition process, so the main reason for the loss of resolution is concentrated in the operations such as interpolation and sub-optimal statistical weighting in the reconstruction process [16]. Meanwhile, the unprocessed data size of rawdata is about 10 to 20 times larger than that of CT images (2GB compare with 180MB). The huge amount of information inside the rawdata is not optimally mined in current signal-to-image-to-knowledge process, and how to analysis rawdata is of great scientific interest.

Skipping the image process and going directly from signal to knowledge, will hopefully bring new breakthroughs in disease diagnosis. Inspired by this idea, several previous studies have talked about the potential value of analysis of rawdata [17][18][19], directly from signal to knowledge. De Man Q et.al. conducted a simulation experiment to detect and estimate the vessel centerline from rawdata in the sinogram domain [19]. They achieved encouraging initial results showing the feasibility of rawdata analysis for clinical CT analysis tasks. We have also reported our simulation results about lung cancer at American Association for Cancer Research (AACR) conference [20]. However, there is no study about signal-to-knowledge analysis in real clinical tasks of patients.

In this prospective study, for the first time, we developed an AI-based signal-to-knowledge diagnostic scheme for lung nodule classification directly from the CT rawdata (The flowchart was shown in Fig. 1). The value of rawdata alone (Discussion), as well as its added value to CT, are studied on 276 patients. We found that the rawdata achieved almost comparable performance with CT indicating that we can diagnose diseases without reconstructed images. Meanwhile, the introduction of rawdata could greatly promote the performance of CT, demonstrating that rawdata contains some diagnostic information that CT does not have. This research breaks the routinely used circle of image-based diagnosis, which may open up a new pathway of signal-to-knowledge for disease diagnosis.

Figure 1. Flow chart of rawdata gain experiment.

Results

Clinical characteristics

The clinical characteristics was summarized in Table S1. A total of 276 patients were included and the number of patients in the training cohort, validation cohort and test cohort were 166, 55 and 55, respectively. Fifty percent (n = 138) patients were female and the mean of age in the entire dataset was 58.48 years. Furthermore, there were 21 (8%) small cell carcinoma, 35 (13%) squamous cell carcinomas and 149 (54%) adenocarcinomas. With respect to lesion location, most patients were identified as right upper lobe (n = 89, 32%), followed by left lower lobe (n = 67, 24%) and left upper lobe (n = 64, 23%) in all patients. For lung cancer diagnosis, most patients (n = 225, 82%) were evaluated as malignancy.

Performance of CT model and rawdata gain model

This experiment explores the performance improvement that the residual fusion model (Methods) based on both rawdata and CT images can bring to the model based on CT images only. For further explore the repeatability and stability of this gain, we tested four different CT models (Abbreviation for CTM1∼CTM4; Methods) and adopted three backbone network architectures for rawdata feature extraction, namely Densenet121 (DN) [21], Resnet18 (RE) [22] and Resnext18 (RX) [23]. For each CT model (CTM), three rawdata gain models (RGM) based on different backbone feature extraction networks were constructed. The performance of each RGM was compared with the original CTM. The receiver operating characteristic curves (ROC) and its area under the curve (AUC) of four CTMs and the corresponding RGMs based on difference backbone networks is shown in Fig. 2.

Figure 2. CT model and raw data gain results.

(A) shows the ROC curves of each CTM and residual fusion model based on different backbone feature extraction networks; (B) Bar graph of reaction AUC gain. DM, Densenet121; RE, Resnet18, RX, Renext18.

For each CTM, the residual fusion models based on different backbone networks can obtain better classification performance on training, validation and test cohorts. For CTM1 model, the fusion model that produced the maximum performance improvement for the training cohort is RGM-RX1, and its AUC improvement can reach 0.051 (from 0.757 to 0.808). The fusion model that produced the maximum performance improvement for the validation cohort is RGM-RE1 and RGM-RX1, and its AUC improvement can reach 0.033 (from 0.756 to 0.789). The fusion model that produced the maximum performance improvement for the test cohort is RGM-RE1, and its AUC improvement can reach 0.046 (from 0.807 to 0.853). For CTM2 model, the fusion model that produced the maximum performance improvement for the training cohort is RGM-RX2, and its AUC improvement can reach 0.109 (from 0.745 to 0.854). The fusion model that produced the maximum performance improvement for the validation cohort is RGM-DM2, and its AUC improvement can reach 0.124 (from 0.698 to 0.822). The fusion model that produced the maximum performance improvement for the test cohort is RGM-DM2, and its AUC improvement can reach 0.022 (from 0.760 to 0.782). For CTM3 model, the fusion model that produced the maximum performance improvement for the training cohort is RGM-RX3, and its AUC improvement can reach 0.083 (from 0.765 to 0.848). The fusion model that produced the maximum performance improvement for the validation cohort is RGM-RX3, and its AUC improvement can reach 0.093 (from 0.760 to 0.853). The fusion model that produced the maximum performance improvement for the test cohort is RGM-RE3, and its AUC improvement can reach 0.027 (from 0.773 to 0.800). For CTM4 model, the fusion model that produced the maximum performance improvement for the training cohort is RGM-RX4, and its AUC improvement can reach 0.035 (from 0.832 to 0.867). The fusion model that produced the maximum performance improvement for the validation cohort is RGM-DM4, and its AUC improvement can reach 0.026 (from 0.756 to 0.782). The fusion model that produced the maximum performance improvement for the test cohort is RGM-RE4, and its AUC improvement can reach 0.034 (from 0.833 to 0.867). Overall, using Resnext18 as the backbone network of rawdata feature extraction can obtain the maximum average performance improvement on the three cohorts.

Image feature distribution of the RGMs and gain stability analysis

We performed t-SNE dimensionality reduction on all deep learning features obtained by different feature extraction networks and counted the true positives, false positives, true negatives and false negatives of each patient (Table 1). Besides, we assigned different colors and markers to visualize them in the same coordinate system (Fig. 3). It can be observed from the Fig. 3 that the results of the various RGMs within each CTM are relatively similar, even though they use different feature extraction networks. Table 1 also shows the same situation. The gain of the RGMS inside each CTM is approaching the same trend, such as improving malignant or benign detectable rates. Meanwhile, RGM-RE1, RGM-DN2, RGM-RE2, RGM-DN3, RGM-DN4, RGM-RE4, and RGM-RX4 could achieve a significant increase in the detectable rate of one category at the expense of a small number of the other category detectable rates.

View this table:

Table 1

Detailed result statistics of raw data for CT models

Figure 3. Feature distribution of the RGMs and the heatmap of prediction results.

(A) shows the feature distribution in the RGM, and different marks and colors reflect different authenticity and prediction categories. (B) shows the prediction results of all real categories, CTMs and RGMs.

Therefore, we calculated the optimization rate and error rate of each RGM for the CTM, and also calculated the proportion of at least 2 model optimizations to all optimization samples, which can reflect the stability of rawdata’s gain. All results were summarized in Table S2.

The results show that the analysis method incorporating the rawdata has a high optimization rate for the CTM 1∼3 and is greater than the error rate, which is also reflected in the improvement of AUC. In addition, although different feature extraction networks were used to analysis the rawdata, the proportion of at least two networks that can be optimized in each CTM is about 80%. Finally, we found that 7 samples were mispredicted within 4 CTMs. For these 7 samples, the income of rawdata can correct the prediction results of the 6 CTMs, and the corrected model exists in each CTM. In summary, the gain of the rawdata for the CTM is very stable.

Visual statistics and analysis of the RGMs

To better explain the prediction process of RGM, we visualized the region of most interest in the RGM by using Gradient-weighted Class Activation Mapping (Grad-CAM). The predictive results of RGM were most dependent on the information of the RGM-discovered suspicious areas. Fig. 4 illustrated the lesion masks and corresponding attention maps from different views of the rawdata. From Fig. 4, we can see that the RGMs can always focus on areas of lesion for prediction although the input data includes some non-lesion areas. We also calculated the average attention score of each voxel in lesion and non-lesion in rawdata (Method), and the result showed that the attention score of the lesion area was 1-2 times as high as the non-lesion area.

Figure 4. Lesion trajectory in rawdata and the Grad-Cam graphs of the RGMs.

Stratified analysis of different malignant subgroups

The results of the subgroup analysis for age, sex and lesion size were shown in the Table S3 and Table 2.

View this table:

Table 2.

The performance of 12 raw gain models in subgroup analysis. S

In the subgroup with age of <= 60, RGM-RX 4 and CTM 4 achieved the similar highest model performance, with AUC of 0.837 (0.746-0.924) and 0.831 (0.749-0.904), respectively; In the subgroup with age of > 60, RGM-RX 4 achieved the highest model performance with an AUC of 0.845 (0.713-0.949), which outperformed the best CTM (CTM4 with an AUC of 0.790). In the male subgroup, CTM4 and RGM-RX 4 performed best, with a similar AUC of 0.804 (0.706-0.882) and 0.810 (0.707-0.897), respectively; In the female subgroup, RGM-RX 3 achieved the highest performance with an AUC of 0.885 (0.818-0.945), far exceeding the best CTM (CTM4 with an AUC of 0.823 (0.720-0.920)). In the subgroup with lesion size <=23mm, RGM-RX 3 achieved the highest model performance with an AUC of 0.847 (0.781-0.916), far exceeding the best CTM (CTM4 with an AUC of 0.806); In the subgroup of lesion size >23mm, CTM 4 and RGM-RE 4 showed the similar highest model performance, with AUC of 0.819 (0.719-0.906) and 0.833 (0.703-0.925), respectively. As for the lesion location subgroups, in addition to the similar performance of CTM 4 and RGM-RX 4 in the subgroup of superior lobe of left lung, RGM outperformed the CTM, with AUC of 0.840 vs. 0.812 in the subgroup of inferior lobe of left lung, 0.849 vs. 0.807 in the subgroup of superior lobe of right lung, and 0.872 vs. 0.843 in the subgroup of inferior lobe of right lung.

Discussion

In this prospective study, for the first time, we validated the potential value of rawdata in real clinical practice. Interestingly, the rawdata analysis showed comparable performance with CT images, which indicates that leveraging non-image information holds promise as an alternative to image-based methods. Moreover, the add value of rawdata to CT images was also confirmed in this study, which means that the combination of non-image and image data will further promote the advance of disease diagnosis. This study proposed and validated a feasible method for diagnosis without image reconstruction, and it has the potential to change existing imaging-based diagnosis and treatment strategies.

The classification of benign and malignant pulmonary nodules is a matter of great clinical concern[24][25][26]. This study explores the feasibility of rawdata analysis in classifying indeterminate lung nodules greater than 2 cm in size. The results indicated that rawdata can well discriminate malignant nodules from benign nodules. Meanwhile, the AUCs of the rawdata in the training cohort, validation cohort and test cohort are 0.768 (95%CI: 0.681∼0.851), 0.760 (95%CI: 0.558∼0.922) and 0.782 (95%CI: 0.592∼0.924), respectively (Extended Data Fig. 1), and there is no statistical difference between the performances of rawdata and CT, which means that the classification of lung nodules may not need image reconstruction and clinician participation. Think further, the rawdata model could be applied to the majority of grassroots hospitals, who have mainstream CT systems but lack technical personnel and clinicians.

Our study showed that the introduction of rawdata to CT had an overall improvement over different CTMs, no matter which backbone network was used. This indicates that rawdata has unique information which may be lost during the reconstruction processing. Moreover, the compared with CTM, RGMs showed better stability on the training cohort, validation cohort, and test cohort. The combination of both non-image and image data could make the model robust. In addition, we also performed intra-CT and inter-CT analyses. For intra-CTM, fused rawdata prediction has a higher optimization rate than the error rate which shows a similar gain trend, and about 80% of the optimized patients appear in at least 2 feature extraction networks. For inter-CTMs, eighty-five percent of the patients that all CTMs predicted incorrectly have optimizable RGMs within each CTM. The results also proved that the gain of rawdata is stable across different convolutional networks and different CTM approaches. Therefore, exploring the drawbacks of post-reconstructed CT image analysis and developing models for direct diagnosis from rawdata are the keys to future research. The results of the subgroup analysis showed that the RGMs performed better than the CTM in most subgroups, especially in the subgroup of older, female, and smaller lesion size, indicating that the rawdata could provide more valuable information that brings model gains in subgroups, while this information may have been lost in the process of CT reconstruction.

Our study has some limitations. First, this study involved a small number of patients and the proportion of positive and negative samples is unbalanced. Further study on large-scale multicenter datasets should be performed. Second, only patients with single nodule are included in this study, further validation of our method on patients with multiple nodules should be further studied. Third, although the rawdata had a comparable performance with CT, it still had a certain gap with the best CT diagnosis. There is an urgent need to develop novel AI methods specifically for rawdata.

Meanwhile, strong computing power is a problem that cannot be ignored when calculating rawdata. It is not realistic to read the complete high-frequency scanning data directly to the computing device. Designing appropriate pre -processing algorithms and building deep networks in combination with characteristics of rawdata are the potential breakthrough points in the future. Finally, the CT scan scheme is designed for image reconstruction and it may be not suitable for rawdata analysis. Therefore, novel scan strategies, e.g., scanning for specific diagnostic purposes, should be developed to maximize the gain of rawdata.

In conclusion, for the first time, we validated the potential value of rawdata in real clinical practice. The rawdata analysis showed comparable performance with CT images, which indicates that leveraging non-image information holds promise as an alternative to image-based methods. Moreover, the added value of rawdata to CT images was also confirmed in this study, which means that the combination of non-image and image will further promote the advance of disease diagnosis. This study proposed and validated a new feasible direction for diagnosis without image reconstruction, and it may facilitate the development of fully automated scanning and diagnostic processes.

Methods

Patients

In this prospective study, 626 consecutive patients who had a chest CT scan in the First Hospital of Jilin University from November 2019 to May 2021 were recruited. Eligible patients were included according to the following inclusion criteria: (i) patients who had a pulmonary lesion more than 2 cm with contrast enhanced chest CT scan, (ii) rawdata obtained from CT machine after the imaging examination, (iii)pathological diagnosis of pulmonary lesion with two weeks interval from CT scan. Patients were excluded on the basis of the following: (i) previous systemic antineoplastic treatments, (ii) CT images with poor image quality or unreadable scan. After exclusion, a total of 276 patients were included for modeling experiments.

The methods were performed in accordance with Standards for Reporting Diagnostic accuracy studies (STARD) and approved by the Ethics Committee of the First Hospital of Jilin University (AF-IRB-032-05).

Collection of CT image and rawdata

Both the CT images and rawdata were consecutively collected from the First Hospital of Ji Lin University and were acquired with a NeuViz Prime CT system (Neusoft Medical Systems Co., Ltd., Shenyang, China). The system parameters of CT scanner included source-to-isocenter distance of 570 mm, source-to-detector distance of 1040mm, and scanning FOV of 500 mm. The imaging protocol included contrast-enhanced CT of the chest with variable imaging parameters. Contrast-enhanced CT scans were performed at a spiral scan mode using 324 mA tube current, 100 kVp tube voltage, 0.5 s ration time, and 0.9 spiral pitch. CT rawdata were reconstructed using a kernel F20 at slice thickness of 1.0 mm with image pixel range from 0.59 mm to 0.98 mm and image matrix of 512 by 512. In addition, we also acquired the initial height and initial view angle of the CT detector each time the patient underwent a scan. Finally, CT images and rawdata from each scanner were randomly stratified into one of three cohorts in a 6:2:2 ratio: a training cohort, a validation cohort and a test cohort. All in all, Table S4 describes CT scanner information, system parameters and imaging parameters.

lesion segmentations in CT images

The segmentations of primary lesion were manually delineated across all the sections in the axial view using annotation tool in ISD (IntelliSpace Discovery, Philips, German). The regions of interest were annotated and reviewed by four radiologists with 8 to 25 years’ chest CT experience. All radiologists were blinded to any clinical or histopathologic information. The annotation was labeled as five common categories according lesions’ pulmonary lobe.

Realization of typical CT models

There are many studies on benign-malignant lung nodule classification on chest CT. We selected four typical papers from the major journals, including IEEE Transactions on medical imaging, Medical image analysis, and Nature medicine, which refer to multi-scale ensemble method (CTM1) [27], global and local information fusion method (CTM2) [26], loss function-based method (CTM3) [28], and multi-view fusion (CTM4) method [29]. We further performed experiments on four typical models with our dataset, and all the realization details are described in Supplementary 1.

Extraction of lesion region from raw data

After acquiring 4 CTMs, we proceeded to perform rawdata gain experiments. The first step of the experiment is to select projection surface containing lesions in the rawdata. The rawdata of CT scans contains three dimensions: 1) the index dimension representing the acquisition order; 2) the projection surface which is the detector receives the x-ray attenuation, where the channel and row directions are defined as x and y, respectively. All lesion segmentation regions of rawdata were derived from the binarized segmentation of CT image after being represented in a unified coordinate system. The complete derivation can be condensed into three steps: orientation, querying and mapping.

1) Orientation

On the derivation of localization, we took the segmented regions in the CT image as the research object, and the localization calculation mainly includes cross-sectional localization and height localization. For cross-sectional positioning, we set the center point between the CT source and detector as the coordinate origin (which is also the rotation center of the CT gantry), parallel to the cross-section of the CT image. Next, the motion trajectory can be characterized by the scan index t, the rotation radius r and the angle θ. In order to obtain the above parameters, we first read the origin coordinates (x_origin, y_origin and z_origin), and the offset values (x_offset and y_offset) can be calculated through voxel spacing and image size (x_size and y_size). Next, with the help of the offset values and the voxel coordinates x_CT and y_CT in the CT image, the distance from the coordinate origin (x_length and y_length) can be calculated as: Otherwise, the radius r, and the starting angle θ₀ can be obtained. Meanwhile, by introducing the scanning period of the machine, we got the angle change Δθ with the following relationship: For height positioning, we directly obtained the initial height h_start of the voxel through the coordinate z, slice thickness and origin of the voxel point in CT images.

2) Querying

Our purpose in this step is to determine the interval of index dimension t in which tumor voxel appears in the raw data. Since there is a cone beam in the projection, we first calculated the change function h_area of the voxel, where d is the distance from the voxel to the X-ray focal spot on the x-axis; l is the distance from the focal spot to the detector. The number of detector rows is n_y ; the channel spacing along y-axis is Δn_y. Next, we determined the index (t) range of the voxel in the rawdata by the following inequality, where h₀ is the initial height at which the detector start to scan; Δh is the height change in a scan. To reduce computational complexity, we first extracted the highest and lowest masks in the segmentation images, and calculated the start and end indices of two voxels. Then, we initially located the range of index dimensions. Within this interval, we computed the mapping result of voxels within the layer.

3) Mapping

Through the above calculation, we have obtained the index interval corresponding to the voxel, then the voxel appearing in index is obtained by calculating the projection data of the index layer by layer. The coordinates of each voxel on the projection surface are defined as x_raw and y_raw, respectively. Y_raw is related to the height h_t, h_area at the t index and the number of detector rows n_y in the detector, so we determined its height difference relative to the detector by the following formula, and then calculated its coordinates in the projection. Since the x-axis of the projection plane is equiangularly sampled, x_raw can be acquired through the angle at the t index θ_t, the view angle θ_d of the detector, and the number of channels in the detector n_x. After obtaining the segmentation files of lesions in the raw data, we saved the raw data segment through the initial index interval, and used this as the training data for this gain experiment. It should be added that there are different directions in the actual retrieval of raw data (From head to foot or foot to head). We used the same spatial relationship to modify the inequality for different directions and then located the lesion.

Construction of RGM based on CT images

To explore whether the rawdata contained unique information, we built residual fusion models through the rawdata and fused it with CTMs’ output to determine whether the rawdata could bring benefits. First, we built three feature extraction networks using the rawdata. Based on the memory need of calculation, we sampled the index dimension of rawdata fragments containing lesions to one-eighth, and the same size was resampling based on the average value by equal interval sampling. For the Channel dimension, we directly removed the data outside the reconstruction area from both sides, and resampling with the row dimension into half of the size. For model building, we did not modify Densenet121 [21], Resnet18 [22] and Resnext18 [23] in 3D with the purpose of directing the direct gain of nude data as much as possible. The training settings and parameters are detailed in Supplementary 2.

The core of the residual fusion model is to obtain the correction of the CT model output, and the origin of the idea is that the learning residual is easier which is mentioned in Resnet. The probabilities of predicting the patients as positive by the CTMs were fused with the predicted probabilities of the rawdata models. This fusion is performed during the training process. Specifically, the probability of predicting one patient as positive was calculated as: The probability of predicting one patient as negative was calculated as: After the output fusion of the CTM and rawdata model, the loss function was used to calculate the loss and optimize the model. The three feature extraction networks built with rawdata were fused with the four representative CTMs described above to obtain four raw gain models respectively, which were: raw gain model-Densenet121 (RGM-DN 1/2/3/4), raw gain model-Resnet18 (RGM-RE 1/2/3/4), raw gain model-Resnext18 (RGM-RX 1/2/3/4). Therefore, we obtain 12 raw gain models. Then the RGMs were compared with the CTMs to evaluate the benefits of the rawdata.

The calculation of the average attention score

For calculating the average attention score of each voxel, we first used the segmentation data of lesion in rawdata to obtain non-lesion area by unary complement. Next, we dotted and summed the segmentation data of lesion and non-lesion areas with the attention matrix. Finally, the average attention score was obtained by dividing the total amount of attention in the two areas by the number of voxels in the segmented regions, respectively. It should be noted that we also normalized the average attention score of the lesion area and the non-lesion area in each rawdata, so as to obtain a more intuitive comparison result.

Data Availability

The image features and output values associated with the CTMs and the RGMs are stored on GitHub (https://github.com/CASIAMI/rawdata_gain). The original data that support the findings of this study are available from the corresponding author upon reasonable request.

https://github.com/CASIAMI/rawdata_gai

Data availability

Code availability

Source code for related CT and raw data methods can be found from GitHub (https://github.com/CASIAMI/rawdata_gain).

Author contributions

B.X.H., Y.B.Z., C.X.S., T.T., K.S. and H.L.L. developed the network architecture and data/modeling infrastructure, training and testing setup. B.X.H., Y.B.Z., C.X.S., T.T., K.S. and H.L.L. wrote the methods. B.X.H., C.X.S. and T.T. created the figures. B.X.H. and M.J.F. performed statistical analysis. J.T., D.D., Z.Y.L., K.W., Z.Q.W., W.M., S.W., Z.C.T., S.T.Z., J.W.W. and L.Z.S. advised on the modeling techniques. L.X.T., L.W.W., S.P.Z. and J.T.L. provided raw data structure information. D.D., B.X.H., Y.B.Z., C.X.S., T.T., K.S. H.L.L., L.W.W. and Y.G. wrote the manuscript. H.M.Z., Y.G., M.W., F.Y.M., L.D., L.L.Z. and S.W. provided clinical expertise and guidance on the study design. H.M.Z., Y.G., M.W., F.Y.M. and L.D. created the clinical datasets, interpreted the data and defined the clinical labels. L.X.T., W.L.W., and F.H. created the rawdata sets. J.T., D.D., F.H. and H.M.Z. initiated the project and provided guidance on the concept and design. J.T., F.H. and H.M.Z. supervised the project.

Competing interests

The authors declare that they have no competing interests.

Figures and Tables

Extended Data Fig. 1. The ROC curve of the model built by rawdata only.

View this table:

Table S1.

Clinical characteristic statistics

View this table:

Table S2

Detailed gain statistics of raw data for CT models

View this table:

Table S3.

The performance of four CT models in subgroup analysis.

View this table:

Table S4.

Scanner information, system parameters and imaging parameters for the Chest CT examinations

Supplement 1. Implementation and optimization details of typical CT models

Considering the differences in experimental design between our study and the typical models, we need to make appropriate modifications to make the typical models have the best performances on our dataset. The difference is mainly reflected in data volume and data imbalance. The amount of data in previous articles ranged from 1018 to more than 20,000, much larger than that in this experiment (276). Therefore, we applied small data learning related techniques (including pre-trained model and sharing network weight) for four typical CT models. Hence, in our study, a pre-trained 3D-resnet18 was used for 3D input, which was pre-trained using eight medical datasets [30], a pre-trained 2D-resnet50 was used for 2D input. To train the subnets effectively using our small dataset, only the last layer (layer 4) and full connected network were trained in this study. In addition, for data imbalance, we used resampling strategy during training.

Specific to each of these models, we describe them in detail below. CTM1 was a multi-scale ensemble model, which ensembled three subnets whose networks were all the same, except for the input size. The input data were cropped from CT images using three different sizes, 32×32×32, 48×48×48, and 64×64×64 pixels. The outputs of three subnets were weighted to obtain the final ensemble results, and a grid search was used to tune the weight values. In addition, AUC loss proposed in original paper was used for dealing with data imbalance. CTM2 combined the features of both the entire CT volume and the region of interest cropped from CT volume, so that all predictions relied on both nodule-level local information and global context from the entire CT volume. In addition, CTM2 was trained with focal loss to mitigate the data imbalance. CTM3 designed a deep network with a margin ranking loss to enhance the discrimination capability on ambiguous nodule cases. In our study, 3D input was used to preserve the spatial information of pulmonary nodule. CTM4 was a multi-view deep network, which ensembled the outputs of nine 2D-view subnets to characterize the 3D nodule. Each subnet combined three types of image patches, and each patch was input into a pre-trained ResNet-50 network. In this study, all networks shared weights. Finally, nine subnets were used jointly to classify nodules with an average weighting scheme. For model training, the batch size was set as 32. The cross-entropy loss was used as the loss function, and the SGD optimizer was applied. The start learning rates were set as 0.001, and the models were trained for 100 epochs.

Supplement 2. Model training and statistical analysis methods

For model building, the batch size was set as 64. The Cross-Entropy loss was used as the loss function, and the Adam optimizer was applied. The start learning rates were set as 1 × 10⁻⁵, 1 × 10⁻⁶ separately, and the learning rate decay was set as 0.001. Meanwhile, we also added weight attenuation. The weight decay was set as 0.01, 0.05, and 0.1 separately. The model was trained for 50 epoches and select the best parameters with the lowest loss of validation cohort. The three RGMs built based on the three structures were all trained with the above setting.

For statistical analysis, discrete variables and continuous variables are calculated in the chi-squared test and Man-Whitney U test, respectively. All models are implemented in Python 3.7.3 (https://www.python.org/) with Numpy (version ≥ 1.16.4), SciPy (version ≥ 1.3.0), Matplotlib (version ≥ 3.1.1), Scikit-Learn (version ≥ 1.10.1), Statsmodels (version ≥ 0.12.2) and Pandas (version ≥ 0.25.0). All models were trained in python package named Pytorch (version ≥ 1.10.1; https://pytorch.org/) with 4 Graphics Processing Units of NVIDIA TITAN RTX (24G).

Acknowledgements

This work was supported by the National Key R&D Program of China (2017YFA0205200), National Natural Science Foundation of China (82022036, 91959130, 81971776, 62027901, 81930053, 81771924), the Beijing Natural Science Foundation (Z20J00105), Strategic Priority Research Program of Chinese Academy of Sciences (XDB38040200), Chinese Academy of Sciences under Grant No. GJJSTD20170004 and QYZDJ-SSW-JSC005, the Project of High-Level Talents Team Introduction in Zhuhai City (Zhuhai HLHPTP201703), the Youth Innovation Promotion Association CAS (Y2021049) and the China Postdoctoral Science Foundation (2021M700341). The authors would like to acknowledge the instrumental and technical support of Multi-modal biomedical imaging experimental platform, Institute of Automation, Chinese Academy of Sciences.

References

[1].↵
Ciccarelli, E., Jacobs, A. & Berman, P. Looking back on the millennium in medicine. N Engl J Med 342, 42–49 (2000).
OpenUrl CrossRef PubMed Web of Science
[2].↵
Lauwerends, L.J., et al. Real-time fluorescence imaging in intraoperative decision making for cancer surgery. The Lancet Oncology 22, e186–e195 (2021).
OpenUrl
[3].↵
Lehman, C.D., et al. Diagnostic accuracy of digital screening mammography with and without computer-aided detection. JAMA internal medicine 175, 1828–1837 (2015).
OpenUrl
[4].↵
Bi, W.L., et al. Artificial intelligence in cancer imaging: clinical challenges and applications. CA: a cancer journal for clinicians 69, 127–157 (2019).
OpenUrl
[5].↵
Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L.H. & Aerts, H.J. Artificial intelligence in radiology. Nature Reviews Cancer 18, 500–510 (2018).
OpenUrl CrossRef PubMed
[6].↵
Litjens, G., et al. A survey on deep learning in medical image analysis. Medical image analysis 42, 60–88 (2017).
OpenUrl CrossRef PubMed
[7].↵
Lambin, P., et al. Radiomics: the bridge between medical imaging and personalized medicine. Nature reviews Clinical oncology 14, 749–762 (2017).
OpenUrl
[8].↵
Liu, X., et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The lancet digital health 1, e271–e297 (2019).
OpenUrl
[9].↵
Killock, D. AI outperforms radiologists in mammographic screening. Nature Reviews Clinical Oncology 17, 134–134 (2020).
OpenUrl
[10].↵
Cruz Rivera, S., Liu, X., Chan, A.-W., Denniston, A.K. & Calvert, M.J. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nature medicine 26, 1351–1363 (2020).
OpenUrl CrossRef PubMed
[11].↵
Dong, D., et al. Deep learning radiomic nomogram can predict the number of lymph node metastasis in locally advanced gastric cancer: an international multicenter study. Annals of Oncology 31, 912–920 (2020).
OpenUrl
[12].↵
Huang, Y.-q., et al. Development and validation of a radiomics nomogram for preoperative prediction of lymph node metastasis in colorectal cancer. Journal of clinical oncology 34, 2157–2164 (2016).
OpenUrl Abstract/FREE Full Text
[13].↵
Mu, W., Schabath, M.B. & Gillies, R.J. Images Are Data: Challenges and Opportunities in the Clinical Translation of Radiomics. Cancer Research 82, 2066–2068 (2022).
OpenUrl
[14].↵
Gillies, R.J., Kinahan, P.E. & Hricak, H. Radiomics: images are more than pictures, they are data. Radiology 278, 563 (2016).
OpenUrl CrossRef PubMed
[15].↵
Zhu, B., Liu, J.Z., Cauley, S.F., Rosen, B.R. & Rosen, M.S. Image reconstruction by domain-transform manifold learning. Nature 555, 487–492 (2018).
OpenUrl CrossRef
[16].↵
Wang, G., Ye, J.C. & De Man, B. Deep learning for tomographic image reconstruction. Nature Machine Intelligence 2, 737–748 (2020).
OpenUrl
[17].↵
Kalra, M., Wang, G. & Orton, C.G. Radiomics in lung cancer: Its time is here. Medical physics 45, 997–1000 (2018).
OpenUrl
[18].↵
Wang, G., Ye, J.C., Mueller, K. & Fessler, J.A. Image reconstruction is a new frontier of machine learning. IEEE transactions on medical imaging 37, 1289–1296 (2018).
OpenUrl
[19].↵
De Man, Q., et al. A two-dimensional feasibility study of deep learning-based feature detection and characterization directly from CT sinograms. Medical physics 46, e790–e800 (2019).
OpenUrl
[20].↵
Dong, D., et al. Abstract CT274: Diagnosis based on signal: The first time break the routinely used circle of signal-to-image-to-diagnose. Cancer Research 80, CT274–CT274 (2020).
OpenUrl CrossRef
[21].↵
Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K.Q. Densely connected convolutional networks. in Proceedings of the IEEE conference on computer vision and pattern recognition 4700-4708 (2017).
[22].↵
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. in Proceedings of the IEEE conference on computer vision and pattern recognition 770-778 (2016).
[23].↵
Xie, S., Girshick, R., Dollár, P., Tu, Z. & He, K. Aggregated residual transformations for deep neural networks. in Proceedings of the IEEE conference on computer vision and pattern recognition 1492-1500 (2017).
[24].↵
Shen, W., et al. Multi-crop convolutional neural networks for lung nodule malignancy suspiciousness classification. Pattern Recognition 61, 663–673 (2017).
OpenUrl CrossRef
[25].↵
Mukherjee, P., et al. A shallow convolutional neural network predicts prognosis of lung cancer patients in multi-institutional computed tomography image datasets. Nature machine intelligence 2, 274–282 (2020).
OpenUrl
[26].↵
Ardila, D., et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nature medicine 25, 954–961 (2019).
OpenUrl CrossRef PubMed
[27].↵
Xu, X., et al. MSCS-DeepLN: Evaluating lung nodule malignancy using multi-scale cost-sensitive neural networks. Medical Image Analysis 65, 101772 (2020).
OpenUrl
[28].↵
Liu, L., Dou, Q., Chen, H., Qin, J. & Heng, P.-A. Multi-task deep model with margin ranking loss for lung nodule analysis. IEEE transactions on medical imaging 39, 718–728 (2019).
OpenUrl
[29].↵
Xie, Y., et al. Knowledge-based collaborative deep learning for benign-malignant lung nodule classification on chest CT. IEEE transactions on medical imaging 38, 991–1004 (2018).
OpenUrl
[30].↵
Chen, S., Ma, K. & Zheng, Y. Med3d: Transfer learning for 3d medical image analysis. arXiv preprint arxiv:1904.00625 (2019).

View the discussion thread.

Posted August 03, 2022.

Download PDF

Data/Code

Citation Tools

Subject Area

Radiology and Imaging

Subject Areas

All Articles

Addiction Medicine (399)
Allergy and Immunology (708)
Anesthesia (201)
Cardiovascular Medicine (2942)
Dentistry and Oral Medicine (334)
Dermatology (249)
Emergency Medicine (439)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (1036)
Epidemiology (12743)
Forensic Medicine (12)
Gastroenterology (828)
Genetic and Genomic Medicine (4583)
Geriatric Medicine (417)
Health Economics (729)
Health Informatics (2916)
Health Policy (1069)
Health Systems and Quality Improvement (1077)
Hematology (389)
HIV/AIDS (924)
Infectious Diseases (except HIV/AIDS) (14098)
Intensive Care and Critical Care Medicine (846)
Medical Education (425)
Medical Ethics (115)
Nephrology (469)
Neurology (4353)
Nursing (236)
Nutrition (639)
Obstetrics and Gynecology (805)
Occupational and Environmental Health (735)
Oncology (2268)
Ophthalmology (646)
Orthopedics (258)
Otolaryngology (325)
Pain Medicine (279)
Palliative Medicine (83)
Pathology (501)
Pediatrics (1196)
Pharmacology and Therapeutics (504)
Primary Care Research (496)
Psychiatry and Clinical Psychology (3755)
Public and Global Health (6937)
Radiology and Imaging (1527)
Rehabilitation Medicine and Physical Therapy (905)
Respiratory Medicine (915)
Rheumatology (437)
Sexual and Reproductive Health (443)
Sports Medicine (385)
Surgery (488)
Toxicology (60)
Transplantation (212)
Urology (180)

[1] [1].↵
Ciccarelli, E., Jacobs, A. & Berman, P. Looking back on the millennium in medicine. N Engl J Med 342, 42–49 (2000).
OpenUrl CrossRef PubMed Web of Science

[2] [2].↵
Lauwerends, L.J., et al. Real-time fluorescence imaging in intraoperative decision making for cancer surgery. The Lancet Oncology 22, e186–e195 (2021).
OpenUrl

[3] [3].↵
Lehman, C.D., et al. Diagnostic accuracy of digital screening mammography with and without computer-aided detection. JAMA internal medicine 175, 1828–1837 (2015).
OpenUrl

[4] [4].↵
Bi, W.L., et al. Artificial intelligence in cancer imaging: clinical challenges and applications. CA: a cancer journal for clinicians 69, 127–157 (2019).
OpenUrl

[5] [5].↵
Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L.H. & Aerts, H.J. Artificial intelligence in radiology. Nature Reviews Cancer 18, 500–510 (2018).
OpenUrl CrossRef PubMed

[6] [6].↵
Litjens, G., et al. A survey on deep learning in medical image analysis. Medical image analysis 42, 60–88 (2017).
OpenUrl CrossRef PubMed

[7] [7].↵
Lambin, P., et al. Radiomics: the bridge between medical imaging and personalized medicine. Nature reviews Clinical oncology 14, 749–762 (2017).
OpenUrl

[8] [8].↵
Liu, X., et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The lancet digital health 1, e271–e297 (2019).
OpenUrl

[9] [9].↵
Killock, D. AI outperforms radiologists in mammographic screening. Nature Reviews Clinical Oncology 17, 134–134 (2020).
OpenUrl

[10] [10].↵
Cruz Rivera, S., Liu, X., Chan, A.-W., Denniston, A.K. & Calvert, M.J. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nature medicine 26, 1351–1363 (2020).
OpenUrl CrossRef PubMed

[11] [11].↵
Dong, D., et al. Deep learning radiomic nomogram can predict the number of lymph node metastasis in locally advanced gastric cancer: an international multicenter study. Annals of Oncology 31, 912–920 (2020).
OpenUrl

[12] [12].↵
Huang, Y.-q., et al. Development and validation of a radiomics nomogram for preoperative prediction of lymph node metastasis in colorectal cancer. Journal of clinical oncology 34, 2157–2164 (2016).
OpenUrl Abstract/FREE Full Text

[13] [13].↵
Mu, W., Schabath, M.B. & Gillies, R.J. Images Are Data: Challenges and Opportunities in the Clinical Translation of Radiomics. Cancer Research 82, 2066–2068 (2022).
OpenUrl

[14] [14].↵
Gillies, R.J., Kinahan, P.E. & Hricak, H. Radiomics: images are more than pictures, they are data. Radiology 278, 563 (2016).
OpenUrl CrossRef PubMed

[15] [15].↵
Zhu, B., Liu, J.Z., Cauley, S.F., Rosen, B.R. & Rosen, M.S. Image reconstruction by domain-transform manifold learning. Nature 555, 487–492 (2018).
OpenUrl CrossRef

[16] [16].↵
Wang, G., Ye, J.C. & De Man, B. Deep learning for tomographic image reconstruction. Nature Machine Intelligence 2, 737–748 (2020).
OpenUrl

[17] [17].↵
Kalra, M., Wang, G. & Orton, C.G. Radiomics in lung cancer: Its time is here. Medical physics 45, 997–1000 (2018).
OpenUrl

[18] [18].↵
Wang, G., Ye, J.C., Mueller, K. & Fessler, J.A. Image reconstruction is a new frontier of machine learning. IEEE transactions on medical imaging 37, 1289–1296 (2018).
OpenUrl

[19] [19].↵
De Man, Q., et al. A two-dimensional feasibility study of deep learning-based feature detection and characterization directly from CT sinograms. Medical physics 46, e790–e800 (2019).
OpenUrl

[20] [20].↵
Dong, D., et al. Abstract CT274: Diagnosis based on signal: The first time break the routinely used circle of signal-to-image-to-diagnose. Cancer Research 80, CT274–CT274 (2020).
OpenUrl CrossRef

[21] [21].↵
Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K.Q. Densely connected convolutional networks. in Proceedings of the IEEE conference on computer vision and pattern recognition 4700-4708 (2017).

[22] [22].↵
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. in Proceedings of the IEEE conference on computer vision and pattern recognition 770-778 (2016).

[23] [23].↵
Xie, S., Girshick, R., Dollár, P., Tu, Z. & He, K. Aggregated residual transformations for deep neural networks. in Proceedings of the IEEE conference on computer vision and pattern recognition 1492-1500 (2017).

[24] [24].↵
Shen, W., et al. Multi-crop convolutional neural networks for lung nodule malignancy suspiciousness classification. Pattern Recognition 61, 663–673 (2017).
OpenUrl CrossRef

[25] [25].↵
Mukherjee, P., et al. A shallow convolutional neural network predicts prognosis of lung cancer patients in multi-institutional computed tomography image datasets. Nature machine intelligence 2, 274–282 (2020).
OpenUrl

[26] [26].↵
Ardila, D., et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nature medicine 25, 954–961 (2019).
OpenUrl CrossRef PubMed

[27] [27].↵
Xu, X., et al. MSCS-DeepLN: Evaluating lung nodule malignancy using multi-scale cost-sensitive neural networks. Medical Image Analysis 65, 101772 (2020).
OpenUrl

[28] [28].↵
Liu, L., Dou, Q., Chen, H., Qin, J. & Heng, P.-A. Multi-task deep model with margin ranking loss for lung nodule analysis. IEEE transactions on medical imaging 39, 718–728 (2019).
OpenUrl

[29] [29].↵
Xie, Y., et al. Knowledge-based collaborative deep learning for benign-malignant lung nodule classification on chest CT. IEEE transactions on medical imaging 38, 991–1004 (2018).
OpenUrl

[30] [30].↵
Chen, S., Ma, K. & Zheng, Y. Med3d: Transfer learning for 3d medical image analysis. arXiv preprint arxiv:1904.00625 (2019).

From signal to knowledge: The diagnostic value of rawdata in artificial intelligence prediction of human data for the first time

Abstract

Introduction

Results

Clinical characteristics

Performance of CT model and rawdata gain model

Image feature distribution of the RGMs and gain stability analysis

Visual statistics and analysis of the RGMs

Stratified analysis of different malignant subgroups

Discussion

Methods

Patients

Collection of CT image and rawdata

lesion segmentations in CT images

Realization of typical CT models

Extraction of lesion region from raw data

1) Orientation

2) Querying

3) Mapping

Construction of RGM based on CT images

The calculation of the average attention score

Data Availability

Data availability

Code availability

Author contributions

Competing interests

Figures and Tables

Supplement 1. Implementation and optimization details of typical CT models

Supplement 2. Model training and statistical analysis methods

Acknowledgements

References

Citation Manager Formats

Subject Area