Abstract
New techniques for individualized assessment of white matter integrity are needed to detect traumatic axonal injury (TAI) and predict outcomes in critically ill patients with acute severe traumatic brain injury (TBI). Diffusion MRI tractography has the potential to quantify white matter microstructure in vivo and has been used to characterize tract-specific changes in TBI. However, tractography is not routinely used in the clinical setting to assess the extent of TAI, in part because focal lesions reduce the robustness of automated methods. In the present study, we propose an automated pipeline for individualized assessment of white matter damage in patients with acute severe TBI that is based on tractography reconstructions and multivariate analysis of along-tract diffusion metrics. We used the Mahalanobis distance to test for a deviation of diffusion metrics along 40 white matter tracts in 18 patients with acute severe TBI as compared to 33 healthy subjects. The automated pipeline successfully reconstructed white matter tracts in all patients with a successful FreeSurfer anatomical segmentation (17 of 18 patients), including 13 with focal lesions. In these 17 patients, a mean of 37.5 +/- 2.1 tracts were reconstructed without the need for manual intervention and a mean of 2.47 +/- 2.1 needed to be reinitialized upon visual inspection. The pipeline detected at least one injured tract in all patients (mean: 12.6, SD: 7.6). The number and neuroanatomic location of disrupted tracts varied across patients and levels of consciousness. The pre-motor, parietal, and temporal sections of the corpus callosum were the structures most frequently injured (in 10, 9, and 8 patients respectively), consistent with histological studies of TAI. Multivariate measures of TAI did not show a significant association with behavioral measures of consciousness, or subscales measuring basic language and motor function. In summary, we provide proof- of-principle evidence that an automated tractography pipeline can be used to detect and quantify TAI in patients with acute severe head trauma and could therefore assist in the clinical assessment of TAI.
1. Introduction
Traumatic axonal injury (TAI) is the most common pathologic substrate of head trauma (Brody et al., 2015; McGinn & Povlishock, 2016) and has a strong association with adverse clinical outcomes (Benson et al., 2007; Gennarelli et al., 1982; Moe et al., 2020; Newcombe et al., 2010). TAI is caused by high- velocity translational and rotational forces that stretch and shear axons, leading to either primary axotomy or a secondary cascade that can result in axonal degeneration (Hill et al., 2016). TAI lesions are typically multifocal, with a neuroanatomic distribution that varies based on the mechanism of injury (Adams et al., 1989; Blumbergs et al., 1995; Johnson et al., 2013). For example, individuals in high-speed motor vehicle accidents are more likely to experience TAI in the deep subcortical white matter than are people who are assaulted, whose burden of TAI is more likely to be limited to the hemispheric white matter (Adams et al., 1989; Johnson et al., 2013). Neuroimaging studies that compare patients at the group level are insensitive to these variations in lesion location and therefore are suboptimal for assessing individual patients in clinical care (Betz et al., 2012). For clinicians and the families of patients with acute severe traumatic brain injury (TBI), the inability to measure the burden of TAI creates prognostic uncertainty and complicates discussions about continuation of life-sustaining therapy in the intensive care unit (ICU) (Izzy et al., 2013; Turgeon et al., 2011). New techniques for individualized assessment of white matter (WM) injury (Jolly et al., 2020; Kim et al., 2013; Yuh et al., 2014) are therefore needed to enhance detection of TAI and improve the accuracy of outcome prediction in critically ill patients with acute severe TBI.
Due to its diffuse and microscopic nature (Adams et al., 1989; Blumbergs et al., 1995; Johnson et al., 2013), TAI is challenging to detect and quantify using routine imaging modalities, such as CT and conventional MRI (Betz et al., 2012). Diffusion MRI (dMRI) possesses the unique ability to probe WM tissue microstructure non-invasively (Basser et al., 1994), making it possible to identify TAI in vivo (Hashim et al., 2017; Li et al., 2011; Xu et al., 2007). Numerous studies have used measures obtained from the diffusion tensor (DT) model to assess WM integrity in patients with TBI (Arfanakis et al., 2002; Asken et al., 2017; Galanaud et al., 2012; Hulkower et al., 2013; Inglese et al., 2005; Newcombe et al., 2007; Zhang et al., 2017) and have shown its sensitivity to detect WM abnormalities across severities (i.e., mild, moderate, severe) and recovery phases (i.e., acute, subacute, chronic). Typically, summary DT metrics such as fractional anisotropy (FA) and mean diffusivity (MD) are extracted from WM regions of interest (ROIs) that are manually drawn in subject’s space (Inglese et al., 2005; Mac Donald et al., 2011) or segmented from an atlas (Galanaud et al., 2012; O’Phelan et al., 2018). Whole-brain voxel-based analysis (VBA) (Van Hecke et al., 2016) and tract-based spatial statistics (Smith et al., 2006) approaches are also used to compare normalized DT scalar maps between patient groups (Kinnunen et al., 2011), or between a control group and a patient (Jolly et al., 2020). While these approaches provide flexibility to extract DT metrics from WM regions and have demonstrated sensitivity to TAI (Jolly et al., 2020; Perlbarg et al., 2009), they do not localize injury along the trajectory of white matter tracts at the individual level. Moreover, these methods rely on accurate registration between patients and an atlas, which can be challenging in the presence of lesions and may not reflect interindividual anatomical differences. They also may involve manual drawing of ROIs, which can be time-consuming and infeasible in clinical contexts.
Diffusion-based tractography allows for the delineation and visualization of the 3D profiles of WM pathways in subject space. Tractography methods provide measurements of the magnitude and distribution of axonal damage along specific WM tracts, and thus may have additional prognostic utility for guiding early decisions about goals of care and continuation of life-sustaining therapy. Indeed, a patient’s potential to recover consciousness and cognitive function depends upon the integrity of brain networks probed by tractography (Edlow et al., 2021). However, few studies have utilized tractography in patients with acute TBI (D’Souza et al., 2015; Ordóñez-Rubiano et al., 2017; Snider et al., 2019; Wang et al., 2008; Warner et al., 2010) and even fewer have focused on critically ill patients with acute severe TBI (Ordóñez-Rubiano et al., 2017; Snider et al., 2019; Wang et al., 2008). dMRI tractography techniques are not routinely used in the clinical setting to assess the extent of TAI (Schweitzer et al., 2019) because the presence of focal lesions is often a challenge for these techniques.
We aimed to test a robust pipeline for individualized TAI assessment in patients with acute severe TBI that combines automated reconstruction of WM tracts in subject space with a multivariate analysis of along-tract DT metrics. Several considerations are important in this application. Manual delineation of ROIs for tractography is time-intensive and operator-dependent (Rheault et al., 2020; Schilling et al., 2021). Automated tractography methods that use ROIs from an atlas rely on accurate registration between the patient and the atlas, which can be difficult to achieve in the presence of focal lesions (Crinion et al., 2007; Ripollés et al., 2012). Along-tract measures can provide higher sensitivity than diffusion metrics averaged over the entire extent of the tract (Colby et al., 2012; Pieri et al., 2021). Finally, local tractography methods, which reconstruct a tract step by step, by considering only the local diffusion orientation at each step, can be confounded by lesions and unable to step through them.
We used TRACULA (TRActs Constrained by UnderLying Anatomy), a method for global probabilistic tractography with anatomical neighborhood priors (Yendiki et al., 2011), which allowed us to automatically reconstruct 40 WM tracts at the individual level (Maffei et al., 2021). The anatomical neighborhood priors in TRACULA encode information about the relative position of the tracts with respect to their surrounding anatomical structures, rather than their absolute coordinates in an atlas space. Therefore, TRACULA does not necessitate accurate registration to an atlas (Zollei et al, 2019; Maffei et al., 2021). It uses global tractography (i.e., it represents the tract as a spline and fits its shape to the diffusion orientations from the entire brain). Thus, TRACULA would not “stop” at a lesion the way that a local tractography method might. We previously showed that TRACULA, when trained on very high-quality data from the Human Connectome Project (HCP) (Fan et al., 2015), improves the accuracy of tractography reconstructions in routine quality data when compared to a common multi-ROI tractography approach (Maffei et al., 2021). These methodologic advantages of TRACULA suggest potential for translation of this technique to critically ill patients with acute severe TBI, who are not medically stable enough to be scanned with a time-intensive, high-quality dMRI sequence.
We applied TRACULA to a diffusion MRI dataset prospectively acquired in a cohort of 18 critically ill patients with acute severe TBI who were imaged on a clinical 3 Tesla MRI scanner. We used TRACULA to identify the extent and location of injured WM tracts in each patient compared to 33 healthy subjects. We focused on along-tract measures of FA and MD, as these microstructural measures are the most commonly studied in the field of TBI (Hulkower et al., 2013). The goals of this study were: i) to demonstrate the feasibility of applying the TRACULA automated tractography stream (Maffei et al., 2021; Yendiki et al., 2011) in patients with acute severe TBI, including those with focal lesions, using dMRI data acquired on a clinical scanner; ii) to implement an individualized multivariate approach to measure patient-specific TAI severity, using reconstructions derived from the automated TRACULA tractography pipeline; iii) to test for associations between bundle-specific dMRI measures of TAI and behavioral measures of consciousness; and iii) to explore the relationship between TAI within domain-specific tracts and subscales measuring basic language and motor function.
2. Methods
2.1 Participants
We prospectively enrolled 18 patients (mean age: 28.6 +/- 8.7 years, 13 male) with acute severe TBI, sixteen of whom were enrolled as part of a previously described pilot study (Edlow et al., 2017), and two of whom were subsequently enrolled as pilot cases for an ongoing observational study (ClinicalTrials.gov NCT03504709) (See Table 2 for additional information). Inclusion criteria were: 1) age 18 to 65 years; and 2) traumatic coma, defined by Glasgow Coma Scale (GCS) (Teasdale & Jennett, 1974) total score of six without eye opening on at least one neurologic examination before ICU admission; and 3) no eye opening for 24 hours after injury. Exclusion criteria were: 1) prior history of severe brain injury or neurodegenerative disease; 2) life expectancy less than six months per physician judgment; 3) presence of metal contraindicating MRI; and/or 4) no pre-injury English fluency.
We also enrolled 33 sex- and age-matched healthy control subjects (mean age 31.2 years, 21 male) with no history of neurological, psychiatric, cardiovascular, pulmonary, renal or endocrinological disease (Edlow et al., 2017). We confirmed that age was not statistically different between the two groups using a nonparametric Wilcoxon sum rank test in Python 3 using SciPy Stats 1.3.1 (Jones et al., 2001) (W= −1.116, p= 0.2643). Written informed consent was obtained from healthy subjects and from patients’ surrogate decision-makers in accordance with a study protocol approved by the Mass General Brigham Institutional Review Board.
2.2 Standardized Assessment
Immediately prior to MRI, each patient’s level of consciousness was prospectively assessed by a neurologist on the investigator team (B.L.E.) via behavioral evaluation with the GCS (Teasdale & Jennett, 1974) and the Coma Recovery Scale-Revised (CRS-R) (Giacino et al., 2004). Per routine ICU care at our institution, neurological examinations and GCS assessments were performed off of sedation every two- to-four hours (depending on clinical stability and safety considerations) by treating clinicians. ICU clinician GCS assessments were used to determine time from injury to coma emergence (i.e., eye opening or localization to noxious stimulation) and time to command-following. If a patient’s neurological examination fluctuated, the first behavioral evidence of coma emergence and command-following were reported.
2.3 MRI Data Acquisition
MRI data were acquired as soon as the treating clinicians deemed patients stable for transport. Diffusion weighted imaging data were acquired in the ICU on a 3T Skyra scanner (Siemens Medical Solutions, Malvern, PA) with a 32-channel head coil. Three healthy subjects and eight patients were scanned using an echo-planar imaging (EPI) sequence with the following parameters: 2x2x2 mm, 60 b= 2,000 s/ mm2 and 10 b= 0 s/ mm2 volumes, TR= 13,700 ms, and TE= 98 ms. Thirty healthy subjects and 8 patients were scanned using an EPI sequence with simultaneous multi-slice (SMS) acceleration (Setsompop et al., 2012) and the following parameters: 2x2x2 mm, 60 b= 2000 s/mm2 and 10 b= 0 s/ mm2, TR= 6,700 ms, TE= 100 ms, acceleration factor = 2. Previous work from our group showed that diffusion-derived connectivity metrics did not differ significantly between these two acquisitions (Snider et al., 2019). The length of the diffusion sequence was 8 minutes and 46 seconds. Multi-echo MPRAGE structural images were also acquired with the following parameters: 1x1x1 mm, acquisition matrix = 256 × 256, TE= 1.69 ms, TR 2530 ms, flip angle= 7°. All patients and healthy subjects were imaged using the same scanner and head coil.
2.4 Diffusion MRI Data Processing
Diffusion weighted images were skull stripped and corrected for eddy current distortions and movement in FSL 6.0.1 (Andersson & Sotiropoulos, 2015). The tensor and the ball and stick model were fit to the data using DTIFIT and BEDPOSTX (Behrens et al., 2003) in FSL 6.0.1, respectively. Automated reconstruction of 40 major WM pathways was performed using the global probabilistic tractography algorithm TRACULA in FreeSurfer 7.2.0 (Maffei et al., 2021; Yendiki et al., 2011). Although TRACULA can reconstruct a total of 42 WM tracts, we decided to exclude the right and left fornix, because optimal identification of these WM bundles requires segmentation of thalamic subnuclei that are not provided by the standard “recon-all” FreeSurfer processing pipeline. The mathematical formulation of TRACULA has been described elsewhere (Yendiki et al., 2011, 2016). Briefly, the algorithm models a pathway as a cubic spline, which is initialized with the median streamline of the training set. A random sampling algorithm is used to draw samples from the posterior probability distribution of the pathway, which is decomposed into the likelihood term and the prior term. The likelihood term fits the shape of the spline to the diffusion orientations obtained from the ball-and-stick model in the voxels that it goes through. The prior term fits the shape of the spline to its anatomical neighborhood, given the manually labeled examples of this pathway from the training set and the anatomical segmentation volumes of both test and training subjects. The number of control points of the cubic spline were chosen as previously described (Maffei et al., 2021). Along-tract tensor-derived metrics obtained from DTIFIT were extracted for each of the reconstructed tracts using a pointwise assessment of streamline tractography attributes (PASTA) (Jones et al., 2005). For each of the 40 tracts, 1D along-tract profiles of FA and MD were generated by projecting the value of each measure from every point on every automatically reconstructed streamline to its nearest point on a reference streamline, as previously described (Maffei et al., 2021). The reference streamline is the mean of the training streamlines for each tract, ensuring that all subject data were sampled at the same number of cross-sections along a given bundle. Before extracting the diffusion measures, the posterior probability distribution estimated by TRACULA was thresholded by masking out all values below 20% of the maximum, which is the default threshold in TRACULA (Maffei et al., 2021; Yendiki et al., 2011; Zöllei et al., 2019). All the pre-processing steps detailed in this section are performed automatically by the TRACULA pipeline.
2.5 Structural MRI Data Processing
The anatomical segmentation that was needed to compute the anatomical neighborhood priors in TRACULA was obtained by analyzing the structural T1-weighted image in FreeSurfer 7.1.1 (Dale et al., 1999; Fischl, 2012). The automated “recon-all” pipeline was run with default parameters, except for the “bigventricle” option, which was added to optimize segmentation in patients that may have enlarged ventricles post-injury. The obtained segmentation volumes included a combination of the Desikan-Killiany cortical parcellation labels (Desikan et al., 2006) and the standard FreeSurfer subcortical segmentation. To compute the prior probabilities on the anatomical neighbors of the tracts using TRACULA, the anatomical segmentations were transformed to the subject’s individual dMRI space. This within-subject, dMRI-to-T1 alignment was performed using a boundary-based, affine registration method (Greve & Fischl, 2009). To ensure that the relative position of the anatomical structures was the same for all subjects, and to map the median streamline from the training data to the subject during initialization, all subjects were mapped onto a template brain. We used a non-linear SyN registration in ANTs (Avants et al., 2008) to map each subject onto an FA template constructed from the training dataset (Maffei et al., 2021). It is important to highlight that this step is only used to initialize the reconstruction of the tract, which is then refined by fitting it to the anatomy of the individual subject.
2.6 Quality assessment of tract reconstructions
To assess the feasibility of implementing TRACULA in a clinical setting, we performed a qualitative assessment to evaluate the accuracy of the tract reconstructions in patients with acute severe TBI. We considered the application of TRACULA in this population to be feasible if more than 50% of the automated reconstructions were successful. For this qualitative assessment, a study investigator (C.M.) visually inspected all 40 tracts for all patients and healthy subjects. Reconstructions were considered successful if tracts traversed the relevant WM regions and reached the cortical regions that were used as inclusion ROIs in the protocols defined to manually label the training set. These protocols were carefully defined on the known anatomy of each tract and are detailed in (Maffei et al., 2021).
When a focal lesion was present, the reconstruction was considered successful if the tract reached the cortical termination regions delineated in (Maffei et al., 2021), even if the tract’s course was altered by the presence of the lesion. As TRACULA uses a global probabilistic approach to estimate the tracts, post-processing steps to remove false positive reconstructions were not necessary. However, when a suitable solution for the initial reconstruction of the pathway is not found, the tract appears as a single curve (See Figure 1 for a simulated example). This can be due to errors in the subject-to-template registration step, or to a misplacement of the FreeSurfer segmentation labels in the presence of lesions. We refer to these reconstructions as “failed reconstructions”. Moreover, as a single threshold is used across all tracts and groups to threshold the posterior probability of the tract, some reconstructions can result in incomplete tracts or in unusually small tracts for which many voxels did not survive the threshold (See Figure 1 for an example). We refer to these reconstructions as “partial reconstructions”. We reran TRACULA on the tracts that resulted in either failed or partial reconstructions by reinitializing the control points of the initial spline, by setting the reinit parameter to 1 (Yendiki et al., 2011). We retained the re- initialized reconstructions if they resulted in correct tracts as defined at the beginning of this section. The same criteria were applied to patients and controls. This was the only manual intervention performed in the TRACULA pipeline. Tracts that had partial or failed reconstructions after reinitialization were excluded from subsequent analyses. We performed individualized assessment of TAI using the remaining tracts, including both the tracts that had been successfully reconstructed without need of reinitialization and the tracts that needed to be reinitialized (Section 2.7). We then compared the results when performing individualized assessment of TAI using only the tracts that had been successfully reconstructed without need of reinitialization. We used a Wilcoxon signed-rank test in Python 3 (SciPy Stats 1.3.1) to assess differences in the number of injured tracts per patient.
2.7 Multivariate analysis
Figure 2 shows the steps of the multivariate analysis. To measure the extent of WM injury in each patient, we employed a multivariate analysis that included along-tract measures of both FA and MD. For this analysis we used both the tracts that had been successfully reconstructed without need of reinitialization and the tracts that needed to be reinitialized (Section 2.6). For each subject, each tract was divided uniformly in i= 1, . . ,4 segments and the along-tract measures of FA and MD, extracted at each point along the tract in TRACULA (see Section 2.4), were averaged within each segment to obtain a total of eight features for each tract (FA1, FA2, FA3, FA4 , MD1, MD2 , MD3 , MD4). We decided to collapse the along-tract measurements in four segments to ensure that for each tract the number of features (p = 8) was lower than the number of controls (n = 33). To meet normality assumptions, we ran Shapiro-Wilk tests in Python 3 using SciPy Stats 1.3.1 (Jones et al., 2001) and tested for the normality of FAi and MDifor each of the 40 tracts in the control group. For the tracts for which the distribution of FAi or MDi departed significantly from normality (p < 0.05) a rank-based inverse normal transformation (INT) was applied to the data for both controls and patients (Blom constant = 3/ 8) (Blom, 1958) in Python 3, as in (Jolly et al., 2020).
We compared the 8-D feature vector of dMRI measures (FA1,…,4 and MD1,…,4) between each patient and the healthy population based on the Mahalanobis distance DM, which is defined as (Mahalanobis, 1936): where x is the feature vector of a specific WM tract in a patient, μ is the mean feature vector of the same tract over controls, and C is the covariance matrix of the feature vector in controls. The values of D2M reflect how far a patient, represented as a point in the 8-D feature space, is from the normative distribution, which is estimated from the controls. The Mahalanobis distance has been used in neuroimaging studies, e.g., to classify between patients with neurological diseases and controls (Lindemer et al., 2015) and to detect individual neurodevelopmental differences (Dean et al., 2017). We computed the DM between patients and healthy controls for each of the 40 tracts to build individual profiles and identify the location and severity of WM injuries in patients. For multivariate Gaussian data, the distribution of D2M values is known to be Chi-squared with degrees of freedom equal to the number of features (Gnanadesikan & Kettenring, 1972). We computed the p-values corresponding to the Chi-square statistic in Python 3 using SciPy Stats 1.3.1 (Jones et al., 2001) and considered a WM tract to be injured if its p − value was < 0.001 (p < 0.05, Bonferroni adjusted for multiple comparisons across 40 tracts). We extracted the total number of injured tracts for each patient.
To measure the accuracy of the pipeline in the task of detecting TAI, we computed its performance in discriminating between patients and controls. We first computed the DM between each control and the remaining control population in a leave-one-out fashion and then performed a receiver operating characteristic (ROC) analysis by thresholding both the alpha used to identify a tract as extreme (thresholds= 0.001, 0.01, 0.05) and the number of tracts used to assign a subject to the control or patient population (range= 1: 1: 40). For each combination of alpha and number of tracts we computed the false positives (FP; number of controls classified as patients), the false negative (FN; number of patients classified as controls), the true positives (TP; number of patients classified as patients), the true negatives (TN; number of controls classified as controls), the true-positive rate (TRP; TP/(TP + FN)) and false-positive rate (FPR; FP/(FP + TN)). We obtained the ROC curve by plotting the TPR as a function of FPR and computed the area under the ROC curve (AUC) to quantify accuracy. To visualize the difference in D2M values between the two populations we plotted the probability density functions of D2M values for patients and controls for each of the 40 tracts. Finally, we compared the results when using only the tracts that had been successfully reconstructed without need of reinitialization (Section 2.6). We used a Wilcoxon signed-rank test in Python 3 (SciPy Stats 1.3.1) to assess differences in the number of injured tracts per patient.
2.8 Testing for the effect of head motion
To ensure differences in D2M values between patients and controls were not influenced by head motion in the scanner, we first computed a total motion index (TMI) for each subject as described in (Yendiki et al., 2014). The TMI is a composite score of four motion measures: i) average volume-by-volume translation, ii) average volume-by-volume rotation, iii) percentage of slices with signal drop-out, and iv) signal drop-out severity. For each subject, the TMI was computed in the following manner: where j= 1,… ,4 are the four motion measures listed above, xmj is the value of the j-th motion measure for the m-th subject and Mj , Qj, and qj are respectively the median, upper quartile, and lower quartile of the j-th measure over all the subjects (Yendiki et al., 2014). We then used a nonparametric Wilcoxon rank sum test in Python 3 (SciPy Stats 1.3.1) to assess if there were statistically significant differences in head motion between the patients and controls. We also performed a simple least squares regression in Python 3 using statsmodels 0.10.1 (Seabold & Perktold, 2010) to test if the interaction between group and TMI significantly predicted the D2M for each of the 40 reconstructed tracts (Bonferroni adjusted alpha level = 0.001).
2.9 Testing the relationship with behavioral measures
Before undertaking our analyses, GCS total scores were calculated by summing the eye opening, motor, and verbal subscale scores. CRS-R total scores were derived by the summing the auditory function, visual function, motor function, oromotor/verbal function, communication, and arousal subscales scores.
Both the GCS and CRS-R were included in analyses as they have complementary strengths (i.e., CRS-R has higher psychometric properties than the GCS; GCS is more widely-used and faster to administer than the CRS-R) (Bodien et al., 2021; Giacino et al., 2004; Teasdale et al., 2014).
Next, non-parametric Spearman correlations were conducted to test the relationship between our primary dependent variables (i.e., CRS-R total, GCS total, days in coma, days to command-following), primary independent variables (i.e., average DM and number of injured tracts per patient), and potential confounding variables (i.e., age, TMI, days to MRI) (Edlow et al., 2016). Results of these analyses informed variable selection for our subsequent analyses in two ways: i) only independent and confounding variables that were significantly correlated with the dependent variables were included in the linear regression model(s); and ii) if our dependent variables were significantly correlated with each other then, separate linear regression models were performed to avoid multicollinearity. Correlations were conducted in R using ‘cor.test’ (RStudio Team, 2020). Significance was set at α= 0.05 to ensure identification of potential confounding variables.
Based on the results of the previously described correlations, linear regression analyses would be conducted using the average DM, the number of injured tracts, and any identified confounding variables to predict i) GCS Total scores, ii) CRS-R Total scores, ii) days in coma, and iv) days to command-following. Linear regressions were computed in R using the ‘lm’ function (RStudio Team, 2020). We also performed ordinal logistic regression using the average DM of the injured tracts, the number of injured tracts, and any identified confounding variables to predict levels of consciousness (i.e., levels from more to less severe impairment: coma, vegetative state, minimally conscious state, minimally conscious state plus, posttraumatic confusional state) (Giacino et al., 2004; Thibaut et al., 2020). This statistical approach was selected to manage the categorical nature of the level of consciousness variable. We tested the relationship between levels of consciousness and potential confounding variables (i.e., days to MRI, age, and TMI), and included variables identified to be confounding in the final ordinal logistic regression model. Ordinal logistic regression analyses were computed in R (RStudio Team, 2020) using the ‘plor’ function from the “MASS” package (Venables & Ripley, 2002). T-values were compared against the standard normal distribution to obtain t-values for the estimates. To support interpretability, odds ratios and confidence intervals were obtained by exponentiating the estimates and confidence intervals
Then, as a secondary, exploratory analysis, ordinal logistic regression analyses were implemented using the average DM and the number of affected tracts for the three tract subsets measuring motor (motor tracts (MT)), language comprehension (language comprehension tracts (LCT)), and language production (language production tracts (LPT)), and any identified confounding variables to predict the corresponding GCS and CRS-R subscales as shown in Table 2. Tract subsets were chosen based on prior literature (Catani & Mesulam, 2008; Dick et al., 2019; Herbet et al., 2018; Maffei et al., 2019; Makris et al., 2005, 2009; Mars et al., 2016; Welniarz et al., 2017). Ordinal logistic regression was selected to manage the categorical nature of the GCS and CRS-R subscales (i.e., see Table 1 for details). Non-parametric Spearman correlations were conducted in R to test the relationship between our continuous independent variables (i.e., the average DM and number of injured tracts for the motor, language comprehension, and language production tract subsets) and potential confounding variables (i.e., age, TMI, days to MRI). Ordinal logistic regression analyses were applied to test the relationship between the categorical dependent variables (i.e., GCS and CRS-R subscales) and potential confounding variables (i.e., days to MRI, age, and TMI). Independent variables or confounding variables that were significantly correlated with one another were included in separate models to manage multicollinearity. We focused these exploratory analyses on language and motor function (e.g., as opposed to visual function) as their presence is considered a positive sign of recovery (Formisano et al., 2004; Tamashiro et al., 2012), reflecting reemergence of consciousness (Giacino et al., 2014). We included the CRS-R-A (i.e., assesses basic language comprehension) in addition to the CRS-R-O/V and GCS-V (i.e., assess basic language production) as these measures would be anticipated to be related to damage to different tracts, the ventral and dorsal streams, respectively (Hickok & Poeppel, 2004; Saur et al., 2008).
3. Results
3.1 Patient Demographics and Clinical Characteristics
Patient demographics and clinical characteristics are provided in Table 2, including clinical indicators of coma duration (i.e., days in coma, days until command-following, level of consciousness at MRI); and behavioral measures of consciousness (i.e., GCS and CRS-R total scores), including subscales measuring language and motor function.
3.2 Quality assessment and feasibility of tractography reconstructions
Figure 3 shows the reconstruction of the 40 tracts for six representative patients with different mechanisms of injury and one representative control. We observed that the TRACULA pipeline successfully reconstructed 93% of the tracts (mean: 37,5; SD: 2.18) across all patients without manual intervention. This is 43% more successful reconstructions than the threshold we set to consider the TRACULA automated stream feasible in this study (Section 2.6). Figure 4 reports the number of reconstructions for each patient that i) were correctly reconstructed without the need for reinitialization, ii) resulted in either partial or failed reconstruction and were reinitialized successfully, and iii) resulted in partial or failed reconstructions even after reinitialization (See Section 2.6). For a complete list of tracts that were reinitialized for each patient see supplementary table 1.
In one patient (P1) the FreeSurfer “recon-all” stream failed due to the presence of a massive lesion that injured a substantial portion of the cerebral cortex in the right hemisphere. This patient was thus excluded from the TRACULA pipeline. For five of the remaining 17 patients (P2, P3, P6, P12, P16), reconstructions were complete and accurate and did not require reinitialization, even in the presence of focal lesions (P2, P6, P12) (Supplementary Figure S1). For the remaining twelve patients, a small number of tracts (mean = 2.47, range = 1:7) resulted in failed or partial reconstructions and required reinitialization. For nine patients out of twelve, reinitialization led to successful reconstructions (as defined in Section 2.6). For one of these twelve patients (P4) two tracts were only partially reconstructed after reinitialization (MCP, LH CST), and for two patients (P10, P14) two and three tracts respectively (P10: ACOMM, RH UF; P14: ACOMM, RH SLFII, CC BODYPF) did not improve after reinitialization and resulted in failed reconstructions (Figure 3, P14). Upon visual inspection, failed or partial reconstructions were attributable to hemorrhagic contusions in the temporal and frontal lobes, respectively, which caused disruption of three tracts each (Supplementary Figure S2).
3.3 Multivariate individualized assessment of white matter damage
The Shapiro-Wilk test showed that the distribution of FAi and MDi departed significantly from normality (p-value < 0.05) in 8.6 tracts per measure on average. Supplementary Table 2 reports the tracts along with the W- and p-value. Figure 5 shows the D2M-based individual profiles for four representative patients at different levels of consciousness (Coma, VS, MCS, PTCS). Individual profiles for all 17 patients are shown in Supplementary Figures S3 and S4. The individual profiles show the distance of each of the WM tracts from the multivariate distribution of that tract in the control population. To investigate whether a common pattern of TAI localization was visible across individuals, we grouped the reconstructed WM tracts as follows: i) commissural tracts: ACOMM, CC-BODYC, CC-BODYT, CC-BODYPF, CC-BODYP, CC-BODYPM, CC-GENU, CC-ROSTRUM, CC-SPLENIUM; ii) projection tracts: CST, ATR, AR, OR; iii) association tracts: all the other WM tracts. Commissural tracts were shown to have overall higher D2M values (Supplementary Figure S5) that resulted in a higher number of significantly extreme values (p < 0.001) across patients compared to projection and association tracts (Supplementary Figure S6). The number and type of tracts that resulted injured varied across level of consciousness (LOC) (Figure 5, Figures S3, S4). For each patient we extracted the total number of injured tracts (we considered a WM tract to be injured if its p − value was < 0.001 (Section 2.7)). On average 12.6 tracts resulted injured in patients with TBI (Figure 6).
To investigate the accuracy of the TRACULA pipeline at detecting TAI, we quantified its performance in classifying patients versus controls. We first computed the DM for each control in a leave- one-out fashion and extracted the number of injured tracts for each control. On average 3.3 tracts resulted injured in controls (Figure 6, right panel). A Wilcoxon sum rank test showed significant difference between the number of injured tracts in patients and controls (W= 4.79, p= 1.64e − 6). We then performed a ROC analysis and found an accuracy of 0.91 (Supplementary Figure 8). To visualize the difference in D2M values between the two populations we plotted the probability density functions of D2M for patients and controls. Figure 7 shows the probability density functions of D2M values for the different sections of the corpus callosum for patients and controls. For subjects with severe TBI, D2M values are shifted to the right suggesting a larger mean and greater degree of microstructural deviation from the normative reference group. Probability density functions of D2M values for patients and controls for all 40 tracts are shown in supplementary figure S7.
Finally, there was no significant difference in number of injured tracts per patient when using only the tracts that had been successfully reconstructed without need of reinitialization (i.e., discarding the tracts that needed to be reinitialized (p=0.083), suggesting the low number of tracts that needed to be reinitialized across subjects did not significantly impact the overall individualized assessment of TAI.
3.4 Effects of head motion
The TMI and the four motion measures used to compute it are reported for each subject in Supplementary Table S4. A nonparametric Wilcoxon sum rank test revealed a non-statistically significant difference in TMI between patients and controls (W= 1.91, p= 0.055) and the group-by-TMI interaction did not significantly predict D2M values for any of the 40 tracts nor the number of injured tracts per subject (all p − values > 0.05). Taken together, these results emphasize that the greater number of injured tracts detected for patients versus controls was not confounded by differences in head motion between the groups.
3.5 Relationship with behavioral measures
Non-parametric Spearman correlations identified several significant relationships as shown in Supplementary Table S4. See Supplementary Figure S10 for full results of the Spearman correlations testing associations between continuous variables.
Based on these findings, separate linear regression analyses were conducted for each of the dependent variables (i.e., GCS-Total, CRS-R Total, Days in Coma, Days to command-following) with both the average DM and number of affected tracts included as predictor variables. Days to MRI was included as a continuous covariate in the regression predicting GCS-Total, as they were revealed to be significantly correlated (rho = 0.54, p < 0.01). Average DM and number of affected tracts did not significantly predict GCS-Total, CRS-R Total, Days to command-following, or Days in coma (all uncorrected p-values > 0.05). Days to MRI did not significantly predict GCS-Total in the regression model (beta = 0.21, SE = 0.10, t = 2.048, p = 0.06). No confounding variables were identified for the ordinal regression using the average DM and number of affected tracts to predict levels of consciousness, although Days to MRI approached significance (beta = 0.37, SE = 0.20, t = 1.87, p = 0.06). Average DM and number of affected tracts did not significantly predict level of consciousness (beta = -0.01, SE = 0.02, t = -0.73, p = 0.46 and beta = -0.03, SE = 0.09, t = -0.35, p = 0.72, respectively).
In terms of our secondary, exploratory analyses with language and motor function, non- parametric Spearman correlations identified several relationships between our dependent (i.e., GCS Motor, GCS Verbal, CRS-R-Auditory, CRS-R-Oromotor/Verbal, CRS-R Motor) and independent variables (e.g., average DM of tract subsets, n of affected tracts within each subset) as shown in Supplementary Table S4. These results influenced our analyses in two ways: i) separate ordinal regression analyses were undertaken for each of the dependent variables (i.e., GCS Verbal, GCS Motor, CRS-R Auditory, CRS-R- Oromotor/Verbal, CRS-R-R Motor); and ii) the average DM of tract subsets and the number of affected tracts within a subset were not included in the same model. Before conducting ordinal logistic regressions testing the relationship between the dMRI-derived measures and language and motor subscales of the GCS and CRS-R, the relationship between age, TMI, and days to MRI and the sub-scales were individually investigated. TMI (beta = 0.11, SE = 0.05, t = 1.98, p = 0.047) showed a significant relationship with the GCS Verbal subscale and thus, was included as a confounding variable. TMI (beta = 0.14, SE = 0.06, t = 2.39, p = 0.167) and Days to MRI (beta = 0.16, SE = 0.08, t = 1.98, p = 0.047) demonstrated significant relationships with the CRS-R Oromotor/Verbal subscale and thus, were included as confounding variables. Neither the average DM, nor the number injured tracts for the motor, language comprehension, and language production subsets significantly predicted their corresponding GCS and CRS-R subscales (all uncorrected p − values > 0.05). Interestingly, however, for every one unit increase in TMI, the likelihood of a patient demonstrating a higher level of verbal expression function (as measured by the GCS Verbal subscale) increased by 18% in the ordinal logistic regression with the average DM of language production tracts as a predictor variable (beta = 0.16 , SE = 0.09, t = 1.83 , p = 0.06, Profiled 95% Confidence Interval = 0.02 to 0.38, Proportional Odds Ratio = 1.18) and by 19% in the ordinal logistic regression with the number of affected language production tracts as a predictor variable (beta = 0.17, SE = 0.09 , t =1.91 , p = 0.055, Profiled 95% Confidence Interval = 0.03 to 0.38, Proportional Odds Ratio = 1.19).
Furthermore, for every one unit increase in days to MRI, the likelihood of a patient demonstrating a higher level of verbal expression function (as measured by the CRS-R Oromotor/Verbal subscale) increased by 17% in the ordinal logistic regression with the average DM of language production tracts as a predictor variable (beta = 0.16 , SE = 0.08, t = 1.98, p = 0.04, Profiled 95% Confidence Interval = 0.02 to 0.34, Proportional Odds Ratio = 1.17) and by 18% in the ordinal logistic regression with the number of affected language production tracts as a predictor variable (beta = 0.16, SE = 0.08 , t = 1.99, p = 0.045, Profiled 95% Confidence Interval = 0.02 to 0.35, Proportional Odds Ratio = 1.18).
Finally, for every one unit increase in TMI, the likelihood of a patient demonstrating a higher level of verbal expression function (as measured by the CRS-R Oromotor/Verbal subscale) increased by 14 % in the ordinal logistic regression with the average DM of language production tracts as a predictor variable (beta = 0.13, SE = 0.06 , t = 2.39, p = 0.017, Profiled 95% Confidence Interval = 0.03 to 0.26, Proportional Odds Ratio = 1.14) and by 16 % in the ordinal logistic regression with the number of affected language production tracts as a predictor variable (beta = 0.15 , SE = 0.06, t = 2.48, p = 0.013, Profiled 95% Confidence Interval = 0.04 to 0.28, Proportional Odds Ratio = 1.16).
All statistical analyses were conducted again using data from the completed tracts only. There were no differences in the results using the DM or number of affected tracts that had been successfully reconstructed without need of reinitialization. This finding further supports that the few tracts that needed to be reinitialized across subjects did not significantly impact the overall individualized assessment of TAI.
4. Discussion
In this prospective observational study, we show that TRACULA provides automated reconstruction of white matter tracts from diffusion MRI data in critically ill patients with acute severe TBI, even in the presence of large focal lesions. These findings demonstrate the feasibility of applying the TRACULA automated tractography pipeline (Maffei et al., 2021; Yendiki et al., 2011) during the acute stage of severe TBI, addressing a major barrier to the assessment of TAI in this population. If replicated in larger studies, these proof-of-principle results suggest that TRACULA can be used by clinicians for acute detection of TAI in critically ill patients – information that can inform prognosis, guide therapeutic decision-making, and provide additional information to the standardized clinical assessment. We discuss the clinical implications and technical limitations of the current results and provide further directions for optimization of the proposed pipeline.
With respect to TAI detection, TRACULA provided individualized profiles of WM disruption that differentiated patients from healthy controls and from one another (Fig. 5, Fig. 6), providing an acute assessment of TAI burden for each individual patient. This finding builds on recent studies showing that DT measures can be used to identify the neuroanatomic distribution and severity of TAI at the individual level (Jolly et al., 2020), an important goal given the well-established heterogeneity of this condition (Douglas et al., 2019). Multivariate approaches that combine different neuroimaging measures have shown higher power in distinguishing different clinical populations from controls (Dean et al., 2017). Indeed, the multivariate analysis of along-tract DT metrics revealed marked variability in the number and location of injured WM tracts between patients (Fig. 5, Fig. 6). While no clear pattern of WM damage was visible within LOC, results of the multivariate individual analysis revealed commissural connections to be overall more significantly and more frequently affected compared to projection and association fibers (Fig. 5, Fig. S5, Fig. S6). Specifically, the frontal, parietal, and temporal sections of the CC were the most affected, consistent with previous histological and neuroimaging studies in humans (Moen et al., 2014; Nolan et al., 2020; Ubukata et al., 2016) and with studies in animal models of TAI (Baker et al., 2004; Gennarelli et al., 1982).
Interestingly, individual multivariate measures of TAI did not show a significant relationship with behavioral measures of consciousness (i.e., GCS-Total, CRS-R Total, days to command-following, days in coma) or basic language and motor function (i.e., GCS Verbal, GCS Motor, CRS-R Auditory, CRS-R Oromotor/Verbal, CRS-R Motor). There are several potential explanations for this observation. First, widespread tract disruption in the brainstem may have confounded any associations between i) the global burden of TAI and behavioral measures of consciousness; and ii) TAI within domain-specific tracts and behavioral measures of language or motor function. TRACULA does not yet provide automated reconstructions of brainstem tracts – a key direction for future work in this field, especially as the presence of TAI in the brainstem pathways can cause altered consciousness, and impaired language and motor function (Edlow et al., 2013; Snider et al., 2019). Second, it is important to acknowledge that the GCS and CRS-R subscales are not “pure” measures of motor and language function (e.g., highest score on the GCS motor requires intact auditory comprehension) (Schnakers et al., 2015), and these scores may not reflect “covert consciousness” in the presence of cognitive-motor dissociation (Schiff, 2015). As prior correlations between acute tractography data and long-term cognitive function indicate a possible prognostic role for tract-specific TAI assessments (Wang et al., 2008), it will be important in future work to assess if these multivariate tract-specific measurements in the ICU are relevant to overall and/or domain-specific recovery in the subacute-to-chronic stages of care, when language and motor assessments may be less affected by global injury and/or brainstem injury.
TRACULA allowed to automatically reconstruct on average 93% of the WM tracts with minimal manual intervention, providing potential for dissemination to a broad range of hospital settings without the need for local tractography expertise. However, the potential clinical translation of this pipeline for TAI detection in the ICU will depend upon additional methodological and logistical factors that require further study. While on average TRACULA could successfully reconstruct 37.5 of 40 tracts without the need of manual intervention, an average of 2.4 tracts resulted in failed or partial reconstructions and needed to be reinitialized in 12 patients (Supplementary Table 1). This step relied upon a visual quality assessment from a tractography expert that would limit its applicability to clinical settings. Furthermore, some of the reinitialized tracts could not be successfully reconstructed due to the presence of focal lesions in two patients. While it has been previously shown that TRACULA is robust to errors in the anatomical segmentation (Zöllei et al., 2019), because prior probabilities are based on the relative positions of the bundles with respect to the labels, large lesions can lead to missing labels that can affect tract reconstructions. Further optimization of the TRACULA pipeline will include i) automated quality assessment of tract reconstructions and ii) correct handling of missing FreeSurfer anatomical labels, including in cases where the FreeSurfer “recon-all” pipeline fails due to the presence of lesions (e.g., Patient 1 in this study). Additionally, the dMRI data in this study were acquired on a clinical scanner in a research hospital. More specifically, the dMRI sequence used high spatial resolution (2 mm isotropic) and high angular resolution (b= 2,000 s/mm2, 60 diffusion-encoding directions) leading to a longer acquisition time (∼ 9 minutes), relative to more standard clinical diffusion sequences (Chilla et al., 2015). Thus, while obtained on a clinical scanner, the dMRI data collected in this study were likely of “higher- quality” than dMRI data acquired on clinical scanners in hospitals not associated with a research facility, and it will be important to determine whether automated reconstruction of WM tracts with TRACULA is robust in the setting of lower-resolution, lower-quality dMRI data in the future. It is also important to consider that the individualized assessment can only be performed comparing patients to a control cohort scanned with the same dMRI sequence. For this pipeline to be applied in a clinical setting that does not have data from healthy controls, future optimizations of this pipeline should explore its robustness to dMRI data acquired on a different scanner and with a different dMRI sequence. Finally, it is important to consider that some patients with acute severe TBI are not stable enough to travel to an MRI scanner, and thus this pipeline may not be feasible to use until these patients reach the subacute stage of care.
The results of this study should also be interpreted in the context of further methodological limitations, including the small sample size. Although there was no statistically significant difference in age between the patient and control cohorts, the controls were not enrolled with an even distribution of age, sex and educational attainment. Identification of optimal characteristics and sample sizes for control cohorts is an area of active inquiry (Jolly et al., 2020), and future studies may leverage large normative databases such as the Human Connectome Project in this effort (Bookheimer et al., 2019; Harms et al., 2018).
In summary, this prospective observational study demonstrates the feasibility and utility of an automated tractography pipeline for detecting TAI in the acute injury phase of severe TBI. Quantitative analysis of patient-specific TAI burden in the ICU could increase the accuracy of outcome prediction in this population and guide decisions regarding the provision of life-sustaining treatment and rehabilitation.
Data Availability
The TRACULA version and the training dataset are included in FreeSurfer 7.2 (https://github.com/freesurfer/freesurfer/tree/fs-7.2). Extensive documentation and tutorials are available on the FreeSurfer wiki (https://surfer.nmr.mgh.harvard.edu/fswiki/Tracula). Part of the raw DWI data are available at https://openneuro.org/datasets/ds003367/versions/1.0.0. All remaining DWI data produced in the present study are available upon request to the authors.
Funding
This study was supported by the NIH National Institute of Neurological Disorders and Stroke (R01NS119911, R21NS109627, RF1NS115268, U01NS1365885, U01NS086090), NIH Director’s Office (DP2 HD101400), NIH National Institute of Biomedical Imaging and Bioengineering (R01EB021265, U01EB026996), NIH National Institute of Mental Health (U01MH108168, R56MH121426), James S. McDonnell Foundation, American Academy of Neurology, Tiny Blue Dot Foundation, and National Institute on Disability, Independent Living and Rehabilitation Research (NIDILRR), Administration for Community Living (90DPCP0008-01-00, 90DP0039).
Disclosures
The authors report no relevant disclosures.
Acknowledgments
We thank the nursing staffs of the Massachusetts General Hospital Neurosciences ICU, Multidisciplinary ICU, and Surgical ICU. We also thank the Massachusetts General Hospital MRI technologists for assistance with data acquisition. We are grateful to the patients and families in this study for their participation and support.