Abstract
Background A robust definition of normal is required to confidently identify vertebral abnormalities such as fractures. Between 1976 and 1980, the 2nd National Health and Nutrition Examination Survey (NHANES-II) was conducted. Justified by the prevalence of neck and back pain, approximately 10,000 lateral cervical spine and 7,000 lateral lumbar spine X-rays were collected. Demographic, anthropometric, health, and medical history data were also collected. This resource can be used for establishing normative reference data that can subsequently be used to diagnose abnormal vertebral morphology.
Purpose 1) Develop normative reference data for vertebral morphology using the lateral spine radiographs from NHANES-II. 2) Document sources of variability.
Subject Sample Nationwide probability sample to document health status of the United States.
Methods The coordinates of the four vertebral body corners were obtained using previously validated, automated technology consisting of a proprietary pipeline of neural networks and coded logic. These landmarks were used to calculate six vertebral body morphology metrics: 1) anterior/posterior vertebral body height ratio (VBHR); 2) superior/inferior endplate width ratio (EPWR); 3) forward/backward diagonal ratio (FBDR); 4) height/width ratio (HWR); 5: angle between endplates (EPA); 6) Angle between posterior wall and superior endplate (PSA). Descriptive statistics were generated and used to identify and trim outliers from the data and obtain a gaussian distribution for each metric. Descriptive statistics were tabulated using the trimmed data for use in quantifying deviation from average for each metric. The dependency of these metrics on sex, age, race, nation of origin, height, weight, and BMI was also assessed.
Results Computer generated lumbar landmarks were obtained for 42,980 vertebrae from lumbar radiographs and 54,093 vertebrae from cervical radiographs for subjects 25 to 74 years old. After removing outliers, means and standard deviations for the remaining 35,275 lumbar and 44,938 cervical vertebrae changed only slightly, suggesting that normal morphology and intervertebral alignment is dominant in the data. There was low variation in vertebral morphology after accounting for vertebra (L1, L2, etc.), and the R2 was high for analyses of variance. The EPWR, FBDR and PSA generally had the lowest coefficients of variation. Excluding outliers, Age, sex, race, nation of origin, height, weight, and BMI were statistically significant for most of the variables, though the F-statistic was very small compared to that for vertebral level. Excluding all variables except vertebra changed the R2 very little (e.g. for the lumbar data, VBHR R2 went from 0.804 to 0.795 and FBDR R2 went from 0.9005 to 0.9000). Reference data were generated that can be used to produce standardized metrics in units of standard deviation from average. This allows for easy identification of abnormalities resulting from vertebral fractures, atypical vertebral body morphologies, and other congenital or degenerative conditions. Standardized metrics also remove the effect of vertebra thereby enabling data for all vertebrae to be pooled in research studies.
Conclusions The NHANES-II collection of spine radiographs and associated data may prove to be a valuable resource that can facilitate standardized spine metrics useful for objectively identifying abnormalities. The data may be particularly valuable for identification of vertebral fractures, although X-rays taken early in life would be needed in some cases to differentiate between normal anatomic variants, fractures, and vertebral shape remodeling.
Introduction
Although vertebral fractures can be asymptomatic, these fractures can be a source of symptoms. One of the most important predictors of subsequent fractures is the number of prior fractures.(1-3) Vertebral fractures are therefore diagnostically important. Fractures can be missed in clinical practice(4,5). Ferrar et al. succinctly identified the challenge of fracture detection: “The identification of vertebral fractures is problematic because (1) ‘‘normal’’ radiological appearances in the spine vary greatly both among and within individuals; (2) ‘‘normal’’ vertebrae may exhibit misleading radiological appearances due to radiographic projection error; and (3) ‘‘abnormal’’ appearances due to non-fracture deformities and normal variants are common, but can be difficult to differentiate from true vertebral fracture.”(6)
A typical definition of a fracture is “20% or greater loss in expected vertebral body height.” This definition requires definition of “expected” vertebral body height and that requires accurate measurements of vertebral body shape in a large and representative population. Validation of the clinically meaningful threshold level for change in vertebral dimensions also requires a reliable reference standard. Inaccuracies in measuring shape and the challenge of finding a suitable reference standard are well documented. (6-43) One option is to use the patient’s own adjacent vertebral bodies as a reference, but there is no definitive method to assure that the adjacent levels are normal, or reference data to know how much variability between levels is normal. Normative vertebral body shape metrics that represent a large proportion of the population and account for age, sex, and other confounding factors have not been previously published. Robust reference data and an understanding of the important confounding factors would facilitate more accurate and reproducible diagnosis of the presence and severity of vertebral body fractures.
Between 1976 and 1980, the 2nd National Health and Nutrition Examination Survey (NHANES-II) was conducted1. This was a nationwide probability sample to document health status of the United States. Justified by the prevalence and societal impact of neck and back pain, approximately 10,000 lateral cervical spine and 7,000 lateral lumbar spine X-rays were collected as part of this survey. Demographic, anthropometric, health, and medical history data were also collected. This resource can be used for establishing normative reference data that can subsequently be used to diagnose abnormalities such as fractures, vertebral deformities, and congenital abnormalities. These data can also be used to determine if the sex, age, race, height, weight and/or body mass index (BMI) of an individual are needed to predict normal vertebral morphology.
Methods
The NHANES-II images and data were obtained through public access2. A previously validated series of neural networks and coded logic (Spine Camp™, Medical Metrics, Inc., Houston, TX) were used to obtain four landmarks for each vertebral body from L1 to S1 (Figure 1). Although the NHANES-II lumbar X-rays almost always included up to T10 in the field-of-view, the neural networks were only trained to recognize L1 to S1. The neural networks include quality control networks to assure the X-ray was a lateral lumbar view, manipulate the image if needed so upper vertebrae are toward the top of the X-ray, with spinous processes toward the left side, and with bone being whiter than air. Neural networks and coded logic were then used to segment the bone, find individual vertebrae, label the vertebrae, find the four corners, and finally refrain from reporting landmark coordinates when the confidence of the networks was too low. All the neural networks were trained with over 50 thousand lateral X-rays where analysts had previously digitized standardized landmarks. The NHANES images were not used in training the neural networks. The analysts used Quantitative Motion Analysis software (QMA®, Medical Metrics, Inc., Houston, TX) to place the landmarks. The QMA® software has previously been validated (44-49).
Details of anatomic landmark placement. Dashed lines show the assumed mid-sagittal plane of the superior and inferior endplates, identified as bisecting the radiographic shadows of the left and right sides of the endplates (yellow arrows). The red circles show the four landmarks used to measure vertebral body morphology. The red arrow points to an anterior osteophyte that is ignored. The dotted lines show the anterior and posterior vertebral body heights.
Details of landmark placement are provided, since as Keynan et al. noted, “There is a lack of standardization in the literature regarding choice and technique for the measurement of these parameters.”(37). The standardized landmark placement was focused on the mid-sagittal plane of each vertebral body (Figure 1), excluding residual uncinate processes, and placing landmarks that identify the corners of the vertebral bodies prior to any osteophyte formation (50,51). Landmark placement is similar to that illustrated in Fig 3 of Genant et al (52). The posterior superior landmarks are placed so as to represent the endplate as it would appear on a mid-sagittal slice of a CT exam, rather than up at the top of the posterior ridges on the upper endplate similar to logic described in Keynan et al. (37). When the X-ray beam path through a vertebra is not perpendicular to the mid-sagittal plane of the vertebra, the left and right sides of the vertebral endplates and the left and right aspects of the posterior vertebral body may be seen on the X-ray.(16,20) It is understood that the radiographic shadows that might be interpreted as the left and right rims of the endplates might not exactly represent the endplate rims(16), but it is assumed that the line bisecting these shadows is the best-available radiographic approximation of the mid-sagittal plane. Therefore, the mid-sagittal plane is assumed to be midway between the radiographic shadows of the left and right sides of endplate, and midway between the radiographic shadows of the left and right aspects of the posterior wall. Multiple examples of landmark placement are provided online3 to help readers appreciate the nuances of how landmarks were placed. Vertebral morphometry calculated from landmarks is dependent on details of landmark placement, and it is therefore important to appreciate landmark placement details. The examples provided include vertebrae with abnormal morphology (as defined by the metrics presented in this paper) and help to appreciate the variety of conditions that may be identified through utilization of standardized morphology metrics.
The neural networks and coded logic produced landmark coordinates for the vertebral bodies from L1 to S1 in lumbar spine radiographs and C2 to C7 in cervical spine radiographs. An anomaly detection algorithm was then run on those data to identify anomalies due to very poor image quality, severe image artifacts, and very unusual anatomy. Those anomalies were removed from subsequent data analysis.
The coordinates of the four vertebral body corners were used to calculate six vertebral body morphology metrics:
anterior / posterior vertebral body height ratio (VBHR)
superior / inferior endplate width ratio (EPWR)
forward / backward diagonal ratio (FBDR)
height / width ratio (HWR)
angle between the superior and inferior endplates (EPA)
angle between the posterior wall and the superior endplate (PSA)
The NHANES-II lumbar radiographs were obtained with the subject in the lateral recumbent position with body flexion (53). As this is not the typical position used to obtain a radiograph for vertebral fracture detection, global and local spinal lordosis were not calculated. The NHANES-II cervical spine radiographs were obtained with the subject standing and a 25 lb weight in each hand to pull down the shoulders, and the head and neck flexed forward. (53)
Principal component analysis was used to determine if the dimensionality of the data can be reduced (Stata Ver 15). The dependency of the six vertebral morphology metrics on sex, age, race, nation of origin, height, weight, and BMI was assessed using analysis of variance (ANOVA, Stata Ver 15). In the NHANES-II study, race was recorded as white, black, or other. Nation of Origin was one of 13 categories.
Descriptive statistics were tabulated and used to identify and trim outliers in the data that may compromise the definition of “normal” vertebral morphology (54-57). Outliers <> 3 standard deviations (SD) from the mean were inspected to identify typical explanations for each type of outlier. The data were trimmed by excluding all vertebrae where any of the six metrics was <> 2 SD from the mean, and descriptive statistics were tabulated for use in quantifying deviation from average for each metric. Correlations between the six morphology metrics were also explored. Finally, means and standard deviations were tabulated for each metric, for each vertebra from L1 to S1 and from C2 to C7. Those means and standard deviations were then used to calculate standardized metrics (standard deviations from mean) for the entire data sets. All statistical analysis was completed using Stata ver 15 (College Station, TX, USA).
The NHANES-II study also includes a response to the question: “Have you ever had pain in your back on most days for at least two weeks”. That data was analyzed for associations with the six morphology metrics.
To assess reproducibility of the metrics, the six morphology metrics were also calculated for flexion and extension radiographs for 415 asymptomatic volunteers. This is an expanded version of previously published data.(58) The morphology metrics were calculated from landmark coordinates obtained using the same neural networks and coded logic used with the NHANES-II X-rays. The morphology of each vertebra should not have changed between flexion and extension, unless there was an unhealed vertebral fracture, which is unlikely in an asymptomatic person. However, the position of each vertebrae relative to the central beam path did change substantially between flexion and extension in most cases. The morphology measured from the flexion X-ray was compared to the morphology calculated from the extension X-ray using Bland-Altman limits of agreement. Any variability in the metrics is assumed to be error due to variability in radiographic projection.
To further document expected variability in the metrics specifically due to variations in radiographic projections that can occur in clinical practice, digitally reconstructed radiographs were used. Thin-slice (1 mm or less) anonymized computed tomography exams of the lumbar spine of 26 individuals or cadavers (from past research studies) were interpolated to 0.2 mm isotropic resolution. The three-dimensional coordinates of the four anatomic landmarks described in Figure 1 were digitized, in 3D, for the mid-sagittal plane of each vertebra from L1 to S1, using intersecting axial, coronal, and sagittal slices. Landmarks were thus placed in the mid-sagittal plane at clear and well-defined locations. This may be as close as possible to “gold standard” landmark placement. The image processing and landmark digitization was completed using Slicer 3D.(59) Custom python code was developed to create 2D, digitally reconstructed radiographs (DRR) from the 3D data using Plastimatch DRR (60). The projection matrices used by Plastimatch DRR to create the X-rays were also used to calculate, from the known 3D coordinates, the precise coordinates of each landmark on the 2D simulated X-ray. A 40 inch source-to-image distance was used for all X-rays. The base x-ray was centered on the centroid of the L3 vertebra (this is the baseline isocenter). Forty additional X-rays were generated with the following variations:
random beam tilts of between ± 10 deg in each of the sagittal and axial planes
random displacements from the baseline isocenter ± 2 endplate widths (EPW) in the AP and LR directions, plus ± 3 EPW shifts in the cranial-caudal direction
random image rotations between ± 24 deg
These variations resulted in a wide range of radiographic projections. The landmarks in every image were precisely known so there were zero errors in landmark coordinates relative to the 3D coordinates. This allows for documenting the effect of variability in vertebral morphology metrics that are due to radiographic projection, without uncertainty in landmark placement. These precisely calculated landmarks were analyzed to obtain the six vertebral body morphology metrics. There were thus 41 morphology measurements for every vertebra from L1 to S1 for each of 26 independent CT exams. The variability in each metric was assessed using the standardized version of the metric, since all of these metrics are thereby expressed as standard deviations from the average and that allows for combining data for all vertebrae and for consistent interpretation of the data. Although the coordinates of landmarks in every simulated X-ray were precisely calculated, the quality of the simulated radiographs was below clinical standards and not appropriate for use toward validating the accuracy of neural networks in obtaining landmarks.
Results and Discussion
Due to the large quantity of data to be presented, the results and discussion are combined.
Summary of Data Analyzed
Computer generated lumbar landmarks were obtained for 42,980 vertebrae from 7,364 of 7,423 total lumbar radiographs and 54,093 vertebrae from 9,662 out 9,669 cervical radiographs for subjects 25 to 74 years old. 50 lumbar X-rays were not analyzed due to an error in file transfer and seven cervical and nine lumbar X-rays were non-analyzable. Landmark generation for all lumbar and cervical radiographs took 22 hrs on a dual CPU, 4 GPU (Nvidia T4) production server. The age range was 25 to 74 years and the demographic and body size data are summarized in Table 1.
Sample size, average age and [SD], and average BMI (kg/m2) and [SD]. Age and BMI were missing for two subjects.
Note that females are under-represented in the lumbar data. By design of the NHANES-II study, no lumbar X-rays were taken for pregnant women or women under 50.(61) An additional limitation relates to the data on race and nation of origin recorded in the NHANES-II data. Based on the cervical data, 86.9% of NHANES-II subjects were “white”, 11.2% “black” and the rest “other”. A more uniform representation of races would likely be needed to fully understand the importance of race. The same is true for the “nation of origin” data in NHANES-II. NOTE: the language in quotes is as used in the NHANES-II documentation: 74.1% of subjects are classified as “OTHER EUROPE SUCH AS GERMAN, FRENCH, ENGLISH, IRISH”, 10.8% classified as “BLACK, NEGRO OR AFRO-AMERICAN”, and the rest distributed in 1 of 11 other classifications. Djoumessi et al. have reported differences in the VBHR between different nations of origin (56). It would be valuable to repeat this analysis on X-rays that better represent races and nations of origin not well represented in the NHANES-II data. The ages of the subjects (using cervical data) were also biased toward older ages (Figure 5). That may explain the relatively high rate of vertebrae with at least one abnormal morphology metric.
Distribution of ages in the NHANES-II study.
Principal component analysis (PCA) of the lumbar data documented that 5 primary components can explain 99.8% of the variance in the morphology data. PCA of the cervical data documents that 6 components explain 99.7% of the variance. Since PCA does not appreciably reduce the dimensionality of the data, it was not investigated further.
Effect of covariates
Analyses of variance (after excluding outliers) revealed that age, sex, race, and nation of origin were statistically significant for most of the variables (Tables 2 and 3), and the R2 was high. However, the F-statistic was generally small for all variables except vertebral level. Excluding all variables except vertebra changed the R2 very little (e.g. for the lumbar data, VBHR R2 went from 0.848 to 0.842 and HWR R2 went from 0.839 to 0.800). This documents that age, sex, race and body size are significant, but contribute relatively little to the variability in the morphology metrics. Therefore, except for vertebral level, all data were pooled for subsequent analysis. Previous studies have shown a dependence of vertebral morphology on age, gender, race and other variables, but the relative magnitude of the influence was not described.(62,63)
Results of multivariate analysis of variance to determine the importance of multiple independent variables for explaining variability in the lumbar vertebral morphology metrics. The “F” columns provide the “F” statistic. The “P>F” columns provide the statistical significance. The F values document how much of the variability in the dependent variable is explained by each independent variable.
Results of multivariate analysis of variance to determine the importance of multiple independent variables for explaining variability in the cervical vertebral morphology metrics.
Descriptive Statistics for Normative Reference
Descriptive statistics for the morphologic parameters are provided in Tables 4 and 5. Mean values changed very little for the morphology metrics after removing the outliers (<> 2 SD from mean), suggesting that normal morphology is dominant in the data. Standard deviations were reduced after removing outliers. These observations are the same as reported in a similar study (56). Data with a Gaussian distribution have a skewness of 0 and kurtosis of 3. Excluding outliers substantially reduced the skewness from an average of 0.27 to 0.05, and substantially narrowed the tails of the distribution from an average kurtosis of 4.1 to 2.5. This documents that the distribution of data was more Gaussian after removing outliers. There was remarkably small variation in vertebral morphology after accounting for vertebra (L1, L2, etc.). The PSA, FBDR, and EPWR generally had the lowest coefficients of variation (Tables 6 and 7).
Means and [standard deviations] for 35,275 lumbar vertebrae. All vertebrae where any metric was <> 2 Std Dev from the average for all vertebrae in the NHANES-II study were excluded.
Means and [standard deviations] for 44938 cervical vertebrae. All vertebrae where any metric was <> 2 Std Dev from the average for all vertebrae in the NHANES-II study were excluded.
Coefficients of variation (CV = SD / mean) for 35,275 lumbar vertebrae. Note that the large values for L3 EPA is due to the mean value being very close to zero (ref Table 4) and the CV is difficult to interpret when the denominator is near zero.
Coefficients of variation (SD / mean) for 44,938 cervical vertebrae.
Figure 6 provides comparison of VBHR data in Table 4 to some previously reported morphology metrics.(54-57,64-66)
The six morphologic variables were significantly correlated with each other (Tables 8 and 9). EPA and VBHR were almost perfectly correlated, suggesting that only one of these two variables needs to be analyzed. Although VBHR and EPA were almost perfectly correlated, the other metrics were not. That observation supports that the metrics each independently describe some unique aspect of vertebral morphology. It cannot be determined from the NHANES data alone which of the six metrics will prove most valuable in research and clinical practice. It may be worth exploring the utility of each metric, and combinations of metrics, in future studies where the goal is to diagnose fractures or deformities, or to predict response to treatment.
Pearson’s correlation coefficients between morphologic variables for 35,275 lumbar vertebrae, after excluding vertebrae where any of the morphologic variables was ±2 SD from the average for 42,980 vertebrae.
Pearson’s correlation coefficients between morphologic variables for 44,938 cervical vertebrae, after excluding vertebrae where any of the morphologic variables was ±2 SD from the average for 54,093 vertebrae.
Multiple prior publications report efforts to establish reference data that can be used to help diagnose vertebral body fractures. (6-10,12-15,17-43) Although multiple metrics need to be considered, such as intervertebral angles, listhesis and canal occlusion (37), compression of the vertebral body height is commonly used to identify fractures. When severe loss of vertebral body height has occurred, clinicians can unanimously agree that a fracture has occurred, but when the fracture is subtle, reference data may be important. Mid-vertebral landmarks that could be used to measure mid-vertebral height were not routinely available in the images used to train the neural networks, so mid-vertebral body heights could not be obtained automatically. Change in vertebral shape is considered one of the best indicators of a fracture (6). The NHANES-II images provide only one radiograph per individual, so change in shape cannot be explored. Nevertheless, the low variability of vertebral shape within a large population supports the potential for using a single X-ray to detect abnormal morphology with reference to the NHANES-II data. It is also likely, that with advancements in the application of artificial intelligence to producing spine metrics, the data in Tables 4 and 5 may provide valuable quality control data to determine if the AI predictions are aberrant or if caution is needed in interpreting the results.
Application of the vertebral morphology metrics, such as in Tables 4 and 5, to the challenge of documenting vertebral fractures will require defining a threshold level of these metrics that can be used to document the presence/absence of a fracture. Eastell et al. and McCloskey et al. used a threshold of 3 SD for the VBHR (64,67). Black et al. used a threshold of 20% difference from normal VBHR (25). Genant et al. used the criteria of a 15% difference compared to the mean VBHR of a normal population (52). The average VBHR in Table 4 for L1 to L4 is about 1 and one SD is about 0.042, so a 15% difference as used by Genant et al. would be equivalent to approximately a VBHR Z-score of 3.5. A gold-standard test for a true vertebral fracture does not exist, so validating a threshold remains a challenge. An alternative, treatment specific approach, would be to validate a threshold that predicts a positive response to treatment. That approach could potentially be retrospectively tested at low cost using automated measurements of morphology applied to imaging and data from prior studies that collected appropriate imaging.
Differences between levels
It is reasonable to hypothesize that differences in morphology between adjacent levels may help to identify abnormal morphology. For example, if one level has an anterior wedge fracture, then a significant difference in the VBHR would be expected when compared to the immediately adjacent levels. A problem occurs when adjacent vertebrae are both fractured or otherwise abnormal. Comparison to adjacent levels has been a criteria used or proposed by many investigators, so data that could be used to implement adjacent vertebrae comparisons are provided in tables 10 and 11. For example, using data in Table 10, if the VBHR is to be compared between L4 and L5 vertebrae, and normal variability is defined as anything within the 95% confidence interval for data from the NHANES-II study, than the VBHR when comparing L4 to L5 should not vary by more than 0.124 ± 1.96*0.059, or 0.007 to 0.24.
The means and [standard deviations] for the absolute differences between adjacent lumbar vertebrae for each of the vertebral morphology metrics. These data include only those vertebrae with no abnormalities in any morphology metric using the data in Table 4. The sample size is smaller, since if either vertebra had any abnormality, that comparison between levels was excluded.
The means and [standard deviations] for the absolute differences between adjacent cervical vertebrae for each of the vertebral morphology metrics. These data include only those vertebrae with no abnormalities in any morphology metric using the data in Table 5.
Abnormalities in the NHANES data
Mean and standard deviation data from Table 4 and Table 5 were used to calculate the vertebra specific Z-score for each metric according to the following equation
where x is the metric for a vertebra, μ is the vertebra specific mean for the corresponding metric, and σ is the vertebra specific standard deviation for the corresponding metric. Based on these Z-scores, 28,813 of 42,980 lumbar vertebrae (67%) were within ± 2 SD of average (Table 4) for all six of the metrics. These number are less than obtained after trimming since the standard deviations in Table 4 are smaller than originally used to identify outliers, and are therefore more sensitive to abnormalities. Similarly, 36,869 of 54,093 cervical vertebrae (68%) were within ± 2 SD of average (Table 5) for all six of the metrics.
Box and whisker plots illustrate the magnitude of outliers that exist in the NHANES data (Figures 7 and 8). Note that the medians are generally centered within the interquartile range and the whiskers are symmetric. Positive VBHR outliers represent posterior wedge-shaped vertebral bodies, while negative VBHR outliers represent anterior wedge-shaped vertebrae. The wedging may either be from fractures, remodeling, or congenital conditions and identifying a specific cause would require more than a single radiograph. Positive EPWR outliers represent vertebrae that are unusually wide superiorly versus inferiorly, and negative outliers are the opposite. Positive and negative FBDR outliers generally appear to be best explained by remodeling, perhaps after a fracture that caused splaying of the fracture fragments. Positive HWR outliers appear as unusually tall vertebrae, while negative HWR outliers generally appear to be best explained by a crush fracture. Positive EPA outliers appear as anteriorly wedge-shaped vertebrae while negative outliers appear as posteriorly wedge-shaped vertebrae. Positive PSA outliers generally appear as posterior wedge-shaped vertebrae and negative outliers generally appear as anteriorly wedge-shaped vertebrae. It is not possible to definitively identify the cause of each abnormality from a single radiograph. Examples of each of high positive and low negative outliers, for each of the six metrics, including landmarks, can be viewed online. (https://www.dropbox.com/sh/qzrocrh86goxarx/AAADRXs5HoEjVdRcGIwA4beIa?dl=0).
Box and whisker plots for the six morphology metrics calculated for 42,980 lumbar vertebrae. These plots document the types and magnitude of outliers in the data.
Box and whisker plots for the six morphology metrics calculated for 54,093 cervical vertebrae. These plots document the types and magnitude of outliers in the data.
Error analysis for the morphology metrics
One advantage of the six vertebral morphology metrics is the lack of dependence on radiographic magnification. All metrics are ratios of single vertebra dimensions or angles between anatomic reference lines and should not vary appreciably with radiographic magnification (assuming sufficient spatial resolution). This is important since radiographic magnification can vary substantially between lumbar X-rays (78-80) and this can result in errors in measurements of vertebral dimensions (26).
Some error in the data from Tables 4 and 5 may occur as a result of the challenges of vertebral labeling. A systematic approach to definitively labeling vertebrae requires whole spine imaging (68). An X-ray or CT of the entire spine from occiput to sacrum was not available (X-rays typically included T10 to S1), so it cannot be certain that vertebrae were optimally labeled if there were <> 5 lumbar vertebrae transitional vertebrae, or other abnormalities (69,70). Spines with <> 5 lumbar vertebrae are uncommon(71). Transitional vertebrae may be more common(72), but it is unclear whether this would consistently compromise labeling from a lateral spine x-ray. The assumption is that the vertebrae were correctly labeled in the majority of cases and the possibility of mislabeled vertebrae in a minority of cases would not significantly affect the results. The prevalence of transitional vertebrae reported in the peer-reviewed literature varies between 4 and 36% (73-77). The prevalence in the NHANES-II radiographs is unknown.
The six morphology metrics calculated from flexion X-rays were compared to the metrics calculated from paired extension X-rays for 2,132 vertebrae from L1 to S1. There should be no difference in vertebral morphology between flexion and extension (assuming no mobile vertebral body fractures). Table 12 provides the results of the Bland-Altman limits of agreement analysis. This variability is assumed due to the differences in radiographic projection between images. Typically, there would not be much difference in radiographic projection between carefully obtained flexion versus extension X-rays, and this is reflected in the mean differences in Table 12, but substantial differences did occur in some subjects, and this is reflected in the low and high limits of agreement in Table 12. This experiment documents that some morphology metrics could vary by as much as 1.65 Std Dev due to radiographic projection differences. Thus, it is important to strive for radiographs where the vertebrae of interest are positioned near the center of the radiograph. The experiment with the simulated X-rays, discussed below, compliments and helps to explain results in Table 12.
Results of a Bland-Altman limits of agreement analysis for morphology (expressed as SD from the average) measured from 2,132 lumbar flexion X-rays compared to paired extension X-rays. The mean differences between the two measurements, as well as the low and high limits of agreement are provided. These data are specific to the methods used to obtain anatomic landmarks from the radiographs.
Variability due to radiographic projection
The experiment using simulated X-rays was completed to help understand how variability in radiographic projection can affect the vertebral morphology metrics. This could be important, for example, with sub-optimally positioned patients, or in patients with scoliosis or other deformities, all of which can result in the x-ray beam passing from source to image obliquely to the vertebral endplates or the posterior wall of the vertebrae. There should ideally be no variability in vertebral morphology in the experiment using the simulated x-rays, since all x-rays (for a subject) were created from the same CT exam. All landmarks were precisely calculated from one set of 3D landmark coordinates, based on the geometry of the radiographic projection. Any variability is due to the effect of variability in radiographic projection on calculation of morphology. Variability was assessed using the standardized version of the metrics so that all metrics have the same units (standard deviations from average) and can thus be more easily compared. Conventional measures of variability, such as the coefficient of variation (CV = std dev / mean) or the coefficient of quartile variation (CQV = 100* (Q3 – Q1)/ (Q3 + Q1)) were not used as the mean values and (Q3 + Q1) were close to zero for some of the standardized metrics and CV or QCV are uninterpretable when the denominator approaches zero. Instead, the median and the range (max – min) for the standardized metrics was calculated for the 41 simulated X-rays that were generated from each CT exam. The median and 95th percentile for these ranges, across the 26 CT exams, are provided in Table 13. In Table 13, the observed ranges of standardized metrics due to variability in radiographic projection was largest for FBDR and PSA, although the experiment demonstrates that all of the morphology metrics can vary by over 1 std dev, solely due to variability in radiographic projection. This is not as much a weakness in the metrics as it is evidence to use caution when interpreting these metrics in the presence of substantial out-of-plane imaging. It is of course possible that strategies can be developed for mitigating the effect of variability in radiographic projection. One option would be a neural network that determines the type and magnitude of out-of-plane and uses that data to correct the morphology metrics. It is assumed that the error sources discussed above were random sources of error and would not appreciably affect the means and standard deviations in Table 4 and Table 5.
Variability in standardized metrics (Std Dev from Average Normal) due to variability in radiographic projection. Each of the morphology metrics was calculated for each of the 41 radiographic projections generated from each CT exam. The range in each metric (5th to 95th percentile) was then calculated over all 41 random variations in radiographic projection. This table has the median and 95th percentile variability over the 26 CT exams. For example, due to a wide range in radiographic projections, EPA will typically vary by 0.37 Std Devs but could vary by as much as 1.1 Std Devs from the average EPA in a normal vertebra.
Additional observations
Based on logistic regression, the subject response to the question: “Have you ever had pain in your back on most days for at least two weeks” was significantly (P<0.05) associated with age, race, and BMI. Since there is no way to determine which vertebra (if any) might be causing symptoms, the sum of abnormalities for each subject (metrics < -2 or > 2 SD from the average based on Table 4) from L1 to S1 was calculated. This sum of abnormalities for the EPWR and the FBDR was significantly associated with the response to the “ever had back pain” question (P<0.05), although the R2 for these regressions was very small (>0.01), indicating a very weak association.
Of the six morphology metrics, FBDR was the most sensitive and specific metric (area under ROC curve=0.995) to use for the purpose of using landmarks to classify vertebrae as S1 vs L1-L5. A threshold level of FBDR>1.0574 misclassified only 278/42980 vertebrae as S1 vs L1-L5. Including all data for all vertebrae, the proportions of vertebrae with abnormal morphology (<> 2 SD from average based on Tables 4 and 5) are summarized in Tables 14 and 15.
Percent of lumbar vertebrae in the NHANES-II study where the morphology metric was either > 2 or < -2 SD from the average, using data from Table 4 for the average and SD, and including all vertebrae.
Percent of lumbar vertebrae in the NHANES-II study where the morphology metric was either > 2 or < -2 SD from the average, using data from Table 5 for the average and SD, and including all vertebrae.
Conclusion
The lateral spine lumbar and cervical radiographs from the NHANES-II study were used to establish normative vertebral morphology reference data. These data can be used to help objectively identify vertebral body fractures, with the assumption that fractures will result in significant deviations from large population-based average vertebral body dimensions. Early diagnosis of fracture allows for early evaluation and prevention ultimately decreasing deformity as well as risk of future fracture. The data in Table 4 and Table 5 can be used to produce standardized metrics in units of standard deviation from average. This allows for easy identification of abnormalities resulting from vertebral fractures, atypical vertebral body morphologies, and other congenital or degenerative conditions, and removes the effect of vertebra allowing data for all vertebrae to be pooled in research studies. Metrics developed from a large database such as NHANES-II should help with differentiating between fractures and normal variants of shape. The deterministic thresholds for standardized metrics that can provide clinically meaningful diagnostic or prognostic information are yet to be determined. These data may also be useful for quality control in technology designed to automatically obtain landmarks coordinates and other metrics from medical imaging.
Data Availability
Vertebral morphology data produced in the present study are available upon reasonable request to the authors
https://www.dropbox.com/sh/qzrocrh86goxarx/AAADRXs5HoEjVdRcGIwA4beIa?dl=0