ABSTRACT
Introduction The use of biologic adjuvants (orthobiologics) is becoming commonplace in orthopaedic surgery. Amongst other applications, biologics are often added to enhance fusion rates in spinal surgery and to promote bone healing in complex fracture patterns. Generally, orthopaedic surgeons use only one biomolecular agent (eg allograft with embedded bone morphogenic protein-2) rather than several agents acting in concert. Bone fusion, however, is a highly multifactorial process, and it likely could be enhanced more effectively using biologic factors in combination, acting synergistically. We used artificial neural networks to identify combinations of orthobiologic factors that potentially would be more effective than single agents.
Methods Available data on the outcomes associated with various orthopaedic biologic agents, electrical stimulation, and pulsed ultrasound were curated from the literature and assembled into a form suitable for machine learning. The best among many different types of neural networks was chosen for its ability to generalize over this dataset, and that network was used to make predictions concerning the expected efficacy of 2400 medically feasible combinations of 9 different agents and treatments.
Results The most effective combinations were high in the bone-morphogenic proteins (BMP) 2 and 7 (BMP2, 15 mg; BMP7, 5 mg), and in osteogenin (150 µg). In some of the most effective combinations, electrical stimulation could substitute for osteogenin. Some other effective combinations also included bone marrow aspirate concentrate. BMP2 and BMP7 appear to have the strongest pairwise linkage of the factors analyzed in this study.
Conclusions Artificial neural networks are powerful forms of artificial intelligence that can be applied readily in the orthopaedic domain, but neural network predictions improve along with the amount of data available to train them. This study provides a starting point from which networks trained on future, expanded datasets can be developed. Yet even this initial model makes specific predictions concerning potentially effective combinatorial therapeutics that should be verified experimentally. Furthermore, our analysis provides an avenue for further research into the basic science of bone healing by demonstrating agents that appear to be linked in function.
CLINICAL RELEVANCE Bone healing is a highly multifactorial process, and it likely could be more effectively enhanced using combinations of factors rather than single factors in isolation. This study provides a starting point for an integration of biomedical experimentation and computational AI that ultimately could lead to highly sophisticated combinatorial treatments for bone repair and other applications in orthopaedic medicine.
INTRODUCTION
Bone repair is a highly multifactorial process involving a wide array of molecular and cellular factors.[1] Orthopaedic surgeons have manipulated these factors by administering various biologic agents in order to augment bone repair.[2] In most cases, surgeons have administered only one biologic agent. Considering the physiological complexity of the process, it is reasonable to suggest that superior bone repair could be achieved using biologic factors in combination.
Combinatorial explosion prohibits exhaustive experimental evaluation of the full set of possible combinations. An alternative is to use computational methods to extrapolate, or generalize, from existing data and predict which combinations would be the most effective, and then expend experimental resources to evaluate only those. The most powerful AIs in use today for making accurate predictions are artificial neural networks (ANNs).[3, 4] ANNs are composed of many highly interconnected neuron-like elements known as units, which can be arranged in layers or circuits.
ANNs are computational devices that process information from their input units to produce a pattern of activation at their output units. Feedforward networks have their units arranged in layers. The simplest feedforward networks have only two layers of units: input and output. More complex feedforward networks have one or more layers of hidden units, so called because they are interposed between the input and output layers. Feedforward ANNs are considered deep if they have more than two hidden layers.[5] Recurrent networks have their units arranged in circuits. Multiple processing layers, or circuits, are needed when the production of useful output patterns requires the processing of complex interactions among the inputs.
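As an illustration of the layered processing described above, the following minimal sketch passes an input pattern through a small feedforward network with two hidden layers. The layer sizes, the random weights, and the logistic activation function are illustrative assumptions and are not taken from the networks used in this study.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layers, inputs):
    """Propagate an input pattern through each layer in turn (logistic units assumed)."""
    activation = list(inputs)
    for weights, biases in layers:
        activation = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(row, activation)) + b)))
            for row, b in zip(weights, biases)
        ]
    return activation

rng = random.Random(0)
sizes = [4, 8, 8, 3]  # hypothetical: input, two hidden layers, output
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
output = forward(layers, [1.0, 0.0, 0.5, 0.0])  # pattern of activation at the output units
```

A network with no hidden layers corresponds to a single entry in `layers`; each additional entry adds one hidden layer between input and output.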
ANNs are trained via machine learning on a set of input/desired-output examples. They have been applied in many domains of biomedicine.[6-8] The most extensive medical applications of ANNs have been in radiology.[9-12] Generally in these applications the inputs are the pixels of (usually MRI) images, and the desired outputs are the components of known radiological diagnoses. Once trained, the ANN can generalize from its training data and make a diagnosis from an image on which it has not been trained. It is likely that clinicians will soon use ANNs adjunctively in radiological diagnosis.
Other applications of ANNs under development in medicine include cancer diagnosis from gene expression data[13, 14], heart-disease diagnosis from electrocardiogram data[15], osteoporosis diagnosis from bone-density data[16], and diabetes diagnosis from blood chemistry data[17, 18]. More recent applications involve patient records as inputs.[19-21] ANNs have been applied to orthopaedic patient records to predict outcomes such as bone fracture healing or mortality following hip fracture.[22, 23]
The application of various biologic agents to bone repair is a rapidly growing subfield of orthopaedics. Over the past few decades, many reports have demonstrated the benefits of specific agents on post-surgical bone fusion rates. Still, to our knowledge, AI has not been applied in this domain.
The usefulness of an ANN derives from its ability to generalize beyond its training data, so that it can predict the correct output for inputs on which it has not been trained. We collected a large amount of the available data, organized it into a form suitable for machine learning, and used it to train an ANN with an architecture that we had determined beforehand would generalize well over the dataset. We used this ANN to explore potential combinatorial therapies within the realm of orthobiologic adjuncts.
METHODS
Our study design consisted of six steps: (1) assemble a dataset on the outcomes associated with the use of various agents and organize them into a form suitable for machine learning; (2) build a series of ANNs with increasingly complex architectures and processing potentials; (3) determine for each network type its optimal machine learning parameters; (4) assess the ability of each network type to generalize over the dataset; (5) train the best generalizing network on the dataset and use it to predict the efficacy of a large set of combinations of the factors on which it had been trained; and (6) analyze the predictions and determine which combinations of factors are potentially the most effective, and should be experimentally verified. The first five steps are methodological and are summarized here in the Methods section. Further methodological details are available in Supplementary Texts S1 – S4. Step (6) is elaborated in the Results section.
Step 1
We curated the dataset from the experimental literature and organized the data into input/desired-output pairs. In total, 17 factors (active orthobiologic agents and their vehicles of administration, or other nonpharmacological treatment types) constituted the inputs, and 26 outcomes (metrics quantifying the efficacy of the agents in improving bone healing) constituted the desired outputs. Each input and desired output is recorded in the dataset in its appropriate units. We used as many factors as inputs, and outcomes as desired outputs, as were available in the literature, in order to maximize the amount of ANN training data.
Step 2
We evaluated the generalizability of 16 different ANN types. Our 8 basic types were feedforward with 0, 1, 2, 3, 5, 7, or 10 hidden layers, and a recurrent network with a hidden circuit. All hidden layers (and the hidden circuit) were composed of 100 units. We evaluated each of these 8 network types with and without an autoencoder layer. An autoencoder is the hidden-unit representation developed by a network that learns to reproduce its own input at the output. Placing an autoencoder layer after the input layer can improve the generalizability of an ANN that is trained to produce a desired output for every input.
Steps 3 and 4
We optimized the parameters of the machine-learning algorithm used to train each of the network types, and then tested the ability of each type to generalize over the dataset following training. We found that the feedforward ANN composed of an input layer, an autoencoder layer, 2 hidden layers, and an output layer exhibited the best generalizability. A diagram of this ANN is shown in Figure 1.
Step 5
Following ANN training, we used clinical judgement in deciding which combinations of factors to evaluate, and which outcomes to use in assessing the predicted post-surgical benefit of those selected combinations. We generated a set of 2400 combinations of 9 of the factors that were included among the 17 inputs in the dataset. These factors were chosen because they could be combined appropriately in a surgical setting. The 6 biologic agents chosen were bone-morphogenic protein-2 (BMP2), bone-morphogenic protein-7 (BMP7), osteogenin (OG), platelet-derived growth factor (PDGF), bone marrow aspirate concentrate (BMAC), and platelet rich plasma (PRP).[2] The vehicles carrying these agents varied greatly among published studies, so we included the most common one, exogenous bone graft (EBG), as a stand-in for all vehicles. Pulsed ultrasound (PU) and electrical stimulation (ES) were also chosen for the combination screen because they have been shown to increase bone healing rates.[24, 25]
We quantized input levels in order to generate a finite number of input combinations. For the combination screen the factor BMP7 takes 2 levels; BMP2, OG, and PU each take 4 levels; and PDGF takes 5 levels in their ranges. The factors ES, BMAC, and PRP are either present or absent. EBG, as the common vehicle of administration, is present in all combinations.
We further constrained the number of combinations for reasons of practicality. Although PU and ES could theoretically be used in combination, the feasibility of carrying out this dual therapy in practice is low. Because harvesting both PRP and BMAC for use at the bone healing site would add operative time, we decided not to include combinations involving both agents. Editing according to these constraints left 2400 combinations.
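The construction of the combination screen can be sketched as follows. The sketch uses level indices rather than doses, and it assumes that the lowest level of every factor other than EBG is 0 (absent); under that assumption, applying the two exclusion rules to the full set of level combinations yields exactly 2400 combinations. The variable and function names are illustrative.

```python
from itertools import product

# Level indices only; index 0 is assumed to mean "absent". Doses are not encoded here.
LEVELS = {
    "BMP2": range(4),
    "BMP7": range(2),
    "OG": range(4),
    "PDGF": range(5),
    "PU": range(4),
    "ES": range(2),    # present or absent
    "BMAC": range(2),  # present or absent
    "PRP": range(2),   # present or absent
    "EBG": [1],        # common vehicle, present in all combinations
}

def feasible(combo):
    """Apply the two practicality constraints of the screen."""
    if combo["PU"] > 0 and combo["ES"] > 0:
        return False  # PU and ES are not used together
    if combo["BMAC"] > 0 and combo["PRP"] > 0:
        return False  # BMAC and PRP are not used together
    return True

names = list(LEVELS)
all_combos = [dict(zip(names, values)) for values in product(*LEVELS.values())]
combos = [c for c in all_combos if feasible(c)]

print(len(all_combos))  # 5120 before the constraints
print(len(combos))      # 2400 after the constraints
```

The count factors as 4 x 2 x 4 x 5 = 160 settings of BMP2, BMP7, OG, and PDGF, times 5 permitted PU/ES settings, times 3 permitted BMAC/PRP settings.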
To screen the combinations for efficacy, we set the input in turn to each one of the 2400 combinations: the 9 input units corresponding to BMP2, BMP7, OG, PDGF, PU, ES, BMAC, PRP, and EBG took their values as specified for that combination; the other 8 of the 17 input units took value 0. We then computed the activities in response to each input of the 26 output units. Due to randomness inherent in the machine-learning algorithms, ANNs of the same type trained on the same dataset can nevertheless vary slightly. Therefore, the best predictions are derived from the averaged outputs of several ANNs.[26] We based our predictions on the averaged outputs of 10 separately trained ANNs of the type shown in Figure 1.
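The screening procedure for a single combination can be sketched as follows. The ordering of the input units, the numerical values assigned to the factors, and the stand-in "networks" are all hypothetical; in the study each network would be a separately trained ANN with 17 inputs and 26 outputs.

```python
N_INPUTS = 17

# Hypothetical assignment of the 9 screened factors to the first 9 input units.
SCREENED = ["BMP2", "BMP7", "OG", "PDGF", "PU", "ES", "BMAC", "PRP", "EBG"]
INDEX = {name: i for i, name in enumerate(SCREENED)}

def to_input_vector(combo):
    """Set the screened factors to their specified values; all other inputs stay 0."""
    vector = [0.0] * N_INPUTS
    for name, value in combo.items():
        vector[INDEX[name]] = float(value)
    return vector

def ensemble_predict(networks, input_vector):
    """Average the output vectors of several networks element-wise."""
    outputs = [network(input_vector) for network in networks]
    return [sum(values) / len(outputs) for values in zip(*outputs)]

# Stand-in networks (plain functions with 2 outputs) used only to make the sketch runnable.
networks = [lambda x, k=k: [k * sum(x), k + 1.0] for k in (1.0, 2.0, 3.0)]

x = to_input_vector({"BMP2": 15.0, "BMP7": 5.0, "EBG": 1.0})  # illustrative values
averaged = ensemble_predict(networks, x)
```

Averaging across networks reduces the influence of the run-to-run variation that arises from random initialization and training order.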
To compute a relative efficacy measure for each factor combination, we combined 17 of the 26 averaged output unit activations into a single number. We chose these 17 outcomes because they best assessed the degree of bone healing and functional outcome across studies. The chosen outcomes are distraction rate (DR), bone formation at 3 months (BF3), bone formation at 6 months (BF6), mineralized tissue volume/total tissue volume (MV/TV), 1-level posterior lumbar fusion rate (PLF-FR), Oswestry disability index improvement (ODI), fusion rate (FR), fracture healing percentage (FH), Oswestry Score (OW), radiographic outcome (RO), histomorphometric outcome (HO), implant survival percentage (IS), time to achieve full weight bearing/clinical healing (TWB/CH), mean time to radiographic union (TRU), need for repeat bone grafting (RBG), not healed at end of trial (NH), and need for dynamization (DY). We flipped the outputs whose high score indicated poor efficacy, normalized all outputs into the range [0, 1] and then averaged the 17 normalized outputs to arrive at a single-number efficacy score. By this relative measure, perfectly effective and ineffective combinations would have efficacy scores of 1 and 0, respectively.
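The computation of the single-number efficacy score can be sketched as follows, using 3 of the 17 outcomes for brevity. Which outcomes count as "lower is better", the ranges used for scaling, and the example output values are assumptions made for illustration and are not taken from the study. The sketch scales first and then flips with 1 - x, which is one possible reading of the procedure described above.

```python
# Hypothetical choices: outcomes treated as "high score = poor efficacy",
# and the range over which each outcome is scaled to [0, 1].
LOWER_IS_BETTER = {"TRU", "NH"}
RANGES = {"FR": (0.0, 100.0), "TRU": (0.0, 12.0), "NH": (0.0, 100.0)}

def efficacy_score(outputs, ranges=RANGES, lower_is_better=LOWER_IS_BETTER):
    """Scale each outcome to [0, 1], flip 'lower is better' outcomes, then average."""
    scaled = []
    for name, value in outputs.items():
        low, high = ranges[name]
        x = (value - low) / (high - low)
        x = min(1.0, max(0.0, x))  # keep the scaled value inside [0, 1]
        if name in lower_is_better:
            x = 1.0 - x            # after flipping, 1 always means "most effective"
        scaled.append(x)
    return sum(scaled) / len(scaled)

# Illustrative averaged outputs for two hypothetical combinations.
predicted = {
    "combo_A": {"FR": 80.0, "TRU": 3.0, "NH": 10.0},
    "combo_B": {"FR": 40.0, "TRU": 9.0, "NH": 50.0},
}
scores = {name: efficacy_score(outputs) for name, outputs in predicted.items()}
ranked = sorted(scores, key=scores.get, reverse=True)  # most effective first
```

With these choices a combination that is best on every outcome scores 1 and one that is worst on every outcome scores 0, matching the interpretation of the relative measure given above.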
RESULTS
We trained the ANN with the best generalizability (Figure 1) to achieve a good but not perfect match between its actual and desired outputs, because the overtraining required to achieve a perfect match would impair its ability to generalize. Comparison of the desired and actual output images for an example ANN (Figure 2) shows that the agreement is good but not perfect. Precisely this sort of relationship would be expected for an ANN that could generalize beyond its training data.
We rank-ordered the predicted efficacy scores for the 2400 combinations (Figure 3). They ranged from about 0.30 to almost 0.75 and so covered almost half of the possible [0, 1] range. The efficacy scores seemed to plateau for the most effective several hundred combinations.
The 2400 rank-ordered combinations are shown in two separate images in Figure 4: one for all 2400 combinations and another for the top 200. Analysis of the top 200 reveals some statistically significant, pairwise correlations among the 9 factors (Table 1). BMP2 and BMP7 are positively correlated, while OG and PDGF are negatively correlated. PDGF is negatively or positively correlated with BMAC or PRP, respectively. PU and ES, and likewise BMAC and PRP, are also negatively correlated, but this is due largely to constraints in the design of the combination screen (see Methods).
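A pairwise correlation analysis of this kind can be sketched as follows. The sketch assumes the Pearson correlation coefficient computed over factor levels in the top-ranked combinations; the type of correlation used for Table 1 is not stated here, and the four toy combinations are invented for illustration. Significance testing is omitted.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # eg a factor such as EBG that never varies
    return sxy / math.sqrt(sxx * syy)

def pairwise_correlations(top_combos, factors):
    """Correlate the levels of every pair of factors across the given combinations."""
    return {
        (a, b): pearson([c[a] for c in top_combos], [c[b] for c in top_combos])
        for a, b in combinations(factors, 2)
    }

# Toy stand-in for the top-ranked combinations (level indices, invented).
top = [
    {"BMP2": 3, "BMP7": 1, "PDGF": 0},
    {"BMP2": 3, "BMP7": 1, "PDGF": 1},
    {"BMP2": 2, "BMP7": 0, "PDGF": 4},
    {"BMP2": 1, "BMP7": 0, "PDGF": 3},
]
r = pairwise_correlations(top, ["BMP2", "BMP7", "PDGF"])
```

Because the screen excludes combinations containing both PU and ES, or both BMAC and PRP, negative correlations within those two pairs arise by construction and should be interpreted accordingly.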
The 10 best factor combinations show some consistent similarities, and some consistent differences with the 10 worst combinations. BMP2 and BMP7 are at their highest levels in the 10 best combinations, while they are 0 in the 10 worst combinations, and this is consistent with the positive correlation observed between BMP2 and BMP7 in the top 200 combinations. OG tends to be at its highest levels in the 10 best combinations but is 0 in the 10 worst combinations. In contrast, PDGF tends to be at its lowest or highest levels in the 10 best or worst combinations, respectively, and this is consistent with the negative correlation observed between OG and PDGF in the top 200 combinations.
The analysis suggests that the most effective combinations are high in BMP2, BMP7, and OG, but low in PDGF (see Table 2 and its caption for quantification of amounts). These 4 factors seem to be the most determinative of the best factor combinations. BMAC appears in some of the 10 best combinations but in none of the 10 worst, and this is consistent with the negative correlation between PDGF and BMAC. PRP is absent from all 10 best and worst combinations. The 10 best and 10 worst combinations seem indifferent to the levels of PU and ES, with the potentially important exception that ES appears in some of the 10 best combinations that lack OG.
The 2400 combinations in the screen include the null combination (ie none of the 9 factors is present except for EBG, the common vehicle), and all combinations in which only EBG and 1 other factor are present. The analysis clearly indicates that combinations of several orthobiologic factors would be more effective than any single factor alone. The analysis indicates that combinations of BMP2, BMP7, and OG, perhaps including ES or BMAC, each at the high end of their ranges as reported in relevant studies, should outperform combinations that lack those components. Experimental verification of these predictions could lead to the development of orthobiologic factor combinations that outperform single factors for the enhancement of bone repair.
DISCUSSION
To properly situate our model within the orthopaedic literature, it is necessary to distinguish between process-driven and data-driven models. Process-driven models represent processes explicitly. There is a long tradition of process-driven modeling in bone fracture healing (see [1] for review). Process-driven models are valuable in that they explicitly describe the processes involved, but they are limited to what is known about the processes themselves. This limits their predictive power.
Data-driven models are built almost entirely on observed input-output relationships, without regard for the specifics of the underlying processes. Data-driven models offer little mechanistic insight, but they provide a powerful means to leverage all available data for predicting the outputs to novel inputs. Deep neural networks are the premier form of data-driven modeling in AI today. The multilayered ANN we chose to make our predictions (Figure 1) is a deep neural network.[5] To our knowledge, our model is the first data-driven, deep neural network model of the relationship between biologic factors and bone repair.
Even though it is data-driven, our model may indicate avenues for further research into the molecular physiology of bone healing. For example, OG (osteogenin, also called bone morphogenic protein-3 (BMP3)) and ES seem to act interchangeably in our model. Interestingly, ES has been shown to upregulate BMPs 2 through 8, and is effective in upregulating BMP3 in cultured bone cells.[27] The fact that our model is able to postdict previously known molecular pathways supports its use in predicting previously unknown molecular interactions.
The main limitation of our study was the size and composition of the dataset on which we trained our ANN. At 225 input/desired-output training patterns, our dataset is large in comparison with other datasets that are curated from the literature, but still very small in comparison with datasets used to train ANNs in many applications. Also, most of the input/desired-output patterns in our dataset included only one active orthobiologic factor. The risk in training mainly on single factors is that the network would fail to learn interactions among them, but in our case it seems that this did not occur.
If machine learning had failed to pick up interactions, then the simplest ANN, composed only of input and output layers, would have generalized as well as, if not better than, ANNs with hidden layers (or circuits) intervening between input and output (see Supplementary Text S4). The fact that the ANN that generalized best over our dataset was a multilayered network strongly suggests that it did learn some of the interactions between the factors.
The best way to remedy the main limitation in this analysis is to train deep neural networks on larger datasets containing more combinations of factors. The analysis already suggests both good and bad combinations that could be explored experimentally. Any and all new data on the outcomes for bone healing associated with orthobiologic factors administered alone, or better, in combination could be added to the training dataset and would improve the ability of an ANN to identify combinations of factors with the potential to outperform single agents.
DATA AVAILABILITY
All data produced in the present work are contained in the manuscript, and the dataset on which the analysis is based is included in supplementary material.
CONFLICTS OF INTEREST
The authors have no conflicts of interest to declare.
ACKNOWLEDGEMENTS
This work was conducted in the absence of any public or private funding.