ABSTRACT
Objective To present Motiro, a unified framework for non-supervised statistical analysis of endomicroscopy videos of the colorectal mucosa.
Materials and Methods We wrote an open-source Python wrapper using ImageJ software with the OpenCV, Seaborn and NumPy libraries. It generates a mosaic from the video of the mucosa, evaluates morphometric properties of the crypts and their distribution, and returns their statistics. Shannon entropy is used for quantifying variability, and the Hellinger distance for comparing different mucosae.
Results The segmentation process applied to the normal mucosa of pre- and post-neoadjuvant patients is presented along with the corresponding statistical analysis of morphometric parameters.
Discussion Our analysis provides estimates of morphometric parameters consistent with available methods, is faster, and, additionally, provides a statistical characterization of the mucosa morphometry. Motiro enables the analysis of large amounts of endomicroscopy videos for building a dataset of normal rectum features to help with: detection of small variability; classification of post-neoadjuvant recovery; decisions about the necessity of surgical intervention.
INTRODUCTION
Probe-based confocal laser endomicroscopy (pCLE) has enhanced colorectal cancer (CRC) screening and post-neoadjuvant surveillance(1). pCLE videos help endoscopists classify population groups according to their probability of developing CRC(2), e.g., by quantifying aberrant crypt foci to infer the chances of recurrence(3) and of neoplasms(4). That entails the use of computational tools to complement human analysis(5), using machine learning-based classification of polyps(6) or quantitative analysis of the mucosa architecture by supervised, decoupled use of the Icy and ImageJ software packages(7). Ref.(7) raised the need for a unified framework for analyzing the architecture of the mucosa. In this manuscript we present Motiro¹, a Python-based, non-supervised, unified framework for statistically characterizing the morphometry of the colorectal mucosa of pre- or post-neoadjuvant CRC patients using pCLE videos. Because of its unified, non-supervised functioning, Motiro enables the efficient creation of a database of analyzed images and clinical data(8) to be jointly used for elaborating an optimal screening and surveillance agenda for CRC patients(2,9,10).
The molecular processes controlling tissue patterning in the colorectal mucosa(11) are unavoidably noisy(12). That may cause small fluctuations in the position, shape, and size of the crypts of the normal mucosa. Indeed, the link between molecular-level fluctuations and tissue-level variation has been reported in analyses of Drosophila embryo development(13,14), and it is fair to extrapolate such a conceptual framework to the patterning of the colorectal mucosa. Therefore, we analyze the morphometric parameters of the crypts using histograms to represent the small degree of disorder exhibited by the normal mucosa. For distinguishing the disorder of normal mucosa from that observed at early stages of a neoplasm or recurrent CRC, Motiro brings two methods for evaluating the distribution of morphometric parameters, namely, the differential Shannon entropy and the Hellinger distance. The former has been widely used to quantify the disorder in physical systems(15), and the latter is employed to compare the overlap of two histograms(16) representing the distribution of a given morphometric parameter. These two quantities enable one to compare the degree of disorder of the mucosa of, for example, a patient and a population, the crypts at pre- and post-neoadjuvant therapy, or different regions of the mucosa of a patient. They also set a quantitative criterion to help distinguish normal from abnormal variability of the mucosa’s architecture. Moreover, the unsupervised functioning of Motiro enables one to efficiently construct a large database of statistically characterized endomicroscopic images. That has a clear clinical implication: one may quantify the probability that an observed variability within the colon mucosa of a patient be classified as either normal or as the early stage of a neoplasm or recurrent CRC by, e.g., application of a hypothesis test.
METHODS
Dataset images
The images were acquired using pCLE. pCLE is a real time in vivo method for acquisition of 1000 times magnified optical biopsies for evaluating cellular and vascular patterns. Before the pCLE procedure, 5 ml of 10% fluorescein diluted in 100 ml of saline solution were injected intravenously. The probe was inserted through the working channel of the endoscope into the rectum a few minutes after the fluorescein injection. The pCLE (2.5 mm UHD ColoFlex probe, Cellvizio; Mauna Kea Technologies, Paris, France) provided depth of examination of 55 to 65 μm, a 240 μm field of view at a resolution of 1 μm and magnification of 1000X at 12 frames/s. In all patients, normal mucosa located at least 5 cm from the target lesion was also examined by pCLE, in order to have a comparison image to the altered mucosa.
Framework software
Figure 1 shows a flowchart of the major stages of Motiro. We use a pCLE video as the input of our wrapper(17). In Stage 1, Motiro combines tools from the Open Source Computer Vision Library (OpenCV)(18) and the ImageJ plugin Register Virtual Stack Slices (RVSS)(19,20): OpenCV is used for dismantling the video into frames and removing text, and RVSS for stitching the frames into a mosaic. In Stage 2, OpenCV tools are applied to the resulting mosaic for pre-processing, segmentation using the k-means algorithm(21), and morphometric analysis (please see Supplementary Digital Content 1 (SDC1) for a detailed description). In Stage 3, the NumPy and Seaborn Python libraries are used for statistical analysis and for generating the data used to calculate the differential Shannon entropy and the Hellinger distance.
Figure 1. A visual representation of the major stages of Motiro. Stage 1 receives endomicroscopic videos as input and provides mosaics and the elliptical contours as results. In Stage 2, the contours are used to analyze the morphometric data and highlight the images for visual assessment of the parameter estimation. Stage 3 generates the output by plotting the statistical analysis of the morphometric parameters.
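The k-means segmentation step mentioned above can be illustrated with a minimal sketch. It is not Motiro's code: the function names, the number of clusters, and the toy image are assumptions made for illustration, and a plain NumPy implementation of Lloyd's algorithm stands in for the OpenCV routine. It clusters grey levels and keeps the pixels of the darkest cluster as a binary mask.

```python
import numpy as np

def kmeans_intensity(image, k=3, n_iter=20):
    """Cluster grey levels with a plain 1-D k-means (Lloyd's algorithm).

    Hypothetical helper for illustration; not part of Motiro.
    """
    pixels = image.astype(float).ravel()
    # Initialise the centres evenly between the darkest and brightest pixel.
    centres = np.linspace(pixels.min(), pixels.max(), k)
    labels = np.zeros(pixels.size, dtype=int)
    for _ in range(n_iter):
        # Assign each pixel to its nearest centre (N x k distance table;
        # acceptable for a sketch, memory-hungry for very large mosaics).
        labels = np.argmin(np.abs(pixels[:, None] - centres[None, :]), axis=1)
        # Move each centre to the mean of its pixels; keep it if the cluster is empty.
        for j in range(k):
            members = labels == j
            if np.any(members):
                centres[j] = pixels[members].mean()
    return labels.reshape(image.shape), centres

def darkest_cluster_mask(image, k=3):
    """Binary mask selecting the pixels that fall in the darkest cluster."""
    labels, centres = kmeans_intensity(image, k)
    return labels == np.argmin(centres)

# Toy 4x4 "mosaic": a dark 2x2 blob on a bright background (made-up data).
toy = np.full((4, 4), 200, dtype=np.uint8)
toy[1:3, 1:3] = 20
mask = darkest_cluster_mask(toy, k=2)
```

On a real mosaic the mask would still need the morphological clean-up and contour fitting described for Stage 2.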
Morphometric parameters
A geometric interpretation of our results was facilitated by converting pixel values on the mosaic to micrometers. The crypts were approximated as elliptical contours (hereafter denoted simply as contours) drawn using OpenCV. The ratio of the major to the minor edges of a rectangle parallel to the Cartesian axes surrounding the elliptical crypt is the axis ratio (α). The elongation factor (ε) is that ratio in a rotated rectangle with edges parallel to the major (2a) and minor (2b) axes of the contour, with the former, called the maximal Feret diameter, being estimated by both an exhaustive search and a heuristic algorithm. The perimeter (ρ) of the contour is estimated using(22):
The area of a contour (A) is estimated in μm² using A = π a b, with roundness (Σ) being:
and sphericity (σ) being:
The mucosa state is further characterized by the crypts’ distribution. We estimate the mean (Δ) and minimal (δ) intercrypt distances, the minimal distance separating two nearest-neighbor contours, called wall thickness (ω), and the tissue density (θ). Please see SDC2 for further details on the methods of analysis of the morphometric parameters.
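As a rough illustration of the ellipse-based parameters defined in this section, the sketch below computes the area A = π a b, the maximal Feret diameter 2a, the elongation factor a/b, and the axis ratio of the axis-aligned bounding rectangle of a rotated ellipse. The function name and arguments are hypothetical. The perimeter uses Ramanujan's approximation, which is an assumption here: the formula of Ref.(22) is not reproduced in the text and may differ. Roundness and sphericity are left out for the same reason.

```python
import math

def ellipse_metrics(a, b, phi=0.0):
    """Geometric descriptors of an ellipse with semi-axes a >= b > 0
    (e.g. in micrometers) rotated by phi radians. Illustrative helper only."""
    if not (a >= b > 0):
        raise ValueError("expected semi-axes with a >= b > 0")
    area = math.pi * a * b          # A = pi * a * b, as in the text
    feret_max = 2.0 * a             # maximal Feret diameter (major axis)
    elongation = a / b              # ratio of the rotated rectangle's edges
    # Half-extents of the axis-aligned bounding rectangle of the rotated ellipse.
    half_w = math.hypot(a * math.cos(phi), b * math.sin(phi))
    half_h = math.hypot(a * math.sin(phi), b * math.cos(phi))
    axis_ratio = max(half_w, half_h) / min(half_w, half_h)
    # ASSUMPTION: Ramanujan's approximation; not necessarily the formula of Ref.(22).
    perimeter = math.pi * (3.0 * (a + b) - math.sqrt((3.0 * a + b) * (a + 3.0 * b)))
    return {
        "area": area,
        "feret_max": feret_max,
        "elongation": elongation,
        "axis_ratio": axis_ratio,
        "perimeter": perimeter,
    }

upright = ellipse_metrics(2.0, 1.0)               # major axis along x
tilted = ellipse_metrics(2.0, 1.0, math.pi / 4)   # same ellipse rotated by 45 degrees
circle = ellipse_metrics(1.0, 1.0)
```

The rotated case shows why the two ratios differ: an ellipse tilted by 45 degrees has a square bounding rectangle (axis ratio 1) while its elongation factor is unchanged.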
Statistical analysis
The images were classified as pre- (R) and post- (T) neoadjuvant for a statistical evaluation of the morphometric parameters. We quantify the degree of disorganization of R and T images by applying the differential Shannon entropy(23) to the histogram of each morphometric parameter:
$$S = -\sum_i f_i \log_2\left(\frac{f_i}{L_i}\right),$$
where $f_i$ is the relative frequency of observing the i-th range of values of a morphometric parameter and $L_i$ is the i-th bin width (please see SDC3 for further details). The differential Shannon entropy of a uniform distribution of a continuous random variable within an interval of width L, $S_0 = \log_2(L)$, can be used as a reference value determining the maximal degree of disorder of a distribution or histogram. Then, denoting by S the differential Shannon entropy of the histogram, we define the quantity $2^{S}/L$
to evaluate the disorder of a distribution or histogram relative to its uniform counterpart: the distribution or histogram reflects a higher degree of disorder for
and better organization otherwise.
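The entropy-based measure can be sketched in a few lines of NumPy. The sketch assumes the usual histogram estimator of the differential entropy, S = -Σ f_i log2(f_i / L_i), which reduces to log2(L) for a uniform histogram, and it reports the ratio 2^S / L quoted in the Results. The function name, bin edges, and sample values are illustrative only; the exact procedure is described in SDC3.

```python
import numpy as np

def differential_entropy_bits(samples, bins=10):
    """Histogram estimate of the differential Shannon entropy (in bits)
    and of the ratio 2**S / L relative to the uniform reference.

    Assumed estimator: S = -sum_i f_i * log2(f_i / L_i). Illustrative only.
    """
    counts, edges = np.histogram(samples, bins=bins)
    widths = np.diff(edges)                  # bin widths L_i
    freqs = counts / counts.sum()            # relative frequencies f_i
    occupied = freqs > 0                     # empty bins contribute zero
    entropy = -np.sum(freqs[occupied] * np.log2(freqs[occupied] / widths[occupied]))
    total_width = edges[-1] - edges[0]       # interval width L
    return entropy, 2.0 ** entropy / total_width

edges = np.linspace(0.0, 100.0, 11)          # ten bins of width 10 (made-up range)
# Evenly spread values: ten samples per bin, i.e. a uniform histogram.
s_flat, ratio_flat = differential_entropy_bits(np.arange(100) + 0.5, bins=edges)
# All values in a single bin: the most ordered histogram on this grid.
s_peak, ratio_peak = differential_entropy_bits(np.full(50, 5.0), bins=edges)
```

With these toy inputs the uniform histogram gives a ratio of 1 and the single-bin histogram gives 0.1, the width of one bin divided by L.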
The use of histograms for characterizing the morphometric parameters requires a statistical distance for differentiating two sets of images. Here we choose the Hellinger distance(16,24,25) between two distributions P and Q, denoted by H(P, Q), to compute the overlap between two probability densities:
$$H(P, Q) = \sqrt{1 - BC(P, Q)},$$
where p(x) and q(x) are probability densities evaluated at x and
$$BC(P, Q) = \int \sqrt{p(x)\, q(x)}\, dx$$
is the Bhattacharyya coefficient. Relative frequencies are used to approximate the probability densities when we compare the histograms of a morphometric parameter from two images, R and T (please see SDC3 for further details).
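A minimal sketch of the histogram version of this distance follows. It assumes that both samples are binned on a shared grid and that the Bhattacharyya coefficient is approximated by the sum of sqrt(p_i q_i) over bins, with p_i and q_i the relative frequencies. The function name, the number of bins, and the sample values are hypothetical; SDC3 describes the procedure actually used.

```python
import numpy as np

def hellinger_distance(samples_p, samples_q, bins=10):
    """Hellinger distance between two samples, estimated from histograms
    built on a common set of bin edges. Illustrative sketch only."""
    samples_p = np.asarray(samples_p, dtype=float)
    samples_q = np.asarray(samples_q, dtype=float)
    # Shared edges so that bin i covers the same interval in both histograms.
    lo = min(samples_p.min(), samples_q.min())
    hi = max(samples_p.max(), samples_q.max())
    edges = np.linspace(lo, hi, bins + 1)
    counts_p, _ = np.histogram(samples_p, bins=edges)
    counts_q, _ = np.histogram(samples_q, bins=edges)
    p = counts_p / counts_p.sum()            # relative frequencies of P
    q = counts_q / counts_q.sum()            # relative frequencies of Q
    bc = np.sum(np.sqrt(p * q))              # Bhattacharyya coefficient
    # Clip tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(0.0, 1.0 - bc)))

same = hellinger_distance([1, 2, 3, 4], [1, 2, 3, 4])
disjoint = hellinger_distance([0, 1, 2], [10, 11, 12])
```

The distance is 0 for identical histograms and 1 for histograms with no common occupied bin, so intermediate values measure partial overlap.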
RESULTS
Endomicroscopy image segmentation
Figure 2 shows an example of Motiro’s estimation of the crypts’ contours as obtained from the pCLE videos. Fig. 2A shows a mosaic whose irregular geometry results from the image acquisition process: the sensitivity of the probe and the absence of reference points cause variability in maneuvering. The field of view of the probe is wider than the diameter of a crypt in a normal mucosa, which leads to a mosaic composed of multiple crypts and therefore amenable to statistical analysis. Application of contrast enhancement and noise removal highlights the crypts from the background, as shown in Fig. 2B. Fig. 2C shows a segmentation of the crypts and surrounding stroma. The result of applying morphological operations and a convex hull to smooth the crypts’ peripheries is shown in Fig. 2D. Fig. 2E shows the elliptical contours estimated around the crypts after the convex hull. Inspection of Figs. 2E and 2C indicates the similarity of the elliptical estimates to the segmented crypts after removal of the noise of the stroma and of the irregularities of the crypts’ boundaries. Fig. 2F shows Motiro’s (Icy’s(26)) contour estimates in green (red) overlaid on the original mosaic. The contours of some crypts are less accurate because the segmentation process is hampered when the brightness of a crypt’s border is similar to that of the background.
Figure 2. (A) Mosaic image built from a pCLE video. (B) Mosaic after noise removal and contrast enhancement. (C) Binary image after thresholding the pixels clustered in the darkest groups. (D) Result of morphological and convex hull (green boundary) operations applied to the segmented image. (E) Estimated ellipses after application of the convex hull. (F) Overlay of the raw mosaic image and the elliptical contours estimated using Motiro (green) and the Icy-based algorithm (red).
Statistical analysis of the crypts morphometry
Fig. 3 shows our statistical analysis: the superposed blue and beige histograms summarize data for R and T images, respectively, after outliers’ removal (please see SDC4 for the complete data analysis). The prevalence of single-mode histograms and the similarity between R and T images indicate the regular structure of the normal mucosa. Most of the axis ratios (and elongation factors) of the crypts lie within the range [1, 1.37] (and [1, 1.4]), embracing 84% (and 78%) of the crypts in the R and 75% (and 69%) in the T images. The R and T images concentrate 74% and 78%, respectively, of the crypts’ roundness (and 84% and 81% for sphericity) within the range [66%, 90%] (and [94.6%, 99.9%]). The maximal Feret diameters (and the perimeters) of the crypts lie within the range [81, 146] (and [229, 461]), embracing 98% (and 98%) in the R and 92% (and 89%) in the T images. The minimal intercrypt distances (and the wall thicknesses) lie within the range [104, 155] (and [4, 48]), which encompasses 84% (and 71%) in the R images and 81% (and 93%) in the T ones. The mean intercrypt distance values are distributed according to histograms having one mode within the range [118, 180] and another within [211, 273], which concentrate, respectively, 56% and 30.3% of the data in the R images (and 33% and 37.5% in the T images).
Figure 3. The blue and beige histograms represent the statistics of the mucosa’s morphometric parameters in pre- (R) and post- (T) neoadjuvant treatment images, respectively.
Table 1 gives additional statistical information, after outliers’ removal, with columns 3 to 9 labeled by morphometric parameter as in Figs. 3A to 3I. The first two rows account for the analysis of images R and T, with the first sub-row indicating mean values and standard deviations and the second sub-row giving the degree of disorder. The third row shows the Hellinger distance between the histograms from R and T images. The Hellinger distance ranges from 0.230 to 0.407, and it is fair to conclude that the morphometry of the normal mucosa is similar in R and T images. Such a conclusion is reinforced by noticing that, for almost all morphometric parameters, the absolute value of the difference between the means of R and T images lies within the standard deviation of one or the other, the exception being δ, whose standard deviation in the R images is smaller. We compare the differential entropy of each histogram to that of the corresponding uniform distribution and find $2^{S}/L < 0.848$. That reinforces our conclusion that the evaluated mucosa has a high degree of order despite its intrinsic variability.
Table 1. Statistical analysis of R and T images.
DISCUSSION
The morphometry analysis presented here has been performed by Quénéhervé(7) and collaborators (QA) using the Icy and ImageJ software packages separately. Because the crypts are contoured manually in the QA approach, one may consider these contours as the gold standard. In SDC5 we present a quantitative comparison between Motiro and QA morphometric estimates and obtain a mean relative error of 0.167. That demonstrates the viability of the unified non-supervised segmentation of pCLE videos of the colon’s mucosa.
Though the learning curves of Motiro and QA are similar, Motiro unifies functionalities, adds new ones, is non-supervised, and is 5.7 times faster than QA when the heuristic algorithm is used to evaluate crypts(SDC6). Motiro runs on a Linux operating system, requires installation of Python(3.6.9), Seaborn(0.10.1), NumPy(1.19), and OpenCV(4.3.0), and is executed in a terminal (instead of a graphical interface). Images with aberrant crypts could not be properly segmented by Motiro, and we expect to address this drawback in future work.
The addition of the statistical analysis of the morphometric parameters enables the quantification of the intrinsic small fluctuations of the mucosa’s architecture. For such a study, we interpret the crypt-crypt interlinks as edges of a graph, which enables us to estimate the topographical properties of the mucosa. Then, we can use the differential Shannon entropy for quantifying the degree of disorder of the mucosa, and Hellinger distances for comparing the statistics of the architecture of multiple mucosae, or of different positions or instants of the same mucosa. Performing the statistical analysis after outlier removal helps to reduce the bias caused by partial crypts appearing at the edges of the mosaics. The use of histogram-based statistical analysis (as an alternative to the often employed average, mode, and median) opens the way for a comprehensive approach based on machine learning techniques for the classification of the normal mucosa in R and T images and for estimating the chances of success of neoadjuvant therapy. The analysis of a large dataset may indicate whether the histogram-level differences in the morphometric parameters of R and T images are significant, and may set reference values for both the differential Shannon entropy and the Hellinger distances.
Motiro brings significant improvements to the statistical assessment and analysis of colon morphometry using pCLE videos, and is ready for handling a large amount of data obtained from normal and quasi-normal mucosa. Motiro’s image analysis approach has the benefit of not needing a large amount of pre-classified data to extract features, and the morphometric parameters have clear geometrical and biological interpretations. That contrasts with machine learning approaches, which demand large amounts of high-quality data to provide models whose geometrical or biological meaning can be hard to determine(27). Therefore, machine learning approaches may benefit from the construction of a set of results having a clear interpretation. Indeed, Motiro is ready to be employed for building a large database of mucosa features to be used in artificial intelligence studies aiming to establish the connection between mucosa morphometry and clinical data. The use of statistical analysis enables a refinement of the differentiation between normal and fully recovered post-neoadjuvant mucosa. That may help in elaborating more assertive machine learning-based models to assist physicians in deciding about the necessity of surgical interventions in post-neoadjuvant colorectal cancer patients. Additionally, the use of a unified framework for a quantitative characterization of the architecture of the mucosa enables setting additional standards to aid human analysis by reducing the role of subjectivity(28). Besides, the modular structure of Motiro allows its adaptation for analyzing images obtained by advanced methods(29).
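For concreteness, a mean relative error between automated and manual estimates can be computed as below. This is only a sketch under an assumed definition, the average of |estimate - reference| / |reference| over paired measurements; the definition used in SDC5 may differ, and the function name and the numbers are made up for illustration.

```python
def mean_relative_error(estimates, references):
    """Average of |estimate - reference| / |reference| over paired values.

    Assumed definition for illustration; see SDC5 for the one actually used.
    """
    if len(estimates) != len(references) or len(references) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    total = 0.0
    for estimate, reference in zip(estimates, references):
        total += abs(estimate - reference) / abs(reference)
    return total / len(references)

# Made-up paired measurements (e.g. automated vs. manual diameters).
example_error = mean_relative_error([110.0, 90.0], [100.0, 100.0])
```

Here each automated value deviates by 10% from its reference, so the sketch returns approximately 0.1.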
SUPPLEMENTARY MATERIAL
Supplementary material is available online.
ACKNOWLEDGMENTS
We would like to thank Prof. Roger Chammas for helpful comments.
Footnotes
Guarantor of the article: Alexandre Ferreira Ramos
Financial support: The study design, the collection, analysis, and interpretation of the data, and the writing of the manuscript were independent of the financial support. AUS thanks Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) Finance Code 88882.376562/2019-01 for partial financial support of this study. AFR thanks CAPES Finance Code 88881.062174/2014-01 for partial support of this study.
Potential competing interests: The authors declare no potential competing interest.
¹ From Tupi-Guarani, the language of native Brazilians, meaning a reunion for building.