Fully automatic segmentation of sinonasal cavity and pharyngeal airway based on convolutional neural networks


This study aimed to test the accuracy of a new automatic deep learning approach, based on convolutional neural networks (CNNs), for fully automatic segmentation of the sinonasal cavity and the pharyngeal airway from cone-beam computed tomography (CBCT) scans.


Forty CBCT scans from healthy patients (20 women and 20 men; mean age, 23.37 ± 3.34 years) were collected, and manual segmentation of the sinonasal cavity and pharyngeal subregions was carried out using Mimics software (version 20.0; Materialise, Leuven, Belgium). Twenty CBCT scans from the total sample were randomly selected and used for training the artificial intelligence model file. The remaining 20 CBCT segmentation masks were used to test the accuracy of the fully automatic CNN method by comparing the segmentation volumes of the 3-dimensional models obtained with automatic and manual segmentation. The accuracy of the CNN-based method was also assessed using the Dice score coefficient and the surface-to-surface matching technique. The intraclass correlation coefficient and Dahlberg’s formula were used to test intraobserver reliability and method error, respectively. The independent Student t test was used for between-groups volumetric comparison.


Measurements were highly correlated, with an intraclass correlation coefficient value of 0.921, whereas the method error was 0.31 mm³. A mean difference of 1.93 ± 0.73 cm³ was found between the methodologies, but it was not statistically significant (P >0.05). The mean matching percentage detected was 85.35 ± 2.59 (tolerance, 0.5 mm) and 93.44 ± 2.54 (tolerance, 1.0 mm). The differences between the assessments done with the two methods, measured as the Dice score coefficient in percentage, were 3.3% and 5.8%, respectively.


The new deep learning–based method for automated segmentation of the sinonasal cavity and the pharyngeal airway in CBCT scans is accurate and performs equally well as an experienced image reader.


  • An automated deep learning method based on convolutional neural networks is proposed.

  • The method works with cone-beam computed tomography images.

  • It can detect and segment the sinonasal cavity and pharyngeal airway in 60 seconds.

  • The detection and segmentation results are accurate.

Cone-beam computed tomography (CBCT) has become a popular method to diagnose and visualize the upper airways because of its relatively low cost, lower radiation dose than traditional computed tomography (CT), and better effectiveness in identifying the limits between soft and hard tissues. In addition, this 3-dimensional (3D) imaging system offers information on cross-sectional areas, volume, and 3D form that cannot be determined from 2-dimensional (2D) images.

Interest in upper airway shape and dimensions has increased steadily during the past decades, mainly because of the relationship between obstructive sleep apnea and craniofacial morphology.

This quantitative airway assessment involves a process called segmentation. In medical imaging, segmentation is defined as the construction of 3D virtual surface models that match the volumetric data. In other words, it means separating a specific element (eg, the upper airway) and removing all other structures of noninterest for better visualization and analysis. The currently adopted procedure for segmentation of the sinonasal cavity and the pharyngeal airway in clinical practice still relies on manual intervention. Users are required to outline the regions of interest in detail, separate air from soft tissues, and finally obtain a 3D volume with dedicated software.

Thus, in the very recent past, some techniques for fully automatic segmentation of the upper airway have been proposed, such as region growing, level set algorithms, model-based approaches, and localized active contours. Although they require less time than manual segmentation, these approaches still need manual intervention to locate seed points or to initialize the contours. Furthermore, most of these investigations operated only on the pharyngeal upper airway and excluded the nasal cavity and the paranasal sinuses. Therefore, studies that focus only on the oropharyngeal airway will likely overrepresent the true accuracy of the evaluated tools.

In addition, the available commercial semiautomatic software tools for upper airway assessment mainly focus on pharyngeal airway segmentation and largely neglect the appropriate segmentation of the sinonasal cavity.

Recently, the application of artificial intelligence (AI), through its deep learning paradigm, has shown very promising results in automated segmentation of anatomic structures from CT and CBCT. In particular, convolutional neural networks (CNNs) have led to a series of breakthroughs in CBCT segmentation, especially when compared with previous methods employing general hand-crafted features, because they learn task-specific features directly from the data.

Currently, for segmentation of craniomaxillofacial bone structures, the scenario is dominated by encoder-decoder fully convolutional networks based on the U-Net architecture. Since the introduction of U-Net, an extensive literature has improved the base encoding-decoding model through architecture modifications that enhance its representation capabilities. However, none of the existing deep learning models has dealt with automated segmentation of the pharyngeal airway or the sinonasal cavity. This is a very challenging task because of the complex structure of the nasal cavity, with the narrow and tortuous pathways of the conchae and meatuses. Accordingly, the objective of the current study was to develop and train a new deep learning–based approach for fully automatic segmentation of the sinonasal cavity and the pharyngeal airway from CBCT scans and to test the accuracy of this CNN-based method against a set of manual segmentations obtained by a clinically experienced human reader.

Material and methods

This study followed the Helsinki Declaration on medical protocols and ethics. The study protocol was approved by the local ethics review board (approval no. 0127) at the School of Dentistry, University of Catania, Catania, Italy.

A total of 40 CBCT scans from healthy patients (20 women and 20 men; mean age, 23.37 ± 3.34 years) were prospectively collected. All CBCT scans were performed by 1 operator between January 2017 and May 2019 in a private practice specializing in CBCT, using an iCAT Next Generation CBCT unit (Imaging Sciences International, Hatfield, Pa). The acquisition protocol was a low-dose protocol (120 kVp; 48 mA; 0.3-mm voxel size; scan time, 26 seconds; and field of view of 17 cm in height × 23 cm in depth).

Special attention was paid to proper patient positioning for the scan: patients were instructed to sit upright in a natural head position with the help of a mirror and a laser beam. They were also instructed to keep the mandible at maximum intercuspidation, to maintain the tongue at rest in contact with the anterior hard palate without touching the anterior teeth, and to refrain from swallowing during the scanning period.

Patients younger than 18 years; those with clefts or congenital or acquired craniofacial anomalies; those with detectable airway pathology; and those with previous orthognathic or craniofacial surgery were excluded from the sample.

The acquired image data sets were saved in Digital Imaging and Communications in Medicine (DICOM) format and deidentified to protect patients’ data.

The methodology flowchart is described in Figure 1. The 40 CBCT scans were first imported into Dolphin software (version 11.0; Dolphin Imaging and Management Solutions, Chatsworth, Calif) to perform skull reorientation according to a validated protocol. On multiplanar reconstruction images, the skull was reoriented as follows: (1) in the coronal view, the midsagittal plane was aligned through the center of the anterior nasal spine and the crista galli, after which the axial plane was constructed through both infraorbital skeletal landmarks; (2) in the right sagittal view, which was used for standardization, the axial plane was placed to fit the Frankfurt horizontal plane passing through the right porion and the right infraorbital landmark; (3) in the axial view, the midsagittal plane was constructed through the crista galli and basion, the external auditory meatus was oriented, and it was verified that neither mandibular nor zygomatic arch yaw was present (Fig 2).

Fig 1
Methodology flowchart.

Fig 2
Head reorientation on axial, sagittal, and coronal plane of CBCT scans. The 3D image shows the head orientation on a 3D space. Colored lines represent the reference axis.

Afterward, new DICOM files were generated for each patient (40 CBCTs) and imported into Mimics software (version 20.0; Materialise, Leuven, Belgium) to perform a manual segmentation of the sinonasal cavity and pharyngeal subregions. Twenty CBCT scans were subsequently used for the CNN training.

Using Mimics software’s specific tools, a Hounsfield unit (HU) threshold was selected to generate a segmentation mask of the pharyngeal and sinonasal subregions. In this study, we used a fixed threshold for each CBCT scan, with a lower HU value of −1000 and a higher HU value of −400.
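The fixed-threshold masking step can be illustrated outside of Mimics as a simple window on the HU values; a minimal numpy sketch (the helper name is hypothetical, but the −1000 to −400 HU window is the one stated above):

```python
import numpy as np

def airway_mask(hu_volume, lower=-1000, upper=-400):
    """Return a boolean mask selecting voxels whose HU value
    falls inside the fixed air-threshold window."""
    return (hu_volume >= lower) & (hu_volume <= upper)

# Toy 2 x 2 x 2 "volume": three voxels fall inside the window.
vol = np.array([[[-900.0, 0.0], [-1500.0, -450.0]],
                [[300.0, -100.0], [-700.0, 40.0]]])
mask = airway_mask(vol)
print(int(mask.sum()))  # 3
```

In practice, CBCT gray values are only approximately calibrated to HU, which is one reason the mask is afterward refined manually slice by slice.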

Then, the boundaries of the volume of interest (VOI) were identified. The VOI limits were based on landmarks on the midsagittal plane and are shown in Table I and Figure 3. Each landmark was marked with a 0.3-mm diameter sphere, and the software used the center of each sphere as its coordinate.

Table I
Cephalometric landmarks of pharyngeal airway and sinonasal cavity, used to select the VOI
Landmarks Description
Na Most anterior point of the frontonasal junction
N Most anterior point of the nasal prominence on soft tissue
C2sp Most superior posterior point on the second cervical vertebra
C3ai Most anterior inferior point on the third cervical vertebra
Po Most anterior superior point of the external acoustic meatus
Or Most inferior point of the orbit
Anterior Plane perpendicular to FH passing through N
Posterior Plane perpendicular to FH passing through C2sp
Superior Plane parallel to FH passing through Na
Inferior Plane parallel to FH passing through C3ai
Lateral Plane perpendicular to FH passing through the most lateral point of the maxillary sinus

FH , Frankfurt horizontal.

Fig 3
Landmarks and boundaries used for the region of interest selection in both sagittal and coronal views.

Once the landmarks were selected, the VOI was extracted from the totality of the segmentation mask using the crop function of the Mimics software to eliminate the regions exceeding the VOI boundary limits. To enhance the quality of the segmentation, the airway mask was then adjusted by erasing or adding pixels, with the objective of highlighting the sinonasal and pharyngeal airway region on each slice in the sagittal view (Fig 4), using 300% magnification.
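Conceptually, cropping the mask to the VOI amounts to clearing every voxel outside the bounding box defined by the landmark planes; a minimal numpy sketch (the function and its index arguments are illustrative, not part of Mimics):

```python
import numpy as np

def crop_to_voi(mask, z_lim, y_lim, x_lim):
    """Keep only the voxels inside the VOI bounding box;
    everything outside the box is cleared."""
    cropped = np.zeros_like(mask)
    (zs, ze), (ys, ye), (xs, xe) = z_lim, y_lim, x_lim
    cropped[zs:ze, ys:ye, xs:xe] = mask[zs:ze, ys:ye, xs:xe]
    return cropped

full = np.ones((4, 4, 4), dtype=bool)          # pretend mask: all foreground
voi = crop_to_voi(full, (1, 3), (1, 3), (1, 3))
print(int(voi.sum()))  # 2 * 2 * 2 = 8 voxels survive
```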

Fig 4
Segmentation mask of the sinonasal cavity and pharyngeal subregion after enhancement of the boundaries by manually erasing the parts outside the region of interest.

Once the segmentation masks were obtained, 20 of the 40 CBCT scans were randomly selected and used for training the AI model file, whereas the remaining 20 CBCT segmentation masks were used for testing performance (Fig 1). For both training and testing, CBCT slices were resized to 128 × 128 spatial resolution, whereas the slice depth remained unchanged. Manual segmentation of the pharyngeal and sinonasal airway subregions was performed by 1 orthodontist (R.L.) with 25 years of experience in both the clinical and research fields. To assess intraobserver reliability, 15 CBCT scans were randomly selected, and manual segmentation was carried out again by the same examiner 2 weeks later, blinded to the first set of measurements.
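Resizing each axial slice to 128 × 128 while leaving the slice count unchanged can be done, for example, with nearest-neighbor index sampling; a numpy sketch (the study does not state which interpolation scheme was used, so this is one plausible choice):

```python
import numpy as np

def resize_slices(volume, size=128):
    """Nearest-neighbor resize of each (H, W) slice to (size, size);
    the number of slices (first axis) is unchanged."""
    d, h, w = volume.shape
    rows = np.arange(size) * h // size      # source row for each target row
    cols = np.arange(size) * w // size      # source column for each target column
    return volume[:, rows[:, None], cols[None, :]]

vol = np.random.rand(40, 512, 512)          # 40 slices at scanner resolution
small = resize_slices(vol)
print(small.shape)  # (40, 128, 128)
```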

The proposed deep learning model for automated segmentation is a fully convolutional deep encoder-decoder network based on the Tiramisu model, enriched with squeeze-and-excitation blocks, to suitably recalibrate the most significant features, and with bidirectional convolutional long short-term memory (ConvLSTM), to model spatiotemporal correlations between regions of interest in consecutive CBCT slices. The architecture of the employed model contains the following key differences with respect to the Tiramisu model (Fig 5): (1) ConvLSTM was used at the network’s bottleneck layer to exploit the axial spatial correlation of consecutive scan slices instead of processing each slice individually; (2) in the down-sampling and up-sampling paths, we added residual squeeze-and-excitation layers to improve the representational power of the model and support feature interpretation at a later stage.

Fig 5
The architecture of the employed CNN model, consisting of a down-sampling path and an up-sampling path, interconnected by skip connections and by the bottleneck layer. For simplicity, only 2 dense blocks are depicted in each path, whereas we employ 5 in our experiments.

A detailed description of the model, including used hyperparameters, is given in the Supplementary Material.
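The squeeze-and-excitation recalibration used in the down- and up-sampling paths can be illustrated in a framework-free way: per-channel descriptors are obtained by global average pooling (squeeze), passed through a small bottleneck (excite), and used as gates to rescale the feature maps. The following numpy sketch uses random weights and illustrative layer sizes; the actual hyperparameters are those given in the Supplementary Material:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feats, w1, w2):
    """Squeeze-and-excitation on a feature map of shape (C, H, W).
    Squeeze: global average pool per channel -> (C,).
    Excite: two dense layers with a bottleneck, sigmoid gate -> (C,).
    Scale: each channel is multiplied by its gate value."""
    squeeze = feats.mean(axis=(1, 2))          # (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)     # ReLU bottleneck, (C // r,)
    gates = sigmoid(w2 @ hidden)               # (C,), each in (0, 1)
    return feats * gates[:, None, None]

rng = np.random.default_rng(0)
C, r = 8, 2                                    # channels, reduction ratio
feats = rng.standard_normal((C, 16, 16))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
out = se_block(feats, w1, w2)
print(out.shape)  # (8, 16, 16)
```

Because the gates lie in (0, 1), the block can only attenuate channels, which is what lets the network emphasize the most informative feature maps.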

Our model was trained using the manual segmentations from 20 CBCT scans and their 20 standard tessellation language (STL) models. The CNN model processes 3 CBCT slices at a time to identify the 2D segmentation mask of the central slice, which is compared with the corresponding manual segmentation (Fig 6, A-C show some examples). After processing all slices, the obtained masks were combined into a 3D volume and rendered using the ray casting method.
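The slice-wise inference loop described above can be sketched as a sliding window of 3 consecutive slices whose predicted central mask is collected and stacked into a volume; here `segment_triplet` is a hypothetical stand-in for the trained CNN:

```python
import numpy as np

def segment_triplet(triplet):
    """Stand-in for the trained CNN: takes 3 consecutive slices
    (3, H, W) and returns a 2D mask for the central slice.
    Here we simply threshold the central slice for illustration."""
    return triplet[1] > 0.5

def segment_volume(volume):
    """Slide a 3-slice window over the volume and stack the
    predicted central-slice masks into a 3D segmentation."""
    d = volume.shape[0]
    masks = [segment_triplet(volume[i - 1:i + 2]) for i in range(1, d - 1)]
    return np.stack(masks)                     # (d - 2, H, W)

vol = np.random.rand(10, 32, 32)
seg = segment_volume(vol)
print(seg.shape)  # (8, 32, 32)
```

Note that a plain sliding window leaves the first and last slices without a prediction; how the study handled the border slices is not stated, so this sketch simply omits them.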

Fig 6
Examples of 2D automatic segmentation compared with manual segmentation. A, CBCT scans; B, manual segmentations; C, automatic segmentations.

The training stage of the CNN required approximately 18 hours per single CBCT scan (on a Titan X Pascal GPU [NVIDIA Corporation, Santa Clara, Calif]), whereas segmenting a whole input CBCT scan at inference time with the trained model required approximately 60 seconds on a standard consumer personal computer with a Core i7 CPU and 8 GB RAM (Intel, Santa Clara, Calif), which makes it suitable for clinical workflows.

To test the accuracy of the fully automatic CNN segmentation method, the 20 CBCT scans not included in the training data were used (Table II). On the one hand, these scans were processed by the CNN model to obtain a fully automatic segmentation of the sinonasal cavity and pharyngeal airway; on the other hand, they were segmented manually by 1 operator using Mimics software. After that, each 3D model obtained by CNN processing was compared with its homolog obtained from manual segmentation to assess the accuracy.
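The Dice score coefficient used for this comparison is the ratio of twice the overlap between the two masks to the sum of their sizes; a minimal numpy sketch with a toy pair of masks:

```python
import numpy as np

def dice_score(mask_a, mask_b):
    """Dice score coefficient between two boolean masks:
    2 * |A intersect B| / (|A| + |B|), in [0, 1]."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * intersection / total

manual = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
auto = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(round(dice_score(manual, auto), 3))  # 2*2 / (3+3) = 0.667
```

A score of 1 means perfect overlap; the volumetric difference reported in Table II is a complementary measure, since two masks of equal volume can still overlap poorly.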

Table II
Descriptive statistics for volumetric measurements (cubic centimeters) made on manually segmented and CNN-segmented 3D models of the sinonasal cavity and pharyngeal region
Patients Age Sex Manual segmentation CNN segmentation Differences
Control 1 19.8 M 102.12 99.46 2.67
Control 2 20.3 F 94.77 91.45 3.32
Control 3 22.4 F 90.36 88.55 1.8
Control 4 25.3 F 92.57 89.78 2.79
Control 5 19.3 M 106.32 104.78 1.55
Control 6 22.5 M 104.66 102.35 2.31
Control 7 21.5 M 69.78 67.37 2.41
Control 8 23.8 F 57.23 55.88 1.35
Control 9 24.3 F 87 86.43 0.57
Control 10 19.4 M 78.94 77.32 1.62
Control 11 20.1 M 110.64 108.48 2.17
Control 12 22.5 M 118.11 116.23 1.88
Control 13 23.5 F 56.4 55.24 1.16
Control 14 24.6 F 111.73 109.79 1.94
Control 15 27.9 F 74.43 73.54 0.89
Control 16 30.8 M 69.85 68.32 1.52
Control 17 28.5 M 130.54 127.43 3.11
Control 18 22.4 F 106.37 104.38 1.99
Control 19 21.5 F 88.77 86.43 2.33
Control 20 30.4 M 93.77 92.65 1.11
Mean 23.54 92.22 90.29 1.92
SD 3.50 19.96 19.62 0.73

Jun 12, 2021 | Posted in Orthodontics
