Our objectives were to evaluate the interobserver reliability of orthodontists with varying degrees of comfort in manipulating cone-beam computed tomography (CBCT) images when classifying adenoid hypertrophy on CBCT-generated images, and to determine how accurate these orthodontists are compared with the gold standard diagnosis, nasopharyngoscopy.
This was a cross-sectional study in which board-certified orthodontists, selected from a randomized list, graded adenoid hypertrophy on a stratified sample of 10 CBCT scans. The CBCT images were drawn from a multidisciplinary airway clinic in which children and adolescents underwent a CBCT scan and a nasopharyngoscopy (reference standard) by an otolaryngologist (head and neck surgeon) on the same day. All participating orthodontists used the same viewer software and computer and had access to a previously published visual guideline for evaluating adenoid size.
Fourteen orthodontists evaluated 10 CBCT reconstructions. Interobserver reliability was excellent (intraclass correlation coefficient [ICC], 0.941; 95% confidence interval, 0.882-0.984). However, the orthodontists’ evaluations against the reference standard demonstrated poor accuracy (ICC mean, 0.39; ICC range, 0.00-0.74). When the grades were dichotomized into healthy and unhealthy and analyzed for each orthodontist individually, agreement between the orthodontists’ evaluations and the nasopharyngoscopy results (accuracy) was, on average, poor (kappa mean, 0.44; range, 0.20-0.80).
Different levels of CBCT expertise affected assessment accuracy. The participating orthodontists showed excellent consistency among themselves; however, the poor agreement between their CBCT assessments and nasopharyngoscopy showed that this sample of clinical orthodontists had poor diagnostic accuracy. Together, these findings suggest that orthodontists may make consistent, systematic errors in this type of evaluation.
Orthodontists’ assessments of adenoid hypertrophy on cone-beam computed tomography (CBCT) were compared with nasopharyngoscopy.
Orthodontists’ diagnostic accuracy evaluating adenoid hypertrophy with CBCT was poor.
This sample of orthodontists made consistent and systematic errors evaluating adenoid hypertrophy with CBCT.
Different levels of CBCT expertise impacted diagnostic and screening accuracy.
Among children and adolescents, a common cause of an obstructed upper airway is hypertrophy of the adenoids or tonsils; this can lead to the development of sleep-disordered breathing and, in severe cases, obstructive sleep apnea. Neurocognitive impairment and behavioral effects, such as attention deficit hyperactivity disorder and aggression, have been linked to sleep-disordered breathing.
The initial diagnosis of upper airway dysfunction is based primarily on the medical history and on the complaints of patients and parents. Signs and symptoms may include chronic snoring, pauses in breathing during sleep, delayed growth, a tendency to fall asleep during the day, behavioral difficulties, and a chronic runny nose. To supplement the initial assessment, an otolaryngologist/head and neck surgeon (OHNS) may directly visualize the area with nasopharyngoscopy (NP).
The advent of cone-beam computed tomography (CBCT), with its lower ionizing radiation dose compared with conventional computed tomography, has allowed clinicians to assess the upper airway in 3 dimensions. When properly indicated, 3-dimensional (3D) CBCT imaging can be a useful diagnostic and screening tool, because it offers better image definition and diagnostic accuracy than traditional 2-dimensional (2D) imaging.
A previous study investigated the use of CBCT imaging to screen for adenoid hypertrophy when CBCT images were already available and had been indicated for other reasons. It reported high sensitivity (88%) and specificity (93%), supporting this type of 3D imaging as a reliable tool for adenoid hypertrophy screening. However, the number of evaluators was small, and all of them were highly comfortable manipulating CBCT images.
Therefore, a 2-fold follow-up study was designed. This study aimed (1) to evaluate interobserver reliability among several orthodontists, with varying degrees of comfort in manipulating CBCT images, in classifying adenoid hypertrophy on CBCT-generated images; and (2) to determine how accurate these orthodontists were in screening for adenoid hypertrophy with CBCT imaging compared with the reference standard diagnosis, an NP assessment by an OHNS. See Supplemental Materials for a short video presentation about this study.
Material and methods
This cross-sectional study protocol was granted ethical approval from the Research Ethics Board at the University of Alberta (Pro00043684).
The CBCT scans came from consecutive patients evaluated at the University of Alberta Interdisciplinary Airway Research Clinic. The CBCT population and methodology closely followed those of a previous study. In keeping with the ALARA (as low as reasonably achievable) principle, the CBCT scans were not taken specifically to evaluate the adenoids but to evaluate the more complex craniofacial patterns common in children with a high likelihood of obstructive sleep apnea.
Children and adolescents aged 6 to 15 years who were referred for obstructive upper airway concerns were eligible. CBCT images of patients with previous treatment for upper airway dysfunction or sleep disorders, or with previous orthodontic treatment, were not used in this study.
For every subject, a radiology technician performed the CBCT scan and a pediatric OHNS performed the NP, within 2 hours of each other. The CBCT images were taken with an i-CAT scanner (Imaging Sciences International, Hatfield, Pa). The same technician acquired all images following a standardized imaging protocol.
The NP was performed according to the protocol established by the American Academy of Allergy. The same OHNS then analyzed each patient’s NP and classified adenoid size on a 4-point scale according to the level of obstruction. The severity of adenoid hypertrophy was graded with a validated method: grade 1 (up to 25% obstruction), grade 2 (25%-50%), grade 3 (50%-75%), and grade 4 (>75%) (Appendix; Table I).
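The 4-point grading rule can be expressed as a small function, which makes the cutoffs explicit. This is a minimal sketch: the function name `np_grade` is hypothetical, and assigning each exact boundary value (25%, 50%, 75%) to the lower grade is an assumption, because the scale as stated does not say which grade a boundary value belongs to.

```python
def np_grade(obstruction_pct: float) -> int:
    """Map percentage of adenoid obstruction to the 4-point grade.

    Assumption (not stated in the source): a value exactly on a cutoff
    (25, 50, 75) is assigned to the lower grade, reading "up to 25%"
    for grade 1 as inclusive.
    """
    if not 0 <= obstruction_pct <= 100:
        raise ValueError("obstruction must be between 0 and 100 percent")
    if obstruction_pct <= 25:
        return 1  # up to 25% obstruction
    if obstruction_pct <= 50:
        return 2  # 25%-50%
    if obstruction_pct <= 75:
        return 3  # 50%-75%
    return 4      # >75%


if __name__ == "__main__":
    print([np_grade(p) for p in (20, 40, 70, 90)])
```

If a different boundary convention was used, only the comparison operators change.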
| Orthodontist | Intraclass correlation | 95% CI lower bound∗ | 95% CI upper bound |
|---|---|---|---|
We estimated the required sample size of board-certified orthodontists from a population of 34 specialists in a major Canadian city, assuming that 50% of orthodontists would classify the adenoids as healthy, with a margin of error of 0.2 and a 95% confidence level. The required sample was therefore 14, and 14 of the 34 orthodontists were chosen randomly by a statistician (G.H.). The number of scans (10) was set so that each participant’s evaluation would take about 30 minutes. Ten scans were selected from the pool of 39 by stratified sampling: the CBCT images were stratified by NP grade and randomized by the statistician. To represent the disease spectrum and distribute the obstruction grades evenly, a preestablished distribution was used: 2 patients classified by the OHNS as grade 1, 3 as grade 2, 3 as grade 3, and 2 as grade 4.

The participating orthodontists visually analyzed the upper airway obstruction as depicted in the CBCT reconstructions, limited to the area of the adenoids. InVivo Dental viewer software (Anatomage, San Jose, Calif) was used, specifically the “Lay Egg” function, and the DICOM data were anonymized. The evaluators had access solely to the reconstructed CBCT images and were blinded to all other patient information.
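The sample-size and scan-selection steps described above can be sketched in a few lines. The source does not state the exact formula used; this sketch assumes the common sample size for estimating a proportion, n0 = z²p(1−p)/E², with the finite population correction n = n0·N/(n0 + N). With p = 0.5, E = 0.2, 95% confidence, and N = 34, this gives about 14.07, which rounds to the reported 14 (rounding up instead would give 15). The stratified draw assumes a fixed quota per NP grade; the function names and the per-grade pool sizes in the demo are hypothetical, since the distribution of the 39 scans by grade is not reported.

```python
from __future__ import annotations

import random
from statistics import NormalDist


def sample_size_proportion(p: float, margin: float, confidence: float,
                           population: int) -> float:
    """Sample size for estimating a proportion, with finite population correction.

    Assumed formula: n0 = z^2 * p * (1 - p) / margin^2, then
    n = n0 * N / (n0 + N). Returns the unrounded value.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z, ~1.96 for 95%
    n0 = z ** 2 * p * (1 - p) / margin ** 2              # infinite-population size
    return n0 * population / (n0 + population)            # finite population correction


def stratified_sample(scans_by_grade: dict[int, list[str]],
                      quota: dict[int, int],
                      seed: int | None = None) -> list[str]:
    """Draw a fixed number of scans at random, without replacement, from each grade stratum."""
    rng = random.Random(seed)
    selected: list[str] = []
    for grade, k in sorted(quota.items()):
        selected.extend(rng.sample(scans_by_grade[grade], k))
    return selected


if __name__ == "__main__":
    n = sample_size_proportion(p=0.5, margin=0.2, confidence=0.95, population=34)
    print(f"required sample: {n:.2f} -> {round(n)} orthodontists")
    # Hypothetical pool of 39 scan IDs; real per-grade counts are not reported.
    pool = {g: [f"g{g}-{i}" for i in range(c)]
            for g, c in {1: 10, 2: 12, 3: 10, 4: 7}.items()}
    print(stratified_sample(pool, {1: 2, 2: 3, 3: 3, 4: 2}, seed=1))
```

Fixing the quota per stratum, rather than sampling proportionally, guarantees that the rarer severe grades appear among the 10 scans.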
The evaluators were recruited consecutively from the randomized list until the required sample size was reached. Prospective evaluators were contacted by e-mail. Orthodontists who agreed to participate received an information package and an informed consent form by e-mail, which were discussed verbally before they signed. All participants had the opportunity to ask technical questions about the viewer software and had access to a previously published visual guideline for evaluating adenoid size (Appendix).
An investigator (C.P.P.) was present during all assessments. Before beginning, the participants received instructions on how to manipulate the software. The orientation demonstrated how to access and scroll through the slices and how to grade adenoid obstruction with the standardized classification provided on a reference sheet. The OHNS’s classification of adenoid hypertrophy was also described verbally. The evaluators were asked to classify adenoid size as mild (grade 1, <25% obstruction), moderate (grade 2, 25%-50% obstruction), advanced (grade 3, 50%-75% obstruction), or severe (grade 4, >75% obstruction). Each evaluator completed a CBCT reporting template, which was sealed in an envelope.
Data were analyzed with the Statistical Package for the Social Sciences (SPSS, version 23; IBM, Armonk, NY). Interobserver reliability among the orthodontists and the accuracy of their evaluations against the NP were assessed with the intraclass correlation coefficient (ICC) and kappa statistics. The ICC was used to evaluate the interobserver reliability of adenoid size classification on the 4-point scale and to assess the accuracy of the orthodontists’ classifications against the NP. Agreement was classified according to the following ICC values: excellent (>0.9), good (0.75-0.9), moderate (0.5-0.75), or poor (<0.50). P values less than 0.05 were considered statistically significant. The accuracy of a dichotomous diagnosis, "diseased" vs "healthy," was evaluated with kappa statistics: grades 1 and 2 were grouped as healthy, and grades 3 and 4 as unhealthy. Kappa values were interpreted with the same cutoffs: excellent above 0.9, good between 0.75 and 0.9, moderate from 0.5 to 0.75, and poor below 0.5. Intraobserver reliability was assessed in a previous study.
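The two agreement statistics can be illustrated with a plain-Python sketch. The source does not report which ICC model was selected in SPSS; this sketch assumes the two-way random-effects, absolute-agreement, single-rater form (McGraw and Wong's ICC(A,1), equivalent to Shrout and Fleiss's ICC(2,1)) and unweighted Cohen's kappa on the dichotomized grades. Confidence intervals and P values are not computed, and the function names are hypothetical.

```python
from __future__ import annotations


def icc_a1(ratings: list[list[float]]) -> float:
    """ICC(A,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the grade given to subject (scan) i by rater j.
    Built from the two-way ANOVA mean squares (rows = subjects, columns = raters).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def dichotomize(grade: int) -> int:
    """Grades 1-2 -> 0 (healthy); grades 3-4 -> 1 (unhealthy)."""
    return 1 if grade >= 3 else 0


def cohen_kappa(a: list[int], b: list[int]) -> float:
    """Unweighted Cohen's kappa for two raters' binary (0/1) labels."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n    # observed agreement
    pa1, pb1 = sum(a) / n, sum(b) / n                # each rater's rate of "unhealthy"
    p_exp = pa1 * pb1 + (1 - pa1) * (1 - pb1)        # chance agreement
    if p_exp == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_obs - p_exp) / (1 - p_exp)
```

For one orthodontist’s accuracy, the 10 paired grades would form a 10 × 2 matrix, for example `icc_a1([[o, r] for o, r in zip(ortho_grades, np_grades)])`, and the kappa would be `cohen_kappa([dichotomize(g) for g in ortho_grades], [dichotomize(g) for g in np_grades])`; `ortho_grades` and `np_grades` are hypothetical names.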
Fourteen Royal College of Dentists of Canada board-certified orthodontists practicing in the same major Canadian city participated in this study. All participants used the same computer and visualization software to minimize performance bias. The participants classified adenoid size with the same 4-point scale used for the reference standard (OHNS via NP). The orthodontists spent a mean of 12 minutes 2 seconds (SD, 3 minutes 49 seconds) on the evaluation. Each answer sheet was placed in an envelope, and a third person transferred the data to an Excel file (Microsoft, Redmond, Wash). The data were checked carefully, and outliers were verified.
ICC was used to evaluate interobserver reliability. No clinically relevant discrepancies were found, indicating high consistency among the evaluators (ICC, 0.94; 95% CI, 0.882-0.984). However, the orthodontists’ evaluations against the reference standard demonstrated poor accuracy (ICC mean, 0.39; ICC range, 0.00-0.74). Table I and Figure 1 present the ICC of each orthodontist against the NP. For a second analysis from a different perspective, we used the statistical mode (the most frequently occurring value in a set of numbers) of the orthodontists’ grades for each scan. The purpose of this data transformation was to limit the influence of possible outliers. With this approach, agreement between the orthodontists and the NP grades was moderate (ICC, 0.753; 95% CI, 0.119-0.937).
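The mode transformation reduces the 14 grades given to each scan to a single consensus grade, which can then be compared with the NP grade. A minimal sketch follows; the function name is hypothetical, and because the source does not say how ties between equally frequent grades were resolved, choosing the lowest tied grade is an assumption exposed as the `tie_break` parameter.

```python
from __future__ import annotations

from statistics import multimode


def consensus_mode(ratings: list[list[int]], tie_break=min) -> list[int]:
    """Per-scan modal grade across raters.

    ratings[i] holds every rater's grade for scan i. multimode returns all
    equally frequent grades; tie_break picks one (min by default, an assumption).
    """
    return [tie_break(multimode(row)) for row in ratings]


if __name__ == "__main__":
    # Hypothetical grades from 5 raters for 3 scans.
    grades = [[1, 1, 2, 1, 2], [3, 3, 4, 2, 3], [2, 4, 4, 3, 4]]
    print(consensus_mode(grades))
```

A single atypical grade cannot move the mode unless it ties with the most common grade, which is why this transformation dampens outliers.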