In this study, we aimed to assess interrater and intrarater agreement among orthodontic clinicians in their assessments of reported incidental findings in regard to both the need for additional follow-up and the impact on future orthodontic treatment in large-field maxillofacial cone-beam computed tomography (CBCT) imaging.
The study sample consisted of 18 nonrandomly selected large-field maxillofacial CBCT volumes containing a reported total of 88 radiographic findings. All scans were associated with formal radiologic reports. However, the suggestions of further follow-up were removed from the radiologic reports so as not to bias the 3 evaluating orthodontists in their subsequent decision making. The evaluators had on average 7.6 years of CBCT usage and self-interpretation experience. Reliability was determined by quantifying the level of agreement between the evaluators’ assessments for both research questions for all 88 findings using a binary response (yes/no) as the outcome measure. The Cohen kappa statistic was calculated to quantify intrarater and interrater agreement globally for both statements.
Although interrater agreement was considerable, potential decisions with clinical impact were not consistent. This needs to be considered when interpreting maxillofacial incidental findings. Evaluators demonstrated higher levels of agreement for dentoalveolar findings than for findings in the extragnathic regions when assessing clinical significance.
Among the evaluators, who were considered experienced in CBCT, “fair-to-good” interrater agreement and “excellent” intrarater agreement were demonstrated regarding both the need for further follow-up of the findings and the potential impact of the findings on future orthodontic treatment.
Incidental findings (IFs) on CBCT are relatively common (1.5-3 IFs per scan).
The clinical impact should be considered when interpreting maxillofacial IFs.
Although agreement was considerable, potential decisions with clinical impact were inconsistent.
Cone-beam computed tomography (CBCT) provides 3-dimensional (3D) evaluation of craniofacial structures, and its use is increasing in orthodontic diagnosis and treatment planning. The traditional approach to orthodontic imaging involves a panoramic radiograph and a lateral cephalogram for initial treatment planning. More recently, some orthodontic clinicians have been implementing CBCT imaging to either augment or replace conventional 2-dimensional imaging. CBCT imaging in orthodontics has been shown to be justified for surgical planning and when root resorption, supernumerary teeth, canine impaction, or upper-airway obstruction is suspected.
Several studies have contrasted the reliability of panoramic imaging with that of CBCT in orthodontic-related issues. Based on the literature, there is some evidence that CBCT may have greater diagnostic capability, which may lead to alternative orthodontic treatment plans in some specific scenarios.
When compared with conventional 2-dimensional imaging, CBCT captures a 3D image with a larger field of view. Hence, there is an increased potential to identify incidental findings (IFs). IFs are defined as any findings detected by any diagnostic imaging modality that are unrelated to the clinical indication for the imaging being performed. Arguably, as important as the detection is the action that each unexpected finding invokes, in terms of determining the necessity for further evaluation or management. Since most IFs detected in CBCT images are extragnathic, dental clinicians may be unfamiliar with interpretation of anatomic structures outside the primary region of interest. The European Academy of Dentomaxillofacial Radiology and the American Academy of Oral and Maxillofacial Radiology have stated that the entire volume must be interpreted regardless of the region of interest; if the interpreting clinician is not highly experienced in CBCT interpretation, a referral is required to an oral and maxillofacial radiologist or a medical radiologist for review.
A number of studies in the literature have demonstrated the high frequency of IFs in large-field CBCT imaging in various patient samples. Of these studies, only a few have investigated an orthodontic sample exclusively, in which the reported IFs ranged from 1 to 2 IFs per CBCT scan. With the expected high rate of IFs, combined with the relative novelty of CBCT imaging in clinical practice and the lack of formal training requirements in CBCT interpretation in some regions and some dental curricula, it may be difficult to ensure that orthodontic clinicians who use CBCT images are properly interpreting them. Reliability or agreement among orthodontic clinicians in their assessment of the impact of maxillofacial IFs identified in CBCT imaging has not been evaluated.
In this study, we aimed to assess the agreement between orthodontic clinicians regarding the impact of maxillofacial IFs using multiplanar and 3D reconstructed views in large-field CBCT imaging. Specifically, we evaluated the interrater and intrarater agreements of orthodontic clinicians in their assessments of reported IFs in regard to both the need for additional follow-up and the impact on future orthodontic treatment in large-field maxillofacial CBCT imaging.
Material and methods
Approval to conduct this cross-sectional study was obtained from the Health Research Ethics Board of the University of Alberta in Edmonton, Alberta, Canada.
The study sample consisted of 18 nonrandomly selected large-field maxillofacial CBCT volumes containing a reported total of 88 radiographic findings. All scans were associated with formal radiologic reports. The CBCT volumes were hand-selected from a larger sample of 427 consecutively obtained CBCT images acquired for orthodontic purposes at a private diagnostic imaging center (Edmonton Diagnostic Imaging). The selected volumes were chosen to best represent the approximate distribution of expected findings based on anatomic region, as previously demonstrated in the literature. When appropriate, CBCT volumes containing findings that were specifically recommended for follow-up by the interpreting oral and maxillofacial radiologist were given priority. However, the suggestion of further follow-up was removed from the radiologic reports so as not to bias the evaluators in subsequent decision making.
All selected CBCT images were acquired using an i-CAT Next Generation machine (Imaging Sciences International, Hatfield, Pa) with the patient in the upright position and with similar imaging parameters (120 kVp; 5 mA; 0.3-mm voxel size; scan time, 4.8-8.9 seconds; and field of view no more than 17 cm in height × 23 cm in depth). Each CBCT scan was associated with a formal radiologic interpretation provided by 1 board-certified oral and maxillofacial radiologist. All image volumes were reviewed by the radiologist using the imaging software InVivoDental (version 5.0; Anatomage, San Jose, Calif). The radiologist was blinded to the objective of the study and unaware that the data would be retrospectively collected for analysis. All CBCT images were coded for blinding and randomized for prospective evaluation by the assessors. An independent consultant held the code hidden until all evaluations were completed.
Evaluations of the CBCT findings were completed independently by 3 licensed orthodontic clinicians recruited via e-mail solicitation on the basis of having a high level of experience in CBCT imaging requests and use in their clinical practices. The evaluators had on average 7.6 years of CBCT usage and self-interpretation experience (evaluator A, 5 years; evaluator B, 10 years; evaluator C, 8 years).
The evaluators first attended an instruction and discussion session, in which they completed a mock evaluation consisting of 3 sample CBCT volumes. They were shown how to access and manipulate the CBCT volumes using the imaging software (version 11.5; Dolphin Imaging & Management Solutions, Chatsworth, Calif) and could ask questions throughout the instruction session.
Upon completion of the calibration session, the evaluators were instructed to review each CBCT volume, assessing both the CBCT volume using slices in all 3 planes of space (multiplanar and 3D volume rendering) and the modified associated radiologic report. For convenience, a data collection instrument was provided, listing all specific reported radiographic findings in each volume. For each finding, the evaluator was asked to locate the finding in the volume and to answer yes or no to the following statements: (1) this finding requires follow-up with a dental or medical professional (yes/no), and (2) this finding may alter the orthodontic treatment plan (yes/no).
The evaluators were instructed to complete this for all findings in the radiologic report and then proceed to the next CBCT volume without going back. Sequential progression of CBCT volumes 1 through 18 occurred in this manner. All evaluators were blinded to the subject’s identity and detailed clinical history; they evaluated the images in a unique random order and evaluated each image set twice, separated by a minimum 30-day wash-out period. All evaluators reviewed the CBCT images using the same computer hardware (high-definition graphics 2000, 1920 × 1080 pixel resolution; Lenovo-Intel, Santa Clara, Calif) and viewing monitor (Aquos 70-in LCD television; Sharp, Osaka, Japan).
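The randomization procedure itself is not described in the text. As a minimal sketch, assuming that each evaluator received a different shuffled order of the 18 volumes at each of the 2 sessions, the orders could be generated as follows. The function name `evaluation_orders`, the evaluator labels, and the seed are hypothetical and serve only for illustration.

```python
import random
from typing import Dict, List, Tuple


def evaluation_orders(
    n_volumes: int, evaluators: List[str], sessions: int, seed: int
) -> Dict[Tuple[str, int], List[int]]:
    """Return a distinct random viewing order for every (evaluator, session) pair."""
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced
    seen = set()
    orders = {}
    for evaluator in evaluators:
        for session in range(1, sessions + 1):
            # Draw permutations of volume numbers 1..n until an unused one appears.
            while True:
                order = tuple(rng.sample(range(1, n_volumes + 1), n_volumes))
                if order not in seen:
                    break
            seen.add(order)
            orders[(evaluator, session)] = list(order)
    return orders


# Assumed setup: 18 volumes, 3 evaluators, 2 sessions (before and after the wash-out).
orders = evaluation_orders(18, ["A", "B", "C"], sessions=2, seed=42)
for key, order in orders.items():
    print(key, order)
```

Fixing the seed keeps the allocation auditable by the independent consultant who holds the code, which is one reason to prefer a scripted shuffle over an ad hoc one.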
Reliability was determined by quantifying the level of agreement between the 3 evaluators’ assessments for both research questions for all 88 findings with the binary response (yes/no) as the outcome measure. The Cohen kappa statistic (κ) was calculated to quantify intrarater and interrater agreement globally for both statements. Kappa statistics were computed using the SPSS software package (version 20.0; IBM, Armonk, NY). All 95% confidence intervals were obtained using the bootstrap method, which approximates the sampling distribution of a statistic by repeatedly resampling the observed data with replacement. Raw agreement measures (proportion of overall agreement) and proportions of specific agreement were also calculated in cases of unbalanced marginal total distributions to overcome the possible paradox of the kappa statistic, in which kappa can be low despite high raw agreement. A kappa value of 0.75 was used as the threshold above which agreement was considered acceptable. In addition, the proportion of overall agreement was assessed separately for findings in each anatomic category for descriptive purposes.
Guidelines in the literature suggest which values of kappa reflect adequate agreement; they are useful for interpretation but are not universally accepted.
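The agreement measures described above can be illustrated with a short sketch. The study used SPSS; the Python code below is only an illustration, assuming pairwise comparison of 2 raters, ratings coded 1 (yes) and 0 (no), and a percentile bootstrap that resamples findings. The ratings are hypothetical and are not the study data, and all function names are illustrative.

```python
import random
from typing import Sequence, Tuple


def two_by_two(r1: Sequence[int], r2: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (yes/yes, yes/no, no/yes, no/no) for paired binary ratings."""
    a = b = c = d = 0
    for x, y in zip(r1, r2):
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def cohen_kappa(r1: Sequence[int], r2: Sequence[int]) -> float:
    """Cohen kappa: observed agreement corrected for chance agreement."""
    a, b, c, d = two_by_two(r1, r2)
    n = a + b + c + d
    p_o = (a + d) / n
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both raters give one identical answer throughout
    return (p_o - p_e) / (1 - p_e)


def raw_agreement(r1: Sequence[int], r2: Sequence[int]) -> Tuple[float, float, float]:
    """Proportion of overall agreement and of specific (positive, negative) agreement."""
    a, b, c, d = two_by_two(r1, r2)
    overall = (a + d) / (a + b + c + d)
    pos = 2 * a / (2 * a + b + c) if (2 * a + b + c) else float("nan")
    neg = 2 * d / (2 * d + b + c) if (2 * d + b + c) else float("nan")
    return overall, pos, neg


def bootstrap_ci(r1, r2, n_boot: int = 2000, alpha: float = 0.05, seed: int = 0):
    """Percentile bootstrap interval for kappa, resampling findings with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(r1, r2))
    stats = []
    for _ in range(n_boot):
        sample = [pairs[rng.randrange(len(pairs))] for _ in pairs]
        k = cohen_kappa([p[0] for p in sample], [p[1] for p in sample])
        if k == k:  # drop undefined (NaN) replicates
            stats.append(k)
    stats.sort()
    return stats[int(alpha / 2 * len(stats))], stats[int((1 - alpha / 2) * len(stats)) - 1]


# Hypothetical ratings for 88 findings: 10 yes/yes, 4 yes/no, 3 no/yes, 71 no/no.
rater_a = [1] * 10 + [1] * 4 + [0] * 3 + [0] * 71
rater_b = [1] * 10 + [0] * 4 + [1] * 3 + [0] * 71

kappa = cohen_kappa(rater_a, rater_b)
overall, p_pos, p_neg = raw_agreement(rater_a, rater_b)
ci_low, ci_high = bootstrap_ci(rater_a, rater_b)
print(f"kappa={kappa:.3f} overall={overall:.3f} P_pos={p_pos:.3f} P_neg={p_neg:.3f}")
print(f"95% bootstrap CI for kappa: ({ci_low:.3f}, {ci_high:.3f})")
```

With these hypothetical counts, the proportion of overall agreement is about 0.92 while kappa is about 0.69, because most findings are rated "no" by both raters and chance agreement is therefore high. This is the kind of unbalanced marginal distribution for which the raw and specific agreement proportions are a useful complement to kappa.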
Sample size was determined using the suggestions of Donner and Rotondi, who defined the following variables for interobserver-agreement studies with a binary outcome using multiple raters: κ_L, the minimum acceptable value of kappa; κ_0, the expected value of kappa; and π, the probability that the rating is a success. To determine κ_0, a small pilot study was performed with the same research methodology outlined above, but 3 orthodontic residents analyzed only 15 CBCT findings. From this, it was suggested that the value of κ_0 should be 0.8, and κ_L was set at 0.6. As a conservative strategy, π was set at 0.1. Using these values for the 3 raters, Donner and Rotondi suggested a minimum sample size of 78 radiographic findings for the lower bound of the 95% confidence interval to be at least κ = 0.6.
In total, 18 CBCT volumes were hand-selected containing a total of 88 radiologic findings, and these findings were subdivided into the following previously determined anatomic categories (Table I): 37 upper airway findings, 17 paranasal sinus findings, 12 dentoalveolar findings, 10 findings in the surrounding hard and soft tissues, 9 temporomandibular joint findings, and 3 cervical vertebrae findings. The ages of the imaged subjects ranged from 6 to 33 years; the mean age was 14.6 years, and the median age was 14.0 years.
| Incidental finding category | Frequency (n) |
| --- | --- |
| Cervical vertebrae | 3 (3.4%) |
| Cervical vertebrae fusion | 3 |
| Hypodontia (excluding third molars) | 1 |
| Enlarged follicular space or possible odontogenic cyst | 2 |
| Nasal-oral-pharyngeal airway | 37 (42.1%) |
| Enlarged inferior nasal turbinate | 5 |
| Irregular mucosal thickening of nasal cavity; possible polyps | 1 |
| Lingual tonsil hypertrophy | 1 |
| Palatine tonsil hypertrophy | 3 |
| Nasal mucosal thickening; rhinitis | 2 |
| Dystrophic calcification in tonsils | 3 |
| Nasal septal deviation | 6 |
| Polypoidal soft-tissue mass on soft palate | 1 |
| Soft tissue mass on left side of pharynx and larynx | 1 |
| Complete opacification of the middle and superior meatuses | 1 |
| Paranasal sinuses | 17 (19.3%) |
| Localized inflammatory conditions (mucositis, sinusitis) | 9 |
| Paranasal sinus hypoplasia | 1 |
| Obliteration of maxillary, sphenoid, frontal, ethmoid sinuses with soft tissue infiltration | 1 |
| Surrounding soft and hard tissues | 10 (11.4%) |
| Calcification of stylohyoid ligament | 1 |
| Idiopathic osteosclerosis (dense bone island) | 4 |
| Enlarged sella turcica | 1 |
| Clivus notch suggesting ectopic blood vessel | 1 |
| Temporomandibular joint | 9 (10.2%) |
| Physiologic remodeling (flat margins, subchondral sclerosis) | 5 |
| Degenerative changes (osteophytes, erosions) | 2 |
All values of kappa are reported using the minimum and maximum obtained values (κ min and κ max). Using the interpretation of kappa values by Fleiss et al, interexaminer agreement in the assessment of the need for further follow-up of the reviewed findings was “fair to good” (κ min, 0.606; κ max, 0.710), and the proportions of overall agreement ranged from 0.807 to 0.855 (Fig 1). Interexaminer agreement in the assessment of a potential impact on future orthodontic treatment was also “fair to good” (κ min, 0.633; κ max, 0.705), with overall agreement ranging from 0.875 to 0.943 (Fig 2).
Again using the interpretation of kappa values by Fleiss et al, intraexaminer agreement regarding the need for further follow-up of the reviewed findings was “excellent” (κ min, 0.846; κ max, 0.909) (Fig 3). Intraexaminer agreement regarding the impact of the reviewed findings on future orthodontic treatment was also “excellent” (κ min, 0.860; κ max, 0.910) (Fig 4).
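The verbal labels above follow from fixed cutoffs. As a minimal sketch, assuming the commonly cited Fleiss benchmarks (below 0.40 poor, 0.40 to 0.75 fair to good, above 0.75 excellent; only the 0.75 threshold is stated in this article), the reported minimum and maximum kappa values can be mapped to their labels. The function name `fleiss_label` is illustrative.

```python
def fleiss_label(kappa: float) -> str:
    """Map a kappa value to a verbal label using the assumed Fleiss cutoffs."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:  # assumed lower cutoff; not stated in the article
        return "fair to good"
    return "poor"


# Minimum and maximum kappa values as reported in the Results.
reported = {
    "interrater, need for follow-up": (0.606, 0.710),
    "interrater, impact on treatment": (0.633, 0.705),
    "intrarater, need for follow-up": (0.846, 0.909),
    "intrarater, impact on treatment": (0.860, 0.910),
}

labels = {
    name: (fleiss_label(k_min), fleiss_label(k_max))
    for name, (k_min, k_max) in reported.items()
}
for name, (low_label, high_label) in labels.items():
    print(f"{name}: min -> {low_label}, max -> {high_label}")
```

In each comparison the minimum and maximum fall in the same band, which is why a single label can summarize the whole range.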
The proportions of overall interobserver agreement in the assessment of need for further follow-up were assessed separately for the findings in each anatomic category (Table II). Only the overall proportions were assessed for descriptive purposes because the subsample size for a few anatomic categories was small. The proportions of overall agreement at time 1 were highest (1.000) for the dentoalveolar findings for all subjects and lowest (0.666) for the cervical vertebrae findings for all subjects. The results were similar at time 2 (Table II).
| Anatomic category | n | P Overall at T1 | P Overall at T2 |
| --- | --- | --- | --- |
| Ortho A vs ortho B | | | |
| Surrounding hard and soft tissues | 10 | 0.800 | 0.600 |
| Ortho A vs ortho C | | | |
| Surrounding hard and soft tissues | 10 | 0.800 | 0.900 |
| Ortho B vs ortho C | | | |
| Surrounding hard and soft tissues | 10 | 0.700 | 0.700 |
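The per-category proportions of overall agreement in Table II can be computed by grouping paired ratings by anatomic category. The sketch below assumes a simple record layout of (category, rating by evaluator 1, rating by evaluator 2); the records are hypothetical and are not the study data, and the function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def overall_agreement_by_category(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[int, float]]:
    """Return {category: (n findings, proportion of findings rated identically)}."""
    counts = defaultdict(lambda: [0, 0])  # category -> [agreements, total]
    for category, rating_1, rating_2 in records:
        counts[category][0] += int(rating_1 == rating_2)
        counts[category][1] += 1
    return {cat: (total, agree / total) for cat, (agree, total) in counts.items()}


# Hypothetical paired yes/no ratings for two anatomic categories.
records = [
    ("Cervical vertebrae", "yes", "yes"),
    ("Cervical vertebrae", "no", "no"),
    ("Cervical vertebrae", "yes", "no"),
    ("Dentoalveolar", "yes", "yes"),
    ("Dentoalveolar", "no", "no"),
]

result = overall_agreement_by_category(records)
for category, (n, p_overall) in result.items():
    print(f"{category}: n={n}, P overall={p_overall:.3f}")
```

With only 3 findings in a category, the proportion can take just 4 values (0, 1/3, 2/3, or 1), so a single disagreement moves it by a third. The reported 0.666 for the cervical vertebrae findings is consistent with agreement on 2 of 3 findings, which illustrates why these per-category values were treated as descriptive only.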