The objective of this study was to assess the reproducibility of the cervical vertebral maturation (CVM) method according to the type of radiographic image and the evaluator's level of experience and training.
Ten evaluators (5 orthodontic residents and 5 faculty members) were randomly divided into 2 groups: trained and untrained. All participants evaluated 80 radiographic images previously acquired in 4 different formats: (1) 2-dimensional (2D) digital (2D-digital), (2) 2D digitized hard copy from the Iowa Facial Growth Study (American Association of Orthodontists Foundation Craniofacial Growth Legacy Collection), (3) 2D digital reconstructed from a 3-dimensional (3D) radiograph (2D-from 3D), and (4) 3D cone-beam computerized tomographic (3D-CBCT) images. Agreement among evaluators on the morphology of the cervical vertebrae (CV) and the CVM stage of each radiographic image was assessed using Randolph’s kappa statistic and Kendall’s W coefficient of concordance.
Interobserver agreement on the presence of a curvature on the inferior border of the CV was substantial to perfect, whereas agreement on shape was fair to moderate. Overall, the level of training, but not the level of experience, affected agreement on the shape and curvature of the CV for all image types except 3D-CBCT. Interobserver agreement on CVM staging for all images combined was substantial (0.72). Faculty had a higher level of agreement than residents except for 2D-digital and 3D-CBCT images, whereas trained evaluators had an overall higher level of agreement than untrained evaluators except for 3D-CBCT images.
Interobserver agreement in determining CVM stage was substantial for all image types evaluated; experience and training resulted in a higher level of agreement for some image types. The 3D-CBCT images did not increase interobserver agreement over current 2D-digital lateral cephalograms in determining CVM stage or CV shape. The highest agreement in CVM staging was obtained by trained evaluators on 2D-digital lateral cephalograms.
Curvature of the cervical vertebrae had better interobserver agreement than shape.
Experience may influence agreement depending on image type.
Training had an effect on the level of agreement in determining shape and cervical vertebral maturation stage.
Interobserver agreement for determining cervical vertebral maturation stage was substantial for all image types.
The 3-dimensional cone-beam computerized tomographic images did not increase agreement over 2-dimensional-digital lateral cephalograms.
Growth prediction of the human face has always been an area of interest in orthodontic research. It is thought that the ability to predict the amount, timing, and direction of facial growth will enable more efficient treatment planning, selection of more appropriate treatment regimens, and optimization of treatment timing. A number of different methods to evaluate the maturation stage of a subject are available. These methods include chronological age, height, weight, sexual maturation characteristics, dental development, and skeletal development.
Skeletal age assessment from radiographic analysis is a widely used approach to assessing growth in orthodontics. Frontal sinus morphology, hand-wrist radiograph analysis, and the morphology of the cervical vertebrae (CV) are 3 major areas that have been studied to assess skeletal age and skeletal maturation radiographically.
The cervical vertebral maturation (CVM) index is perhaps the most appealing radiographic analysis for orthodontists because skeletal maturation can be assessed from a lateral cephalometric radiograph. This image is often taken as part of the diagnostic records, eliminating the need for radiographs of other skeletal regions and reducing patients' radiation exposure.
Changes in the morphology of the CV were initially studied by Lamparski, who correlated these morphologic changes with the timing of peak height velocity and the stages of hand-wrist ossification. Baccetti et al introduced an improved CVM method for clinical use consisting of 6 maturational stages; the features of each cervical stage were summarized and correlated with the peak in mandibular growth. In addition, the CVM index has been found to correlate well with the hand-wrist method, which has been considered the gold standard for evaluating skeletal maturation.
The reproducibility of CVM assessment is controversial in the literature. Some studies have reported good reproducibility of the CVM method, exceeding 85%, whereas others have reported only fair or poor agreement among evaluators in CVM staging.
Poor reliability has been attributed to the evaluator's level of training and clinical experience and to the method of assessment, which ranges from simple visual assessment to tracing and digitizing of the CV and encompasses a wide variety of qualitative and quantitative analyses. The effect of image type on the reliability of the CVM method has not been studied. Image type, which may be directly related to image quality, could be an important factor in properly determining the CVM stage. Newer radiographic acquisition systems, including digital radiography and 3-dimensional (3D) imaging, may influence the reliability of the CVM method.
Cone-beam computed tomography (CBCT) is one of the imaging methods that has drawn recent interest regarding its validity and reliability in determining CVM stages, and conflicting results have been published. One study found a good relationship among CBCTs, lateral cephalograms, and hand-wrist radiographs in assessing skeletal maturation. Similarly, a recent study found almost perfect agreement between calibrated observers using CBCT 3D reconstructions and abstracted sagittal sections. By contrast, another study found that agreement between CBCTs and lateral cephalograms was inconsistent.
Therefore, the aim of this study was to assess the influence of the type of radiograph, training, and level of clinical experience on the reproducibility of the CVM method.
Material and methods
Institutional review board approval was obtained from the University of Connecticut (institutional review board # 16-008-1) before the start of the study. Radiographic images previously acquired in 4 different formats were used to evaluate skeletal maturation based on the CV: (1) 2D digitized hard copies provided by the American Association of Orthodontists Foundation Craniofacial Growth Legacy Collection (AAOF; Fig 1, A), (2) 2D digital images (2D-digital; Fig 1, B), (3) 2D digital images generated from a 3D radiograph (2D-from 3D; Fig 1, C), and (4) 3D radiographs (3D-CBCT; see Fig 2 and Video, available at www.ajodo.org). In addition, the influence of orthodontic clinical experience and level of training on determining CVM status was assessed. All radiographs (total of 80) were evaluated by 10 evaluators (5 orthodontic residents and 5 faculty members) recruited from the orthodontic residency program at the University of Connecticut. All evaluators completed a questionnaire containing questions about the shape of the vertebrae in each radiograph. All images were deidentified (identifiers of health information were removed) and cropped to include only the cervical vertebrae C2-C4, eliminating additional information, such as the stage of dental development, that might bias the evaluation. The radiographs were obtained from 3 different sources. Forty lateral cephalometric radiographs were selected from the longitudinal growth records of the Iowa Facial Growth Study through the AAOF Craniofacial Growth Legacy Collection Web site. The AAOF radiographs were downloaded from the AAOF Web site (www.aaoflegacycollection.org/aaof_collection.html?id=UIOWAGrowth) and saved in TIFF format. The evaluators viewed these radiographs on a 14-inch laptop screen using Windows Photo Viewer (Microsoft, Redmond, Wash).
Another 20 digital lateral cephalometric radiographs were selected from the electronic health record system Axium (Exan Group, Coquitlam, British Columbia) of patients treated at the Division of Orthodontics at the University of Connecticut. All cephalometric images were acquired using a Planmeca Promax unit (Planmeca USA, Inc, Roselle, Ill) at 60 kVp and 7 mA with an acquisition time of 7 seconds. The images were stored in MiPACS (Medicor Imaging, Charlotte, NC), a picture archiving system linked to the Axium electronic health record system, and were viewed in the standard MiPACS cephalometric display mode. Finally, 20 CBCT scans taken as part of the initial records were obtained from a private orthodontic office. All CBCT scans were acquired using a next-generation iCAT CBCT scanner (Imaging Sciences International, Hatfield, Pa) with a 360-degree rotational protocol, a large field of view (13 × 16 cm), and a 0.4-mm voxel size. The scans were acquired at 80 kVp and 5 mA with a focal spot size of 0.5 mm, and the acquisition time was 20 seconds per scan. Each CBCT scan was viewed both as a 3D image and as a 2D image (ie, 1 CBCT record provided 2 image modalities). CBCT reconstruction software (Invivo 5; Anatomage, San Jose, Calif) was used to generate a 2D radiograph from each 3D scan, yielding twenty 2D-from 3D images of the CV. The evaluators assessed both the CBCT scans and the 2D images derived from them. While viewing the 3D scans, evaluators had full rotational control of the scan volume and could scroll through all 3 orthogonal planes (axial, sagittal, and coronal). They could also adjust the histogram, contrast, and density to better evaluate the image.
The following inclusion criteria were applied to all selected radiographs: (1) patient age between 10 and 16 years, (2) clearly visible C2-C4 vertebrae, and (3) absence of vertebral anomalies. Low-quality radiographs and those without clearly visible C2-C4 vertebrae were excluded from the study.
The influence of training on the reproducibility of the assessment of the curvature, shape, and stage of the CV was assessed by randomly dividing the 10 evaluators into a training and a nontraining group. This was a convenience sample drawn from the orthodontic residency program at the University of Connecticut. The evaluators did not participate in the design or construction of the study. They were recruited by an e-mail sent to all orthodontic residents and faculty members in the residency program, and the first 5 residents and the first 5 faculty members who responded were included. All participants were provided with a cover letter and an information sheet briefly describing the study. Faculty experience ranged from 3 to 20 years, with the majority (n = 3) having 5 years of experience. Within the faculty group, 2 evaluators received training and 3 did not; within the resident group, 3 evaluators received training and 2 did not (Fig 3). Allocation was random: each group had 5 dark-covered envelopes containing training slips (2 for the faculty group and 3 for the resident group) and no-training slips (3 for the faculty group and 2 for the resident group). Each evaluator picked 1 envelope and was assigned to a group according to the slip inside.
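The envelope procedure amounts to stratified random allocation with a fixed number of training slots per stratum (faculty, residents). A minimal Python sketch, assuming hypothetical evaluator labels (F1-F5, R1-R5) and using a seeded pseudorandom shuffle in place of the physical envelopes:

```python
import random


def allocate_by_envelopes(evaluators_by_stratum, trained_slots, seed=None):
    """Stratified allocation: shuffle each stratum's slips, then hand them out in order.

    evaluators_by_stratum: stratum name -> list of (hypothetical) evaluator labels.
    trained_slots: stratum name -> number of 'training' slips in that stratum.
    """
    rng = random.Random(seed)
    assignment = {}
    for stratum, evaluators in evaluators_by_stratum.items():
        n_trained = trained_slots[stratum]
        if n_trained > len(evaluators):
            raise ValueError("more training slips than evaluators in " + stratum)
        slips = ["training"] * n_trained + ["no training"] * (len(evaluators) - n_trained)
        rng.shuffle(slips)  # plays the role of sealed envelopes in random order
        for evaluator, slip in zip(evaluators, slips):
            assignment[evaluator] = (stratum, slip)
    return assignment


evaluators = {
    "faculty": ["F1", "F2", "F3", "F4", "F5"],
    "resident": ["R1", "R2", "R3", "R4", "R5"],
}
slots = {"faculty": 2, "resident": 3}  # counts taken from the text above
groups = allocate_by_envelopes(evaluators, slots, seed=2016)
```

Fixing the slip counts per stratum, rather than flipping a coin per evaluator, guarantees the 2/3 and 3/2 splits described above while leaving who lands in which group to chance.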
The training session was conducted 1-2 weeks before the main evaluation session and included a detailed explanation of the rules for using the CVM method and assigning CVM stages. Twenty lateral cephalograms from the longitudinal growth records of the Iowa Facial Growth Study, obtained through the AAOF Web site, were used for this session. The study coordinator evaluated these images ahead of time, completing the questionnaire and assigning a CVM stage to each image; these 20 cephalograms were not used in the main evaluation session. The evaluators answered the same questionnaire (the 6 questions listed below) for each radiograph. After the questionnaires were completed, the answers were discussed with the study coordinator, and any disagreement was resolved immediately by referring to the published index (Fig 4) on the specific cephalogram, highlighting where the evaluator had failed to identify the correct answer. The session was considered successful only if the evaluator correctly identified at least 80% of the cases. Evaluators who did not reach this score underwent a second training session 1 week later, using the same radiographs, and proceeded to the main evaluation session 1 week after the second session regardless of the result achieved. All trained and untrained evaluators had baseline knowledge of the CVM method. Four of the 5 evaluators who underwent training needed a retraining session after not achieving the 80% passing score, and all trained evaluators had reached the 80% threshold by the end of the second session. The CVM method is part of what the faculty teach and residents learn within the curriculum of the training program; however, none of the evaluators were experts in this subject area.
All evaluators, whether in the training or nontraining group, were provided with a hard-copy handout of figures and definitions of CVM morphology, according to Baccetti et al, to be used at any time during the study (Fig 4). Then, a high-resolution image presentation containing all the selected radiographs was shown to the evaluators, who were asked to complete a questionnaire about each radiograph (as described by Nestman et al). For each radiograph, the following 6 questions regarding CV morphology were asked:
Is the lower border of C2 best described as flat or curved?
Is the lower border of C3 best described as flat or curved?
Is the lower border of C4 best described as flat or curved?
Is the vertebral body of C3 best described as trapezoidal, rectangular horizontal, square, or rectangular vertical?
Is the vertebral body of C4 best described as trapezoidal, rectangular horizontal, square, or rectangular vertical?
What is the CVM stage of the radiographic image?
The evaluators had unlimited time to make their evaluations and complete the questionnaires. No explanation of the CVM method and no discussion took place after the questionnaires were completed at the end of the evaluation session.
To assess the influence of training, experience, and type of radiograph on the reproducibility of the CVM method, the results obtained were compared between the trained and untrained subjects, faculty and residents, and among the 4 different radiographic groups.
Interrater agreement was summarized by the percentage of agreement among raters, Randolph's kappa for nominal outcomes, and Kendall's coefficient of concordance (Kendall's W) for ordinal outcomes. Randolph's kappa was used for categorical ratings (the questions evaluating CV shape and lower border curvature), and Kendall's W was used for ordinal ratings (the question on CVM stage), for which the agreement between rated stages is inversely proportional to their distance. An F test based on Kendall's W was used to test the null hypothesis that the ratings of different raters are not concordant with one another; a P value <0.05 was considered statistically significant and led to rejection of the null hypothesis. Agreement was compared between image types, between faculty and residents, and between trained and untrained evaluators using the 95% confidence interval (CI) for the difference in Randolph's kappa or Kendall's W. Each 95% CI was derived from 1000 bootstrap resamples following the method of McKenzie et al, and a difference between groups was considered not statistically significant if its CI contained 0. All statistical analyses were performed in R 3.3.1.
Kappa coefficient and Kendall's W values have been proposed to be interpreted as follows: poor (≤0), slight (0.01-0.2), fair (0.21-0.4), moderate (0.41-0.6), substantial (0.61-0.8), and almost perfect (0.81-1).
A sample size calculation could not be performed before the study because of a lack of prior knowledge about the relevant parameters, including agreement between raters and the distributions of ratings from individual raters. We used 20 images per image type (80 images in total) for between-group comparisons. The final number of images was chosen considering the time each evaluator would need to analyze them, so as to avoid survey fatigue.
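A minimal Python sketch of the two agreement statistics, written from their standard textbook definitions rather than from the authors' R code, which is not given. It assumes ratings are stored as one list per image holding each rater's answer; `randolph_kappa` and `kendall_w` are hypothetical helper names. The mean pairwise agreement returned alongside kappa is one common definition of percentage agreement and may differ from the one the authors used. `kendall_w` applies the usual tie correction and the F approximation F = (m − 1)W/(1 − W) with df1 = n − 1 − 2/m and df2 = (m − 1)df1, where m is the number of raters and n the number of images.

```python
from collections import Counter


def randolph_kappa(ratings, n_categories):
    """Free-marginal multirater kappa (Randolph).

    ratings: one list per image, holding every rater's category for that image.
    n_categories: number of *possible* categories (2 for flat/curved,
    4 for the vertebral-body shapes), not only those observed.
    Returns (kappa, mean pairwise agreement).
    """
    p_obs = 0.0
    for item in ratings:
        n = len(item)
        counts = Counter(item)
        # proportion of rater pairs on this image that chose the same category
        p_obs += sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    p_obs /= len(ratings)
    p_exp = 1.0 / n_categories  # chance agreement under free (uniform) marginals
    return (p_obs - p_exp) / (1.0 - p_exp), p_obs


def _average_ranks(values):
    """1-based ranks with ties averaged; also returns the tie term sum(t^3 - t)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def kendall_w(ratings):
    """Tie-corrected Kendall's W and its F approximation.

    ratings: one list per image, holding every rater's CVM stage for that image.
    Returns (W, F, df1, df2).
    """
    n = len(ratings)      # images (objects being ranked)
    m = len(ratings[0])   # raters (judges)
    rank_sums = [0.0] * n
    ties = 0
    for r in range(m):
        ranks, tie_term = _average_ranks([row[r] for row in ratings])
        ties += tie_term
        for i in range(n):
            rank_sums[i] += ranks[i]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    w = 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)
    df1 = n - 1 - 2 / m
    df2 = (m - 1) * df1
    f = float("inf") if w >= 1 else (m - 1) * w / (1 - w)
    return w, f, df1, df2
```

A P value can then be read from the F distribution, for example with `scipy.stats.f.sf(F, df1, df2)` if SciPy is available. Because the CVM stage takes only 6 values, heavy ties are expected, which is why the tie correction matters here.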
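The bands above can be applied mechanically. A small sketch, assuming that each stated upper bound is inclusive so that values falling between the printed bands (eg, 0.205) still receive a label; `interpret_agreement` is a hypothetical helper name.

```python
def interpret_agreement(value):
    """Label a kappa or Kendall's W value using the bands quoted in the text.

    Upper bounds are treated as inclusive (an assumption), so a value such as
    0.205 falls in 'slight' rather than between bands.
    """
    if value > 1:
        raise ValueError("agreement statistics cannot exceed 1")
    if value <= 0:
        return "poor"
    bands = [(0.2, "slight"), (0.4, "fair"), (0.6, "moderate"), (0.8, "substantial")]
    for upper, label in bands:
        if value <= upper:
            return label
    return "almost perfect"


# "All images" column of Table I
table_i_all_images = {"Q1": 0.92, "Q2": 0.85, "Q3": 0.78, "Q4": 0.39, "Q5": 0.41}
labels = {q: interpret_agreement(v) for q, v in table_i_all_images.items()}
```

Applied to the "All images" column of Table I, this yields almost perfect, almost perfect, substantial, fair, and moderate for Q1-Q5, and the overall CVM staging value of 0.72 maps to substantial.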
The time each evaluator spent completing the questionnaire for all images ranged from 1.5 to 3 hours. Interobserver agreement in assessing the presence of a curvature of the inferior border of the vertebrae and in determining the shape of the CV (questions 1-5) was assessed using Randolph's kappa. Overall, kappa values for curvature ranged from substantial (0.64) to perfect (1.00) agreement (Table I), whereas agreement on the shape of the vertebral body ranged from fair (0.34) to moderate (0.47).
|Question (image type)||2D-digital||AAOF||2D-from 3D||3D-CBCT||All images|
|Q1 (curvature C2)||0.88||0.80||1.00||0.98||0.92|
|Q2 (curvature C3)||0.86||0.64||0.94||0.96||0.85|
|Q3 (curvature C4)||0.89||0.66||0.81||0.76||0.78|
|Q4 (shape C3)||0.47||0.34||0.35||0.41||0.39|
|Q5 (shape C4)||0.44||0.40||0.37||0.43||0.41|
Pairwise comparisons between image types showed significant differences, although these were not consistent across the vertebrae evaluated. For curvature, significant differences were observed between all pairs of image types except 2D-from 3D vs 3D-CBCT (Table II); these 2 image types had overall higher interobserver agreement than the AAOF and 2D-digital images. For shape, a significant difference was observed only between 2D-digital and AAOF images (C3), with 2D-digital images showing higher interobserver agreement (Table II).
|Image type 1||Image type 2||Question||Kappa difference||95% CI of difference|
|2D-digital||AAOF||Q1 (curvature C2)||0.08||0.000, 0.138|
|2D-digital||AAOF||Q2 (curvature C3)||0.21||0.067, 0.311 †|
|2D-digital||AAOF||Q3 (curvature C4)||0.23||0.120, 0.289 †|
|2D-digital||AAOF||Q4 (shape C3)||0.18||0.031, 0.271 ∗|
|2D-digital||AAOF||Q5 (shape C4)||0.07||−0.113, 0.229|
|2D-digital||2D-from 3D||Q1 (curvature C2)||−0.12||−0.160, −0.040 †|
|2D-digital||2D-from 3D||Q2 (curvature C3)||−0.09||−0.160, 0.004|
|2D-digital||2D-from 3D||Q3 (curvature C4)||0.08||−0.047, 0.187|
|2D-digital||2D-from 3D||Q4 (shape C3)||0.17||−0.002, 0.320|
|2D-digital||2D-from 3D||Q5 (shape C4)||0.11||−0.122, 0.307|
|2D-digital||3D-CBCT||Q1 (curvature C2)||−0.10||−0.158, −0.004 ∗|
|2D-digital||3D-CBCT||Q2 (curvature C3)||−0.11||−0.173, −0.020 ∗|
|2D-digital||3D-CBCT||Q3 (curvature C4)||0.13||0.009, 0.220 ∗|
|2D-digital||3D-CBCT||Q4 (shape C3)||0.08||−0.176, 0.289|
|2D-digital||3D-CBCT||Q5 (shape C4)||0.02||−0.227, 0.238|
|AAOF||2D-from 3D||Q1 (curvature C2)||−0.20||−0.265, −0.091 †|
|AAOF||2D-from 3D||Q2 (curvature C3)||−0.30||−0.378, −0.153 †|
|AAOF||2D-from 3D||Q3 (curvature C4)||−0.15||−0.273, 0.011|
|AAOF||2D-from 3D||Q4 (shape C3)||−0.01||−0.107, 0.098|
|AAOF||2D-from 3D||Q5 (shape C4)||0.04||−0.071, 0.142|
|AAOF||3D-CBCT||Q1 (curvature C2)||−0.18||−0.260, −0.062 †|
|AAOF||3D-CBCT||Q2 (curvature C3)||−0.32||−0.407, −0.169 †|
|AAOF||3D-CBCT||Q3 (curvature C4)||−0.10||−0.218, 0.022|
|AAOF||3D-CBCT||Q4 (shape C3)||−0.11||−0.345, 0.156|
|AAOF||3D-CBCT||Q5 (shape C4)||−0.05||−0.311, 0.178|
|2D-from 3D||3D-CBCT||Q1 (curvature C2)||0.02||0.000, 0.047|
|2D-from 3D||3D-CBCT||Q2 (curvature C3)||−0.02||−0.053, 0.036|
|2D-from 3D||3D-CBCT||Q3 (curvature C4)||0.05||−0.089, 0.138|
|2D-from 3D||3D-CBCT||Q4 (shape C3)||−0.09||−0.376, 0.167|
|2D-from 3D||3D-CBCT||Q5 (shape C4)||−0.10||−0.324, 0.136|
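The 95% CIs in Table II come from bootstrap resampling. The sketch below is a generic percentile bootstrap for a difference in Randolph's kappa between 2 image types; it is not a reproduction of the McKenzie et al procedure, whose details are not given here. It assumes images are resampled with replacement independently within each image type, which fits comparisons between distinct image sets; comparisons on the same images (eg, faculty vs residents) would instead resample image indices once and reuse them for both groups. The function names are hypothetical.

```python
import random
from collections import Counter


def randolph_kappa(ratings, n_categories):
    """Free-marginal multirater kappa; ratings = one list of rater labels per image."""
    p_obs = sum(
        sum(c * (c - 1) for c in Counter(item).values()) / (len(item) * (len(item) - 1))
        for item in ratings
    ) / len(ratings)
    p_exp = 1.0 / n_categories
    return (p_obs - p_exp) / (1.0 - p_exp)


def bootstrap_diff_ci(ratings_a, ratings_b, n_categories, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for kappa(A) - kappa(B).

    Images are resampled with replacement independently within each image type,
    which assumes the 2 image sets are distinct (eg, 2D-digital vs AAOF).
    """
    rng = random.Random(seed)
    observed = randolph_kappa(ratings_a, n_categories) - randolph_kappa(ratings_b, n_categories)
    diffs = []
    for _ in range(n_boot):
        sample_a = rng.choices(ratings_a, k=len(ratings_a))
        sample_b = rng.choices(ratings_b, k=len(ratings_b))
        diffs.append(randolph_kappa(sample_a, n_categories) - randolph_kappa(sample_b, n_categories))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return observed, (lower, upper)
```

Under the decision rule in the statistical methods, a difference would be flagged as significant when `not (lower <= 0 <= upper)`.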