The cervical vertebrae maturation (CVM) method has been advocated as a predictor of peak mandibular growth. A careful review of the literature showed potential methodologic errors that might influence the high reported reproducibility of the CVM method, and we recently established that the reproducibility of the CVM method was poor when these potential errors were eliminated. The purpose of this study was to further investigate the reproducibility of the individual vertebral patterns. In other words, the purpose was to determine which of the individual CVM vertebral patterns could be classified reliably and which could not.
Ten practicing orthodontists, trained in the CVM method, evaluated the morphology of cervical vertebrae C2 through C4 from 30 cephalometric radiographs using questions based on the CVM method. The Fleiss kappa statistic was used to assess interobserver agreement when evaluating each cervical vertebrae morphology question for each subject. The Kendall coefficient of concordance was used to assess the level of interobserver agreement when determining a “derived CVM stage” for each subject.
Interobserver agreement was high when the lower borders of C2, C3, and C4 were classified as flat or curved, but low when the vertebral bodies of C3 and C4 were classified as trapezoidal, rectangular horizontal, square, or rectangular vertical; this led to the overall poor reproducibility of the CVM method. These findings were reflected in the Fleiss kappa statistic. Furthermore, nearly 30% of the time, individual morphologic criteria could not be combined to generate a final CVM stage because of incompatible responses to the 5 questions. Intraobserver agreement in this study was only 62%, on average, when the inconclusive stagings were excluded. Intraobserver agreement was worse (44%) when the inconclusive stagings were counted as disagreements. For the group of subjects that could be assigned a CVM stage, the level of interobserver agreement as measured by the Kendall coefficient of concordance was only 0.45, indicating moderate agreement.
The weakness of the CVM method results, in part, from difficulty in classifying the vertebral bodies of C3 and C4 as trapezoidal, rectangular horizontal, square, or rectangular vertical. This led to the overall poor reproducibility of the CVM method and our inability to support its use as a strict clinical guideline for the timing of orthodontic treatment.
Accurate prediction of craniofacial growth has proven to be deceptively problematic. Whereas the study of craniofacial growth has shown growth patterns that might apply to the general population, these patterns are not as reliable when predicting growth at the individual level. Still, the pursuit of growth prediction methods continues, because of the potentially significant value of a simple and accurate prediction scheme during the diagnosis and treatment of patients with skeletal jaw discrepancies.
One important aspect of craniofacial growth prediction involves the assessment of a patient’s skeletal age, which could aid in timing orthodontic treatment with the facial growth spurt, particularly in Angle Class II patients. Recently, a method to assess skeletal maturation as it relates to facial growth was devised by Lamparski, who created a set of standards for cervical vertebrae maturation (CVM) using lateral cephalograms and correlated this to hand-wrist radiographs. He reported that his set of standards was as accurate as the hand-wrist method, without additional radiation exposure to the patient. Subsequently, numerous authors have studied the relationship between CVM and skeletal maturity based on hand-wrist radiographs, and others have examined the correlation between CVM and mandibular growth.
Careful review of the literature on the CVM method showed potential methodologic errors that might influence the reported reproducibility of this method. Most of these studies reported interobserver and intraobserver reproducibility levels of greater than 90%, with the exception of Kucukkeles et al, who found that 2 of the 3 intraobserver tests of reproducibility were 45% and 65%. Most studies reporting high levels of reproducibility used tracings of lateral cephalograms, rather than actual radiographs, to determine CVM stages. Some latitude must be granted to a person in tracing cephalometric radiographs, since this is not an exact science, and cephalometric radiographs traced by another person might introduce a level of uniformity into the staging process that is not found if each observer performs an independent analysis. Therefore, the use of standardized tracings might inflate the levels of reproducibility reported in the literature. Furthermore, the observers performing the tests of reproducibility are often the authors themselves. It is possible that they have a “research-level” understanding of the CVM method that could overstate the reported levels of reproducibility.
Many studies reporting CVM reproducibility levels had small sample sizes, some of which appeared to be significantly reduced from larger samples, so that the overall randomness of the sample was in question. In many instances, the same sample was used repeatedly in subsequent studies, and the authors failed to test their results on separate, larger, and more random samples. In addition, some studies reported reproducibility with the Pearson correlation coefficient, which measures the association between 2 normally distributed interval variables rather than agreement between judges. For categorical data such as CVM stages, a more stringent statistic designed to measure agreement between judges is recommended.
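The distinction between association and agreement can be illustrated with a minimal sketch. The stagings below are hypothetical values invented for the illustration, not data from any study: one judge consistently stages every subject one stage later than the other, so the two sets of stagings are perfectly correlated although the judges never assign the same stage.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def exact_agreement(x, y):
    """Fraction of subjects given the identical stage by both judges."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical stagings, invented for illustration (not study data).
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 3, 4, 5, 6]  # always one stage later than judge_a

r = pearson_r(judge_a, judge_b)
agreement = exact_agreement(judge_a, judge_b)
print(f"Pearson r = {r:.2f}, exact agreement = {agreement:.0%}")
```

In this constructed case the correlation is perfect while exact agreement is zero, which is why a correlation coefficient alone can overstate reproducibility.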
The reproducibility of the CVM method was recently assessed by Gabriel et al, who attempted to use methods to “eliminate the methodologic shortcomings of previous studies.” In that study, 30 subjects (15 boys, 15 girls) were randomly selected from the University of Iowa Facial Growth Study. Their age range was 10 to 16 years. Thirty lateral cephalograms (1 for each subject) were placed in random order and presented to 10 private-practice orthodontists who served as judges and were asked to assign a CVM stage (1-6) to each subject. The judges were then asked to stage the same subjects 3 weeks later. From these results, intraobserver and interobserver levels of agreement were evaluated. The authors found that interobserver agreement for CVM staging among practicing orthodontists was below 50%; on average, the clinicians agreed with their own staging only 62% of the time (ranging from a high of 80% for 1 clinician to a low of 43% for 2 clinicians); and the reproducibility of trained clinicians was significantly below the level purported in the literature. Based on these results, the authors could not support the use of the CVM method as a strict clinical guideline for timing orthodontic treatments.
For any measurement to be of value, it must be reproducible. But why does the CVM method demonstrate poor reproducibility? The purpose of this study was to expand on the study by Gabriel et al to determine why the reproducibility of CVM staging was poor. Specifically, our objective was to answer the following questions. What was the level of agreement between the observers regarding specific cervical vertebral morphology? Which morphologic features of the cervical vertebrae were classified more consistently by different observers, and which were classified less consistently? Was it possible to assign a CVM stage to each subject based on the observers’ responses, or were there combinations of morphologic features that would make it impossible to stage the subject? In all cases when a CVM stage could be assigned, what was the level of agreement between the observers regarding the derived CVM stage for each subject (interobserver agreement)? Finally, how well did the orthodontists agree with their own previous stagings of the same subjects?
Material and methods
The sample used in this study was the same sample used by Gabriel et al, who randomly selected subjects with untreated longitudinal growth records from the Iowa Facial Growth Study. Thirty lateral cephalograms of good quality with complete visualization of cervical vertebrae C1 through C4 were selected for 15 white boys and 15 white girls. The lateral cephalograms were scanned at 600 dpi for placement into a presentation as high-resolution images in TIF format to maintain the original radiographic quality (Fig 1). The lateral cephalograms were cropped to include cervical vertebrae C1 to C4 and to eliminate any additional information such as stage of dentition that might bias the observer.
The observers in this study included the same 10 private-practice orthodontists who participated in the study by Gabriel et al. The orthodontists had between 7 and 40 years of clinical orthodontic experience at the time of our study (mean, 21.2 years of clinical experience). The observers did not participate in the design or construction of the research project. Each observer was trained in the CVM method and given definitions of CVM morphology to be used at any time during the study (Tables I and II, Fig 2). The observers were then shown a PowerPoint (Microsoft, Redmond, Wash) presentation containing lateral cephalograms of the 30 subjects previously described.
Table I. Definitions of the cervical stages (CS) of the CVM method

|Stage|Definition|
|---|---|
|CS 1|The lower borders of all 3 vertebrae (C2-C4) are flat. The bodies of C3 and C4 are trapezoid in shape.|
|CS 2|A concavity is present at the lower border of C2 in 80% of cases. The bodies of both C3 and C4 are trapezoid in shape.|
|CS 3|Concavities at the lower borders of both C2 and C3 are present. The bodies of C3 and C4 can be either trapezoid or rectangular horizontal in shape.|
|CS 4|Concavities at the lower borders of C2, C3, and C4 now are present. The bodies of C3 and C4 are rectangular horizontal in shape.|
|CS 5|The concavities at the lower borders of C2, C3, and C4 still are present. At least 1 body of C3 or C4 is square in shape. If not square, the body of the other cervical vertebra still is rectangular horizontal.|
|CS 6|The concavities at the lower borders of C2, C3, and C4 still are present. At least 1 body of C3 or C4 is rectangular vertical in shape. If not rectangular vertical, the body of the other cervical vertebra is square.|

Table II. Definitions of the vertebral body shapes

|Shape|Definition|
|---|---|
|Trapezoid|The superior border is tapered from posterior to anterior.|
|Rectangular horizontal|The heights of the posterior and anterior borders are equal; the superior and inferior borders are longer than the anterior and posterior borders.|
|Square|The posterior, superior, anterior, and inferior borders are equal.|
|Rectangular vertical|The posterior and anterior borders are longer than the superior and inferior borders.|
For each image, the following 5 questions regarding cervical vertebrae morphology (used to stage the cephalograms according to the CVM method) were asked:
1. Is the lower border of C2 best described as flat or curved?
2. Is the lower border of C3 best described as flat or curved?
3. Is the lower border of C4 best described as flat or curved?
4. Is the vertebral body of C3 best described as trapezoidal, rectangular horizontal, square, or rectangular vertical?
5. Is the vertebral body of C4 best described as trapezoidal, rectangular horizontal, square, or rectangular vertical?
Each observer was given a recording sheet and asked to answer the same 5 questions for each of the 30 subjects (a total of 150 responses for each of the 10 observers). The observers had unlimited time to make their evaluations.
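How the 5 answers for one subject might be combined into a CVM stage can be sketched as follows. This is an illustrative reading of the cervical stage definitions given to the observers, not the procedure used in this study. The function name `derive_cvm_stage` is hypothetical, and the sketch assumes that a "curved" lower border corresponds to a concavity and that any lower border not named in the definitions of CS 2 and CS 3 is flat. Answer combinations that fit no definition are returned as `None` (inconclusive).

```python
FLAT, CURVED = "flat", "curved"
TRAP = "trapezoidal"
RECT_H = "rectangular horizontal"
SQUARE = "square"
RECT_V = "rectangular vertical"


def derive_cvm_stage(c2, c3, c4, body_c3, body_c4):
    """Return a stage from 1 to 6, or None when the answers fit no stage.

    Assumption (not stated in the source): borders not mentioned in the
    definitions of CS 2 and CS 3 are taken to be flat.
    """
    borders = (c2, c3, c4)
    bodies = {body_c3, body_c4}  # set of the shapes chosen for C3 and C4
    if borders == (FLAT, FLAT, FLAT):
        return 1 if bodies == {TRAP} else None
    if borders == (CURVED, FLAT, FLAT):
        return 2 if bodies == {TRAP} else None
    if borders == (CURVED, CURVED, FLAT):
        return 3 if bodies <= {TRAP, RECT_H} else None
    if borders == (CURVED, CURVED, CURVED):
        if bodies == {RECT_H}:
            return 4
        if SQUARE in bodies and bodies <= {SQUARE, RECT_H}:
            return 5
        if RECT_V in bodies and bodies <= {RECT_V, SQUARE}:
            return 6
    return None  # incompatible answers: no stage can be assigned


# All 3 lower borders curved, C3 square, C4 rectangular horizontal.
print(derive_cvm_stage(CURVED, CURVED, CURVED, SQUARE, RECT_H))
# C4 curved while C2 and C3 are flat fits none of the definitions.
print(derive_cvm_stage(FLAT, FLAT, CURVED, TRAP, TRAP))
```

Under these assumptions the first call yields stage 5 and the second yields `None`, the kind of incompatible response that prevents a stage from being assigned.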
The Fleiss kappa statistic was used to assess interobserver agreement when evaluating each of the 5 vertebral morphology questions for each subject. This measure calculates the degree of agreement over that which would be expected by chance. A Fleiss kappa of 1 indicates perfect agreement, and a value of 0 indicates agreement no better than chance; negative values are possible when agreement is worse than chance. Coefficients of 0.4 to 0.6 are generally considered to indicate moderate agreement. The Kendall coefficient of concordance (Kendall’s W) was used to assess interobserver agreement when determining a “derived CVM stage” for each subject. Kendall’s W varies between 0.0 (no agreement) and 1.0 (maximum agreement). Kendall’s W values of 0.4 to 0.6 are generally considered to indicate moderate agreement. The alpha level of each test was set at 0.05. Intraobserver agreement was evaluated by comparing the stagings made by each observer in the study by Gabriel et al with the stagings derived from the same observer’s responses for the same subjects in this study.
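The two interobserver statistics can be sketched with their standard textbook formulas. This is not the software used in this study. The function names and the small data sets are hypothetical and invented for illustration, and the Kendall's W shown here includes the usual correction for tied ranks, which the source does not specify. Both functions assume complete data (every observer rates every subject).

```python
from collections import Counter


def fleiss_kappa(counts):
    """Fleiss kappa for counts[subject][category] = number of raters."""
    n_subjects, n_raters = len(counts), sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number of ratings")
    total = n_subjects * n_raters
    # Share of all ratings that fell into each category.
    p_cat = [sum(row[j] for row in counts) / total
             for j in range(len(counts[0]))]
    # Per subject: proportion of rater pairs that chose the same category.
    p_subj = [(sum(c * c for c in row) - n_raters)
              / (n_raters * (n_raters - 1)) for row in counts]
    p_observed = sum(p_subj) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    # Undefined (division by zero) if every rating is in one category.
    return (p_observed - p_expected) / (1 - p_expected)


def average_ranks(scores):
    """Rank scores from 1 to n, giving tied scores their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and scores[order[end + 1]] == scores[order[start]]):
            end += 1
        for i in order[start:end + 1]:
            ranks[i] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def kendall_w(ratings):
    """Tie-corrected Kendall's W for ratings[rater][subject]."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(r[j] for r in ranked) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    # Sum of (t^3 - t) over each rater's groups of t tied scores.
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


# Hypothetical data: 10 raters, 4 subjects, categories (flat, curved).
unanimous = [[10, 0], [0, 10], [10, 0], [0, 10]]
evenly_split = [[5, 5], [5, 5]]
print(fleiss_kappa(unanimous), fleiss_kappa(evenly_split))

# Hypothetical stagings: 3 raters, 4 subjects.
same_order = [[1, 1, 2, 2], [1, 1, 2, 2], [1, 1, 2, 2]]
opposite = [[1, 2, 3, 4], [4, 3, 2, 1]]
print(kendall_w(same_order), kendall_w(opposite))
```

With these constructed inputs, unanimous ratings give a kappa of 1, an even split gives a kappa slightly below 0, identical stagings give a W of 1, and exactly reversed stagings give a W of 0.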