We read with great interest an article published online in October 2009 about the reproducibility of the cervical vertebral maturation (CVM) method (Gabriel DB, Southard KA, Qian F, Marshall SD, Franciscus RG, Southard TE. Cervical vertebrae maturation method: poor reproducibility. Am J Orthod Dentofacial Orthop 2009;136:478.e1-7). We would like to express a few concerns regarding the study's methodology and the interpretation of its results.
First, in “Material and methods,” the authors reported that the training of the orthodontists judging the CVM method consisted solely of a hard-copy handout showing a schematic representation of the 6 stages of the CVM method with a legend, and no further explanation or training. The judging orthodontists' exposure to the method therefore amounted to an extremely limited self-learning experience.
Also, the schematic representation of the CVM method that was given to the orthodontists (Fig 1 in Baccetti et al) was never proposed by the original authors as a guideline for implementing the CVM method in a clinical setting. That article described at least 2 examples of the shape of the third and fourth cervical vertebrae for the same CVM stage (specifically, for stages CS 3, CS 5, and CS 6).
We are actually grateful to the authors for the indirect suggestion to give clinicians and readers more detailed practical tips for performing the CVM method routinely on lateral cephalograms. Any descriptive categorization or staging of a biologic system requires an understanding of the nuances and subtleties of the method, since there is a gradual transition from 1 stage to the next.
Even setting aside the limited training opportunity of the judging orthodontists in the study, another concern relates to the interpretation of the results. As reported in the title (for us, it is highly unusual to include a study's conclusions in the title), the reproducibility of the CVM method was defined as “poor.” In the introduction, the authors recommended the use of a “stringent measure of association … for measuring agreement between judges.” However, no reference scale for interpreting the weighted kappa values for agreement between observers was reported in “Material and methods.”
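For readers less familiar with the statistic in question, the linearly weighted kappa can be sketched in a few lines of Python. This is only an illustration of the coefficient itself, not the computation performed in either study; the rater data and category labels are hypothetical.

```python
from collections import Counter

def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted kappa for 2 raters over ordered categories.

    Disagreements are penalized in proportion to their distance on the
    ordinal scale; `categories` must list at least 2 ordered levels.
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    joint = Counter(zip(rater_a, rater_b))  # joint counts per category pair
    count_a = Counter(rater_a)              # marginal counts, rater A
    count_b = Counter(rater_b)              # marginal counts, rater B
    po = pe = 0.0
    for ca in categories:
        for cb in categories:
            w = 1 - abs(idx[ca] - idx[cb]) / (k - 1)  # linear weights
            po += w * joint[(ca, cb)] / n             # observed agreement
            pe += w * (count_a[ca] / n) * (count_b[cb] / n)  # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical stagings of 6 cephalograms by 2 raters on a 4-level scale
a = [1, 2, 2, 3, 4, 4]
b = [1, 2, 3, 3, 4, 3]
print(round(weighted_kappa(a, b, [1, 2, 3, 4]), 2))  # → 0.71
```

Note that the weighting rewards near-misses (adjacent stages), which is why a weighted kappa is generally preferred over the unweighted coefficient for ordinal staging systems such as the CVM method.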
If we look at the results in Table IV, the weighted kappa coefficient for intraobserver agreement for individual cephalograms was between 0.36 and 0.79, with 9 of 10 observers scoring above 0.41. According to the most widely used scale for the interpretation of weighted kappa in studies on intraobserver agreement (Landis and Koch), a kappa value greater than 0.41 indicates either moderate (0.41-0.60) or substantial (0.61-0.80) agreement. It is noteworthy that 50% of the observers showed substantial agreement, 40% had moderate agreement, and only 1 observer showed fair agreement. The longitudinal portion of the study reported even better scores. We wonder what induced the authors to define as “poor” an agreement that typically is considered moderate to substantial.
Interestingly, Ballrick et al from Ohio State University performed a similar study on both the accuracy and the reproducibility of the CVM method in orthodontic graduate students. Their results showed very good reproducibility (kappa value, 0.82), which would be interpreted as “almost perfect agreement” according to the scale by Landis and Koch. Thus, the findings and conclusions of the Iowa and Ohio State studies are in sharp contrast.