Introduction
Growth and its prediction are important for the success of many orthodontic treatments. The aim of this study was to determine the reliability of the cervical vertebral maturation (CVM) method for the assessment of mandibular growth.
Methods
A group of 20 orthodontic clinicians, inexperienced in CVM staging, was trained with a teaching program to use the improved version of the CVM method for the assessment of mandibular growth. They independently assessed 72 consecutive lateral cephalograms, taken at Liverpool University Dental Hospital, on 2 occasions. The cephalograms were presented in 2 different random orders and interspersed with 11 additional images for standardization. The intraobserver and interobserver agreement values were evaluated using the weighted kappa statistic.
Results
The intraobserver and interobserver agreement values were substantial (weighted kappa, 0.61-0.80). The overall intraobserver agreement was 0.70 (SE, 0.01), with average agreement of 89%. The interobserver agreement values were 0.68 (SE, 0.03) for phase 1 and 0.66 (SE, 0.03) for phase 2, with average interobserver agreement of 88%.
Conclusions
The intraobserver and interobserver agreement values of classifying the vertebral stages with the CVM method were substantial. These findings demonstrate that this method of CVM classification is reproducible and reliable.
Highlights
- Reliability of the cervical vertebral maturation (CVM) method was determined.
- Twenty inexperienced observers staged 72 full lateral cephalograms.
- Overall intraobserver agreement was substantial (weighted kappa, 0.70; SE, 0.01).
- CVM staging classification is reproducible and reliable.
Knowledge of craniofacial growth and development is a prerequisite for the comprehensive and successful management of orthodontic patients. Such knowledge plays a crucial role in the diagnosis, treatment planning, outcome, and overall stability of a patient’s orthodontic treatment. Numerous methods to identify the stage of growth and development, and to predict both the timing of onset and the potential of this growth, have been investigated. These investigations have included assessments by chronologic age, skeletal age, and skeletal maturation, as well as mandibular growth, standing height, menarche and voice changes, and cervical vertebral maturation (CVM).
Of these, the use of hand-wrist radiographs to assess skeletal maturity and growth has been investigated by many authors. The approach was initially advocated by Bergersen; Fishman then introduced the skeletal maturity index using hand-wrist films in 1982 in response to conflicting evidence from Houston and from Hägg and Taranger. The skeletal maturity index has varied in popularity, mostly because it requires additional radiation exposure and a specific skill set to interpret.
As a result, alternatives to hand-wrist radiographs were developed. Lateral cephalograms are commonplace in orthodontics and familiar to the orthodontist; therefore, investigators have looked at the relationship between CVM, hand-wrist radiographs, and mandibular growth, largely concluding that the CVM method is a valid indicator for the assessment of skeletal maturity and is comparable with hand-wrist radiographs. However, more recently, Beit et al concluded that CVM assessment offers no advantage over chronologic age in either assessing skeletal age or predicting the pubertal growth spurt.
The CVM method, first described by Lamparski, is based on assessing the shape of the cervical bodies, as seen in routine lateral cephalograms. Lamparski explored the relationship between the anatomy of the cervical vertebrae and the hand-wrist radiographs, and concluded that his method was as accurate as the hand-wrist method, with the added advantage of avoiding additional radiation exposure. Subsequently, Hassel and Farman used a sample of headfilms from the Bolton-Brush Growth Study to identify maturational markers in the cervical vertebrae that correlated with Fishman’s skeletal maturity index using hand-wrist radiographs.
More recently, Franchi et al and Baccetti et al confirmed the validity of Lamparski’s original method as a biologic indicator for both mandibular and somatic skeletal maturation. They continued to modify Lamparski’s method using longitudinal growth records from the University of Michigan Growth Study, making it applicable to both sexes, easier to use, and suitable for most patients. Baccetti et al reported good reproducibility with this method of assessing CVM.
A successful diagnostic tool must be valid and reliable. Ideally, it must do what it is purported to do in a quick, easy, and reproducible way. The CVM staging method, according to Baccetti et al, also must detect the peak in mandibular growth in a consistent manner, with interexaminer error as low as possible. The available literature assessing the reliability of the CVM staging method, however, is conflicting, with intraobserver and interobserver correlations ranging from perfect to poor agreement.
A recent systematic review of the CVM method by Santiago et al highlighted the methodologic flaws of previous research that assessed the reliability of the index and encouraged more robust testing of the index to establish whether it is a clinically applicable tool. This systematic review suggested that authors should not be used as observers because they have research-level experience, and that the image sample should be random, not preselected on the basis of ease of stage determination. Other problems identified from reviewing the literature included small sample sizes and too few observers, both of which reduced the generalizability of the results.
Gabriel et al attempted to address these methodological concerns and concluded that the CVM method has poor reliability. However, these authors presented the cephalograms in a cropped format, showing only cervical vertebrae C2, C3, and C4. Perinetti et al also used cropped images that then were hand traced to evaluate the diagnostic accuracy and repeatability of the visual assessment of the CVM stages. They found that visual assessment of the CVM stages was accurate and repeatable to a satisfactory level. Cropping may reduce the resemblance of the test environment to the normal clinical situation and is therefore thought to be an unnecessary step that could influence the reliability and reproducibility of the method. Hand tracing adds a further stage to the assessment process and takes it further from the clinical environment in which an orthodontist will view a full cephalogram.
The aim of this study was to determine the reliability of the improved version of the CVM method for the assessment of mandibular growth.
Material and methods
Ethical approval was obtained from the East Midlands Research Ethics Committee (reference 12/EM/0126).
The primary outcome was the intraobserver and interobserver reliability of CVM stage determinations by a group of orthodontic clinicians.
The secondary outcome was the influence of image quality on reliability.
This was a 2-phase reliability study. A group of 20 orthodontic clinicians (9 orthodontists, 11 orthodontic residents), who were members of the Mersey and North Wales Audit Group, and none of whom had used the CVM staging method previously, was trained to use the improved CVM method using the teaching material from Baccetti et al. The training was carried out at the beginning of each phase of the reliability study. The training presentation included (1) a detailed explanation of the morphologic features of each cervical stage (CS) in diagrammatic format initially, (2) a written description of the radiographic features of each CS, (3) a PowerPoint (Microsoft, Redmond, Wash) presentation concerning Professor James McNamara’s novel way of remembering the characteristics of each CS (personal communication, 2010, 2012), and (4) a calibration exercise to ensure that all observers understood the method.
The sample of lateral cephalograms was selected from consecutive headfilms, satisfying the inclusion and exclusion criteria, taken in the radiology department at Liverpool University Dental Hospital (LUDH) during a 4-month interval. All cephalograms were taken of patients who had undergone radiographic exposure in line with normal clinical practice.
The full lateral cephalograms were presented in a random order in the PowerPoint presentation and interspersed at regular intervals with 11 “standardized” images provided by McNamara, the codeveloper of the modified CVM index. This supplemental sample of 11 radiographs was presented in a cropped format, showing the cervical vertebrae only, because this was how the authors of the method originally described it. McNamara described this sample as portraying clearly the various stages of the CVM.
The purpose of this supplemental prestaged sample was to validate the training provided to the observers. This combination of randomly gathered headfilms and the supplemental prestaged cropped films gave a total sample of 83 images. The observers were given hard copies of reference material related to the staging of the cervical vertebrae as a memory aid, for use throughout the reliability study.
Immediately after the training session, the 20 observers were shown the 83 images for 30 seconds each and asked to score each cervical image. At the midway point, the observers were given a 5-minute break. Three months after the first phase, all observers were retrained in the same technique and asked to stage the same sample of lateral cephalograms, presented this time in a different random order.
Lateral cephalograms were included, irrespective of sex, if the patient was below the age of 18 years, had no previous orthodontic treatment, was beginning treatment with first-year orthodontic residents, and had a cephalogram with complete visualization of cervical vertebrae C2, C3, and C4.
Lateral cephalograms were excluded if the patient was over age 18 years at the initial records appointment, had previous orthodontic treatment, had been diagnosed with any congenital clefts of the lip or palate or any known or suspected craniofacial syndromes or growth-related conditions, required orthognathic surgery, or had a cephalogram that did not show C2, C3, and C4 adequately.
The number of lateral cephalograms in the image sample was determined by the minimum sample size required for the valid use of the weighted kappa statistic, approximated by the sample size equation $2k^2$, where $k$ is the number of categories in the rating scale. The rating scale used here has 6 categories, giving a minimum sample size of $2 \times 6^2 = 72$ radiographs.
Statistical analysis
Intraobserver agreement was determined using percentages of agreement and the weighted kappa coefficient to calculate the chance-corrected agreement.
The CVM index is an ordinal categorical scale; consequently, the weighted kappa statistic was used, allowing credit for complete and partial agreement. The unweighted kappa is unsuitable for ordinal data because it treats every disagreement as equally serious, whether the 2 ratings differ by 1 stage or by 5. The linear weightings for a 6-category scale are shown in Table I.
Cervical stage at Phase 2 (rows) vs Phase 1 (columns) | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
1 | 1 | 0.8 | 0.6 | 0.4 | 0.2 | 0 |
2 | 0.8 | 1 | 0.8 | 0.6 | 0.4 | 0.2 |
3 | 0.6 | 0.8 | 1 | 0.8 | 0.6 | 0.4 |
4 | 0.4 | 0.6 | 0.8 | 1 | 0.8 | 0.6 |
5 | 0.2 | 0.4 | 0.6 | 0.8 | 1 | 0.8 |
6 | 0 | 0.2 | 0.4 | 0.6 | 0.8 | 1 |
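The weights in Table I are consistent with the usual linear scheme, in which credit falls off in equal steps with the distance between the 2 stages. A minimal sketch in Python, assuming the standard definition w_ij = 1 - |i - j| / (k - 1); the function name `linear_weights` is illustrative and does not come from the study:

```python
def linear_weights(k):
    """Return a k x k matrix of linear agreement weights.

    Assumes the standard linear scheme w[i][j] = 1 - |i - j| / (k - 1),
    so identical stages score 1 and the most distant stages score 0.
    """
    return [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


# Six cervical stages, as in the CVM rating scale.
weights = linear_weights(6)
for row in weights:
    print(" | ".join(f"{w:.1f}" for w in row))
```

Each printed row can be compared directly with the corresponding row of Table I.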
In this study, linear weighted kappa coefficients were determined for intraobserver reliability and were calculated by manual construction of a 6 × 6 comparison table (Table II) that comprised the intraobserver agreements for each of the 2 samples of radiographs and all clinicians. These data were then entered into the VassarStats statistical program (http://www.vassarstats.net) to calculate the linear weighted kappa statistic. Interobserver agreement also was calculated using percentages of agreement and weighted kappa statistics with the AgreeStat program for Excel (Microsoft).
CVM stage at Phase 2 (rows) vs Phase 1 (columns) | 1 | 2 | 3 | 4 | 5 | 6 | Total |
---|---|---|---|---|---|---|---|
1 | 95 | 17 | 22 | 5 | 3 | 0 | 142 |
2 | 21 | 100 | 36 | 13 | 5 | 0 | 175 |
3 | 13 | 29 | 110 | 45 | 14 | 1 | 212 |
4 | 2 | 6 | 39 | 206 | 60 | 7 | 320 |
5 | 0 | 1 | 13 | 88 | 217 | 43 | 362 |
6 | 0 | 0 | 2 | 8 | 62 | 157 | 229 |
Total | 131 | 153 | 222 | 365 | 361 | 208 | 1440 |
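To make the calculation concrete, the linear weighted kappa can be recomputed from the cell counts of Table II. The following Python sketch assumes the textbook formula (weighted observed agreement minus weighted chance agreement, divided by 1 minus weighted chance agreement) with the linear weights of Table I; it is not the VassarStats implementation, and the names `TABLE_II` and `weighted_kappa` are illustrative. Marginal totals are derived from the cells rather than copied from the table.

```python
# Cell counts from Table II: rows = stage at Phase 2, columns = stage at Phase 1.
TABLE_II = [
    [95, 17, 22, 5, 3, 0],
    [21, 100, 36, 13, 5, 0],
    [13, 29, 110, 45, 14, 1],
    [2, 6, 39, 206, 60, 7],
    [0, 1, 13, 88, 217, 43],
    [0, 0, 2, 8, 62, 157],
]


def weighted_kappa(table):
    """Return (kappa, weighted observed agreement) for a square count table.

    Uses linear weights w = 1 - |i - j| / (k - 1); this is an assumed
    textbook formulation, not the software used in the study.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            w = 1 - abs(i - j) / (k - 1)
            observed += w * table[i][j] / n
            expected += w * row_totals[i] * col_totals[j] / (n * n)
    return (observed - expected) / (1 - expected), observed


kappa, observed_agreement = weighted_kappa(TABLE_II)
print(f"paired ratings: {sum(map(sum, TABLE_II))}")
print(f"weighted observed agreement: {observed_agreement:.3f}")
print(f"linear weighted kappa: {kappa:.3f}")
```

The pooled value obtained this way falls in the substantial band; it need not match the reported overall figure to the second decimal place, since the study's value came from VassarStats and may reflect computational details not described in the text.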
Landis and Koch have proposed the following as standards for the strength of agreement for the kappa coefficient: ≤0, poor; 0.01-0.20, slight; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, substantial; and 0.81-1.00, almost perfect. Similar formulations exist, but with slightly different descriptors. The choice of such benchmarks, however, inevitably is arbitrary.
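These benchmarks amount to a simple lookup. A minimal Python sketch, assuming that each upper bound is inclusive (the listed ranges leave small gaps between bands, so this boundary handling is an assumption); the function name `landis_koch` is illustrative:

```python
def landis_koch(kappa):
    """Map a kappa coefficient to the Landis and Koch descriptor.

    Upper bounds are treated as inclusive, which is an assumption made
    here to cover values falling between the published ranges.
    """
    if kappa <= 0:
        return "poor"
    for upper, label in [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Example with the overall intraobserver value reported in this study.
print(landis_koch(0.70))
```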
Results
The overall intraobserver and interobserver reliability for our sample was characterized as substantial (weighted kappa, 0.61-0.80).
The intraobserver reliability of the 20 observers, when staging the LUDH image sample, gave a weighted kappa statistic of 0.70 (SE, 0.01), with an average agreement of 89% (Table III). When they staged the “standardized” image sample, this increased to a weighted kappa statistic of 0.82 (SE, 0.02), which was statistically significantly better than the value for the LUDH sample.
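The text does not state which test was used for this comparison. One common approach, sketched below in Python, is a z-test on the difference between 2 kappa estimates using their standard errors. It assumes approximately normal and independent estimates, which holds only loosely here because the same observers rated both samples; the function name `compare_kappas` is illustrative.

```python
import math


def compare_kappas(kappa_a, se_a, kappa_b, se_b):
    """Return (z, two-sided p) for the difference between 2 kappa values.

    Assumes independent, approximately normal estimates; this is an
    illustrative approximation, not the test reported by the authors.
    """
    z = (kappa_b - kappa_a) / math.sqrt(se_a**2 + se_b**2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p


# LUDH sample versus the "standardized" sample, values from the text above.
z, p = compare_kappas(0.70, 0.01, 0.82, 0.02)
print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```

Under these assumptions the difference is several standard errors wide, which is in line with the reported statistical significance.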