The aim of this research was to investigate the perceived facial changes in Class II Division 1 patients with convex profiles after functional orthopedic treatment followed by fixed orthodontic appliances.
Pretreatment and posttreatment profile photographs of 12 Class II Division 1 patients treated with activators, 12 Class II Division 1 patients treated with Twin-block appliances, and 12 controls with normal profiles treated without functional appliances were presented in pairs to 10 orthodontists, 10 patients, 10 parents, and 10 laypersons. The raters assessed changes in facial appearance on a visual analog scale. Two-way multivariate analysis of variance was used to evaluate differences among group ratings.
Intrarater reliability was strong in most cases (intraclass correlation coefficients, >0.7). The internal consistency of the assessments was high (alpha, >0.87), both within and between groups. The raters consistently perceived more positive changes in the Class II Division 1 groups than in the control group. However, even at its largest, this difference barely exceeded one tenth of the total visual analog scale length, and it was evident mostly in the lower face and chin. No significant differences were found between the activator and the Twin-block groups.
Although the raters perceived improvements of the facial profiles after functional orthopedic treatment followed by fixed orthodontic appliances, these were quite limited. Thus, orthodontists should be cautious when predicting significant improvement of a patient’s profile with this treatment option.
There were perceived facial Class II profile changes after functional treatment with fixed appliances.
There were strong intrarater reliability and high internal consistency within and between groups.
The raters perceived more positive changes in the Class II groups compared with the controls.
Even at its largest, this difference barely exceeded one tenth of the total VAS length.
No significant differences were found between the activator and Twin-block groups.
Class II malocclusions are highly prevalent in the population and are found in a significant percentage of patients seeking orthodontic treatment. A common treatment option, especially for growing skeletal Class II patients with a convex profile due to a retrognathic mandible, involves functional orthopedic treatment aiming to enhance mandibular growth. The activator and the Twin-block are 2 popular functional appliances of this type.
Previous studies that evaluated the soft tissue response to activator and Twin-block treatment using cephalometric measurements reported improvement of facial profiles after functional treatment. A recent systematic review on this topic concluded that skeletal effects are minimal when natural growth is taken into account, but there are significant dentoalveolar and soft tissue changes. However, the clinical impact of these changes is still questionable, even regarding soft tissues.
Improving facial appearance is an important goal of contemporary orthodontic treatment and a main reason for seeking treatment. Thus, patient satisfaction is closely related to improvement of the facial esthetic parameters. Patients with a Class II skeletal pattern usually have increased facial convexity and retruded positions of the mandibular hard and soft tissues. These patients seek orthodontic treatment mainly to improve their facial appearance and consequently their self-esteem and quality of life.
The definitions of beauty and attractiveness are complex and highly subjective. What laypersons find attractive probably does not always agree with the opinions of patients or experts, which are influenced by personal experiences and educational backgrounds, respectively. However, the orthodontic treatment outcome should meet patients’ and parents’ expectations, and it must also be perceivable during social interactions.
To our knowledge, no authors have attempted to investigate the esthetic outcome of functional orthopedic treatment combined with fixed orthodontic appliances in patients with convex facial profiles as perceived by different groups of raters who assessed actual facial images. Thus, this was the main purpose of our study. Secondary objectives were to assess potential differences between groups of raters, functional appliances (activator and Twin-block), and regions of the face.
Material and methods
The study protocol was approved by the ethical committee of the dental school of Aristotle University of Thessaloniki in Greece (protocol 07/05-11-2015). All parents or guardians provided signed informed consent that allowed for the use of the patients’ data for research purposes.
The sample was retrieved from the postgraduate clinic of the Department of Orthodontics of Aristotle University. Through retrospective searches of patient files, the most recent patients who fulfilled the inclusion criteria were assigned to groups. Selection was designed to create 3 groups of 12 persons, each consisting of 6 males and 6 females. Two would be the test groups and one the control group.
For the test groups, the dates of treatment completion ranged from January 2000 to October 2015. The initial diagnostic records were considered in the sample selection process. The final diagnostic records were examined at this stage only to confirm availability.
The inclusion criteria for the test groups were (1) full initial and final diagnostic records (medical, dental, and orthodontic histories, panoramic and lateral cephalometric radiographs, dental casts, intraoral and extraoral photographs of good quality and without obvious positional or other errors), (2) Class II Division 1 malocclusion at the beginning of orthodontic treatment (Class II, more than half cusp in molars bilaterally), (3) convex profile defined by facial contour angles greater than 15° for males and greater than 17° for females at the initial lateral cephalometric radiograph, (4) mixed dentition at the beginning of orthodontic treatment, (5) complete treatment with functional (activator or Twin-block) and fixed orthodontic appliances, (6) nonextraction treatment (excluding third molars), (7) white origin, and (8) no craniofacial anomalies, syndromes, clefts, congenitally missing teeth (excluding third molars), severe facial asymmetries, or functional mandibular shift greater than 1 mm.
In the test groups, Class II or other types of interarch elastic forces were used during the fixed appliance stage when deemed necessary by the treating doctor.
The control group comprised 12 patients with normal facial contours before and after orthodontic treatment. Nine of them had Class I and 3 had Class II malocclusions with less than a half-cusp distal molar relationship bilaterally. In this group, the facial contour angle was between 7° and 15° for males and between 9° and 17° for females, both at the initial and final lateral cephalometric radiographs. The other inclusion criteria were identical to those of the test groups. Orthodontic treatment in these patients was completed between October 2005 and October 2015 and included fixed and sometimes, additionally, removable orthodontic appliances, but no functional appliances. Interarch elastic forces were used in the control sample at the later stages of treatment when deemed necessary to achieve proper interdigitation. After sample selection, the occlusal treatment outcomes of all groups were investigated using final casts and intraoral photographs.
The initial and final profile photographs of the 36 patients (Table I) were assessed. All photographs were taken with the Frankfort horizontal plane parallel to the ground, the teeth in maximum intercuspation, and the lips at rest. Photographs that were not in digital form (35-mm slides) were converted into digital files of 300 dpi resolution using an appropriate scanner (J232D Perfection V330; Epson, Jakarta, Indonesia). Then all the photographs were edited in Adobe Photoshop CS5 (Adobe Systems, San Jose, Calif) to have a consistent white background and similar brightness and contrast.
| Treatment type | n (sex) | Malocclusion type | Age T0 (y) | Age T1 (y) | Treatment duration (y) | Facial contour angle T0 (°) ∗ | Facial contour angle T1 (°) ∗ | Overjet T0 ∗ | Overjet T1 |
|---|---|---|---|---|---|---|---|---|---|
| Activator | 12 (6 M, 6 F) | Class II Division 1 | 9.8 (9.2-11.2) | 13.9 (12.8-15.2) | 4.1 (2.8-5.0) | 20.5 (18.0-28.0) a | 17.0 (13.0-22.5) a | 8.0 (5.0-12.0) a | 2.0 (1.5-3.0) |
| Twin-block | 12 (6 M, 6 F) | Class II Division 1 | 10.6 (9.0-11.9) | 13.5 (11.7-16.7) | 3.6 (1.6-5.7) | 20.0 (17.0-25.5) a | 17.0 (10.5-21.0) a | 7.5 (5.0-16.0) a | 2.25 (1.5-4.0) |
| Control | 12 (6 M, 6 F) | Class I | 10.7 (8.9-12.9) | 14.5 (13.1-17.0) | 3.7 (2.5-5.9) | 12.3 (9.5-14.5) b | 12.0 (9.50-15.0) b | 3.5 (2.0-5.0) b | 2.0 (1.5-2.5) |
The adjusted photographs were evaluated by 4 groups of raters (orthodontists, patients, parents, and laypersons). The patients’ group of raters was randomly selected from the Class II Division 1 patients who were treated in our clinic during the study and were between 10 and 15 years of age. The parents’ group comprised parents of equivalent patients. Apart from patients, the other groups were composed of adults of various ages (20-66 years old). Each rater group consisted of the first 30 white persons who agreed to participate; they formed groups of equal numbers of males and females. Care was taken to ensure that all groups of raters, except orthodontists, had wide ranges of educational levels and socioeconomic statuses. No rater was related to those in the study sample, and the orthodontists were not involved at any stage of treatment.
The initial and final facial profile photographs of each patient were presented in pairs on a printed A4-size page in landscape orientation. After random selection, half of the patients in each treatment group (3 male, 3 female) were presented with the initial photograph on the left and the final photograph on the right, whereas the other half were presented in reverse order. Each patient’s face was scaled to a standard size, identical in the before and after photographs, so that size differences would not affect the assessments (Fig 1).
For each pair of photos, the raters were asked to fill out a questionnaire consisting of 5 questions, each accompanied by an illustration for easier understanding and a 100-mm visual analog scale (VAS) for marking the answers (Fig 2).
A pilot study with 2 potential raters showed that the completion time for assessment of the 36 patients was particularly long (over 30 minutes), making the process uncomfortable for the raters. A second pilot study with 4 raters, each representing 1 of the 4 groups of raters, showed that completion time was reduced to 11.5 minutes (range, 8.2-14.5 minutes) when 12 patients were assessed by each rater. At the end of the trial, the raters were asked through open-ended questions to assess the whole process. None of them suggested any changes, reported any difficulties, or considered the process time-consuming or tiring. Thus, the patients were randomly divided into groups of 12 (4 subjects; 2 male, 2 female, from each of the 3 study groups) to be evaluated by the raters. Thirty persons were required in each rater group to obtain 10 assessments of each patient by each rater group.
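The arithmetic behind the figure of 30 raters per group can be checked with a short sketch. It assumes, as the text implies, that the 36 patients form 3 equal sets of 12 and that the raters of a group are spread evenly over those sets; the function name is hypothetical and is not part of the study.

```python
def ratings_per_patient(n_patients, patients_per_set, raters_per_group):
    """Number of raters from one rater group who assess each patient.

    Assumes the patients are split into equal sets and the raters of a
    group are divided evenly among the sets (an assumption drawn from
    the text, not a documented procedure).
    """
    n_sets, leftover = divmod(n_patients, patients_per_set)
    if leftover:
        raise ValueError("patients do not divide evenly into sets")
    raters_per_set, leftover = divmod(raters_per_group, n_sets)
    if leftover:
        raise ValueError("raters do not divide evenly among the sets")
    return raters_per_set


# 36 patients in sets of 12 gives 3 sets; 30 raters over 3 sets gives 10 each.
print(ratings_per_patient(36, 12, 30))  # 10
```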
The printed questionnaires were administered by a researcher (K.T.) who approached all raters in a similar manner. Before the questionnaire session, specific instructions concerning the evaluation of the photographs and the use of the VAS were given. A pilot assessment of a subject who was not included in the sample was also conducted. The raters were not told the purpose of the study and did not know that the pairs of photographs showed patients before and after orthodontic treatment. They all completed the questionnaires in a quiet place with adequate lighting and under light supervision from the investigator.
The distances between the start of the scale and the markings of each rater were measured (in millimeters) by a researcher (K.T.), using an electronic digital caliper (Jainmed, Seoul, Korea), to transform ratings to continuous metric variables.
When the final photograph was presented on the left, the raters unknowingly evaluated the change from the final to the initial status. In those cases, the VAS measurements were converted by subtracting each value from 100 so that they conformed with the other half of the ratings, in which the change was assessed from the initial to the final status.
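The conversion can be written as a one-line rule. The sketch below is illustrative only; the function and constant names are hypothetical, and the 100-mm scale length is taken from the questionnaire description.

```python
VAS_LENGTH_MM = 100.0  # scale length stated in the questionnaire description


def align_vas(score_mm, final_on_left):
    """Express a VAS score as change from the initial to the final status.

    Scores from pairs shown in reverse order (final photograph on the
    left) are mirrored by subtracting them from the scale length.
    """
    if not 0.0 <= score_mm <= VAS_LENGTH_MM:
        raise ValueError("score lies outside the scale")
    return VAS_LENGTH_MM - score_mm if final_on_left else score_mm


print(align_vas(30.0, final_on_left=True))   # 70.0
print(align_vas(30.0, final_on_left=False))  # 30.0
```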
Two weeks after the initial measurement, 30 VAS scores were remeasured by the same researcher to evaluate method error. For the assessment of intrarater agreement, 12 raters (3 from each group) reevaluated the same questionnaire 1 month after the initial evaluation.
The statistical analysis was carried out by using SPSS software (version 20.0; IBM, Armonk, NY). The Levene test showed homogeneity of variances in all cases. Data were tested for normality with the Shapiro-Wilk test and were not normally distributed in a few cases. Thus, parametric and nonparametric statistics were applied depending on normality.
Treatment group similarity in certain characteristics was assessed using the Kruskal-Wallis test followed by the post-hoc Mann-Whitney U test for pairwise comparisons.
Intraexaminer agreement on the repeated VAS measurements was tested with the Wilcoxon signed rank test. Random error was assessed with Dahlberg’s formula.
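For readers unfamiliar with it, Dahlberg's formula estimates random error from n pairs of repeated measurements as the square root of the sum of squared differences divided by 2n. The sketch below is a plain restatement of that formula; the function name and the sample values are hypothetical and are not study data.

```python
import math


def dahlberg_error(first, second):
    """Dahlberg's random error: sqrt(sum(d_i ** 2) / (2 * n)),
    where d_i is the difference between the paired measurements."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty series of equal length")
    squared_sum = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_sum / (2 * len(first)))


# Hypothetical repeated caliper readings in millimeters.
print(round(dahlberg_error([40.0, 50.0], [41.0, 49.0]), 3))  # 0.707
```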
Intrarater agreement (test-retest reliability) of repeated VAS ratings was tested with the Wilcoxon signed rank test and the intraclass correlation coefficient (1-way random model, absolute agreement, single measures). A 1-sample t test was used to test whether the mean differences between the 2 measurements differed significantly from 0, and Bland-Altman plots provided an alternative assessment of intrarater agreement.
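As an illustration of the coefficient named above, the following sketch computes the 1-way random, single-measures intraclass correlation from the between-subject and within-subject mean squares of a 1-way analysis of variance. It is not the SPSS procedure used in the study; the function name and input layout (one row of repeated ratings per rated item) are assumptions made for the example.

```python
def icc_oneway_single(rows):
    """ICC, 1-way random model, single measures.

    rows: one list per rated target, each holding k repeated ratings.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-target and within-target mean squares.
    """
    n = len(rows)
    k = len(rows[0])
    if n < 2 or k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least 2 targets with k >= 2 ratings each")
    grand_mean = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    msb = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2 for r, m in zip(rows, row_means) for x in r) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical test-retest ratings that agree perfectly give an ICC of 1.
print(icc_oneway_single([[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]]))  # 1.0
```

A single-measures coefficient describes the reliability of one rating session, which suits a test-retest design with 2 ratings per item.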
Internal consistency for professionals, patients, parents, and laypeople was assessed by calculating the Cronbach alpha for each test group separately. The Cronbach alpha was based on the individual scores of the assessors of each group. The effect of deleting each item in turn from a subscale on the obtained alpha values was also examined. A level above 0.8 was considered high consistency, and a level above 0.7 was considered acceptable.
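The alpha statistic and the item-deletion check described above can be sketched as follows. This is a minimal illustration of the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score), and not the SPSS routine used in the study; the function names and the data layout (one list of scores per questionnaire item) are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach alpha for k items.

    items: one list of scores per item, all over the same observations.
    Uses sample variances; raises ZeroDivisionError if the total score
    has zero variance.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    totals = [sum(observation) for observation in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical scores: 3 items that rank 3 observations identically.
scores = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
print(round(cronbach_alpha(scores), 6))  # 1.0
```

If leaving an item out raises alpha noticeably, that item fits poorly with the rest of the scale, which is the criterion the text applies when deciding to keep all items.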
The interrater agreement among groups was determined by means of intraclass correlation coefficients (2-way mixed model, absolute agreement, average measures). Since each patient was rated by 10 members of each group of raters for each item, the median VAS score was used to obtain a more representative approximation of each group’s assessments for each patient. Interrater agreement was calculated for each evaluation separately. A level above 0.7 was considered strong agreement; moderate agreement was defined between 0.5 and 0.6. This, along with comparative statistical tests, was used to test agreement between group pairs and also as a test of reliability and an example of concurrent and statistical conclusion validity of the questionnaire.
Two-way multivariate analysis of variance was used to evaluate differences among group ratings. The assessment score of each patient was calculated as described above for interrater agreement. Responses to the 5 items of the questionnaire were the 5 dependent variables; the treatment groups (activator, Twin-block, control group) and the rater groups (orthodontists, patients, parents, laypeople) were the independent variables. Equality of covariances of the dependent variables was tested with the Box M test. When the multivariate tests detected significant differences, post hoc pairwise comparisons were performed with the Fisher least significant difference test.
In all cases, 2-sided significance tests were carried out at an alpha level of 0.05. The Bonferroni correction was applied for pairwise a posteriori multiple comparison tests.
Results
The treatment groups were similar in sex distributions, pretreatment and posttreatment ages, and treatment durations. The activator and Twin-block groups were also similar regarding the pretreatment and posttreatment facial contour angles and the pretreatment overjets, but they differed significantly from the control group in these parameters. Posttreatment overjet was similar and within normal values in all groups, suggesting that the patients were treated successfully in this respect (Table I).
There was no statistically significant difference between the first and second VAS measurements (intraexaminer error; Wilcoxon signed-rank test, P >0.05); random error was minimal (0.07 mm).
The Wilcoxon signed rank test showed no statistically significant difference between repeated VAS ratings (intrarater agreement; P >0.01) for all 12 raters, apart from 3 items assessed by 1 orthodontist. The intraclass correlation coefficient showed strong to almost perfect intrarater agreement for orthodontists, laypeople, and parents. Moderate agreement was evident only for certain items evaluated by patients (Table II). Mean differences between the 2 repeated ratings performed by the 12 raters were minimal. The 1-sample t test of whether the mean differences between the 2 measurements differed from 0 showed that only the orthodontists’ group gave slightly more positive ratings in the first session than in the second (Table III). Bland-Altman plots also showed homogeneous patterns of error in most cases (similar variances and magnitudes of error) without signs of systematic error in measurements. Furthermore, the magnitude of error did not increase as the VAS ratings increased (Fig 3).
| Item | Orthodontists | Patients | Parents | Laypeople |
|---|---|---|---|---|
| Profile | 0.86 (CI, 0.74, 0.93) | 0.72 (CI, 0.51, 0.84) | 0.86 (CI, 0.74, 0.92) | 0.86 (CI, 0.74, 0.92) |
| Lower face | 0.91 (CI, 0.83, 0.95) | 0.75 (CI, 0.56, 0.86) | 0.79 (CI, 0.62, 0.89) | 0.86 (CI, 0.75, 0.93) |
| Upper lip | 0.76 (CI, 0.58, 0.87) | 0.60 (CI, 0.34, 0.77) | 0.78 (CI, 0.61, 0.88) | 0.87 (CI, 0.76, 0.93) |
| Lower lip | 0.91 (CI, 0.84, 0.95) | 0.57 (CI, 0.31, 0.76) | 0.79 (CI, 0.62, 0.89) | 0.88 (CI, 0.78, 0.94) |
| Chin | 0.91 (CI, 0.84, 0.96) | 0.50 (CI, 0.21, 0.71) | 0.76 (CI, 0.58, 0.87) | 0.93 (CI, 0.87, 0.96) |
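| Item | t | df | Significance (2-tailed) | Mean difference | SD | 95% CI of difference, lower | 95% CI of difference, upper |
|---|---|---|---|---|---|---|---|
| Lower face | 3.15 | 35 | 0.003 ∗ | 5.06 | 9.63 | 1.80 | 8.32 |
| Upper lip | 4.72 | 35 | 0.000 ∗ | 6.35 | 8.08 | 3.61 | 9.08 |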
The internal consistency of the items of the questionnaire was generally acceptable both within and between groups, with a Cronbach alpha value higher than 0.87 in all cases. Examination of the importance of each item to the alpha values showed that eliminating any item would not increase alpha values significantly and consistently in all cases, and so it was reasonable to keep all items (Table IV).
| Group | Items | Cronbach alpha, all (if item deleted) | Cronbach alpha, activator (if item deleted) | Cronbach alpha, Twin-block (if item deleted) | Cronbach alpha, control (if item deleted) |
|---|---|---|---|---|---|
| | Upper lip | 0.900 | 0.930 ∗ | 0.919 | 0.919 |
| | Chin | 0.905 | 0.912 | 0.962 ∗ | 0.932 ∗ |
| | Chin | 0.863 | 0.929 ∗ | 0.929 ∗ | 0.913 ∗ |
| | Chin | 0.933 ∗ | 0.950 | 0.960 | 0.950 ∗ |
| | Chin | 0.909 | 0.938 | 0.954 ∗ | 0.937 ∗ |