I read with interest the article by Kandasamy et al (Kandasamy S, Boeddinghaus R, Kruger E. Condylar position assessed by magnetic resonance imaging after various bite position registrations. Am J Orthod Dentofacial Orthop 2013;144:512-7). I believe several points need clarification or correction.
The authors did not state the basis on which they estimated the power. Was it a pilot study or a previous study? Because no previous study was cited, it appears to have been a pilot study. If so, and if the authors had seen in their pilot study that the differences were as small as 0.1 mm, why did they set their power to detect a 1.0-mm difference rather than a 0.1-mm difference?
The mean difference entered into the power formula (ie, 1.0 mm) was too large. According to the table, the average of the mean (absolute) differences was 0.141 mm (range, 0.01-0.36 mm). Therefore, instead of 1 mm, the authors probably needed to calculate the sample size for detecting differences as small as about 0.14 mm.
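To make the scale of this discrepancy concrete, the following Python sketch uses the standard normal-approximation formula for a two-sided paired comparison, n = ((z(1 - α/2) + z(power)) × SD / δ)². It assumes a paired (within-patient) design, α = 0.05, and 80% power; the function name and the SD of the differences (sd_diff = 1.0 mm) are hypothetical placeholders, because the SD is not given here. Whatever the true SD, the required n scales with 1/δ², so reducing δ from 1.0 mm to 0.141 mm multiplies the required sample size by about (1.0/0.141)² ≈ 50.

```python
from math import ceil
from statistics import NormalDist


def paired_n(delta, sd_diff, alpha=0.05, power=0.80):
    """Approximate number of pairs for a two-sided paired comparison.

    Normal approximation (hypothetical helper for illustration):
    n = ((z_{1-alpha/2} + z_{power}) * sd_diff / delta) ** 2, rounded up.
    """
    z = NormalDist().inv_cdf
    return ceil(((z(1 - alpha / 2) + z(power)) * sd_diff / delta) ** 2)


# Hypothetical SD of the paired differences: 1.0 mm (not reported here).
print(paired_n(1.0, 1.0), paired_n(0.141, 1.0))  # prints 8 395
```

Only the ratio between the two results matters for the argument; the absolute values depend entirely on the assumed SD.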
It is surprising that none of the comparisons reached statistical significance, given a calculated power of 80%. I believe the reason is that the power calculation was incorrect and the actual power was far lower than reported.
Calculating the power for detecting a much larger difference than the one actually observed may have considerably inflated the calculated power and falsely reduced the required sample size.
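Under the same normal approximation, one can ask what power a study sized for 80% power at 1.0 mm would have against a true difference of about 0.141 mm. Assuming a two-sided α of 0.05 and the same SD as in the design, the SD and n cancel, and the noncentrality simply scales by δ_true/δ_design. The minimal sketch below (function name hypothetical) gives a power of roughly 7%, in which case a series of nonsignificant comparisons would be the expected outcome.

```python
from statistics import NormalDist


def power_at(delta_true, delta_design, alpha=0.05, design_power=0.80):
    """Two-sided power at delta_true for a study sized for delta_design.

    Normal approximation; assumes the SD used in the design is correct,
    so n and SD cancel and the noncentrality scales linearly with delta.
    """
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)
    # Noncentrality that yields design_power at delta_design.
    ncp_design = z_a + nd.inv_cdf(design_power)
    ncp = ncp_design * delta_true / delta_design
    # Probability of rejecting in either tail.
    return nd.cdf(ncp - z_a) + nd.cdf(-ncp - z_a)


print(round(power_at(1.0, 1.0), 3))    # design check: about 0.80
print(round(power_at(0.141, 1.0), 3))  # about 0.07
```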
To assess the variability and reproducibility of the techniques, the authors tested each technique 3 times on 2 patients. They then used repeated-measures ANOVA and interpreted its nonsignificant result as an indicator of the reliability of these methods (p. 514). This interpretation is incorrect. In such a small sample (2 participants × 3 repeated experiments for each method), the lack of significance is very likely only a sign of low test power (type II error). Moreover, repeated-measures ANOVA is probably inappropriate in such a small sample, because normality and sphericity are unlikely to hold. The authors should probably have run a Friedman test instead.
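As a rough illustration of how little resolution 2 participants provide, the sketch below enumerates the exact null distribution of the Friedman statistic, Q = 12/(nk(k + 1)) × ΣR_j² − 3n(k + 1), treating the n = 2 participants as blocks and the k = 3 repeated trials as conditions, and assuming no tied measurements. Under the null hypothesis every within-participant ranking is equally likely, so the smallest attainable exact P value is 1/6 (about 0.17); with this design, even the Friedman test could never reach P < 0.05. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product


def friedman_q(rank_rows):
    """Friedman statistic Q for complete blocks without ties.

    rank_rows: one tuple of within-block ranks (1..k) per block.
    """
    n = len(rank_rows)
    k = len(rank_rows[0])
    col_sums = [sum(row[j] for row in rank_rows) for j in range(k)]
    return Fraction(12, n * k * (k + 1)) * sum(r * r for r in col_sums) - 3 * n * (k + 1)


def min_exact_p(n_blocks, k):
    """Smallest attainable exact P value of the Friedman test under H0.

    Enumerates all (k!)^n equally likely rank configurations and returns
    the probability of observing the largest possible Q.
    """
    perms = list(permutations(range(1, k + 1)))
    qs = [friedman_q(cfg) for cfg in product(perms, repeat=n_blocks)]
    q_max = max(qs)
    return Fraction(sum(q >= q_max for q in qs), len(qs))


# 2 participants x 3 trials, and (for comparison) 3 participants x 3 trials.
print(min_exact_p(2, 3), min_exact_p(3, 3))  # prints 1/6 1/36
```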
Two different ANOVAs and a Tukey test were used. However, the Results section does not mention any statistical test or P value; it states only that there were no significant differences. What, then, were the P values? Both P = 0.056 and P = 0.950 are nonsignificant, but they would be interpreted very differently. Which applied to this study? And if the ANOVAs were nonsignificant, why was the Tukey test used?
The authors stated that the results of all techniques were “highly variable” (p. 515). This is inconsistent with normality, which is an assumption of ANOVA. Therefore, the use of ANOVA should be justified, especially because no assessment of normality was reported.
Condylar positions were summarized only for CO. Why did the authors not calculate and compare the same measures for CR and Roth-CR?
Lines 3 to 8 of the first paragraph of the “Discussion” are quite unclear. How could the authors “safely infer” something about differences between CR and Roth-CR from information pertaining to CO?
In numerous parts of the “Discussion,” “Conclusions,” and “Abstract,” there are assertions that are not supported by any statistical evidence. The authors discuss, interpret, and draw firm conclusions (eg, that the Roth method is unjustified) from unsubstantiated or unreliable findings. They should have stated the limitations of their study and warned readers that their results were inconclusive.