I read with interest the article by Bavbek and Dincer (Dimensions and morphologic variations of sella turcica in type 1 diabetic patients. Am J Orthod Dentofacial Orthop 2014;145:179-87). Although it is very good research, I believe some major points need correction or clarification.
It was stated that the control subjects were matched with the diabetic patients (p. 180). However, the factors on which they were matched (eg, age, sex, height) were not specified. In addition, which statistical tests were used to verify that the groups were matched? None were described.
Data normality was rejected by using 2 normality tests (Kolmogorov-Smirnov and Shapiro-Wilk). This is not the best approach, because when the sample becomes sufficiently large (as in this study), normality tests become overpowered and tend to give significant results with the slightest deviations from a normal distribution. Therefore, the reported lack of normality might be a false-positive finding, which should be checked with better approaches (eg, Q-Q plots).
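To illustrate this point, here is a minimal simulation sketch in Python with SciPy. Everything in it is an assumption of mine and not taken from the article: the data are drawn from a t distribution with 10 degrees of freedom (only slightly heavier tailed than a normal distribution), and the sample sizes are arbitrary. It reports the Shapiro-Wilk P value next to the correlation coefficient of the normal Q-Q plot, so that the formal test can be compared with a graphical-style summary as the sample grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def normality_report(n):
    """Return (Shapiro-Wilk P value, Q-Q plot correlation) for n simulated values."""
    # Hypothetical data: t distribution with 10 df, a mild departure from normality.
    x = rng.standard_t(df=10, size=n)
    _, p_shapiro = stats.shapiro(x)
    # probplot returns the Q-Q points and a least-squares line fit;
    # r close to 1 means the points lie close to a straight line.
    (_, _), (_, _, r) = stats.probplot(x, dist="norm")
    return p_shapiro, r

# Arbitrary sample sizes chosen for illustration only.
for n in (50, 500, 5000):
    p_shapiro, r = normality_report(n)
    print(f"n={n:5d}  Shapiro-Wilk P={p_shapiro:.4f}  Q-Q correlation r={r:.4f}")
```

The P value depends on both the size of the departure and the sample size, whereas the Q-Q correlation describes only how far the data are from a straight line. This is why I would suggest a graphical check alongside any formal test.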
Furthermore, as the sample grows larger, the distribution of the raw data becomes less and less important, because the “central limit theorem” (CLT) applies: the sampling distribution of the mean approaches normality regardless of the distribution of the data. Thus, even if the significant result of the normality tests was not a false-positive error, the CLT might have already dealt with the normality issue. The lack of any P values for the normality tests, or of any histograms, etc, precludes further evaluation. It is also not clear why 2 different normality tests were used.
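The CLT argument can be checked with a short simulation. The sketch below is only an illustration under my own assumptions: the population is exponential (strongly right skewed), the number of simulated studies is arbitrary, and only the group size of 76 comes from the article. It compares the skewness of the raw values with the skewness of the group means.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

n_per_group = 76      # group size reported in the article
n_replicates = 5000   # hypothetical number of simulated studies

# Hypothetical skewed population: exponential distribution (theoretical skewness = 2).
raw = rng.exponential(scale=1.0, size=(n_replicates, n_per_group))

# One mean per simulated group of 76 subjects.
sample_means = raw.mean(axis=1)

skew_raw = stats.skew(raw.ravel())
skew_means = stats.skew(sample_means)

print(f"Skewness of raw values:   {skew_raw:.2f}")
print(f"Skewness of group means:  {skew_means:.2f}")
```

Under these assumptions, the means of groups of 76 are far less skewed than the individual values, which is the sense in which the CLT may already have handled the normality issue.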
Even if we disregard the CLT, another issue exists in the normality assessment. The article implies (although never clarifies) that the “whole sample” or at least “each group (n = 76)” was assessed in terms of normality. It was implied by the “n >50” report for normality tests (p. 181). However, the data normality should not be assessed for the whole sample at once. It is actually the normality of the residuals that matters, not even the normality of the subgroups, let alone the whole sample. Therefore, assessing the sample normality might be incorrect.
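A small hypothetical example may make the distinction clearer. In the sketch below, all numbers are invented: both groups are normally distributed with the same spread but clearly different means. The pooled sample is then bimodal and fails a normality test, even though the residuals (each value minus its own group mean) are what a 2-group parametric comparison actually relies on.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 76  # group size reported in the article

# Hypothetical data: normal within each group, with well separated means.
control = rng.normal(loc=10.0, scale=1.0, size=n)
diabetic = rng.normal(loc=16.0, scale=1.0, size=n)

# Whole sample assessed at once (the approach I am questioning).
pooled = np.concatenate([control, diabetic])

# Residuals: each observation minus the mean of its own group.
residuals = np.concatenate([control - control.mean(),
                            diabetic - diabetic.mean()])

# Excess kurtosis is about 0 for a normal distribution and
# strongly negative for a bimodal one.
kurt_pooled = stats.kurtosis(pooled)
kurt_resid = stats.kurtosis(residuals)
_, p_pooled = stats.shapiro(pooled)
_, p_resid = stats.shapiro(residuals)

print(f"Pooled sample: excess kurtosis={kurt_pooled:.2f}, Shapiro-Wilk P={p_pooled:.2g}")
print(f"Residuals:     excess kurtosis={kurt_resid:.2f}, Shapiro-Wilk P={p_resid:.2g}")
```

In this constructed case, rejecting normality for the pooled sample says nothing about whether a parametric comparison of the 2 groups would have been appropriate.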
The study design (2 groups divided into 4 subgroups each according to the variables of bone age and sex) clearly indicates the need for an ANOVA design (or its nonparametric alternative or another multivariate design). However, instead, the authors used only pair-wise comparisons, without any multivariate frameworks.
The performed pair-wise comparisons were the Mann-Whitney U and Fisher exact tests. However, the authors stated that the control and diabetic groups were matched. Therefore, they needed to use paired tests instead (ie, a paired t test or a Wilcoxon signed rank test instead of the Mann-Whitney, and a McNemar test instead of the Fisher). The tests they used assume independence of the groups, which does not hold in this matched set of groups and might render the results incorrect.
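The practical consequence of ignoring the pairing can be sketched as follows. This is a simulation under my own assumptions, not a reanalysis: the pair-level variation, the effect size, and the outcome frequencies are invented, and the exact McNemar test is computed as a binomial test on the discordant pairs. Only the number of pairs (76) follows the article.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_pairs = 76  # group size reported in the article

# Hypothetical continuous outcome: matched subjects share a large
# pair-level component, and the true group difference is small.
pair_effect = rng.normal(0.0, 5.0, n_pairs)
control = 10.0 + pair_effect + rng.normal(0.0, 0.3, n_pairs)
diabetic = 10.5 + pair_effect + rng.normal(0.0, 0.3, n_pairs)

# Unpaired test (assumes independent groups) vs paired test.
p_unpaired = stats.mannwhitneyu(control, diabetic, alternative="two-sided").pvalue
p_paired = stats.wilcoxon(control, diabetic).pvalue

# Hypothetical binary outcome (eg, presence of a morphologic variation).
control_has = rng.random(n_pairs) < 0.3
diabetic_has = rng.random(n_pairs) < 0.5

# Fisher exact test on the unpaired 2 x 2 table (group by outcome).
table = [[int(diabetic_has.sum()), int(n_pairs - diabetic_has.sum())],
         [int(control_has.sum()), int(n_pairs - control_has.sum())]]
_, p_fisher = stats.fisher_exact(table)

# Exact McNemar test: only the discordant pairs carry information.
b = int(np.sum(control_has & ~diabetic_has))  # control yes, diabetic no
c = int(np.sum(~control_has & diabetic_has))  # control no, diabetic yes
p_mcnemar = stats.binomtest(b, b + c, 0.5).pvalue

print(f"Mann-Whitney U P={p_unpaired:.3g}   Wilcoxon signed rank P={p_paired:.3g}")
print(f"Fisher exact P={p_fisher:.3g}   exact McNemar P={p_mcnemar:.3g}")
```

In this constructed example, the unpaired test is swamped by the between-pair variation that the paired test removes. The direction and size of the discrepancy in real data would depend on how strongly the matched subjects resemble each other.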
The authors stated that “the Bonferroni correction was made.” However, throughout the text, there was no indication that the Bonferroni correction was actually applied (despite the need for it). On the contrary, every part of the report indicates that this method was not used (ie, the alpha was set only at 0.05 and not adjusted to any corrected value).
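For reference, the Bonferroni adjustment simply divides the nominal alpha by the number of comparisons. The sketch below uses 4 invented P values purely to show the arithmetic; neither the number of comparisons nor the values come from the article.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the Bonferroni-adjusted alpha and a significance flag per P value."""
    m = len(p_values)             # number of comparisons
    adjusted_alpha = alpha / m    # each test is judged against alpha / m
    return adjusted_alpha, [p < adjusted_alpha for p in p_values]

# Hypothetical P values from 4 pair-wise comparisons.
p_values = [0.004, 0.020, 0.049, 0.300]
adjusted_alpha, significant = bonferroni(p_values)

print(adjusted_alpha)  # 0.0125
print(significant)     # [True, False, False, False]
```

With the unadjusted alpha of 0.05, the first 3 of these hypothetical comparisons would have been declared significant; after the correction, only the first one is.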
In the last paragraph of page 184, it was stated that “the effect of the interaction… was evaluated.” It is simply impossible to assess an interaction only by pair-wise comparisons. This task needs ANOVA-like tests, which were missing.
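As an illustration of what an interaction test involves, the sketch below compares 2 nested linear models, one with and one without the interaction term, by an F test. It is a simplified 2 x 2 layout (group by sex) with simulated data; the cell size, effect sizes, and variable names are my assumptions, and the article's design would also include bone age as a factor.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_per_cell = 38  # hypothetical: 76 subjects per group, split equally by sex

group = np.repeat([0, 0, 1, 1], n_per_cell)  # 0 = control, 1 = diabetic
sex = np.repeat([0, 1, 0, 1], n_per_cell)    # 0 and 1 code the 2 sexes

# Simulated outcome in which the group effect exists in only one sex,
# ie, a built-in group-by-sex interaction.
y = 10.0 + 0.5 * sex + 2.0 * group * sex + rng.normal(0.0, 1.0, group.size)

def residual_sum_of_squares(X, y):
    """Fit y = X @ beta by least squares and return the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

ones = np.ones_like(y)
X_additive = np.column_stack([ones, group, sex])           # main effects only
X_full = np.column_stack([ones, group, sex, group * sex])  # plus interaction

rss_additive = residual_sum_of_squares(X_additive, y)
rss_full = residual_sum_of_squares(X_full, y)

# F test for the single extra parameter (the interaction term).
df_num = 1
df_den = y.size - X_full.shape[1]
f_stat = ((rss_additive - rss_full) / df_num) / (rss_full / df_den)
p_interaction = stats.f.sf(f_stat, df_num, df_den)

print(f"Interaction F({df_num}, {df_den}) = {f_stat:.2f}, P = {p_interaction:.3g}")
```

The point is that the interaction is a single model term tested in the presence of both main effects; separate pair-wise comparisons within subgroups do not provide such a test.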