The purpose of this study was to assess differences between articles published in the American Journal of Orthodontics and Dentofacial Orthopedics (AJO-DO), the Angle Orthodontist (AO), the European Journal of Orthodontics (EJO), and the Journal of Orthodontics (JO) from 1999 to 2008.
All journals were hand-searched and 4301 eligible articles were identified. A random sample of 425 articles was obtained to provide 80% power to detect a 100% increase in the number of randomized controlled trials (RCTs) at the 5% level of significance. Each article was classified according to predetermined criteria. Variations between journals were assessed using the χ² test or odds ratio (OR) and 95% confidence intervals (95% CI).
The AJO-DO published 45.6% of the articles in the final sample, 27.8% were from the AO, 17.4% were from the EJO, and 9.2% were from the JO. Statistically significant differences were found between the type (P < 0.001), subject (P < 0.001), setting (P < 0.03), and method (P < 0.001) of articles published in the 4 journals. The increase in the proportion of RCTs published between 1999-2003 and 2004-2008 was not statistically significant (OR, 0.64; 95% CI, 0.29-1.43).
Statistically significant differences were found in the publication profiles of the 4 orthodontic journals examined, but the increase in RCTs was lower than anticipated.
In the current climate of evidence-based care, it is hoped that clinical decisions about health care, including the practice of orthodontics, are increasingly supported by scientific evidence. Evidence-based clinical practice is an approach to decision making in which the clinician uses the best available current evidence, in consultation with the patient, to decide which treatment best suits the patient. Evidence-based practice, as described by Rosenberg and Donald, involves systematically finding, appraising, and using this contemporaneous research as the basis for clinical decision making. Nevertheless, Sackett et al highlighted that evidence-based practice should incorporate both individual clinical expertise and the best available external evidence, and that neither alone is enough.
Unfortunately, a large proportion of published medical research lacks either the relevance or the methodological rigor to answer clinical questions reliably. Systematic reviews were added to Green and Byar's hierarchy for the strength of evidence, which ranks study designs by their validity, that is, the degree to which they are not susceptible to bias (Fig 1). Even with the development of systematic reviews, however, well-coordinated, prospective randomized controlled trials (RCTs), with rigorous methodology and sufficient power, are still required to yield strong inferences.
Deeks et al identified the problems associated with relying on nonrandomized studies for evidence of the effectiveness of health care interventions. They concluded that our inability to compensate for selection bias, or to identify nonrandomized studies that are free from it, means that nonrandomized studies should be undertaken only when RCTs are infeasible or unethical. Nonrandomized studies may give seriously misleading results even when treated and control groups appear similar in key prognostic factors, and residual confounding may be high even when good prognostic data are available.
In an editorial, Burden advocated selectivity in reading so that there is sufficient time for the truly important articles. Other authors have addressed how to identify high-quality articles and publications. Lee et al in 2002 investigated the association of journal quality indicators with the methodological quality of clinical research articles. They concluded that high citation rates and impact factors (IFs) and low manuscript acceptance rates appear to predict higher methodological quality scores of journal articles. In a previous study, Harrison et al revealed significant differences in the content of the British Journal of Orthodontics (now the Journal of Orthodontics [JO]) and the European Journal of Orthodontics (EJO) with regard to the type, setting, and subject of articles published. This suggests that different journals may be better reference points for different aspects of the speciality. An understanding of publication trends in different journals may give the daunted clinician a means of identifying the journals most appropriate to their individual requirements.
Assessments of the research designs of papers published in a variety of general and specialist journals illustrate the relatively low proportion of RCTs published. Relatively little assessment of methodological quality exists within the orthodontic literature; however, there has long been an impression that evidence about the effectiveness of orthodontic treatment is derived largely from retrospective evaluations of success. Work by Tulloch et al and Harrison et al on the publication profile of the orthodontic literature revealed that only a small proportion of published clinical evidence was derived from RCTs.
This study is a retrospective, observational study that aims to classify objectively a representative sample of articles published in the 4 main orthodontic journals, over a 10-year period, to assess differences in article type, setting, subject, direction, research method, and control used. It will also assess whether there has been a significant increase in the number of RCTs published.
Material and methods
Identification of papers
A hand search of all the articles published in the American Journal of Orthodontics and Dentofacial Orthopedics (AJO-DO), Angle Orthodontist (AO), EJO, and JO between 1999 and 2008 was performed by one of the authors (R.G.). Articles were included if they were full articles or case reports, including updates, and excluded if they described reviews, for example, of books or abstracts, commentaries, litigation, legislation and ethics, or editorials.
The classification system used was adapted from Harrison et al, which was itself a modified compilation of Fletcher and Fletcher, Bailar et al, and Tulloch et al. The classification system was designed to assess the kind of articles published and made no attempt to assess the quality of the studies.
The robustness of the classification system was assessed in a pilot study involving a 10% random sample of the articles published in the 4 journals in 2003. The articles were classified by 2 examiners (R.G. and J.E.H.), and intra- and interexaminer reliability were assessed using percentage agreement and the kappa statistic. When the initial data were analyzed, it became apparent that there were weaknesses in the classification system that resulted in ambiguity in assignment and an associated drop in reliability. The definitions within the classification system were therefore revised to clarify certain aspects, with specific attention paid to the research method and control sections, as these 2 categories were the least reliable. In addition, a group for laboratory-based RCTs was added to the research methods section so that they could be differentiated from clinically conducted trials. The same papers were then reassessed 1 month later by both examiners until good levels of agreement and reliability were obtained.
For the main study, a sample size calculation, using data from Harrison in Pocock's formula, showed that 400 articles were required to give 80% power, at the 5% level, to detect a 100% increase in the proportion of RCTs published during the study period. Four thousand three hundred one articles were eligible; a 10% random sample (Fig 2) was therefore produced by one of us (J.E.H.) using GraphPad Software (GraphPad, La Jolla, Calif).
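A calculation of this kind can be sketched with the standard formula for comparing two proportions given in Pocock's Clinical Trials. The exact baseline proportion taken from Harrison is not stated here, so the 10% figure below is an assumption chosen only to illustrate how a requirement of roughly 400 articles in total can arise:

```python
from math import ceil
from statistics import NormalDist

def pocock_two_proportions(p1: float, p2: float,
                           alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per group to detect a difference between two
    proportions (Pocock's formula), two-sided alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for 5% two-sided
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    numerator = (p1 * (1 - p1) + p2 * (1 - p2)) * (z_alpha + z_beta) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Hypothetical baseline: if about 10% of articles in the earlier period
# were RCTs, detecting a doubling to 20% needs ~197 articles per period,
# that is, roughly 400 articles in total.
n_per_period = pocock_two_proportions(0.10, 0.20)
```

The per-group figure is doubled because the comparison is between two independent samples of articles, one from each period.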
The type, setting, subject, direction, research method, control, and country of origin of each included paper were classified by 1 examiner (R.G.) (Appendix 1) and recorded in coded form. A 10% random sample of the papers was reanalyzed 1 month into the main investigation to reassess intraexaminer reliability. To prevent errors due to examiner fatigue, no more than 10 articles were assessed at any 1 time. Variations between the journals, in terms of articles published and changes over time, were assessed using the χ² test or odds ratio (OR) and 95% confidence intervals (95% CI), as appropriate.
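The OR and its confidence interval can be computed from a 2 × 2 table with the Woolf (logit) method; whether the authors used this particular method is not stated, so the sketch below is illustrative. The counts (10 of 118 clinical papers versus 20 of 159 being RCTs) are taken from the totals in Table VI and reproduce the OR reported in the abstract:

```python
from math import exp, log, sqrt

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """OR and 95% CI (Woolf's logit method) for a 2x2 table
    [[a, b], [c, d]]: rows = periods, columns = RCT / non-RCT."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

# 10 RCTs among 118 clinical papers (earlier period) versus
# 20 RCTs among 159 clinical papers (later period)
or_, lo, hi = odds_ratio_ci(10, 108, 20, 139)
print(f"OR {or_:.2f}; 95% CI, {lo:.2f}-{hi:.2f}")  # OR 0.64; 95% CI, 0.29-1.43
```

Because the interval spans 1.0, the change in the proportion of RCTs between the two periods is not statistically significant, matching the conclusion in the abstract.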
The initial mean percentage agreement for interexaminer reliability across all groups was 74.4%, with a mean κ statistic of 0.64, suggesting overall substantial agreement and reliability. After the modifications to the classification system, the mean percentage agreement for interexaminer reliability was 93.7% and the κ statistic was 0.87, suggesting almost perfect reliability and agreement.
For the main study, the overall percentage agreement of the classification system was 95.3%. The actual percentage agreement between subgroups varied between 88% and 100%. The intraexaminer agreement was formally tested using the κ statistic, revealing a mean value of 0.93, which suggested that the intraexaminer agreement was almost perfect ( Table I ).
| Category | Percentage agreement | Kappa | Strength of agreement |
| --- | --- | --- | --- |
| Research method | 88 | 0.86 | Almost perfect |
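The κ statistic used for the reliability assessments above corrects raw percentage agreement for the agreement expected by chance. A minimal sketch of Cohen's kappa, with hypothetical ratings rather than the study's actual data, follows:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: chance-corrected agreement between two raters
    over the same items."""
    n = len(rater1)
    observed = sum(x == y for x, y in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Expected agreement if both raters assigned labels independently
    expected = sum(c1[k] * c2.get(k, 0) for k in c1) / n ** 2
    return (observed - expected) / (1 - expected)

# Toy example: two examiners classify six articles by research method
r1 = ["RCT", "RCT", "case series", "survey", "RCT", "survey"]
r2 = ["RCT", "RCT", "case series", "RCT", "RCT", "survey"]
kappa = cohens_kappa(r1, r2)
```

Raw agreement here is 5 of 6 (83%), but κ is lower because some of that agreement would occur by chance; the same effect explains why the study's κ values sit below its percentage-agreement figures.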
A random sample of 425 papers was selected from the 4301 eligible papers, of which 194 (45.6%) were from the AJO-DO, 118 (27.8%) were from the AO, 74 (17.4%) were from the EJO, and 39 (9.2%) were from the JO.
Comparison between journals
Type of paper
Three quarters of the papers published reported the results of studies (76.9%), but significant differences existed between journals in the types of papers published (χ² = 64.2, df = 12, P < 0.001). Case reports and literature reviews each made up a fifth of the JO's content; the EJO published no case reports, and the AO no opinion-based material (Table II).
Papers reporting on the development, diagnosis, and treatment of human subjects comprised almost three quarters (72.2%) of all the published papers. However, there were significant differences in the subjects of the papers in the different journals (χ² = 60.6, df = 21, P < 0.001), with the JO publishing a third as many development papers as the other journals (Table III). The AO published twice as many materials-based papers, and the EJO twice as many animal experiments, as the other journals. The proportion of papers reporting treatment of human subjects was approximately half for the AJO-DO and the JO but only a third for the AO and the EJO.
Over two thirds of the papers (70.2%) were classified as clinical, reporting on the development, diagnosis, and treatment of human subjects. Significant differences were seen between journals (χ² = 18, df = 9, P < 0.035), with the AO and the EJO publishing twice as many laboratory-based studies, and the JO 3 times as many measurement-based papers, as the other journals (Table IV).
More than a third of the studies were conducted retrospectively, and 40% prospectively. There were significant differences between journals (χ² = 39.5, df = 9, P < 0.001), with the EJO publishing twice as many prospective papers as the JO and the AJO-DO (Table V).
Comparison of clinical studies
Of the 425 articles, 277 were in a clinical setting. Case reports and case series made up one third, and RCTs one tenth, of this clinical research. Significant differences were seen between journals in the research methods used for clinical studies (χ² = 34.2, df = 15, P < 0.003), with more than two thirds of the papers in the JO, AJO-DO, and AO consisting of case reports, case series, or surveys (Table VI).
| Journal | Clinical research method | 1999-2003, n | % | 2004-2008, n | % | Total, n | % |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AJO-DO | Controlled clinical trial | 2 | 3.4 | 8 | 10.8 | 10 | 7.5 |
| AJO-DO | Randomized controlled trial | 5 | 8.5 | 7 | 9.5 | 12 | 9.0 |
| AO | Controlled clinical trial | 3 | 11.1 | 0 | 0.0 | 3 | 4.2 |
| AO | Randomized controlled trial | 2 | 7.4 | 6 | 13.6 | 8 | 11.3 |
| EJO | Controlled clinical trial | 3 | 15.0 | 1 | 3.3 | 4 | 8.0 |
| EJO | Randomized controlled trial | 2 | 10.0 | 5 | 16.7 | 7 | 14.0 |
| JO | Controlled clinical trial | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 |
| JO | Randomized controlled trial | 1 | 8.3 | 2 | 18.2 | 3 | 13.0 |
| All journals | Controlled clinical trial | 8 | 6.8 | 9 | 5.7 | 17 | 6.1 |
| All journals | Randomized controlled trial | 10 | 8.5 | 20 | 12.6 | 30 | 10.8 |