Concerns regarding “The role of artificial intelligence in providing accurate and reliable information on surgically assisted rapid palatal expansion: A cross sectional study”

We read with great interest the article by Hatipoğlu et al titled “The role of artificial intelligence in providing accurate and reliable information on surgically-assisted rapid palatal expansion: A cross-sectional study” (Am J Orthod Dentofacial Orthop 2026;169:31–41.e3). Although the evaluation of large language models (LLMs) in orthodontics is timely, we identified several methodological and conceptual issues that compromise the validity and reproducibility of the study’s conclusions.

First, the study design reveals a fundamental misunderstanding of LLM architecture. The authors state that prompts were deleted after each query “to avoid the learning process of the AI chatbots.” Pretrained LLMs used via standard commercial interfaces are static during inference; they do not update their parameters or learn from user interactions in real time. Deleting a prompt merely clears the context window, preventing in-context reference, but has no bearing on the model’s training state. This distinction is crucial for accurate experimental design, as misconceptions about learning can lead to flawed study protocols.

Second, the study lacks necessary version control. The authors evaluated Gemini and Copilot in January 2025 but failed to specify the exact model versions. Previous research has demonstrated significant performance disparities between model versions (eg, GPT-3.5 vs GPT-4), even within the same ecosystem. Given the rapid update cycles of these tools, omitting version numbers renders the study nonreproducible and the results transient.
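As an illustration of the kind of record that would make such a study reproducible, the following minimal sketch logs the model version and configuration alongside each query. The class name, field names, and example values are hypothetical and are not drawn from the study under discussion.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class QueryRecord:
    """Metadata stored with every chatbot query (all field names are hypothetical)."""
    chatbot: str          # product name shown to the user
    model_version: str    # exact model identifier reported by the vendor
    interface: str        # e.g. web application or programmatic access
    web_access: bool      # whether live Web browsing was enabled
    queried_on: str       # ISO 8601 date of the query
    prompt: str           # exact prompt text submitted


# Placeholder values for illustration only; they describe no real product or session.
record = QueryRecord(
    chatbot="ExampleBot",
    model_version="example-model-2025-01-15",
    interface="web application",
    web_access=False,
    queried_on="2025-01-20",
    prompt="Example patient question about the procedure",
)

# One JSON line per query keeps the log easy to archive and audit.
log_line = json.dumps(asdict(record), sort_keys=True)
print(log_line)
```

Archiving one such line per query would allow a later reader to tell which model and configuration produced each response, even after the commercial tool has been updated.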

Third, the instrument used to assess accuracy, a 5-point scale originally developed for television drug advertising, lacks validation for this context. The use of unvalidated metrics for artificial intelligence (AI) evaluation is a known limitation in the field, often leading to subjective interpretations that fail to capture the nuance of AI hallucinations or plausibility. Without reporting inter-rater reliability metrics (eg, the Cohen kappa) or validating the scale for medical AI advice, the objectivity of the findings is questionable.
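For readers less familiar with the statistic, the sketch below computes an unweighted Cohen kappa for 2 raters scoring the same responses on a 5-point scale. The scores are invented for illustration and do not come from the study; for an ordinal scale, a weighted kappa may be the more appropriate choice.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same nonempty set of items")
    n = len(rater_a)
    # Observed agreement: proportion of items given identical scores.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical 5-point accuracy scores from two raters (illustration only).
scores_a = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
scores_b = [5, 4, 3, 3, 5, 2, 4, 4, 3, 4]
kappa = cohen_kappa(scores_a, scores_b)
print(f"kappa = {kappa:.2f}")
```

In this invented example the raters agree on 8 of 10 items, yet the kappa is lower than the raw agreement because part of that agreement is expected by chance.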

Fourth, there is a discrepancy between the statistical results and the authors’ conclusions. The results section explicitly states there was no statistically significant difference between the types of AI and the types of responses (P >0.05). Yet, the discussion claims that Copilot produced more false information, whereas ChatGPT was more accurate. Drawing qualitative distinctions from statistically nonsignificant data introduces bias and misleads the reader regarding the comparative performance of these models.
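To show why an apparent difference in raw counts need not support a comparative claim, the sketch below runs a two-sided Fisher exact test on a 2 × 2 table. The counts are hypothetical and are not the study’s data, and the test is chosen here only for illustration; it is not necessarily the analysis the original authors performed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low = max(0, col1 - row2)
    high = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts (illustration only): accurate vs inaccurate responses.
# Chatbot A: 18 accurate, 2 inaccurate; chatbot B: 15 accurate, 5 inaccurate.
p_value = fisher_exact_two_sided(18, 2, 15, 5)
print(f"P = {p_value:.2f}")
```

With these invented counts, 90% vs 75% accuracy looks like a meaningful gap, yet the P value is approximately 0.41, so the data would not justify ranking one chatbot above the other.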

Finally, the study mischaracterizes the variables tested. Copilot is an interface powered by OpenAI’s models, often the same underlying model as ChatGPT (GPT-4), but with Web-browsing capabilities. Treating Copilot and ChatGPT-4 as entirely distinct AI types without controlling for Web-access capabilities conflates the interface with the model architecture. As noted in similar comparative studies, the specific configuration of the model (eg, Web-access enabled vs disabled) is a critical variable that must be isolated to draw valid conclusions.

As AI research in our field accelerates, we must adhere to rigorous standards of validation, reproducibility, and statistical reporting. We hope these points clarify the limitations of the current study for the readership.

During the preparation of this work, the authors used ChatGPT-5.2 from the OpenAI company in order to enhance the fluency and grammar of the manuscript. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the content of the published article.



May 23, 2026 | Posted by in Orthodontics | 0 comments
