# Why is using a paired t test to assess agreement problematic?

In a study that includes cephalometric or dental model measurements, investigators often wish to examine the reliability of those measurements. Therefore, a percentage of the original measurements are repeated a few weeks later, and the 2 sets of measurements (original and repeated) are assessed for agreement. A common but incorrect approach to assessing agreement between continuous measurements is to use a paired t test. To explain why this method is incorrect, I will provide a simplified example. The Table shows the results of 2 scenarios (A and B), each with 20 pairs of measurements taken at 2 different time points, together with their differences. The data have been constructed so that, in both scenario A and scenario B, the mean of the time 1 measurements equals the mean of the time 2 measurements (see the mean values [= 10.45] at the end of the Table). For simplicity, the variances of the means at the 2 time points are ignored.

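Table I. Repeated measurements at 2 different time points for scenarios A and B

| ID | A: Time 1 | A: Time 2 | A: Difference | B: Time 1 | B: Time 2 | B: Difference |
|---|---|---|---|---|---|---|
| 1 | 12 | 8 | 4 | 12 | 11 | 1 |
| 2 | 5 | 12 | −7 | 5 | 6 | −1 |
| 3 | 8 | 12 | −4 | 8 | 9 | −1 |
| 4 | 5 | 14 | −9 | 5 | 4 | 1 |
| 5 | 9 | 15 | −6 | 9 | 11 | −2 |
| 6 | 12 | 16 | −4 | 12 | 12 | 0 |
| 7 | 15 | 13 | 2 | 15 | 14 | 1 |
| 8 | 27 | 15 | 12 | 27 | 26 | 1 |
| 9 | 22 | 14 | 8 | 22 | 21 | 1 |
| 10 | 19 | 15 | 4 | 19 | 18 | 1 |
| 11 | 10 | 8 | 2 | 10 | 9 | 1 |
| 12 | 12 | 5 | 7 | 12 | 11 | 1 |
| 13 | 6 | 7 | −1 | 6 | 7 | −1 |
| 14 | 5 | 12 | −7 | 5 | 6 | −1 |
| 15 | 7 | 13 | −6 | 7 | 8 | −1 |
| 16 | 3 | 5 | −2 | 3 | 4 | −1 |
| 17 | 8 | 6 | 2 | 8 | 7 | 1 |
| 18 | 9 | 5 | 4 | 9 | 8 | 1 |
| 19 | 12 | 9 | 3 | 12 | 13 | −1 |
| 20 | 3 | 5 | −2 | 3 | 4 | −1 |
| Mean | 10.45 | 10.45 | 0 | 10.45 | 10.45 | 0 |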
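
To make the point concrete, the following is a minimal sketch (not part of the original analysis) that computes the paired t statistic by hand for both scenarios, using the values copied from the Table. The variable and function names (`time1`, `time2_a`, `time2_b`, `paired_summary`) are illustrative assumptions, and only the Python standard library is used.

```python
from math import sqrt
from statistics import mean, stdev

# Values copied from the Table (20 pairs per scenario).
# Time 1 is identical in scenarios A and B; only Time 2 differs.
time1 = [12, 5, 8, 5, 9, 12, 15, 27, 22, 19, 10, 12, 6, 5, 7, 3, 8, 9, 12, 3]
time2_a = [8, 12, 12, 14, 15, 16, 13, 15, 14, 15, 8, 5, 7, 12, 13, 5, 6, 5, 9, 5]
time2_b = [11, 6, 9, 4, 11, 12, 14, 26, 21, 18, 9, 11, 7, 6, 8, 4, 7, 8, 13, 4]


def paired_summary(first, second):
    """Return (mean difference, SD of differences, paired t statistic, min, max)."""
    diffs = [x - y for x, y in zip(first, second)]
    mean_d = mean(diffs)
    sd_d = stdev(diffs)  # sample SD (n - 1 in the denominator)
    # Paired t statistic: mean difference divided by its standard error.
    t_stat = mean_d / (sd_d / sqrt(len(diffs)))
    return mean_d, sd_d, t_stat, min(diffs), max(diffs)


summary_a = paired_summary(time1, time2_a)
summary_b = paired_summary(time1, time2_b)

for label, (mean_d, sd_d, t_stat, lo, hi) in (("A", summary_a), ("B", summary_b)):
    print(f"Scenario {label}: mean diff = {mean_d:.2f}, SD of diffs = {sd_d:.2f}, "
          f"t = {t_stat:.2f}, range of diffs = {lo} to {hi}")
```

Because the mean difference is exactly 0 in both scenarios, the t statistic is 0 in both (a 2-sided P value of 1), so the paired t test cannot tell the scenarios apart. The individual differences, however, range from −9 to 12 in scenario A (SD about 5.70) but only from −2 to 1 in scenario B (SD about 1.08).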