In studies that include cephalometric or dental model measurements, investigators often wish to examine the reliability of those measurements. To do so, a percentage of the original measurements is repeated a few weeks later, and the 2 sets of measurements (original and repeated) are assessed for agreement. A frequently encountered but incorrect approach to assessing agreement between continuous measurements is the paired t test. To explain why this method is incorrect, I will provide a simplified example. The Table shows the results of 2 scenarios (A and B), each with 20 pairs of measurements taken at 2 different time points, together with their differences (time 1 minus time 2). The data have been constructed so that, in both scenarios, the mean of the time 1 measurements equals the mean of the time 2 measurements (see the mean values [= 10.45] at the end of the Table). For simplicity, variances of the means at the 2 time points are ignored.
ID | Scenario A: Time 1 | Scenario A: Time 2 | Scenario A: Difference | Scenario B: Time 1 | Scenario B: Time 2 | Scenario B: Difference |
---|---|---|---|---|---|---|
1 | 12 | 8 | 4 | 12 | 11 | 1 |
2 | 5 | 12 | −7 | 5 | 6 | −1 |
3 | 8 | 12 | −4 | 8 | 9 | −1 |
4 | 5 | 14 | −9 | 5 | 4 | 1 |
5 | 9 | 15 | −6 | 9 | 11 | −2 |
6 | 12 | 16 | −4 | 12 | 12 | 0 |
7 | 15 | 13 | 2 | 15 | 14 | 1 |
8 | 27 | 15 | 12 | 27 | 26 | 1 |
9 | 22 | 14 | 8 | 22 | 21 | 1 |
10 | 19 | 15 | 4 | 19 | 18 | 1 |
11 | 10 | 8 | 2 | 10 | 9 | 1 |
12 | 12 | 5 | 7 | 12 | 11 | 1 |
13 | 6 | 7 | −1 | 6 | 7 | −1 |
14 | 5 | 12 | −7 | 5 | 6 | −1 |
15 | 7 | 13 | −6 | 7 | 8 | −1 |
16 | 3 | 5 | −2 | 3 | 4 | −1 |
17 | 8 | 6 | 2 | 8 | 7 | 1 |
18 | 9 | 5 | 4 | 9 | 8 | 1 |
19 | 12 | 9 | 3 | 12 | 13 | −1 |
20 | 3 | 5 | −2 | 3 | 4 | −1 |
Mean | 10.45 | 10.45 | 0 | 10.45 | 10.45 | 0 |
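
The arithmetic behind the Table can be checked with a short script. The following is a minimal sketch in Python using only the standard library; the function name `paired_summary` and the variable names are illustrative assumptions, not part of the original article. It computes the mean difference, the standard deviation of the differences, and the paired t statistic (mean difference divided by its standard error) for each scenario.

```python
import math
import statistics

# Values copied from the Table. Time 1 is identical in scenarios A and B.
time1 = [12, 5, 8, 5, 9, 12, 15, 27, 22, 19, 10, 12, 6, 5, 7, 3, 8, 9, 12, 3]
time2_a = [8, 12, 12, 14, 15, 16, 13, 15, 14, 15, 8, 5, 7, 12, 13, 5, 6, 5, 9, 5]
time2_b = [11, 6, 9, 4, 11, 12, 14, 26, 21, 18, 9, 11, 7, 6, 8, 4, 7, 8, 13, 4]


def paired_summary(first, second):
    """Summarize paired measurements (hypothetical helper for illustration)."""
    diffs = [x - y for x, y in zip(first, second)]  # time 1 minus time 2
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    # Paired t statistic: mean difference divided by its standard error.
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return {
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "t_stat": t_stat,
        "max_abs_diff": max(abs(d) for d in diffs),
    }


summary_a = paired_summary(time1, time2_a)
summary_b = paired_summary(time1, time2_b)

for label, s in (("A", summary_a), ("B", summary_b)):
    print(
        f"Scenario {label}: mean diff = {s['mean_diff']:.2f}, "
        f"SD of diffs = {s['sd_diff']:.2f}, t = {s['t_stat']:.2f}, "
        f"largest |diff| = {s['max_abs_diff']}"
    )
```

Because the mean difference is 0 in both scenarios, the paired t statistic is 0 in both, so the test cannot distinguish them. The individual differences tell a different story: they range up to 12 units in scenario A but never exceed 2 units in scenario B.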