In this brief extension worksheet, we look at why kappa is sometimes much lower than percentage agreement, and why the kappa2 command sometimes prints NaN for z and p.
To illustrate these things, here are some example ratings, and the output they produce:
# A tibble: 5 x 3
  subject rater1 rater2
    <int>  <int>  <int>
1       1      3      3
2       2      3      4
3       3      3      3
4       4      3      3
5       5      3      3
Percentage agreement (Tolerance=0)
Subjects = 5
Raters = 2
%-agree = 80
Cohen's Kappa for 2 Raters (Weights: unweighted)
Subjects = 5
Raters = 2
Kappa = 0
z = NaN
p-value = NaN
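If you want to recreate this output yourself, something along the following lines would do it. This is a minimal sketch: it assumes the ratings are typed in by hand, and that the agree and kappa2 commands come from the irr package; the object name ratings is just illustrative.

library(tibble)
library(irr)

# Illustrative ratings: rater1 says '3' every time; rater2 says '3' on 4 of 5 subjects
ratings <- tibble(
  subject = 1:5,
  rater1  = c(3L, 3L, 3L, 3L, 3L),
  rater2  = c(3L, 4L, 3L, 3L, 3L)
)
ratings

# Both commands expect one column per rater
agree(ratings[, c("rater1", "rater2")])
kappa2(ratings[, c("rater1", "rater2")])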
It might seem odd to have a kappa of zero here, because the percentage agreement is quite high (80%). Recall that Cohen’s kappa is calculated as:
(P - C) / (100 - C)
where P is the percentage agreement between the two raters, and C is the percentage agreement we’d expect by chance. So, for kappa to be zero, the percentage agreement by chance must also be 80%.
Agreement by chance is so high here because Rater 1 is using the same response all the time, and Rater 2 is using that same response 80% of the time. If one person always makes the same rating, and the other makes that rating on a random 80% of occasions, they’ll agree 80% of the time. For example, if I call everything I see a cat, and you call everything you see a cat unless you roll a five on your five-sided die, we’ll agree 80% of the time. This does not mean either of us knows what a cat is.
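To see the same thing in numbers, here is the calculation done by hand. This sketch just reproduces the reasoning above, using the percentages from the example ratings:

# How often each rater uses each response:
# Rater 1: '3' on 100% of subjects, '4' on 0% of subjects
# Rater 2: '3' on 80% of subjects, '4' on 20% of subjects

# Chance agreement: chance they both say '3', plus chance they both say '4'
C <- (100 * 80 + 0 * 20) / 100    # = 80

# Observed percentage agreement, from the output above
P <- 80

# Cohen's kappa, using the formula above
(P - C) / (100 - C)               # = (80 - 80) / 20 = 0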
NaN
In this case, NaN doesn’t mean grandmother; it means ‘not a number’. What’s happened here is that there is so little variation in the ratings (they are nearly all ‘3’) that R cannot calculate the z score or the p value: the calculations used to produce them break down in these extreme cases.
This material is distributed under a Creative Commons licence (CC-BY-SA 4.0).