Interobserver Agreement For Continuous Variables

It is important to note that in each of the three situations in Table 1, the pass percentages are the same for both examiners, and if the two examiners were compared using the usual 2 × 2 test for paired data (McNemar's test), no difference between their performance would be found; the agreement between the observers, on the other hand, is very different in the three situations. The basic idea to grasp here is that "agreement" quantifies the concordance between the two examiners for each of the "pairs" of scores, not the similarity of the overall pass percentage between the examiners. Cohen's κ can also be used when the same rater evaluates the same patients at two time points (e.g., 2 weeks apart) or, in the example above, re-grades the same answer sheets after 2 weeks. Its limitations are: (i) it does not take into account the magnitude of differences, making it unsuitable for ordinal data; (ii) it cannot be used if there are more than two raters; and (iii) it does not distinguish between agreement on positive and on negative findings, which may be important in clinical situations (e.g., wrongly diagnosing a disease and wrongly excluding it may have different consequences). Statistical methods for assessing agreement vary depending on the type of variable being studied and the number of observers between whom agreement is sought. These are summarized in Table 2 and discussed below. Cohen's κ is calculated as κ = (observed agreement [Po] - expected agreement [Pe]) / (1 - expected agreement [Pe]). Two methods are available to assess agreement between measurements of a continuous variable across observers, instruments, time points, etc.
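The difference between equal pass rates and pair-by-pair agreement can be sketched in a few lines of Python. Table 1 is not reproduced in this text, so the counts below are hypothetical, chosen only so that each examiner passes 50 of 100 students in every situation; the function name `summarize` and the table layout are likewise illustrative assumptions.

```python
def summarize(table):
    """Summarize a 2x2 table of paired pass/fail grades.

    Assumed layout (hypothetical, for illustration):
    table = [[both_pass, A_pass_B_fail],
             [A_fail_B_pass, both_fail]]
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    pass_rate_a = (a + b) / n      # share of students passed by examiner A
    pass_rate_b = (a + c) / n      # share of students passed by examiner B
    agreement = (a + d) / n        # share of students given the same grade by both
    # McNemar's chi-square statistic depends only on the discordant cells b and c.
    mcnemar = (b - c) ** 2 / (b + c) if (b + c) else 0.0
    return pass_rate_a, pass_rate_b, agreement, mcnemar


# Hypothetical counts, NOT the values of Table 1.
situations = {
    "high agreement": [[40, 10], [10, 40]],
    "chance-level agreement": [[25, 25], [25, 25]],
    "low agreement": [[10, 40], [40, 10]],
}

for name, table in situations.items():
    pa, pb, agree, chi2 = summarize(table)
    print(f"{name}: pass A={pa:.0%}, pass B={pb:.0%}, "
          f"agreement={agree:.0%}, McNemar chi2={chi2:.1f}")
```

In all three hypothetical tables the pass rates are identical and the McNemar statistic is zero, yet the proportion of students given the same grade by both examiners ranges from 20% to 80%.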

Of the two methods for continuous data, the first, the intraclass correlation coefficient (ICC), provides a single measure of the extent of agreement, and the second, the Bland-Altman plot, additionally provides a quantitative estimate of how close the values from the two measurements are. Let us now consider a hypothetical situation in which the examiners do exactly that, i.e., assign grades by tossing a coin: heads = pass, tails = fail (Table 1, situation 2). In that case, one would expect 25% (= 0.50 × 0.50) of the students to receive a "pass" grade from both examiners and another 25% to receive a "fail" grade from both, giving an overall "expected" agreement rate on "pass" or "fail" of 50% (= 0.25 + 0.25 = 0.50). Hence, the observed agreement rate (80% in situation 1) must be interpreted keeping in mind that 50% agreement was expected purely by chance. The examiners could have improved on chance by 50% (best possible agreement minus the agreement expected by chance = 100% - 50% = 50%), but achieved only 30% (observed agreement minus the agreement expected by chance = 80% - 50% = 30%). Thus, their real performance in terms of agreement is 30%/50% = 60%. Krippendorff's alpha [16][17] is a versatile statistic that assesses the agreement achieved among observers who categorize, evaluate, or measure a given set of objects in terms of the values of a variable.
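The chance-correction arithmetic above (80% observed, 50% expected, 60% after correction) can be reproduced with a short Python sketch. The cell counts are an assumption: they are one hypothetical table consistent with the figures quoted in the text (80% observed agreement, each examiner passing half the students), not the actual contents of Table 1, and `cohen_kappa` is an illustrative helper rather than a library function.

```python
def cohen_kappa(table):
    """Return (p_o, p_e, kappa) for a square table of counts.

    table[i][j] = number of items placed in category i by rater A
    and in category j by rater B.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of items on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Marginal proportions for each rater.
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance if the raters graded independently.
    p_e = sum(row_marg[i] * col_marg[i] for i in range(k))
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


# Hypothetical counts for 100 students: rows = examiner A (pass, fail),
# columns = examiner B (pass, fail).
p_o, p_e, kappa = cohen_kappa([[40, 10], [10, 40]])
print(f"observed={p_o:.2f}, expected={p_e:.2f}, kappa={kappa:.2f}")
```

With these assumed counts the helper returns an observed agreement of 0.80, an expected agreement of 0.50, and κ = 0.60, matching the hand calculation.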

Krippendorff's alpha generalizes several specialized agreement coefficients: it accepts any number of observers, is applicable to nominal, ordinal, interval, and ratio levels of measurement, can handle missing data, and is corrected for small sample sizes. Kappa is similar to a correlation coefficient in that it cannot go above +1.0 or below -1.0. Because it is used as a measure of agreement, only positive values would be expected in most situations; negative values would indicate systematic disagreement. Kappa can attain very high values only when agreement is good and the rate of the target condition is near 50% (because it incorporates the base rate in the calculation of joint probabilities). Several authorities have offered "rules of thumb" for interpreting the level of agreement, many of which agree in substance even though the words are not identical. [8][9][10][11] On the surface, these data may appear amenable to analysis using methods for 2 × 2 tables (if the variable is categorical) or correlation (if it is numeric), which we have discussed previously in this series. [1,2] However, closer examination would show that this is not true.
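The dependence of kappa on the base rate, noted above, can be illustrated with a minimal Python sketch. Both tables are hypothetical and were constructed so that the raters agree on 90 of 100 cases; only the prevalence of the target condition differs. The helper `kappa_2x2` is an illustrative name, not a library function.

```python
def kappa_2x2(a, b, c, d):
    """Cohen's kappa for a 2x2 table of counts.

    a = both raters positive, d = both raters negative,
    b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    p_o = (a + d) / n
    # Chance agreement from each rater's marginal positive/negative rates.
    p_pos = ((a + b) / n) * ((a + c) / n)
    p_neg = ((c + d) / n) * ((b + d) / n)
    p_e = p_pos + p_neg
    return (p_o - p_e) / (1 - p_e)


# Hypothetical tables with the same 90% observed agreement.
balanced = kappa_2x2(45, 5, 5, 45)   # each rater calls 50% of cases positive
skewed = kappa_2x2(85, 5, 5, 5)      # each rater calls 90% of cases positive

print(f"base rate 50%: kappa = {balanced:.2f}")
print(f"base rate 90%: kappa = {skewed:.2f}")
```

Under these assumptions kappa falls from 0.80 to about 0.44 even though the raw agreement is unchanged, because the agreement expected by chance rises from 0.50 to 0.82 when one category dominates.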