Wednesday, July 1, 2015

Reliability of Scoring Checklists for a Rheumatology Objective Structured Clinical Examination


Lisa Criscione-Schreiber, MD, MEd
Duke University Medical Center

Background: This project was designed to assess the inter-rater reliability of scoring checklists for a rheumatology objective structured clinical examination (ROSCE). The ROSCE was conducted at Duke University to assess rheumatology fellows from North Carolina, South Carolina, and the Massachusetts General Hospital. Thirteen fellows gave consent to have their audio- and video-recorded stations scored by trained faculty raters after the ROSCE.

Aims: The purpose of this work was to assess whether our ROSCE scoring checklists and current level of rater training were sufficient to allow this ROSCE to be used for summative assessment purposes. A second aim was to learn how the ROSCE was perceived as an educational activity by the rheumatology fellows who participated.
Methods: We obtained written informed consent from rheumatology fellows to participate in this research project. Three patient counseling stations were recorded, one by audio and two by video, for later scoring; one letter-writing station was also scored after the ROSCE. The stations were also scored live by faculty proctors, who gave participants immediate feedback on the counseling interactions. Two trained faculty raters then scored the video- or audio-recorded sessions using standardized scoring checklists. We calculated a weighted kappa for every checklist item. For the patient counseling stations, we calculated an intraclass correlation (ICC) between rater pairs for the patient counseling and medical knowledge composite scores. We also surveyed all fellows about the educational value of the ROSCE the day after the examination and again about 4 months after completion. Survey results were analyzed using descriptive statistics and qualitative analysis.
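The abstract does not name the statistical software used; as an illustration of the item-level weighted kappa and rater-pair ICC analyses described above, a minimal Python sketch on invented data might look like this (all scores, scales, and names below are hypothetical, not study data):

    # Illustrative only: scores and scales below are hypothetical, not study data.
    import pandas as pd
    from sklearn.metrics import cohen_kappa_score
    import pingouin as pg

    # Hypothetical 0-2 checklist scores for one item, two raters, 13 fellows.
    rater_a = [2, 1, 2, 0, 1, 2, 2, 1, 0, 2, 1, 2, 1]
    rater_b = [2, 1, 1, 0, 1, 2, 2, 2, 0, 2, 1, 2, 1]

    # Item-level inter-rater agreement: linearly weighted kappa.
    kappa = cohen_kappa_score(rater_a, rater_b, weights="linear")

    # Composite-score reliability: intraclass correlation for one rater pair,
    # using the scores above as stand-ins for station composite scores.
    long_df = pd.DataFrame({
        "fellow": list(range(13)) * 2,
        "rater": ["A"] * 13 + ["B"] * 13,
        "score": rater_a + rater_b,
    })
    icc = pg.intraclass_corr(data=long_df, targets="fellow",
                             raters="rater", ratings="score")

    print(f"weighted kappa = {kappa:.2f}")
    print(icc[["Type", "ICC"]])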
Results: Due to the low number of participants and clustering of scores, weighted kappa values for several checklist items were either incalculable or negative. Overall, the weighted kappa values were quite low. Counting items with kappa > 0.3 for all three rater pairs, the lupus station had 0 of 26, the osteoporosis station 6 of 25, and the letter-writing station 5 of 13. Because kappa values were low, we also calculated the crude agreement, a descriptive statistic, for each item. Several items in both patient counseling stations had reasonable crude agreement between raters. The intraclass correlations for sub-scores were also generally low. Even for the letter-writing station, which had a detailed scoring rubric, kappa values were unacceptably low.
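The abstract does not state how crude agreement was computed; one plausible reading is the proportion of fellows for whom both raters gave identical scores on an item, as in this hypothetical sketch:

    # Hypothetical illustration of crude (exact) agreement for one checklist item.
    def crude_agreement(scores_a, scores_b):
        """Proportion of examinees for whom two raters gave the same item score."""
        return sum(a == b for a, b in zip(scores_a, scores_b)) / len(scores_a)

    # Made-up scores for 13 fellows; two items differ, so agreement is 11/13.
    print(crude_agreement([2, 1, 2, 0, 1, 2, 2, 1, 0, 2, 1, 2, 1],
                          [2, 1, 1, 0, 1, 2, 2, 2, 0, 2, 1, 2, 1]))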
At both time points, fellows generally agreed that the ROSCE helped them identify areas of strength and weakness in medical knowledge, patient care, and interpersonal communication skills. At both time points, over 80% of fellows agreed that the ROSCE was educationally effective and about 60% reported enjoying it at least somewhat. Nearly all fellows, however, reported that their performance on the ROSCE was not representative of their usual performance when counseling patients. The reasons fellows cited for this discrepancy were generally non-modifiable features of the OSCE format.
Conclusions: The scoring checklists and current rater training for this rheumatology OSCE do not provide adequate reliability for the ROSCE to be used as a summative assessment. Converting it into a summative assessment would require substantially more time and financial resources. As it currently stands, however, fellows found the ROSCE to be an effective educational activity. Findings from the fellow survey will guide changes to improve the educational value of the ROSCE in future administrations.