The Principles and Practice of Generalizability Theory for Medical Educators

Medical educators use rater-mediated assessments to measure learner performance daily (e.g., end-of-rotation evaluations, simulation, mini-CEX). While Cronbach's alpha and inter-rater reliability coefficients are often used to estimate the reliability of assessment instruments, these indices cannot capture the influence of multiple factors on observed scores simultaneously. Generalizability theory (G-Theory) models these multiple sources of measurement error and quantifies the variation in scores attributable to each source (a so-called "facet," such as items or raters). G-Theory involves a two-step analysis: a Generalizability study (G-Study), which estimates the variance components, followed by a Decision study (D-Study). The D-Study allows us to manipulate the number of items, raters, and other facets and estimates the reliability indices expected under the desired assessment conditions. G-Theory is an underutilized but powerful tool for medical educators seeking to improve their assessment systems.
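
To make the two steps concrete, the sketch below runs a minimal G-Study on simulated scores from a fully crossed persons-by-raters design. It uses the general-purpose lme4 package rather than a dedicated G-Theory package, and all facet sizes and variance values are hypothetical illustrations, not taken from the workshop materials.

# Minimal G-Study sketch: persons crossed with raters (p x r design).
# Facet sizes and variance magnitudes below are hypothetical.
library(lme4)

set.seed(1)
n_p <- 50; n_r <- 4                                # 50 learners, each scored by 4 raters
d <- expand.grid(person = factor(1:n_p), rater = factor(1:n_r))
d$score <- 3 +                                     # grand mean
  rnorm(n_p, sd = 0.8)[d$person] +                 # true person effect (object of measurement)
  rnorm(n_r, sd = 0.4)[d$rater]  +                 # rater severity/leniency
  rnorm(nrow(d), sd = 0.6)                         # person-x-rater interaction + residual error

# G-Study: decompose observed-score variance into facet components
fit <- lmer(score ~ 1 + (1 | person) + (1 | rater), data = d)
vc  <- as.data.frame(VarCorr(fit))
v   <- setNames(vc$vcov, vc$grp)                   # components: "person", "rater", "Residual"
print(round(v, 3))

In G-Theory terms, the person component is the universe-score (true-score) variance, while the rater and residual components are the sources of measurement error that a D-Study then manipulates.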

This workshop will make G-Theory accessible to medical educators without a statistical background. We begin with a gentle introduction to G-Theory and D-Study using basic crossed and nested designs. We then demonstrate G-Theory and D-Study analyses on simulated data in R (or other free software if needed). Participants will have opportunities to run G-Theory and D-Study analyses themselves (code will be provided) and to see the impact of different conditions (e.g., the number of items or raters) on reliability indices while performing a D-Study (a laptop with a Wi-Fi connection is required). We will also explore more advanced G-Theory topics for complex situations such as mixed formats (e.g., a combination of multiple-choice items and rater-mediated items), subscores, and unbalanced designs with missing data.
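
As one illustration of the hands-on portion, a D-Study can be projected directly from the variance components estimated in a G-Study. The sketch below is again an assumption-laden illustration, not the workshop's distributed code: it reuses the hypothetical components v from the previous snippet and computes the relative (G) and absolute (Phi) coefficients as the number of raters varies.

# D-Study sketch: project reliability for alternative numbers of raters (n_r),
# using the variance components v estimated in the G-Study above.
dstudy <- function(v, n_r) {
  g   <- v["person"] / (v["person"] + v["Residual"] / n_r)                  # relative (G)
  phi <- v["person"] / (v["person"] + (v["rater"] + v["Residual"]) / n_r)   # absolute (Phi)
  c(raters = n_r, G = unname(g), Phi = unname(phi))
}
round(t(sapply(1:8, function(n) dstudy(v, n))), 3)  # reliability as raters increase

Running the last line shows both coefficients rising with each added rater, with diminishing returns; this is exactly the kind of "how many raters are enough?" question a D-Study is designed to answer.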