SCALE provides training for the scoring of teacher candidates’ Teaching Events and for the moderation of scores. These processes ensure that scores are valid and reliable, and that they are comparable and credible across institutions.
Benchmark work samples
Candidate work samples representing different levels of performance on the rubric, used for training purposes to illustrate and clarify the difference between score levels. Programs identify potential benchmarks during the scoring process and solicit appropriate consent for assessments to be available as benchmarks in the following year.
Training of trainers
Experienced scorers at every institution will be prepared to train new scorers through a rigorous Training of Trainers program that is repeated annually for new trainers, with updates every three years for experienced trainers. Scorers must reach a calibration standard to be eligible to work as trainers. Trainers then assume a set of responsibilities that includes training, calibrating, and supervising scorers.
Scorer training and calibration
The training design for which we have demonstrated reliability is a two-day, subject-specific training. Participants learn to gather evidence from the work sample to provide reasoning for the assigned score on each of twelve subject-specific analytic rubrics, grouped by scoring dimensions. Benchmark performances are reviewed to clarify distinctions between scoring levels and anchor raters’ understandings of the score levels. Materials such as “Thinking Behind the Rubrics” have been designed to increase transparency in applying the common scoring rubrics.
Scorers should be knowledgeable about the content, content pedagogy, and grade level of the assessments that they score. Scorers are generally education professionals, such as university faculty, K-12 teachers, administrators, supervisors, mentors and support providers, as well as retired faculty, teachers, and National Board Certified Teachers.
Scorers must complete the training and calibrate in a credential area before scoring assessments in that area. If they have not scored for a year, they must go through training and calibrate again. To score in additional credential areas, qualified scorers need not go through more training, but they must calibrate in the new area.
To ensure consistency and accuracy in scoring, raters must score a calibration Teaching Event before scoring assessments. Calibration Teaching Events have previously been scored multiple times by experienced raters who come to a consensus on the appropriate scores, and the evidence supporting the scores is fully documented and explained. To calibrate, a scorer’s set of scores must result in the same pass/fail decision as the pre-determined scores and be sufficiently close to them, as set out in a Calibration Standard. Scorers who fail to calibrate must be retrained until they calibrate; until then, they may not score.
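The two-part calibration check described above can be illustrated with a minimal sketch. The actual Calibration Standard and passing rule are not specified here, so the thresholds (`max_rubric_gap`, `min_exact_matches`) and the `toy_passes` rule are hypothetical placeholders, as are all function and class names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class CalibrationStandard:
    """Hypothetical thresholds; the actual Calibration Standard is not given here."""
    max_rubric_gap: int      # largest allowed difference on any single rubric
    min_exact_matches: int   # fewest rubrics that must match the reference exactly


def is_calibrated(
    scorer_scores: Sequence[int],
    reference_scores: Sequence[int],
    passes: Callable[[Sequence[int]], bool],
    standard: CalibrationStandard,
) -> bool:
    """Return True if the scorer meets both conditions of the calibration check."""
    if not reference_scores or len(scorer_scores) != len(reference_scores):
        raise ValueError("scorer and reference must cover the same, non-empty rubric set")
    # Condition 1: same pass/fail decision as the consensus (pre-determined) scores.
    same_decision = passes(scorer_scores) == passes(reference_scores)
    # Condition 2: rubric-level scores sufficiently close to the consensus scores.
    gaps = [abs(s - r) for s, r in zip(scorer_scores, reference_scores)]
    close_enough = (
        max(gaps) <= standard.max_rubric_gap
        and gaps.count(0) >= standard.min_exact_matches
    )
    return same_decision and close_enough


def toy_passes(scores: Sequence[int]) -> bool:
    """Placeholder passing rule (mean of at least 2.0); not the real passing standard."""
    return sum(scores) / len(scores) >= 2.0
```

Under these assumed thresholds, with `CalibrationStandard(max_rubric_gap=1, min_exact_matches=6)` and a reference of twelve 2s, a scorer who assigns eleven 2s and one 3 would calibrate, while a scorer who assigns a 4 on any rubric would not.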
Moderation processes in assessment systems are methods of checking the accuracy of scores. The SCALE assessment system includes a moderation process comprising double scoring of a sample of Teaching Events, mandatory double scoring of all Teaching Events for failing candidates and candidates just above the passing standard (“borderline” scores), random read-behinds of scoring evidence by the trainers, and an audit of local scores. When there is a conflict between the scores of two raters (particularly for those Teaching Events initially scored as failing), a third reader (usually a lead trainer) scores the Teaching Event. Every year, an external audit of a small percentage of each institution’s scores is conducted to assess the reliability of local scores. This is achieved by recruiting scorers from each institution to score a small number of Teaching Events from another institution.
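As a rough illustration of the double-scoring rules above, the sketch below selects which Teaching Events receive a second score and flags when a third reader is needed. It rests on several assumptions not stated in the text: that each Teaching Event has a single total score, that `passing_score`, `borderline_margin`, and `sample_rate` are hypothetical parameters, and that a "conflict" means the two raters reach different pass/fail decisions.

```python
import random
from typing import Mapping, Set


def select_for_double_scoring(
    totals: Mapping[str, int],
    passing_score: int,
    borderline_margin: int,
    sample_rate: float,
    rng: random.Random,
) -> Set[str]:
    """Pick Teaching Event IDs to double score (all parameters are hypothetical)."""
    # Mandatory: failing totals and totals just above the passing standard.
    mandatory = {
        event_id
        for event_id, total in totals.items()
        if total < passing_score + borderline_margin
    }
    # Additional: a random sample of the remaining Teaching Events.
    remaining = sorted(set(totals) - mandatory)
    sample_size = round(sample_rate * len(remaining))
    sampled = set(rng.sample(remaining, sample_size))
    return mandatory | sampled


def needs_third_reader(first_total: int, second_total: int, passing_score: int) -> bool:
    """Assumed conflict rule: the two raters disagree on the pass/fail decision."""
    return (first_total >= passing_score) != (second_total >= passing_score)
```

Passing a seeded `random.Random` instance keeps the sample reproducible, which may help when the selection itself needs to be auditable.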