This study examines one aspect of the validity evidence for the Connecticut State Department of Education's (CSDE) performance-based teacher assessment system, the Beginning Educator Support and Training (BEST) program. Specifically, we investigate whether external validity evidence, in the form of teachers' mean effects on their students' achievement, supports the use of BEST portfolio scores as a measure of teacher quality. Drawing on a complex administrative data set, we used the Degrees of Reading Power (DRP) test as the measure of reading achievement for elementary school students in two urban school districts in Connecticut. We applied both a traditional validity study analysis (correlation with an external variable) and a more nuanced approach (hierarchical linear modeling, HLM) to examine the evidence. In line with previous findings, the correlational approach showed no significant relationship between BEST portfolio scores and changes in student DRP scores. However, the HLM findings, which take the school context into account, indicate that BEST portfolio scores do distinguish among teachers who were more and less successful in enhancing their students' achievement. Specifically, a one-unit change in the portfolio score corresponded to a 2.20-unit change in fall-to-spring DRP scores, or about 40% of a year's average change for the students in this study (i.e., about 4 months of teaching time). In an additional analysis, we also studied the relationship between the portfolio scores and an alternate measure of teacher quality, the ETS Praxis series of tests: no relationship was found between BEST portfolio scores and Praxis scores, or between Praxis scores and mean student DRP scores. These results indicate that the portfolio and Praxis assessments are measuring different constructs for these teachers. 
That is, BEST portfolios add information that is not contained in the Praxis tests, and are more powerful predictors of teachers' contributions to student achievement gains.