Dwyer, Carol Anne. (Ed.) (2005). Measurement and Research in the Accountability Era. Mahwah, NJ: Lawrence Erlbaum Assoc.

351 pp.
$99.95   ISBN: 0-8058-5330-8

Reviewed by Gerald Blankson
Arizona State University

November 27, 2006

Measurement and Research in the Accountability Era is a collection of chapters from eighteen education scholars addressing aspects of research in education in the United States. The authors, in order by chapter, are Ellen C. Lagemann, Robert L. Linn, Sharon Lewis, Lisa Towne, Andrew C. Porter, Caroline V. Gipps, Richard M. Ingersoll, Gloria Ladson-Billings, Michael J. Feuer, Paul W. Holland, Lily W. Fillmore, Julia Lara, Susanna Loeb and Felicia Estrada, Steven G. Rivkin, Cecilia E. Rouse, Michael W. Kirst, and Eva L. Baker. After speaking at an Educational Testing Service Invitational Conference in New York in 2003, these scholars were invited to contribute papers. Instructed to use non-technical language, the authors for the most part write in an accessible fashion. The editor considers this essential, pointing to the many lay people and policymakers who are ill-informed about the breadth of education issues and the work that has been done on them. In her introduction, Dwyer views the present accountability era as an opportunity for clear dissemination of research to address misconceptions and blind spots in the public eye.

We live in the Accountability Era: state and national policymakers aim to control and monitor education functions more closely on several fronts. Among them are funding of educational science research, setting instructional and curricular foci, testing for school and student achievement with disaggregation by subgroup, and heightened requirements for teacher certification and teacher quality. Several chapters focus on the impact of this movement on academia and educational research. Others deal with teacher demographics and choices. Conspicuously underrepresented are the experiences of students themselves in the accountability era. What choices are students being presented with that impact their actions in schools? Two chapters address issues of equity for poor students, minority students, and English Language Learners. This does not replace attention to the experiences of students and the idea that they have a range of choices and are not merely data points or another set of “manipulable variables.”

Many might expect a book about research and measurement from ETS to be filled with steady applause for the current rise in prominence of measurement and testing in education, eventually leading to a grand future. This is not the case. The volume tackles broadly the promise and the challenges in using data not only to improve instruction but to judge programs, states, districts, schools, teachers, and students. Several chapters go beyond the use of achievement data, which non-educators commonly associate with schooling. These chapters use other data sources to reveal issues of equity and trends in the teaching profession. This is not to say that socio-political critical views are represented thoroughly. The volume does not aim to examine the political or economic roots of the accountability push or its potential consequences. Taking accountability measures as a generally benign inevitability, the authors describe the ethical and statistical limitations of various data sources. The book is divided into seven sections, with scientific evidence and the science of education coming first. This review does not go through the chapters individually or sequentially. Instead, the chapters from the first two sections serve as a structure within which the views and issues of the later chapters are placed.

Lisa Towne’s fourth chapter delineates the roots of the policymakers’ and the academicians’ varying definitions of scientifically based research (SBR). No Child Left Behind (NCLB, 2002), the Office of Educational Research and Improvement’s (OERI) Education Sciences Reform Act of 2002 (ESRA), and its creation, the Institute of Education Sciences, are all involved in a shift in educational research. The call for this change is not new; it is akin to efforts in medicine and has been predicted for almost a century in education. Towne identifies a dichotomy. Research acceptable for funding is to be rigorous, systematic, objective, and empirical, with random sampling or random assignment. Yet schools raise ethical concerns about clinical trials, and professional judgment is discounted. Nor is it clear who is responsible for translation into practical applications. Towne suggests that this dichotomy might be bridged by dovetailing SBR with standards-based school reforms. Eva L. Baker addresses the issues involved in curriculum alignment in the last chapter. Ellen Condliffe Lagemann offers her vision of the patterns of educational research in the first chapter.

Lagemann’s may be the one chapter that makes any promises about becoming more science-based, and even these are tempered. She notes the predictions of a new dawn through educational science in the last century. She does not begin with a desire to make schools more accountable, though. “I believe we need new standards of accountability for education research community, new infrastructure for research, and new programs of research training” (p. 8). Lagemann believes education research should move beyond basic research, which is often solo practitioner work. Communities of practice running medical-style trials based in schools and in coordination with university centers should lead directly to usable classroom applications, whether through texts, curricula, toys, or the like.

Lagemann focuses on two critical shifts while noting that medical-style trials in educational research may not always be feasible. First, she sees more interdisciplinary and inter-specialty work done through university centers functioning like teaching hospitals. The Institute of Education Sciences, the What Works Clearinghouse, the Campbell Collaboration, and the National Research Council’s Committee on a Strategic Education Research Partnership are organizations working along these lines. Second, she looks toward “design experiments” in real educational settings. Lagemann highlights several examples of longitudinal multidisciplinary work in schools. One is the study by Raudenbush, Hong, and Rowan (2002) at the University of Michigan, which examined research on resources and student outcomes. They argued that teacher knowledge, approaches, and strategies must be disaggregated, as must the ways this knowledge is used. This essentially calls for the use of “instructional regimes” and full trial runs of differing methods and differing resources in the vein of medical research. Another design example, from Collins, Joseph, and Bielaczyc (2004), is noted. In it, case experiments are conducted in a school, changes are made, and the results are compared to another group. Over time the intervention is modified and improved. The researcher in Lagemann’s vision is a co-participant in an interdisciplinary team.

In Chapter Six, Caroline V. Gipps comes to a different conclusion in her treatment of accountability in England. England is further along on the national front because the process is older, beginning in earnest in the 1980s, and there is less pressure to maintain local control. While Lagemann looks at trends in educational research, Gipps charts parallel shifts for teachers and schools following increased testing and standardization of curriculum. Gipps has found teachers who are less able to be flexible, to be creative, or to use their own judgment. These teachers feel less professional and less valued. For perspective, Gipps draws on a similar period in English history. In the late 19th century, England went through a phase of increased assessment accompanying a standard curriculum. For some time it went even as far as to allocate payment rewards based on performance, or Payment by Results (Rapple, 1994). One cannot ignore the similar market drive being crafted in the United States through various privatization or “competition breeds excellence” schemes. Gipps quotes Edmond Holmes, Chief Inspector for Elementary Schools, from his reflections on the period: “What the Department did to the teacher, it compelled him to do to the child. The teacher who is the slave of another’s will, cannot carry out his instructions except by making his pupils slaves of his own will. The teacher who has been deprived by his superiors of freedom, initiative, and responsibility, cannot carry out his instructions except by depriving his pupils of the same vital qualities…” (p. 106). Gipps does not ignore U.S. history, namely school performance reporting in Massachusetts in the mid-19th century. This movement also failed, the common feature being that teachers’ main responsibility shifted from their students to accountability measures. While Curriculum 2000 and its predecessors have increased cross-national performance for England, Gipps makes a final note that Finland, where teacher autonomy and prestige are higher, regularly outperforms most other nations.

SBR and shifts in educational research affect schools indirectly, and administrators are staring down growing demands and expectations under Adequate Yearly Progress (AYP) monitoring. Robert Linn briefly points out how NCLB is unprecedented in its call for accountability (Linn, 2005). Linn then goes on to describe how misunderstanding the reliability and validity of an AYP measure can lead to mistaken inferences. Only a few differences are needed to show that AYP may not tell as much about a school as many believe. Linn argues that NCLB legislation, while laudable, may lead to confusion in inferences without consideration of reliability and validity. Schools with fewer students, or heterogeneous schools where students’ scores must be disaggregated, have less reliability in any measure. Year-to-year shifts in such measures may not indicate any real change in instruction or learning. Necessary gain, whether incremental or explosive, and proficiency and its measurement are defined differently by the states. There is a wide range in just how stringently proficiency is defined. Within-state comparison of schools is complicated by the fact that one may only know how a school stands against AYP in a given year. This is not as helpful as seeing where a school started and whether it makes progress over several years.

Cecilia Elena Rouse puts AYP into a larger framework in Chapter 15. AYP, according to Rouse, is the metric piece in an econometric measurement effort. Her discussion of “value added” and of the ideal school evaluation was particularly enlightening. Her argument further complicates the belief that one can make inferences about what schools are doing. Rouse also describes the flaws she sees in voucher or competition plans.

Chapters 12 and 13 address how teachers are being affected by AYP and accountability regimes. In her chapter on the impact for ELL, Julia Lara uses data from a survey of state program directors. Susanna Loeb and Felicia Estrada attempt to assess the impact on teacher career choices. The results are a first step in responding to the warnings raised by Gipps and to the question of whether NCLB and accountability proponents are leading the schools toward a repeat of history.

Lily Wong Fillmore’s chapter on ELL students caught in the crossfire is very different from the rest of the book. In it, Fillmore tackles what students are being asked to know. Fillmore details how efforts to hold schools accountable for ELLs’ proficiency may actually take away from their broader education. She describes several efforts that she considers short-sighted. The work of Project Challenge and the Boston University Chelsea Partnership is applauded. This collaboration appears to typify the kind of interdisciplinary “design experiment” that Lagemann described. The Chelsea Partnership aims to build academic English by focusing on language development in mathematics with several cohorts of students beginning in 4th grade. Fillmore’s treatment of the partnership is inspiring and perplexing. She asks why, with successful SBR, people aren’t breaking down the door to see the program and replicate it. This confuses me somewhat as well, but I do wonder about transferability. When you try to carry something fantastic across the country, or even a county, I fear there are good odds that something, or many things, will fall out from its operational core.

In the third chapter, Sharon Lewis first stresses the grand investment made in testing, with 143 to 400 million tests being administered yearly at an approximate cost of 20 billion dollars. She believes that, for urban schools especially, the investment and results make the disaggregation of data critical. Lewis details ways that data disaggregation, unpacking, and use should be done to target kids and programs in need of help. Lewis uses short case examples to illustrate her suggestions.

After spending these billions, policymakers want more than scores that tell parents about their kids’ school and let states make decisions on specific schools. At the national level there are two conflicts working within the United States. In Chapters 9 and 10, Michael J. Feuer and Paul W. Holland address the desire to link various state assessment scales. Feuer identifies the first conflict as arising from a push for both increased access and higher standards. The other conflict is between e pluribus, the multiplicity from freedom for variation represented by the states, and unum, the federal desire for a single vision and measurement. Feuer tracks the process through which Congress asked the National Research Council (NRC) to investigate whether it would be possible to link the various achievement assessments and make cross-state inferences with a single scale. The response from the committee was “No,” and Feuer includes his own warnings for policymakers and academics. Holland does not stop with that singular negative in terms of test linking. Holland describes his work with his colleagues in pursuit of a measure of linkability. He addresses the equatability of tests in particular, not prediction from one to another. The researchers used a hypothetical model with six different designs of tests for the same subject. Holland was not satisfied and believed that the policymakers’ request could advance the science of assessment.

Andrew Porter does not have to worry about equatability because he makes inferences based on the National Assessment of Educational Progress (NAEP) and the Longitudinal Youth Study. In the lone chapter in Section II, Porter sets out to describe the achievement gap, its size, duration, and shifts. Difficulties in understanding and assessing the gap are addressed first. Although most often defined as the gap between white and black achievement, it is at other times defined as the gap between Hispanic and white students or between students of higher and lower SES. The gap is different at different ages and in different skill areas. Even the boundaries of race will shift over time, especially when using self-identification. Charts illustrate the change in scores for 9-, 13-, and 17-year-olds over a 30-year period. The gaps in reading reportedly remain stable from 1st to 12th grade. Porter uses Longitudinal Youth Study data to show that the gap already exists among 3- and 4-year-olds. Initiatives set to reduce the gap may raise achievement but frequently widen the gap. Peabody Preschool is an initiative working to close the gap with maintenance. Peabody researchers have demonstrated significant increases in IQ and the Peabody picture test for low-income and minority students. Nevertheless, apparent losses of these gains have been found by 3rd grade in other reports. Porter also looks at teacher effects. Teachers with higher test scores seem to have students with higher performance. Efforts in Alabama to raise teacher quality through certification testing in the 1980s quickly reduced the gap.

There is no debate about the importance of teacher quality. The argument has been made that pervasive gaps, such as those described by Andrew Porter, are in fact a result of a disparity in educational opportunity over time. Richard M. Ingersoll and Gloria Ladson-Billings, in Chapters 7 and 8, respectively, tackle this critical issue. Ingersoll focuses on the prevalence of Out-of-Field Teaching (OFT), where teachers are teaching classes outside their college major or focus. Ladson-Billings goes further to detail how teacher quality differences inordinately impact minority students. To assess OFT, Ingersoll uses data from the Schools and Staffing Survey (SASS) of the National Center for Education Statistics. Tables show clearly the areas of high incidence of OFT, especially in middle schools, small schools, and high-poverty schools. This flows directly into the argument of Ladson-Billings, who also uses the SASS data. It is often schools with constrained resources that are forced to hire a teacher not trained in the area they teach, and quite frequently it is inexperienced teachers who are given these assignments. Ladson-Billings addresses teacher expectations, preparation, and recruitment, novice training, professional development, and school structure. Each of these issues is tied to how education is funded. The other authors tend to stay away from writing plainly about social issues. Ladson-Billings does not shy away from the legacy of underfunding, racism, and the erosion of the public sphere in American life.

There may not be debate on the importance of teacher quality, but actually assessing it is more difficult than one might assume. Steven G. Rivkin outlines this and other dilemmas as states work to improve teacher quality. Rivkin concludes that the weight of responsibility lies with local administrators in judging potential and actual effectiveness. This might be organized using a basic framework for listing the factors influencing teacher quality. Approaching the issues from an economic perspective, Rivkin concludes that it is at the local level that the teacher pool must be increased and better selection made of potential teachers. I am not sure how this addresses the problems of rural schools or those with high poverty.

References

Collins, A., Joseph, D., & Bielaczyc, K. (2004). Design Research: Theoretical and Methodological Issues. Journal of the Learning Sciences, 13(1), 15–42.

Linn, R. L. (2005). Conflicting demands of No Child Left Behind and state systems: Mixed messages about school performance. Education Policy Analysis Archives, 13(33). Retrieved November 21, 2006 from http://epaa.asu.edu/epaa/v13n33/.

Rapple, B. A. (1994). Payment by results: An example of assessment in elementary education from nineteenth century Britain. Education Policy Analysis Archives, 2(1). Retrieved November 21, 2006 from http://epaa.asu.edu/epaa/v2n1.html.

Raudenbush, S. W., Hong, G., & Rowan, B. (2002). Studying the causal effects of instruction with application to primary-school mathematics. Longitudinal Evaluation of School Change and Performance (LESCP): A secondary analysis. Retrieved November 21, 2006 from www-personal.umich.edu.

About the Reviewer

Gerald Blankson (BA, Oberlin; MC, ASU) is a second-year PhD student in the Educational Leadership and Policy Studies doctoral program at Arizona State University. He has varied experience in education working for Success for All, the Baltimore City Health Department, Mesa Arts Academy, the Arizona Learning and Literacy Center, and Arizona State University. His research interests are in comparative education, specifically program evaluation in international settings.

Copyright is retained by the first or sole author, who grants right of first publication to the Education Review.

Editors: Gene V Glass, Kate Corby, Gustavo Fischman
