What you should know about
North Carolina's ABC Tests (1)

George H. Olson, Ph.D.
Leadership and Educational Studies

Appalachian State University
March 2002

Statewide testing is not new to North Carolina. In fact, North Carolina has tested on a regular statewide basis since 1978. The California Achievement Test was administered regularly in grades 1, 2, 3, 6 and 9 through 1994-95. A state-developed testing program, using end-of-course (EOC) tests, has been in existence since 1986 for selected high school courses and since 1992-93 for grades 3-8, where end-of-grade (EOG) tests have been regularly administered at the end of the school year. These tests, developed by North Carolina teachers with assistance from the North Carolina Department of Public Instruction (NCDPI) and others, were aimed at measuring the curriculum, as defined by the North Carolina Standard Course of Studies (SCS) that is supposed to be taught in every classroom in the state (NCDPI, 2001b).

Brief history of the ABC Accountability Program. In 1995, the General Assembly of North Carolina passed a law (Senate Bill 16) that directed the State Board of Education to examine the administrative organization of the Department of Public Instruction and propose a plan for reducing and/or reorganizing the department. One of the key recommendations of the plan was to develop a new accountability plan focusing on the achievement performance of public schools, with a system of clear rewards and consequences (NCDPI, 2001c). This recommendation quickly materialized as the North Carolina ABCs Accountability Program, implemented initially at grades K-8 in 1996-97 and at the high school level in 1997-98 (NCDPI, 2001b).

In the ABC acronym, "A" stands for Accountability, "B" stands for an emphasis on the Basics, and "C" stands for increased local Control. The program reportedly offers a comprehensive accountability system that has been recognized as one of two complete accountability models in the nation (Education Week on the Web, 1999). It emphasizes higher standards and achievement in the basic subjects of reading, writing, and math; expects that each child should show one year's academic growth per year of schooling; and requires a written school improvement plan for each school and a school improvement team that should include leadership, parents and teachers. It measures achievement at the school (rather than district) level and requires the results to be released to the public. It recognizes schools that achieve or exceed their "expected" growth with substantial bonuses for school personnel and outlines consequences for schools that exhibit low performance or fail to meet expectations.

Still referred to as EOGs, the tests administered in grades K-8 are reported to be curriculum-based, multiple-choice standardized achievement tests that measure the achievement of curricular competencies described in the SCS (NCDPI, 2001b). There are two such tests: a Reading Comprehension test and a Mathematics test. Additionally, students at Grades 4 and 7 are administered the North Carolina Writing Assessment, which reportedly measures written expression (composing) skills. The tests administered at the high school level, also multiple-choice standardized achievement tests and still called EOCs, reportedly are aimed at assessing higher-level thinking and problem-solving skills in Algebra I; Biology; Chemistry; Economic, Legal, and Political Systems (ELPS); English I; English II; Physics; Physical Science; and United States History (NCDPI, 2001c; Sanford, 1996).

A key component of the ABCs accountability program is that rather than comparing different students from one year to the next, the program claims to hold schools accountable for the educational growth of the same groups of students (cohorts) over time. At least a year's worth of growth for a year's worth of school is expected. For elementary schools, the growth of students is determined by scores on the EOG tests in Reading Comprehension and Mathematics. The scores from these tests are reported on developmental scales, which yield measures of achievement growth in these subject areas across time and grade levels.

Actually, the NC system does not measure achievement growth at the individual level, nor does it measure growth from one year to the next. The best it can do is measure the growth of groups of students, like all 4th graders in a given school. The average yearly statewide growth of students from grades 3 through 8, from one grade to the next, was determined initially by subtracting the 1992-93 scores at one grade level from the 1993-94 scores of the same students at the next higher grade level. In order to determine growth for grade 3, a pretest was administered during the fall of the 1996-97 school year. For grades 8 to 10, the average growth was determined during the 1997-98 school year. These average rate-of-growth values are constants in the growth formula of the ABCs Accountability Model and will remain so until new values based on different school years are approved by the State Board of Education (NCDPI, 1996c, 2000b).
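Expressed symbolically (a sketch of the computation as just described, not NCDPI's official notation), the statewide average growth constant for a grade-to-grade transition is simply the difference between the same cohort's mean developmental scale scores in successive years:

$$\bar{g}_{k \rightarrow k+1} \;=\; \bar{X}_{k+1,\,y+1} \;-\; \bar{X}_{k,\,y}$$

where $\bar{X}_{k,y}$ is the statewide mean developmental scale score of the cohort at grade $k$ in school year $y$; for grades 3 through 8, $y$ was 1992-93 and $y+1$ was 1993-94.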

In developing this model, the state recognized that not all students exhibit the same rate of achievement growth from one year to the next. In their own words, "different rates of growth are expected for two different reasons:

(1) Students who are more proficient might grow faster. That is how they got to be more proficient in the first place--they already grew faster.

(2) Students who score high on a particular test one year may not score as high the next year, and students who score low one year may score higher the next year, partly due to 'regression to the mean.'"

Estimates of these two factors, along with the North Carolina average rate of growth, are then used in a rather unnecessarily complicated regression model to determine schools' expected yearly rates of growth. These expected yearly rates of growth are in turn used to classify schools into various awards and recognition categories (NCDPI, 2001e), with consequences ranging from monetary rewards for personnel in high-performing schools that meet or exceed expected growth to sanctions for schools that repeatedly fail to meet expected growth.
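In outline (an illustrative sketch only; the coefficients and the exact operational terms are those published by NCDPI and are not reproduced here), a school's expected growth in a subject takes the form of a regression equation that combines the statewide average growth constant with the two adjustments just described:

$$\text{Expected growth} \;\approx\; b_1(\text{statewide average growth}) \;+\; b_2(\text{prior-proficiency adjustment}) \;+\; b_3(\text{regression-to-the-mean adjustment})$$

A school's observed mean gain is then compared with this expected value, and that comparison determines the awards and recognition category into which the school falls.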

Originally, the ABC Testing Program was developed to serve two purposes (NCDPI, 1996a):

(1) to provide accurate measurement of individual student skills and knowledge specified in the SCS, and

(2) to provide accurate measurement of the knowledge and skill attained by groups of students for school, school system, and state accountability.

Except for a condition that EOC scores count as some unspecified part of a student's overall grade in the course, these tests were not intended to be individually high stakes. This changed in 1999 when the NC State Board of Education introduced new Gateway Standards, which required that students in grades 3, 5, and 8 attain passing scores on the ABC tests in order to be promoted to the next grade level and that high school students pass an exit exam and a computer skills test in order to graduate (NCDPI, 2002). At this point, the ABC testing program became a truly high-stakes testing program.

The ABCs as a high-stakes test. According to the Standards for educational and psychological testing (AERA/APA/NCME, 1999), high-stakes testing occurs whenever "significant educational paths or choices of an individual are directly affected by test performance, such as whether a student is promoted or retained at grade level, graduated, or admitted or placed into a desired program" (p. 139). These standards, along with the earlier Code of fair testing practices (NCME, 1988), have been cited by a large group of professional associations (AERA, 2000; AFT, 2001; APA, 2001; IRA, 1999; NCTE, 1999; NCTM, 2000) in resolutions against the use of test scores as a sole criterion in making important decisions concerning individual students. The following are but a few of the bold statements from these organizations.

High stakes decisions should not be made on the basis of a single test score, because a single test can only provide a "snapshot" of student achievement and may not accurately reflect an entire year's worth of student progress and achievement (APA, 2001).

To use a single objective test in the determination of such things as graduation, credit, grade placement, promotion...or placement in special groups is a serious misuse of such tests. This misuse of tests is unacceptable.... When test use is inappropriate, especially in making high-stakes decisions about a child's future, it undermines the quality of education and equality of opportunity (NCTE, 2001).

Decisions that affect individual students' life chances or educational opportunities should not be made on the basis of test scores alone. Other relevant information should be taken into account to enhance the overall validity of such decisions (AERA, 2000).

These resolutions have not gone unheeded in North Carolina. In March 2001 the Chairman of the NC State Board of Education issued a memorandum to superintendents that, in part, stated:

However, the failure to score Level III or above...does not preclude the LEA from promoting a student. After the second or third administration of a standard EOG test, a teacher or parent may request that the student be promoted even if the student has not scored Level III or above. Documentation submitted to [a] review committee may include student work samples, other test data, information supplied by parents, or any other information that verifies that a student is at grade level (Ward, 2001).

So, in North Carolina, as in other states, promotion and graduation decisions are no longer to be made solely on the basis of test scores. But does this mean that the NC testing program should not be considered a high-stakes testing program? To answer this question it is important to consider the various consequences, for teachers, schools, and school districts-in addition to students-that derive from testing and accountability programs like the ABCs. Testing programs like these can have serious indirect effects on schools, their teachers, and ultimately their students. As Brennan, et al. (2001) point out, "even in situations where there are no immediate high-stakes consequences attached to test performance (e.g., promotion is not conditional on successful performance), these tests can nevertheless affect students in various indirect ways that may still be considered 'high stakes.'" This might occur, for instance, when schools or districts, but not individual students, are sanctioned for poor performance on tests, when schools that fail to meet minimal criteria on the tests are denied attractive funding opportunities, or when the quality of the educational program is otherwise affected by test results.

National criticism of high-stakes tests. High-stakes testing programs have been receiving increasing attention over the last several years, both from the professional educational community and from the press. Most of this attention has been negative. In the popular press, education writers like Carolyn Kleiner (2000), Mary Lord (2000) and Peter Schrag (2000) have been particularly critical of the high-stakes testing programs now in place in over half the states. The criticism is broadly targeted: too much time spent preparing students for testing; less emphasis on subjects not being tested; too much pressure on children, resulting in health-threatening levels of anxiety; bias against poor and minority youth; and so on. The professional literature, while more moderate in its hyperbole, has tended to be no less critical. For instance, Koretz, et al. (1991) found that teachers tended to narrow the curriculum to focus only on materials covered by the test. And a recent study by Klein et al. (2000) showed that large gains on the Texas Assessment of Academic Skills (TAAS) were not matched by comparable gains on the National Assessment of Educational Progress (NAEP), suggesting that "schools are devoting a great deal of class time to highly specific TAAS preparation."

The Texas program has come under intense scrutiny, particularly because of recent large increases in the percentage of students who pass the TAAS. The gains have been so striking that they have been referred to as "The Texas Miracle" (Schrag, 2000b; Haney, 2000). Among its harshest critics are Linda McNeil and Angela Valenzuela, who studied Texas classrooms throughout the '90s. Among the findings that led them to conclude that the TAAS was harmful to both educational quality and opportunities for economically disadvantaged and minority youth, McNeil and Valenzuela (1999) cited the loss of valuable instruction time to drill and practice on TAAS-preparation materials, narrow teaching to tested content, and neglect of subjects not tested. They found, for instance, that Texas' over-reliance on test scores caused a decline in educational quality for those having the greatest educational need-poor and minority youth. Pressure to raise TAAS scores has led teachers to spend less class time (often much less time) on teaching important topics and more time (often much more time) practicing test taking. Again, this was especially pronounced in poor and minority schools.

Local criticism of the ABCs. While the national attention given to North Carolina's accountability system has not been as intense as that given to the Texas system (or to the controversial accountability systems in other states-Arizona, Maryland, Massachusetts, Minnesota, and Virginia, for instance), it has nevertheless come under the critical eye of both the media and professional constituents of North Carolina. In a report commissioned by the John Locke Foundation's Alliance for Smart Schools, Haynes (1999) noted that while NC's scores on its own tests have increased, often dramatically, similar gains have not been shown on independent national tests such as the National Assessment of Educational Progress (NAEP) and the Iowa Tests of Basic Skills (ITBS) (although, to be fair, moderate progress has been observed in the past couple of years). Haynes also pointed out that the North Carolina system is less precise and less sophisticated than the Tennessee Value Added System (Sanders & Horn, 1998), which actually does track individual students over time.

Another report, this one published by the Common Sense Foundation, examined the "troubling" consequences of the ABC testing program (McMillan, 1999). Both intended and unintended consequences were examined, although most of the attention was given to unintended consequences. On the positive side, McMillan found that the program appeared easy to understand, that both administrators and teachers felt that the program has led to an increased focus on the SCS, and that the program appears to have restored both the public's and the business community's faith in North Carolina's schools. On the negative side, however, the tests were criticized for contributing to more stress and lower morale among teachers; for pushing (or leading) teachers to leave low-performing schools (working in such schools is seen as an embarrassment); and for equating a teacher's worth with his or her students' scores on tests.

McMillan reported evidence to support the conclusions that non-tested subjects are being neglected; that teachers teach to the test through excessive test preparation and concentration on content typically included in the tests; that critical thinking is less emphasized, especially because the tests are multiple-choice tests; and that the tests provide little diagnostic information for students, primarily because test results are returned to schools too late to be useful (in one survey of teachers, 57% did not find the tests useful for diagnosing student progress and problems).

Among negative, unintended consequences for students, McMillan (1999) noted too much pressure to perform well (leading to too much anxiety and stress); reduced challenge for high-performing students due to the heavy emphasis on items that assess memorization; the tendency, in many schools, to remove low-performing students from tested courses and place them in remedial courses so that they are excluded from testing; and the fact that many low-performing (hence, excluded) students are minority or disadvantaged students. Similar findings have been reported by a team of researchers from the University of North Carolina headed by Gail Jones (Jones, et al., 1999) and have been suggested by McColskey and McMunn (2000), another team of North Carolina researchers. Additionally, Lyle Jones, a UNC professor, noted that:

Unanticipated side effects of high-stakes testing programs are gradually being recognized. In North Carolina, teacher bonuses are awarded by the state to all teachers in schools for which test scores rise substantially above "an expected amount." That now is influencing both the retention and recruitment of teachers. Teachers are leaving schools that fail to meet test-score standards, and promising new teachers ask about the bonus and are attracted to schools that have earned the bonus. These prospective teachers then are less likely to accept positions at schools in which they may be even more needed (pp. 25-26).

Jones went on to cite as typical a case of one middle school in North Carolina where "students scoring below grade level are not permitted to enroll in elective courses but are assigned to remedial math and reading classes instead" (p. 26).

Other professionals, albeit few in number, have not been so harshly critical. Yet another researcher from UNC, Greg Cizek (2001), has raised several points in defense of high-stakes testing. Among the beneficial consequences he lists improvements in professional development (greater focus on curriculum-relevant topics), increased knowledge about testing (because of high-stakes tests, educators have learned the importance of informing themselves about test content, construction, and consequences), and greater, and more sophisticated, use of information about students. After examining many of the published critiques, Mehrens (1998) concluded that the evidence to support many of the claims of deleterious effects of high-stakes testing on curriculum and instruction is currently insufficient, although he did admit that, while empirical evidence is sketchy, it does seem reasonable to suggest that increasing the stakes for teachers will lead to more burnout, lowered morale, and an increase in the probability of unethical behavior. He argued that teaching too closely to the test corrupts the inferences drawn from test scores, in which case one can no longer make valid inferences about the tested domain. The result is the "Lake Wobegon" effect-all scores are above average. Typically, however, legislators and the public infer improvements in educational quality from improved test scores-this inference may be incorrect.

The question of validity. The problem of test validity is at the heart of concerns about unintended consequences of high-stakes tests. Ever since Messick's (1989) important chapter in the third edition of Educational Measurement, measurement experts have recognized that test validity, rather than being an attribute of a test itself, concerns the inferences one draws from examinees' performances on tests. Inferences that are valid when drawn from performances on tests used for their intended purpose may not be valid when the same tests are used for other purposes. For instance, the ABC tests may yield valid inferences when the tests are used for their originally-intended purpose-monitoring school-level achievement-but may yield less valid inferences when used to make individual decisions regarding student promotion, retention, and graduation. This was stated clearly in the introduction of the National Research Council's report on high-stakes testing:

The important thing about a test is not its validity in general, but its validity when used for a specific purpose. Thus, tests that are valid for influencing classroom practice, "leading" the curriculum, or holding schools accountable are not appropriate for making high-stakes decisions about individual student mastery (Heubert & Hauser, 2000a, p. 3; 2000b).

Mehrens and Kaminski (1989) examined several test preparation practices that they considered inappropriate when applied to most high-stakes tests. They stated flatly that practice or instruction on the same test is clearly unethical. Most would agree. But they also labeled as unethical practice (instruction) on published parallel forms of the same test, a practice similar to using recently released tests for test preparation or employing any of a number of commercially-available coaching tests (e.g., Educational Design, Inc., 2000) for the same purpose. As practices of questionable validity, they cited instruction on objectives generated by looking at the objectives targeted by the tests (narrowing of the curriculum), and instruction that specifically matches the tested skills where the instruction follows the same item format as the test questions. To put it more plainly, if test scores reflect efforts other than those genuinely aimed at improvement in attaining the objectives throughout the instructional domain specified in the SCS, then inferences drawn from those test scores about attainment of the objectives have questionable validity. When legislators point to the improvements that North Carolina students have made in their performance on the EOCs and EOGs, and use these improvements, in turn, to support claims of improvement in North Carolina's educational system, it is important to establish that the test scores reflect genuine improvement in educational achievement and that they are not artifacts of educationally-indefensible practices that lead simply to higher test scores. As Mehrens and Kaminski (1989) pointed out, "causal inferences about student achievement cannot possibly be accurate unless the direct inference about student achievement is made to the correct domain" (p. 15). In this case the correct domain for North Carolina students is the NC SCS.

Purpose of the study. The purpose of the study reported here was to determine whether any of the practices that can adversely affect the inferences often drawn from high-stakes testing are being employed in North Carolina schools. Previous studies (e.g., Jones, et al., 1999; McColskey & McMunn, 2000) have certainly suggested the existence of these practices, but those studies involved a much more limited sample than that used here. Of course, no survey of teachers' perceptions and opinions is ever conclusive. About the best that can be accomplished in a study like this one is to gain a perspective. If teachers believe certain practices are occurring then we have to assume that the likelihood that they are actually occurring is higher than if the teachers did not hold the belief.

Method

Survey. The survey was composed of 36 items asking for teachers' perceptions regarding several aspects of the ABCs, including consequences, test preparation, and the impact of the ABCs on instruction. Motivation for the items included in the survey came from several recently published sources, both directly related to the North Carolina testing program (Haynes, 1999; Jones, et al., 1999; McColskey & McMunn, 2000; McMillan, 1999) and to statewide, high-stakes testing in general (Barksdale-Ladd & Thomas, 2000; Brennan, et al., 2001; Heubert & Hauser, 1999a, 1999b; Jones, 2001; Natriello & Pallas, 1999; Smith, 2000). The survey was constructed as a three-page (front and back) foldout that could easily be mailed back. First-class postage was used for both outgoing and return mail.

A special feature of the survey was that respondents, if they chose to do so, could respond via the Internet. Each survey included a unique password that a respondent could use one time. Respondents wishing to respond via the Internet were directed to a particular URL where they were asked to enter their unique password and complete the survey. It was possible for respondents to return a completed hard copy of the survey and also respond via the Internet. Whenever this occurred, which was seldom, the hard-copy survey response was used in lieu of the Internet response. In the end, the Internet application proved unsuccessful. Only 78 teachers responded via this medium. For the results reported below, Internet responses are combined with hard copy responses and no distinctions are made between the two sets of data.

Sample. The initial sample consisted of 2,500 teachers randomly selected from a list of active members of the North Carolina Association of Educators (NCAE). Arrangements were made with the NCAE research office to furnish an Excel® file containing the names and addresses of the randomly selected teachers. From this list, surveys were printed and addressed automatically using an independent contractor. The surveys were mailed out the first week of March 2000. Follow-up postcards, reminding teachers to complete (and return) the survey, if they had not already done so, were sent out the first week in April 2000. The survey phase of the study was terminated the first week in June 2000, by which time the only surveys being returned were those that were undeliverable. In accordance with my agreement with NCAE, all possible links to teachers' names after the closing date of the survey were destroyed.

RESULTS

Respondents. Of the 2,500 teachers in the original sample, a little less than half, 1066 (42.64%), responded to the survey, either by returning a hard copy (899, or 84.3% of the respondents) or by responding via the Internet (118, or 15.7% of the respondents). Of these, the surveys of 47 respondents had to be discarded because necessary classification data (e.g., grade-level assignment, subject-area assignment) were incomplete. The results presented below are based on the 1019 completed surveys from which teachers could be classified as elementary school teachers (Grades K-8, a typical grade-level configuration in North Carolina) or high school teachers (Grades 9-12). The two groups, elementary school teachers and high school teachers, were determined by the teachers' reported grade-level assignments, as shown in Table 1.

Overall, three times as many female teachers responded as did male teachers. Among elementary school teachers, 88.5 percent were female; among high school teachers, 72.7 percent were female.

The age distributions were fairly similar in the two groups. Among elementary school teachers the modal age category was 46-50 with 25 percent placing themselves in older categories. Among high school teachers the modal age category was 41-50, with 48.2 percent classifying themselves into higher age categories.

The two groups of respondents also showed similar years of experience. In both groups of teachers the median years of experience reported was 11-15 years. Both distributions were negatively skewed, elementary teachers more so than high school teachers (the modal experience for elementary school teachers was 21-25 years; for high school teachers, 11-15 years).

Minority-group preparation. Teachers were asked to indicate what percent of their school's students were non-white. The modal category reported by elementary school teachers was 25-50 percent, although the other categories (Table 2) were fairly evenly represented. The modal category reported by high school teachers was 10-25 percent, with most teachers indicating their school's non-white population to be between 10 and 75 percent (Table 2). With respect to their school's minority students, teachers were asked whether, in their opinion, racial and ethnic minority students received equal preparation to perform well on the ABCs. About two-thirds (67.7%, elementary; 62.2%, high school) responded affirmatively. Among just those elementary school teachers who teach tested subjects at grade levels where testing occurs, the percent responding affirmatively was slightly higher (71.7%).

Purpose of the ABCs. The psychometric literature draws a distinction between achievement tests on the one hand and aptitude tests on the other. Generally, achievement tests are designed to assess the acquisition of relatively recently acquired knowledge-typically knowledge acquired within the past year. Aptitude tests, on the other hand, which are designed to predict future performance, typically assess knowledge acquired over a much wider span of time. The two types of tests require somewhat different kinds of evidence to establish their validity. For achievement tests, evidence for validity usually rests on showing that the items in the test are clearly linked to the targeted learning objectives-the so-called learning domain, which includes not only the content covered by the learning targets but also the different levels of cognitive outcomes inherent in the description of the learning targets. For aptitude tests, where the emphasis is on predicting future performance, arguments for validity usually entail presenting evidence that the tests do, in fact, reliably predict.

The evidence for the validity of the ABC tests as achievement tests has been fairly solidly established (NCDPI, 1996a, 1996b). Their validity when used as aptitude tests has not been established. Yet it is precisely this use, as an aptitude test, that underlies the use of the tests in making promotion/retention decisions.

Teachers were asked which of the following two statements, in their opinion, best illustrates the inference that can be drawn from a passing score on an ABC test:

Statement A: The student has satisfactorily mastered the corresponding instructional objectives in the Standard Course of Studies.

Statement B: The student is adequately prepared for the next level of instruction (next year or next course in sequence).

Few of the teachers returning surveys agreed solely with Statement B. Instead, 28.6 percent of the elementary school teachers and 27.4 percent of the high school teachers said Statement A best illustrated the inference that could be drawn from a passing score. Among elementary school teachers, 30.1 percent thought both statements illustrated an inference that could be made from a passing score. Among high school teachers the percent was 22.2. Interestingly, 26.2 percent of the elementary school teachers and a whopping 40.9 percent of the high school teachers thought neither statement represented an accurate inference. The distribution of responses to this item among elementary school teachers who teach tested subjects in Grades 3-8 was virtually identical to that of elementary school teachers in general.

Consequences of ABC testing. Teachers were asked whether or not they thought each of several different consequences of ABC testing had occurred, or may be occurring, in North Carolina schools. The results are summarized in Table 3. On the positive side, about two-thirds of the teachers reported that more attention has been given to the state curriculum. Furthermore, 56 percent of the elementary school teachers and 60.9 percent of the high school teachers indicated that there were higher expectations for student performance as the result of ABC testing. Also, low percentages of teachers indicated that students were being discouraged from taking subjects in which ABC testing occurs. On the other hand, many negative consequences were reported by large percentages of teachers. Nearly 90 percent of the elementary school teachers and two-thirds of the high school teachers indicated that subjects not tested were being neglected. Furthermore, large percentages of teachers felt that instructional time was being overused for practicing testing, and nearly all of them felt teachers experienced more pressure for their students to do well in areas tested by the ABCs. Only about half the teachers thought there was greater standardization of instruction across schools, and few felt that low-performing students were receiving greater support.

The responses on this portion of the survey, for that subset of elementary teachers who indicated that they taught tested subjects at grade levels where testing occurred (Grades 3-8), were very similar to those of all elementary school teachers. In general, the percent responding affirmatively to what could be called positive consequences tended to be a point or two higher (e.g., 57.2%, as opposed to 56.0%, responding that higher expectations for student performance had occurred; 63.1%, as opposed to 62.0%, responding that more attention had been given to the state curriculum). Furthermore, the percent responding affirmatively to what could be called negative consequences also tended to be one to three points higher (e.g., 97.8%, as opposed to 95.3%, responded that there was more pressure on teachers for their students to do well in areas tested by the ABCs; 83.6%, as opposed to 80.2%, indicated instructional time was being overused for practicing testing).

Test preparation time. Teachers were asked to indicate about how much time they, personally, and the other teachers in their school, generally, spend preparing students for the ABC tests after the beginning of second semester (when ABC testing occurs). The teachers' responses are summarized in Tables 4 and 5. Elementary school teachers reported spending more time preparing their students for testing (40.5% spent an hour/day or more) than did high school teachers (28.4%, an hour/day or more). At both levels, teachers generally thought the other teachers in their schools spent at least as much time as they did preparing students for testing.

By their own count, over half the teachers in elementary schools spent an hour/day or more, and 35.1 percent of the high school teachers spent an hour/day or more, preparing students for testing during the second semester (presumably up until the time of testing).

Allocations of instructional time. When asked what determined the amount of time they allocated to various subject areas, the teachers responded as shown in Table 6. Most teachers, at both levels, indicated that their allocation of instructional time was ruled mostly by the Standard Course of Studies or by district administration. However, an appreciable percentage of teachers (13.3%, elementary; 12.2%, high school) reported that their time was governed mostly by the ABC tests. Small percentages of teachers indicated that their use of instructional time was determined by their school's administration or by their own discretion.

With respect to their instruction in areas tested by the ABCs, a majority of the teachers (79%, elementary; 60.4%, high school) reported that the amount of instructional time allocated to these areas has increased over the last few years, with large percentages (45.8%, elementary; 60.4%, high school) indicating that the amount of time has increased appreciably. Furthermore, as ABC testing draws near, over three-quarters of the elementary school teachers, and over half the high school teachers, indicated that the amount of time they spend on ABC-tested subjects increases. Moreover, most teachers (72.9%, elementary; 65.7%, high school) reported that the time they have their students spend practicing tests like the ABCs increases as testing draws near.

Teachers also were asked whether their instructional practices (the way they teach) had changed over the past few years and, if they had, whether the change could be attributed to the ABC testing program. Most teachers (84.4%, elementary; 72.6%, high school) reported that their instructional practices had changed over the past few years and, of these, nearly all (91.8%, elementary; 86%, high school) said that the change could be attributed to the influence of the ABCs.

Teachers' opinions about the effects of the ABCs. The survey included a series of questions asking for teachers' opinions about the effects of ABC testing and their (the teachers') attributions for those effects. The results for this series of questions are given in Tables 7 through 10.

In the first pair of questions teachers were asked whether they thought their students would attain higher levels of achievement than in previous years and then whether they thought any change in the levels of achievement their students attained could be attributed to the ABC program. As can be seen in Table 7, about half the teachers (54.1%, elementary; 46.7%, high school) expected their students to attain higher levels of achievement. Among elementary school teachers of tested subjects, the percent agreeing that their students would reach higher levels of achievement was comparable (49%) to that for elementary school teachers in general. With respect to attribution for change, if any, the teachers agreeing that their students would reach higher levels of achievement than in previous years were split, with about half (54.2%) agreeing that the change could be attributed to the ABCs. On the other hand, over all the teachers, not just those believing their students would attain higher levels of achievement than in previous years, only about a third (36.9%, elementary; 31%, high school) thought that any change could be attributed to the ABC program. Among only those elementary teachers who taught tested subjects at tested grade levels, only 35.7 percent felt that any change in levels of achievement could be attributed to the ABCs.


But what about their school's students in general? The second pair of questions in this series asked teachers whether they thought the students in their school would attain higher levels of achievement by the end of the current year than in previous years and then whether they thought any change in the levels of achievement their school's students attained could be attributed to the ABC program. The results for this pair of questions are given in Table 8. There it can be seen that 49.7 percent of the elementary teachers and 45.7 percent of the high school teachers believed that the students in their schools would reach higher levels of achievement than in previous years. Furthermore, about equal percentages of elementary and high school teachers (42.9% and 42.6%, respectively) attributed any change in levels of achievement of their school's students to the influence of the ABCs. In contrast to their responses to the questions concerning their own students, most teachers who agreed that their school's students would make gains in achievement tended to attribute the gains to the influence of the ABCs; those who did not agree that the students in their schools would attain higher levels of achievement did not believe the ABCs exerted much influence over achievement.

The next pair of items asked teachers whether they thought students in North Carolina would attain higher levels of achievement this year than in previous years and then whether they thought any change in the levels of achievement North Carolina's students attained could be attributed to the ABC program. The teachers' responses are summarized in Table 9. About half the teachers (52.3%, elementary; 50.6%, high school) agreed that students in North Carolina would attain higher levels of achievement than in previous years. Furthermore, most of these teachers tended to attribute the improvements to the influence of the ABCs. Overall, however, more teachers than not tended to discount the influence of the ABCs on achievement.

The final pair of questions in this series asked teachers whether they agreed that student attitudes toward school had improved this year over previous years and whether any change in attitude toward school could be attributed to the ABC program. The teachers' responses to these items are summarized in Table 10. Teachers at both levels of schooling overwhelmingly disagreed. In their opinion, students' attitudes toward school have not improved. Among elementary teachers, the lack of change toward more positive attitudes toward school tended more often than not to be attributed to the ABC testing program. This association was not as strong among high school teachers.

A final question in the survey asked teachers whether they agreed that ABC tests have caused teachers to focus their instruction more directly on the objectives of the SCS. A large majority of teachers at both levels of schooling (81.4%, elementary; 86.2%, high school) agreed with this statement.

SUMMARY

To summarize, a majority of the teachers believe that minority students are afforded an equal opportunity to perform well on the ABCs.

Teachers are unclear as to the purpose of the tests. Very few teachers thought that performance on the ABCs could be used to infer students' success at the next level of instruction, and only about a fourth of them thought the tests provided valid measures of student mastery of SCS objectives. In fact, among high school teachers, where many of those surveyed did not teach tested subjects, nearly half did not agree either that the tests measured performance on the SCS or that the tests could be used as predictors of future performance.

As for what teachers perceived as consequences of ABC testing, most teachers appeared to feel that, because of the ABC program, more attention was being given to the state curriculum and that there were higher expectations for student achievement. On the other hand, the majority of teachers felt that subjects not tested by the ABCs were being neglected, that instructional time was being overused for test preparation activities, and that they felt more pressure for their students to perform well on the tests. With respect to time spent preparing students for testing, higher percentages of elementary school teachers reported spending more time on test preparation than did high school teachers, probably because more elementary school teachers are directly involved in teaching areas that are tested.

Teachers at both levels, for the most part, indicated that their allocations of instructional time were governed mainly by the SCS or by district administration. Smaller numbers, though still appreciable, stated that they allocated their instructional time in response to the ABCs. Large percentages of teachers at both levels, however, did indicate that the amount of instructional time they devote to tested areas has increased (sometimes substantially) over the last few years. Furthermore, as testing time draws near, the percent of time they spend practicing tests like the ABCs increases.

About half the teachers expected their own students to reach higher levels of achievement than in previous years, but only about half of these teachers attributed the expected improvement to the influence of the ABCs. Overall, most teachers did not attribute any changes in the levels of achievement of their own students to the ABCs' influence. On the other hand, when asked about the other students in their schools, or elsewhere in North Carolina, about half agreed that students would attain higher levels of achievement than in past years. Again, however, most teachers tended to discount the influence of the ABCs. Nearly all teachers responding to the survey disagreed that students' attitudes had improved over the years. The teachers attributed this lack of improvement in attitude directly to the ABCs.

References

APA (American Psychological Association, 2001). Appropriate use of high-stakes testing in our nation's schools. Retrieved November 18, 2001 from http://www.apa.org/pubinfo/testing.html (APA OnLine: http://www.apa.org/).

AERA/APA/NCME (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.

AERA (American Educational Research Association, 2000). AERA Position Statement Concerning High Stakes Testing in PreK-12 Education. American Educational Research Association. Retrieved November 20, 2001 from http://www.aera.net/about/policy/stakes.htm.

AFT (American Federation of Teachers, 2001, November 2). Making Standards Matter 2001. American Federation of Teachers. Retrieved November 20, 2001 from http://www.aft.org/edissues/standards/msm2001/.

Barksdale-Ladd, M. A., & Thomas, K. F. (2000). What's at stake in high-stakes testing: Teachers and parents speak out. Journal of Teacher Education, 51(5), 384-397.

Brennan, R. T., Kim, J., Wenz-Gross, M., & Siperstein, G. N. (2001). The relative equitability of high-stakes testing versus teacher assigned grades: An analysis of the Massachusetts Comprehensive Assessment System (MCAS). Harvard Educational Review, 71(2), 173-216.

Cizek, G. J. (2001, November 14). Unintended consequences of high-stakes testing. EducationNews.org: In defense of testing series. Retrieved November 20, 2001 from http://www.educationnews.org/in_defense_of_testing_series_uni.htm.

Education Week on the Web (1999). Quality counts. Retrieved on September 23, 2000 from http://www.edweek.org/sreports/qc99/ac/mc/mc-intro.htm.

Education World (2000, June 29). School issues: Are high-stakes tests punishing some students? Retrieved November 17, 2000 from http://www.education-world.com/a_issues?issues093.shtml.

Haney, W. (2000). The myth of the Texas miracle in education. Education Policy Analysis Archives, 8(41). Available online from http://epaa.asu.edu/epaa/v8n41.htm.

Haynes, D. (1999). Grading our schools '99: Second annual report to North Carolina parents & taxpayers. Raleigh, NC: NC Alliance for Smart Schools, John Locke Foundation.

Heubert, J. P., & Hauser, R. M. (1999a). High-Stakes Testing for Tracking, Promotion and Graduation (Report of the Committee on Appropriate Test Use, National Research Council). Washington, DC: National Academy Press.

Heubert, J. P., & Hauser, R. M. (1999b). High-Stakes Testing for Tracking, Promotion and Graduation. National Research Council (Executive Summary). Retrieved September 28, 2001 from http://www.nap.edu/html/highstakes/highstakes.pdf.

IRA (International Reading Association, 1999). High-stakes assessments in reading: A position statement of the International Reading Association. The Reading Teacher, 53(3), 257-264.

Jones, M. G., Jones, B. D., Hardin, B., Chapman, L., Yarborough, T., & Davis, M. (1999, November). The impact of high-stakes testing on teachers and students in North Carolina. Phi Delta Kappan, 81(3), 199-203.

Jones, L. V. (2001). Assessing achievement versus high-stakes testing: A crucial contrast. Educational Assessment, 7(1), 21-28.

Klein, S. P., Hamilton, L. S., McCaffrey, D. F., & Stecher, B. M. (2000). What do test scores in Texas tell us? Education Policy Analysis Archives, 8(49). Retrieved September 28, 2001 from http://epaa.asu.edu/epaa/v8n49/.

Kleiner, Carolyn (2000, June 12). Test case: Now the principal's cheating. U.S. News and World Report.

Koretz, D. M., Linn, R. L., Dunbar, S. B., & Shepard, L. A. (1991, April 5). The effects of high-stakes testing on achievement: Preliminary findings about generalizations across tests. Paper presented at the annual meeting of the American Educational Research Association and the National Council on Measurement in Education, Chicago.

Lord, Mary. (2000, April 3-Created). High-stakes testing: It's backlash time. U.S. News and World Report. Retrieved November 17, 2001 from http://www.usnews.com/usnews/issue/000403/education.htm.

Lynd, C. (2000): CER Action Paper: The new generation of standardized testing. Center for Education Reform. Retrieved October 23, 2001 from http://edreform.com/pubs/testing.htm.

Linn, R. L. (2001). A century of standardized testing: controversies and pendulum swings. Educational Assessment, 7(1), 29-38.

Madaus, G. F. (1988). The distortion of teaching and testing: High-stakes testing and instruction. Peabody Journal of Education, 65(3), 29-46.

McColskey, W., & McMunn, N. (2000, October). Strategies for dealing with high-stakes state tests. Phi Delta Kappan, 82(2), 115-120.

McMillan, M. (1999, June). The troubling consequences of the ABCs: Who's accountable? Raleigh, NC: Common Sense Foundation.

McNeil, L., & Valenzuela, A. (2000, May 1). The harmful impact of the TAAS system of testing in Texas: Beneath the accountability rhetoric. Boston, MA: The Civil Rights Project, Harvard University. Retrieved September 23, 2000 from http://www.law.harvard.edu/civilrights/conferences/tesing98/drafts/mcneil_valenzuela.html.

Mehrens, W. A. (1998, July 14). Consequences of assessment: What is the evidence? Education Policy Analysis Archives, 6(13). Retrieved September 28, 2001 from http://epaa.asu.edu/epaa/v6n13.html.

Mehrens, W. A., & Kaminski, J. (1989). Methods for improving standardized test scores: Fruitful, fruitless, or fraudulent? Educational Measurement: Issues and Practice, 8(1), 14-22.

NCDPI (1996a). North Carolina End-of-Grade Tests. Technical Report #1. North Carolina Department of Public Instruction.

NCDPI (1996b). North Carolina End-of-Course Tests. Technical Report #1. North Carolina Department of Public Instruction.

NCDPI (1996c, September). Setting annual growth standards: "The formula." Accountability Briefs, 1(1). Retrieved November 1998 from http://www.ncpublicschools.org/Accountability/reporting/asb_formula.pdf.

NCDPI (2000a). North Carolina Open-Ended Assessments: Grades 4 and 8. Assessment Brief, 6(11). North Carolina Department of Public Instruction.

NCDPI (2000b, June). Setting annual growth standards: "The formula." Accountability Briefs, 1(1). Retrieved December 21, 2001 from http://www.ncpublicschools.org/Accountability/reporting/asb_formula.pdf.

NCDPI (2001a). North Carolina Testing Program General Information, Policies, and Procedures. North Carolina Department of Public Instruction. Retrieved October 23, 2001 from http://www.ncpublicschools.org/accountability/testing/policies/TestProg0001.html.

NCDPI (2001b). Testing started with the ABC's and other myths about testing and accountability in North Carolina. North Carolina Department of Public Instruction. Retrieved October 23, 2001 from http://www.ncpublicschools.org/parents/myths.html.

NCDPI (2001c). The 2000-2001 North Carolina Testing Program Overview. North Carolina Department of Public Instruction. Retrieved October 23, 2001 from http://www.ncpublicschools.org/accountability/testing/policies/TestProg0001.htm/.

NCDPI (2001e). Refinement of the ABCs Awards and Recognition Categories. Retrieved on January 22, 2002 from http://www.ncpublicschools.org/Accountability/reporting/abcmain.htm.

NCME (National Council on Measurement in Education, 1994). Code of fair testing practices. Washington, DC: Joint Committee on Testing Practices.

NCTE (National Council of Teachers of English, 1999-Created). Resolution: On high-stakes Testing. National Council of Teachers of English. Retrieved on September 15, 2000 from http://www.ncte.org/resolutions/ highstakes1999.html.

NCTM (National Council of Teachers of Mathematics, 2001 [20 Nov]). http://www.nctm.org/about/position_statements/highstakes.htm.

Sanders, W. L., & Horn, S. (1998). Research findings from the Tennessee Value-Added Assessment System. Journal of Personnel Evaluation in Education.

Sanford, Eleanor E. (1996). North Carolina End-of-Course Tests. Raleigh, NC: North Carolina Department of Public Instruction, Office of Instructional and Accountability Services.

Schrag, P. (2000a, January). Too good to be true. American Prospect, 11(4). Available online from http://www.prospect.org/print/V11/4/schrag-p.html.

Schrag, P. (2000b, August). High stakes are for tomatoes. The Atlantic Monthly, 286(2), 19-21. Available online from http://www.theatlantic.com/issues/2000/08/schrag.htm.

Smith, M. L. (1991a). Put to the test: The effects of external testing on teachers. Educational Researcher, 20(), 8-11.

Smith, M. L. (1991b). Meanings of test preparation. American Educational Research Journal, 28(3), 521-542.

Smith, M. L., & Fey, P. (2000). Validity and accountability in high-stakes testing. Journal of Teacher Education, 51(5), 334-344.

Ward, M. (2001, March 29). Memorandum to LEA Superintendents, Directors of Charter Schools, LEA Test Coordinators, LEA Directors, and Exceptional Children Programs. Retrieved on September 13, 2001 from http://www.ncpublicschools.org/news/00-01/040201.html.

1. Paper presented to the Graduate School on March 27, 2002. This paper is based on research supported by a grant provided by the Appalachian State University Graduate Research Council.