# Which Curriculum Is Best?

#### Presented November 14-15, 1998

*A Talk Presented by UCSMP Director Zalman Usiskin at the Fourteenth Annual UCSMP Secondary Conference, November 14-15, 1998. This article was published in UCSMP Newsletter No. 24: Winter 1998-99.*

More precisely, "Which Identifiable Available Mathematics Curriculum, When Implemented, Is Best?" All schools need to answer this question every time new materials are picked. But, as it turns out, almost no one wants to do the research needed to find the answer.

Decisions regarding which curriculum is best for your students are always difficult to make. Quite a wide variety of curricula are available, and proponents of each of the available curricula are around to make a strong case for their favorite. But rarely have any of the curricula been tested. In fact, because of state adoption requirements, it is impossible for publishers to test their materials. The length of time between the announcement of the state guidelines and the deadline for submitting materials is rarely even close to the three-year minimum that it takes to write materials, give them a year of testing, and revise based on that testing. Consequently, the evidence that exists to make a decision is most often just belief supported by anecdote.

*Point 1: Few decisions regarding choice of curriculum are based on evidence that the curriculum works.*

Decisions regarding which mathematics curriculum is best are often made or greatly influenced by individuals who have very little knowledge of the school mathematics classroom or of your school's classrooms, or must be made hastily by those who do have knowledge. The result is that you are often asked to teach using materials that no one has tested, following guidelines that reflect dreams more than reality. The materials have passed that casual review known as the "flip test." Such a test, however, cannot determine whether the explanations are clear, the material is organized in a cohesive manner, the questions are appropriate, the teachers' editions give enough suggestions for the diverse sets of students likely to use the materials, and the additional materials available provide the support for which they are intended. Only teaching and testing the materials can give us this type of information.

## What Constitutes a Good Comparison Study?

Let's suppose we want to conduct a study comparing different curricula. The same general principles work whether we are thinking about testing a single-year course or the entire mathematics curriculum in a school. Statisticians are as rigorous about the design of a study as mathematicians are about the design of a proof. The results from a study with a lousy design are no more valid than the results from a proof with logical gaps.

A fundamental principle is that a good comparison study is not one that is designed to show that a curriculum is best, but *whether* a curriculum is best or *which* curriculum is best. In a good study we do not know in advance what the outcome will be.

The study must be one that is not biased for or against any particular treatment. This is typically done by ensuring the following criteria:

- The treatments to be compared are clearly identified.
- The samples being treated are equated.
- The treatments are under the same conditions, or as similar as can be expected.
- The instruments (tests, questionnaires, interviews) are constructed or chosen to test clear criteria on which the treatments are to be judged.
- The analysis is fair and complete.

Ideally, a sixth criterion is needed:

- The subjects of the study do not know which treatment they are receiving.

This last criterion, an important characteristic in medical studies, is essential to take into account placebo or Hawthorne effects. But in education, it is difficult to keep the identity of a treatment from its users. Besides, following the famous aphorism "if it works, don't fix it," many people do not care whether a positive effect is placebo, Hawthorne, or real, as long as there is a positive effect.

If *any one* of the first five characteristics is not present, the study has enough of a fundamental flaw to make its results unreliable. Doctoral dissertations, even though carried out by research neophytes, are often the most reliable studies because their design and the final report must be approved by a team of professors. But doctoral students usually do not have the resources to mount large studies. Very few comparison studies of curricula as doctoral dissertations involve more than a few classes. The problem with using only a few classes or a few schools is that individual school and teacher effects on curricula are very large. The quality of your teaching greatly determines what your students learn. Studies with just a few classes or in just a few schools can and do suggest what might happen in a larger study but, by themselves, are too small to generalize.

## Examples of Flawed Comparison Studies

Here are some examples of studies in which the criteria necessary to make the study valid are not present. My first examples are of attempts to compare full curricula. The most publicized have been the attempts to compare the U.S. curricula with those of other countries. The TIMSS 12th-grade results for our advanced mathematics students have been used as evidence that even the best U.S. students – those who are taking advanced mathematics – perform poorly. For the U.S., the group of advanced mathematics students included all students taking precalculus mathematics or calculus, what was calculated to be 14% of U.S. students. In terms of UCSMP courses, I think that both FST and PDM students would be considered as part of this sample. A minority of this 14% were taking calculus.

The distribution of items on the TIMSS test is most interesting. Examine this table from the published report.

### Table 1: Distribution of Advanced Mathematics Items by Content Category, from the Third International Mathematics and Science Study (taken from Mullis et al., 1997, p. B-9, Table B-2)

Category | % of items* | No. of points |
---|---|---|
Numbers & Equations | 26 | 22 |
Calculus | 23 | 19 |
Geometry | 35 | 29 |
Probability & Statistics | 11 | 8 |
Validation & Structure | 5 | 4 |
Totals | 100 | 82 |

*There were a total of 65 items.

Would anyone view this as a reasonable distribution of the mathematics U.S. students should know when they are seniors? For U.S. students, there is too much emphasis placed on geometry and too little on algebra.

This isn't the only problem. All of the conclusions about this population falsely assume that there exists a standard curriculum taken by all U.S. students. The researchers concluded that the U.S. curriculum is not as effective as those of other countries even though there is no uniform curriculum in the U.S.

Also, it happens that the U.S. did not meet the international sample requirements. Only six countries did. So we have a study in which we don't have well-identified treatments, we do not have equated samples, and the instruments are faulty. That is, at least three of the characteristics of a good comparison study were missing.

*Point 2: The TIMSS 12th grade study lacks several essential characteristics of a good study; furthermore, the interpretations of it in the U.S. have ignored the lack of uniformity of our curricula.*

These faults do not mean that the conclusions based on the TIMSS results are necessarily wrong. It may be true that U.S. students as a whole perform poorly compared to the rest of the world on things that we deem important. But we cannot use the TIMSS results as evidence for this.

There are plans to replicate TIMSS again within the next couple of years. The weaknesses that I have mentioned can be removed, and I hope that will happen. International studies can provide important information to us that internal studies cannot.

Two other sources of evidence are often used to judge how well we are doing in school mathematics: the SATs and the National Assessment of Educational Progress (NAEP). On both, scores have been increasing. The mean SAT score of seniors in the nation has steadily increased from 501 in 1992 to 512 in 1998. Is this increase due to curricula based on the NCTM Standards? I'd like to think so, since UCSMP was the only such curriculum used during much of this time. But even if as many as 10% of SAT-takers used our materials, using UCSMP texts would have had to have an effect of over 100 points on those students to produce an 11-point effect on the mean.
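The arithmetic behind that last sentence can be sketched in a few lines of Python. The function name is hypothetical, and the assumption that the remaining 90% of test-takers saw no change at all is a simplification made only for illustration.

```python
def required_subgroup_effect(mean_gain: float, share: float) -> float:
    """Gain a subgroup would need in order to move the overall mean by
    `mean_gain`, assuming (for illustration) everyone else is unchanged.

    `share` is the fraction of all test-takers in the subgroup.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in the interval (0, 1]")
    return mean_gain / share


# Figures quoted in the talk: mean SAT score 501 in 1992, 512 in 1998.
mean_gain = 512 - 501          # 11 points on the national mean
ucsmp_share = 0.10             # "as many as 10% of SAT-takers"

needed = required_subgroup_effect(mean_gain, ucsmp_share)
print(f"Gain needed among the 10%: about {needed:.0f} points")
```

Under these assumptions the required gain is roughly 110 points, and a smaller share of UCSMP users would push it higher still.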

National Assessment scores show that it is unlikely that the increases are due to the NCTM Standards. An NAEP long-term study published in 1996 [NAEP 1996 Trends in Academic Progress] shows that between 1990 and 1996 the increase in National Assessment scores for 17 year-olds was only two points. Between 1982 and 1990 a larger increase of 6 points occurred for the same age group. We can assume that National Assessment samples are equated, so here we are comparing the full curriculum to that level in one year to the full curriculum in another year. An increase of 10 points is roughly considered to be an increase of one grade level. Why, if students are taking a year more mathematics, is there such a small increase in these scores? The answer likely comes from a different direction – the NAEP test does not include advanced mathematics, and so it does not reflect the additional mathematics known by many of the students.

For 13 year-olds, there has been a 10-point increase since 1978, but most of the increase occurred before 1990. And for 9 year-olds, the major increase occurred between 1986 and 1990. All of this suggests that the movement in the 1980s to stress problem-solving has so far had more impact than the NCTM Standards. But again this analysis does not prove the connection, because in no case – for either SATs or National Assessment – do we know whether students who were using texts that might be identified as stressing problem-solving performed better than other students.

*Point 3: Because the SATs and NAEP tests do not chart textbook use, we cannot use these national tests to evaluate mathematics curricula.*

## A State Test

Let me move now to state tests. In many states in recent years, performance on state tests has been used as an indicator of how well schools are doing in teaching mathematics. Interesting data was sent to us a few years ago by Bruce Budzynski of Ludington, Michigan. He copied an article from the Detroit Free Press indicating the performance of the 25 best and 25 worst school districts in the state on the Michigan Educational Assessment Program (MEAP) test at grade 10. I tried to determine which of the highest-performing districts were using UCSMP texts, and looked at Scott Foresman sales summaries sent to us. These summaries show which districts had purchased more than 10 of the various UCSMP books. UCSMP texts were being used in 17 of these 25 districts, with an average of 3.6 books being used in them.

But would these schools have scored high anyway? The Detroit Free Press included the percent of students in each district eligible for free lunch. There are three school districts on the list that have over a quarter of their students classified as in poverty. In all three districts, four UCSMP texts are used. This puts UCSMP in an even more optimistic light. We would love to have published these data as a great study of UCSMP. But in these schools we do not know how many students who took the MEAP had studied from UCSMP texts. No one ever collected that data, and without it we cannot tell much. So we do not have a well-identified sample, and without such identification, even the percent of poverty students is not sufficient to make a definite conclusion possible.

Remember the 5th criterion for a good study? The analysis should be complete. The data on the 25 lowest-performing districts show a number of things that indicate why we need more information to be reasonably certain that using UCSMP texts raises scores on the MEAP. Ten of the 25 lowest-performing districts had bought UCSMP texts. So it could be that 17 of the 25 districts at the top in Michigan use UCSMP texts only because so many districts in the state use our materials. But an analysis by district has problems, for in some large districts the usage is misleading; for example, we think that only one public high school in Detroit was using our materials. The conclusion we would like to make is that the use of UCSMP materials helps schools score better on the MEAP. It seems that this is the case, but we do not have enough data to be certain.
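One way to see why the two lists are suggestive but not conclusive is to compare them directly. The sketch below uses only the district counts quoted above (17 of the 25 top districts and 10 of the 25 bottom districts) to compute the usage rates and a one-sided Fisher exact probability. The function name is hypothetical, and treating these 50 listed districts as the whole population is an assumption made purely for illustration.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact probability for the 2x2 table [[a, b], [c, d]].

    Returns the chance of seeing `a` or more in the top-left cell when the
    row and column totals are held fixed (hypergeometric distribution).
    """
    row1 = a + b            # e.g. number of top districts
    col1 = a + c            # e.g. number of districts that bought the texts
    n = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, row1)


top_users, top_total = 17, 25        # top-25 districts that bought UCSMP texts
bottom_users, bottom_total = 10, 25  # bottom-25 districts that bought UCSMP texts

print(f"Top 25 usage rate:    {top_users / top_total:.0%}")
print(f"Bottom 25 usage rate: {bottom_users / bottom_total:.0%}")

p = fisher_one_sided(
    top_users, top_total - top_users,
    bottom_users, bottom_total - bottom_users,
)
print(f"One-sided Fisher exact probability: {p:.3f}")
```

Whatever probability comes out, it describes districts that bought books, not students who studied from them, which is exactly the identification gap described above.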

A few years ago, Joyce Camara sent me the data from the Rhode Island state test. Her school district, which had never scored at the top of the state, went to the top after using UCSMP texts. Top-performing high schools in California and Illinois use UCSMP texts, though not with all their students. And just two days ago, the Illinois state test scores for last year - the 1998 IGAPS - were published, and five of the eight top-scoring elementary schools in the Chicago suburbs use UCSMP Everyday Mathematics in all their grades K-6.

What can we conclude? UCSMP materials are used in a great number of places, so even by chance there should be some states in which UCSMP schools are the top-performing schools. We cannot conclude from this anecdotal evidence that UCSMP materials will make your school into a high-scoring school. But we can conclude that UCSMP materials are used in some of the highest-performing schools in our country. Do they use UCSMP materials because they are high-performing? Or are these schools high-performing because they use UCSMP materials? It's probably both.

*Point 4. UCSMP materials are used in some of the highest-performing schools in the U.S.*

In both Bruce Budzynski's and Joyce Camara's schools, the state test scores increased significantly after UCSMP texts were adopted. We would like to think that these increases are due to our books, and that is likely to be the case, but we cannot claim that, because we know that other factors are possible. Everyone knows that in different years students can behave quite differently. This year's 9th grade class may have had a reputation for many years as a good class, or as a poor one. We have no way of equating classes unless we go four or more years back into school records to find out how the students scored before they began using UCSMP materials.

Other schools have reported that their SAT scores have increased since the school began using UCSMP texts. The SAT is an optional test – the students taking the test this year could easily have different characteristics than students taking the test a few years ago. For instance, maybe they are getting more help from tutors. We would like to think that use of UCSMP texts raises scores on the SAT and ACT exams, if for no other reason than so many UCSMP students are a year ahead of where they would have been in some other curriculum. Dan Hirschhorn's doctoral dissertation studied PSAT scores on matched pairs of students, showing far better results for UCSMP students in two of three sites. So we think UCSMP students do better on the SATs, but we have no published study of this.

*Point 5. There is anecdotal evidence, but no definitive study, that UCSMP students perform better on the SATs than students in other curricula.*

## A Brief Summary of UCSMP Comparison Studies

One of the nicest studies we've heard of with UCSMP texts was in the Lexington County School District One in Lexington, South Carolina. This study, done by John Swann, won the distinguished paper award from the South Carolina Educators for the Practical Use of Research. It involved the use of Transition Mathematics with 6th grade students. It has all the characteristics of a good study. The 260 students in the study were matched with 260 students from the previous year in both reading and mathematics based on the South Carolina Test of Basic Skills. Of these, 70 pairs took the Preliminary Scholastic Aptitude Test. TM students scored statistically significantly higher on the mathematics portion, about 2.5 points higher. The two groups scored the same on the verbal portion. What is particularly significant about this is that these were all top students in the district, so the improvement occurred among the hardest students to improve, namely the best.

This would seem to be something for which we at UCSMP should call a press conference. A school sends us data that shows that students in UCSMP classes outperform students in other classes. They have equated samples, and the treatments are in the same school under as close to the same conditions as possible. The tests seem to be fair tests. It is evidence, but not enough evidence. To give a mathematical analogy, it is like having verified Goldbach's conjecture – that every even number > 2 is the sum of two prime numbers – for all even numbers up to 1 million or so. It gives evidence, but not enough evidence. We have not tested all even numbers – we have not even tested 1% of the infinitude of even numbers – and so we have no proof. The logical problem with the statistics is similar – we have not taken into account all the places that our materials are used. There may be something else going on in this school district; for example, special inservices that have juiced up the teachers and in turn caused them to raise expectations for their students, or expectations of the state that have increased. Another school district might not have the same results.
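To make the Goldbach analogy concrete, here is a minimal Python sketch that checks the conjecture for every even number up to a bound. The bound of 10,000 (far smaller than the million mentioned above, chosen so the check runs quickly) and the function names are illustrative choices.

```python
from math import isqrt


def prime_flags(limit: int) -> list[bool]:
    """Sieve of Eratosthenes: flags[k] is True exactly when k is prime."""
    flags = [False, False] + [True] * (limit - 1)   # indices 0..limit
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False
    return flags


def goldbach_counterexamples(limit: int) -> list[int]:
    """Even numbers in [4, limit] that are NOT a sum of two primes."""
    flags = prime_flags(limit)
    failures = []
    for n in range(4, limit + 1, 2):
        # Look for any prime p <= n/2 whose complement n - p is also prime.
        if not any(flags[p] and flags[n - p] for p in range(2, n // 2 + 1)):
            failures.append(n)
    return failures


print(goldbach_counterexamples(10_000))
```

An empty list means no counterexample was found below the bound. That is evidence in the same limited sense as a single-district study: it says nothing about the even numbers not yet checked.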

Single-site evidence like the South Carolina study is used all the time in commercials and infomercials. It is evidence, but it is insufficient evidence to make a conclusion that one curriculum performs better than another, except in that location that reported the evidence. Of course we want to use those places in which UCSMP has been successful as evidence that UCSMP is a better curriculum than others out there. If that were our only evidence, however, we would be dishonest. Just as in the example of the Detroit Free Press top 25 districts, the analysis is not complete. But we believe we have done rather careful studies, and our studies give us consistent results.

## The UCSMP Studies

With a grant of $500,000 from the Carnegie Corporation of New York when we developed the first edition of UCSMP, we conducted large comparative studies with Transition Mathematics, Algebra, and Geometry. We also conducted a small study with Advanced Algebra whose results were even more positive than those from the first three books. Later, while developing the second editions, we conducted reasonably large studies comparing the first four of our books with other books and with our first editions. We could not conduct comparison studies with either edition of FST or PDM because there do not exist textbooks similar enough for comparison.

Usually we would have two pairs of classes in a school, each pair with one class using UCSMP materials, the other a class felt by the school to be matched and using the school's usual materials. Typically, only one of these pairs of classes in a school would match. Perhaps we should not have been surprised. Think about the times that you have taught the same course twice or more in a day – do the two classes seem matched in prior mathematical knowledge? For a study, it is important that the classes actually match, not statistically match, because when classes are different, there can be effects on the teaching even when the courses are the same course.

It is also important that the matched classes be in the same schools, because schools differ greatly in the number of days available for class and in the amount of work expected. You might think it would be best to have the same teacher teach both treatments, so there would be no teacher difference, but then the two treatments get contaminated. We did not offer any inservice to the UCSMP teachers so that the treatments would be, as much as possible, under the same conditions. Actually, this gives some favoritism to non-UCSMP treatments, because those teachers are likely to have had experience with their texts.

Let me summarize what we found. The top part of the table gives results from the first edition studies. You can see that TM, Algebra, and Geometry students held their own against comparison students on traditional standardized tests. The means are not always as high, but there never is a statistically significant difference. In the second edition studies, shown in the lower half of the table, the UCSMP means are always higher, but they are never statistically significantly higher.

### Means on traditional standardized tests: UCSMP vs. comparison

Edition | Course | n | UCSMP | Comparison | n |
---|---|---|---|---|---|
1st | TM | 280 | 23.8* | 24.2* | 274 |
1st | Alg | 226 | 20.6* | 20.1* | 190 |
1st | Geom | 349 | 14.3* | 14.7* | 360 |
2nd | TM | 56 | 22.9 | 22.8 | 53 |
2nd | Alg | 75 | 18.0 | 16.9 | 62 |
2nd | Geom | 147 | 17.0 | 16.8 | 122 |
2nd | Adv Alg | 177 | 20.3 | 15.5 | 180 |

The second type of test was over content not covered on the first test but believed by us to be important. Some of this content was traditional. For instance, in algebra we tested finding the slope of a line. Some of this was not so traditional. For instance, in all courses we included questions on applications. Whereas traditional standardized tests are biased against UCSMP texts because they do not cover the breadth of material we cover, the second test was clearly biased in favor of UCSMP students. So we are not surprised that UCSMP students outscore their comparison counterparts on every test. Often the differences are significant both statistically and educationally. One type of test was not biased in favor of UCSMP students. The proof test covered only traditional proofs.

### Means on tests of other content: UCSMP vs. comparison

Algebra content appropriate to the level of the course

Course | n | UCSMP | comparison | n |
---|---|---|---|---|
TM-1st* | 414 | 38.4 | 36.3 | 396 |
TM-2nd | 56 | 9.9 | 9.6 | 53 |
Alg-1st | 226 | 38.0 | 28.0 | 190 |
Alg-2nd | 75 | 18.7 | 14.1 | 62 |
Geom-1st | 184 | 2.1 | 1.9 | 197 |
AdvAlg-2nd | 177 | 20.3 | 15.5 | 180 |

Geometry content appropriate to the level of the course

Course | n | UCSMP | comparison | n |
---|---|---|---|---|
TM-1st* | 293 | 9.7 | 8.5 | 294 |
TM-2nd | 56 | 8.0 | 7.4 | 53 |
Geom-1st | 349 | 14.2 | 11.0 | 360 |
Geom-2nd | 147 | 17.9 | 12.7 | 122 |

Problem solving appropriate to the level of the course

Course | n | UCSMP | comparison | n |
---|---|---|---|---|
Alg-2nd# | 75 | 5.5 | 3.1 | 62 |
Alg-2nd# | 75 | 6.6 | 3.2 | 62 |
AdvAlg-2nd | 177 | 10.4 | 6.4 | 180 |

Proof

Course | n | UCSMP | comparison | n |
---|---|---|---|---|
Geom-1st# | 184 | 6.5 | 5.0 | 197 |
Geom-1st# | 165 | 7.6 | 7.1 | 163 |

\* means of class means; all others are means of student scores

\# two different forms of the test were analyzed separately

*Point 6: UCSMP students (1) consistently score as well as comparison students on traditional standardized tests and (2) consistently significantly outscore comparison students on applications and other content stressed in UCSMP courses.*

The two types of tests taken together show that UCSMP students do learn things that their counterparts in comparison courses do not learn and suffer no loss in what is tested on traditional standardized tests. This does not mean that they are better on all content. For instance, UCSMP *Algebra* students are worse at dividing a trinomial by a binomial than their comparison counterparts – that material is not in our book. But UCSMP students are better on the algebra common to old and new curricula, so that the total scores on a traditional test are not different.

We also tested UCSMP 2nd edition students against their 1st edition counterparts. In three of the four courses the 2nd edition students tend to score higher, while in Advanced Algebra they scored lower, but none of these differences, higher or lower, is statistically significant.

*Point 7: UCSMP second-edition and first-edition students score similarly on virtually all tests.*

When we came to test FST, we were faced with a problem of not being able to have equatable samples. FST tends not to replace an existing course, but to create a new one. While we could find students who had the same characteristics as FST students, no other available course taught both functions and statistics. This brings us directly back to the original question: Which curriculum is best? If a course or curriculum does something that no other curriculum attempts to do, and it succeeds at what it attempts to do, then no comparison study is needed. And many people have used FST for exactly that reason: it is a unique course, and it works.

In the six years since the first edition of FST appeared, several of the new NSF projects have statistics as an important ingredient of their curriculum. I think FST integrates the statistics in with standard content more than other available materials, but you might like the approach taken by other projects. Still, there is no one-year course comparable to FST.

Consequently, when we came to test FST, we used items from the Second International Mathematics Study (SIMS). (This was the precursor to TIMSS.) FST students performed at the average of U.S. pre-calculus students on the functions and trigonometry items, despite the fact that FST students also spent much time on statistics. We do not know if our sample consisted of worse or better students to start with than the SIMS students. This is a fundamental problem when there is no matched group. But we do know that at the beginning of the year the FST students knew about as much as Advanced Algebra students knew at the end of their year.

The evaluation of the first edition of PDM was done by Denisse Thompson as her doctoral dissertation. The dissertation consists of two large volumes and I am not going to attempt to summarize all she found. But I will point out two comparisons that were made, comparing PDM students with SIMS students. Remember that FST students scored about the same as U.S. Precalculus students on the SIMS precalculus items. PDM students, as you might expect, scored quite a bit higher. In fact, PDM students scored very much like the U.S. Calculus students on SIMS. So we would expect that PDM students going into a calculus course would be quite a bit better than their counterparts on the kinds of questions asked in these international studies. [Denisse Thompson, An Evaluation of a New Course in Precalculus and Discrete Mathematics, Ph.D. dissertation, University of Chicago, 1992.]

*Point 8: Preliminary evidence indicates that FST students score similarly to 12th grade precalculus students; PDM students score similarly to calculus students.*

No reform curricula were in our studies. But that does not mean that all of the comparison courses were alike. Our studies were not large enough to allow us to speak about specific comparison curricula. We should not assume that the comparison curricula are alike any more than we can assume that all the reform curricula are alike. It could be that there is some curriculum out there that would consistently outscore UCSMP, but no such curriculum has surfaced in our studies.

## The Danger of Ignorance

We have been taught to recognize bad science. A person touts some herb as a cure for some common condition. In the infomercial or literature, a number of people offer testimonials. We are not told how many used the herb without being cured. We are not told how many would have been cured without the herb. For some, a placebo might have cured the condition. We are not told of potential harmful side effects.

I purposely use the example of an herb because in the United States herbs do not have to undergo the kinds of testing that prescription medicines must have. A prescription drug must undergo a careful study showing that it generally has the good effects it is supposed to have. And when the study is done, one finds out about potential harmful side effects. That is why, when you buy prescription drugs these days, you receive literature detailing what the drug can and cannot do, and what side effects it may have.

We care enough about our physical health to have put these kinds of monitors in place. We care so little about the health of our educational system that we have little idea what is going on, let alone what things work better than others. This is why we have situations like that in California, where a state framework that is irresponsible because few of its ideas have ever been tested is replaced by another framework that is irresponsible because it ignores all the changes and advances in the field in the past generations.

History tells us that there is danger in our ignorance. Scores on the SATs rose in the 1950s and peaked in 1963. Then a decline began. By 1970, the decline was interpreted by many as an indication that new math had failed. It was used by those who did not agree with the new math to argue for a skill-driven curriculum, what was called back-to-basics. In 1976, too late to be of use at the time, an SAT commission concluded that the 7-year decline from 1963 to 1970 was due to an increasing population of students taking the SAT, and not due to any curriculum.

The decline continued through the 1970s, reaching bottom around 1981. The same SAT commission said that the decline from 1970 to 1975 was a real decline, not due to a change in the population of test-takers. Was *this* part of the decline due to new math? Or was it due to back-to-basics? Or was it due to a more general malaise of students, represented in the even greater decline on the verbal test?

*Point 9: We have no consistent data on the materials used in U.S. classrooms, and so we cannot make any judgments regarding the effects of those materials on the quality of performance of U.S. students.*

One thing seems clear to me. The lowest scores of U.S. students occurred during the era of back-to-basics and just afterwards. From the 1920s on, studies generally have shown that teaching skills without paying attention to the underlying mathematical and applied concepts results in poorer performance than teaching skills together with those concepts. This knowledge is the reason that the leadership of NCTM and most mathematics educators of my age so strongly oppose those who believe in curricula driven by skills without understanding. But our lack of connections of these studies to particular curricular materials has meant that virtually anyone can claim virtually any result for their materials.

## The State of Research

Can researchers help us? Each year a compilation of all published research in mathematics education – articles, dissertations, and books – is made available. It takes time to put this together, so the two most recent compilations are for 1995 and 1996. The number of studies that compare identifiable curricula is astonishingly small. Only 5 of 627 studies in 1995 and only 3 of 529 published studies in 1996 are textbook comparisons at any level – elementary through college. And this lack of studies existed even though in recent years there has been more textbook development activity than in perhaps any decade ever.

There are so few studies that I can identify them for you. The two elementary studies are published papers by Bill Carroll on UCSMP Everyday Mathematics. One middle school study is on Saxon's 6th grade program; the other compares commercial textbooks with performance on the 7th grade MEAP. One high school study tested Saxon's Algebra, the other the CORD applied mathematics. Both college studies tested the Harvard Project Calculus. I may have missed some studies because a program was not identified in the summary or in the title. But even if I missed twice as many as I found, less than 2% of the studies in mathematics education were textbook comparisons.

In contrast, there were lots of studies of graphing calculators, a number of studies of geometry drawing programs, some of symbol manipulators. But these are tools, not curricula. They have no definite sequence and an unclear scope and there is no reason to expect that different teachers would use these tools in anything resembling a consistent manner, nowhere near the consistency we get from textbook use. It is ironic that there were a number of studies on "cooperative learning", a "technology-rich" curriculum, or a "constructivist-based" curriculum, but not on specific textbooks. This ignores the fact that there are many ways to implement any of these ideas. A particular method of implementing an idea may be so poor that it results in poor performance, but another implementation of the same idea may be quite effective. It is likely that the results from studies of textbook use are more consistent than the results of studies of these instructional practices.

*Point 10: Researchers avoid studies of textbooks, where the treatments are roughly replicable, and instead study instructional techniques, where the treatments are seldom replicable. As a result, little is learned that can be passed on to others.*

Because we have so few textbook studies, schools tend to pick books based on their surface features and not on whether a book is effective at reaching the school's goals. As a result, schools sometimes pick books that do a lousy job of reaching their goals, yet conclude not that the book was lousy but that student failure is to be expected in mathematics. And schools sometimes decide that an idea behind a book is a poor one because the book has done such a poor job with the idea. They don't know that there might be another book available that does a good job.

## We Do Not Know What Materials Are Being Used

What can we do? Perhaps we in the business of educating can learn from other sectors of business. Each week U.S. car manufacturers announce how many cars of each make they produced in the previous 7 days. Car production is a barometer of our economy. It is not the only barometer, but we can tell whether people are preferring large cars to small ones, minivans to station wagons, gas guzzlers to gas economizers, and so on. The information is quite public and readily available in such publications as the World Almanac and Book of Facts.

Do we have comparable figures for textbooks in use? Not even close. Each year - not each month, not each week - publishers who are members of the Association of American Publishers report their total sales for *all* the grades K-12 in *all* subjects. No detail is given, not by book or even by subject. Why do they hide the data? Because they feel that their competition might use the information. Bunk. Automobile manufacturers are no less competitive than textbook publishers.

As a result of this ignorance, both in the time of new math and in the past few years of Standards reform, there are those who reported that the curriculum was not really changing in schools, and those who reported that there were major changes. What actually happened?

There may be another reason why analyses are not made of textbooks. School textbooks are the *persona non grata* of the publishing industry. The authoritative source for hardcover books is Books in Print. Children's books, reference books, trash novels, sex guides - all of these are in Books in Print. But not books published for grades K-12. College textbooks are in Books in Print, but not schoolbooks. I wrote a precalculus book in the early 1970s. It was published by the college division of McGraw-Hill so it was listed in Books in Print. Had the school division of McGraw-Hill published the exact same book, it would have been listed nowhere except in the school division's catalog. Go into your favorite bookstore and attempt to buy a schoolbook. You can't because they will not be able to find it. Try amazon.com or barnesandnoble.com. You will not find the book.

Research indicates that most mathematics classes have a single textbook that they follow rather closely. That we have very little idea what these textbooks are is a disgrace. We know from our studies that students using UCSMP textbooks learn different things than students using other texts. It is fair to assume that other textbooks also have this property. Certainly in this era in which there are quite different mathematics curricula out there, someone should be looking at the relationship between the textbooks in use and performance.

*Point 11: We need an ongoing system that tells us which textbooks and other materials are being purchased and which are in current use.*

In some state-adoption states, information already exists about textbooks that are purchased, and - since public funds are used for textbooks in public schools - one could obtain this information from all public schools. But it seems easier to go to the source and obtain agreement from all publishers to give annual reports on all textbooks sold. What is in it for the publishers? They give up exclusive knowledge of their own sales, it is true, but they obtain the knowledge of everyone else's. The trade would seem to be worth it. And the value for educational policy would be immense: we would have at least one measure of what is going on in mathematics classrooms.

We also need a way to tie textbook and other material use to performance on generally recognized tests. I know that TIMSS collected data on the textbooks used by the students who took their tests, but they have not published these data. It would be interesting to know what textbooks were in use in 4th, 8th, and 12th grade, and even more interesting to know how students who used these textbooks fared on the TIMSS tests. This knowledge would not necessarily tell us which curriculum is best, because the schools that have better students to begin with might be using different textbooks than schools with poorer-performing students, but it would at least tell us more than we know now. And, if we know something about the socio-economics of the various communities, we might be able to do some statistical work relating student performance with the materials they have used. But, in order to do so, we have to make textbooks an object of study, just as we do over-the-counter drugs.

*Point 12: We need a greater number of studies that tie textbook use to performance.*

## It Won't Be Easy

Some will say that the connection between materials and performance cannot be made. Teachers use textbooks differently. Sometimes they use more than one textbook. Sometimes they do not use any textbook at all. Each year most students get a new textbook, so that the performance of a student is affected by different books written by different authors with varying amounts of attention to different skills and properties and uses and representations. In fact, one book may do a poor job at something and a school may purposely follow it with another that makes up for its weaknesses. And students move from class to class, or from school to school, sometimes more than once in the same year. Under such varying conditions, how can cause and effect possibly be established?

Every one of these difficulties exists in long-term medical studies. People take different doses of medicine than they were supposed to, or forget to take a dose, or become sick and take something else. They eat all sorts of food with all kinds of vitamin supplements. They exercise in varying amounts at different places. Some smoke. None of these difficulties has kept us from designing studies and determining, from those studies, that certain practices are healthier than others.

The materials that are currently published as the second edition of UCSMP are, at the very least, the 7th iteration of an attempt to provide the best mathematics curriculum for our secondary schools. From the evidence we have, we think they are the best materials available for the vast majority of schools and students, but we know that, for optimal use, they must be adapted by every school and every teacher for the students you teach. We welcome the continual evaluation of our materials in our effort to develop students who are mathematically literate, capable, and comfortable in the broadest sense of these terms.
