This completes the fraud series that began with Keeping Savages in a Cage, and continued with Fraudo the Frog?.
Background
In 1981, I was handed a proposal from the Control Data Corporation, and because of my known funny head for making numbers sing, I was asked to give it a once-over. The whole operation proved to be a fraud which I foiled by writing what was referred to as "the acid drops" in 1981. Basically, CDC tried to con four government departments into buying an outdated computer-based education system called PLATO. They delivered a thick wad of "evidence" which was a complete load of garbage, as I showed, and as they say in assassins' circles, I did so with extreme prejudice (hence "acid drops").
What I found is known in general terms in fraud circles, but it has never been fully documented: that is the reason why this is now posted.
My "acid drops" paper was marked for no further distribution, but a slime-bag of my acquaintance loudly praised PLATO in 1985 at the Australian Association for Research in Education. I knew this bloke for a complete shonk who had harmed a friend of mine by stealing her credit (and to a lesser extent, my credit).
He needed to learn that this was not the best way to build your career. So the following year I delivered my clinical demolition of PLATO to AARE, hoping he would be there, but he had skulked off back to England. My paper was delivered to a small circle of cognoscenti who basically nodded, and said "we thought as much". The offer had been spurned, and nobody cared much any more.
I recently found the printed paper. The events happened more than 30 years ago, and I am applying the 30-year rule. I ran the paper through OCR the other night, and recovered it. Delightfully, in the same folder, I found the original "acid drops" paper, which I am sitting on, along with a large volume of evidence. If anybody is silly enough to even hint at legal action, they need to be aware that I can prove that some CDC people knew the claims were fraudulent but still made them. If you try to be a nuisance, I will escalate. I can do that, you can't, and if you try, expect to pay for it.
In fairness, the known crooks were in America. I am quite certain the Australian CDC people were blissfully unaware that the evaluation studies they handed us were bogus. Even an idiot would not have handed over the data they gave me, because the proof that I predicted would be there was easy to find.
I have appended one (and one only) of the smoking guns at the end.
* * * * *
The paper begins:
This is the story of a meta-evaluation that was completed in 1981. The object of the evaluation was PLATO, a computer-based education system, as it was used to teach basic skills to adults. The object of this report is to show where the simplest enquiries can sometimes lead. The evaluation relied heavily on gain scores derived from standardised tests: we ought really to start with these, so that we are all talking the same language.
In the first place, we need to be sure that the standardised test used is appropriate. Preparing a standardised test is difficult, expensive, and time-consuming. Researchers commonly look around for a test which appears to ask the right sorts of questions, and which has been tried out on the same sorts of people as those under study in the research. If such a test can be found, then the researcher may report results such as "9% of the students performed at or above the 9th grade level" (most standardised tests are developed in the USA).
But is a standardised test the best way of assessing the performance of an adult basic education student? Galen (1980) quotes Otto Ford (Teaching Adults to Read, 1967) as saying "Everyone agrees that an adult can be frightened away from a basic education program by testing. The informal inventory in the hands of a sensitive teacher has none of the formidability of standardised tests." The main thing to be said for a standardised test is that it is a convenient and quick method of gathering the data required for a study or evaluation.
A proper measure of reading ability would involve sitting down one subject with an experienced teacher who would watch the subject read, listen to the subject reading, and ask questions of the subject, all before making an informed decision. A standardised test simply poses a set of (usually) multiple choice questions aimed at objectives which reflect reading ability. These indicators of reading ability are then used to allow us to make the (usually safe) jump to the (usually correct) conclusion that we have direct information on the reading ability of the subject.
But this can come unstuck if the subject has been coached in those skills and those skills only which are tested in the test. In this case, the assessed reading ability (i.e., the score on the test) would be too high. Again, if we are studying "slow learners", the subject's testable skills may be exactly the ones in which he or she is having difficulty. If the subject has learned to compensate for these difficulties in some roundabout way, then his or her scores will be too low: the deficiencies are still there, but they do not affect actual reading performance any more.
A gain score is calculated when a standardised test has been given twice, once before, and once after some form of instruction. To assist this, most standardised tests are available in two or more equivalent forms. The difference in grade equivalent score (or raw score, for that matter) is then calculated, and attributed to the intervention of some form of instruction. In logicians' circles, this is known as post hoc ergo propter hoc, and held in low esteem as a form of proof. Stake goes further in his criticism:
"The testing specialist sees not one but at least four hazards attendant to the analysis and interpretation of learning scores: grade-equivalent scores, the "learning calendar", the unreliability of gain scores, and regression effects. All show how measures of achievement gain may be spurious. Ignoring any one of them is an invitation to gross misjudgement of the worth of the instruction." (p 210)
Much of the evidence which we will consider later depends on the interpretation of gain scores. It will thus be instructive for us to consider each of Stake's objections individually.
The grade-equivalent scores objection
It often happens that a difference of one, two or three marks in raw score is equivalent to the gain typically found between one grade and the next. If this is the case, then we must be wary of gains which are due merely to chance effects, or to the acquisition of one minor skill.
The standardised test most frequently used in PLATO studies is ABLE: the Adult Basic Learning Examination. While I have not been able to obtain copies of the test itself, I have located reviews, and those are most revealing.
Hieronymus (1972?) comments in general terms about the shortcomings of Levels 1 and 2 of ABLE, detailing problem areas, and concluding "This general criticism applies to a somewhat lesser degree to all of the tests in the battery with the possible exception of reading."
He then goes on to criticise the reading test:
"The reading tests consist of short passages, in which the last word in most sentences is missing and must be selected from three alternatives. For this reviewer, this type of reading test has some serious shortcomings. Most of the passages consist of two or three sentences interrupted by missing words. The examinee must use the context of the remainder of each sentence to select the word which best fits the context. This type of item does not recognize the multi-faceted nature of reading comprehension. No emphasis is given to such skills as generalization, discerning the main idea, evaluating the purposes, attitudes, or intentions of the writer, etc."
Fry (1969) also finds fault with the test on these grounds:
"…there are only four items which cover the grade range 5.0-5.8, while three items cover the eighth grade. Hence, I believe that the man who wrote the front page and probably the advertising copy for this test should state that Level I is most suitable for testing groups with first- and second-grade ability and Level II is suitable for students with third- eighth grade ability with much greater discrimination at the lower end."
But Fry is also critical of the Level II arithmetic test:
"…Test 4, Arithmetic Problem Solving for Level II of the ABLE has a total of twelve items which give a grade level range of 3-9. This means that the student can gain or lose 1/2 year by simply getting one more item right or wrong."
Hall (1968) has a further criticism:
"Although the examiner is told that guessing is to be encouraged and no "correction formula" is to be used, the instructions to examinees are not sufficiently explicit on this point." (p.271)
An even more serious problem is implied by the data disclosed by Nafziger et al. (1975):
"Reliability: split-half (odd-even) reliability coefficients adjusted by the Spearman-Brown formula are reported for grade 3 of the school group (.87 for vocabulary, .93 for reading, .95 for spelling), grade 4 of the school group (.89 for vocabulary, .93 for reading, .95 for spelling), the Job Corps group (.85 for vocabulary, .96 for reading, .96 for spelling) and a group of adult basic education students (.91 for vocabulary, .98 for reading, .94 for spelling)."*
[Footnote interpolated here, it having appeared at the foot of the page: * The context of Nafziger et al. is ambiguous: coefficients quoted are probably only those for Level 1. If this is so, the problem mentioned is a problem no more. Only a study of the test can tell for sure.]
These results are outstandingly high, and may well have been obtained by having tests made up of paired items: in the absence of a copy of the test, this must remain as the most probable explanation. If this proves to be a correct surmise, then students would tend to advance by two-mark steps, giving even more rapid gains on the vocabulary, reading and spelling tests than for Arithmetic Problem Solving.
A specific objection to the ABLE grade-equivalents must also be raised: they are second-hand. The grade-equivalents on the Stanford Achievement Test of an elementary school sample have been used as the basis for the ABLE grade-equivalents. Hall (1968) reports that the correlation between the Stanford Paragraph Meaning subtest and the ABLE editing Level 2 subject is .58. In the following paragraph, he comments that "the authors wisely urge that local norms be developed by ABLE users." (p.273)
The "learning calendar" objections
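Fry's arithmetic on Test 4 is easy to check. The lines below are only a sketch: they assume the twelve items are spread evenly over grades 3 to 9, and the function name is invented for illustration. The two-mark case simply applies the same rate to a pair of items, since the item counts for the other subtests are not given here.

```python
def grade_step(lowest_grade, highest_grade, n_items, marks=1):
    """Average grade-equivalent change for `marks` extra correct items,
    assuming items are spread evenly over the grade range (an assumption)."""
    return marks * (highest_grade - lowest_grade) / n_items

# Test 4, Arithmetic Problem Solving, Level II: twelve items, grades 3 to 9
print(grade_step(3, 9, 12))           # 0.5 of a grade per item, as Fry says
# Illustrative only: a two-mark step at the same rate
print(grade_step(3, 9, 12, marks=2))  # 1.0
```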
When standardised tests are administered to a norming population, this is done at one time of the year. It is then taken for granted that 0.1 grades are gained in each of the nine USA school months, with a further 0.1 grade gain over the three month summer vacation.
Unfortunately for this assumption, as far back as 1968, Beggs and Hieronymus showed that there is a distinct loss of performance on many tests of skills over the summer vacation. Losses of two grades were quite common, and the trend was rather more marked in students of lower ability. This loss is obviously retrieved in the early part of the new school year, and augmented by the year's growth. Any teacher who tests students at the start and end of the year should be able to show a gain of about three grades during the year.
It is instructive to ponder the possible results of this effect operating on adults who have been absent from school for some years.
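The two pictures of the school year can be set side by side. This sketch uses only the figures quoted above (0.1 grades a month, a two-grade summer loss) and assumes one grade of genuine growth per year; the function names are made up for the illustration.

```python
def calendar_gain(school_months=9, gain_per_month=0.1, summer_gain=0.1):
    """Yearly growth as the norming 'learning calendar' assumes it."""
    return round(school_months * gain_per_month + summer_gain, 2)

def apparent_school_year_gain(summer_loss, true_yearly_growth=1.0):
    """Gain seen between a start-of-year and an end-of-year test when the
    summer loss is made up during the year (true growth of 1.0 is assumed)."""
    return summer_loss + true_yearly_growth

print(calendar_gain())                 # 1.0
print(apparent_school_year_gain(2.0))  # 3.0
```

On these assumptions, two-thirds of the "gain" measured within the school year is recovery of ground lost over the summer.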
The unreliability of gain scores
Stake demonstrates that when two tests have reliabilities of 0.84, and correlation of 0.81, these being typical good values (but compare them with the figures for ABLE on page 61), the reliability of the gain scores will be 0.16.
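The usual textbook formula for the reliability of a difference score reproduces Stake's figure. It assumes the two tests have equal variances; whether Stake used exactly this form is an assumption here, and the function name is invented for the sketch.

```python
def gain_score_reliability(rel_pre, rel_post, r_pre_post):
    """Reliability of (post - pre), assuming equal variances on both tests.

    rel_pre, rel_post: reliabilities of the two tests
    r_pre_post: correlation between pre-test and post-test scores
    """
    mean_reliability = (rel_pre + rel_post) / 2
    return (mean_reliability - r_pre_post) / (1 - r_pre_post)

print(round(gain_score_reliability(0.84, 0.84, 0.81), 2))  # 0.16
```

The closer the pre-test and post-test correlate, the less of the difference between them is anything but measurement error.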
The regression objections
The phenomenon of regression to the mean has been known for a century or so, but never sufficiently widely. When things vary, there are usually two main sources of variation. There are systematic causes, such as heredity, treatment, intelligence and so on, and there are chance factors such as assignment of teachers, diet, "luck of the draw" and so on.
Now in any test, some individuals will be at the high end of the distribution: this is because both the chance and the systematic factors have favoured them. Similarly, those at the "bottom of the pile" are there because both chance and systematic factors operated against them. If we take the "top" group and test them again, the chance factors (which are completely independent of the systematic factors) will, on average, neither advantage nor disadvantage the group.
Some will be favoured, some will suffer. But on the first test, most of them were favoured: that is how they ended in the top group. So the end result is that the "star" performers have given clear evidence of falling standards of exactly the sort that demagogues love to write. Or have they? Down at the bottom, the low group have shown an equivalent improvement. This is just the sort of growth that educational do-gooders love to clasp to their bosom and claim for their own.
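A small simulation shows the effect with no teaching at all. This is a sketch under stated assumptions: each score is a fixed "systematic" component plus an independent "chance" component, both drawn from a standard normal distribution, and nothing happens between the two tests. All names and sizes are invented for the illustration.

```python
import random

def simulate_retest(n=5000, seed=1):
    """Return the mean change (second test minus first) for the bottom and
    top tenths of the first test, with no instruction in between."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        systematic = rng.gauss(0, 1)           # stays the same on both tests
        first = systematic + rng.gauss(0, 1)   # chance factors, first test
        second = systematic + rng.gauss(0, 1)  # fresh chance factors, retest
        people.append((first, second))
    people.sort(key=lambda p: p[0])            # rank on the first test only
    tenth = n // 10
    bottom, top = people[:tenth], people[-tenth:]

    def mean_change(group):
        return sum(second - first for first, second in group) / len(group)

    return mean_change(bottom), mean_change(top)

bottom_change, top_change = simulate_retest()
print(f"bottom tenth: {bottom_change:+.2f}   top tenth: {top_change:+.2f}")
```

The bottom group "improves" and the top group "declines" although nobody's underlying ability has changed, which is why a study that admits only low scorers can expect a gain before any teaching starts.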
We have now reached (I hope) that happy point where we may consider the claims and offers about PLATO that were laid before the educational community, confident that we have some of the necessary grains of salt ready at our sides.
* * * * *
The marketing of PLATO passed in the late 70s to Control Data Corporation, and rather than just marketing the idea, CDC wished to sell programs as well as terminals and processing. The content area chosen matched a perceived need: basic skills, mostly for disadvantaged students of one sort or another.
Most of the studies seem to have involved one or more Control Data personnel, as do most of the available public documents. Through the good offices of Ms Lane Blume, Control Data Australia, I have been able to obtain a bound set of photocopies of what appear to be the papers collected by Dr Peter J Rizza, educational consultant to the Control Data Education Company at Minneapolis.
Some of these papers have authors, some do not. One is even labelled "NOT FOR PUBLICATION OR ATTRIBUTION" (Study 3).
The papers total more than 300 pages, are incomplete, and quite possibly out of chronological order. I can only attempt to draw selections from these and the matching public documents, in the hope that a pattern will emerge. These papers deal with the PLATO system, as it was used to present the Basic Skills Learning System, or BSLS. These are supposed to be adult materials, but are they?
In January 1979, David F. Fry, Supervisor of Instructional Systems, wrote to Rizza and commented:
"Looking at the total BSLS package from the viewpoint of an instructional developer who has been shown the advertising claims and statements, I was a little disappointed. You should tell the brochure writers not to claim "multi-media package" when the only other media provided is [sic] at best secondary and motivational. The texts were never "prescribed" nor were the video tapes, except for the first one. In my opinion the video tapes should not be used for adults. My students were embarrassed and uneasy when viewing the tapes. I had to use them in groups because the program never referred to them. The workbooks provided practice in working the problems, but were not adequate as alternate methods of instruction. They should be rewritten." (p.201)
This appears to imply that the materials were originally written for children: could it be that Control Data learned to hanker after a more lucrative market? Rizza and Caldwell are quite specific about the target, but while their paper is undated (other than a non-committal "1979"), the evidence of the ERIC Clearinghouse number implies a date late in 1979 (a point which will be reintroduced later in this paper).
Rizza and Walker-Hunter, dated January 1979, and so writing before Fry's letter, say the target population may be found "in a variety of settings: adult basic education centres, correctional institutes, and unemployment lines". Here we see less emphasis on an adult-centred system. Two of the major evaluation projects which were carried out in 1978 were centred on the use of BSLS in schools in Baltimore City and Florida. The claim that BSLS was written for adults does not appear to be wholly proven.
Study 1 in the CDC papers is actually a report on two studies carried out with adult learners in Baltimore City. Most, but not all, of the students showed gains in both reading and maths. Of the 11 students who had completed the PLATO reading course, all had gained, with a mean gain of 0.8 grades. This is a depressed estimate, as two students in the post-test had reached the Grade 9 ceiling of the Level 2 ABLE test. The 13 students who had not completed also had a gain score of 0.8 grades, also a depressed estimate, for the same reason. On average, the non-completers had completed less than two of the five units on reading. If linear growth were predicated, this would imply an overall growth of 2.2 grades. The alternative possibility is that a Beggs and Hieronymus effect is working.
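The linear extrapolation can be checked directly. The text gives the non-completers' progress only as "less than two" of five units, so the 1.82 units below is back-calculated from the 2.2 figure and is not a number from the CDC papers; the function name is likewise invented.

```python
def extrapolated_gain(observed_gain, units_done, units_total=5):
    """Gain expected for the whole course if growth were linear in units."""
    return observed_gain * units_total / units_done

# At exactly two units, a 0.8 grade gain scales to 2.0 grades
print(round(extrapolated_gain(0.8, 2.0), 2))  # 2.0

# Units completed that would turn 0.8 into the quoted 2.2 (back-calculated)
units_implied = 0.8 * 5 / 2.2
print(round(units_implied, 2))                # 1.82
```

Either way, the same 0.8 grades turning up for completers and non-completers alike is hard to square with growth that depends on the course.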
In mathematics, the completers performed better than the non-completers, and the relationship was roughly linear. The completers had gained 1.8 grades, the non-completers had gained 1.2 grades with two-thirds of the work completed. As the mean entry score of the completers was 6.3, and the mean entry score of the non-completers was 4.8 (grades), this result is surprising. Caldwell and Rizza (1979) state that the approach adopted by BSLS is a mastery one. If this is so, then all performers should come out at the same level, and so the lower group should show a greater gain. Of the 27 non-completers, 5* showed losses, one showed no gain, and four showed gains of less than 0.2.
[* The numeral 5 was missing in the presented paper in the previous line, but was found in the "acid drops".]
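The mathematics figures can be laid out the same way. This sketch uses only the means quoted above and assumes, for the mastery case, that the lower group would have to reach the completers' exit level; the variable names are invented.

```python
completer_entry, completer_gain = 6.3, 1.8
noncompleter_entry, noncompleter_gain = 4.8, 1.2
fraction_of_work_done = 2 / 3

# Linear model: two-thirds of the work should give two-thirds of the gain
linear_prediction = round(completer_gain * fraction_of_work_done, 1)
print(linear_prediction)    # 1.2, which matches the non-completers' gain

# Mastery model (assumed): everyone exits at the completers' level
mastery_exit_level = round(completer_entry + completer_gain, 1)
mastery_gain_needed = round(mastery_exit_level - noncompleter_entry, 1)
print(mastery_exit_level)   # 8.1
print(mastery_gain_needed)  # 3.3
```

A gain of 1.2 grades from a start of 4.8 leaves the lower group at about 6.0, well short of a common exit level.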
A summary of attrition levels is fairly impressive: of 135 enrollees, 8 are described as "dropped", while another 23 left under "extenuating circumstances" (which are not defined). This is good, although possibly attributable in part to the novelty value of computer learning. Rizza and Walker-Hunter (1979) clearly see this as a strong point: "Attendance was good; the drop-out rate was only 6 percent…".
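The 6 percent figure depends on which departures are counted. A quick check, using only the numbers quoted above (the helper name is invented):

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)

enrolled = 135
dropped = 8
left_extenuating = 23

print(percent(dropped, enrolled))                     # 5.9
print(percent(dropped + left_extenuating, enrolled))  # 23.0
```

Counting only the students described as "dropped" gives the published 6 percent; counting everyone who left gives nearly a quarter of the enrolment.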
Study 3 (there is no study 2) also looks at the Baltimore Adult Learning Centre, and was received by Peter J Rizza (according to a stamp on the title page) on March 23, 1979. This is after the publication of Rizza and Walker-Hunter (1979), and so it is not quoted there. The main interesting feature of this study is that some of the post-test scores exceed 9.0. A footnote on each page of the results tells us that
"Post-test scores of 9.0+ were estimated at the rate of 0.15 grade-level increase for each raw point above 53". This did not need to be done with pre-test scores, since all students over 8.5 grades have been deleted from the study, thus probably boosting the regression effect. The gain-scores are swelled by about 10% by this approach.
Study 4 is also on the use of PLATO in Baltimore, but this time, the users were school pupils in 7th grade. On page 90-a, we read "For 107 seventh graders, who averaged only thirteen hours each on PLATO, a mean gain score of 5.7 was found. (This was a raw score gain in terms of number of correct problems out of forty.)"
The test used was the Baltimore City Proficiency Test, and it was administered to all of the city's 6th and 8th graders, who showed gains of 4.5 and 3 respectively. ("The seventh grade test was not given system wide, so seventh grade comparison figures were not available.") If this means what it says, a separate test was used on each grade, so that gain scores cannot in any way be compared. And even if the same test is used, we do not have the norms to tell us what to expect of 7th grade. The author(s) use a t-test to show that the 7th grade result is significantly different from the 6th and 8th grade.
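The footnote's rule amounts to extending the scale past the test's ceiling. The sketch below assumes that a raw score of 53 sits at grade 9.0, which is inferred from the wording of the footnote and not confirmed; the raw score of 57 is a hypothetical example, not a figure from Study 3.

```python
def estimated_post_test_grade(raw_score, ceiling_raw=53,
                              ceiling_grade=9.0, per_point=0.15):
    """Grade level extrapolated above the ceiling at 0.15 grades per raw point.

    Assumes raw score 53 corresponds to grade 9.0 (inferred, not confirmed).
    """
    if raw_score <= ceiling_raw:
        raise ValueError("the rule only applies above the ceiling raw score")
    return round(ceiling_grade + per_point * (raw_score - ceiling_raw), 2)

# Hypothetical student four raw points above the ceiling
print(estimated_post_test_grade(57))  # 9.6
```

Because the rule is applied to post-test scores only, every extrapolated point adds to the gain and none can subtract from it.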
Page 95 shows us that students who had completed more of the PLATO course had higher gain scores. The possibility that both are influenced by some other factor (mathematical ability?) is not discussed.
One thing that can be said for this study is that there is probably not a Beggs and Hieronymus effect operating: the pre-test was in November, two months into the school year. In this context, it is curious to note that
"…almost all math students spent the majority of their time working to improve whole number skills. The forty-problem proficiency test used as the measure of achievement contained only four problems that required straight-forward whole number computational skills."
A second paper appears to refer to the same study, but there are minor differences in the numbers. There are now 96 7th graders using PLATO, and their gain score is 6.13. There is also a control group of 47 with a gain score of 7.73 (in statistical terms, this is not significant: p = .14).
Results are also available for a senior high school group. The gain score for the control group was 3.11, while for the PLATO group it was 1.67. The PLATO gain appears to have come from the improvements for a few poor individuals:
Study 5, on the other hand, is quoted (in part at least) by both Rizza and Walker-Hunter and Caldwell and Rizza:
"Students at Stillwater gained an average of 1.6 grade levels in reading achievement and 2.16 grade levels in mathematics as measured by ABLE. Statistical analysis showed that gains in reading were significant even with small number of cases. (p .06)." (Rizza and Walker-Hunter, p.23).
Caldwell and Rizza supply this table:
The mathematics gain of 2.16 must also be reduced, since post-test grades of 12.4 and 10.1 appear. The best estimate becomes 1.26+.
Before leaving Caldwell and Rizza's table, it is worth quoting Park. Perhaps this explains the zero gain score for the "Fair Break" group:
"The Fair Break group all had access to terminals and the teachers were unable to provide facilities for a control group." (p.147) and "There were no controls in the Fair Break Learning Center..." (p.148).
[Interpolated comment: at this point in my presentation, I raised my eyebrows and said, very slowly, "There was no control group." My audience got it, and I guess if you have read this far, you will have got it as well.]
* * * * *
In Rizza and Walker-Hunter (dated January 1979, hence written in late 1978) we read: "At the Adult Learning Centre and the Fair Break Learning Centre, adults referred by city training programs were able to achieve measurable progress in both reading and math. Due to the lack of a control group, it was difficult to show the gains to be statistically significant." (emphasis added)
Park also undertook a similar small study at Willow River, with 7 in the PLATO group and 3 controls. This produced anomalous results, and this may be why neither Rizza and Walker-Hunter nor Caldwell and Rizza reported it. The PLATO students lost 0.3 grades in reading while the controls gained 0.2. The PLATO students gained 0.5 grades in mathematics while the control group gained 0.36. The total time given over to study for all students appears to have been only about six hours. One PLATO group pre-test score is stated as 9.2, but this is not explained.
The Fair Break study has already been mentioned in the context of the control group that never was. The raw data make interesting reading, especially in the context of an internal Control Data memo from Peggy Walker-Hunter to Peter Rizza which is attached. A copy of Park's Table 4 is also attached.
The second paragraph tells us that Level 3 of ABLE was in fact used in the Fair Break project as a post-test, but not as a pre-test: "...it is still impossible to establish a grade level gain when the pre test is inaccurate." Again, in paragraph 3 we find that times were not recorded for the St Paul students: "...staff had to look at group records and guess at the amount of time spent in each curriculum. In some cases, it was just too difficult to determine". No blanks appear in Park's Table 4, and the same figure (11 hours) is quoted by both Rizza and Walker-Hunter (published January 1979) and Caldwell and Rizza (1979, no month, but submitted to ERIC in late 1979, on the basis of clearinghouse accession numbers). Walker-Hunter's memo is dated 7th March, 1979.
[Interpolated comment: I was also submitting material to ERIC in 1979, and I kept meticulous records of my submissions, and as I had my (and Caldwell and Rizza's) accession numbers, I had a very good idea of submission dates. One of my submissions went off by air mail from Australia in late 1979, and their paper had a higher accession number, so it arrived later. These are the trivia that catch shonky operators out.]
In paragraph 4, Walker-Hunter writes "...there emerge only eight students with accurate pre- and post-test scores and time data in reading." (This was from a starting total of 38.) Then in paragraph 5, we read
"In view of this situation, I simply determined the average entry and exit level of the students (eliminating those with "9+" scores either pre or post) and computed the average grade level gain, ignoring time on task altogether."
[The table, taken from the AARE paper.]
There are several notable things in this quotation. The eight become fourteen, probably because time data have been ignored. And the post-test reading mean is 9.15, when all individuals over 9.0 have been deleted.
It appears impossible to reconcile Walker-Hunter's quoted calculations with the results which are appended to her memo, or to Park's Table Four. Park's eight students would not, one would expect, have time data which are valid. There are seven students so noted in Walker-Hunter's data. Park's participant 3 appears to be Walker-Hunter's 022, and Park's 6 appears to be Walker-Hunter's 001. If this is so, then why are Park's pre-test scores given as 9.0 instead of 9.0+? Park's 8 looks a bit like 005, Park's 1 is like 009, her 5 could be 013, possibly her 7 is 002. But there are discrepancies, and the match gets worse as we proceed.
Rizza and Walker-Hunter had claimed
"Students gained an average of 1.8 grade levels in reading and 2.6 grade levels in mathematics. Both gains were statistically significant." (p.23)
This is much better than Walker-Hunter's 0.62 and 1.9, figures which do not seem to have been made available in any scholarly or promotional publication.
Interestingly, Caldwell and Rizza comment on the Stillwater and Fair Break projects: "Each site utilized approximately twenty (20) students...". Park's Table 2 (p.157) and Table 4 (p.159) show that there are results for only eight (8) students in each case.
The most important point, though, is the discrepancy between the "experimental" and "control" groups in the Stillwater project. The "control" group had a significantly poorer performance on the mathematics pre-test (p = .008) than did the "experimental" group, even though the numbers were so small.
Study 7 was the work of Fairweather, which we encountered briefly. His most biting criticism was over hardware issues, but he also had doubts about the suitability of the material:
"Repeatedly, certain inmates needed convincing that the Basic Skills materials were designed for adults and that they were part of a continuum that led to the high school equivalency certificate. Although the inmates responded well to the animations the benefits of the graphics were offset by the perception that the materials were inappropriate for study by adults." (p.179)
This is one study which recognises the "...problems involved in using gain scores to evaluate a project of this sort..." (p.181) but pleads that "...he did not have time to design a mancova program." (p.181).
One of the most unusual aspects of this study is that the researcher calculated a series of regression equations to fit PLATO study time to learning gains. The clear implication of these equations on page 183 (copy attached) is that one gains about one and one half grades on each of reading vocabulary and spelling before even touching the keyboard! In the case of spelling, a loss is incurred which increases with exposure to PLATO. If we ignore bizarre temporal theories, we are left with two possibilities. Geof Hawke (pers. comm.) argues that the linear regression model is probably wrong, and if it were correct, it ought to be forced through the origin. My own view is that the y-intercept indicates the operation of the Beggs and Hieronymus effect in adult learners. In view of the short term involved, natural maturation may be rejected.
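The two readings of the intercept can be illustrated with a toy fit. The hours and gains below are made up so that they lie exactly on a line with an intercept of 1.5 grades; they are not Fairweather's data, and the functions are a plain least-squares sketch, not his method.

```python
def fit_line(xs, ys):
    """Ordinary least squares with an intercept: returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_through_origin(xs, ys):
    """Least squares forced through the origin: returns the slope only."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Made-up illustration: gain = 1.5 + 0.02 * hours
hours = [5, 10, 15, 20, 25]
gains = [1.6, 1.7, 1.8, 1.9, 2.0]

intercept, slope = fit_line(hours, gains)
print(round(intercept, 2), round(slope, 3))         # 1.5 0.02
print(round(fit_through_origin(hours, gains), 3))   # 0.102
```

With an intercept allowed, almost all of the gain is credited to something other than study time. Forcing the line through the origin hands that same gain to the hours on PLATO and makes each hour look about five times as effective.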
Study 9 relates to remedial mathematics for college students unable to meet college requirements. There was an existing program, using hand-held calculators. The challenge to PLATO here was to match the results of a well-thought-out program designed to meet certain objectives which might not be found in the BSLS system. The result was that PLATO came off second-best, except in the area of ABLE word problems in arithmetic. (It has not previously been mentioned that there are two separate ABLE arithmetic scales.)
This was curious, in that the "Calculator Basic" students were drilled in word problems, while the PLATO students were drilled in computation.
This report does not offer sufficient data for any real analysis, but it appears that when pre-determined objectives are to be taught, PLATO may prove relatively inefficient.
Studies 10 and 11 relate to school use, and have no data of note.
In conclusion, at the time of my study, there was not one study which compared PLATO with an equally expensive traditional system. There was not one study in which a properly controlled comparison took place. There was not one study which was written up, complete with data, in the professional literature. And there was not one valid study showing PLATO to be better than traditional approaches. The potential is there, but I do not believe that it has yet been realised.
Beggs, Donald L., and Hieronymus, Albert N., Uniformity of growth in the basic skills throughout the school year and during the summer. Journal of Educational Measurement 5(2), 1968, 91-97.
Caldwell, Robert M and Rizza, Peter J. A Computer Based System of Reading Instruction for Adult Non-readers. ED 184 554, 1979.
Control Data Corporation: Basic Skills Learning System: Evaluation Report: May 1979. (No other details supplied.)
Fairweather, Peter (1978): See Control Data Corporation.
Fry, Edward B., untitled review, excerpted in Buros, O.K., The Seventh Mental Measurements Yearbook. New Jersey: The Gryphon Press, 1972.
Fry, David F. (1979): See Control Data Corporation.
Galen, Nancy: Informal Reading Inventories for Adults: An Analysis, Lifelong Learning: the adult years, 3(7), 1980, 10-14.
Hall, James N., The Adult Basic Learning Examination. Journal of Educational Measurement, 5(3), 1968, 271-274
Hieronymus, A. N. Review of Levels 1 and 2, Adult Basic Learning Examination in Buros, O.K., The Seventh Mental Measurements Yearbook. New Jersey: The Gryphon Press, 1972.
Nafziger et al. Tests of Functional Adult Literacy: an Evaluation of Currently Available Instruments. Portland, Oregon: Northwest Regional Education Laboratory, 1975.
Park, Rosemarie ( ): See Control Data Corporation.
Rizza, Peter J. and Walker-Hunter, Peggy, New Technology Solves an Old Problem: Functional Illiteracy. Audiovisual Instruction 24(1), 1979, 22-23, 63.
Stake, Robert E. Measuring What Students Learn, in House, Ernest R. (ed.) School Evaluation: The Politics and Process, Berkeley: McCutchan Publishing Corporation, 1973, pp. 193-223.
Walker-Hunter, Peggy (1979): See Control Data Corporation.
The above material is a lightly-edited and annotated version of a paper delivered to AARE in 1986 in Melbourne. Tables are taken from original material in my possession.
Disclaimer: This text was converted from the paper read to AARE using OCR, and in a late stage of checking, the phrase "post hog ergo propter hog" was detected. After a struggle with my conscience (I decline to say who won), I amended this. I remain uncertain that the initial version was not more apposite, and I suspect this may well be the view of the majority of those who have read my account of such an inept and fraudulent evaluation.
By the way, if you were involved and you are thinking of taking legal action, this is just a small sample of what you will have to justify. I have left your name out, for now, but that doesn't mean I don't know it. If you take action in any way whatsoever, to annoy me, I will mount a truth and public benefit defence and name you.
The choice is yours, the pleasure will be mine.
Here is a table that will show you what went down, and serve as a sample of what I hold. If you know your numbers, this shrieks. If you don't, look at the average of the reading pre-test scores, look at the alleged gain scores of participants 1, 2 and 4. This was either incompetent or fraudulent, and letting the raw data out shows gross stupidity.