HOW TO MEASURE

THE MACMILLAN COMPANY
NEW YORK • BOSTON • CHICAGO • DALLAS • ATLANTA • SAN FRANCISCO
MACMILLAN & CO., Limited
LONDON • BOMBAY • CALCUTTA • MELBOURNE
THE MACMILLAN CO. OF CANADA, Ltd.
TORONTO

BY G. M. WILSON, Ph.D.
PROFESSOR OF VOCATIONAL EDUCATION AND DIRECTOR OF THE SUMMER SESSION, IOWA STATE COLLEGE OF AGRICULTURE AND MECHANIC ARTS, AMES, IOWA

AND KREMER J. HOKE, Ph.D.
DEAN AND PROFESSOR OF EDUCATION, COLLEGE OF WILLIAM AND MARY, WILLIAMSBURG, VIRGINIA
FORMERLY SUPERINTENDENT OF PUBLIC SCHOOLS, DULUTH, MINNESOTA

New York
THE MACMILLAN COMPANY
1920
All rights reserved

Copyright, 1920, By THE MACMILLAN COMPANY.
Set up and electrotyped. Published November, 1920.

Norwood Press
J. S. Cushing Co. - Berwick & Smith Co.
Norwood, Mass., U.S.A.

PREFACE

The present volume on educational measurement is dominated by two main ideas: first, that the work in measurement should be handled more and more by the individual classroom teacher; and second, that the chief purpose to be served by standard tests is the diagnosis of pupil ability and pupil difficulties. When standard tests were first devised and during the experimental stage, it was well to leave their administration largely in the hands of experts. But now that the value of such tests has been fully demonstrated, it is important that all teachers master the technique of scientific testing, and that courses in educational measurement be included as a necessary training for all teachers. Standard tests are a new and valuable educational tool or set of tools. No teacher is fully equipped who has not mastered their use.

The value of standard tests for diagnostic purposes is now generally recognized. Diagnostic tests are replacing others not well adapted for such purposes. This means that the tests are used to locate pupil weaknesses in order that such weaknesses may be corrected. The individual child thus becomes the center and object of the work.
It is not school systems as such but children that are important. That we have so quickly, in the use of standard tests, come to recognize that the child is the true center and the true object of consideration, is an indication that to-day as never before the spirit of progress and service is dominating and determining all educational effort.

The purpose of this volume is not a critical evaluation of all the available tests on different subjects, but a treatment of those tests which on account of their use, purpose and adaptability have been found to be most serviceable to the classroom teacher. To this end classroom results from the use of certain tests and the teachers’ reactions to them expressed in their own words have been freely used. The work will serve as a handbook for the classroom teacher and also as a text for use in teacher training classes in high schools, normal schools, and colleges, or as a basis for reading circle work.

The authors have drawn freely from the many available sources and they acknowledge the many courtesies, little and great, extended by authors of standard tests and by cooperating teachers and superintendents.

While the present work is issued as a completed volume, it is expected that revisions will be necessary, since standard tests have undoubtedly not reached their final development. It will be the purpose of the authors to keep the work up to date by as frequent revisions as may be necessary in order to provide teachers with the latest and best essential data on educational measurements.

G. M. Wilson
Kremer J. Hoke
Ames, Iowa, January 1, 1920.

CONTENTS

CHAPTER
I. The New Attitude toward Measurement
II. The Measurement of Spelling
III. The Measurement of Handwriting
IV. The Measurement of Arithmetic
V. The Measurement of Reading
VI. The Measurement of English Composition
VII. The Measurement of Drawing
VIII. The Measurement of Other Grade Subjects
IX. The Measurement of High School Subjects
X. The Measurement of General Intelligence
XI. Statistical Terms and Methods
XII. The Teachers’ Use of Scales and Standardized Tests
Appendix
Index

CHAPTER I
THE NEW ATTITUDE TOWARD MEASUREMENT

When Dr. J. M. Rice, a little more than twenty years ago, published his studies applying scientific measurement to the results of teaching, there was a storm of protest by educators from one end of the country to the other. It was apparent that the educational leaders of the country were not ready to follow Dr. Rice’s lead. The present movement started with studies of a somewhat different nature, such as Thorndike’s notable study on “The Elimination of Children from the School” and studies by Strayer and Elliott upon school costs. The application of scientific methods to these phases of education was received with more favor by educators, and the emphasis was gradually shifted to the measurement of subject matter through the use of scales and standardized tests. Thus, after two decades, Dr. Rice’s viewpoint was accepted and his methods improved upon.

It was fortunate that the first scale for the measurement of subject matter was the one in handwriting. The value of this scale became apparent immediately, and by degrees standards for grade attainment in speed and quality were set up. These standards were first developed by practical school men through the actual use of the scale.1

1 Wilson, G. M., “The Handwriting of School Children,” Elementary School Teacher, 11: June, 1911, pp. 540-543, and Freeman, Frank W., “Some Practical Studies of Handwriting,” Elementary School Journal, 14: Dec. 1913, pp. 167-179. Partly reported by J. F. Bobbitt in the Twelfth Yearbook of the National Society for the Study of Education, Part I, 1913, pp. 7-96.
Even so, when it was proposed at the Philadelphia meeting of the department of superintendence in 1913 that a committee on school efficiency be appointed, there was vigorous opposition. The proposal was merely for the appointment of a committee, yet a decision required a standing vote, and the proposal carried by a majority of only one. The next year, at the Richmond meeting of the department of superintendence, it was surprising to note the change in sentiment.

The growth that may take place with an individual in a single year is well illustrated by the remarks of Superintendent Ben Bluett, of the St. Louis public schools. At the Philadelphia meeting, in his usual sincere and thorough way of thinking, he was very much disturbed that a group of young men should propose the measurement of “childhood,” “mother love,” and other intangible elements of the educative process. There was, in fact, never any intention of trying to measure these elements, but such terms were used by the opposition, and it was Ben Bluett’s impassioned appeal against such procedure which had much to do with the large vote against the proposal for the appointment of a committee on measurement and school efficiency. A year later it was generally agreed that the feature of the Richmond meeting was Ben Bluett’s confession. He had been made a member of the committee appointed at Philadelphia. He had met with this committee, fifteen in number, several times during the year, and had studied the question earnestly with the other members of the committee. He had begun to realize the significance of the movement and had secured the cooperation of Dr. Withers of the St. Louis College for Teachers in applying some of the tests in the St. Louis schools.
The loyal, sincere, whole-hearted manner in which Ben Bluett acknowledged his lack of understanding of the movement a year before, and his thorough conversion to the advantages of the movement, swept away whatever opposition there may have been in the Richmond meeting. From that time forward the progress of the movement has been only a question of ways and means, and better adaptation to secure the desired results. Even the school survey movement, that phase of the school efficiency movement which has been most feared by superintendents because of its frequent use by an opposition to discredit the work of the schools, has entered upon new life and has become an integral part of the American public school system.

It must not be assumed, however, that the work in measurement in the public schools has been perfected. It has passed the first stages. Leaders are convinced. Useful scales and tests have been developed. The technique of formulating a test has been further perfected and the value of a scientific test is better understood. In some respects we have entered the second stage of measurement. We have come to the point of discriminating between good and bad tests. Already a few standardized tests have been discarded.

We are now quite surely approaching a third stage of development, and that is the stage in which the tests shall be thoroughly weighed and judged as to the fundamental considerations of curricula making involved, whether they are or are not testing desirable school products, and whether their use will or will not lead to better methods of teaching and better selection of subject matter. In this stage the standard tests will be used more and more for the diagnosis of the weaknesses of individual pupils, more and more in testing the efficiency of methods of teaching. It is in this third stage that the rank and file of the teaching profession are necessarily involved.
If the tests are to be of service, not merely as a general measure of the efficiency of a school system, but also of service to the teacher and for the pupils in the schoolroom, then it becomes necessary that the individual teacher shall master the details for actually using the tests in her own schoolroom. This is not too much to expect if a man well beyond sixty, as was Superintendent Bluett, could approach this movement with an open mind and accept its benefits after a year of conscientious study.

That teachers are interested and keen to master the accumulated knowledge with regard to measurement is more and more apparent. Hence this effort is made to bring together the various contributions on the subject in form for use by the teacher. It is true, of course, that we shall make slow progress in educating the entire teaching profession until teachers become a trained body of professional educators with permanent tenure. But for this it were unwise to wait. In the meantime, may we not expect that any one who has accepted the responsibilities of the teaching profession will consider that she owes it to herself and to her pupils to master the details of using scales and standardized tests for the measurement of subject matter?

CHAPTER II
THE MEASUREMENT OF SPELLING

There are at present several spelling tests available. Before deciding on which one to select for use, it will be well to consider what should be tested in spelling.1 It appears that a person needs to spell only when he writes. People are therefore good spellers, for all social purposes, when they spell correctly the words which they use in their written work, such as writing letters, articles, club papers, compositions, school exercises, business notes, and the like. Manifestly the words used under such circumstances are the foundation words of the English language.
The first requirement of a test in spelling, therefore, is that it be based upon the common fundamental words of the English language.

1 The teacher who wants further help on the value of measurement in education should take time to read Chap. XII before proceeding with the present chapter. The teacher unfamiliar with statistical terms will need to consult Chap. XI as terms occur in this and succeeding chapters. For the practical uses of the spelling tests, see the last section of this chapter, beginning on page 19.

What to Test. Much progress has been made in determining the fundamental words in the English language. Dr. W. Franklin Jones, at the University of South Dakota, studied the writing vocabulary of grade pupils by analyzing the words in the composition work of 1050 pupils residing in four different states. The work was so managed as to lead pupils to cover all the various fields of experience, and so exhaust the words in their several vocabularies. The pupils continued to write until new words ceased to appear in their compositions. In all, 75,000 themes were secured, consisting of a total of 15,000,000 words. Dr. Jones spent eight years collecting and scoring these data. When the work was completed, it was found that a total of only 4532 different words had been used by all these pupils. The largest single vocabulary consisted of 2812 words, the vocabulary of an eighth grade girl.

The result of this study was to give a list of words which accurately represents the fundamental words used by school children. Apparently, it contains also the fundamental words of the English language.

Other studies have been made. One of similar character, which has led to the formation of a spelling scale, was conducted by Dr. Leonard P. Ayres. Dr. Ayres examined a total of 368,000 words written by 2500 different persons. This was a summary of previous studies.
The first of these studies included in all about 100,000 words taken from standard literary selections. The second was an analysis of 250 different articles which appeared in four Sunday newspapers published in Buffalo. The third consisted of the tabulations of 23,629 words from 2000 short business letters. The fourth consisted of some 200,000 words taken from the family correspondence of 13 adults.

The Ayres study has the advantage of being based upon the words used by adults, and if we assume that the schools must prepare for active social participation on the adult level, then certainly Dr. Ayres’ study would be above criticism from the standpoint of determining the fundamental words of the English language in common use for writing purposes.

The Jones study and the Ayres study are in complete agreement as to the simplicity and small compass of the writing vocabulary. Any adequate test must be based upon the words of the language that are in common use and fundamental in written work.

The Ayres Scale. In undertaking to form a scale for testing the spelling of school pupils, the first thing which Dr. Ayres did was to determine the words which were most fundamental. The 368,000 words of his study were made up largely of repetitions. Fifty different words were repeated so frequently that they made up approximately half of the entire list. Dr. Ayres had fixed upon 1000 words as the number which he should select. In order to get the 1000 words, he finally took all words which had been repeated as many as 44 times in the entire study.

The next step was to arrange the different words according to difficulty, in order to secure a graded test, or, in other words, a spelling scale. To determine the relative difficulty of the words in the 1000 list, Dr. Ayres arranged to have the words spelled by school pupils. Fifty lists of 20 words each were constructed, and the words included in these lists were pronounced to the pupils of the various grades in the middle of the school year in the schools of 84 cities scattered throughout the United States. The data secured from these tests gave a total of 1,400,000 spellings by 70,000 school children. On the basis of these data, the 1000 words were divided into 26 groups according to difficulty. This will be understood by reference to the scale. (See scale inserted herewith.)

MEASURING SCALE FOR ABILITY IN SPELLING

Rows of the printed scale: SECOND GRADE, THIRD GRADE, FOURTH GRADE, FIFTH GRADE, SIXTH GRADE, SEVENTH GRADE, EIGHTH GRADE. [The per cent figures printed at the top of each column and the divisions between columns A to Z are not preserved in this copy. The words follow in scale order, from column A to column Z.]

me do and go at on a it is she can see run the in so no now man ten bed top he you will we an my up last not us am good little ago old bad red of be but this all your out time may into him today look did like six boy book by have are had over must make school street say come hand ring live kill late let big mother three land cold hot hat child ice play sea day eat sit lot box belong door yes low soft stand yard bring tell five ball law ask just way get home much call long love then house year to I as send one has some if how her them other baby well about men for ran was that his led lay nine face miss ride tree sick got north white spent foot blow block spring river plant cut song winter stone free lake page nice end fall feet went back away paper put each soon came Sunday show Monday yet find give new letter take Mr. after thing what than its very or thank dear west sold told best form far gave alike add seven forget happy noon think sister cast card south deep inside blue post town stay grand outside dark band game boat rest east son help hard race cover fire age gold read fine cannot May line left ship train saw pay large near down why bill want girl part still place report never found side kind life here car word every under most made said work our more when from wind print air fill along lost name room hope same glad with mine became brother rain keep start mail eye glass party upon two they would any could should city only where week first sent mile seem even without afternoon Friday hour wife state July head story open short lady reach better water round cost price become class horse care try move delay pound behind around burn camp bear clear clean spell poor finish hurt maybe across tonight tenth sir these club seen felt full fail set stamp light coming cent night pass shut easy catch black warm unless clothing began able gone suit track watch dash fell fight buy stop walk grant soap news small war summer above express turn lesson half father anything table high talk June right date road March next indeed four herself power wish because world country meet another trip list people ever held church once own before know were dead leave early close flower nothing ground lead such many morning however mind shall alone order third push point within done body trust extra dress beside teach happen begun collect file provide sight stood fix born goes hold drill army pretty stole income bought paid enter railroad unable ticket account driven real recover mountain steamer speak past might begin contract deal almost brought less event off true took again inform both heart month children build understand follow charge says member case while also return those office great Miss who died change wire few please picture money ready omit anyway except aunt capture wrote else bridge offer suffer built center front rule carry chain death learn wonder tire pair check prove heard inspect itself always something write expect need thus woman young fair dollar evening plan broke feel sure least sorry press God teacher November subject April history cause study himself matter use thought person nor January mean vote court copy act been yesterday among question doctor hear size December dozen there tax number October reason fifth eight afraid uncle rather comfort elect aboard jail shed retire refuse district restrain royal objection pleasure navy fourth population proper judge weather worth contain figure sudden forty instead throw personal everything rate chief perfect second slide farther duty intend company quite none knew remain direct appear liberty enough fact board September station attend between public friend during through police until madam truly whole address request raise August Tuesday struck getting don’t Thursday spend enjoy awful usual complaint auto vacation beautiful flight travel rapid repair trouble entrance importance carried loss fortune empire mayor wait beg degree prison engine visit guest department obtain family favor Mrs. husband amount human view election clerk though o'clock support does regard escape since which length destroy newspaper daughter answer reply oblige sail cities known several desire nearly sometimes declare engage final terrible surprise period addition employ property select connection firm region convict private command debate crowd factory publish represent term section relative progress entire president measure famous serve estate remember either effort important due include running allow position field ledge claim primary result Saturday appoint information whom arrest themselves special women present action justice gentleman enclose await suppose wonderful direction forward although prompt attempt whose statement perhaps their imprison written arrange forenoon lose combination avenue neighbor weigh wear entertain salary visitor publication machine toward success drown adopt secure honor promise wreck prepare vessel busy prefer illustrate different object provision according already attention education director purpose common diamond together convention increase manner feature article service injure effect distribute general tomorrow consider against complete search treasure popular Christmas interest often stopped motion theater improvement century total mention arrive supply assist difference examination particular affair course neither local marriage further serious doubt condition government opinion believe system possible piece certain witness investigate therefore too pleasant guess circular argument volume organize summon official victim estimate accident invitation accept impossible concern associate automobile various decide entitle political national recent business refer minute ought absence conference Wednesday really celebration folks meant earliest whether distinguish consideration colonies assure relief occupy probably foreign expense responsible beginning application difficulty scene finally develop circumstance issue material suggest mere senate receive respectfully agreement unfortunate majority elaborate citizen necessary divide principal testimony discussion arrangement reference evidence experience session secretary association career height organization emergency appreciate sincerely athletic extreme practical proceed cordially character separate February immediate convenient receipt preliminary disappoint especially annual committee decision principle judgment recommend allege

All the words in each column are of approximately equal spelling difficulty. The steps in spelling difficulty from each column to the next are approximately equal steps. The numbers at the top indicate about what per cent of correct spellings may be expected among the children of the different grades. For example, if 20 words from column H are given as a spelling test it may be expected that the average score for an entire second grade spelling them will be about 79 per cent. For a third grade it should be about 92 per cent, for a fourth grade about 98 per cent, and for a fifth grade about 100 per cent.

The limits of the groups are as follows: 50 means from 46 through 54 per cent; 58 means from 55 through 62 per cent; 66 means from 63 through 69 per cent; 73 means from 70 through 76 per cent; 79 means from 77 through 81 per cent; 84 means from 82 through 86 per cent; 88 means from 87 through 90 per cent; 92 means from 91 through 93 per cent; 94 means 94 and 95 per cent; 96 means 96 and 97 per cent; while 98, 99, and 100 per cent are separate groups.

By means of these groupings a child’s spelling ability may be located in terms of grades. Thus if a child were given a 20 word spelling test from the words of column O and spelled 15 words, or 75 per cent of them, correctly it would be proper to say that he showed fourth grade spelling ability. If he spelled correctly 17 words, or 85 per cent, he would show fifth grade ability, and so on.

Russell Sage Foundation, New York City. Division of Education. Leonard P. Ayres, Director.

The data of this scale are computed from an aggregate of 1,400,000 spellings by 70,000 children in 84 cities throughout the country. The words are 1,000 in number and the list is the product of combining different studies with the object of identifying the 1,000 commonest words in English writing. Copies of this scale may be obtained for five cents apiece. Copies of the monograph describing the investigations which produced it may be obtained for 30 cents each. Address the Russell Sage Foundation, Division of Education, 130 East 22d Street, New York City.
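
The group limits printed on the scale lend themselves to a small lookup. What follows is a minimal sketch in Python and is not part of the original scale. The names scale_group and grade_ability are illustrative, rounding the per cent to the nearest whole number is an assumption, and COLUMN_O holds only the two column O standards implied by the worked example in the scale notes (75 per cent showing fourth grade ability, 85 per cent showing fifth).

```python
# Group limits as printed on the Ayres scale:
# (group label, lowest per cent, highest per cent).
GROUP_LIMITS = [
    (50, 46, 54), (58, 55, 62), (66, 63, 69), (73, 70, 76),
    (79, 77, 81), (84, 82, 86), (88, 87, 90), (92, 91, 93),
    (94, 94, 95), (96, 96, 97), (98, 98, 98), (99, 99, 99),
    (100, 100, 100),
]


def scale_group(per_cent):
    """Return the scale group for a per cent score, or None below 46.

    Rounding to the nearest whole per cent is an assumption of this sketch.
    """
    rounded = round(per_cent)
    for label, low, high in GROUP_LIMITS:
        if low <= rounded <= high:
            return label
    return None


# Hypothetical partial entry for column O: only the two standards implied
# by the worked example in the scale notes are included here.
COLUMN_O = {73: "fourth grade", 84: "fifth grade"}


def grade_ability(words_correct, words_given, column_standards):
    """Locate a pupil's ability by matching his score group to a grade standard."""
    per_cent = 100 * words_correct / words_given
    return column_standards.get(scale_group(per_cent))


print(grade_ability(15, 20, COLUMN_O))  # fourth grade
print(grade_ability(17, 20, COLUMN_O))  # fifth grade
```

A score that falls in a group not listed for the column returns None in this sketch; a fuller version would enter every grade standard printed at the head of the column.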
Group “A” consists of “me” and “do,” and these words were spelled by 99% of the second grade pupils. At the other extreme, Group “Z,” consisting of “judgment,” “recommend,” and “allege,” was spelled by only 50% of the eighth grade pupils. The scale is simple, and easily understood. At the top of each column is shown the average per cent of the words spelled by each grade, except that report is not made upon any grade for per cents below 50. The blank spaces to the left, however, if filled in, would indicate in each case 100%; that is to say, the eighth grade pupils spelled all of the words correctly from columns “A” to “N” inclusive.

Giving a Test. A good test should be so difficult that no pupil in the grade will make a perfect score, and sufficiently easy that most pupils in the grade will secure a fairly satisfactory score. In selecting words, therefore, to test the spelling ability of a particular grade, it would be well to choose the words spelled correctly by about 70% of the children of that grade. If pupils in the third grade were being tested, the best test would result from the use of words selected from column “L.”

A test, in order to be valid for individual pupils as well as for the group, should consist of at least 20 words. A smaller number of words would be equally valid for an entire school system, but the teacher will desire to know the standing of individual pupils, and so will need to use 20 words for the test. If 40 words were used, the results would be more reliable for individuals.

The tabulations of the scale are based upon tests given by the column method. This is the usual method of dictating words for pupils to spell by writing in columns. The Cleveland Survey shows that the returns from testing by this method differ very little from returns secured when the words are used in context.
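
The rule of choosing words spelled correctly by about 70% of a grade can be sketched as a lookup over the column standards. This is a sketch under stated assumptions, not the authors' procedure: KNOWN_STANDARDS contains only the standards quoted in this chapter's examples for columns H, L, and O, not the full headings of the printed scale, and the name best_column and the default target of 70 are illustrative.

```python
# Per cent correct by grade for the few columns whose standards are quoted
# in this chapter's examples. The printed scale has 26 columns; this
# partial table is an assumption made for illustration only.
KNOWN_STANDARDS = {
    "H": {2: 79, 3: 92, 4: 98, 5: 100},
    "L": {3: 73, 4: 88},
    "O": {4: 73, 5: 84},
}


def best_column(grade, standards, target=70):
    """Return the column whose standard for this grade lies nearest the target."""
    candidates = {
        column: by_grade[grade]
        for column, by_grade in standards.items()
        if grade in by_grade
    }
    if not candidates:
        return None
    return min(candidates, key=lambda column: abs(candidates[column] - target))


print(best_column(3, KNOWN_STANDARDS))  # L
print(best_column(4, KNOWN_STANDARDS))  # O
```

With the complete headings of the printed scale entered in place of the partial table, the same lookup would serve any grade from the second through the eighth.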
Other studies show that the contextual method (including words in complete sentences, the entire sentence being written) gives a slightly lower score. It is recommended, therefore, that teachers test by the column method. All that is necessary is that the pupils be given sufficient time to write a word before proceeding to the next word. The teacher should also be accommodating in repronouncing a word when necessary, in order to have it understood. Pronounce the words clearly, but do not sound them phonetically, or inflect them so as to aid the pupils in spelling. Give the meaning of words that sound like words with a different meaning and spelling. In case of difficulty in understanding a word, the best way to explain it is to use it in a simple sentence.

Scoring the Papers. If there were 30 pupils in the third grade class above referred to, that would give a total of 600 spellings. Suppose that of these 600 spellings, 480 were correct. Then 80% of the words were correctly spelled. Referring now to column “L” of the scale, it will be observed that the class, as a whole, is 7% above the standard of third grade pupils in the 84 cities which formed the basis for the scale. They are at the same time 8% below the standard for fourth grade pupils. Suppose that a particular child in the grade has spelled 17 words out of the 20; that would mean a grade of 85%. This is better than the class average and only a little below the standard for the fourth grade. In the same way, the standing of each pupil in the grade may be determined.

In order to see at a glance the condition of her class, the teacher will find it worth while to arrange the scores for her grades in a distribution somewhat as follows:

Table 1. Distributed Spelling Scores for 30 Third Grade Pupils. Standard 73

Grade:              Date:

Score            40   45   50   55   60   65   70   75   80   85   90   95   100
No. of pupils                    1    1    2    4    5    7    6    3    1

This table means that one pupil made a score of 55, one a score of 60, two a score of 65, four a score of 70, etc. This distribution emphasizes the needs of particular pupils. If the teacher of this particular third grade class can, by special work with the one pupil at 55, the one at 60, the two at 65, and the four at 70, bring these pupils up to the grade’s standard, she will have a very satisfactory situation.

One of the advantages of the Ayres spelling scale is its simplicity and the ease with which it can be used. Because it contains the fundamental words of the language and the words on which the pupil should place his attention, the changes which it effects in the character of the spelling work will be entirely in the right direction. To the extent that it does thus direct the attention to the proper kinds of words, we may expect that scores in particular cities will rapidly become higher than those indicated on the Ayres scale. This fact is indicated by the returns from the use of the Ayres scale in Boston, after considerable attention had been given by the teachers to the proper selection of word lists.

Dr. Ayres himself has recognized this possible limitation, closing the discussion of his spelling scale with the following words: “In all such testing, it must be remembered that the present scale or any scale for measuring spelling attainment will become increasingly and rapidly less reliable for measuring purposes as the children become more accustomed to spelling these particular words. In proportion as these lists are used for the purposes of classroom drill, the scale will become untrustworthy as a measuring instrument. Probably the scale will have served its greatest usefulness in any locality when the school children have mastered these 1000 words so thoroughly that the scale has become quite useless as a measuring instrument.”

Other Tests.
While it is recommended that the grade teacher use only the Ayres scale in testing her pupils as individuals and her room for comparison with other rooms within the city or elsewhere, many teachers, and especially superintendents, will desire at least some information concerning other lists which have been used as spelling tests. The most notable of these are the Buckingham extension of the Ayres scale, the Iowa Spelling scale, the Buckingham scale, the Rice test, the Starch test, the Courtis Spelling test, the Boston Minimum list, and Jones’ One Hundred Demons.

Buckingham’s Extension of the Ayres Scale. Dr. Buckingham’s extension of the Ayres scale (first available in 1919) consists of the addition of 505 words chosen on the basis of agreements among spelling books. The words are added, for the most part, to the upper end of the Ayres scale. This increases the number of words in the columns at the upper end of the scale and also extends the scale six steps to the right. The added words are not offered as constituting a fundamental vocabulary in the same sense as were the original 1000 words selected by Ayres. In using this extension, therefore, teachers should keep in mind that the added words have less value from the standpoint of social utility than the 1000 original words of the scale. The addition of these words, however, makes it possible to use the scale more extensively in upper grades and high school. It should be of particular value in testing the spelling efficiency of the pupils in the high school who are specializing in commercial studies.

The Iowa Spelling Scale. This scale includes 2977 words from the written correspondence of Iowa people. Accuracy of each word was determined on the basis of 200 or more spellings by children in each grade. Thus, more than 650,000 spellings were used in each grade, or a total of nearly 4,750,000 in the seven grades.
In all essential features the scale is an imitation of the Ayres scale. The placing of the words is determined in practically the same manner and the form of the scale is similar. It has decided value, however, as showing the possibility of basing the spelling work directly upon the words of a particular section of the country. The scale is published in three parts in order to reduce the error in the placement of words. Part 1 is a scale for grades 2, 3, and 4; part 2, a scale for grades 4, 5, and 6; and part 3, a scale for grades 6, 7, and 8. The large increase in the number of words makes the scale particularly valuable for individual testing.

The Buckingham Scale. — The work of Dr. Buckingham in evaluating a list of 50 words has to date proved of value chiefly in calling attention to the importance of the proper selection of word lists, the difference in the difficulty of words, and the methods to be used in the further study of words for spelling lists. The scale first appeared in 1913, and apparently has not come into general use in school testing and school survey work. The Ayres scale, which made its appearance a little later, is so convenient and so satisfactory that it has been extensively used by superintendents, bureaus of efficiency, and survey committees.

The fifty words resulting from the Buckingham study are given herewith, in the order of their difficulty. These words vary in difficulty by even distances, so that the scale, as it appears, is a step scale. Theoretically it should be used in such a way as to determine how far up the scale a pupil can spell successfully. It can be used in grades three to eight. Dr. Buckingham, in deriving the scale, pronounced all of the words to the children in contextual form. In view of other studies which have been made, it appears that they could be used in column form with results slightly varying and equally satisfactory for comparative purposes.
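The idea of finding "how far up the scale a pupil can spell successfully" may be sketched in a few lines of Python. This is an illustration only: the stopping rule (count steps until the first word missed) and the sample pupil are assumptions, since no exact scoring rule is given here. The eight words used are the first eight of the Buckingham list.

```python
# Illustrative sketch only. The stopping rule (stop at the first miss) is an
# assumption; it is not a rule stated for the Buckingham scale.

# First eight words of the fifty-word scale, easiest first.
STEP_WORDS = ["only", "even", "smoke", "chicken",
              "front", "another", "lesson", "bought"]


def highest_step_reached(step_words, spelled_correctly):
    """Count the steps climbed from the bottom of the scale before the first miss."""
    steps = 0
    for word in step_words:
        if word not in spelled_correctly:
            break
        steps += 1
    return steps


# A hypothetical pupil who spells five of the eight words but misses "front".
pupil_correct = {"only", "even", "smoke", "chicken", "another"}
print(highest_step_reached(STEP_WORDS, pupil_correct))  # 4
```

Under this assumed rule the pupil is placed at step 4, although he spelled a harder word beyond it. A teacher preferring a different rule, such as counting all words spelled correctly, would change only the loop.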
Although not in general use, the scale is mentioned because of the high quality of the scientific work involved in its formation. It has not been evaluated in terms of grade achievement. However, Dr. Buckingham is working on an extension of his scale. In time, he expects to extend it to include 1000 words and evaluate it in terms of grade achievement.

Buckingham’s Fifty Words Arranged in Order of Difficulty

1. only 2. even 3. smoke 4. chicken 5. front
6. another 7. lesson 8. bought 9. pretty 10. nails
11. butcher 12. Tuesday 13. sure 14. answer 15. nor
16. raise 17. cousin 18. beautiful 19. touch 20. freeze
21. forty 22. instead 23. wear 24. tailor 25. trying
26. minute 27. pear 28. towel 29. tobacco 30. whole
31. button 32. janitor 33. quarrel 34. against 35. circus
36. sword 37. whistle 38. stopping 39. carriage 40. guess
41. telephone 42. choose 43. telegram 44. saucer 45. saucy
46. already 47. pigeons 48. beginning 49. grease 50. too

The Rice Test. — It was Dr. J. M. Rice, in his Forum articles of 1897, who first began the work of attempting a definite measurement of spelling. He gave three different tests, the number of children examined reaching nearly 33,000. The first test consisted of 50 words pronounced by the teachers for written spelling in the usual manner. The words used in this test were the following:

furniture, chandelier, curtain, bureau, bedstead, ceiling, cellar, entrance, building, tailor,
doctor, physician, musician, beggar, plumber, superintendent, engine, conductor, brakeman, baggage,
machinery, Tuesday, Wednesday, Saturday, February, autumn, breakfast, chocolate, cabbage, dough,
biscuit, celery, vegetable, scholar, geography, strait, Chicago, Mississippi, Missouri, Alleghenies,
independent, confectionery, different, addition, division, arithmetic, decimal, lead, steel, pigeon

Dr. Rice had some question as to the value of word lists for spelling work, recognizing that spelling was useful only as a means for recording or communicating thoughts.
This is the same point which we now recognize in different form; viz. that only the written vocabulary needs to be mastered for spelling purposes.

In line with this thought, Dr. Rice gave a second test to more than 13,000 children. This test contained 50 words placed in composition form. The following sentences were used, the underscored words forming the basis of the test:

“While running he slipped. I listened to his queer speech, but I did not believe any of it. The weather is changeable. His loud whistling frightened me. He is always changing his mind. His chain was loose. She was baking cake. I have a piece of it. Did you receive my letter? I heard the laughter in the distance. Why did you choose that strange picture? I thought I liked it. It is my purpose to learn. Did you lose your almanac? I gave it to my neighbor. *I was writing in my language book. Some children are not careful enough. Was it necessary to keep me waiting so long? Do not disappoint me so often. I have covered the mixture. He is getting better. *A feather is light. Do not deceive me. I am driving a new horse. *Is the surface of your desk rough or smooth? The children were hopping. This is certainly true. I was very grateful for my elegant present. If we have patience we shall succeed. He met with a severe accident. Sometimes children are not sensible. You had no business to answer him. You are not sweeping properly. Your reading shows improvement. The ride was very fatiguing. I am very anxious to hear the news. I appreciate your kindness, I assure you. I cannot imagine a more peculiar character. I guarantee the book will meet with your approval. Intelligent persons learn by experience. The peach is delicious. I realize the importance of the occasion. Every rule has exceptions. He is thoroughly conscientious; therefore I do trust him. The elevator is ascending.
Too much praise is not wholesome.”

(The fourth and fifth year test ends with: “This is certainly true.” The higher test includes all the sentences except the four marked with an asterisk.)

The third test given by Dr. Rice, and the one which he considered really more valid than any of the others, was a composition test based upon a picture and a story told by the teacher. This test was valuable particularly in that it required pupils to choose their own words and to spell them. The results on the third test are not tabulated by Dr. Rice, but we do have some tabulations on the first and second tests and the averages are given herewith.

Table 2. — Rice Tests — 1895

Grade    Average First Test (Column List)    Average Second Test (Context)
4        53.5                                64.2
5        64.3                                75.1
6        75.6                                70.4
7        81.                                 78.8
8        84.2                                84.4

Since Dr. Rice gave these tests to a sufficiently large number of pupils, the teacher may accept the averages given above as norms or standards of performance, and by comparison with them may determine the spelling ability of her own pupils.

The chief objection to the Rice list is that the words are not evaluated, and do not form a scientifically constructed scale. The words are given uniform values, but are far from being uniform in difficulty. In the Ayres and Buckingham scales, the words are assigned values according to difficulty. In the Rice test, a pupil gets as much credit for spelling an easy word as he does for spelling a difficult word.

The Starch Test. — Any one making use of the Starch test in spelling will do it with quite different purposes in mind than those for which he uses the Ayres scale. The words were secured by taking the first defined word on the even-numbered pages of the 1910 edition of the New International Dictionary. Proper names, technical words, and obsolete words were discarded from the list. The list, thus reduced to 600 words, was arranged alphabetically according to the size of the words.
These were then divided into six lists of 100 words each by assigning words in turn to the six lists. A test is made by using one of these six lists, which are assumed to be of equal difficulty as lists. By using words selected at random from the entire English language, Starch proposes to test general spelling ability, and his tests will be found to be of service in the grammar and high school grades, provided the test is not permitted in turn to exercise an influence upon the teacher in determining the materials of the spelling lessons. The influence of the Starch test is surely in the direction of the old “spelling grind” described by Rice. The Starch lists contain such words as the following:

nunciature, quarantinable, conterminous, photosphere, anthropometric, imperturbation

Such words are manifestly not suitable for use with grade pupils.

The Boston Minimum List. — The Boston School Document No. 8, 1914, contains a minimum spelling list of 840 words. They are well selected, and similar in many respects to the Ayres list. However, they have not been evaluated for use as a standard test. The document containing this list, and a supplementary list of 2525 words, is no longer available except in libraries of departments of education. It is of interest chiefly in showing the tendency to get away from the old type of speller which contained 10,000 to 15,000 words, selected with little regard for use.

The California list¹ is similar to the Boston list and is constructed along similar lines. It is of value for curriculum making in spelling, but not for testing.

¹ Bulletin No. 7, Chico State Normal, Chico, California.

Jones’ One Hundred Demons. — Dr. Jones has given a list of the 100 words most often misspelled by pupils in written work, as shown by his study involving the tabulation of 15,000,000 words.
This list he has designated as the “spelling demons.” The list has been widely used for testing, but to date it has not been sufficiently evaluated in terms of grade standards, although Dr. Jones promises such evaluation in the near future. The list appeals to children because of its simplicity, and its known difficulty. If a pupil thoroughly masters this list of “demons” he will very probably correct the spelling of most of the words which he has been misspelling. Dr. Jones did not find any pupil among the 1050 who missed as many as 100 words, 87 being the largest list for any one pupil. The list of “spelling demons,” together with their relative difficulty as shown by preliminary tests which Dr. Jones has summarized, follows herewith:

Frequency of Misspelling of the Jones’ 100 Demons

which 321, their 316, there 296, separate 283, hear 280,
here 278, said 275, been 273, says 273, they 271,
some 270, any 268, Wednesday 266, done 263, know 263,
read (“red”) 261, piece 260, don’t 258, break 257, tear 255,
February 255, laid 252, straight 251, through 250, half 250,
meant 247, just 245, many 245, too 243, Tuesday 242,
knew 237, lose 236, week 235, can’t 234, grammar 234,
whole 231, wear 230, every 228, instead 228, built 225,
blue 224, shoes 224, won’t 221, wrote 220, cough 217,
where 216, write 216, buy 212, believe 212, coming 212,
minute 210, busy 209, two 208, much 206, enough 206,
seems 205, none 203, does 203, easy 202, would 200,
whether 200, loose 198, could 196, ready 196, beginning 195,
heard 195, country 194, business 194, ache 192, answer 191,
making 190, always 188, hour 187, tired 187, sugar 185,
often 185, writing 184, doctor 182, very 182, though 181,
among 179, sure 179, tonight 174, forty 172, since 172,
once 170, raise 169, trouble 168, choose 168, color 167,
dear 166, truly 166, early 166, used 165, friend 164,
again 164, hoarse 162, guess 162, women 161, having 158

The Pupil’s Own List of Misspelled Words. — The final test of spelling is a gradual decrease in the pupil’s own list of misspelled words.
A necessary precaution in this connection is that pupils should not consciously avoid good words because they do not know how to spell them. They should be taught to use the dictionary instead of replacing good words by simpler words which they are able to spell. If every child is told to keep a list of his own misspelled words and to build up a spelling consciousness with the aid of the dictionary, and if he is urged constantly to extend his vocabulary and to study the choice of words in order to get appropriate and accurate expression, a pupil’s spelling in regular written work may be considered as the best and the final test of spelling.

At stated intervals, a pupil should be encouraged to go over 8 or 10 pages of his written material and determine carefully the number of misspelled words. The teacher can help the child in doing this. But for the teacher to do it without the child’s help has been in general the mistake of the past. In proportion as the number of misspelled words decreases, the child is improving in spelling.

While this test is not scientific, we can conceive of teachers making it even more valuable than scientific tests as they are frequently used. We do know that the time which a pupil spends upon his own list of misspelled words involves no lost effort; and that his spelling improves in the same proportion that this list is reduced. Indiscriminate drill in spelling, as indicated in the Butte, Montana, survey, must be replaced by attention to the needs of individual pupils. There were 278 of the Butte children, or over 18% of the total, who made scores of less than 60%, although the total score for the city was 10.3% above the Ayres standard. Much time had been spent upon indiscriminate drill.

The Practical Uses of a Spelling Scale. — Teachers will find a spelling scale of very great use in their regular school work, aside from any supervisory use which the superintendent may make of the tests given.
Tests administered under uniform conditions and with a scientifically constructed scale permit the teacher to compare one class with another very accurately. If the fourth grade teachers in a city system would agree among themselves to give a test on a certain day, they could then come together after the papers had been scored and find out, first of all, which room was doing the best work. This would be shown not only by the median score, but also by the total distribution which shows the number of pupils at lower as well as at higher levels.

After the teachers have agreed that a certain one of the fourth grade rooms has made, all told, the best score in the test, a second question naturally arises; namely, what method was used in securing these results with your children? This question suggests the second use which the teacher may make of the scale. She can test out different methods in her own room, or the particular group of fourth grade teachers to which we have referred may separate their rooms into groups of approximately equal ability and assign different methods for different groups. Then, at the close of a given period, — one, two, three, or six months, — they may again give a test and so determine which methods are most effective. If the teachers have been wise they have determined in great detail how the methods were to be applied and the amount of time to be devoted to the spelling work, so that the one thing which is upon trial is the method of presenting the work; such, for instance, as the column method, the contextual method, the method of studying at home or in the seat and then testing in class, the method of teaching in class with very little testing, and various other methods.

The above paragraph suggests a third point which teachers may try out by the use of a scientific scale; namely, the amount of time which can profitably be devoted to spelling. Dr.
Rice, in his discussion of the spelling grind in 1897, showed that the time element had very little to do with results. We now know that this was because of the character of the spelling lists. When the words used in the spelling work with children are unintelligible to them, the results will be poor, regardless of the methods and the time devoted to the work. But if we assume words with correct social values, then the Ayres scale may properly be used for determining the amount of time which can be spent upon the spelling work with greatest profit.

A fourth use of the spelling scale has been suggested in asking the teacher to make a distribution of the grades. This use is to locate the spelling ability of individual children. By doing this, the teacher will probably find in her classes a small number of pupils who spell so well that it is unnecessary to require them to submit to any regular spelling drill. If such pupils are excused from spelling drill, being told merely to attend to their own misspelled words and to use the dictionary when in doubt, and if the teacher finds in future testing that these pupils do not lower their scores, then she may feel that she has saved their time for other more valuable work without detriment to them, so far as spelling is concerned. At the other end of the scale, however, will be pupils who spell very poorly, and it is only by use of the scale that these pupils can be located with any degree of accuracy. Taking these pupils as individuals, or as groups according to their several needs, the teacher can work in a definite manner, giving additional time to some pupils without boring others, and really follow out the injunction of William Hawley Smith to “put the oil where the squeak is.” It is quite probable that this result of the use of the scale in spelling, as in writing, will in time become one of its most valuable contributions.
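The fourth use may be made concrete with a short Python sketch that tallies a class distribution and sorts pupils into those who may be excused from drill, those who continue it, and those who need additional attention. The pupil names, the scores, and both cut-off values are hypothetical; no cut-offs are prescribed here, and each teacher would choose her own.

```python
from collections import Counter

# Hypothetical cut-off scores; these are assumptions for illustration only.
EXCUSE_AT = 95   # pupils at or above this score are excused from regular drill
HELP_BELOW = 70  # pupils below this score receive additional attention


def distribution(scores):
    """Tally how many pupils made each score, lowest score first."""
    return sorted(Counter(scores).items())


def group_pupils(scores_by_pupil, excuse_at=EXCUSE_AT, help_below=HELP_BELOW):
    """Sort pupils into three groups according to their scores on the scale."""
    groups = {"excused": [], "regular drill": [], "needs attention": []}
    for pupil, score in scores_by_pupil.items():
        if score >= excuse_at:
            groups["excused"].append(pupil)
        elif score < help_below:
            groups["needs attention"].append(pupil)
        else:
            groups["regular drill"].append(pupil)
    return groups


# Invented scores for a small class.
class_scores = {"Anna": 100, "Ben": 85, "Clara": 55, "David": 65, "Edith": 85}

print(distribution(class_scores.values()))
print(group_pupils(class_scores))
```

Repeating the test later and comparing the groups shows whether the excused pupils have held their scores, which is the check suggested in the paragraph above.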
Some pupils will make low scores in their spelling work because of the lack of general intelligence; others, because of the lack of an adequate vocabulary, which can come only from reading; others because their attention has never been directed to the difficulties of words, etc., etc. The teacher will know that she is working at the problem in a definite manner, and that she is working only with the pupils who need attention. This she has known more or less before in a general way, but the use of a scientific scale permits her to know it beyond peradventure of a doubt.

It is not the purpose of the present work to discuss methods of spelling. The teacher is directed to other works dealing specifically with this problem.¹ The teacher will do well, however, to make her spelling work as specific as possible, both as to words and pupils. Many words spell themselves and require no attention, others are very difficult for large numbers of pupils. It is not only necessary to locate the words, but to analyze each word to see in what the difficulty consists. In short, drill which is general and blind must become specific and intelligent.

The discussion throughout has directed the attention of the teacher to the Ayres scale. Some teachers may properly ask if other scales may not at times be used to advantage. There will be no harm done in using other scales and the teacher may learn to use the Buckingham scale very effectively. The Jones’ “Demons” have the advantage of being the words most frequently missed by school pupils. They are common words, and it is safe to assume that every pupil in the upper grades should study these words until he can spell the entire list without a mistake. In general, however, the Ayres scale is the one to use, for reasons which have been previously stated. There is this caution only, and that has
There is this caution only, and that has 1 Freeman, Frank N., “The Psychology of the Common Branches,” Houghton Mifflin Company; Suzzallo, Henry, “The Teaching of Spelling,” Houghton Mifflin Company; Cook and O’Shea, “The Child and His Spelling,” Bobbs- Merrill Company, Indianapolis, 22 How to Measure been anticipated by Ayres himself; namely, that as the scale is used more and more with the same pupils, a teacher should expect that gradually the scores will become higher. This, however, is quite satisfactory, since the words are of the right kind and since, by using the scale, the pupil’s attention has been turned from unfamiliar, useless dictionary words to the words which he will use in his own work. BIBLIOGRAPHY 1. Ayres, Leonard P., “The Spelling Vocabularies of Personal and Business Letters,” Division of Education, Russell Sage Founda- tion, New York City. 2. Ayres, Leonard P., “A Measuring Scale for Ability in Spelling,” Division of Education, Russell Sage Foundation, New York City. 3. Buckingham, B. R., “Spelling Ability. Its Measurement and Distribution,” Teachers College, Columbia University, New York City. 4. Des Moines Annual School Report, 1915. Section on “Spelling.” 5. Jones, W. Franklin, “Concrete Examination of the Material of English Spelling,” University of South Dakota, Vermilion, South Dakota. 6. Pryor, Hugh Clark, “A Suggested Minimal Spelling List,” Chap. V, Part I, Sixteenth Yearbook of the National Society for the Study of Education. 7. Rice, J. M., “The Futility of the Spelling Grind,” Forum, Vol. 23, pp. 163, 409. 8. Studley, C. K., and Ware, Allison, “Common Essentials in Spell- ing,” Bulletin No. 7, State Normal School, Chico, California. 9. Wallin, J. E. W., “Spelling Efficiency in Relation to Age, Grade, and Sex, and the Question of Transfer,” Warwick and York, Baltimore, Maryland. 10. 
The teacher or supervisor who is interested in the more intricate problems of establishing a spelling standard is referred to the following recent articles: Ballou, School and Society, 1917, Vol. 5, pp. 267-270; Ballou, Educational Administration and Supervision, 1915, Vol. 1, pp. 469-472; Kallom, Educational Administration and Supervision, 1917, Vol. 3, pp. 539-542.
11. Ashbaugh, Ernest J., “Iowa Spelling Scale,” Extension Bulletin, Nos. 43, 54, and 55, University of Iowa.

CHAPTER III

THE MEASUREMENT OF HANDWRITING

The writing supervisor had given Wilbur a grade of 95. Wilbur was dissatisfied. When the supervisor next came to the building, Wilbur made known his dissatisfaction, and asked why his grade was not higher. The supervisor answered that 95% was a good grade, that she never gave 100%, and that there was opportunity for him to further improve his work. Wilbur answered that he had received 95% from the fourth grade up, and he knew that he was writing much better than in any previous grade. The supervisor had no conclusive or satisfactory argument. She resorted to her authority as teacher, and left Wilbur still dissatisfied. What teacher has not had a similar experience with reference to the grade in writing?

This situation is rapidly changing in the public schools. Writing can be definitely measured, and the ratings can be made so accurately that the pupils themselves fully understand and appreciate that exact justice has been done. This has been brought about by the development of scales for the measurement of handwriting.

If a teacher has not been accustomed to make use of scales and standardized tests in her work of grading, she would do well to begin with the subject of writing. Writing is one of the mechanical subjects and one of the most easily and quickly measured. In order to avoid confusion on her part, she should study and practice scientific measurement in this subject alone until she has become reasonably proficient.
It will be well for the teacher to read through a large number of the works mentioned in the bibliography at the close of this chapter, and as a beginning in this work, particular attention is called to numbers 1 and 2.

The first scale in handwriting was developed by Dr. E. L. Thorndike, of Teachers College. It is based upon general merit in handwriting as determined by the judgment of a large number of competent graders. Thorndike’s scale is widely used at the present time, and many think that it gives more satisfactory results than any other. It had, originally, the disadvantage¹ of being mechanically inconvenient, and for that reason the Ayres scale has become much more widely used.

¹ A defect since remedied in large measure.

The Ayres scale consists of twenty-four samples of writing, eight each of vertical, semi-slant, and full slant style. The scale is arranged on a heavy sheet of paper 9" high and 36" wide, in the form of the following diagram:

      20   30   40   50   60   70   80   90
A .
B .
C .

It is so convenient in form that it may be placed in the schoolroom, where pupils may compare their handwriting with it at any time. This is desirable, and it is recommended that every schoolroom in which there are intermediate and upper grade pupils should have a copy of the Ayres scale available for pupils as well as for teachers. (See pp. 28-35.)

What to Measure. — Ordinarily the teacher will measure only two elements in handwriting; namely, speed and quality. By speed is meant the number of letters written per minute. By quality is meant general merit, or what the teacher indicates when she gives a grade in writing. Speed is determined by simply counting the number of letters written during a given time and reducing to the one-minute basis. It is quality or general merit which is measured by the use of the writing scale. These terms are relatively simple, and their significance will appear during the further discussion.
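The definition of speed can be worked as a small Python sketch. It counts only alphabetic letters and reduces the count to a one-minute basis. The copy line "Mary had a little lamb," the two-minute period, and the pupil's output of eight full copies plus three words are illustrative figures only, and the function names are hypothetical.

```python
def count_letters(text):
    """Count alphabetic letters only, ignoring spaces and punctuation."""
    return sum(1 for ch in text if ch.isalpha())


def letters_in_sample(copy, full_repetitions, extra_words=""):
    """Letters in a pupil's paper: full copies written plus any partial copy."""
    return count_letters(copy) * full_repetitions + count_letters(extra_words)


def letters_per_minute(letters_written, minutes):
    """Reduce a letter count to the one-minute basis."""
    return letters_written / minutes


COPY = "Mary had a little lamb"          # 18 letters
total = letters_in_sample(COPY, 8, "Mary had a")
print(total)                             # 152
print(letters_per_minute(total, 2))      # 76.0
```

Counting by full repetitions rather than letter by letter is only a labor-saving device; both methods should give the same total.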
It is just as well for the teacher to begin by giving a regular test, and in this manner to apply herself to the work of mastering the details of grading and evaluating papers in handwriting.

Giving the Test. — In order to make the test valid for comparative purposes, uniform conditions must prevail. The rules of the game are simple, and the teacher should follow them carefully, since it is only in this way that valuable comparison will be made possible. The directions for tests in handwriting are so generally standardized at the present time that comparison is possible, not only within the class, but one room with another and even one school system with another. The invariable aim is to secure results in such form as to make them easily comparable with like results obtained elsewhere. The rules are as follows:

1. The copy must be simple enough for second grade pupils. While it is not necessary to use the same copy each time, it should be similar in difficulty. A copy which has been much used is the line: “Mary had a little lamb.” Others have used the entire first stanza of this selection. Another copy which has been used is “Sing a song of sixpence, a pocket full of rye.” The idea is to have a simple, easily understood copy, which will not deter the pupil in his speed test. Some tests have been given with copy which was too difficult, making the results in speed unsatisfactory for comparative purposes.

2. Before the test is given, the copy should be memorized by all of the pupils. The purpose of the test is to determine speed and quality of handwriting. If the pupil must stop and think, he falls behind in speed. In one survey a rather difficult copy was placed in the hands of the pupils. They were instructed to write the copy, repeating the same during the period of the test. The results were so unsatisfactory that speed was not reported upon by the survey committee.
In addition to having the copy committed, it is a good plan to place the same upon the blackboard at several different places, so that any pupil who does happen to forget for a moment may reassure himself by a glance at the copy.

3. The time for the test should be exactly two minutes. In order to make sure that all pupils start together, it is well to rehearse the details before actually starting the test. This makes sure that all pupils understand, clears away any confusion, and so secures the test papers in reliable form.

4. Everything should be in readiness for the test before the pupils begin. This means that every pupil must have paper, a good pen, ink, and the copy committed. In order to make sure that all have pens, it is well to ask every pupil in the room to hold up the pen (or pencil, if used in second or third grade). Since the teacher will want to use the results of the test for the benefit of individual pupils, it is well at this point to place certain items at the head of the paper. The usual items are — name, grade, building, city, and date. If for any reason it is desired to make the test impersonal, these items may be omitted, or placed on a separate card with a number scheme as a key.

5. When all is ready, the teacher gives some simple directions. “Write as well as you can at your usual speed, using the following copy: ‘Mary had a little lamb.’ Write the copy again and again until I say ‘stop.’ At the command, stop at once, even if in the middle of a letter.” After this explanation has been given the teacher says, “All in position. Dip the pens. Pens up. Begin.”

6. In exactly two minutes, pupils should be given the order to stop, and required to place their pens on the desk.

7. At this point the teacher may save herself considerable work by having the pupils count the number of letters in the copy.
It is suggested that pupils place this number below the copy to the right, using pencil for the same, and then divide the number by two, thus reducing the score to a one-minute basis, as 146 ÷ 2 = 73. The papers may then be collected in the usual manner.

Scoring for Speed. — The speed is calculated in terms of the number of letters written per minute. The test is given over a two-minute period in order to reduce the error. Some examiners have used other units, as three or four minutes, but evidence is not at hand that the results have been improved. In the first report upon speed in handwriting,¹ two minutes was made the basis of the test, and this unit has quite generally been used in later tests. The practice is common, also, of reducing to the one-minute basis, thus making comparison easy.

The speed measurement is secured by counting the letters in the pupil’s copy and dividing by two. Although the pupils have been asked to count the number of letters, the teacher should carefully check the results. The teacher may reduce her work by knowing the total number of letters in the copy used, multiplying by the number of repetitions of the full copy, then adding the extra letters. Suppose a particular pupil has written the copy, “Mary had a little lamb,” eight times, and has written the first three words the ninth time. The teacher in figuring the number of letters will multiply 18 by 8, which gives her 144, and then add the number of letters in the three words, “Mary had a,” namely, eight. This gives a total of 152 letters. Dividing by 2 she gets the pupil’s score, 76 letters per minute. In case the teacher gets a result different from the pupil’s result, the same should be placed in the lower right-hand corner, the pupil’s figure being crossed out. This completes the scoring of the papers for speed.

¹ Wilson, G. M., “The Handwriting of School Children,” Elementary School Teacher, 11: pp. 540-543.
This is the first known attempt to fix a standard for speed in handwriting. 28 How to Measure Fig. i. — Ayres handwriting scale_(pp. 28-35). The copy shown herewith is the so-called Gettysburg edition. The Measurement of Handivriting 29 How to Measure 30 The Measurement of Handwriting 31 How to Measure 32 The Measurement of Handwriting 33 34 How to Measure The Measurement of Handwriting 35 36 How to Measure Scoring for Quality. — The teacher will be surprised how quickly she can learn to grade papers by using the Ayres scale. While it is helpful to have a demonstration and some practice in a teachers’ meeting, this is not at all necessary, and the teacher who is patient and willing can train herself very quickly to use this scale and to secure satisfactory results. The teacher should give herself preliminary drill of at least an hour or two. If this drill is divided into half hour periods, and continued during a considerable part of a week, the teacher will become reasonably uniform in grading papers, and will feel competent to score the papers from the test in her room. At this point it would be well for her to consult an expert, in case one is available. This expert by a little observ- ing and advising will correct any marked defect, — such as a uniform tendency to grade too low or too high. In the absence, however, of a teacher, a supervisor, or a superintendent, in the system, who can give this expert help, a teacher need not be deterred. She can master the details, working entirely alone. Directions for grading a sample, while not uniform, have in mind the common object of helping the teacher to locate the specimen on the scale which most nearly corresponds in merit with the pupil’s copy. Apparently the best way to do this is to glide the pupil’s copy back and forth underneath the scale, comparing it with one sample after another in the scale until a decision is reached as to which sample most nearly corresponds with the pupil’s copy. 
The teacher will frequently have difficulty, especially where the pupil’s copy is better, for example, than 50 on the Ayres scale, but not as good as 60. Some scorers recommend the use of intermediate units in such cases, permitting the teacher thus to indicate 54, 56, or whatever the proper value may appear to be. Practice on this point varies. If the number of papers to be scored is not too large, intermediate values may be used. The score for quality when determined upon should be placed in the upper right-hand corner of the paper.

Recording the Scores. From the beginning the teacher should acquire the habit of distributing her scores, showing both speed and quality on a single sheet. This will be found exceedingly helpful. Table 3, which follows herewith, shows such a distribution for a sixth grade. By reference to this, it will be seen that of the 33 pupils in the grade, 2 are writing at quality 20 (see totals at the bottom of the sheet), 4 at quality 30, 5 at quality 40, 8 at quality 50, 8 at quality 60, 5 at quality 70, and 1 at quality 80. The middle¹ score on the basis of quality will fall therefore in the group of 8 at 50 and this is noted beneath the table as the median quality.

Table 3. Distribution of Scores for a Sixth Grade
(Columns give quality on the Ayres scale; rows give speed in letters per minute.)

Speed        20   30   40   50   60   70   80   90   Total
1-20
21-30         1    1                                     2
31-40              1    1    1    1                      4
41-50         1    1    2    2    1                      7
51-60              1    1    2    3    1                 8
61-70                   1    2    2    2                 7
71-80                        1    1    1                 3
81-90                                  1                 1
91-100                                      1            1
101-120
121-140
141-160
161-180
181-200
Totals        2    4    5    8    8    5    1           33

Median Quality: 50. Median Speed: 56.

¹ See explanation of middle score, or median, p. 261. Since there are 33 papers, the middle score in this case will be that of the 17th paper from either end.

The totals for speed are indicated in the right-hand column. It is observed that the median speed falls between 51 and 60. In this particular case, however, the teacher has determined the exact median for speed, and it is recorded beneath the table as 56. To determine the exact median for speed all that is necessary is to arrange the papers in order, from lowest to highest on the basis of speed, then count in to the middle paper. In this particular case the middle paper would be the 17th one from either end, and it appears that the 17th one had a speed of 56 letters per minute.

Standard Scores. With the scores fully tabled the teacher’s next question naturally is, “How does the writing of my pupils compare with others, and what are the standards?” She wonders if sixth grade pupils should show a range in quality from 20 to 80, and if a median quality of 50 is too low. In speed she notes that they are distributed from less than 30 to nearly 100. This means that some of the pupils are writing three times as rapidly as others. How rapidly should they write? So far as known this question was first raised only six years ago, and at that time a tentative standard for speed was indicated on the basis of results from a single city system.

Table 4. Standards of Speed¹ (letters per minute)

Grades                             1     2     3     4     5     6     7     8
1. Cleveland                                              60    70    76    80
2. Kansas City (May, 1915)                    53    64    69    76  76.5
3. Denver Survey                              36    50    54    63    66    69
4. South Bend (May)                     33    48    63    77    82    93   105
5. Freeman’s 56 cities                  31    44    51    59    63    68    73
6. Brookline                                              76    87    90    98
7. Newton                                                 73    85    94   102
8. Missouri Training Schools                              80    92    92   102
9. Iowa, 33,569 children          29    39    50    62    65    73    75    76

¹ Decimals largely omitted.

Now, however, it is possible to indicate a standard based upon results obtained from all parts of the country, and to indicate rather definitely how well pupils in any particular grade should write.
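The counting procedure for the median described under Recording the Scores can be sketched in Python. The totals below are the column and row totals of Table 3; the function names `median_position` and `median_group` are illustrative only, and the sketch assumes an odd number of papers, as in the sixth grade of the example (for an even number it simply takes the lower of the two middle papers).

```python
def median_position(n_papers):
    """Position of the middle paper when papers are ranked low to high."""
    return (n_papers + 1) // 2


def median_group(groups, counts):
    """Return the group (quality step or speed interval) holding the middle paper."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no papers to rank")
    target = median_position(total)
    running = 0
    for group, count in zip(groups, counts):
        running += count  # papers counted in so far, from the low end
        if running >= target:
            return group


# Totals for quality and for speed, taken from Table 3.
quality_groups = [20, 30, 40, 50, 60, 70, 80, 90]
quality_counts = [2, 4, 5, 8, 8, 5, 1, 0]
speed_groups = ["1-20", "21-30", "31-40", "41-50", "51-60",
                "61-70", "71-80", "81-90", "91-100"]
speed_counts = [0, 2, 4, 7, 8, 7, 3, 1, 1]

print(median_position(33))                           # 17
print(median_group(quality_groups, quality_counts))  # 50
print(median_group(speed_groups, speed_counts))      # 51-60
```

The grouped count only locates the interval (51 to 60) in which the median speed falls; the exact figure of 56 still requires the individual papers arranged in order, as the text explains.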
Table 4, given herewith, shows the median attainment in speed for Cleveland, Kansas City, Denver, South Bend, fifty-six cities combined, Brookline, Newton, the Missouri Training Schools, and over 33,000 Iowa children. From this table it will be seen that sixth grade children from different parts of the country are averaging from 63 up to 92 letters per minute. It should be noted, however, that the 82 for South Bend is a May average and was secured by special attention after a test given earlier in the year had shown the need for improvement. It is apparent, then, that the particular sixth grade shown in Table 3 is quite definitely below standard, if we take as a standard the performance of other sixth grade children throughout the country.

In this connection, it may be well to note two proposed standards made by men who have given considerable thought and attention to the subject.

Table 5. Standards for Speed in Handwriting (letters per minute)

Grades         2     3     4     5     6     7     8
Freeman       36    48    56    65    72    80    90
Starch        31    38    47    57    65    75    83

Tables 4 and 5 will give plenty of opportunity for comparison with actual performance and with proposed standards, to enable the teacher to judge of the writing in her own room. It appears that the median speed of 56 for her sixth grade is lower than the sixth grade median of any system appearing in Table 4, and indicates that the teacher should increase the speed of writing in this particular grade. She should at least aim to reach 63, the average of Freeman’s 56 cities, the average also for Denver, and the lowest sixth grade median appearing in Tables 4 or 5.

Standards for Quality. In measuring quality for comparative purposes it is necessary to use one of the standard scales of handwriting. Not all studies in the measurement of handwriting have made use of the Ayres scale, but Table 6, given herewith, shows several returns in the Ayres scale and will permit comparison.

Table 6. Quality in Handwriting (Ayres)

Grades                           1     2     3     4     5     6     7     8
Brookline                                               44    46    47    49
Cleveland                                               45    48    50    55
Denver                                      26    31    38    43    51    57
Newton                                                  48    51    50    53
South Bend (May)                      45    49    49    49    53    56    54
Missouri Training Schools                               41    42    45    47
Iowa median                     28    36    40    44    49    52    57    61
Freeman, 56 cities                    44    47    50    55    59    64    70

It will be observed from this table that quality in handwriting for the sixth grade has ranged from 42 in the Missouri Training Schools to 59 in the 56 cities reported by Freeman. It appears therefore that the particular sixth grade reported in Table 3, with its median quality of 50, is writing better than the sixth grade pupils in the Missouri Training Schools, Brookline, Cleveland, and Denver, but not so well as those in South Bend, Newton, Iowa, or Freeman’s 56 cities. The standards proposed by Freeman and Starch for quality are likewise given herewith:

Table 7. Standards of Quality in Handwriting (Ayres)

Grades         2     3     4     5     6     7     8
Freeman       44    47    50    55    59    64    70
Starch        27    33    37    43    47    53    57

It will be observed that the particular sixth grade writes better than the standard indicated by Starch, but not so well as the standard indicated by Freeman.

Social Standard of Writing. In attempting to set up standards, there is one danger which school people are likely to encounter, and that is the danger of considering writing as a school exercise, wholly apart from the social and business demands of life outside the school. In the last analysis it is this latter which should determine the proper standards. While it is difficult to get at the standards required by society, there are at least some evidences of social standards of handwriting. Dr. Ayres has constructed a special handwriting scale for the Municipal Civil Service Commission of New York City. On the basis of this scale, the Commission considers that applicants pass in handwriting if they make a grade corresponding to quality 40 of the Ayres public school scale.
Where handwriting is a special requirement a grade equal to quality 50 is required. These standards are lower than the Freeman standard for the sixth grade, and correspond fairly well with the Starch standard. However, sixth grade pupils will be in school two years longer, and under the present regime will write and continue to improve their writing for two years. This naturally raises the question as to whether the school standard for handwriting is not an artificial one, whereas it should be based directly upon the demands of society.

There is additional evidence on this matter, as reported on page 24 of the First Iowa Elimination Report, as follows: “One hundred graduate students of Teachers College wrote at a median quality less than 50. Three hundred Indiana teachers in Perry, Green, and Ripley Counties wrote at median qualities less than 50. One hundred inquiries for help received by the Social Service Bureau of New York City showed a median quality less than 50. One hundred applications for positions ranging from $10 a week to $5000 a year, received by the Social Service Bureau of New York City, showed a median quality of 60. Signatures on 100 bank checks showed a median quality of 41. 256 signatures on a hotel register showed a median quality of 41.1.”

It appears from the above that the adult social standard is fully satisfied by a quality of 50 for practically all purposes. Even in the case of applicants for positions, where there is a special incentive for good writing, the median rises only to 60. On the basis of social usage, therefore, it appears that a quality of 60 on the Ayres scale should be accepted as satisfactory for any grade of school work, and that when pupils have attained a quality of 60, with reasonable speed, they should be excused from further writing drill unless a pupil voluntarily chooses to continue. It will be observed from Table 6 that most 7th and 8th grade medians fall between 50 and 60.
A quality of 60 therefore appears reasonable and attainable for upper grades. A higher standard, except for special commercial positions, would be artificial and unreasonable.

What should be accepted as a reasonable speed from the standpoint of society has not been determined in any authoritative manner. It is quite probable that a speed of 60 or 70 letters per minute is sufficient to meet almost any situation. It would seem, therefore, that a teacher who brings her pupils to a quality of 60 and a speed of 60 has prepared them to meet the handwriting demands of society. Many pupils, because of special interests or superior abilities, will prefer to go above this, easily meeting the extreme social demands where handwriting of superior quality is required.

Remedial Instruction. When the sixth grade teacher has distributed her scores as shown in Table 3, and has decided what should be considered a reasonable standard in speed and quality for sixth grade pupils, her next question is how to remedy the situation for the pupils who are below standard in speed and quality. Studies have indicated that merely extending the time for the writing work will not solve the problem. In fact, there is much evidence that children write too much and fall into careless habits for that reason. The story of how to remedy the defects is a long one, and will not be taken up fully in this discussion. The teacher is referred to other sources, particularly to “The Teaching of Handwriting,” by Frank N. Freeman.

There are certain phases of the work of remedying defects, however, which have been subjected to definite measurement. Freeman has constructed a series of writing scales or charts, based upon the most common defects of the pupils’ writing. These scales or charts deal respectively with: 1, Uniformity of slant; 2, Uniformity of alignment; 3, Quality of line; 4, Letter formation; 5, Spacing.
Each chart contains three qualities of excellence, illustrating good, average, and poor qualities of handwriting from the standpoint of the characteristic dealt with in the particular chart. The teacher who is especially interested in writing, and especially the writing supervisor, will find it worth while to make use of Freeman’s analytical charts. By carefully selecting samples of the pupils’ writing she can for her own use make up charts similar to the Freeman charts, thus having available for showing to the pupils samples that illustrate desirable and undesirable features under uniformity of slant, uniformity of alignment, etc.

Table 8, given herewith, should prove especially helpful, as it indicates the causes for the various defects. The teacher and pupil should work together in applying this table to the pupil’s writing. If a pupil is writing with too much slant, the teacher will do well to study the pupil in the light of the five suggested causes. It may be a matter so simple as having the paper in the wrong position, and so with other defects. It is a matter of studying the situation with the particular pupil, analyzing the defect, finding the cause, and helping the pupil to apply the remedy.

Table 8. Analysis of Defects in Writing and Their Causes¹

Defect                        Causes
1. Too much slant             (1) Writing arm too near body.
                              (2) Thumb too stiff.
                              (3) Point of nib too far from fingers.
                              (4) Paper in wrong position.
                              (5) Stroke in wrong direction.
2. Writing too straight       (1) Arm too far from body.
                              (2) Fingers too near nib.
                              (3) Index finger alone guiding pen.
                              (4) Incorrect position of paper.
3. Writing too heavy          (1) Index finger pressing too heavily.
                              (2) Using wrong pen.
                              (3) Penholder too small diameter.
4. Writing too light          (1) Pen held too obliquely or too straight.
                              (2) Eyelet of pen turned side.
                              (3) Penholder too large diameter.
5. Writing too angular        (1) Thumb too stiff.
                              (2) Penholder too lightly held.
                              (3) Movement too slow.
6. Writing too irregular      (1) Lack of freedom of movement.
                              (2) Movement of hand too slow.
                              (3) Pen gripping.
                              (4) Incorrect or uncomfortable position.
7. Spacing too wide           (1) Pen progresses too fast to right.
                              (2) Too much lateral movement.

¹ F. N. Freeman’s “The Teaching of Handwriting” in the Riverside Educational Monographs, page 72, published by Houghton Mifflin Company. By special permission of the publishers.

The teacher may find it advisable to extend the list of defects, and this can doubtless best be done by making use of the analytical score card for handwriting, developed by Dr. C. Truman Gray of the University of Texas. It is indicated herewith, Figure 2. Dr. Gray’s score card is in many respects more complete than the detail of defects listed by Dr. Freeman.

Figure 2. Standard Score Card for Judging Handwriting (Devised by C. Truman Gray)

Pupil ________ Age ________ Date ________ Grade ________ School ________ Teacher ________

The card carries one column for the perfect score and a column for the score on each sample (1, 2, 3, ... 15, etc.).

Item                                Perfect score
1. Heaviness                              3
2. Slant                                  5
     Uniformity
     Mixed
3. Size                                   7
     Uniformity
     Too large
     Too small
4. Alignment                              8
5. Spacing of lines                       9
     Uniformity
     Too close
     Too far apart
6. Spacing of words                      11
     Uniformity
     Too close
     Too far apart
7. Spacing of letters                    18
     Uniformity
     Too close
     Too far apart
8. Neatness                              13
     Blotches
     Carelessness
9. Formation of letters                  26
     General form                         8
     Smoothness                           6
     Letters not closed                   5
     Parts omitted                        5
     Parts added                          2
Total score                             100

The teacher will do well to enlist the pupil fully in the attempt to improve his writing. For the most part the pupil simply knows that his writing is poor. He doesn’t know why it is poor, and he is given no help in applying proper remedies.
If he realizes, for instance, that it is a question of slant, or of uniformity in spacing, or uniformity in height, or neatness, that is, if he can be made to place his attention upon some particular defect and work toward the correction of that defect, he can feel that he is working toward some definite end and not merely drilling aimlessly upon writing. The teacher’s business here is to teach, not to scold, not to find fault.

The teacher may not find it advisable to use the Gray score card, so far as actually scoring the pupils’ work is concerned, but she can use it along with Freeman’s suggestions in discovering with the pupil the defects which need remedying. In time the teacher may be able to construct a chart showing letter defects similar to Freeman’s, but made up entirely from work of her own pupils. Freeman’s chart¹ shows the correct form of a letter, together with the usual defects. It will help to furnish an answer to the pupil’s “Why,” when he asks why he was marked down in writing. All pupils appreciate being treated with consideration and given an opportunity of doing a reasonable amount of thinking in connection with their work.

¹ The Teaching of Handwriting, page 135.

Locating the Individual. The discussion under remedial instruction shows the necessity of locating the individual. It is suggested that the teacher be not satisfied with the distribution as indicated in Table 3, but go a step farther, placing in the names of the particular pupils, as in Table 9. This will individualize the work, and will also make it more intelligible to the children. Raising the score in quality for her room then becomes a question not of blind unintelligent drill, but a question of improving the work of John, Mary, Jane, William, etc. In fact, taking the particular sixth grade as an example, and accepting quality 60 as the standard, it is observed that 14 of the pupils are already writing satisfactorily.
From the standpoint of speed, 12 are writing above 60 and it is possible that some of the 8 writing between 51 and 60 are on a satisfactory basis. This analysis of the situation limits the teacher’s efforts to particular pupils, and enables her to apply her instruction where it is most needed. It also eliminates useless drill. At least two of the pupils writing at quality 60 or above are below in speed. These are Jeanette and Mark. Four others, Grace, Lily, Henry, and David, are also below in speed or just on the line. Four who are satisfactory in speed are below in quality. These are Bruce, Ruth, Bert, and Thomas. The eight to the right and below the heavy lines are satisfactory in speed and quality, and further drill by them may be left to choice. If this plan were generally followed in school systems, a large amount of effort would be released in handwriting alone, for application along other needed lines.

Table 9. Distribution of Scores for a Sixth Grade
(Columns give quality; rows give speed. The heavy lines are shown by = and |.)

Speed     20        30        40        50        60        70        80        90        Total
1-20
21-30     John      Mary                                                                  2
31-40               Jane      Orie      Kate      Mark                                    4
41-50     William   Luther    Sarah     Carrie    Jeanette                                7
                              Epsie     Hazel
51-60               Wilber    Bertha    Joe       Grace     David                         8
                                        Paul      Lily
                                                  Henry
                                                 =========================================
61-70                         Bruce     Ruth     |Eldon     Bess                          7
                                        Bert     |Ina       Frank
71-80                                   Thomas   |Mildred   Doris                         3
81-90                                            |          Helen                         1
91-100                                           |                    Jacob               1
101-120                                          |
Totals    2         4         5         8         8         5         1                   33

Proportion of Children at Standard Quality. Figure 3, given herewith, shows a distribution of upper grade pupils in Cleveland, Ohio.

Fig. 3. Number of pupils writing at each quality from 20 to 90. Data from 10,528 pupils in four upper grades (Cleveland Survey, p. 70, “Measuring the Work of the Public Schools”). 31.3% at 60 or above.

Computation shows that 3303 of the children, or a total of 31.3%, were writing at quality 60 or above.
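The sorting of pupils described under Locating the Individual can be sketched as follows. This is a minimal illustration, not a prescribed method: the cutoffs are the quality of 60 and speed of 60 that the text accepts as a social standard, the quality totals are those of Table 3, and each pupil is represented only by his quality and by the lower limit of his speed interval in Table 9. The names `percent_at_or_above` and `drill_needed` are assumptions made for the example.

```python
QUALITY_STANDARD = 60  # Ayres quality accepted in the text as satisfactory
SPEED_STANDARD = 60    # letters per minute accepted in the text as satisfactory

# Totals for quality from Table 3 (33 pupils in all).
quality_totals = {20: 2, 30: 4, 40: 5, 50: 8, 60: 8, 70: 5, 80: 1, 90: 0}


def percent_at_or_above(totals, standard):
    """Per cent of pupils writing at the standard quality or better."""
    total = sum(totals.values())
    at_standard = sum(n for quality, n in totals.items() if quality >= standard)
    return round(100 * at_standard / total, 1)


def drill_needed(quality, speed_interval_low):
    """List what a pupil still needs, given quality and the low end of the speed interval.

    A pupil in the 51-60 interval is treated as below or just on the line in speed,
    as in the text; an interval starting at 61 or higher counts as satisfactory.
    """
    needs = []
    if quality < QUALITY_STANDARD:
        needs.append("quality")
    if speed_interval_low <= SPEED_STANDARD:
        needs.append("speed")
    return needs


# A few pupils as placed in Table 9: (quality, low end of speed interval).
pupils = {"John": (20, 21), "Mark": (60, 31), "Bruce": (40, 61), "Jacob": (80, 91)}

print(percent_at_or_above(quality_totals, QUALITY_STANDARD))  # 42.4
for name, (quality, speed_low) in pupils.items():
    print(name, drill_needed(quality, speed_low))
```

For this sixth grade the computation gives 14 of 33 pupils, or 42.4%, at quality 60 or above; Mark needs drill in speed only, Bruce in quality only, and Jacob in neither.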
The Springfield, Illinois, survey showed that 33.3% of the upper elementary grade pupils were writing at 60 or above. In the Butte, Montana, survey 23.8% of the pupils in grades 2 to 8 were writing at quality 60 or above. In Kansas City, in 1915, 16.4% of all pupils were writing at quality 60 or above. In the three upper grades in Kansas City these percentages were as follows:

Fifth grade: 25.1% at quality 60 or above.
Sixth grade: 39.7% at quality 60 or above.
Seventh grade: 48.4% at quality 60 or above.

This means that in the seventh grade in the Kansas City schools, practically half of the children were writing at a satisfactory standard of quality, and should have been excused from further drill.

These figures taken from city reports and surveys make it evident that many upper grade pupils should properly be excused from further writing drill, and that our illustrative sixth grade throughout this chapter is quite representative in its distribution of writing ability in an intermediate or upper grade. The procedure recommended for this grade should be applied quite generally. Some teachers, however, may want to require pupils to reach and maintain in all written work a quality somewhere above 60, even as high as 70, before excusing them from further drill in writing. Some pupils who are excused from drill may prefer to continue until a higher standard is reached.

The Writing of an Entire School System. Above in Figure 3, the writing scores for an entire school system above the third grade are thrown together into a single distribution. There are various ways in which these data for a school system may be used to advantage, the following being particularly useful:

1. A grade in one building or part of the city may be compared with the same grade in other buildings or parts of the city. Figures 4 and 5 show this detail in median speed and quality for the sixth grade of the Cleveland schools. Any teacher may locate her particular grade in these distributions, and so see its rank in terms of the median scores.

Fig. 4. Speed records of 36 sixth grades, Cleveland.

Fig. 5. Quality records of 36 sixth grades, Cleveland.

2. A city may be compared with another city or with an established norm or standard. Table 10 will aid in this work. It is valuable as a means of showing quickly and forcefully the relative standing of the city in question. Here Cleveland is compared with 12 other cities in speed and quality, as follows:

Table 10. Speed and Quality in Handwriting, Cleveland, Ohio

Average Speed       5th grade   6th grade   7th grade   8th grade
12 other cities         57          65          75          83
Cleveland               62          69          73          78

Average Quality     5th grade   6th grade   7th grade   8th grade
12 other cities         43          47          53          57
Cleveland               45          48          50          55

The Freeman standards have been much used for such comparisons.

3. After building scores and medians have been ascertained, a superintendent of a school system may desire a total city summary. Table 11, following herewith, permits comparison of the writing in any particular building with writing in the other buildings and the city average. This will be particularly interesting and stimulating to teachers, principals, and supervisors.

In speed, Cleveland excels the other cities in grades five and six, but is below in grades seven and eight. However, there is some evidence that Cleveland is more nearly right in the matter of speed in writing than is the average of the twelve other cities. Likewise in quality, grades five and six of the Cleveland schools do better than the average of the twelve other cities, but grades seven and eight are below the average of the twelve cities.

Table 11.¹ Distribution of Median Scores in Quality of Penmanship by Schools and Grades.
(Salt Lake City)²

Grades: III, IV, V, VI, VII, VIII

Emerson School: 9.6, 9.5, 12.5, 10.9, 12.4, 11.3
Forest School: 9.3, 10.4, 10.2, 9.9, 11.9, 13.2
Grant School: 8.2, 10.1, 10.9, 10.9, 10.4
Hamilton School: 11.9, 10.1, 11.5, 12.9, 12.5
Jackson School: 10.7, 10.7, 9.9, 10.5, 11.4, 13.
Jefferson School: 9.5, 11.3, 11.5, 11.3, 11.6
Lafayette School: 10.5, 11.3, 10.6, 10.3, 12.2, 14.7
Lincoln School: 9.0, 9.2, 9.0, 11., 11.2
Lowell School: 8.6, 10.6, 11.7, 11.8, 14., 14.6
Onequa School: 10.5, 11.6, 10.9, 9.9, 12.2, 13.5
Oquirrh School: 8.7, 10.7, 12.2, 13.3, 12.1
Poplar Grove School: 9.5, 9.8, 11.3, 11.6, 12.4
Riverside School: 9.4, 12.7, 9.8, 11., 12., 12.2
Summer School: 10.2, 13.8, 12.4, 12.2, 12.7, 13.9
Training School: 7.1, 9.0, 9.8, 9.6, 11.6, 12.5
Wasatch School: 12.7, 13.4, 11.3, 12.4, 12.3
Washington School: 8.9, 9.7, 9.5, 10.7, 11.2
Webster School: 7.6, 11.1, 10.7, 12.1, 12.8, 11.6
Whittier School: 9.1, 11.7, 11.4, 12.0, 12.8, 14.7
For the City: 9.2, 10.7, 11.0, 11.3, 12.2, 12.8

¹ The scores in Tables 11 and 12 are in terms of the Thorndike scale, to be explained further on in the chapter.
² Salt Lake City Survey, page 148.

Table 12 shows a total distribution of quality in writing for Salt Lake City. It will be observed that this table gives a different view from the distribution of medians shown for Cleveland in Table 6. It gives a worth-while bird’s-eye view of the writing for the entire city. This distribution is particularly valuable to the superintendent and supervisors in showing the work yet needed on handwriting. Quality 12 of the Thorndike scale corresponds to 60 of the Ayres scale.

Table 12. The Distribution of Scores in Quality on 3685 Samples of Penmanship by Grades. (Salt Lake City)¹

Grades: III, IV, V, VI, VII, VIII

Score 4: 3
Score 5: 4
Score 6: 21, 5, 3
Score 7: 55, 30, 3, 3, 2
Score 8: 85, 63, 59, 26, 8
Score 9: 196, 175, 147, 117, 70, 28
Score 10: 46, 37, 23, 38, 12, 4
Score 11: 102, 152, 190, 53, 163, 97
Score 12: 44, 60, 65, 92, 91, 81
Score 13: 39, 101, 98, 87, 189, 84
Score 14: 11, 38, 41, 52, 68, 50
Score 15: 4, 12, 15, 20, 31, 35
Score 16: 4, 9, 4, 10, 24, 61
Score 17: 4, 1, 2, 10
Score 18: 1, 1, 2, 22
Number of samples: 616, 687, 646, 602, 662, 472
Median score for grade: 9.2, 10.7, 11.0, 11.3, 12.2, 12.8

Pupils writing above quality 12 should be excused from further drill, except voluntary drill.

¹ Salt Lake City Survey, page 149.

The Thorndike Scale. While it is assumed that the teacher will doubtless use the Ayres scale, because of its convenience and availability, yet teachers should know of the Thorndike scale, and should appreciate the fact that it was Dr. E. L. Thorndike who first gave us a usable scale for handwriting. The Thorndike scale is based upon general merit, as determined by the judgment of a large number of competent judges. In this respect it differs from the Ayres scale, which is based entirely upon legibility. It is unnecessary at this point to go into the discussion of the merits of the two scales. It is agreed that either scale can be understood, and will give much better results than the old method of grading.

Because the Thorndike scale was first developed, and its value was immediately appreciated by school men, it was introduced into a large number of school systems, and is still retained in many of them. For this reason it will be well to indicate standards of quality according to the Thorndike scale. The numbers are quite definite since the samples on the Thorndike scale range from 4 to 18. The Thorndike scale was used in the Butte survey, and Table 13 shows the complete distribution of scores in quality.

Table 13. The Distribution of Scores in Penmanship (Butte Survey, p. 165)

Grades: 2, 3, 4, 5, 6, 7, 8

Score (Quality) 0 to 3: no entries
Score 4: 5, 2
Score 5: 22, 2, 3, 3, 1
Score 6: 21, 21, 16, 3, 2, 1
Score 7: 29, 44, 24, 12, 1, 3, 3
Score 8: 28, 86, 42, 56, 20, 15, 7
Score 9: 42, 41, 55, 61, 25, 29, 15
Score 10: 7, 8, 20, 16, 9, 11, 1
Score 11: 29, 13, 21, 17, 32, 25, 23
Score 12: 5, 2, 15, 15, 44, 12, 21
Score 13: 7, 2, 2, 6, 17, 19, 9
Score 14: 3, 4, 10, 16, 9
Score 15: 1, 9, 6, 15
Score 16: 1, 1, 10, 12, 17
Score 17: 6, 2, 3
Score 18: 3, 1
Total papers: 196, 221, 202, 194, 188, 152, 124
Median scores: 8.2, 8.0, 8.8, 8.9, 11.6, 11.2, 12.1

Table 14 shows the median performance in certain cities and indicates also the Freeman standard, expressed in units of the Thorndike scale.

Table 14. Quality of Handwriting (Thorndike)

Grades: I, II, III, IV, V, VI, VII, VIII

Connersville, Indiana: 10.3, 10.0, 10.3, 11.7, 11.7, 11.0
Butte, Montana: 8.2, 8.0, 8.8, 8.9, 11.6, 11.2, 12.1
Salt Lake City: 9.2, 10.7, 11.1, 11.3, 12.2, 12.8
Kansas City: 7.2, 7.4, 8.4, 9.3, 10.4, 11.0, 11.4
Freeman’s 56 cities¹: 7.8, 9.4, 10.2, 11.2, 11.6, 12.5, 13.4
Freeman’s standard¹: 8.4, 9.8, 10.9, 12.0, 12.8, 13.9, 15.2

¹ Transformed scores, approximate only.

Table 15, which follows, contains a complete table of transformation, by which the qualities in the Ayres scale may be transformed into the Thorndike scale and vice versa.

Table 15. Comparative Values²

Ayres    Thorndike          Thorndike    Ayres
 20         6.33                5          9.5
 30         7.60                6         17.4
 40         8.86                7         25.3
 50        10.13                8         33.2
 60        11.39                9         41.1
 70        12.66               10         49.
 80        13.93               11         56.9
 90        15.19               12         64.8
                               13         72.7
                               14         80.6
                               15         88.5
                               16         96.4

² Dr. T. L. Kelley, Journal of Educational Psychology, December, 1914.

Lister-Meyers Handwriting Scales. These scales are in use in the schools of Greater New York. They were prepared by Professors Lister and Meyers of the Brooklyn Training School for Teachers. They are printed on a sheet 24″ × 26″ and show rankings from 90 to 20 on the three items: form, movement, and spacing. This scale is a good illustration of a special adaptation based upon the type of writing which the supervisors are endeavoring to secure in the particular city.

The teacher who is merely interested in putting her grading system on a scientific basis may neglect some of the present discussion and may secure good results by simply following the rules laid down for giving the tests, scoring the results, distributing the scores, and applying remedial instruction.
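The use of Table 15 for transforming an Ayres quality into Thorndike units can be sketched in Python. The tabled pairs are those given in the Ayres-to-Thorndike half of Table 15; reading values that fall between two tabled qualities by straight-line interpolation is an assumption made for this sketch, not a rule stated in the text, and the name `ayres_to_thorndike` is illustrative.

```python
# Ayres quality and the corresponding Thorndike value, from Table 15.
AYRES_TO_THORNDIKE = [
    (20, 6.33), (30, 7.60), (40, 8.86), (50, 10.13),
    (60, 11.39), (70, 12.66), (80, 13.93), (90, 15.19),
]


def ayres_to_thorndike(ayres):
    """Transform an Ayres quality (20 to 90) into Thorndike units.

    Tabled qualities return the tabled value; qualities in between are
    interpolated linearly between the two neighbouring entries (an assumption).
    """
    points = AYRES_TO_THORNDIKE
    if not points[0][0] <= ayres <= points[-1][0]:
        raise ValueError("Ayres quality outside the range of Table 15")
    for (a0, t0), (a1, t1) in zip(points, points[1:]):
        if a0 <= ayres <= a1:
            return round(t0 + (t1 - t0) * (ayres - a0) / (a1 - a0), 2)


print(ayres_to_thorndike(60))  # 11.39
print(ayres_to_thorndike(55))  # 10.76
```

A teacher who has scored her room on the Ayres scale could in this way set her median beside the Thorndike medians of Table 14; a median of 50 on the Ayres scale, for instance, corresponds to 10.13 on the Thorndike scale.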
What now seems theoretical and abstract in the measurement of handwriting will take on new significance as the teacher gradually masters the details of applying the work to her own schoolroom. The practice will illuminate the theory; that which is theoretical will become practical. The work is of value as it modifies and improves school practice.

Many teachers, however, will desire to know the history and development of the work, and in addition to a thorough study of the present chapter, will use the following bibliography to further study the subject.

BIBLIOGRAPHY

1. Ayres, L. P., “A Scale for Measuring the Quality of Handwriting of School Children,” Russell Sage Foundation, Bulletin No. 113.
2. Freeman, F. N., “The Teaching of Handwriting,” Riverside Educational Monographs, Houghton Mifflin Company.
   “Handwriting,” the Fourteenth Yearbook of the National Society for the Study of Education, Chap. V, pp. 61-77.
   The Sixteenth Yearbook of the National Society for the Study of Education, Chap. IV, pp. 60-72.
   “An Analytical Scale for Judging Handwriting,” The Elementary School Journal, April, 1915. Order copies from Houghton Mifflin Company, each.
3. Thorndike, E. L., “Handwriting,” Teachers College Record, March, 1910.
   “Teachers’ Estimates of the Quality of Specimens of Handwriting,” Teachers College Record, November, 1914.
4. Bobbitt, Franklin, Twelfth Yearbook of the National Society for the Study of Education, Part I, pp. 40-42.
5. Wilson, G. M., “The Handwriting of School Children,” Elementary School Teacher, 1911, Vol. 11, pp. 540-543.
6. Kelley, T. L., Journal of Educational Psychology, December, 1914.
7. Gray, C. Truman, “Standard Score Card for Judging Handwriting,” University of Texas, Austin, Texas.
8. Starch, Daniel, and Wise, Carl T., “A Measuring Scale for Handwriting.” For copies, address The University Cooperative Co., Madison, Wisconsin. For discussion of the experimental and statistical work involved, see Starch, Daniel, “A Scale for Measuring Handwriting,” School and Society, January, 1919.

CHAPTER IV

THE MEASUREMENT OF ARITHMETIC

Measurement in arithmetic is not so simple as in spelling or in handwriting. Arithmetic taken as a whole involves many processes, and each process in turn involves particular difficulties. No one has attempted to measure general mathematical ability,¹ and no one has attempted a test which covers the entire range of arithmetical processes. However, in its more essential and mechanical phases, arithmetic is susceptible of quite definite measurement. Since our purpose is to help the teacher in measuring her work to the extent that scales and standardized tests are available, it will be necessary to consider what can be measured at the present time, and what are the means available for doing it.

¹ The discovery of mathematical ability among secondary pupils has recently been attempted and a prognostic test devised. See Rogers, Agnes Low: “Experimental Tests of Mathematical Ability and Their Prognostic Value.”

There are at the present time five series of tests reasonably well standardized. The one first developed and most extensively in use is the Courtis Standard Research Tests. At the present time Courtis is confining his work to measurement of the four fundamental processes by his tests known as “Series B.” The Courtis tests were first made available during the school year of 1909-10. The first tests, known as “Series A,” were tentative in nature, and have since been discontinued. The use of Series B has extended very rapidly. During the year 1915-16 they were used in forty-two states of the Union, Hawaii, and two foreign countries, a total of one half million copies being sold.
The teacher will see, therefore, that in making use of the Courtis tests she is becoming acquainted with a method of measurement which is widely used and is likely to be used even more extensively in the future, unless it is replaced by something better. The chief attention of the teacher will be directed to these tests.

Courtis Arithmetic Tests, Series B. Series B consists of tests in addition, subtraction, multiplication, and division, respectively. The test in addition consists of twenty-four examples, each made up of nine three-place numbers. They are constructed mechanically in such a way that each example is equal in difficulty to every other. The addition examples, therefore, constitute 24 units of measurement, more or less in the nature of 24 foot rules, although not as accurately equal one to the other. The point of the test is to see how many of these examples can be solved by a pupil in a given time, the time in this case being eight minutes. The examples in subtraction, multiplication, and division are likewise made up on the basis of uniform difficulty, and a definite time limit is set. The details, which follow herewith, give a few samples from each of the tests, together with the time limits.

Arithmetic. Test No. 1. Addition
Series B, Form 2
Score: No. Attempted ____  No. Right ____

You will be given eight minutes to find the answers to as many of these addition examples as possible. Write the answers on this paper directly underneath the examples. You are not expected to be able to do them all. You will be marked for both speed and accuracy, but it is more important to have your answers right than to try a great many examples.
   127   375   953   333   325   911   554   167
   554   996   320   778   886   913   164   897
   972   119   237   949   486   987   354   600
   744   195   234   386   463   827   240   616
   261   755   833   959   186   775   684   260
   372   846   595   254   137   474   787   591
   106   869   451   336   820   533   877   845
   981   693   184   772   749   256   258   537
   685   452   904   611   988   559   127   323

Test No. 2. Subtraction

The test consists of 24 examples like the following, time 4 minutes.

    97089301    93994413   108051861   163130569
    20203267    54783938    73463849    91061255

   168354186   188545364   120981427   105755782
    70537861    92471259    64188045    90863147

Test No. 3. Multiplication

The test consists of 25 examples like the following, time 6 minutes.

   6283   9624   7853   4926   5873
     47    503     35    620     49

   2964   8357   6249   3785   4965
     94     87     78     35     19

Test No. 4. Division

The test consists of 24 examples like the following, time 8 minutes.

   29)24679   57)51642   38)32300   64)61504
   46)34086   75)55500   92)27784   83)26643

Nature of the Examples. The teacher may properly question whether the examples appearing in these tests are such as commonly appear in ordinary business transactions. That the examples of the Courtis test are more or less artificial from a social standpoint, and considerably more difficult than the transactions actually occurring under business conditions, is borne out by a study of the social and business use of arithmetic reported in Chapter 8 of the Sixteenth Yearbook of the National Society for the Study of Education.[1] The teacher should remember, however, that the purpose of the Courtis examples is merely to test the ability of the pupils in the fundamentals. Examples as difficult as those appearing in Test 1, for instance, will involve all of the difficulties of simpler examples, and so in a measure justify themselves in that they test the extreme ability likely to be required not only of pupils in the public schools under any system, but even by the exigencies of business and social situations of adult life. Mr.
Courtis himself has recognized the fact that the multiplication and division problems are entirely too difficult for third grade pupils, and as a result his 1916 standards indicate a zero score for third grade pupils in multiplication and division. It is possible that in time further adjustment will be made in this same direction.

[1] This study has since been confirmed by a larger study. Wilson, G. M.: “A Survey of the Social and Business Usage of Arithmetic,” Teachers College Bureau of Publications.

Directions for Giving the Tests. It is assumed that the teacher is not interested in a merely theoretical discussion of arithmetic tests, but in using them for the measurement of the work in her own schoolroom. The next step, therefore, in connection with the Courtis tests is to write[2] for a sufficient quantity of the Research Tests in Arithmetic, Series B, to enable her to give the test to the number of pupils which she has in her room.

[2] See bibliography for directions.

The teacher will have no difficulty in administering the tests. One or two of the tests can be given during a single recitation period. The time for addition is 8 minutes, for subtraction 4 minutes. It will doubtless be better to give these two tests on one day, deferring the tests in multiplication and division until the next day. The instructions follow herewith.

Instructions to Examiners

1. For each room, prepare as many bundles of papers as there are rows of seats, putting into each bundle as many papers as there are seats in each row.
2. Begin by saying, “My purpose this morning is to measure how well this school teaches its children how to add, subtract, multiply, and divide. I have here some printed tests. They are not examinations, because exactly these same tests are given to all the grades from the third through high school. They are also being given in other schools in this city, and in other cities all over the country. It is the school that is being examined to-day.
If you treat the tests as though they were a game, you will enjoy them and do your best for the honor of your school. I am going to give each of you a set of these papers, but do not look at them until I tell you to do so. Will the boys and girls in the front seats please distribute them for me?”
3. Distribute the papers by putting a bundle on the first desk in each row and letting the children do the rest.
4. Have the children fill out the blanks at the top of the first page. Write the date in figures, and the time to the nearest half hour; thus: 9-25-1913-10:30.
5. Have the children read instructions for Test 1 aloud in concert.
6. “Now please listen closely. In these tests it is important that we all start at the same time and stop at the same time. We can do this easily, if you follow my instructions exactly. Lay your papers on your desks in position to work the examples, but close the cover with your left hand, keeping it between your thumb and finger, like this (illustrate), so that you can open it quickly when I tell you to start. Take your pencil in your right hand, and when I say ‘Get ready,’ raise your pencil hand in the air as if you were going to ask a question. (Illustrate, by suiting the action to the words.) Then when I say ‘Start,’ you can bring your pencil down as you turn the cover back, and every one will start at the same time. When I say ‘Stop,’ I want you all to stop at once, and to raise your hands again so that I can see that you have stopped. Now I think we are ready to try the test.”
   When the second hand of the watch reaches the 55-second mark say, “Get ready for the addition test. Hands up.” Exactly at the 60 mark say, “Start.”

   ALLOW EXACTLY EIGHT MINUTES

   “Stop. Hands up.” Make sure all have stopped. “Count how many examples you have finished, and write the number in the score card in the corner under the number attempted. Do not count examples you have begun but have not finished.
Your score is the number of the examples you have finished. I am coming to your desk to see that you have written it in the right place.”
7. Read the answers from an Answer Card (be sure the form number corresponds with that of the tests), and have the children check answers right or wrong, counting the number right, and writing it in their score cards.
8. In similar fashion give and score the other tests.
   For Test 2, Subtraction, allow exactly FOUR minutes.
   For Test 3, Multiplication, allow exactly SIX minutes.
   For Test 4, Division, allow exactly EIGHT minutes.
9. Give Tests 1 and 2 the first day, and Tests 3 and 4 the next. All may be given at one time if desired.

Scoring the Results. The teacher may save herself much work, and on the whole secure equally satisfactory returns, if she has each child score his own paper at the close of the tests (or papers may be exchanged for scoring). This method has the added advantage of enlisting the interest of the children.

Ask the child to count the number of examples for which he has a complete answer. This number is to be placed in the little square at the upper right-hand corner of the test paper, in the blank following “Number Attempted.” The next thing to do is to read from the key, furnished with the tests, the correct answer to the various examples. Each pupil should follow, crossing out on his sheet any answer which is not correct. Have pupils then count the number of correct answers, and place the result in the upper right-hand corner, in the blank following “Number Right.”

If the pupil is made to understand the purpose of the work and the necessity of knowing exactly the condition of the class, and for that matter his own condition, there will be no difficulty in getting full cooperation on the part of the pupils and honesty from every member of the class.
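The count of attempts and rights described above can also be made mechanically. What follows is a minimal sketch in Python and no part of the Courtis materials: the function names, the use of None to mark an unfinished example, and the sample key and paper are all assumptions made for illustration.

```python
def score_paper(pupil_answers, answer_key):
    """Return (attempted, right) for one pupil's paper.

    pupil_answers holds one entry per example, in order; None marks an
    example the pupil did not finish (an assumption of this sketch).
    """
    attempted = 0
    right = 0
    for given, correct in zip(pupil_answers, answer_key):
        if given is None:
            continue  # unfinished examples are not counted as attempts
        attempted += 1
        if given == correct:
            right += 1
    return attempted, right


def accuracy_percent(attempted, right):
    """Per cent of attempted examples that were right (0 if none attempted)."""
    return 100.0 * right / attempted if attempted else 0.0


# Hypothetical four-example key and one pupil's paper.
key = [4567, 5012, 4890, 5321]
paper = [4567, 5002, 4890, None]
attempted, right = score_paper(paper, key)
print(attempted, right, round(accuracy_percent(attempted, right), 1))
```

In this made-up paper the pupil finished three examples and had two of them right. The two counts correspond to the blanks “Number Attempted” and “Number Right,” and the per cent is one way of expressing accuracy apart from speed.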
At the conclusion of the test the pupil should make his own graph, showing attempts and rights in the four fundamental processes, as in the example which follows.

Research Tests in Arithmetic
INDIVIDUAL SCORE SHEET
ARITHMETIC, Series B

Name ________  Boy or girl ____  Age last birthday ____
School ________  Grade ____  Room ____
City ________  State ________  Date ________

                               INDIVIDUAL SCORES                       CLASS SCORES
                         Attempts              Rights              Attempts      Rights
Test    Subject          1st    2nd   Change   1st    2nd   Change   1st    2nd    1st    2nd
                         Trial  Trial          Trial  Trial          Trial  Trial  Trial  Trial
No. 1   Addition
No. 2   Subtraction
No. 3   Multiplication
No. 4   Division

GRAPH
[Graph form: one column each for Attempts and Rights under Addition, Subtraction, Multiplication, and Division.]

Instructions. In each column mark the number that corresponds to your score for that column. Then with a ruler draw a line from each number so marked to the next. Draw a curve for the class scores in the same way, using a dotted line. By comparing the two curves you can tell how much your scores are above or below the class results.

This individual score sheet will appeal to children, and will be exceedingly serviceable in securing the necessary further progress of the children. The teacher may find it worth while to arrange all of the score sheets in order of excellence by pinning them on a piece of burlap on the side of the room. The pupil’s score (dotted line) appearing on the individual score sheet above is the record of an eighth grade pupil who is about average in ability.

In order that the teacher may get an intelligent view of the performance of her class, it will be necessary to make a distribution of the class scores, and it is suggested that this be made in such a way as to show both speed and accuracy. Table 16, which follows herewith, shows such a distribution for 35 eighth grade pupils.

Table 16. Showing Distribution of 35 Eighth Grade Pupils in Speed and Accuracy in September. Addition, Series B
Score in Examples Attempted.
[Table residue: cross-tabulations of score in examples attempted (speed) against score in examples right (accuracy) for the 35 pupils; the cell entries cannot be recovered from this copy. The last legible entry reads: Median Attempts (Speed), 12.]

... content until pupils in an upper grade are letter perfect in solving simple examples in the fundamental processes, i.e. until 100% accuracy is reached.

Since the results in addition as shown in Table 16 were secured in September, and the results in Table 19 in January, and since the standards as brought out in Tables 17 and 18 are based upon May tests, the teacher of the grade whose results are shown in Tables 16 and 19 may reasonably expect that her pupils will be up to standard when she gives the tests in May.

In Figure 6 the class improvement in rights from September to January is shown graphically. This graph comes directly from the rights in Tables 16 and 19.

Fig. 6. Showing attainment of the class in September (single line graph) and in January (double line graph). Axis label: number at each score. The entire class has moved steadily to the right, which means an improvement in score.

Other Arithmetic Tests. Other available tests at the present time are the Stone Reasoning Tests, the Woody Arithmetic Scales, the Boston Tests in Addition of Fractions, the Cleveland Survey Arithmetic Tests, and the Monroe Diagnostic Tests in Arithmetic. These tests will in turn be described briefly, more particular attention being given to the Stone Reasoning Tests and the Woody Arithmetic Scales, as these are needed to supplement the Courtis tests, and can be used by teachers without particular difficulty.
It is evident that what can be measured in arithmetic depends somewhat on the test being used. In general it is the performance of the pupils which is tested, and this may include speed and accuracy, or accuracy only, according to how the tests are administered. The purpose in arithmetic, as in all testing, should be to find out the present condition of the child, in order to prescribe remedies in case he needs help, or in order to release him from further drill, in case he is fully up to reasonable standards.

Reasoning Tests. When arithmetic is put to practical business use, it is always connected with an actual situation, and the solution requires judgment or reasoning as to the processes involved. For upper grade work no test of arithmetic is complete which fails to test reasoning ability. The Stone Reasoning Test has been most used. It consists of twelve problems, ranging in value from 1 to 2, as follows:

THE STONE REASONING TEST
(Time Exactly 15 minutes)

School ________  Grade ____  Name of pupil ________

Solve as many of the following problems as you have time for; work them in order as numbered:

Problem
Value    Problems
1.0      1. If you buy 2 tablets at 7 cents each and a book for 65 cents, how much change should you receive from a two-dollar bill?
1.0      2. John sold 4 Saturday Evening Posts at 5 cents each. He kept ½ the money and with the other ½ bought Sunday papers at 2 cents each. How many did he buy?
1.0      3. If James had 4 times as much money as George, he would have $16. How much money has George?
1.0      4. How many pencils can you buy for 50 cents at the rate of 2 for 5 cents?
1.0      5. The uniforms for a baseball nine cost $2.50 each. The shoes cost $2 per pair. What was the total cost of uniforms and shoes for the nine?
1.4      6. In the schools of a certain city there are 2200 pupils; ½ are in the primary grades, ¼ in the grammar grades, ⅛ in the high school and the rest in the night school. How many pupils are there in the night school?
1.2      7. If 3½ tons of coal cost $21, what will 5½ tons cost?
1.6      8. A news dealer bought some magazines for $1. He sold them for $1.20, gaining 5 cents on each magazine. How many magazines were there?
2.0      9. A girl spent ⅛ of her money for car fare and three times as much for clothes. Half of what she had left was 80 cents. How much money did she have at first?
2.0      10. Two girls receive $2.10 for making buttonholes. One makes 42, the other 28. How shall they divide the money?
2.0      11. Mr. Brown paid one third of the cost of a building; Mr. Johnson received $500 more annual rent than Mr. Brown. How much did each receive?
2.0      12. A freight train left Albany for New York at 6 o’clock. An express train left on the same track at 8 o’clock. It went at the rate of 40 miles an hour. At what time of day will it overtake the freight train if the freight train stops after it has gone 56 miles?

The papers are scored by giving to each problem solved correctly the value as indicated at the left of each problem in the above. The test was first formulated for upper sixth grade pupils, but it is equally good for seventh or eighth grade pupils. It is too difficult for good results in grades below the sixth.

Dr. Stone has recently issued[1] the following grade standards:

Grade    Standard
5        Score of 5.5, reached or exceeded by 80%, 75% accuracy.
6        Score of 6.5, reached or exceeded by 80%, 80% accuracy.
7        Score of 7.5, reached or exceeded by 80%, 85% accuracy.
8        Score of 8.75, reached or exceeded by 80%, 90% accuracy.

It is quite probable that the median scores secured through the use of the Stone reasoning tests in various surveys form a more usable standard than the one suggested by Dr. Stone. These scores are shown in Table 20.

Table 20. Showing Median Scores Obtained in the Use of the Stone Reasoning Tests

Columns, in order: Stone, 1908 (26 cities); Butte, Mont., 1914; Salt Lake City, 1915; Boston, 1916; Brookline, Mass.; Lead, S. D.; Nassau Co., N. Y., 1917.[2]

Grade    Medians as printed (blank cells are not marked, so only the Grade 6 row can be assigned to columns with certainty)
5        2.2   3.7   4.0
6        5.5   3.9   6.4   4.0   6.2   6.7   4.5
7        5.8   8.6   6.4
8        7.7   10.5  11.6  7.2

The teacher will find it worth while to use the Stone reasoning tests, although the standards are not so definite as for the Courtis tests in the fundamentals. It will be simpler to take the returns from a single city, as for example, Salt Lake City, as a standard. If pupils fail to reach the Salt Lake City standard, they are not doing as well as pupils have done in an average city system.

[1] Stone, C. W., “Standardized Reasoning Tests in Arithmetic and How to Use Them.” (Teachers College Bureau of Publications.)
[2] The scoring is such as to slightly raise the score.

Diagnostic Tests. The teacher who has followed the discussion closely will appreciate the fact that the Courtis tests, while measuring ability, do not analyze the difficulties and do not permit the teacher to use them easily for analyzing a pupil’s shortcomings. This defect of the Courtis tests is being overcome gradually by the formation of other tests which have better diagnostic possibilities. Among these are the Woody Arithmetic Scales.

Woody Scales. The Woody scales were not originally designed for diagnostic purposes, but they are being made to serve that purpose, as well as their original purpose of measuring the ability of children. They are constituted quite differently from the Courtis tests. Each Courtis test consists of a series of problems of equal difficulty in one of the fundamental processes. The Woody scales consist of a series of problems of increasing difficulty. They are designed to measure work in the four fundamental operations: addition, subtraction, multiplication, and division. While constructed on a statistical basis rather than for the purpose of serving as the basis of an analysis of subject-matter needs, yet at the same time they do cover subject matter reasonably well.
The addition scale, for instance, covers simple combinations in one, two, three, and four column addition; examples with addends from two to sixteen; addition of simple fractions; addition of decimals; addition of U. S. money; addition of denominate numbers; and addition of mixed numbers. The additions are expressed in column form and by the plus sign. Thus the pupil is tested, more or less, over the entire range of addition possibilities, by a series of problems ranging in difficulty from those so simple that any third grade pupil may solve them, up to other problems so difficult that few eighth grade pupils succeed in solving them. This will appear by examination of the addition scale which follows herewith. The subtraction, multiplication, and division scales also follow.

In the scales below, items set in column form on the printed sheet are written with their numbers in order, separated by commas. Items whose fractions cannot be read in this copy are marked “illegible.”

SERIES A[1]
Addition Scale (20 minutes)

Name ________  When is your next birthday? ____  How old will you be? ____  Are you a boy or girl? ____  In what grade are you? ____

(1) 2, 3
(2) 2, 4, 3
(3) 17, 2
(4) 53, 45
(5) 72, 26
(6) 60, 37
(7) 3 + 1 =
(8) 2 + 5 + 1 =
(9) 20, 10, 2, 30, 25
(10) 21, 33, 35
(11) 32, 59, 17
(12) 43, 1, 2, 13
(13) 23, 25, 16
(14) 24 + 42 =
(15) 100, 33, 45, 201, 46
(16) 9, 24, 12, 15, 19
(17) 199, 194, 295, 156
(18) 2563, 1387, 4954, 2065
(19) $ .75, 1.25, .49
(20) $12.50, 16.75, 15.75
(21) $8.00, 5.__, 2.33, 4.16, .94, 6.__
(22) 547, 197, 685, 687, 456, 393, 525, 240, 152
(23) illegible (fractions)
(24) 4.0125, 1.5907, 4.10, 8.673
(25) illegible (fractions)
(26) 12½, 62½, 12½, 37½
(27) illegible
(28) illegible (fractions)
(29) illegible (mixed numbers)
(30) illegible (mixed numbers)
(31) 113.46, 49.6097, 19.9, 9.87, .0086, 18.253, 6.04
(32) illegible (fractions)
(33) .49, .28, .63, .95, 1.69, .22, .33, .36, 1.01, .56, .88, .75, .56, 1.10, .18, .56
(34) illegible (fractions)
(35) 2 ft. 5 in., 3 ft. 5 in., 4 ft. 9 in.
(36) 2 yr. 5 mo., 3 yr. 6 mo., 4 yr. 9 mo., 5 yr. 2 mo., 6 yr. 7 mo.
(37) illegible (mixed numbers)
(38) 25.091 + 100.4 + 25 + 98.28 + 19.3614 =

[1] The scales are printed in large type, on separate sheets, 8½" × 11", with ample space for the insertion of answers.

SERIES A
Subtraction Scale

Name ________  When is your next birthday? ____  How old will you be? ____  Are you a boy or girl? ____  In what grade are you? ____

(1) 8, 5
(2) 6, 0
(3) 2, 1
(4) 9, 3
(5) 4, 4
(6) 11, 7
(7) 13, 8
(8) 59, 12
(9) 78, 37
(10) 7 - 4 =
(11) 76, 60
(12) 27, 3
(13) 16, 9
(14) 50, 25
(15) 21, 9
(16) 270, 190
(17) 393, 178
(18) 1000, 537
(19) 567482, 106493
(20) illegible (fractions)
(21) 10.00, 3.49
(22) illegible (fractions)
(23) 80836465, 49178036
(24) illegible (mixed numbers)
(25) 27, 12 and a fraction (illegible)
(26) 4 yd. 1 ft. 6 in., 2 yd. 2 ft. 3 in.
(27) 5 yd. 1 ft. 4 in., 2 yd. 2 ft. 8 in.
(28) 10 - 6.25 =
(29) illegible (mixed numbers)
(30) 9.8063 - 9.019 =
(31) 7.3 - 3.00081 =
(32) 1912, 6 mo., 8 da.; 1910, 7 mo., 15 da.
(33) illegible (fractions)
(34) illegible (mixed numbers)
(35) illegible (mixed numbers)

SERIES A
Multiplication Scale

Name ________  When is your next birthday? ____  How old will you be? ____  Are you a boy or girl? ____  In what grade are you? ____

(1) 3 × 7 =
(2) 5 × 1 =
(3) 2 × 3 =
(4) 4 × 8 =
(5) 23, 3
(6) 310, 4
(7) 7 × 9 =
(8) 50, 3
(9) 254, 6
(10) 623, 7
(11) 1036, 8
(12) 5096, 6
(13) 8754, 8
(14) 165, 40
(15) 235, 23
(16) 7898, 9
(17) 145, 206
(18) 24, 234
(19) 9.6, 4
(20) 287, .05
(21) 24, 2½
(22) 8 × 5¾ =
(23) 1¼ × 8 =
(24) 16, multiplier illegible
(25) illegible (fractions)
(26) 9742, 59
(27) 6.25, 3.2
(28) .0123, 9.8
(29) illegible (fractions)
(30) 2.49, 36
(31) illegible (fractions)
(32) 6 dollars 49 cents, 8
(33) illegible
(34) illegible (fractions)
(35) 9871, 25
(36) 3 ft. 5 in., 5
(37) illegible (mixed numbers)
(38) .0963⅛ × .084
(39) 8 ft. 9½ in., 9

SERIES A
Division Scale

Name ________  When is your next birthday? ____  How old will you be? ____  Are you a boy or girl? ____  In what grade are you? ____

(1) 3)6
(2) 9)27
(3) 4)28
(4) 1)5
(5) 9)36
(6) 3)39
(7) 4 ÷ 2 =
(8) 9)0
(9) 1)1
(10) 6 × ? = 30
(11) 2)13
(12) 2 ÷ 2 =
(13) 4)24 lb. 8 oz.
(14) 8)5856
(15) ¼ of 128 =
(16) 68)2108
(17) 50 ÷ 7 =
(18) 13)65065
(19) 248 ÷ 7 =
(20) 2.1)25.2
(21) 25)9750
(22) 2)13.50
(23) 23)469
(24) 75)2250300
(25) 2400)504000
(26) 12)2.76
(27) ⅞ of 624 =
(28) .003).0936
(29) illegible (fractions)
(30) illegible (fractions)
(31) illegible (fractions)
(32) illegible (fractions)
(33) 52)3756
(34) 62.50 ÷ 1¼ =
(35) 531)37722
(36) 9)69 lb. 9 oz.

Directions for Giving Woody Tests.
The directions for administering the Woody scales accompany the sheets, which may be secured from Teachers College Bureau of Publications. It is quite essential that uniform methods be followed in order to make results comparable. The papers are distributed with face down. When pupils are ready with pencils in hand, they are told to turn over the paper and answer the questions at the top of the page. The specific directions for the addition test as given by Dr. Woody are as follows: “Every problem on the sheet which I have given you is an addition problem, an ‘and’ problem. Work as many of these problems as you can and be sure you get them right. Do all of your work on this piece of paper and don’t ask anybody any questions. Begin.”

For the series A scales, twenty minutes are allowed for each test. There are shortened scales, series B, which are given in ten minutes each, but since the purpose in using the Woody scales will doubtless be to benefit more or less by their diagnostic values, it is assumed that teachers will prefer to use the longer scales of series A.

It may be noted at this point that the time for giving the tests has been varied.[1] While a shortened time gives slightly better distributions, particularly in the upper grades, yet the problems at the upper end of the scales are so difficult that few pupils will solve them even when given all of the time necessary. As a matter of fact, 20 minutes, the time allowed, is sufficient for most upper grade pupils to complete any one of the tests. The result is that in the upper grades, accuracy only is measured. But in using the Woody scales, it is likely that accuracy is the thing in which the teacher will be chiefly interested.

[1] The Nassau County Survey used 18 minutes instead of 20.

In using the other Woody scales, the directions are the same as for addition except the substitution of such expressions as “subtract or ‘take away’ problems”; “multiplication or ‘times’ problems”; “division or ‘into’ problems,” for “addition or ‘and’ problems.” In case the teacher has used other expressions to indicate one of the processes, these may be substituted for the expressions “and” problem, etc. The purpose in using these extra expressions is to make clear to the child the process which is involved in the particular test.

Table 21. Answers to Problems in Woody Scales
(Entries that cannot be read in this copy are marked “illegible.” The subtraction scale ends at problem 35, the division scale at 36, the addition scale at 38.)

Problem  Addition                     Subtraction                    Multiplication             Division
1        5                            3                              21                         2
2        9                            6                              5                          3
3        19                           1                              6                          7
4        98                           6                              32                         5
5        98                           0                              69                         4
6        97                           4                              1,240                      13
7        4                            5                              63                         2
8        8                            47                             150                        0
9        87                           41                             1,524                      1
10       89                           3                              4,361                      5
11       108                          16                             8,288                      6½, not 6+1
12       59                           24                             30,576                     1
13       64                           7                              70,032                     6 lb. 2 oz., not 6+2
14       67                           25                             6,600                      732
15       425                          12                             5,405                      32
16       79                           80                             71,082                     31
17       844                          215                            29,870                     7 1/7, not 7+1
18       10,966                       463                            5,616                      5,005
19       $2.49                        460,989                        38.4                       35 3/7, not 35+3
20       $45.00                       illegible                      14.35                      12
21       $27.50                       6.51                           60                         390
22       3,873                        3                              46                         6.75
23       illegible                    31,658,429                     10                         20 9/23; 20.3, not 20+9
24       18.3762                      illegible                      42                         30,004
25       2, not (illegible)           illegible                      illegible                  210
26       125, not (illegible)         1 yd. 2 ft. 3 in., not 63 in.  574,778                    .23
27       illegible                    2 yd. 1 ft. 8 in., not 81 in.  20.000                     546
28       1, not (illegible)           3¾ or 3.75                     .12054                     31.2
29       illegible                    illegible                      illegible                  illegible
30       illegible                    .7873                          89.64                      (illegible) or .15
31       217.1413                     4.29919                        illegible                  illegible
32       illegible                    1 yr. 10 mo. 23 da.            $51.92 or 51 dol. 92 cts.  illegible
33       10.55                        illegible                      illegible                  72 3/13 or 72.23
34       illegible                    illegible                      illegible                  50
35       10 ft. 8 in. or 10⅔ ft.      illegible                      24,693 (?)                 71 7/177 or 71.04
36       22 yr. 5 mo. or 22 5/12 yr.                                 17 ft. 1 in.               7 lb. 11⅔ oz.
37       illegible                                                   illegible
38       268.1324                                                    .00809025
39                                                                   79 ft. (illegible) in.

Table 22

Number of
Problems Solved    Score as Recorded    Totals
1 to 4             (none)
5                  /                    1
6                  /                    1
7                  (none)
8                  ////                 4
9                  //                   2
10                 /                    1
11                 ///                  3
12                 //////               6
13                 //                   2
14                 ////////             8
15                 ///                  3
16                 //                   2
17                 /                    1
18                 /                    1
19 to 36           (none)

Class Median 12

The standard for marking the examples in the Woody scales is absolute accuracy, and the final answer should be in its lowest terms.
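The marking standard just stated, absolute accuracy with the answer in its lowest terms, can be put as a mechanical check. The following is only a sketch in Python under stated assumptions: answers are taken to be written as a whole number, a fraction “N/D,” or a mixed number “W N/D”; the function names are hypothetical; and the sketch does not decide whether an improper fraction such as 13/2 should be accepted in place of 6½.

```python
from fractions import Fraction
from math import gcd


def parse_answer(text):
    """Split 'W N/D', 'N/D', or 'W' into (whole, numerator, denominator)."""
    whole, num, den = 0, 0, 1
    for part in text.split():
        if "/" in part:
            n, d = part.split("/")
            num, den = int(n), int(d)
        else:
            whole = int(part)
    return whole, num, den


def marked_right(pupil_answer, key_answer):
    """True only if the answer equals the key and its fraction is reduced."""
    w, n, d = parse_answer(pupil_answer)
    kw, kn, kd = parse_answer(key_answer)
    same_value = w + Fraction(n, d) == kw + Fraction(kn, kd)
    in_lowest_terms = gcd(n, d) == 1  # checked on the digits as written
    return same_value and in_lowest_terms


# Division problem 11, 2)13, has the key answer 6 1/2.
print(marked_right("6 1/2", "6 1/2"))  # equal and reduced
print(marked_right("6 2/4", "6 1/2"))  # equal in value, but not reduced
```

The lowest-terms test is made on the numerator and denominator as the pupil wrote them, because Python's Fraction type reduces automatically and would hide an unreduced answer.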
Table 21 gives the correct list of answers.

The method of tabulating the results of the Woody tests is very simple. Assuming that a test has been given, indicate on the upper corner of each page the number of problems solved correctly. Then, for convenience, arrange the papers in order according to the number of problems solved. With the papers thus arranged, it will be possible to draw off directly the results of the test as shown in Table 22. This table shows the distribution of a class of 35 with reference to the number of problems solved by the different members. Table 22 is taken directly from the results of a division test given in an intermediate grade,[1] November 1, 1917.

[1] Anderson, C. J., “Use of Woody Scales for Diagnostic Purposes,” Elementary School Journal, 16: June, 1918, pp. 770-781.

The distribution of pupils’ scores resulting from giving the Woody Test in Division, Series A, is shown for the entire school system in Table 23. This is the same form as shown in Table 22, except that it covers five grades, and the number of pupils in each grade is the complete number for the entire city. The superintendent of this school system has shown, exceptionally well, the diagnostic possibilities of the Woody scales. In the study referred to, he analyzes the division difficulties of pupils as shown by the errors they have made in attempting to solve the problems in the division scale.

Table 23. Distribution of Pupils’ Scores

Grade     IV     V      VI     VII    VIII
Total     84     100    93     83     40
Median    12     17     22     29     30.5

Number of
Problems    Pupils at that score, as printed (grade order IV to VIII; blank cells are not marked, so the entries cannot be assigned to grades with certainty)
1           1
2           1
3           (none)
4           (none)
5           1
6           3  1
7           1  1
8           11  2
9           7
10          5  9  2
11          12  7  3
12          11  6
13          8  9  1  1
14          13  3  2  1
15          5  2  2  1
16          3  7  4  1
17          1  10  4
18          2  2  4
19          12  2  2
20          6  8  2  2
21          9  6  1
22          8  9  6
23          2  12  1
24          1  11  3  1
25          1  9  2  1
26          1  5  8  1
27          3  3  2
28          3  9  3
29          1  5  3
30          5  4
31          9  3
32          1  15  6
33          5  3
34          1  2  5
35          3  2
36          1  1

It will be well for the teacher to summarize for her class the number of wrong solutions for each problem attempted. This can be shown for a single test by a table similar to Table 22, in which the problems are listed by number on the left, and the number of incorrect solutions shown on the right. In the final analysis, however, the teacher should study each paper to see what mistakes each particular pupil made. This should be done in each of the fundamental processes. If the Woody scales are used to supplement the Courtis tests for the purpose of finding out where the pupil made his mistakes, they will be found exceedingly valuable.

The various types of errors made in division in the city referred to were summarized by the superintendent and his teachers as follows:

1. Ignorance of multiplication tables, 30 per cent. Illustration: 8)5856, quotient given as 8,107.
2. Using dividend as a whole, 14 per cent. Illustration: 3)39, quotient given as 12-3.
3. Confusion of multiplication and division, 14 per cent. Illustration: 3)39, quotient given as 93.
4. Remainder, 10 per cent. Illustration: 2)13 (the pupil’s quotient is illegible in this copy).
5. Confusion of signs, 7 per cent. Illustration: 2 ÷ 2 = 4.
6. Form of example strange, 5 per cent. Illustration: ¼ of 128.
7. Carrying (either forgetting to carry or ignorance of what should be carried), 5 per cent. Illustration: 2)13.50, quotient given as 620.
8. Value of 0, 5 per cent. Illustration: 9)0, quotient given as 9; 1)1, quotient given as 0.
9. Confusion of addition and multiplication, 5 per cent. Illustration: 3)6, quotient given as 3.
10. Confusion of dividend and divisor, 2 per cent. Illustration: 8)498, quotient given as 212. (This quotient is explained as follows: 4 into 8 = 2, 8 into 9 = 1 and 1 over, 8 into 18 = 2 and 2 over.)
11. Using some figure in dividend twice, 2 per cent. Illustration: 8)5,856, quotient given as 7,107.
12. Transposing answer, 1 per cent. Illustration: ¼ of 128 = 23.

The teacher with her particular grade should proceed in a similar manner, taking up each fundamental process and discovering the types of errors made. It will be well to note, not only the errors made, but after each, the names of the pupils making that particular mistake, in order that she may give special attention to all of the pupils making a particular mistake. Suppose, for example, that one of the teachers in the above city, on a certain day, desires to work upon the fourth one of the listed errors, namely inability to handle the remainder. In an intermediate class of 35 pupils, she may have four who need help on this point. By referring to her paper she will be able to call the names of the four who need special instruction. So with each particular mistake, she will be able to call for the pupils who need help, permitting others in the class to spend their time in some other way.

The superintendent and his teachers in the city referred to noted that long division was difficult for the pupils, and so made a special summary of the errors in long division, as follows:

1. The assumption that the first integer of the divisor may be used always as a trial divisor.
2. The trial-and-error method of finding quotient.
3. Ignorance of multiplication tables.
4. Carrying wrong number when multiplying.
5. Borrowing in subtraction.
6. Ignorance of value of cipher.
7. Forgetting to place integers in quotient.

This is a good illustration of the diagnostic use of tests as a basis for remedial instruction. Such use of tests makes them of direct service in the work of helping pupils, and this is a use that must in future receive more and more attention.

The Woody tests are being quite extensively used.
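The practice recommended above, of noting after each type of error the names of the pupils who made it, amounts to grouping a list of observations. A minimal sketch in Python follows; the pupil names, the error labels, and the function names are hypothetical and serve only to illustrate the bookkeeping, not the superintendent's own procedure.

```python
from collections import Counter, defaultdict


def pupils_by_error(records):
    """Map each error type to the pupils who made it, in first-seen order."""
    groups = defaultdict(list)
    for pupil, error in records:
        if pupil not in groups[error]:
            groups[error].append(pupil)
    return dict(groups)


def error_shares(records):
    """Per cent of all recorded errors falling under each type."""
    counts = Counter(error for _, error in records)
    total = sum(counts.values())
    return {error: 100.0 * n / total for error, n in counts.items()}


# Hypothetical records: (pupil, type of error noted on the paper).
records = [
    ("Anna", "remainder"),
    ("Carl", "multiplication tables"),
    ("Anna", "multiplication tables"),
    ("Emma", "remainder"),
    ("Carl", "multiplication tables"),
]
groups = pupils_by_error(records)
print(groups["remainder"])    # the pupils to call for help on remainders
print(error_shares(records))  # share of each error type, in per cent
```

The first result gives the list of names to call when a given difficulty is taken up; the second gives a percentage summary of the same kind as the one reported for the city.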
During the year 1917-1918, 300,000 copies of the tests were used by school men in the United States. This extensive use is gradually developing standards of performance. Instead of giving standards alone, it will be more helpful to list the returns from a number of cities. Accordingly there are given in Table 24 the median scores secured in the use of the Woody scales in twenty Wisconsin cities, as well as the Woody standard medians.

Table 24. Median Scores by Cities

Columns: Wisconsin city; date tested; section tested; Addition, Grades III to VIII; Subtraction, Grades III to VIII. A bar separates the two processes in rows having all twelve entries; rows with fewer entries are given in printed order, and empty cells are not shown.

1 | 10/3/16 | B | 12.4 19.3 21.2 27.4 30.3 31.9 | 10.7 16.3 20.3 24.4 27.7 30.8
2 | 10/10/16 | B | 12.7 17.8 23.1 27.0 30.3 32.0 | 11.9 17.3 22.3 27.4 29.4 30.5
3 | 10/24/16 | B | 12.6 17.7 23.7 28.8 30.6 31.6 | 13.8 13.3 21.9 27.8 26.8 31.5
4 | 2/6/17 | A&B | 15.5 21.4 22.7 30.7 33.3 34.3 | 13.6 18.3 20.7 25.7 27.6 30.6
5 | 2/12/17 | A&B | 20.7 28.3 32.1 32.8 18.5 19.9 25.0 27.6
6 | 11/27/16 | B | 15.3 20.7 22.1 28.0 31.4 33.4 | 14.9 18.5 21.3 28.5 30.2 32.8
7 | 12/5/16 | B | 13.0 20.1 20.3 24.2 27.4 26.8 | 11.1 18.1 19.9 20.7 27.4 26.7
8 | 1/9/17 | B | 19.2 20.7 26.9 30.9 31.9 32.9 | 15.7 20.0 22.7 26.9 30.6 32.6
9 | 1/12/17 | B | 14.8 19.8 25.3 27.0 28.2 32.7
10 | 12/5/16 | A&B | 15.5 20.9 27.0 30.0 11.3 17.3 2g.
11 | 3/27/17 | A&B | 16.6 19.5 23.3 29.3 32.4 33.2 | 13.4 18.4 20.5 25.2 28.7 29.2
12 | 2/26/17 | A | 14.4 17.5 20.6 23.8 31.2 33.1 12.6 17.9 20.1 24.2 27.5
13 | 3/8/17 | A | 12.5 19.0 20.5 26.8 30.8 32.5
14 |
15 | 5/8/17 | A&B | 8.7 9.8 17.7 18.4 24.7 28.2
16 | 4/10/17 | A | 17.8 22.0 23.9 29.4 30.5 30.2
17 | 4/12/17 | A | 12.0 19.2
18 | 6/4/17 | A&B | 12.2 18.2 21.2 27.3
19 | 6/6/17 | A&B | 21.8 26.4 28.6 32.0 31.8
20 | 5/ /17 | A | 19.0 22.7 26.8 30.8 33.9 34.8 | 15.0 20.3 24.0 28.8 32.9 33.0
Median | 15.5 20.2 22.7 28.4 31.9 33.1 | 13.3 18.1 20.8 25.6 28.4 30.3
Woody’s Standard Median | 14.5 18.3 23.1 29.8 32.4 34.0 | 11.2 15.7 20.4 25.0 28.5 31.7

Table 24 (Continued)

Columns: Wisconsin city; date tested; section tested; Multiplication, Grades III to VIII; Division, Grades III to VIII.

1 | 10/3/16 | B | 2.7 13.7 19.8 26.9 30.2 32.8 | 3.1 9.9 19.2 25.4 27.3 29.9
2 | 10/10/16 | B | 12.4 21.4 29.5 29.7 32.9 11.9 19.9 26.0 29.0 30.0
3 | 10/24/16 | B | 4.3 11.7 18.8 27.2 29.3 32.3 | 5.6 5.5 17.8 25.2 28.2 30.2
4 | 2/5/17 | A&B | 6.2 14.3 18.8 28.5 30.5 33.6 | 8.0 13.1 19.4 25.9 28.5 30.7
5 | 2/12/17 | A&B | 18.9 27.1
6 | 11/27/16 | B | 6.6 17.8 18.9 28.4 31.0 34.0 13.3 20.5 27.6 31.0 29.9
7 | 12/5/16 | B | 8.0 12.8 16.1 18.6 27.0 29.6 10.8 10.8 17.6 26.5 25.5
8 | 1/9/17 | B | 8.5 15.8 18.9 28.3 32.1 34.1 12.8 18.1 24.3 28.8 31.1
9 | 1/12/17 | B | 30.4 14.3 16.0 25.4 24.6 27.7
10 | 12/5/16 | A&B | 5.1 18.5 26.3 10.4 18.5 22.1 22.5
11 | 3/27/17 | A&B | 7.8 16.4 19.2 27.3 32.3 32.4 11.6 16.5 23.2 27.5 28.2
12 | 2/26/17 | A | 15.5 18.4 24.5 30.4 31.7 11.8 16.9 20.2 29.0 28.7
13 | 3/8/17 | A | 11.6 15.2 21.5 23.5 25.2
14 | 5/4/17 | A&B | 20.5 23.5 30.5 32.4 35.0
15 | 5/8/17 | A&B | 8.7 9.8 17.7 18.4 24.7 28.2
16 | 4/12/17 | A | 22.5 24.5 23.5 23.3
17 | 6/4/17 | A&B | 14.8 25.0 29.5 29.4
18 | 6/6/17 | A&B | 11.2 14.2 26.5 29.1 33.9
19 | 5/ /17 | A | 15.3 20.6 23.2 29.6 34.3 37.2 13.9 18.9 24.2 29.5 31.6
20 | 5/29/17 | A | 18.5 24.7 28.3 31.2 32.8
Median | 6.8 15.2 19.2 27.3 30.9 33.2 | 7.6 13.5 19.6 25.1 28.4 30.0
Woody’s Standard Median | 4.7 11.1 18.3 26.1 30.6 32.9 | 5.8 9.9 16.5 23.8 27.4 30.1

Boston Research Tests in Fractions

These tests are not generally available and are given in this connection to suggest to the teacher the possibility of becoming keen and active in the work of discovering pupils’ errors. The Boston test in addition of fractions consists of six simple tests of four problems each, each test having a two-minute time limit.
They cover the various types of problems in the addition of fractions, and they increase in difficulty from the first example, in which the denominators are the same, up to the last, in which the common denominator can be determined only with difficulty by inspection. The test follows herewith:

Addition of Fractions
Showing Examples Used in Tests in Addition of Fractions, December, 1915

Tests 1 to 6, four examples each. Time for each test, 2 minutes.
[The fractions in the printed examples are illegible in this copy.]

The directions for scoring the test are not available, and without such directions definite comparison cannot be made. It seems worth while, however, to indicate the city medians for Boston, and these are summarized herewith in Table 25.

Table 25. Summary Sheet: City Medians (Boston), Addition of Fractions, December, 1915
[Table body illegible in this copy. Column headings: Pupils Tested; Speed Median (Number) and Accuracy Median (%) for each test.]

These tests as given in the Boston schools proved especially helpful in the work of analyzing the difficulties of pupils and devising drills to raise the efficiency of the children. This was evidenced by the increase in both speed and accuracy in tests given during the following spring to selected sixth grades.
The teacher should find these tests in addition of fractions very useful, and she can make comparisons on the basis of her own rules for scoring.

The Boston tests in the subtraction of fractions will be equally suggestive and helpful to the teacher who is attempting to analyze the types of problems and the difficulties encountered by children in the solution of problems in subtraction of fractions. The tests as given follow herewith.

Subtraction of Fractions
Showing Examples Used in Tests in Subtraction of Fractions, December, 1916

Tests 1 to 5, four examples each. Time for each test, 2 minutes.
[The fractions in the printed examples are illegible in this copy.]

These tests increase in difficulty from the first to the fifth test, and some of the examples in tests 4 and 5 are as difficult as any likely to appear in actual social and business practice. Instructions for the scoring of these tests are not at hand, but nevertheless the summary of the Boston medians is given herewith in Table 26.

Table 26. Summary Sheet: City Medians (Boston), Subtraction of Fractions, December, 1916
[Table body largely illegible in this copy. Rows: Grades VIII, VII, VI. Column headings: Pupils; Speed Median (Number) and Accuracy Median (%) for each test. Legible entries: one speed-median column reads 18.0, 14.2, 11.9; one accuracy-median column reads 81.0, 66.0, 64.0.]

Tests were also devised in the multiplication and division of fractions.
Some of these tests are quite difficult, and yet for diagnostic purposes they will show the ability of children to multiply or divide fractions and mixed numbers. The tests are indicated herewith, and city medians are summarized in Table 27.

Multiplication and Division of Fractions
Showing Examples Used in Tests in Multiplication and Division of Fractions, December, 1917

Multiplication of Fractions. Test 1. Time, 2 minutes. Four examples.
Multiplication of Fractions. Test 2. Time, 4 minutes. Five examples.
Multiplication of Fractions. Test 3. Time, 2 minutes. Four examples.
Multiplication of Fractions. Test 4. Time, 5 minutes. Five examples.
Division of Fractions. Test 5. Time, 2 minutes. Four examples.
Division of Fractions. Test 6. Time, 4 minutes. Four examples.
Division of Fractions. Test 7. Time, 3 minutes. Four examples.
[The fractions and mixed numbers in the printed examples are illegible in this copy.]

It will be observed, from Table 27, that the scores in tests 2, 4, and 6 are very low, indicating that they were not well chosen. Referring to these particular tests, Dr. Ballou of the Boston bureau says:

“It is probably true that there is no great use for the type of work shown in these three tests in practical life, but the business world does require it to some extent; business courses in our high schools require the processes, and the new course of study requires this work. In view of these three conditions, it was thought best to include these three tests in order that we might have some facts on which to base the development of our work in multiplication and division of fractions.”

Table 27. Summary Sheet: City Medians (Boston), Multiplication and Division of Fractions
[Table 27 body illegible in this copy. Rows: Grades VIII, VII, VI. Column headings: Pupils Tested; Speed Median (Number) and Accuracy Median (%) for Multiplication Tests 1 to 4 and Division Tests 5 to 7.]

[Sample algebra test items, numbered 21 to 23, illegible in this copy.]

There is no doubt that any of these tests in algebra will prove of value. They have been standardized, they permit comparison, they will be valuable for research purposes, and, when used by teachers for promotion purposes, they have the advantage of avoiding the unusually difficult problems often used by teachers in final examinations. In other words, they are more reasonable than tests usually given by teachers. Teachers frequently have erroneous ideas about the promotion of children; some even think it to their credit to fail a large number of pupils. If a pupil can pass simple tests such as the Monroe tests, he should be permitted to go forward with advanced work. While the Monroe tests cover only the simple fundamental processes of first year algebra, the Rugg and Clark tests cover the entire field of secondary algebra. The tests will distribute pupils so as to show a teacher that she is instructing a group of pupils who differ widely in ability and need help and drill on widely varying details. The wise teacher of algebra will keep for every pupil a card showing his mistakes or weaknesses, such as mistakes in sign, errors in copying, errors in factoring, etc. The standard tests will further aid in locating pupils’ weaknesses. Every teacher of mathematics in a high school and every superintendent should become familiar with at least one of the available tests in algebra.

Geometry.
The Stockard and Bell test in geometry (see Bibliography at close of chapter) consists of 70 questions arranged in 20 groups. In devising the test “the attempt was made to call for information that is to be found in all standard textbooks; to test for important and fundamental principles of geometry; to provide such a range of questions as to be representative of the whole field of elementary geometry, and to include memory facts, knowledge of content, organization of subject matter, and particularly ability to do originals; and to confine the list to such dimensions that every question could be tried by the average high school pupil in 40 minutes.”

The 20 groups “involve drawing figures, naming figures, indicating order of development in demonstrations, completing statements, stating the converse, definitions, regular polygons, parts of a demonstration, angular relations, area of a trapezoid, angles in polygons, angles in circles, congruency, similarity of triangles, loci, auxiliary lines, simple constructions, ratio and proportion, algebraic expression of geometric relations, and equivalent construction.”

It is evident that the authors have attempted to measure quite fully the student’s mastery of the subject matter of elementary geometry. The test was given to 372 school students who had completed a year’s work in geometry. About one third of the pupils tested were able to attempt all of the questions. On the basis of the tests given, the different questions are rated. The authors think the test not very practical for general high school use. It is too lengthy and, on the whole, a little too difficult. Teachers may use it, however, for diagnostic purposes or purposes of research.

Diagnostic Tests in Mathematics. Six tests have been selected by Agnes L. Rogers (see Bibliography at close of chapter). The tests, together with the time for explaining and giving the same, are as follows:

(1) Algebraic Computation . . . . 12 minutes
(2) Interpolation
(3) Geometry . . . . 40 minutes
(4) Superposition . . . . 7 minutes
(5) Mixed Relations . . . . 8 minutes
(6) Trabue Scales, L and J
Total . . . . 97 minutes

The total time required, 97 minutes, is just a little more than 2 regular high school periods. Yet in that brief time these tests enable a competent teacher to diagnose the mathematical ability of ninth grade pupils with a view to improvement in the classification of students in the high school, by eliminating from the mathematics classes those unfit for further mathematical training and selecting those capable of progressing at a more rapid rate than the majority. The tests also serve to discover particular lines of mathematical weakness. They may be given in the seventh and eighth grades, but, when so given, the time limits must be considerably extended.

These tests are “designed to measure the more important phases of mathematical capacity demanded by high school mathematics, and, in particular, the ability to manipulate numerical and algebraic symbols, the ability to grasp and handle spacial relations, and the ability to deal effectively with words. They are of such a nature as to enable an intelligent teacher to form an independent estimate of the pupil’s mathematical capacity and likelihood of success in further lines of mathematical work. They measure original ability rather than effect of training.”

Miss Rogers has given directions for giving the tests and evaluating the scores, and has fixed tentative standards. She says, “As tentative standards, we suggest:
(1) Where a pupil’s score is greater than 150, he has capacity to progress at a more rapid rate than the ordinary high school student.
(2) Where a pupil’s score is less than -150, he shows incapacity to progress in mathematics at the rate of the ordinary high school student and, other things being equal, should be released from further training in the subject.”

The group coming between -150 and +150 is considered the normal group in high school mathematics.

Henmon’s Latin Tests. Prof. V. A. C. Henmon of the University of Wisconsin has developed a series of vocabulary tests, A, B, C, and D, which are of equal difficulty and in which the words are arranged in order of difficulty. All of the words in these tests have been carefully evaluated. A series of sentence tests is also available, consisting of tests 1 and 2, of equal difficulty, and test 3, in which the sentences are all of approximately the same difficulty. Standard scores are given for both the vocabulary and the sentence tests. One test is required for each pupil.

The Henmon tests in Latin have several advantages over ordinary tests given by teachers. In the first place, they are scientifically constructed, and they are based upon vocabulary which is common to Caesar, Cicero, Virgil, and 13 of the most frequently used first year Latin texts. In the second place, the tests are thoroughly standardized, making possible accurate grading and comparison with other schools, with other classes, or among pupils of the same class. In the third place, such tests are helpful in the study and analysis of class work. This is well illustrated in the overlapping of abilities in successive years as shown by these tests. In test A the medians for the first year range, for different classes, from 1 to 11, the median for all first year classes being 4. In the second year the class medians range from 4 to 19, the median for the year being 7. In the third year the class medians range from 3 to 25, the median for the year being 20.
If corresponding medians for various schools show such wide variation, the individual scores must evidently show very much greater range. The overlapping of abilities in Latin is thus seen to be comparable to the overlapping of abilities in other subjects. The administering of standard tests of this kind should discover the pupils of exceptional interest and ability on the one hand, and pupils on the other hand who, through lack of ability or interests, do so poorly that it is useless to have them continue the study.

In so far as Latin is a tool subject, standard tests are applicable. In so far as the subject is of interest chiefly because of other values, to that extent the teacher should be cautious in using standard tests or should use them only for her own enlightenment, being careful that they do not formalize her work. This will mean that the results of the tests are in general not brought to the attention of the pupils.

Physics. A physics test has been devised by Professor Daniel Starch, University of Wisconsin. It consists of 75 mutilated sentences. They cover the 102 facts, principles, and laws of physics which the author has determined upon as the most essential. The basis for the determination was an examination of 5 widely used textbooks. The tests are easily administered. The value of the tests, however, has not been demonstrated. It is doubtful if physics can be reduced to facts in such formal fashion as this test would suggest.

Commercial Tests. In the volume, “Commercial Tests and How to Use Them,” Sherwin Cody has brought together a summary of the use of tests in determining the relative standing of students graduating from the commercial departments of high schools. Tests are available, covering the following subjects:

(1) Tabulating, mental alertness.
(2) Reproducing instructions, designed to test memory and natural industry.
(3) Invoicing.
(4) Fundamentals of arithmetic, an adaptation of the Courtis tests.
(5) Business arithmetic, including fractions, trade extensions, and percentage.
(6) English, including spelling, elementary language, advanced language, elementary punctuation, and advanced punctuation.
(7) Letter writing.
(8) Answering letters.
(9) Stenographic tests covering transcribing and typewriter copying.
(10) Copying for the mimeograph.
(11) Addressing envelopes with the pen and filing.

For each of the above lines of testing there are duplicate tests, full directions for administering the tests, keys for grading, and tentative standards. The need of such commercial tests is evident to those who have attempted to select stenographic help or to evaluate the products of various commercial schools. The test in transcribing, for instance, is quite definite. The student is expected to transcribe a standard business letter of 300 words in 5 minutes. This means 60 words per minute. In many high schools, where there is no particular standard, students are permitted to graduate with a speed in transcribing of only 30 words per minute or even less. In like manner, the student can be accurately checked on ability to file, answer a letter, spell, or use the fundamental processes in arithmetic. One of the main values of Mr. Cody’s work is to suggest standards for commercial work, and all admit that this is a type of work which can be standardized.

Sackett’s Ancient History Scale. Some will doubt the value of this test because it attempts to reduce a thought subject to a mechanical basis. Professor Sackett has at least shown the difficulty of formulating an ancient history scale. His work is handled on an approved scientific basis, but if the premises are faulty the conclusion, of course, cannot be other than erroneous.
What we want our students to get from ancient history is not a memory mastery of the facts, but, instead, an appreciation of the problems and the development of a method for the solution, not only of the problems of ancient history, but of present-day history.

The scale (so called) consists of eight tests, each containing ten points. Test No. 1, which is typical, is as follows:

For what are the following men noted?
(1) Hannibal
(2) Khufu or Cheops
(3) Demosthenes
(4) Darius
(5) Solon
(6) Charlemagne
(7) Attila
(8) Constantine
(9) Mithridates
(10) Justinian

It is evident that this is a fact testing scale. Its general acceptance would quite surely be detrimental to the proper teaching of history. A student might have all knowledge as tested by this scale, and still be entirely lacking in the spirit and method of history. The test should be used with extreme caution in order that it may not formalize the work. The caution should be even greater than in the case of the Latin tests referred to above, for history is in no sense a tool subject. The test should be used only for study and research purposes. The following illustration will make clear the meaning of research as here used:

A teacher who had been teaching ancient history by outline, endeavoring to master all of the facts in the book, was paired with another teacher who had the modern notion of motivation and problem organization of material, and the further notion that history was of value in so far as it had a bearing on the solution of particular problems at the present day. These teachers were encouraged to teach the same subject in accordance with their respective ideas. At the end of the course comparison was made on the basis of the questions used by the first teacher in former examinations. The results were entirely in favor of the second teacher.
Possibly a fairer test and one even more favorable to the second teacher would have been a scientifically constructed standardized test. On the other hand, if such standardized tests had been known to the second teacher in the beginning, it would doubtless have resulted in very much poorer work on her part because her work would have been formalized and a large part of the life taken out of it. At any rate, there is the possibility of real harm in using a formal test on a thought subject.

BIBLIOGRAPHY

1. “Standardized Tests in First Year Algebra,” devised by H. O. Rugg and J. R. Clark. Copies may be obtained from the School of Education, University of Chicago. References: Rugg, H. O., “The Experimental Determination of Standards in First Year Algebra,” School Review, 24: 37-66, January, 1916. Rugg, H. O., and Clark, J. R., “Standardized Tests and the Improvement of Teaching in First-Year Algebra,” School Review, 25: 113-132 and 196-213, February and March, 1917.
2. “First Year Algebra Scales,” by Henry G. Hotz. Copies may be obtained through the Bureau of Publications, Teachers College, Columbia University. Reference: Hotz, Henry G., “First-Year Algebra Scales,” Teachers College, Columbia University, Contributions to Education, No. 90.
3. “Standard Research Tests in Algebra,” devised by Walter S. Monroe. Copies may be obtained from the Bureau of Educational Measurements and Standards, Emporia, Kansas. Reference: Monroe, Walter S., “A Test of the Attainment of First-Year High-School Students in Algebra,” School Review, 23: 159-171, March, 1915.
4. “Coleman’s Scale for Testing Ability in Algebra.” Address, W. H. Coleman, Crawford, Nebraska. Price, $.50 per package of fifty. Package contains one copy of “Instructions to Teachers.”
5. Stockard, L. V., and Bell, J. C., “A Preliminary Study of the Measurement of Abilities in Geometry,” Journal of Educational Psychology, 7: 567-580, December, 1916.
6. “Geometry Tests,” a series of form tests, by J. H. Minnick. Available at $.02 per copy. University of Pennsylvania, Philadelphia, Pennsylvania. References: Minnick, J. H., “A Scale for Measuring Pupils’ Ability to Demonstrate Geometrical Theorems,” School Review, 27: 101-109, February, 1919. Minnick, J. H., “Certain Abilities Fundamental to the Study of Geometry,” Journal of Educational Psychology, 9: 83-90, February, 1918.
7. Rogers, Agnes L., “Experimental Tests of Mathematical Ability and their Prognostic Value,” Teachers College, Columbia University, Contributions to Education, No. 89.
8. Henmon, V. A. C., “The Measurement of Ability in Latin,” Journal of Educational Psychology, 8: 515-538 and 589-599. Copies of the tests may be secured from V. A. C. Henmon, University of Wisconsin, Madison, Wisconsin, at $.01 per test.
9. “Brown’s Latin Tests.” For copies address President H. A. Brown, State Normal School, Oshkosh, Wisconsin. Six tests are available, or promised, as follows: (1) Connected-Latin test, (2) Latin-Sentence test, (3) Formal-Latin-Vocabulary test, (4) Functional-Latin-Vocabulary test, (5) Formal-Latin-Grammar test, (6) Functional-Latin-Grammar test.
10. Starch, Daniel, “Educational Measurements,” The Macmillan Company. Tests in Latin, German, French, and Physics. For copies address Professor Daniel Starch, University of Wisconsin, Madison, Wisconsin.
11. Chapman, J. Crosby, “The Measurement of Physics Information,” School Review, 27: 748-756. (This is a fact or information test.)
12. “Uniform Science Tests in Physics,” by Franklin T. Jones, University School, Cleveland, Ohio. Reference: School Review, 26: 341-348, May, 1918. The subjects covered are: thermometers, fusion, vaporization, specific heat, heat exchange.
13. Chemistry Scales. References: Bell, J. Carleton, “A Test in First-Year Chemistry,” Journal of Educational Psychology, 9: 199-209, April, 1918.
Webb, Hanor A., “A Preliminary Test in Chemistry,” Journal of Educational Psychology, 10: 36-43, January, 1919.
14. Cody, Sherwin, “Commercial Tests and How to Use Them,” World Book Company, Yonkers-on-Hudson, New York.
15. Sackett, L. W., “A Scale in Ancient History,” Journal of Educational Psychology, 8: 284-293, May, 1917.
16. Rugg, H. O., “A Scale for Measuring Freehand Lettering for Use in the Secondary Schools and Colleges.” Address, H. O. Rugg, School of Education, University of Chicago, Chicago, Illinois. Price, $.25 a copy. Reference: Rugg, H. O., “A Scale for Measuring Freehand Lettering,” Journal of Educational Psychology, 6: 25-42, January, 1915.
17. “Tests in Home Economics,” Supplementary Educational Monographs, No. 6 of Vol. 2, University of Chicago Press.
18. Murdock, Katharine, “The Measurement of Certain Elements of Hand Sewing.” (Sewing scale is included.) Teachers College, Columbia University, Contributions to Education, No. 103.
19. “A Brief Bibliography of Tests in High School Subjects,” School Review, 27: 799-809, December, 1919. Covers tests in Latin, Mathematics, Science, and Home Economics.
20. “Kansas Silent Reading Test No. 3.” Ability of high school students to read silently. Address, Bureau of Efficiency and Measurement, Emporia, Kansas.
21. “Standardized Tests in Silent Reading No. 3.” For high school students. Address, Bureau of Efficiency and Measurement, Emporia, Kansas.

CHAPTER X

THE MEASUREMENT OF GENERAL INTELLIGENCE

“This is an exceptionally slow class,” and “I have such a large number of children this year who are very slow,” are expressions which the supervisor frequently hears as he passes from one classroom to another. These statements are based on actual facts in that they describe the condition of many children who are not making progress. The problem is a very real one to the teacher.
It is also not infrequent to hear a teacher say, “I have an exceptionally bright class this year,” or “I have 4 or 5 children who are far ahead of the others.” It is unusual, however, to hear such statements followed with the remark, “I think certain children in this grade ought to be advanced to another grade.” The latter situation should become more common.

In either case the problem is one of knowing the child’s mental age or general intelligence. He has been assigned to a certain grade chiefly on the basis of his chronological age or the number of years in school. He has been asked to do a certain type of work because children of his age who are normal are supposed to be able to do it. It may be that such children are being held back when they should be advanced, or the subject matter which is being presented in the regular class is not suited to them, and they are, therefore, in need of a different kind of subject matter presented in a special class.

Sufficient information is available to show that general intelligence tests can be used to determine the mental ages of children, and that this information can be used as a basis for reclassification of the pupils. In a city school system in which the Binet-Simon (old form) Test was being used, it was found that a great many children who were too old chronologically were also too old mentally for their grade. For example, when children enter the first grade at the age of six, they are, according to correct practice, of normal age chronologically if they are 6 or 7 years old in the first grade, 7 or 8 years old in the second grade, etc. If a child in the first grade is 8 years old, he is 1 year too old chronologically. If he tests 8 years old mentally he is also 1 year too old mentally. He should, therefore, be in the second or third grade, provided his school life has been normal. If not, he should receive the special attention that will place him with his appropriate group as soon as possible.
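The age-grade rule in the example above can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the function names are invented here, the rule is the one given in the text (ages 6 and 7 normal for the first grade, 7 and 8 for the second, and so on), and an eight-grade school is assumed.

```python
# Illustrative sketch only; function names and the eight-grade range are assumptions.

def normal_age_range(grade):
    """Return the (youngest, oldest) normal ages for a grade.

    Follows the rule in the text: 6 or 7 in grade 1, 7 or 8 in grade 2, etc.
    """
    return grade + 5, grade + 6


def years_too_old(age, grade):
    """Years by which an age (chronological or mental) exceeds the normal range."""
    _, oldest = normal_age_range(grade)
    return max(0, age - oldest)


def grades_normal_for(age, highest_grade=8):
    """List the grades in which the given age counts as normal."""
    return [
        grade
        for grade in range(1, highest_grade + 1)
        if normal_age_range(grade)[0] <= age <= normal_age_range(grade)[1]
    ]


# The child in the text: first grade, 8 years old, testing 8 years mentally.
grade, chronological_age, mental_age = 1, 8, 8
print(years_too_old(chronological_age, grade))  # 1
print(years_too_old(mental_age, grade))         # 1
print(grades_normal_for(mental_age))            # [2, 3]
```

The last line agrees with the statement that such a child belongs in the second or third grade; whether he is actually moved still depends, as the text says, on whether his school life has been normal.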
In the school system referred to above, out of less than 500 children tested in 4 schools, 60 such children were found. All of these children were advanced one grade and all were able to continue with a good class standing in the grade to which they were promoted.

In a small Iowa city 39 children out of a total of 177 children in 4 schoolrooms were advanced in the fall of 1915, on a basis of a good class standing and being too old for the grade in which they were working. These children were selected from grades 5, 6, 7, and 8. Eleven of them were advanced one half of a grade, 28 were advanced a full grade. At the end of the first semester, February, 1916, not a single pupil failed in gaining promotion, and only 3 received a class standing of less than 85%.

The fact that such a large number of children in a school system can be advanced to advantage on a very general measurement, and that so little material is available showing the results of such practice, is indicative that a general intelligence test which will measure in very definite terms the general intelligence of large groups of children is greatly needed, and can be used to prevent waste and unsatisfactory results in classroom practice.

General intelligence tests have also been used for a number of years to select children who are subnormal to be placed in special classes in which they will receive a different type of instruction. Before such tests were available these children were compelled to try to do the same work as the normal child, which resulted in failure. After numerous repetitions had occurred it was realized that they could not advance. They were, therefore, assigned to some other class in which they were permitted to do the thing they were able to do, but only after the loss of much time and energy to both pupil and teacher.
With the aid of a general intelligence test these children can be located early in their school classes, thus reducing greatly the amount of waste. This tendency is clearly illustrated by Figures 13 and 14, which show the grades from which the children were selected to supply the special classes for subnormal children in a city of over 100,000 after these classes had been in operation two years.

[Fig. 13. Grades from Which 170 Children Were Taken for Mental Tests, 1917-1918. Axes: Grade; No. Cases.]

[Fig. 14. Grades from Which 333 Children Were Taken for Mental Tests, 1918-1919. Axes: Grades; No. Cases.]

The error which has brought about the necessity for reclassification in the regular classes or in the special classes has resulted from an apparent assumption that children of a certain age group are of the same mental age. This principle is seen in the practice of entering 6-year-old children into the first grade. At the end of one school year or at the age of 7 years they are supposed to be ready for the second grade, and so forth. That all children of the same age are not of the same mental age is apparent from the following graph showing the mental ages of 159 unselected children, all of chronological age 9.

[Fig. 15. Distribution According to Their Mental Ages of 159 Unselected Children (9 Years of Age), Binet-Simon Scales. Axes: Mental Ages; Number of Pupils.]

These 159 children make up the entire group of children 9 years of age in a group of 743 children in 4 elementary schools who were tested with the Binet-Simon Test. Instead of all having a mental age of 9 years, only 48 showed exactly this age; 21 children had the mental age of a normal child of 10 years, and 8 children the mental age of a normal child of 11 years, while 64 children showed the mental age of a normal child of 8 years, 17 children a mental age of 7 years, and 1 child a mental age of 6 years.
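The counts just given can be tallied in a few lines. The sketch below uses only the figures reported in the text; the variable names are illustrative.

```python
# Mental-age counts for the 159 nine-year-olds, as reported in the text.
counts = {6: 1, 7: 17, 8: 64, 9: 48, 10: 21, 11: 8}
CHRONOLOGICAL_AGE = 9

total = sum(counts.values())
below = sum(n for age, n in counts.items() if age < CHRONOLOGICAL_AGE)
at_age = counts[CHRONOLOGICAL_AGE]
above = sum(n for age, n in counts.items() if age > CHRONOLOGICAL_AGE)


def percent(n):
    """Share of the whole group, in per cent, to one decimal place."""
    return round(100 * n / total, 1)


print(total)  # 159
for label, n in (("below", below), ("at", at_age), ("above", above)):
    print(label, n, percent(n))
```

The six counts add up to the 159 children reported: 82 tested below their chronological age, 48 at it, and 29 above it, so fewer than one third of the group had the mental age that their grade placement assumes.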
These figures show the same differences in abilities of children as are found in all unselected groups whenever a test that will distribute mental age is used.

Although much has been written about the wide differences in the abilities of children and sufficient evidence given to support such conclusions, yet classroom practice too often lags behind because of the slowness with which usable tests have been developed and placed in the hands of the teachers. The teacher finds it exceedingly difficult to see and to plan for the needs of the class as a whole. Likewise, the principal in his organization of classes is too much given to an organization on a basis of the needs of the group instead of the needs of the individuals in the group. Splendid attempts and good progress have been made in many places to recognize in a very definite way the individual difference among children, but in general, educational practice continues to handle them en masse on account of the lack of knowledge of group intelligence tests. A test in the hands of the teacher that will enable her to know the mental ages of the children in her class as she knows their chronological ages, will be a long stride toward classroom practice which handles children on a basis of their individual needs instead of group needs. Such tests are now available. The teacher can determine the mental ages of all the children in her class in the same time that was formerly required by an expert to determine the mental age of a single child.

The following group intelligence tests are now available: Trabue Language Scales, not devised originally to measure general intelligence, but found to do so with an accuracy that makes them very valuable; the Otis Group Intelligence Scale; Haggerty’s Intelligence Examinations, Delta 1 and Delta 2; and Whipple’s Group Tests for the Grammar Grades.

Trabue Language Scales

Aim. It has been found that the Trabue Language Scales will measure the general ability of pupils.

Description of Tests. The Trabue Language Scales consist of scales B, C, D, E, J, K, L, and M. Scales B, C, D, and E are practically equal to one another. They are intended to be used in pairs, B with C, and D with E, by teachers in determining the abilities of children between the ages of 7 and 20. Each scale is made up of 8 to 10 sentences with words omitted which are to be supplied. “The first sentence in each of these 4 scales is about 1 unit above an arbitrary zero point, the second sentence is approximately 1 unit more difficult than the first, and so on until the last sentence in each scale is about 11 units above zero.” Scales J and K are intended to measure the ability of adults, and are, therefore, of practically no value for public school purposes. Scales L and M, which are not equivalent to scales J and K, or to any other pair, are intended to measure the abilities of high school students. “They have no very easy sentences, and the differences between the sentences are relatively small.” The grouping into pairs of these scales of equal value provides a duplicate test for checking. Below is given a copy of Language Scale B to show the nature of these scales:

Name ____                          Write only one word on each blank.
Grade ____                         Time limit, seven minutes.
Age (on last birthday) ____

TRABUE LANGUAGE SCALE B

1. We like good boys ____ girls.
2. The ____ is barking at the cat.
3. The stars and the ____ will shine tonight.
4. Time ____ often more valuable ____ money.
5. The poor baby ____ as if it were sick.
6. She ____ if she will.
7. Brothers and sisters ____ always ____ to help ____ other and should ____ quarrel.
8. ____ weather usually ____ a good effect ____ one’s spirits.
9. It is very annoying to ____ tooth-ache, ____ often comes at the most ____ time imaginable.
10. To ____ friends is always ____ the ____ it takes.

Giving the Test. The process of giving the Trabue Language Scales is very simple.
A preliminary test is provided with simple sentences in order to make clear to the children exactly what they are expected to do. After this preliminary test has been given and explained fully, the children are ready for the regular test. Each child is provided with a scale on which he writes his name, grade, and age. On each sheet is the instruction, “Write only one word on each blank.” Attention should be called to the fact that he will be given a time limit (7 minutes for scales B, C, D, and E; 5 minutes for L and M) in which to do as much as he can. He will possibly not be able to fill all the blanks. As soon as the time limit has expired see that every child stops work and all papers are collected.

Scoring Results. For the convenience of the teacher and the accuracy of the results, the author has provided a detailed scheme for scoring the answers to the different sentences, which is as follows:

General Scheme

Score 2. “A score of 2 points is to be given each sentence completed perfectly. Errors in spelling, capitalization, and punctuation should not be allowed to affect the score.

Score 1. “A score of 1 is to be given each sentence completed with only a slight imperfection. A poorly chosen word or a common grammatical error, which makes the sentence less than perfect and yet leaves it with reasonably good sense, should serve to reduce the score from 2 to 1.

Score 0. “A score of 0 is to be given if the sentence as completed has its sense or construction badly distorted. A sentence must have reasonably good meaning and express a sentiment which might honestly be held by an intelligent person in order to receive a higher credit than zero.”

The following is a sample of the answers provided in this scheme, for the first three sentences of Scale B:

Language Scale B

1. We like good boys ____ girls.
   Score 2: and, an.
   Score 1: or, not, and good, also.
   Score 0: for, with, said the, and the.

2. The ____ is barking at the cat.
   Score 2: dog, hound, pup.
   Score 1: dogs, boy.
   Score 0: man, cat, god.

3. The stars and the ____ will shine to-night.
   Score 2: moon.
   Score 1: light, planets, lights.
   Score 0: dipper, stripes, clouds, city, sky, sun.

etc.

The score for each sentence should be recorded on the margin of the test. After all the test papers have been scored, the scores are transcribed to a Class Record Sheet. Below is given a copy of the Class Record Sheet provided by the author with the scores from a 4-A class in a city school system.

Table 49

Completion Test-Language Scales. Language Scale B. Date: Jan. 14, ’20. City D. State M. School F. Room No. 9. Grade 4-A. Teacher H. S. Test given by H. S. Number of pupils taking test, 35. Number of regularly enrolled pupils not taking this test, 0. Test began at 11:30, closed at 11:37. Time allowed, 7 min. Unusual conditions which might influence results of this test, none. Scores assigned by H. S.; recorded by H. S.

Class Record

Names of      Age        Score on Each of Sentences       Total
Boys        Yr.  Mo.    1  2  3  4  5  6  7  8  9  10     Score
S. L.       10    2     0  2  0  0  0  0  0  0  0  0        2
W. B.       11    3     0  2  2  0  0  2  0  0  0  0        6
A. A.       11    0     2  2  2  0  0  0  0  0  0  0        6
M. L.       10    8     2  2  2  0  0  0  0  0  0  0        6
W. K.        9    9     2  2  2  2  0  0  0  0  0  0        8
H. S.       11    0     2  2  2  0  2  0  0  0  0  0        8
R. D. C.    13    0     2  2  2  2  2  0  0  0  0  0       10
H. S.       12    8     2  2  2  2  2  0  0  0  0  0       10
A. B.       12    6     2  2  2  2  2  0  0  0  0  0       10
W. Q.       10    8     2  2  2  2  2  0  0  0  0  0       10
F. R.       10    0     2  2  2  2  2  1  0  0  0  0       11
R. S.       10    3     2  2  2  2  2  2  0  0  0  0       12
S. L.       10    4     2  2  2  2  0  1  1  2  0  0       12
J. F.       12    9     2  2  2  2  2  2  0  0  0  0       12
M. B.        9    7     2  2  2  2  2  2  0  0  0  0       12
W. A.       14    4     2  2  2  2  2  2  1  0  0  0       13
A. G.       10    0     2  2  2  2  2  2  2  1  0  0       15

Names of      Age        Score on Each of Sentences       Total
Girls       Yr.  Mo.    1  2  3  4  5  6  7  8  9  10     Score
L. G.        9    9     2  2  2  2  0  0  0  0  0  0        8
G. M.       10    6     0  2  2  2  2  0  0  0  0  0        8
M. D.       11    5     2  2  2  0  0  0  2  0  0  0        8
E. B.       12    0     2  2  2  2  0  0  0  0  0  0        8
S. H.       10    4     2  2  2  2  0  0  0  0  0  0        8
C. M.        9    8     2  2  2  2  0  1  0  0  0  0        9
L. B.       11   10     2  2  2  2  2  0  0  0  0  0       10
E. H.        9    3     2  2  2  2  2  0  0  0  0  0       10
E. Ha.      10    9     2  2  2  2  2  0  0  0  0  0       10
L. J.        9    8     2  2  2  2  2  0  0  0  0  0       10
P. D.       11    3     2  2  2  2  0  2  0  0  0  0       10
C. R.       10    0     2  2  2  2  2  0  1  0  0  0       11
R. A.       10    5     2  2  2  2  1  2  0  0  0  0       11
R. T.       10    1     2  2  2  0  2  1  2  0  0  0       11
S. C.        9    1     2  2  2  2  2  0  2  0  0  0       12
E. L.       10    3     2  2  2  2  1  2  1  0  0  0       12
G. P.       10    4     2  2  2  2  1  2  2  1  1  0       15
L. J.       10    8     2  2  2  2  2  2  2  2  0  0       16

This record is read as follows: S. L., age 10 years, 2 months, made a score of 2 on the second sentence and a score of zero on all the others, which gives him a score of 2; W. B., age 11 years, 3 months, made a score of 2 on sentences 2, 3, and 6 and a score of zero on all the others, making a total score of 6; etc. These scores are then distributed and the class median is determined, which for this class is 10.6.

Interpreting and Using Results. After the class records have been made and the class scores determined, the next problem for the teacher is to interpret her results and apply them to her classroom practice. This can be explained best by reference to concrete situations.

The first point to be determined is the relation of the class score to any class standards. The class standards for the Trabue Language Scale as reported by the author are as follows:

Table 50. Standard Language Scale Scores

Scales B, C, D, E, F (median scores)

Grade      Full Grade    B Half    A Half
II             4.8         3.8       5.8
III            8.0         7.4       8.6
IV            10.0         9.6      10.4
V             11.4        11.1      11.6
VI            12.4        12.1      12.6
VII           13.4        13.1      13.6
VIII          14.4        14.1      14.6

High school, with tentative standards in Scales J & K and L & M

Class        Score     J & K     L & M
H. S. I       15.2       7.5       7.5
H. S. II      16.0       8.6       9.2
H. S. III     16.7       9.4      10.5
H. S. IV      17.4      10.0      11.5

The class score for the 4-A grade reported on the above class record sheet, Table 49, is 10.6. The standard score for this grade is 10.4. This class is, therefore, slightly above the standard.
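The class median of 10.6 can be reproduced from the total scores in Table 49. The sketch below is one way of doing so, not necessarily the computation the authors used: it assumes that a score of 10 is treated as covering the interval from 10 up to 11 and that the median is found by interpolating within the interval holding the middle case. The function name and the `width` parameter are illustrative.

```python
from collections import Counter

# Total scores of the 35 pupils in Table 49 (17 boys, then 18 girls).
boys = [2, 6, 6, 6, 8, 8, 10, 10, 10, 10, 11, 12, 12, 12, 12, 13, 15]
girls = [8, 8, 8, 8, 8, 9, 10, 10, 10, 10, 10, 11, 11, 11, 12, 12, 15, 16]
scores = boys + girls


def interpolated_median(values, width=1.0):
    """Median of a frequency distribution, interpolated within the interval
    that contains the middle case.

    ASSUMPTION: a score s stands for the interval from s to s + width.
    """
    freq = Counter(values)
    half = len(values) / 2
    below = 0  # cases counted in the intervals already passed
    for score in sorted(freq):
        if below + freq[score] >= half:
            return score + width * (half - below) / freq[score]
        below += freq[score]
    raise ValueError("no scores given")


class_median = interpolated_median(scores)
print(round(class_median, 1))   # 10.6
print(class_median > 10.4)      # True: above the 4-A standard
```

Twelve pupils scored below 10 and nine scored exactly 10, so the middle case (17.5 of 35) falls 5.5 cases into the nine cases at 10, which gives 10 + 5.5/9, or about 10.6.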
When the test has been given in other classes of the same city further comparisons can be made. Below are given the class scores from the 4-B grade in the same school and from the 4-A and 4-B grades in another school:

Table 51

School        4-B Grade    4-A Grade
1                9.6          10.6
2               10.2          11.8
Standards        9.6          10.4

It is seen, therefore, that the class reported in Table 49, although above the standard, is below the 4-A grade of the other school, which attained a score of 11.8. A further analysis shows that these four classes have made scores equal to or above the standards, and also that school number 2 scored considerably ahead of school number 1.

After the score for the entire class has been secured and interpreted the scores of individual pupils should be analyzed to ascertain whether or not all pupils are properly placed and are receiving proper instruction. For this purpose the class record sheet showing the score of each pupil should always be kept available for frequent reference by the teacher.

By referring to the record of the class in Table 49, it will be seen that the lowest scores made by any boy or girl are 2 and 8 respectively, and the highest 15 and 16 respectively. The lowest score made by any pupil is a score of 2 made by a boy, S. L., whose chronological age is 10 years, 2 months, and whose mental age as later determined by the Stanford Revision of the Binet-Simon Test is 8 years, 2 months, or an Intelligence Quotient of 80.3. The next lowest score is 6, which is also by a boy, W. B., whose chronological age is 11 years, 3 months, and whose mental age on the Binet-Simon Test is 9 years, 3 months, or an Intelligence Quotient of 82.2. The highest score made by any boy is 15, which was made by A. G., whose chronological age is 10 years, 0 months, and whose mental age is 10 years, 0 months, or an Intelligence Quotient of 100.0; and the highest score of any pupil is 16, which was made by a girl, L. J., whose chronological age is 10 years, 8 months, and whose mental age is 12 years, 1 month, or an Intelligence Quotient of 113.2.

It is evident, therefore, that some of these children should receive special attention. The pupils who made the low scores are not benefiting from the class instruction to the extent they should. They should be placed in a special class where more individual or perhaps a different kind of instruction can be given.

The question should likewise be raised in connection with the examination of any class, as to whether the pupils who make the highest scores should be advanced to another grade or be assigned to a faster group. In the group above it would seem that the girl, L. J., at least should be given such consideration.

With the aid of the Trabue Language Scales the teacher can quickly determine the general intelligence of the class as a whole and also the general intelligence of each pupil in her class. If there is a question of doubt about certain pupils, the results from the Trabue Language test can be checked by a more refined measurement, such as the Stanford Revision of the Binet-Simon Test. With such knowledge about the mental ability of her pupils the teacher can classify her pupils so that her instruction can be more effective.

Otis Group Intelligence Scale

This scale has been devised in response to a wide demand for a test which will determine the general mental ability of children in large groups. Since the ability to read is required to take this test, it is not applicable to persons with less than 3 or 4 years of schooling. Of this test Dr. Lewis M. Terman says: “With subjects of this much schooling, the Otis Scale probably comes as near testing raw ‘brain’ power as any system of tests yet devised.”

The Aim. The aim of this scale is to determine a pupil’s general mental ability.
It is expected that the Otis Scale will be used for school purposes, to classify, quickly and efficiently, large groups of children on a basis of their mental ages, in order to meet more adequately their individual needs, and that the Binet-Simon Test and others will be used to supplement this scale in cases which are in doubt, or which call for more refined measurements.

Description of the Test. The Otis Group Intelligence Scale is divided into two forms, A and B, which are different in substance but similar in structure. Each form is in a separate booklet. By this means the same group of children can be examined at different times without a knowledge of the tests affecting the results. The total point score for each is the same. Each form has ten tests, as follows:

Number      Test                     Time Limit
Test 1                               5 minutes
Test 2                               minutes
Test 3                               1½ minutes
Test 4      Proverbs                 6 minutes
Test 5      Arithmetic               6 minutes
Test 6      Geometric figures        6 minutes
Test 7      Analogies                3 minutes
Test 8      Similarities test        4 minutes
Test 9      Narrative completion     6 minutes
Test 10     Memory                   3 minutes

The scale can be used with children in grades 4 through the high school, and even with university students if desired.

Giving the Test. Any person who is able to teach can, after a little study, apply these tests with a sufficient degree of accuracy to insure satisfactory results. Before attempting to give the tests, however, the teacher should practice on the instructions given in the Manual of Directions, which should always be available. Each child must have a copy of the scale in booklet form. The instructions for each test are written at the top of the test, but divided from the test by a heavy black line. Too much care cannot be exercised in seeing that the children follow specifically the instructions as outlined.

Scoring Results. An examiner’s key on transparent paper is provided, which makes the scoring of the papers a very simple matter.
For the scoring of all tests except test 3, the check mark (✓) opposite each correct answer can be used. The sum of the number of checks will be the score of the individual on the test. In scoring test 3, a check should be placed after each correct answer and a cross after each incorrect answer only. No attention need be paid to omitted answers. The score will be the number of correct answers minus the number of incorrect answers; that is, “the number of checks minus the number of crosses.” (For more detailed instructions for scoring, see pages 29 and 30 of the Manual of Directions, 1919 edition.)

The sum of the scores on each individual test will give the individual’s score. This score can be placed on the front page of each individual’s test sheet, or it can be transferred to a record sheet on which the name of each child can be written and the score on each test, together with his total score, placed opposite his name.

The score of each pupil can be expressed in terms of first, mental age; second, intelligence quotient; third, percentile rank; fourth, coefficient of brightness.

To date no age norms are available from which a child’s mental age can be determined. The author, however, is collecting results from these tests wherever they are given, and undoubtedly will have such age norms in publication in the very near future.

To secure an age norm for the group tested the examination booklets are arranged according to the exact ages of the children. “To do this it will be necessary to take account of the date of the birthday. The Total Score Norm for the age of 12 years may then be taken to be the average of the Total Scores of all pupils whose ages were between 11 years, no months, and 13 years, no months. The Norm for the age of 12 years, 1 month, may be taken as the average of the Total Scores of all pupils whose ages were between 11 years, 1 month, and 13 years, 1 month, etc. The Mental Age of a pupil may then be seen at a glance by noting the age for which his Total Score is the Norm.”

The intelligence quotient of a pupil up to 16 years of age can be secured by dividing his mental age by his chronological age. Beyond 16 years of age, the mental age is divided by 16, for the reason that an individual is practically mature at 16 years of age.

For a further discussion on determining the scores, especially the percentile rank, and the coefficient of brightness, reference should be made to pages 32 to 36 of the Manual of Directions.

Interpreting and Using Results. In order to indicate to the teacher how she can determine the general intelligence of her pupils with the use of this test, the following results are given which were secured from two 4-B teachers and two 4-A teachers in a city school system. In all 104 children were tested. After the scores were obtained, the papers were classified according to the age groups, i.e. all the papers of children 8 years, 0 months, to 10 years, 0 months, were placed in one group; all the papers of children 9 years, 0 months, to 11 years, 0 months, were placed in another group, etc. The average of the scores on the papers in the first group gave the age norm of the 9-year-old children; the average of the scores on the papers in the second group gave the age norm of the 10-year-old children, etc. The normal chronological ages for the fourth grade are 9 and 10 years. The age norms for these two groups in these four classes are 54.4 and 54.6 respectively. The small number of papers makes these figures only tentative norms. These norms would undoubtedly be changed by a larger number of papers, which are necessary to establish reliable standards.

After the age norms for the different ages are secured the mental ages of the different individual pupils can be obtained by noting the age for which the pupil’s total score is the norm. For example, R. T., a 4-A pupil, 10 years, 1 month, made a score of 54. His mental age is, therefore, almost 10 years. Another 4-A pupil, S. H., 10 years, 4 months, made a score of 51. He is, therefore, less than 10 years old mentally. Until there is an age norm for children at every age including years and months, the exact mental ages cannot be determined. When that information is available to the teacher, she can for all practical purposes determine the mental ages of her children whereby a far better grouping or classification can be secured than on the basis of the chronological ages.

Haggerty’s Intelligence Examinations: Delta 1 and Delta 2

Aim. The purpose of this test is to measure the native ability of groups of pupils in the elementary school in order to group them properly or in a limited way to measure their progress.

Description of Tests. The tests appear in two pamphlets, the one, Intelligence Examination: Delta 1, for grades one to three inclusive; and the other, Intelligence Examination: Delta 2, for grades three to nine inclusive. Delta 1 contains the following exercises:

Exercise 1. Oral Directions
Exercise 2. Oral Directions
Exercise 3. Copying Designs
Exercise 4. Copying Designs
Exercise 5. Picture Completion
Exercise 6. Picture Completion
Exercise 7. Picture Comparison
Exercise 8. Picture Comparison
Exercise 9. Symbol Digit
Exercise 10. Symbol Digit
Exercise 11. Word Comparison
Exercise 12. Word Comparison

Exercises 2, 4, 6, 8, 10, and 12 determine the pupil’s score; the others are preliminary exercises and are not counted in scoring. Simple instructions are given to the teachers for the different performances under each exercise. The difficulty which small children would encounter in reading or in following complicated instructions is avoided.

Delta 2 is an adaptation of the army intelligence tests. It has been used more widely than Delta 1.
In addition to the examination of 15,000 school children in the state of Virginia, it has been used extensively in many of the larger city school systems throughout the country. It consists of the following exercises:

Exercise 1. Sentence Reading
Exercise 2. Arithmetical Problems
Exercise 3. Picture Completion
Exercise 4. Synonym-Antonym
Exercise 5. Practical Judgment
Exercise 6. Information

The first 5 performances of Exercises 1 and 2, Delta 2, are given to show the nature of the tests:

Exercise 1

Directions:
1. Read this question: Do cats see? No Yes
   The right answer is Yes; so a line is drawn under Yes.
2. Read the next question: Is coal white? No Yes
   The right answer is No; so a line is drawn under No.

Below are a great many more questions. Read them carefully, one at a time, and draw a line under the right answer. When you are not sure, guess.

1. Do dogs run? Yes No
2. Can a doll sing? Yes No
3. Does the sun shine? Yes No
4. Do men drink water? Yes No
5. Are all apples red? Yes No

Exercise 2

Get the answers to these problems as quickly as you can. Use the side of this page to figure on if you need to.

Samples.
1. How many are 5 men and 10 men? Answer (15)
2. If one pencil costs 5 cents, what will 4 pencils cost? Answer (20)

1. How many are 30 men and 7 men? Answer ( )
2. A boy had 10 cents and spent 4 cents. How many cents had he left? Answer ( )
3. If you save $7 a month for 4 months, how much will you save? Answer ( )
4. If 24 men are divided into groups of 8, how many groups will there be? Answer ( )
5. A boy had 12 marbles. He bought 3 more, and then lost 6. How many marbles did he have left? Answer ( )

Giving the Tests. A carefully devised manual of directions is provided by the author which must be in the hands of the teacher and thoroughly understood by her before any attempt is made to give the tests.
The instructions in the manual are simple so that no teacher should have any difficulty in applying the tests to her class. Each child in the class must be provided with a test. The entire class can be examined at once.

Scoring the Results. The Manual of Directions also provides explicit instructions for scoring the tests. A Scoring Key is provided for both tests. The score of each pupil is the sum of the scores made on the several items of the test. The maximum score for each test is as follows:

        Delta 1                         Delta 2
Exercise    Maximum Score       Exercise    Maximum Score
    2            10                 1            40
    4            10                 2            20
    6            16                 3            20
    8            20                 4            40
   10            48                 5            16
   12            25                 6            40
Total           129             Total           176

After the score for each pupil is determined these scores are transferred to a class record sheet and the median class score is determined. The results are given in terms of median scores and age norms for each grade.

Using the Results. These Intelligence Examinations have been used with a sufficiently large group of pupils to insure standards that are exceedingly valuable for comparative purposes. Test Delta 2 has been used with about 20,000 children and Test Delta 1 with 4000 children. The following tentative standards are available:

Table 52. Grade Standards for General Intelligence Test: Delta 1

Grade at end of year      1     2     3
Score                    35    55    70

Table 53. Age Norms for General Intelligence Test: Delta 1

Age         7     8     9    10
Score      35    50    65    75

Table 54. Standard Scores in General Intelligence Examination: Delta 2 for Each of Grades 3 to 9 Inclusive

Grade       3     4     5     6     7     8     9
Score      40    60    78    96   110   120   130

Table 55. Age Norms for General Intelligence Test: Delta 2

Age         8     9    10    11    12    13    14    15
Score      25    43    55    66    77    87   100   115

Table 56. Standard Scores in Exercises 1 and 2: Delta 2, for Grades 3 to 9 Inclusive

Grade            3      4      5      6      7      8      9
Exercise 1      14     20     23     27     30     32     35
Exercise 2     5.0    7.0    9.0   10.5   11.5   13.0   15.0

From the above standards the teacher can tell whether or not her class as a whole measures up to the standard for the grade. She can also determine the mental age of each individual pupil in her class, whether or not they are above or below the standard for the grade. With the aid of such a test the teacher should have no difficulty in determining the native ability of each individual pupil in her class.

The Group Tests for Grammar Grades, by Dr. G. M. Whipple (see Bibliography at close of chapter), are group intelligence tests similar in many respects to the Otis and Haggerty tests. The purpose of these tests is to select the brighter pupils from grades 4, 5, and 6, but the tests quite fully distribute pupils in these grades according to intelligence. It takes 92 minutes to give the tests to a sixth grade, the entire room being tested at once. But when the papers are scored, the teacher has before her a fair measure of the intelligence of each pupil in the group.

The Stanford Revision of the Binet-Simon Test

This test is possibly the most accurate instrument so far available with which to determine the native ability of American children. On account of the fact that from 30 to 60 minutes are consumed in examining each pupil and that special training is required by the one applying the test, it cannot be used for the examination of large groups of children in a limited time.

After the group tests suggested above have been given, there will always be questions which can be settled only by a more refined measurement such as the Stanford Revision of the Binet-Simon Test. It is not too much to expect that the group intelligence tests will open the way and extend the use of the latter test.

An illustration of how this test will supplement the group tests is given in Table 57.
The teachers in two 4-A classes in a city school system examined their children with the Trabue Language Scale. In all 44 children were tested. All of them were also examined by a psychologist with the Stanford Revision of the Binet-Simon Test. The standard for the 4-A grade on the Trabue Scale is 10.4. Nineteen of the 44 children made a score below this standard. On the Binet-Simon Test all but 10 of these 44 children made an Intelligence Quotient above 80. Table 57 gives the scores on the Trabue Language Scale and the Binet-Simon Test for those 18 children who scored below the standard for the Trabue Language Scale. From these figures it is seen that 9 of the 14 children scoring from 8.0 to 10.3 inclusive on the Trabue Scale tested between 80 and 90 Intelligence Quotient or below. In the group between 80 and 90 Intelligence Quotient are found, according to Terman, “those children who would not according to any accepted social standards be considered feeble-minded, but who are nevertheless far enough below the actual average of intelligence among races of western European descent that they cannot make ordinary school progress or master other intellectual difficulties to which average children are equal.” Of the children scoring below 8 on the Trabue Language Scale all except one (Intelligence Quotient 81.8) tested below 80 Intelligence Quotient.

Table 57

Trabue Language Scale     Intelligence Quotient on Binet-Simon Test
        6.3                          75.8
        6.6                          81.8
        7.6                          75.5
        7.6                          75.
        8.                           89.9
        8.3                          89.9
        8.3                          81.7
        8.6                         101.
        8.6                          66.1
        9.                           93.1
        9.3                          77.3
        9.5                          71.5
        9.6                          73.3
        9.6                          90.9
       10.3                          84.8
       10.3                          96.9
       10.3                          98.3
       10.3                          80.9

In this particular city it is the practice to consider pupils eligible to a special class when they test below 80 Intelligence Quotient.
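The counts quoted from Table 57 can be checked with a short sketch. The pairs below are read from the table; entries printed without a decimal digit (75., 8., 9., 101.) are assumed here to be whole numbers, and the variable names are illustrative.

```python
# (Trabue score, Binet-Simon Intelligence Quotient) pairs from Table 57.
# ASSUMPTION: "75.", "8.", "9." and "101." are taken as 75.0, 8.0, 9.0, 101.0.
table_57 = [
    (6.3, 75.8), (6.6, 81.8), (7.6, 75.5), (7.6, 75.0), (8.0, 89.9),
    (8.3, 89.9), (8.3, 81.7), (8.6, 101.0), (8.6, 66.1), (9.0, 93.1),
    (9.3, 77.3), (9.5, 71.5), (9.6, 73.3), (9.6, 90.9), (10.3, 84.8),
    (10.3, 96.9), (10.3, 98.3), (10.3, 80.9),
]
SPECIAL_CLASS_IQ = 80  # the city's eligibility line, as stated in the text

# Children scoring below 8 on the Trabue scale, and those from 8.0 to 10.3.
low = [iq for trabue, iq in table_57 if trabue < 8.0]
mid = [iq for trabue, iq in table_57 if 8.0 <= trabue <= 10.3]

print(len(low), sum(iq < SPECIAL_CLASS_IQ for iq in low))   # 4 3
print(len(mid), sum(iq <= 90 for iq in mid))                # 14 9

# Children in the table who fall below the special-class line.
eligible = [(t, iq) for t, iq in table_57 if iq < SPECIAL_CLASS_IQ]
print(len(eligible))                                        # 7
```

Read this way, the table agrees with the text: three of the four children scoring below 8 tested below 80, and nine of the fourteen scoring from 8.0 to 10.3 tested at 90 or below. Seven of the eighteen children listed fall below the city's line of 80.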
It would seem, therefore, that by having the psychologist test with the Binet-Simon Test the children who scored below 10.3 on the Trabue Language Scale, those children could easily be detected who should receive special instruction. In the same way the children who test very high on the 252 How to Measure Trabue Language Scale should be given an individual examina- tion for further classification. By combining the use of the Binet-Simon test with such group intelligence tests as the Trabue Language Scale, the Otis Intelligence Scale, the Haggerty Intelligence Examinations: Delta 1 and Delta 2, and the Whipple Group Tests for the Grammar Grades, classroom practice can readily be placed on a more scientific basis. The availability of such tests makes it possible for every teacher to know the mental age of every child. The training of a teacher in a normal school should include a course in the testing of general intelligence so that every teacher may apply any general intelligence test with a reason- able degree of accuracy. Moreover, every teacher could well afford to spend six weeks in the summer in an institu- tion where such training could be secured. On account of the detailed instructions necessary for the application of the Stanford Revision test, no person should attempt to give it without having access to the “ Measure- ment of Intelligence ” by Terman, or some book like it in which such instructions are given. BIBLIOGRAPHY References Terman, Louis M., “The Measurement of Intelligence,” Houghton Mifflin Company. Trabue, M. R., “Composition-Test Language Scales,” Contribu- tions to Education, No. 77, Bureau of Publications, Teachers Col- lege, Columbia University, New York City. Whipple, G. M., “Group Intelligence Tests,” Public School Publishing Co., Bloomington, Illinois. Tests Haggerty, M. E., “Standard Educational Tests.” Intelligence Exam- ination: Delta 1 and 2, World Book Co., Yonkers-on-Hudson, New York, and Chicago. 
Prices: Delta 1, $1.50 per package of 25, scoring key, $.15; Delta 2, $1.50 per package of 25, scoring key, $.10. Manual of Directions, $.35.

Otis, A. S., “Group Intelligence Scale.” World Book Company, Yonkers-on-Hudson, New York, and Chicago. Prices: Form A or B including one record sheet, in packages of 25, $1.50; examiner’s key, $.25; Manual of Directions, $.25.

Trabue, M. R., “Trabue Language Scales; Scales B, C, D, E, J, K, L, and M,” Bureau of Publications, Teachers College, Columbia University, New York City. Price, each scale $3 per thousand; $.40 per hundred; carriage extra.

Whipple, G. M., “Group Tests for Grammar Grades.” Public School Publishing Company, Bloomington, Illinois. Price, $.15 each.

Stanford Revision of the Binet-Simon Test, C. H. Stoelting Company, 3047 Carroll Avenue, Chicago, Illinois. Price $.08 per copy.

CHAPTER XI

STATISTICAL TERMS AND METHODS

The purpose of this chapter is to give only so much information from the science of statistics as the teacher needs to know in order to administer a test, tabulate the scores, and interpret the results. This will necessitate, also, the explanation of statistical terms sufficiently to enable one to understand such terms when used in the discussion of the measurement of any school subject.

Securing Comparable Results. — One decided advantage of a standard test is the possibility of comparing the results with similar results in other rooms, other school systems, or with tentative or fixed standards. Manifestly, such comparisons can be made to advantage only when the tests have been given under similar conditions. The following suggestions may be considered as rules of the game for securing comparable results:

1. In giving a test it is essential that the conditions of the test be kept constant.

2. The directions which accompany a test should be followed in every detail. If possible, use a stop watch to secure exact time, when there is a time limit.
3. It is an advantage if the examiner has a clear conception of the nature of the test, its purpose, and the use to be made of it.

4. At the time of giving the test all needed secondary data should be secured, such as name, age, grade, school, date, etc.

5. Most tests as a part of their instructions provide for a preliminary trial in order to make pupils familiar with the test. In case such provision is not made in the instructions, the teacher should devise a preliminary test which should be similar to, but somewhat easier than, the one to be given, in order that the pupils may thoroughly understand what is to be done and how to do it.

6. The test should be handled as nearly as possible just as any other regular lesson. An appeal to extra effort is allowable, but other comments likely to secure results that are not normal should be avoided. Appeals which are made to the child’s desire to do well in the test should be included as a part of the regular instructions, in order that conditions of the test may be uniform for comparisons.

7. For many purposes a single test is sufficient. In case the decision to be based upon a test is of unusual importance, at least two specimens should be taken, or two tests given, or the judging be done by at least two competent judges. In case there is a decided discrepancy between the two results, the teacher will realize that further testing should be done. As will be apparent from further study of statistical methods, a score for a class is much more reliable than for an individual, and the score for an entire city more reliable than that for a single class. This is due to the fact that slight errors tend to balance each other in such a way as to give a more accurate judgment on a large group than on a small group or a single individual.

8. Care should be taken not to use the material of standardized tests for practice purposes.

9. In case the test is given frequently the results will be much more representative if an alternative test of equal value has been provided by the person who devised the test.

Using a Standardized Test. — Teachers to-day can scarcely attend an educational meeting or read an educational magazine without hearing about scales and standardized tests, and their advantages in measuring the work of the schools. For the teacher thus interested, but who has not had a normal school or college course in educational measurement, the following directions are given with the assurance that an intelligent teacher may go forward in such work even though she may not have the help and guidance of a trained supervisor.

1. Selecting the Test. — In selecting the test to use, the teacher may well be guided by the particular purpose which she has in mind. The preceding chapters dealing with the available tests in different subjects will permit the teacher to make a choice on the basis of the best test for the particular purpose. In general, those tests should be chosen which have been most widely used, and which require the least time for giving and marking the papers. Yet this is not an infallible rule. The Woody tests are certainly much more valuable than the Courtis tests in arithmetic. Yet the Woody tests have been given very little in comparison with the wide use of the Courtis tests. They are a little more difficult to administer and to score. Yet, from the standpoint of value in diagnosing the pupils’ difficulties, they are far superior to the Courtis tests. The tests that are going to survive and show value in the next few years cannot be determined at this time. The final judgment upon the test must be passed by the teacher in the schoolroom on the basis of its value in helping her in her work of discovering the needs of the children and applying the appropriate remedies.
It may be properly assumed that, although a test is more difficult and requires a longer time, if it is superior in every respect, the teacher will find the time for giving it. It requires considerable time to give Gray’s Oral Reading Test, yet the results of giving the test are so valuable that the teacher does not hesitate to take the necessary time for giving it. When a test may be chosen on the basis of difficulty, as in the use of the Ayres Spelling Scale, the teacher should keep in mind that a good test should be so difficult that no pupil will make a perfect score, and sufficiently easy that most pupils in the grade will secure a score which is reasonably satisfactory.

2. Giving the Test. — In giving a test the teacher should follow carefully the printed directions which accompany the test. This is the chief rule to keep in mind. Other details are mentioned above under “Securing Comparable Results.” The teacher who has time and is willing to experiment may easily demonstrate the possibility of changing a score by a slight change in directions or by a different attitude in presenting the work to children. The chief consideration, if comparisons are to be made, is that the attitude, detailed directions, and every element entering into the giving of the test shall be as indicated in the directions, so that pupils in one city, or even in one state, may be compared with those in another. In handwriting, for example, pupils should be so instructed and handled that they will write at their natural rate, thus securing results in the test that will represent the normal situation.

3. Scoring the Papers. — Every test provides printed directions for scoring the papers in order to aid teachers in securing uniformity of results. These printed directions should be followed implicitly.
If the teacher has opinions as to what should be done, and these opinions are different from the directions, such opinions should be abandoned if the results of the test are to be used for comparative purposes. The teacher is urged to have the pupils aid in scoring the papers in so far as it is possible. This can be done very largely in arithmetic, in spelling, in certain reading tests, and, to an extent, in writing. The chief purpose of involving the child in the grading is to further increase his interest. This is an incentive and a motive which is worth while for teaching purposes, and which will lead the child to greater effort in order that he may score higher in a future test.

4. Tabulating Results. — Directions for tabulating results or distributing the scores are provided in connection with most of the tests. A common method of making a distribution is to arrange the papers in order. The teacher can then draw off the scores, noting the number of papers falling at each point. This gives the distribution. For further use, the teacher will need to supply the names of pupils opposite each score, or, in case she is noting mistakes, opposite each mistake. The results of any test cannot be intelligently used until they have been arranged in some systematic order, particularly if the number of pupils involved is large.

5. Statistical Calculations and Graphic Representations. — The statistical points to be determined after the results of a test have been tabulated are usually the median, the quartiles, and, sometimes, the average or standard deviation. These points will be explained in the next section of this chapter. When understood, they are very valuable, particularly in making a comparison of one room or one school system with another. To represent the scores graphically often helps the teacher to see points which would otherwise remain hidden.
A graphic representation is made by noting the number of scores falling at each point of the scale, and representing the number by the distance from the base line, and then drawing a line connecting all of these points. The height of the line above the base line enables the teacher to see at a glance just what is happening in her class.

6. Interpretation of Results. — The teacher is warned to avoid conclusions until she has mastered the technique and the significance of the test and has given it to different groups, or enough times to the same group, to clear up in her own mind the various questions that may arise in giving the test. Especial care should be taken not to draw far-reaching, general conclusions from a test. A test is usually devised for a specific purpose. The significance of the test in other fields can be known only through the figuring of coefficients of correlation after a large number of cases has accumulated. A good drawing of the moon by an eighth grade pupil means a good drawing of the moon — not an artist, not an astronomer. Nothing should be taken for granted. Mistakes will be avoided by caution, and fear will be eliminated by a thorough understanding.

7. Applying Remedies. — The ultimate purpose of a test, so far as the individual teacher is concerned, is to enable her to see the needs of her pupils and to search out the appropriate remedies. The discovery of the remedy in any subject takes her into the question of methods of teaching, but this is a desirable result. To use a test for measurement only, without carrying the work forward to a point of use and application in better teaching, is to close the eyes to the significance of a situation after it has been revealed. The teacher, after giving a test, is in the position of a specialist who has diagnosed a bodily ailment. The diagnosis means nothing unless the appropriate remedy is applied.
The recognition of this fact leads a teacher or a group of teachers, again and again, into the study of methods of teaching with reference to the subject tested.

8. Cooperation. — In a city system, the closest possible cooperation is urged between supervisor and teachers, not only for the benefit of the teachers, but as well for the benefit of the supervisor. Cooperation, understanding, and mutual confidence are always valuable assets, and especially so in the use of the tests which may reveal teacher weaknesses as well as pupil weaknesses. The teacher, however, will be the first to want to correct any revealed defects, and her interest and cooperation will enable the superintendent or supervisor to secure other important results, such as:

a. A more scientific attitude toward school work.

b. A closer checking of results and a realization that pupil errors are specific and need individual attention.

c. Better time allotments, more definite assignments, a clearer conception of the objectives to be attained, and more efficient methods of teaching.

Statistical Terms. — The purpose of the statistical treatment of scores is intelligent interpretation. The first step in the handling of scores is to give them systematic arrangement. A distribution is a systematic arrangement of scores. A table of frequency is a table showing the scale and the distribution of scores at each point on the scale.

The following are the unarranged grades of seventy-seven sixth grade pupils in arithmetic: 74, 92, 65, 69, 76, 80, 62, 73, 85, 81, 79, 66, 59, 75, 76, 81, 84, 74, 55, 73, 86, 75, 71, 60, 92, 85, 76, 82, 50, 65, 92, 100, 81, 75, 85, 97, 65, 91, 85, 86, 72, 55, 75, 75, 72, 77, 62, 95, 87, 75, 75, 70, 76, 87, 85, 82, 67, 90, 81, 95, 80, 86, 80, 75, 67, 70, 72, 84, 76, 70, 88, 72, 80, 75, 67, 82, 72.

Left in this unarranged form, the scores have little significance. They need statistical interpretation.
The following table of frequency shows a scale with intervals of 5, and on the right-hand side the number of scores at each point on the scale. This right-hand column represents the distribution. A grade is recorded at the nearest “5” point on the scale; thus, 74 is recorded or scored at 75, 92 at 90, etc.

Table 58. — Frequency Table: Showing the Arithmetic Scores of 77 Sixth Grade Pupils

Grades, Intervals of 5 (Scale)    Number of Scores at Each Point on Scale (Frequency)
          50                                1
          55                                2
          60                                4
          65                                7
          70                               10
          75                               19
          80                               12
          85                               12
          90                                6
          95                                3
         100                                1
       Total                               77

This table is much more useful than the undistributed grades, as it enables the teacher to see the number of pupils (or number of scores) at each point on the scale. Special significance is usually attached to certain points on the scale, such, for instance, as the passing mark. If 70 is the passing grade, the teacher sees at once that 14 of the pupils have failed. Other points on the scale that have statistical value are the median, the quartiles, the mode, the average, and the range.

The median is the middle score, or the point on the scale above and below which an equal number of scores fall after the scores have been arranged into a table of frequency. In Table 58 there are 77 scores, so that the middle one would be the 39th score from either end of the distribution. The 39th score falls at 75, and therefore 75 is the median score. In case of an even number of scores, the median is the average of the two middle scores.

The quartiles are the points on the scale arrived at by taking 1/4 and 3/4 of the scores, counting in from either end. It is usual to start at the bottom of the scale, so that counting up until 1/4 of the scores have been covered locates the point on the scale known as the first quartile, and the distance up the scale necessary to include 3/4 of the scores locates the third quartile. The second quartile is seldom referred to as it is the same as the median.
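The tabulation described above can be checked with a short sketch. It assumes the 77 grades exactly as listed, records each at the nearest "5" point, and locates the median and quartiles by counting up the frequency table. The function names and the counting convention (take the scale point at which the required share of scores is first covered) are illustrative assumptions, not the authors' own procedure; for an even number of scores the text's rule of averaging the two middle scores would apply instead.

```python
from collections import Counter

# The unarranged grades of the 77 sixth grade pupils, as listed in the text.
SCORES = [
    74, 92, 65, 69, 76, 80, 62, 73, 85, 81, 79, 66, 59, 75, 76, 81, 84, 74,
    55, 73, 86, 75, 71, 60, 92, 85, 76, 82, 50, 65, 92, 100, 81, 75, 85, 97,
    65, 91, 85, 86, 72, 55, 75, 75, 72, 77, 62, 95, 87, 75, 75, 70, 76, 87,
    85, 82, 67, 90, 81, 95, 80, 86, 80, 75, 67, 70, 72, 84, 76, 70, 88, 72,
    80, 75, 67, 82, 72,
]


def nearest_five(score):
    """Record a whole-number grade at the nearest multiple of 5 (74 -> 75, 92 -> 90)."""
    return ((score + 2) // 5) * 5


def frequency_table(scores):
    """Return {scale point: number of scores}, ordered from the bottom of the scale."""
    counts = Counter(nearest_five(s) for s in scores)
    return dict(sorted(counts.items()))


def point_reached(table, fraction):
    """Scale point at which, counting up from the bottom, the given fraction
    of all scores has been covered (assumed convention, see lead-in)."""
    target = fraction * sum(table.values())
    covered = 0
    for point, count in table.items():
        covered += count
        if covered >= target:
            return point
    raise ValueError("fraction must lie between 0 and 1")


if __name__ == "__main__":
    table = frequency_table(SCORES)
    for point, count in table.items():
        print(f"{point:>5} {count:>3}")
    print("Total", sum(table.values()))
    print("Below passing mark of 70:", sum(c for p, c in table.items() if p < 70))
    print("First quartile:", point_reached(table, 0.25))
    print("Median:", point_reached(table, 0.50))
    print("Third quartile:", point_reached(table, 0.75))
```

Run on these grades, the sketch reproduces the frequencies of Table 58 and the median of 75; the quartile points it reports depend on the counting convention assumed here.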
It is evident that the first and third quartiles are the points midway between the median and the extremes.

The middle 50% is a term frequently used. It represents the number of scores falling between the first and third quartiles.

The extremes are the outside limits of the distribution, and the distance between the extremes indicates the range of the distribution.

The mode is the point on the scale where the greatest number of scores fall. In Table 58, the mode is 75.

The average is found by adding the scores together, and dividing the sum by the number of scores.

Deviation. — Some method of indicating by a single figure the deviation of the scores from some central point like the median is frequently used. Average deviation is most often used and it is found by taking the average of the deviations of the individual scores from some central tendency, usually the median. Standard deviation is also used to express deviation. It equals the square root of the average of the squares of the deviations from the arithmetical average (although the median may be used instead of the average). The teacher will have little use for figuring deviation in the present volume.

Correlation. — The relation between two paired series may be expressed by a single figure known as the coefficient of correlation. The figure ranges from -1 to +1, the latter figure representing perfect correlation or agreement. The work of the present volume will not require the derivation of this term, so the reader is referred to works listed in the bibliography in case of a desire to know the method of figuring the coefficient of correlation.

BRIEF BIBLIOGRAPHY

King, W. I., “Elements of Statistical Method,” The Macmillan Company, New York, 1915.

Thorndike, E. L., “Mental and Social Measurements,” Teachers College, New York, 1913.

Brinton, W. C., “Graphic Methods for Presenting Facts,” Engineering Magazine Co., New York, 1914.

Rugg, H. O., “Statistical Methods Applied to Education,” Houghton Mifflin Company, Boston, 1917.

Buckingham, B. R., “Statistical Terms and Methods,” Seventeenth Yearbook of the National Society for the Study of Education, Chap. IX, Part II, 1918.

CHAPTER XII

THE TEACHERS’ USE OF SCALES AND STANDARDIZED TESTS

The college instructor blames the high school teacher, the high school teacher complains of the grade teacher, each grade teacher above the first grade finds fault with the poor work of the teacher in the grade below, and the first grade teacher in turn is chagrined at the shortcomings of the home training. Must this go on indefinitely? Whose opinion should prevail? Is it not possible to get away from personal opinion to an agreed-upon consensus of opinion? May we not replace the constantly conflicting subjective standards with definitely defined objective standards?

Present Grading System. — If 20 mechanics were sent out into a mill yard to cut and bring back a steel rod just long enough to reach from one girder to another, but were not given the measured distance between the girders before going, nor permitted to take a ruler or tape to use in selecting the rods, no experiment is needed to prove that each one of the 20 rods would be different in length and no one of them would exactly span the distance from girder to girder except by chance. On the other hand, if the foreman were to use a steel tape in measuring the width between the girders, and were to permit the mechanics to measure the length of the rods before cutting them, they would return with 20 rods each meeting with his approval. Is it possible for the school foreman, the teacher, to replace her subjective standard, her mere opinion, by an objective standard approximating the steel tape of the shop?

The need of more accurate, objective standards in grading is generally appreciated.
The following are some of the evidences of such need:

(1) There are constant complaints from teachers in upper grades, as indicated above, against the poor quality of work done in the lower grades.

(2) There is wide variation in the distribution of grades among the various departments of the same school. In one high school, for example, 80% of the English grades were 90 or above, while only 4% of the mathematics grades were 90 or above. In the same high school, the German teacher gave 70% of her pupils 90 or above, while the Latin teacher gave only [...] of her pupils a grade of 90 or above.

A recent study of college grading well illustrates this point. The study covered a total of 12,782 grades by 10 professors covering a period of 5 years. The grades given by professors number 1, 3, and 4 are shown herewith:

Professor    Grades (Total)    Failed    75-79    80-84    85-89    90-94    95-100
No. 1             1071         32.1%     12.7%    15.9%    14.9%    12.7%    11.5%
No. 3             1422          9.8       7.0     11.9     15.9     14.4     40.7
No. 4             2196          3.3       6.2     19.3     36.2     28.5      6.3

The contrast between Professor No. 1 and No. 3, who represent the extremes, is brought out more strongly by the graphic representation (see Fig. 16) than by the table. Professor No. 1 fails approximately one third of his students and then distributes the others about equally among the 5 remaining points of the scale. Quite the opposite, Professor No. 3 gives two fifths of his students an honor grade, and then distributes the other grades about equally among the 5 other points of the scale. These figures are in the main true for each of the 5 years studied, without regard to the maturity of the students, whether they be freshmen, sophomores, juniors, or seniors.

Fig. 16. — Showing graphically the distribution of grades given by three college professors at Iowa State College.
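As a rough stand-in for the graphic comparison, the percentages in the table can be drawn as text bars. This is only a sketch: the function names are illustrative, and the last percentage for Professor No. 1 is read here as 11.5, an assumption adopted because the printed figure is hard to make out and this value brings the row to about 100 per cent like the other two.

```python
# Per cent of all grades falling in each category, taken from the table above.
CATEGORIES = ["Failed", "75-79", "80-84", "85-89", "90-94", "95-100"]
DISTRIBUTIONS = {
    "No. 1": [32.1, 12.7, 15.9, 14.9, 12.7, 11.5],  # last value is an assumed reading
    "No. 3": [9.8, 7.0, 11.9, 15.9, 14.4, 40.7],
    "No. 4": [3.3, 6.2, 19.3, 36.2, 28.5, 6.3],
}


def most_common_category(percentages):
    """Return the grade category that receives the largest share of grades."""
    largest = max(range(len(percentages)), key=lambda i: percentages[i])
    return CATEGORIES[largest]


def text_chart(percentages, marks_per_point=0.5):
    """Draw one row of '#' marks per category, its length proportional to the share."""
    rows = []
    for category, share in zip(CATEGORIES, percentages):
        bar = "#" * round(share * marks_per_point)
        rows.append(f"{category:>7} {share:5.1f}% {bar}")
    return "\n".join(rows)


if __name__ == "__main__":
    for professor, percentages in DISTRIBUTIONS.items():
        print(f"Professor {professor} (largest share: {most_common_category(percentages)})")
        print(text_chart(percentages))
        print()
```

Printed side by side, the bars make the same point as the figure: the largest share of one professor's grades falls under "Failed" and of another's under "95-100".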
A study of the distribution of the grades given by the faculty of any large high school or college is likely to show similar results, unless the problem of grading has received special attention.

(3) There is a wide variation in the distribution of grades among teachers of the same department. Of 2 instructors in the same department 1 gave to 43% of his students the grade of “excellent” and to none the grade of “failure,” whereas the other gave to none of his students the grade of “excellent” and to 14% the grade of “failure.”1 There must have been a few good and a few bad in each group.

(4) The fact that pupils transferring from one school system to another are frequently demoted indicates that minor details rather than large fundamental considerations are the determining factors in classifying them. Since pupils are constantly shifting, in many schools as high as 20% being new to the system each year,2 this is a very important item. In fairness to the child, as well as the school from which he came, it should be possible to determine his standing through the use of objective standards, and so place him in the proper grade.3

1 Starch, Daniel: “Educational Measurement,” p. 3, quoted from Dearborn.

2 Typical facts with reference to the proportion of school children who leave school, because of leaving the city, are easily gathered from current school reports. The following are illustrative: In Waterbury, Connecticut, 1914-15, there was a total enrollment of 13,954. Of this number 902, or 5.4 per cent, left school during the year. Of those who left school, 426, or 47.2 per cent, left the city. Similar facts for other cities follow: In Des Moines, Iowa, 1913-14, 10.7 per cent left school. The proportion of those who left the city was 61.3 per cent. In Decatur, Illinois, 1913-14, 15.7 per cent left school. Of these 62.7 per cent left the city. In Connersville, Indiana, 1910-12, two years combined, 14.8 per cent left school. Of these 57.8 per cent left the city. Pupils who leave the city will usually enter other school systems.

3 Asst. Supt. O’Hern in the May, 1918, number of Elementary School Journal calls attention to the value of standard tests for placing new pupils in the right grades.

Differences in Grading Same Paper. — A study by Dr. Daniel Starch illustrates very clearly the variation among teachers of a single subject in grading that subject. A paper in English was submitted to 142 teachers of English. The grades varied from 50 to 97, the passing grade being 75. Twenty-six of these teachers, or 18%, marked the paper a failure, that is, graded it below 75. On the other hand, 14 of the group marked it 90 or above, indicating that in their opinion it was a very superior paper.

In mathematics, a similar test gave results that were even more surprising, particularly so in view of the fact that mathematics is considered one of the exact sciences. A geometry paper which was submitted to 118 teachers received grades ranging from 29 to 92, the passing mark being 75. Sixty-eight of the teachers, or nearly 58% of them, marked the paper a failure. Fifty of the group marked it 75 or above, one giving it a grade of 92.

A history paper graded by 70 teachers showed similar variations, the grades ranging from 43 to 90.

This but illustrates the present chaos resulting from the lack of standards in grading an ordinary examination paper. When this is multiplied by the variation in sets of examination questions, it is apparent that on the present basis of examinations it is absolutely impossible to compare one system with another, one grade with another, or to compare from month to month the same grade with itself.

It is unnecessary to discuss fully the above points. Others might be added, all indicating the need of objective standards.

Uniform Examination Not Satisfactory.
— One may ask if the purposes of an objective standard for measuring school achievement cannot be accomplished by a uniform course of study, uniform examination questions, and uniform grading. These items may properly receive attention in order.

In the first place a uniform course of study is undesirable. It must be adjusted to community demands and pupil interests. It should differ greatly for children from the exclusive residence districts of New York City, and the children from Iowa farm homes. To attempt to secure a rigid uniformity in the course of study would be deadening in the extreme. The course of study should be flexible and provide for local variations. To possess knowledge which is useful and usable is much more fundamental in a democracy than to strive for a large common intellectual possession composed too largely of material which is stale and useless.

In the second place, all will agree that there is nothing more baneful and stupefying in its influence than a rigid examination system. It makes subject matter the aim and end. It leads to cramming. It militates against use and application. It directs pupils to words in books instead of to life’s real problems and their solutions. And strange as it may seem, a standard test will accomplish the desirable results of a uniform examination system without the undesirable results appearing. In subjects for which standard tests are not available, examinations must continue to be used. They have a value if rightly used.

In the third place, all will admit that uniformity of grading is desirable. It is difficult, however, with an ordinary examination, although common practice may be improved by adopting a 5-point system and distributing grades according to the normal curve of distribution. How to improve the grading of a group of teachers along these lines has been well explained by Gray, Meyer, Dearborn, Judd, Starch, Kelly,1 and others.
But one of the greatest advantages of the standard test or scale is that it greatly aids in securing uniformity of results in grading. In order to standardize a test, specific directions have of necessity been prepared for giving the test and for grading the returns. All of this means greater uniformity in grading, and greater fairness to individual classes or pupils in case of comparison. In fact, a standard test is, in a way, a well-selected uniform examination, accompanied by specific directions which greatly aid in securing uniformity and fairness.

1 Kelly, F. J., “Teachers’ Marks and their Distribution.” Contributions to Education No. 66, Teachers College, Columbia University. This volume contains a good bibliography on the study of school and college marks and grading.

Standard Tests. — But a standard test is much more than a uniform examination. The standardization of a single test or scale often requires a year or more of intensive work by one of our ablest educators. Not only must the subject matter be carefully selected and adapted to pupil ability, but it must be tried out with thousands of pupils, revised, and again tried out, until every detail of the test, its administration, its evaluation, and the grade or age standards, has been determined. Such an undertaking is too much to expect from the overworked teacher. But the teacher may properly be expected to profit by the standard tests of subject matter which have become available.

The difference between an examination and a standard test, as well as the progress of measurement in education, is fairly well illustrated in the attempt to measure arithmetic in the two Cleveland surveys, the first by a local commission in 1906, the second by a survey committee composed of educational experts selected from all parts of the country only nine years later, 1915.
The arithmetic test given in the first Cleveland survey was devised by men of maturity and judgment, but had not been standardized. It was not even based upon a wise selection of subject-matter, and it could not lead to any valid conclusions. It was used in at least one later survey.1 It did not justify further use, although it was doubtless as good as any test that could have been quickly devised under the circumstances. At the time Thorndike’s writing scale had just appeared but had not come into general use, and there were no standard tests.

1 East Orange, N. J., 1911, by Dr. E. C. Moore.

In 1915, however, the work of the Cleveland schools was measured in a scientific manner which carried conviction everywhere. In writing, spelling, arithmetic, and reading, scales or standard tests were applied which clearly revealed the grade to grade progress of the pupils, made possible comparison of one building with another, and permitted comparison of the work in Cleveland with similar work in other cities throughout the country.

While a particular teacher need not be greatly concerned about having a test that will permit comparison of the work in one city with the work in another, or even a comparison of her work with the work of other teachers in the same grade throughout the system in which she works, yet she should be concerned about the progress of the children within her own room. She should know the results of her work. She should have a device for the definite measurement of progress, due to a particular method, or a given time devoted to the work. These aims cannot be accomplished through the ordinary examination. They can be accomplished only through the use of scales and standardized tests.

Initiating the Use of Standard Tests. — Whether the initiative in the use of standard tests be taken by the teacher, the superintendent, or a survey commission, the final result should be to help the teacher, and through her, the pupil.
Miss Laura Zirbes,1 of the Cleveland University School, took the initiative in the use of standard tests, completely transformed her own theory and practice, and brought new life and more rapid progress to her pupils. In Boston, the initiative came from the central office, but in such sympathetic and cooperative form that teachers were effectively reached. Of more pronounced effect probably than any of these factors, however, was the stimulation among the Boston teachers of an inquiring attitude towards the whole problem of arithmetic instruction. “The results from the tests have shown the need of improvement; they have shown that the problem of arithmetic teaching is not yet solved, and they have prompted many teachers to study their own work as the first step towards improving methods of instruction.”2

1 “Diagnostic Measurement as a Basis of Procedure,” Elementary School Journal, March, 1918, pages 505-522.

2 Boston, Educational Bulletin No. X.

Later an entire bulletin1 was devoted to showing teachers and principals how to use the results of standard tests in reaching individual pupils and improving instruction.

The teacher who uses a standard test in her own room for the purpose of knowing her pupils or locating the weak places in her instruction may take pride in the fact that she is putting herself in line with a vast army of scientific workers in education. She determines the median ability of 30 pupils in a single grade, the distribution of ability, the points of weakness, and the remedies to apply. A principal does the same for the entire building; the superintendent for the entire school system; a state bureau for the state; and a research specialist, by combining city and state results, gets norms of performance for a nation.
The teacher thus sees herself as a contributor in a great piece of constructive work in scientific education, and she may, if she wishes, locate her particular group of children with reference to the thousands of other children throughout the country; she may feel the thrill of being one of the 700,000 lieutenants who marshal the army of 22,000,000 American school children, in the interests of a safer and saner democracy.

1 Boston, Educational Bulletin No. XIII.

Uses of a Standard Test.
However, the most helpful point for the present purpose is that standard tests should be used by the individual teacher for the purpose of finding the weaknesses in her own work, evaluating methods, and definitely measuring the progress of her own pupils. It will be worth while to enumerate in order the uses that a teacher may make of a standard test. Some of these are in common with the uses which may be made of the results of standard tests by principals and superintendents, but many of them apply directly to the particular schoolroom and are in addition to other uses. Standardized tests may be used:

1. To determine conclusively whether or not a pupil is making progress.
2. To determine how much progress a pupil has made in a given time.
3. To determine whether a pupil should be promoted, retained, or reclassified, in so far as the mastery of subject matter is made a condition of progress. Dr. Starch states that promotion on the basis of measured ability would save one year to one third of the pupils in the public schools.1
4. To determine even more accurately whether or not the class is making progress and the amount of such progress.
5. To determine whether or not a class is up to standard when received from another teacher. This use of the standard test would remove the constant complaint of teachers that the work has not been covered in the preceding grades.
6. To justify a year’s work with a class on the basis of actually measured progress.
This will make it possible to show to a prejudiced principal or superintendent (if such there be) that reasonable progress has been made by a class.
7. To show results in a manner that completely discounts the advantages of another teacher more attractive and popular, in case such teacher depends upon winning promotion by methods not contributing to pupil progress.
8. To detect those cases in which more time cannot profitably be spent with retarded pupils. See, for example, the conclusion of Superintendent Bliss of Montclair, N. J., that a group of subnormal pupils could not profit by further work in arithmetic.2
9. To release bright pupils from further work after determined standards have been reached, as long as said standards are maintained. The teacher would thus limit the work required along mechanical and routine lines. Rice’s articles3 on the “Spelling Grind” over 20 years ago emphasized the fact of wasted youth through the schools. Overemphasis upon the mechanical phases of school work closes the door to story, romance, history, literature, music, and play.

1 Fifteenth Yearbook of the National Society for the Study of Education, Part I, p. 146.
2 Ibid., p. 75.
3 The Forum, XXIII: 163-172, 409-419.

10. To test one method against another by the amount of measured progress made by the pupils, e.g. textbook procedure versus large motivated problems, as a basis for developing ability in solving reasoning problems (in so far as devised tests adequately measure this educational product). It is apparent that such use of standardized tests would replace the trial and error method of determining correct procedure with a method much more scientific.
11. To test one class plan, study plan,1 or administrative device against another, by measured results with the pupils.
12.
To determine the proper apportionment of school time to various subjects of study and other school activities. This use of standard tests has been well pointed out by Dr. Haggerty.2

1 See p. 113, Schoolman’s Week Proceedings (University of Pennsylvania), April, 1918, for comparison of class study and independent study in spelling. Reported by J. N. Adee, Superintendent Schools, Johnstown, Pa.
2 School and Society, IV: 761-771.

Standard Test Saves Time.
Naturally the teacher asks, “But will not this scientific testing require a much larger time expenditure than I can give to it? I’m crowded for time as it is.” This question can be answered only on the basis of the experience of other teachers. That experience shows that after the technique is once mastered, the time required for standard testing is not more, but frequently less, than the time consumed in marking papers under the old examination system. After the writing scale has been used for a while, has been conveniently posted for reference by pupils, and has been explained to them, the teacher will find that a committee of pupils can be relied upon to grade the writing of the room, honestly and quite accurately. In fact each pupil will grade his own writing by comparison with the scale. After the spelling test has been given, pupils may be allowed to exchange papers and correct them while the teacher gives the correct spelling of the words. Likewise in arithmetic, the pupils can help the teacher in quickly grading the papers. This help by pupils in the simpler tests should be encouraged not alone because it saves the time of the teacher, but chiefly because it creates a desirable interest and stimulates the pupils to put forth a greater effort to reach a given standard.

Standard Test a More Effective Tool.
The question with regard to the time required for giving standard tests is a legitimate one, and an effort has been made to answer it.
Every conscientious teacher will agree, however, that time is not the chief consideration. She puts in a full quota of time each day, and will continue to do so. If she is as wise as she is conscientious, she will also provide time for sufficient sleep and recreation each day. The chief consideration is that the teacher in mastering the details of the use and interpretation of a standard test is equipping herself with a more effective tool of service. Why should the teacher guess and estimate when she can measure? The unsatisfactory nature of the present grading system has been dwelt upon. A grade of 85 in one room cannot be compared with a grade of 85 in another room. The present unscientific method of grading must be replaced by scientific procedure if we are to continue to make educational progress. Improvement is certainly hampered by the use of a system which does not even permit of comparison, and so cannot give a definite measure of progress. Under the old system when two schools determined to compare the spelling ability of their pupils, all that they could do was to get the pupils together and have them compete in a spelling match. And yet as we look back upon the spelling match we see that the result was finally determined by the one best speller. The general merit of spelling in one school as compared with the general merit in the other was not determined.

Measuring a Human Product.
The teacher may insist that she is dealing with a delicate human product. This is true; and yet, as Thorndike has pointed out, mental products can be measured and are being measured. “Whatever exists, exists in some amount.” The work of the physician probably compares as closely as any other with that of the teacher. We want a physician who is kind and sympathetic, but we are not willing that these qualities be substituted for accurate and adequate knowledge.
Regardless of his kindness and sympathy, he counts the pulse, and takes the temperature. In case an anaesthetic is to be administered, he calls in an expert to determine the amount and to administer it according to standard methods. In case of a surgical operation he again calls for an expert, frequently a busy, unsympathetic stranger. In all of this work, regardless of his kindness, sympathy, geniality, and his spiritual qualities in general, he relies upon accurate knowledge, definite measurement, and tested skill. He proceeds scientifically. The teacher should do likewise.

“It is a popular superstition that human action, personality, and behavior will be penned up and hindered when measured by logical categories and fixed units. But, just as the pound weight has not interfered with the production of butter, and the yardstick has not obstructed improvement in the manufacture of cotton or other goods, so methods of teaching, it may be assumed, ‘will improve and develop freely, even when fixed standards are applied.’ The spirit can still go where it listeth. Measurement must meekly follow, gather up the results, and give them a value.”

Weights and measures call to mind definite units, such as pound, quart, and yard, and these are infinitely more valuable for commercial purposes than “as much as a man can lift,” “a small jar full,” or “the length of a man’s arm.” Standards have made commercial transactions possible at great distances on a basis of perfect understanding and fairness. There is no doubt that teaching and the products of school work are going to be benefited in a similar manner by the application of definite standards of measurement. Measurement is always taking place in one form or another. School work is being constantly noted as good, fair, or poor, as satisfactory or unsatisfactory, and is constantly being rated by such standards as are available, be these standards crude or otherwise.
Many large cities have established bureaus of measurement and efficiency. Each bureau has a head with an adequate clerical staff. Such an organization is needed in a large city even when the teachers administer and grade the tests. A central bureau can establish city standards, make valuable comparisons, and interpret results in a way to be most valuable and helpful to all teachers as well as superintendents and supervisors. But more and more the directors of central bureaus realize that they are failing unless they reach the individual teachers. Dr. Ballou emphasizes this on every page of his recent bulletin interpreting results in arithmetic.1 He assures us that in the last analysis “the teacher must find out what her trouble is and then apply the remedy.”

1 Boston, Educational Bulletin No. XIII.

The present work makes no effort to discuss the complete list of available tests, but instead is limited to such tests as have been standardized sufficiently to recommend their use to the teacher who, for the most part, is untrained in the use of statistical methods. In beginning the work in measurement, teachers should make no effort to employ all available tests, but should carefully select the test to be given. As pointed out by Ballou, teachers will do well to give tests that are reasonably simple, that can be scored and tabulated with reasonable ease, and that have been given to a sufficient number of children so that well-founded standards of achievement have been established, the first assumption always being that the test measures desirable phases of school products.

BIBLIOGRAPHY

1. “Standards and Tests for the Measurement of the Efficiency of Schools and School Systems,” Part I, Fifteenth Yearbook of the National Society for the Study of Education (1916), Public School Publishing Co., Bloomington, Illinois.
2.
“The Measurement of Educational Products,” Part II, Seventeenth Yearbook of the National Society for the Study of Education (1918), Public School Publishing Co., Bloomington, Illinois.
3. Indiana University Studies, by Haggerty, M. E.: 27, “Arithmetic: A Cooperative Study in Educational Measurements”; 32, “Studies in Arithmetic.”
4. Ayres, L. P., “A Survey of School Surveys,” Indiana University, Second Conference on Educational Measurements, pp. 172-181.
5. Ballou, Frank W., “Improving Instruction Through Educational Measurement,” Proceedings N. E. A., 1916: 1086-1093.
6. Ballou, Frank W., Bulletins of the Department of Educational Investigation and Measurement, Boston, Nos. X, XIII.
7. Bobbitt, J. F., Twelfth Yearbook of the National Society for the Study of Education, Part I, pp. 7-96.
8. Courtis, S. A., “Standardization of Teachers’ Examinations,” Proceedings N. E. A., 1916: 1078-1086.
9. Cubberley, E. P., “The Significance of Educational Measurements,” Indiana University, Third Conference on Educational Measurement, pp. 6-20.
10. Freeman, Frank W., “Some Practical Studies of Handwriting,” Elementary School Journal, 14: 167-179, December, 1913.
11. Haggerty, M. E., “Some Uses of Educational Measurements,” School and Society, 4: 761-771, November 18, 1916.
12. Harlan, Chas. L., “A Comparison of the Writing, Spelling, and Arithmetic Abilities of Country and City Children.”
13. Judd, Chas. H., “Reading Tests,” Proceedings N. E. A., 1915: 561-565.
14. Judd, Chas. H., “Standardized Units of Achievement of Pupils and Measurable Standards of School Administration,” Proceedings N. E. A., 1917: 721-724.
15. Judd, Chas. H., “Measuring the Work of the Public Schools,” Survey Committee of the Cleveland Foundation, Cleveland, Ohio.
16. Judd, Chas. H., “A Look Forward,” Seventeenth Yearbook of the National Society for the Study of Education, Part I, pp. 152-160.
17.
Melcher, George, “The Two Phases of Educational Research and Efficiency in the Public Schools,” Proceedings N. E. A., 1916: 1073-1078.
18. Moore, E. C., Report on East Orange, New Jersey, 1911.
19. Morrison, J. Cayce, “The Supervisor’s Use of Standard Tests of Efficiency,” Elementary School Journal, 17: 335, January, 1917.
20. O’Hern, Joseph P., “Practical Application of Standard Tests in Spelling, Language, and Arithmetic,” Elementary School Journal, 18: 662-679, May, 1918.
21. Rice, J. M., “The Futility of the Spelling Grind,” Forum, 23: 163-172 and 409-419.
22. Starch, Daniel, “Educational Measurements.”
23. Strayer, George D., “The Use of Tests and Scales of Measurement in the Administration of Schools,” Proceedings N. E. A., 1915: 579-582.
24. Strayer, George D., et al., “Report of the Committee on Tests and Standards of Efficiency in Schools and School Systems,” Proceedings N. E. A., 1913: 392-406.
25. Thorndike, E. L., “The Elimination of Pupils from School,” Bulletin No. 4, 1907, United States Bureau of Education.
26. Wilson, G. M., “The Handwriting of School Children,” Elementary School Teacher, 11: 540-543, June, 1911.
27. Wood, Ernest R., “Tests in Efficiency in Arithmetic,” Elementary School Journal, 17: 446-453, February, 1917.
28. Zirbes, Laura, “Diagnostic Measurement as a Basis for Procedure,” Elementary School Journal, 18: 505, March, 1918.

APPENDIX

Data for Estimating the Degree of Difficulty Required to Produce 20 Per Cent of Wrong or Omitted Responses When a Given Step of the Scale Produces from 8 Per Cent to 40 Per Cent of Such. They are Used in Connection with the Thorndike Reading Tests for the Determination of Pupils’ Scores.

In each table the row gives the whole-number part of the given percentage and the column gives the tenths; thus the entry for 10.4 per cent stands in row 10 under column .4.

Amounts to ADD (given percentage 8.0 to 19.9)

Per cent   .0   .1   .2   .3   .4   .5   .6   .7   .8   .9
    8     .84  .83  .82  .81  .80  .79  .78  .77  .76  .75
    9     .74  .73  .72  .71  .71  .70  .69  .68  .67  .66
   10     .65  .64  .63  .62  .62  .61  .60  .60  .59  .58
   11     .57  .57  .56  .55  .54  .53  .52  .52  .51  .51
   12     .49  .49  .48  .48  .47  .46  .45  .45  .44  .43
   13     .42  .42  .41  .40  .39  .39  .38  .37  .37  .36
   14     .36  .35  .35  .34  .33  .33  .32  .31  .30  .30
   15     .29  .28  .27  .27  .26  .26  .25  .24  .24  .23
   16     .23  .22  .21  .21  .20  .20  .19  .18  .18  .17
   17     .17  .16  .15  .15  .14  .14  .13  .12  .12  .11
   18     .11  .10  .10  .09  .09  .08  .08  .07  .07  .06
   19     .05  .05  .04  .03  .03  .02  .02  .01  .01  .00

Amounts to SUBTRACT (given percentage 20.0 to 39.9)

Per cent   .0   .1   .2   .3   .4   .5   .6   .7   .8   .9
   20     .00  .00  .01  .01  .02  .03  .03  .04  .04  .05
   21     .05  .06  .06  .07  .07  .08  .08  .09  .09  .10
   22     .10  .11  .11  .12  .12  .13  .13  .14  .14  .15
   23     .15  .16  .16  .17  .17  .18  .18  .18  .19  .19
   24     .20  .21  .21  .22  .22  .23  .23  .24  .24  .24
   25     .25  .26  .26  .27  .27  .27  .28  .28  .29  .29
   26     .30  .30  .30  .31  .31  .32  .32  .33  .33  .33
   27     .34  .35  .35  .35  .36  .36  .37  .37  .38  .38
   28     .39  .39  .40  .40  .40  .41  .41  .42  .42  .42
   29     .43  .43  .44  .44  .45  .45  .46  .46  .47  .47
   30     .47  .48  .48  .49  .49  .49  .50  .50  .51  .51
   31     .51  .52  .52  .53  .53  .54  .54  .54  .55  .55
   32     .56  .56  .57  .57  .57  .58  .58  .58  .59  .59
   33     .60  .60  .61  .61  .61  .62  .62  .63  .63  .63
   34     .64  .64  .65  .65  .66  .66  .66  .67  .67  .67
   35     .68  .68  .69  .69  .70  .70  .70  .71  .71  .72
   36     .72  .73  .73  .73  .74  .74  .75  .75  .75  .76
   37     .76  .77  .77  .77  .78  .78  .78  .79  .79  .80
   38     .80  .80  .81  .81  .81  .82  .82  .83  .83  .83
   39     .84  .84  .85  .85  .85  .86  .86  .86  .87  .87

INDEX

Algebra, tests in, 214-220.
Ancient history scale, 225-226.
Arithmetic, measurement of, 58; Courtis Tests in, series B, 59-67; nature of Courtis examples, 61; directions for giving Courtis Tests, 61-63; scoring Courtis test papers, 63-67; standard Courtis scores, 67; remedial instruction in, 71-74, 85, 88-90; reasoning tests, 75-78; Woody scales, 78-91; Boston research tests in fractions, 91-96; Cleveland survey tests, 96-101; Kansas diagnostic tests in, 101-105; the teacher’s problem in, 105-106; the next step, 106; bibliography, 107-109.
Average, 261.
Ayres, Leonard P., work on spelling vocabulary, 6; spelling scale, 6-7, 9; writing scale, 28-35.
Ballou, Frank W., referred to and quoted, 276.
Bibliography, on spelling, 22; on handwriting, 56-57; on arithmetic, 107-109; on reading, 154-155; on English composition, 179-180; on drawing, 191; on history, 209-210; on geography, 210; on language, 210-211; on music, 211; on high school tests, 226-228; on measurement of general intelligence, 252-253; on statistical methods, 262; on teachers’ use of standard tests, 277-278.
Binet-Simon test, old form, 230; Stanford revision of, 250-252; comparison of results, 251; use of Binet-Simon test and group tests, 250.
Blewett, Ben, referred to, 2, 4.
Bobbitt, J. F., referred to, 1.
Boston, use of Ayres spelling scale in, 10; spelling list, 16; research tests in fractions, 91-96; copying test, 179; tests in geography, 204-207.
Breed and Frostic scale for measuring the general merit of English composition, 178.
Brown’s silent reading tests, 151; standard scores, 151.
Buckingham, B. R., extension of Ayres spelling scale, 10; spelling scale, 11-13.
Butte survey, referred to, 18, 48, 54.
Childs, H. G., use of Thorndike drawing scale, 189.
Cleveland survey, 8, 48, 50, 96-101, 269.
Cody, Sherwin, commercial tests, 224.
Commercial tests, 224.
Composition, measurement of, 156-180; Nassau County scale, 157-161; scoring results, 161-162; using results, 162-166; Willing scale, 166-172; the teacher’s own scale, 172; Gary scale, 172-178; Thorndike’s extension of Hillegas scale, 178; Harvard-Newton scale, 178; Breed and Frostic scale, 178; Starch’s punctuation scale, 179; Boston copying test, 179; bibliography, 179-180.
Connersville course of study in elementary mathematics, referred to, 71.
Correlation, 262.
Counts, George S., referred to, 101.
Courtis, S. A., referred to, 58; tests in arithmetic, 58-70; Silent Reading test No. 2, 126-133; aim, 126; description of, 127; giving test, 127; scoring results and computing scores, 127; class record sheet, 128; interpreting and using results, 130; standard scores, 130; remedial measures, 132.
Deviation, 262.
Diagnostic tests in mathematics, 221-222.
Distribution, 260.
Drawing, how now measured, 181; Thorndike scale in, 181-186; requirements of a scale, 183-184; how Thorndike scale was derived, 184-186; limitations of Thorndike scale, 186-188; grade standards, 188-190; using the scale, 190-191; bibliography, 191.
Examinations, uniform, 267.
Fordyce scale for measuring the achievement of reading, 149; standard scores in, 150.
Freeman, Frank W., referred to, 1; standards in handwriting, 39, 40; analytical charts in handwriting, 43-44.
Geography, Starch test in, 202; Hahn-Lackey scale in, 202; Boston tests in, 204.
Geometry, tests in, 220-221.
Grading, present systems, 263-266; grading same paper, 266-267; using standard tests, 268-274.
Grammar, tests in, 209.
Grand Rapids survey, 97.
Gray, C. Truman, score card for handwriting, 45.
Gray, William S., oral reading test, 133-143.
Haggerty, M. E., intelligence examinations: Delta 1 and Delta 2, 245-249.
Haggerty and Noonan achievement examination in reading: Sigma 1, 153; standard scores, 153.
Hahn-Lackey geography scale, 202.
Handwriting, scale for, referred to, 1; measurement of, 23; first scale in, 24; Ayres scale, 24, 28-35; what to measure in, 24; giving the test in, 25-27; scoring for speed, 27; scoring for quality, 36; recording the scores, 37-38; standard scores, 38; standards for speed, 38-39; standards for quality, 40; social standard in, 41-42; remedial instruction, 42-47; Gray’s score card, 45; proportion of children at standard quality, 48-50, 52-53, at standard speed, 49, 51; the Thorndike scale, 53-55; comparative scores, 55; Lister-Meyers scale, 56; bibliography, 56-57.
Harvard-Newton scale for measuring English composition, 178.
Henmon, V. A. C., Latin tests, 222-223.
High School subjects, tests in, 212-228.
Hillegas scale for measuring the quality of English composition, 156.
History, measurement of, 192-202; Bell and McCullum test in, 193-195; Van Wagenen history scales, 196-201; diagnostic tests in, 201-202; bibliography, 209-210.
Intelligence, general, measurement of, 229-234; differences among children, 233; Trabue language scales, 235-241; Otis group scale, 241-245; Haggerty’s intelligence examinations, 245-249; Whipple group tests, 249; Binet-Simon test, Stanford revision, 250-252; bibliography, 252-253.
Iowa, spelling scale, 11; elimination reports, 41.
Iowa elimination reports, referred to, 41.
Jones, W. Franklin, determines spelling vocabulary, 5; hundred spelling demons, 16-18.
Kansas silent reading test, 149; standard scores, 149.
Kelley, Truman L., history tests, 201.
Language, Starch tests in, 207-208; Charters, test in, 208; Trabue, scales in, 235-241.
Latin, tests in, 222-223.
Median, 261.
Middle fifty per cent, 261.
Mode, 261.
Monroe, Walter S., standardized silent reading test, 111-117; aim, 111; description of, 112; giving test, 114; interpreting and using results, 114; class record sheet, 115; standard scores, 116; scoring results, 114; remedial measures, 117; algebra tests, 215.
Music, Seashore tests in, 209.
Otis group intelligence scale, 241-245; aim, 242; description of, 242; giving test, 242; scoring results, 243; interpreting and using results, 244.
Physics, tests in, 223.
Quartile, 261.
Reading, measurement of, 110-155; factors in, 111; Monroe’s silent reading tests, 111-117; remedial measures, 117-118; Thorndike’s scale for sentences and paragraphs, 119-126; Courtis silent reading test, 126-133; oral reading, 133; Gray’s oral reading test, 133-143; Haggerty’s visual vocabulary test, 143-148; Kansas silent reading test, 149; Fordyce scale, 149; Brown’s silent reading tests, 151; Thorndike’s scale for word knowledge, 152; achievement examination in, 153-154; bibliography, 154-155.
Rice, J. M., referred to, 1, 20; spelling test, 13-16.
Rogers, Anna L., diagnostic tests in mathematics, 221-222.
Rugg and Clark, algebra test, 215-220.
Sackett, L. W., ancient history scale, 225-226.
Salt Lake City survey, 52, 53.
Scales, stages of development, 3; uses of, 19-21; standard, 271-274.
Seashore, C. E., music tests, 209.
Silent reading test, 152.
Smith, William Hawley, referred to, 20.
Spelling, vocabulary for scale, 5-6; Ayres scale, 6-10; giving a test, 7; scoring the papers, 8; distributing scores, 9; Buckingham’s extension of Ayres scale, 10; Iowa spelling scale, 11; Buckingham scale, 11-13; Starch test, 15-16; Boston minimum list, 16; Jones’ demons, 16-18; pupil’s list of misspelled words, 18; uses of spelling scale, 19-21; methods of teaching, 21; bibliography, 22.
Springfield (Ill.) survey, 48.
Starch, Daniel, spelling test, 15-16; standards in handwriting, 39, 40; punctuation scale, 179; geography test, 202; language tests, 207-208; physics test, 223; quoted, 266.
Statistical methods, securing comparable results, 254-255; using a standardized test, 255-259; selecting the test, 256; scoring the papers, 257; tabulating results, 257; statistical terms, 259-262; standard tests, 268-274; applied to human product, 275-276.
Stockard and Bell, geometry tests, 220-221.
Stone, C. W., reasoning tests in arithmetic, 75-78.
Surveys. See Butte, Cleveland, Grand Rapids, Salt Lake City, Springfield.
Table of frequency, 260.
Teachers’ composition scale, 172-178; Gary composition scale as plan, 172; need of, 172; value to teachers, 178.
Thorndike, E. L., referred to, 1, 24, 260; handwriting scale, 24, 53-55; reading scales, 110-126, 152; extension of Hillegas composition scale, 178; drawing scale, 181-186.
Trabue, M. R., Nassau County supplement to the Hillegas scale, 157-166; aim, 157; description of, 157; applying scale, 161; scoring results, 161; interpreting and using results, 162; class record sheet, 162.
Visual vocabulary test, Part I, 143-148; aim, 143; description of, 143; giving test, 144; scoring results, 144; interpreting results, 145; using results, 146; corrective measures, 148.
Visual vocabulary test, Part II, 151.
Whipple group tests for grammar grades, 249.
Willing scale for measuring written composition, 166-172; aim, 166; description and application of, 166; scoring results, 166; class record sheet, 167; interpreting and using results, 168; class scores, 168; scoring errors for general merit, 169.
Withers, John W., referred to, 2.
Woody, Clifford, arithmetic scales, 78-91.
Zirbes, Laura, referred to, 270.

Printed in the United States of America.
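
A note on the Appendix table. The text does not say how the amounts to add or subtract were computed. The sketch below rests on an assumption made here, not on anything stated by the authors: that difficulty is distributed normally and is expressed in units of the probable error (about 0.6745 of a standard deviation). On that assumption the amount is the distance between the point cutting off the given percentage and the point cutting off 20 per cent. The function name `adjustment` is invented for illustration. A positive result is an amount to add and a negative result an amount to subtract.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal curve: mean 0, standard deviation 1

# One probable error, in standard-deviation units (the 75th percentile point).
PROBABLE_ERROR = _STD.inv_cdf(0.75)


def adjustment(percent_wrong, target_percent=20.0):
    """Distance, in probable-error units, from the given percentage of wrong
    responses to the target percentage. Positive: add. Negative: subtract.
    ASSUMPTION: a normal distribution of difficulty; the source does not say so."""
    z_target = _STD.inv_cdf(target_percent / 100)
    z_given = _STD.inv_cdf(percent_wrong / 100)
    return (z_target - z_given) / PROBABLE_ERROR


for pct in (8.0, 15.0, 20.0, 30.0, 39.9):
    print(pct, round(adjustment(pct), 2))
```

For the entries checked in preparing this note (among them 8.0, 15.0, 20.0, 30.0, and 39.9 per cent) the computed amounts fall within about .01 of the printed ones. This suggests, but does not prove, that the table was built on some such basis. The printed table remains the authority for scoring.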