Proposal to use the SUMEX-AIM Resource for Computer Simulation of Language Acquisition

John R. Anderson
Human Performance Center
University of Michigan
Ann Arbor, Michigan

The purpose of this research is to understand language acquisition. There has been a great deal of research on first language acquisition in children, second language learning by adults, and learning of artificial languages by laboratory subjects. The principal goal of this research is not getting more experimental evidence. Rather, it is to develop a working computer simulation model that can learn natural languages. The model would attempt to explain the already available set of experimental facts. It is also hoped that such a model would be a contribution to the artificial intelligence goal of developing language understanding systems. Some of the detailed plans of the research are described in the accompanying grant proposal that was awarded by NIMH (grant number 1 RO1 MH26383-01). The period of this award is May 1, 1975 to May 1, 1977.

That proposal states an intention to use Augmented Transition Networks as the basic grammatical formalism. I have already completed some initial learning programs using the augmented transition network formalism. The very earliest of this work is described in the NIMH proposal. More recently I have decided to try to develop a production system formalism as an alternative to the augmented transition network. There are three main reasons for this switch in representational formalism. First, I think it is easier to represent the grammatical knowledge contained in highly inflected languages (e.g., Finnish, Latin) by production systems than by augmented transition networks. Second, I think it is easier to represent human information processing limitations in terms of production systems. Third, I think production systems serve as a means of representing non-linguistic procedures such as inference-making. Therefore, a theory of induction of production systems for language has the promise of generalizing to the induction of other human cognitive skills.

I have been using the SUMEX facility in a pilot project this summer. I have been bringing up a version of my production system, called ACT, on this facility. It is hoped that in a few months this program will be in a sufficiently developed form that other SUMEX users may use the production system. ACT uses an associative network representation as its basic data base. This is a variant of the HAM propositional network that I developed earlier; it is described in the accompanying proposal (pp. 23-27). In the ACT system various portions of the network are active at any point in time. The productions look for patterns of activation in the network. If these patterns exist, the productions are executed, causing external actions to be taken, building network structure, and possibly changing the state of activation of the network. Activation spreads associatively through the network, and there is also a dampening process which deactivates network structure. A preliminary description of the ACT system is given in the accompanying document "An Overview of ACT." It is a chapter from a forthcoming book. The most relevant section in that chapter is from pages 11 to 25.

It was originally projected that this simulation work would be performed on the Michigan Computer System. However, there are a number of advantages to the SUMEX-AIM facility. All the programming will occur in LISP. The INTERLISP system on SUMEX, as surmised from my own experimentation, permits programming and debugging to progress at least twice as fast as with Michigan LISP. Also, programs in INTERLISP would be more available to other A.I. users than programs in Michigan LISP. The Michigan computer is isolated from the national A.I. community, whereas I can take advantage of the connections SUMEX-AIM has through the TYMNET and the ARPANET.
Finally, the SUMEX-AIM facility provides free computing resources and so will relieve some of the strain on my tight research budget.

It is intended that there will be continued development and testing of this production system formalism as a model of human information processing. There are plans to build substantial ACT production system models for language generation and understanding, and for inference making.

Responses to SUMEX-AIM Questionnaire

Read the accompanying proposal.

The research is currently supported by a grant from NIMH (grant number 1 RO1 MH26383-01) for the period May 1, 1975 to May 1, 1977. The amount of the award for the first year is $20,000. This is to pay for a programmer, computer time, and rental of a terminal.

Read the accompanying proposal. It is expected that this research will have some general contribution to make to development of language understanding systems, modeling human cognitive processes, and development of production systems.

None.

There should be no difficulty in making my programs generally available to users of SUMEX-AIM.

Yes.

Yes.

Read the next-to-last paragraph in the accompanying proposal. The INTERLISP language on SUMEX is the principal requirement of my research. I do not anticipate requiring any additional systems programs not already available at SUMEX.

Estimated requirements per month: 100 connect hours, 2 CPU hours, 1500 file pages. The principal times of use in Ann Arbor would probably be 0600-0900 and 1800-2100.

I intend to communicate with SUMEX via the TYMNET. I would either use the private node in Ann Arbor or the public node in Detroit. The toll cost to Detroit could be met from my current grant, as could the cost of terminal rental.

Not really relevant.

Biographical Sketch

Name: John R. Anderson, Junior Fellow
Date of birth: Aug. 27, 1947
Place of birth: Vancouver, B.C., Canada
Nationality: Canadian (exchange visitor's visa)

Education:
University of British Columbia, Vancouver, B.C. -- B.A., 1968, Psychology
Stanford University, Stanford, California -- Ph.D., 1972, Psychology

Honors: 1968 -- The Governor-General's Gold Medal (head of graduating class in Arts and Sciences, University of British Columbia)

Major research interest: Language and Human Memory
Role in proposed project: Principal Investigator

Research support: NSF grant GB-40298, "Recognition memory for sentences: a process model," Sept. 1, 1973 - Sept. 1, 1975; $40,000 ($20,265 for year 1); 50% of research effort.

Research and Professional Experience:
Junior Fellow, University of Michigan, 1973 - present
Assistant Professor, Yale University, 1972 - 1973
Numerous experiments in human memory in graduate school under the supervision of Gordon H. Bower at Stanford University, 1968 - 1972

Publications:
Reber, A. S. and Anderson, J. R. The perception of clicks in linguistic and non-linguistic messages. Perception & Psychophysics, 1970, 8, 81-89.
Anderson, J. R. and Bower, G. H. On an associative trace for sentence memory. Journal of Verbal Learning and Verbal Behavior, 1971, 10, 673-680.
Anderson, J. R. FRAN: A simulation model of free recall. In G. H. Bower (Ed.), The Psychology of Learning and Motivation, Vol. 5. New York: Academic Press, 1972.
Anderson, J. R. and Bower, G. H. Recognition and retrieval processes in free recall. Psychological Review, 1972, 79, 97-123.
Anderson, J. R. A stochastic model of sentence memory. Doctoral dissertation, Stanford University, June, 1972.
Anderson, J. R. and Bower, G. H. Configural properties in sentence memory. Journal of Verbal Learning and Verbal Behavior, 1972, 11, 594-605.
Anderson, J. R. and Bower, G. H. Human Associative Memory. Washington: Winston & Sons, 1973.
Reder, L. M., Anderson, J. R., & Bjork, R. A. A semantic interpretation of encoding specificity. Journal of Experimental Psychology, 1974, 102, 648-656.
Anderson, J. R. Verbatim and propositional representation of sentences in immediate and long-term memory. Journal of Verbal Learning and Verbal Behavior, in press.
Anderson, J. R. and Bower, G. H. A propositional theory of recognition memory. Memory & Cognition, in press.
Anderson, J. R. and Bower, G. H. Interference in memory for multiple contexts. Memory & Cognition, in press.
Anderson, J. R. Retrieval of propositional information from long-term memory. Cognitive Psychology, in press.
Anderson, J. R. and Hastie, R. Individuation and reference in memory: proper names and definite descriptions. Cognitive Psychology, in press.
Anderson, J. R. Computer simulation of a language-acquisition system, first report. In R. L. Solso (Ed.), Information Processing and Cognition: The Loyola Symposium, in press.
Anderson, J. R. Language acquisition by computer and child. To appear in S. Y. Sedelow & W. A. Sedelow (Eds.), Current Trends in Computer Use for Language Research, in preparation.

Special Note

I am in the second year of an exchange visitor's visa. I can renew the visa for another year. My wife, an American citizen, is currently petitioning to have my status changed to that of a permanent resident. Therefore, I will be able to be at the University of Michigan for the entire period of the proposed research.

COMPUTER SIMULATION OF LANGUAGE ACQUISITION

A. Introduction
1. Direction and goals of the research

Most simply stated, the purpose of this research is to understand language acquisition. There has been a great deal of research on first language acquisition in children, second language learning by adults, and learning of artificial languages by laboratory subjects. This research is not principally concerned with getting more experimental evidence. Rather, it is concerned with developing an information-processing model that can be used to explain the already available set of experimental facts. One of the principal concerns governing the design of this model is just that it be able to learn a natural language. I will show that this, in itself, is a very significant goal. It turns out that algorithms adequate to learn a natural language are quite complex. It is not possible to sit down and simply specify them verbally or with equations. This research makes use of the computer as a tool to build and test complex models. Therefore, I have been developing a computer simulation model of language acquisition. This model is called LAS (an acronym for Language Acquisition System). Most of the proposed budget is concerned with supporting the development of this program. Input to LAS consists of sentences of the language paired with representations of their meaning. Therefore it simulates language learning in situations where a learner can figure out the meaning of the sentence from context. The simplest case of such a situation is one in which the learner is presented with simple pictures and sentences describing them. From such input LAS constructs a grammar which allows it to go from sentences to their underlying meaning. The grammar can also be used to generate sentences to convey meanings.
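The input regime just described, sentences paired with meaning representations, can be made concrete with a toy sketch. The relation/argument encoding and the co-occurrence tally below are illustrative assumptions, not LAS's actual HAM format or induction algorithm.

```python
from collections import defaultdict

# Hypothetical training pairs of the kind a LAS-like learner receives:
# each sentence arrives with a representation of its meaning.
pairs = [
    ("the square is above the circle", {"above", "square", "circle"}),
    ("the circle is above the triangle", {"above", "circle", "triangle"}),
    ("the square is left-of the triangle", {"left-of", "square", "triangle"}),
]

def cooccurrence(pairs):
    """Count how often each word appears with each meaning symbol;
    consistent pairings hint at word-meaning correspondences."""
    counts = defaultdict(int)
    for sentence, meaning in pairs:
        for word in sentence.split():
            for symbol in meaning:
                counts[(word, symbol)] += 1
    return counts

counts = cooccurrence(pairs)
# "square" co-occurs with the meaning symbol "square" in both
# sentences that contain the word.
assert counts[("square", "square")] == 2
```

Even this trivial tally shows why the paired input matters: the regularities between sentence form and referent structure are the data from which a grammar can be induced.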
It is also hoped that this program will make a contribution to the evolution of computer language understanding systems. Thus, the research really has two purposes, one in psychology and one in artificial intelligence.

I became interested in language acquisition through work with a computer simulation model of human memory described in a book by myself and Gordon Bower entitled Human Associative Memory. The computer program was an attempt to simulate simple question answering. The principal purpose of that research was to develop a memory representation and retrieval system (called HAM) and test it in experiments. A version of HAM is used within LAS. HAM's system included a language understander which was capable of dealing with a considerable subset of English and which was capable of using context to resolve reference. Nevertheless, it was rather inadequate in its capabilities compared to the work of Schank (1973). As a result of my own experiences and of studying the work of others, I became pessimistic about the value of trying to represent the unbounded linguistic competence of a human in terms of a fixed computer program.

Outline of Proposal

The concern in this proposal will be primarily with developing a system logically adequate for language acquisition and only secondarily with a system that simulates actual human performance. I do not think the realistic simulation goal can be pursued until we have a characterization of the sort of system adequate for natural language acquisition. This emphasis on logical adequacy is clear in the organization of the proposal. I will first review the work that has been done on computer language understanding. This is important because LAS is a language understander as well as a learner.
Then I will review the formal results on grammar induction. Then LAS-1 will be described. LAS-1 is a first-pass version of the LAS program adequate to learn simple languages. Then I will propose an extensive set of developments to be added to the program, aimed both at increasing its linguistic powers and making it a realistic simulation. In describing LAS-1 and the proposed extensions, I will review relevant research in the child language literature. Finally, I will propose a series of experiments with artificial languages to check specific claims LAS makes about language learnability.

2. Computer Language Understanding

Computers have been applied to natural language processing for 25 years. There has been a succession of major reconceptualizations of the problem of language understanding, each of which constitutes a clear advance over the previous conceptions. However, any realistic assessment would concede that we are very far from a general language understanding system of human capability. The argument has been advanced that there are fundamental obstacles that will prevent this goal from ever being realized (Dreyfus, 1972). These arguments are shamefully imprecise and lacking in rigor. The best (e.g., Bar-Hillel, 1962) has to do with the extreme open-endedness of language: an effectively unbounded variety of knowledge is relevant to the understanding process. It is boldly asserted, without proof, that it is not possible to provide the computer with the requisite background knowledge.

In reviewing the work on natural language systems, I will constantly measure them with respect to the goal of general language understanding. I appreciate that it is a legitimate artificial intelligence goal to develop a language system for some special purpose application. Such attempts are free from the Dreyfus and Bar-Hillel criticisms.
However, from any psychological point of view these systems are interesting only as they advance our understanding of how language is understood in general.

Machine Translation

The first intensive application of computers to natural language was concerned with translation. This massive effort turned out to be a dismal failure (ALPAC, 1966). Today, it is fashionable to attribute the failure to the then-current impoverished conception of language (e.g., Simmons, 1970; Wilks, 1973). The early attempts took the form of substitution of equivalent words across languages. This was augmented by use of surface structure and word associations, but at no point was the word abandoned as the principal unit of meaning. Recent work on language understanding (e.g., Schank, 1972; Winograd, 1973) has abandoned the word as the unit of meaning. It remains to be seen whether current attempts (e.g., Wilks, 1973) at machine translation have better success.

Interactive Systems

The now popular task domain for applications of computers to language is in constructing systems that can interact with the user in his own language. Question-answering systems are the most common: the user can interrogate the program about the knowledge in its data base and input new knowledge. The question is parsed into an internal representation which is used to guide an interrogation of the data base for the answer. The inferential capacity of the system is critical in the answering of questions, since many answers will not be directly stored but will have to be inferred from what is in memory.

Both parsing and inferencing run into time problems. The central time problem in parsing has to do with the extreme syntactic and lexical ambiguity of natural language. Each word in a sentence admits of m syntactic and semantic interpretations, where m on the average may be as high as 10. If there are n words, m^n interpretations must be considered although only one is intended.
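A quick computation makes the m^n estimate concrete; the word counts chosen below are arbitrary illustrations.

```python
# With m candidate readings per word and n words in the sentence,
# a brute-force parser faces m ** n interpretation combinations.
def interpretations(m, n):
    return m ** n

# Using the text's average of m = 10 readings per word:
for n in (5, 10, 20):
    print(f"{n:2d} words -> {interpretations(10, n)} combinations")
```

Even a five-word sentence yields 100,000 combinations under this estimate, which is why exponential ambiguity, rather than raw machine speed, is presented as the central obstacle.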
The fact that language is so ambiguous was a surprising discovery of the early machine attempts at parsing (e.g., Kuno, 1965). Thus, there is exponential growth in processing time with sentence length. To date, no heuristics have been demonstrated that change, in general, this exponential function of sentence length to something closer to a linear function. The human can use general context to reduce ambiguity to something approximating the linear relation.

There is also an exponential growth factor in the task of inference-making. Suppose there are m facts in the data base and the desired deduction is n steps long. Then there are something like m^n possible combinations to consider in searching for the desired deduction. This suggests that very deep chains of inference are difficult to achieve, and this is certainly true of our every-day reasoning. However, it also suggests that inference making should become more difficult as we know more facts (i.e., high m), which is clearly not the case. The problem facing inference systems is to select only those facts that are relevant.

Resolution theorem-proving (Robinson, 1965) is the most studied of the mechanical inference systems. It is also here that the most careful work has been done on heuristics for selecting facts from the data base. These methods include semantic resolution (Slagle, 1965), lock resolution (Boyer, 1971), and linear resolution (Loveland, 1970; Luckham, 1970). In practical applications these heuristics have served to considerably reduce the growth in computation time. However, demonstrations of the optimality of these heuristics are task-specific; there are no general theorems about their optimality. I suspect that none will in general deal effectively with the problems of exponential growth.

Although there are potentially serious time problems both in parsing and inferencing, these problems have not surfaced in past programs as one might have expected. This is because these programs have all been rather narrowly constrained.
Their language systems only need to deal with a small portion of possible syntactic constructions and possible word meanings. Also, because of restrictions in the domain of discourse, only a restricted set of inferences are needed. Some of the interactive systems (ELIZA, Weizenbaum, 1966; PARRY, Colby) made no serious effort to do a complete job of sentence analysis. Keyword analysis was performed to permit success in narrowly circumscribed domains. Sentences were generated by filling in pre-programmed frames. The ambition in programs like Colby's or Weizenbaum's was only to maintain the appearance of understanding. Weizenbaum's program simulated a Rogerian psychotherapist and Colby's a paranoid patient. Although neither program did much real language understanding, it was difficult to tell this from their dialogue: the odd responses might just be manifestations of their assumed personalities.

Other attempts made more serious efforts at language understanding. They avoided the time problems inherent in parsing and inferencing by dealing with restricted task domains. Slagle's DEDUCOM (1965) dealt with simple set inclusion problems; Green, Wolf, Chomsky & Laughery (1963) with baseball questions; Lindsay (1963) with kinship terms; Kellogg (1968) with data management systems; Woods (1968) with airline schedules; Woods (1973) with lunar geology; Bobrow (1964) and Charniak (1969) with word arithmetic problems; Fikes, Hart & Nilsson (1972) with a robot world; Winograd (1973) with a blocks world. Other systems like Green and Raphael (1968), Coles (1969), Schank (1972), Schwarcz, Burger, and Simmons (1969), Anderson and Bower (1973), Rumelhart, Lindsay and Norman (1972), and Quillian (1969) have not been especially designed for specific task domains but nonetheless succeed only because they worked with seriously limited data bases and restricted classes of English input. Because the parser deals with only certain word senses and certain syntactic structures, linguistic ambiguity is much reduced. Those programs that use general inference procedures like resolution theorem proving are notably inefficient even with restricted data bases.
Winograd made extensive use of the constraints of his task domain, directing inferencing with specific heuristic information. The validity of these heuristics depended critically on those constraints.

Winograd (1973) has combined good task analyses, programming skill, and the powers of advanced programming languages to create the best extant language understanding system. I have heard it seriously claimed that the Winograd system could be extended to become a general model of language understanding. All that is needed, it is said, would be to program in all the knowledge of an adult and extend the parsing rules to the point where they handled all English sentences. Admittedly, this would be a big task requiring hundreds of man-years but, it is argued, no greater than the work that goes into writing big operating systems. Clearly, this argument is faulty if only because it does not deal with the time problems in general inferencing and general parsing. Moreover, it is also unclear whether human language understanding can be captured by a fixed program. Further, it is dubious whether it is manageable to do all the bookkeeping that is necessary to assure that all the specific pieces of knowledge are properly integrated and interact in the intended ways. Our linguistic competence is not a fixed object. This is clear over the period of years, as we learn new grammatical constructions, new styles, new words, and new ways of thinking. I think this is also true over short spans of time. That is, the way humans deal with the time problems inherent in parsing and inferencing is to adjust their parsing and inferencing according to context.

Language Acquisition as the Road to General Language Understanding

The preceding remarks were meant to suggest how an adaptive language system might provide the solution to the fundamental problems in general language understanding.
Rather than defining and hand-programming all the requisite knowledge, we now let the language understanding system discover that knowledge and program itself. The language acquisition system is a mechanized bookkeeping system for integrating all the knowledge required for language understanding. By its very nature it treats linguistic knowledge as a constantly changing object. So we know it would change with a changing linguistic community. We might hope that it could adapt over short periods (like hours) to its current context.

Learning systems are frequently regarded as the universal panacea for all that ails artificial intelligence. Therefore, one should be rightfully suspicious whether LAS will provide a viable route to the creation of a general language understanding system. Certainly, the initial version of LAS falls far short of the desired goal. However, with our current state of knowledge it is just not possible to evaluate LAS's pretensions as an eventual language understanding system. It is only by systematic exploration and development of LAS that we will ever be able to determine the viability of the learning approach.

Whatever the potential of the learning approach in artificial intelligence, clearly it is the only viable psychological means of characterizing human linguistic knowledge. It would be senseless to provide a catalog of all the knowledge used in language understanding. A catalog of everything is a science of nothing (a quote from T. Bever). Rather, we must characterize the mechanism that creates that knowledge and how that mechanism interacts with experience.

The Woods System

The linguistic formalisms used by LAS are very similar to Woods' augmented transition networks. This section on computer language understanding concludes with a description of Woods' system and an exposition of the suitability of his formalisms for the current project.
There are three critical features that LAS requires of the formalisms that will express its grammatical knowledge. First, it should be a formalism that can be used with equal facility for language parsing and language generation. This is because it is unreasonable to assume that a child independently learns how to speak and how to understand. Second, we want a formalism for which it is easy to devise a constructive algorithm for inducing grammar. That is to say, some descriptions of grammatical knowledge are computationally easier to induce than others, even though they describe the same language. Third, we want the formalism to be explicit about the assumptions it makes about the interpretative system that uses the grammar for speaking and understanding. This is because that interpretative system is taken as innate. Thus, it is not possible to induce new programs for interpreting the grammatical rules; it is only possible to induce new grammatical rules.

A guiding consideration in this research is that these requirements on grammatical formulation are satisfied by a finite-state transition network representation. The problem is that natural languages are fundamentally more complex than finite state languages. However, Woods has shown a way to keep some of the advantages of the finite state representation but achieve the power of a transformational grammar. Woods' augmented transition networks are similar to, and were suggested by, the network grammars of Thorne, Bratley, and Dewar (1968) and Bobrow and Fraser (1970). Transition networks are like finite state grammars except that one permits as labels on arcs not only terminal symbols but also names of other networks. Determination of whether the arc should be taken is evaluated by a subroutine call to another network. This sub-network will analyze a sub-phrase of the linguistic string being analyzed by the network that called it. The recursive, context-free aspect of language is captured by one network's ability to call another.
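A recognizer for such a network can be sketched in a few lines. The grammar, lexicon, and state names below are a toy reconstruction in the spirit of Woods' example sentence, not his actual network; note how an arc labeled with a network name triggers a recursive call.

```python
# Toy recursive transition network. Arc labels are either word
# categories or names of sub-networks (here, "NP").
LEXICON = {
    "did": "aux", "the": "det", "red": "adj",
    "barn": "noun", "collapse": "verb",
}

NETWORKS = {
    "S":  {"arcs": [("s0", "aux", "s1"), ("s1", "NP", "s2"),
                    ("s2", "verb", "s3")],
           "start": "s0", "final": {"s3"}},
    "NP": {"arcs": [("n0", "det", "n1"), ("n1", "adj", "n1"),
                    ("n1", "noun", "n2")],
           "start": "n0", "final": {"n2"}},
}

def recognize(net, words, i, state):
    """Return input positions reachable in a final state of `net`
    when scanning starts at position `i` in `state`."""
    results = set()
    if state in NETWORKS[net]["final"]:
        results.add(i)  # may "pop up" to the calling network here
    for frm, label, nxt in NETWORKS[net]["arcs"]:
        if frm != state:
            continue
        if label in NETWORKS:  # arc names a sub-network: recursive call
            for j in recognize(label, words, i, NETWORKS[label]["start"]):
                results |= recognize(net, words, j, nxt)
        elif i < len(words) and LEXICON.get(words[i]) == label:
            results |= recognize(net, words, i + 1, nxt)
    return results

def accepts(sentence):
    words = sentence.split()
    return len(words) in recognize("S", words, 0, NETWORKS["S"]["start"])

assert accepts("did the red barn collapse")
assert not accepts("did the collapse")
```

The NP network here consumes "the red barn" as a sub-phrase and pops up, exactly the subroutine behavior described above; without the augmented conditions and actions, this sketch stays at context-free power.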
Figure 1 provides an example network taken from Woods' (1970) paper. The first network in Figure 1 provides the "mainline" network for analyzing simple sentences. From this mainline network it is possible to call recursively the second network for analysis of noun phrases or the third network for the analysis of prepositional phrases. Woods (1970) describes how the network would recognize an illustrative sentence:

To recognize the sentence "Did the red barn collapse?" the network is started in state S. The first transition is the aux transition to state q2 permitted by the auxiliary "did." From state q2 we see that we can get to state q3 if the next "thing" in the input string is an NP. To ascertain if this is the case, we call the state NP. From state NP we can follow the arc labeled det to state q6 because of the determiner "the." From here, the adjective "red" causes a loop which returns to state q6, and the subsequent noun "barn" causes a transition to state q7. Since state q7 is a final state, it is possible to "pop up" from the NP computation and continue the computation of the top level S beginning in state q3, which is at the end of the NP arc. From q3 the verb "collapse" permits a transition to the state q4, and since this state is final and "collapse" is the last word in the string, the string is accepted as a sentence (pp. 591-592).

Figure 1. A sample transition network. S is the start state. (From Woods, 1970.)

I have illustrated in Figure 1 what is known as a recursive transition network, which is equivalent to a context-free phrase-structure grammar. Woods' networks are in fact of much stronger computational power, essentially that of a Turing Machine. This is because Woods permits arbitrary conditions and actions on the arcs.
This gives the networks the ability of transformational grammars to permute, copy, and delete fragments of a sentence. Thus, with his network formalisms Woods can derive the deep structure of a sentence. The problem with this grammatical representation is that it is too powerful and permits computation of many things that are not part of a speaker's grammatical competence. In the LAS system all conditions and actions on network arcs are taken from a small repertoire of operations possible in the HAM memory system (see Anderson & Bower, 1973). This way some context-sensitive features can be introduced into the language without introducing psychologically unrealistic powers.

In many ways the network formalisms of Woods are isomorphic in their power and behavior to the program grammars of Winograd. However, there is one critical difference. The flow of control is contained in Winograd's program grammars; a particular program is committed to a certain behavior. This is not the case in the network formalism. The flow of control is not contained in the networks; it resides in the interpretative system which uses the grammatical knowledge contained in the networks. Thus, by different interpretative systems the same network grammar specification can be used in different ways. This is critical to LAS's success, where three different interpreters use the same grammatical formalisms to guide understanding, generation, and language induction.

3. Research on Grammar Induction

Apparently the modern work on the problem of grammar induction began with the collaboration of N. Chomsky and G. Miller in 1959 (see Miller, 1967). There have been significant formal results obtained in this field, and it is essential that we review this research before considering LAS.
The approach taken in this field is well characterized by the opening remarks of a recent, highly articulate review chapter by Biermann and Feldman (1972):

The grammatical inference problem can be described as follows: a finite set of symbol strings from some language L and possibly a finite set of strings from the complement of L are known, and a grammar for the language is to be discovered. ... Consider a class C of grammars and a machine M. Suppose some G in C and some I (an information sequence) in I(L(G)) are chosen for presentation to the machine M. ... Intuitively, M identifies G if it eventually guesses only one grammar and that grammar generates exactly L(G). (pp. 31-33)

The significant point to note about this statement is that it is completely abstracted away from the problem of a child trying to learn his language. There has been virtually no concern for algorithms that will efficiently induce the subset of grammars that generate natural languages. The problem is posed in general terms. The concern is with inducing a characterization of the well-formed strings of the language. However, this is not the task which the child faces. The child must learn the mappings between conceptualizations and strings of the language. He must understand what is said to him and learn how to say what he means. A characterization of the well-formed strings comes as a by-product of the mapping between sentences and meanings. Moreover, in the formal work on language induction there has been little concern about the contribution that semantics might make.

The grammatical inference problem as characterized by Biermann and Feldman is without any practical solutions. Workable solutions do not exist because the set of possible languages is too unrestricted. Workable solutions to practical problems are possible only when it is possible to greatly restrict the candidate languages or when important clues exist which restrict the a priori possible languages. Chomsky (1965) argued essentially for this view with respect to the problem of a child learning his first language.
He suggested that the child could take advantage of linguistic universals which greatly restricted the possible languages. I will argue that such universals exist in the form of strong constraints between the structure of a sentence and the semantic structure of the referent. These constraints provide critical cues for the induction problem.

Gold's Work. Probably the most influential paper in the field is by Gold (1967). He provided an explicit criterion for success in a language induction problem and proceeded to formally determine which learner-teacher interactions could achieve that criterion for which languages. Gold considers a language identified in the limit if after some finite time the learner discovers a grammar that generates the strings of the language. He considers two information sequences: in the first the learner is presented with all the sentences of the language, and in the second the learner is presented with all strings, each properly identified as sentence or non-sentence. Then Gold asks this question: Suppose the learner can assume the language comes from some formally characterized class of languages; can he identify in the limit which language it is? Gold considers the classical nesting of language classes: finite cardinality languages, regular (finite state), context-free, context-sensitive, and primitive recursive. His classic result is that if the learner is only given positive information about the language (i.e., the first information sequence), then he can only identify finite cardinality languages. However, given positive and negative information (i.e., the second information sequence), he can learn primitive recursive languages.

The proof that the finite state class is not identifiable with only positive information is deceptively simple. Among the finite state languages are all languages of finite cardinality (i.e., with only finitely many strings).
At every finite point in the information sequence the learner will not know whether the language is generated by one of the infinitely many finite cardinality grammars which include the sample or by an infinite finite state grammar which includes the sample.

It is similarly easy to prove that any language in the primitive recursive class can be induced given positive and negative information. It is possible to enumerate all possible primitive recursive grammars. Assume an algorithm that proceeds through this countably infinite enumeration, one grammar after another, until it finds the correct grammar, staying with a grammar as long as the information sequence agrees with it. An incorrect grammar G will be rejected in finite time by the information sequence: either because the sequence contains a sentence not generated by G, or because a positive instance of G is identified as a non-sentence. Since the correct grammar has some finite position in the enumeration, the algorithm will eventually consider it and stay with it.

There are somewhat better algorithms than the above, but these differ little in character from the algorithm outlined in the second proof. For instance, the position of even a simple grammar is astronomical in an ordering of all possible context-sensitive languages over a modest set of terminal symbols. However, Gold also proved that there is no uniformly more effective technique. That is to say, given any algorithm, one can pick some primitive recursive language for which the enumeration algorithm will be faster.

So, Gold leaves us with two very startling results that we must live with. First, only finite cardinality languages can be induced without use of negative information. This is startling because children get little negative feedback and make little use of what negative feedback they do get (Brown, 1973). Second, no procedure is uniformly better than blind enumeration. This is startling because blind enumeration is hopeless as a practical induction algorithm for natural language. We will see how natural language can be induced despite Gold's results, but first we will review some other research of the same ilk.

One of the early attempts to provide a constructive algorithm was made by Solomonoff. That is, he attempted to define an algorithm which would construct bit by bit the correct grammar rather than enumerating possible grammars. LAS is a constructive algorithm.
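The enumeration learner of the second proof can be sketched concretely. The following is my own toy formulation, not Gold's notation: candidate languages are given as membership predicates, and the learner keeps the first candidate in the enumeration consistent with everything seen so far, switching only when the information sequence contradicts its current guess.

```python
def enumeration_learner(candidates, information_sequence):
    """Identification in the limit by enumeration (toy version)."""
    seen = []
    guess = 0
    guesses = []
    for (string, is_sentence) in information_sequence:
        seen.append((string, is_sentence))
        # advance past every candidate contradicted by the data so far
        while any(candidates[guess](s) != label for (s, label) in seen):
            guess += 1
        guesses.append(guess)
    return guesses

# Candidate class: {a^n : n <= k} for k = 0, 1, 2 -- finite cardinality
# languages, the one class Gold showed is learnable from positive data.
candidates = [lambda s, k=k: s in {"a" * n for n in range(k + 1)}
              for k in range(3)]

# Positive and negative information about the target {"", "a"} (k = 1).
info = [("", True), ("a", True), ("aa", False)]
print(enumeration_learner(candidates, info))   # [0, 1, 1]: settles on index 1
```

Note how the negative datum ("aa", False) is what lets the learner stop at index 1 rather than drifting toward ever-larger candidates, which is the role of negative information in Gold's second information sequence.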
His ideas were never programmed and had their logical flaws exposed by Shamir and Bar-Hillel (1962) and by Horning (1969). In part Solomonoff has served as a straw man used to justify the enumerative approach over the constructive (e.g., Horning, 1969).

Feldman and his students have carried the Gold analyses further. Feldman (1970) provided some further definitions of language identifiability and proved Gold-like results for these. Feldman considered not only the task of inferring a grammar that generated the sample, but also the task of inducing the most simple grammar. Grammar complexity was measured in terms of the number of rules and the complexity of sentence derivations. Horning (1969) provided procedures for inducing grammars whose rules have different probabilities. Biermann (1972) provided a number of efficient constructive algorithms for inducing finite state grammars when the number of states is known. This is a relatively tractable problem first formulated in 1956 by Moore; however, Moore's algorithms are much less efficient than Biermann's.

Pao (1969) formalized an algorithm for finite state grammar induction that did not require the number of states to be known in advance. A sample set of sentences was provided which utilized all the rules in the grammar. A minimal finite state network was constructed that generated exactly the sentences. Then an attempt was made to generalize by merging nodes in the network. The algorithm checked the consequences of potential generalizations by asking the teacher whether particular strings were in the target language. She found that such generalizations could be made much more easily if the sentences were provided with punctuation indicating their surface structure. Crespi-Reghizzi employed similar techniques in the induction of operator-precedence languages, a subclass of the context-free languages.
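A rough sketch in the spirit of Pao's procedure may help; the details below are my own simplification, not Pao's algorithm. It builds a trie-like acceptor that generates exactly the sample, then tries a node merge, keeping it only if a teacher (here a membership oracle) approves every new string the merge introduces.

```python
def build_prefix_tree(samples):
    """Acceptor generating exactly the sample; states are prefixes."""
    trans, accept = {}, set()
    for s in samples:
        for i, ch in enumerate(s):
            trans.setdefault((s[:i], ch), set()).add(s[:i + 1])
        accept.add(s)
    return trans, accept

def generated(trans, accept, max_len):
    """All strings up to max_len produced by the (possibly NFA) network."""
    out, frontier = set(), {("", "")}          # (string built, state)
    while frontier:
        string, state = frontier.pop()
        if state in accept:
            out.add(string)
        if len(string) < max_len:
            for (src, ch), dsts in trans.items():
                if src == state:
                    for d in dsts:
                        frontier.add((string + ch, d))
    return out

def merge(trans, accept, a, b):
    """Reroute state b into state a, returning a new network."""
    ren = lambda s: a if s == b else s
    new_trans = {}
    for (src, ch), dsts in trans.items():
        new_trans.setdefault((ren(src), ch), set()).update(ren(d) for d in dsts)
    return new_trans, {ren(s) for s in accept}

samples = ["", "a", "aa"]
oracle = lambda s: set(s) <= {"a"}             # target language a*
trans, accept = build_prefix_tree(samples)
base = generated(trans, accept, 5)
t2, a2 = merge(trans, accept, "", "a")         # candidate generalization
new = generated(t2, a2, 5) - base
if all(oracle(s) for s in new):                # teacher approves every new string
    trans, accept = t2, a2
print(sorted(generated(trans, accept, 5)))
# ['', 'a', 'aa', 'aaa', 'aaaa', 'aaaaa']
```

The merge creates a loop, so the finite sample generalizes to an infinite language; the teacher query is what keeps such merges from overgeneralizing, which is exactly the role of negative information in Pao's scheme.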
For a special subset of operator precedence languages he was able to define an algorithm that worked with only positive information. Except for the trivial success with finite cardinality languages, this is the only available result with just positive information.

Pao and Crespi-Reghizzi have shown that relatively efficient, constructive algorithms are possible for interesting language classes if the algorithms have access to information about the sentence's surface structure. The problem with their work is that this information is provided in an ad hoc manner. It has the flavor of cheating and certainly is not the way things happen with respect to natural language induction. Still, I think the work of Pao and of Crespi-Reghizzi is promising. I will show how the surface structure of the sentence may be inferred by comparing it to its semantic referent. Crespi-Reghizzi has also shown how the properties of a restricted subclass of languages can be used to reduce the reliance on negative information. While natural languages certainly have aspects that can best be captured with context-sensitive grammatical formalisms, most context-sensitive languages are ridiculous candidates for a natural language. An efficient induction algorithm should not become bogged down, as does Gold's enumeration technique, considering these absurd languages.

Grammar as a Mapping Between Sentence and Conception

There is one sense in which all the preceding work is irrelevant to the task of inducing a natural language. It has as its goal the induction of a correct syntactic characterization of a target language. But this is not what natural language learning is about. In learning a natural language the goal is to learn a mapping that allows us to go from sentences to their corresponding conceptual structures or vice versa. I argue that this task is easier than learning the syntactic structure of a natural language.
This is not because there is any magic power in semantics per se, but because natural languages are so structured that they incorporate in a very non-arbitrary manner the structure of their semantic referents. The importance of semantics has been forcefully brought home to psychologists by a pair of experiments by Moeser and Bregman (1972, 1973) on the induction of artificial languages. They compared language learning in the situation where their subjects only saw well-formed strings of the language versus the situation where they saw well-formed strings plus pictures of the semantic referents of these strings. In either case, the criterion test was for the subject to be able to detect which strings of the language were well-formed, without aid of any referent pictures. After 3000 training trials, subjects in the no-referent condition were at chance in the criterion test whereas subjects in the referent condition were essentially perfect.

The Role of Semantics. Results like those of Moeser and Bregman have left some believing that there is some magic power in having a semantic referent. However, there is no necessary advantage to having a semantic referent. The relationship between a sentence and its semantic referent could, in principle, be an arbitrary recursive relation. Inducing this relation is as difficult as inducing an arbitrary recursive language. This claim is in need of a proof, which I have provided (Anderson, 1975). It is too long to reproduce here, but the gist is that an algorithm to identify an arbitrary semantic relation, given a sequence of sentence-referent pairs, could be used to identify an arbitrary language. By Gold's work, no induction algorithm for the semantic relation can be more effective than the impossible enumeration algorithm for identifying an arbitrary language. Thus, for it to be possible to induce the semantic relation, there must be strong constraints on the possible form of that semantic relation.

How does this semantic referent facilitate grammar induction? There are at least three ways:
First, rules of natural language are not formulated with respect to single words but with respect to word classes like noun or transitive verb which have a common semantic core. So semantics can help determine the word classes. This is much more efficient than learning the syntactic rules for each word separately. Second, semantics is of considerable aid in generalizing rules. A general heuristic employed by LAS is that, if two syntactically similar rules function to create the same semantic structure, then they can be merged into a single rule. Third, there is a non-arbitrary correspondence between the structure of the semantic referent and the structure of the sentence which permits one to punctuate the sentence with surface structure information. The nature of this correspondence will be explained later.

Siklossy's Work. The only attempt to incorporate semantics as a guide to grammar induction was by Siklossy (1971). He attempted to write a program that would be able to learn languages from the language-through-pictures books (e.g., Richards et al., 1961). The books in this series attempt to teach a language by presenting pictures paired with sentences that describe them. Siklossy's program, Zbie, used general pattern-matching techniques to find correspondences between the pictures (actually hand-encoded picture descriptions) and the sentences. The program does use information in the picture encodings to help induce the surface structure of the sentence in the manner of LAS. However, it remains unclear exactly what use it makes of semantics or what kinds of languages the program can learn. The displayed examples of the program's behavior are very sparse with examples of generalizations. As we will see, a program must have strong generalization abilities if it is to learn a language. The few examples of generalization are of the following sort. Suppose Zbie sees the following three sentences:

1) John walks
2) Mary walks
3) John talks

Then Zbie will also accept Mary talks as an acceptable sentence.
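The kind of generalization Siklossy reports can be rendered as a small sketch. This is my own guess at the mechanism (the text notes that Siklossy does not say how Zbie actually works): words that occupy the same position in otherwise-identical sentences are grouped into a class, and any class-member substitution is then accepted.

```python
def induce_classes(sentences):
    """One word class per position, built from the observed sentences."""
    return [set(words) for words in zip(*sentences)]

def accepts(classes, sentence):
    """Accept any sentence whose words all fall in their positional class."""
    return (len(sentence) == len(classes) and
            all(w in c for w, c in zip(sentence, classes)))

seen = [("John", "walks"), ("Mary", "walks"), ("John", "talks")]
classes = induce_classes(seen)
print(accepts(classes, ("Mary", "talks")))   # True: the reported generalization
print(accepts(classes, ("walks", "John")))   # False
```

Note how freely this overgeneralizes: every subject combines with every verb after a handful of examples. Keeping such merges semantically sensible is exactly the problem the LAS heuristics (word classes with a common semantic core) are meant to address.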
Siklossy provides no discussion of how his program's behavior relates to a child learning a language. The one example of an attempt to simulate child language learning is Kelley (1967). His program attempted to simulate the child's progress in producing utterances from one word, to two words, to three words. The program seems to be making use of semantic information, but he never specifies how it contributes to the program's performance. In general the details of the program are not explained. For example, the program never gets to the point of producing adult grammatical sentences, and it is unclear whether it could.

4. Rationale of the LAS Project

The basic assumption in the LAS project is that a language learner can sometimes infer the meaning of sentences and that language learning takes place in such circumstances. The specific goal is to explain how the pairing of a sentence with its semantic referent permits language learning. The tactic is to develop a computer program which can learn a language when provided with sentences paired with their semantic interpretations.

Because of the complexity of the language acquisition process, it is essential that a theory of language acquisition take the form of a computer program. I will return to the need for a computer model after describing the program. The immediate goal is not to provide a faithful simulation of child language acquisition. One might question whether a system constructed just to succeed at language learning will have much in common with the child's acquisition system. I strongly suspect it will, provided we insist that the system have the same information-processing limitations as a child and provided its language learning situation has the same information-processing demands as that of the child. The consideration underlying this optimistic forecast is that learning a natural language imposes very severe and highly unique information-processing demands on any induction system and, consequently, there are very severe limitations on the possible structures for a successful system.
A similar argument has been forcefully advanced by Simon (1969) with respect to the information-processing demands of various problem-solving tasks. This project does, however, have as an ultimate goal a faithful simulation of child language acquisition.

The current version of the program, LAS.1, works in an overly simplified task domain and makes unreasonable assumptions about its information-processing abilities. Nonetheless, it predicts many of the gross features of generalization in child language learning. It is terribly "off" in other aspects. It turns out that many of its failures of simulation can be traced to the unrealistic assumptions it is making about task domain and information-processing abilities. Many of the proposed developments of the program have as their goal the elimination of these unrealistic assumptions. The assumptions were made to make the problem more tractable in a first-pass attempt.

5. The Program LAS.1

This section describes LAS.1, a relatively small program that was put together in eight months. It has achieved success in a non-trivial natural language induction task. Future research will be principally concerned with extending its principal programs: SPEAK, which uses the network formalisms to generate sentences; UNDERSTAND, which uses the same networks for sentence understanding; BRACKET, which punctuates sentences with their surface structure by comparing them to their semantic referents; SPEAKTEST, which builds an initial network grammar to parse a sentence; and GENERALIZE, which generalizes the initial grammar.

LAS is an interactive program written in Michigan LISP (Hafner & Wilcox, 1975). The program accepts as input lists of words, which it treats as sentences, and scene descriptions encoded in a variant of the HAM propositional language (see Anderson & Bower, 1973). It obeys commands to speak, understand, and learn. The logical structure of LAS is illustrated in Figure 2. Central to LAS is an augmented transition network grammar similar to that of Woods (1970).
In response to the command Listen, LAS evokes the program UNDERSTAND. The input to UNDERSTAND is a sentence. LAS uses the information in the network grammar to parse the sentence and obtain a representation of the sentence's meaning. In response to the command Speak, LAS evokes the program SPEAK. SPEAK receives a picture encoding and uses the information in the network grammar to generate a sentence to describe the encoding. Note that LAS uses the same network formalism both to speak and understand. The principal purpose of SPEAK and UNDERSTAND in LAS is to provide a test of the grammars induced by LEARNMORE.

The philosophy behind the LEARNMORE program is to provide LAS with the same information that a child has when he is learning a language through ostension. It is assumed that in this learning mode the adult can both direct the child's attention to what is being described and focus the child on that aspect of the situation which is being described. Thus, LEARNMORE is provided with a sentence, a HAM description of the scene, and an indication of the main proposition in the sentence. It is to produce as output the network grammar that will be used by SPEAK and UNDERSTAND. It is possible that the picture description provides more information than is in the sentence. This presents no obstacle to LAS's heuristics. In this particular version of LAS it is assumed that it already knows the meaning of the content words in the sentence. With this information BRACKET will assign a surface structure to the sentence. SPEAKTEST will determine whether the sentence is handled by the current grammar. If not, additions are made to handle this case. These additions generalize to other cases, so that LAS can understand many more sentences than the ones it was explicitly trained with.
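The control flow just described can be sketched as a skeleton. The real BRACKET and SPEAKTEST routines are far richer than the stand-in stubs below, which are my own and only show the order of operations: bracket the sentence against its referent, test it against the current grammar, and extend the grammar only on failure.

```python
def learnmore(grammar, sentence, scene, main_prop, bracket, speaktest, extend):
    """Skeleton of the LEARNMORE cycle (my reading of the text)."""
    tree = bracket(sentence, scene, main_prop)   # surface structure from referent
    if not speaktest(grammar, tree):             # current grammar can't handle it
        grammar = extend(grammar, tree)          # add rules for this case
    return grammar

# Toy stand-ins: a "grammar" is just the set of bracketed sentences it handles.
bracket = lambda sentence, scene, main: tuple(sentence)
speaktest = lambda grammar, tree: tree in grammar
extend = lambda grammar, tree: grammar | {tree}

g = set()
g = learnmore(g, ["the", "square", "is", "red"], scene={}, main_prop=None,
              bracket=bracket, speaktest=speaktest, extend=extend)
print(("the", "square", "is", "red") in g)   # True: grammar extended to cover it
```

The stubs make no generalizations at all; in LAS the extend step is where SPEAKTEST's additions generalize beyond the training sentence.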
Figure 2. A schematic representation of the major subcomponents of LAS (LEARNMORE, SPEAK, and UNDERSTAND) with their inputs and outputs: sentence, main proposition, and picture encoding in; augmented transition network grammar, HAM conceptualization, and sentence out.

The SPEAKTEST program alone would permit LAS to construct a parsing network adequate to handle all the sentences it was presented with. It would also make many low-level generalizations about phrase structures and word classes. These would permit LAS to successfully analyze or generate many novel sentences. The broader generalization of the grammars performed by GENERALIZE is, however, essential.

The HAM.2 Memory System

LAS.1 uses a version of the HAM memory system (see Anderson & Bower, 1973) called HAM.2. HAM.2 provides LAS with two essential features. First, it provides a representational formalism for propositional knowledge. This is used for representing the comprehension output of UNDERSTAND, the to-be-spoken input to SPEAK, the semantic information in long-term memory, and syntactic information about word classes. Second, HAM.2 contains a memory search algorithm, MATCH1, which is used to evaluate various parsing conditions. For instance, the UNDERSTAND program requires that certain features be true of a word for a parsing rule to apply. These are checked by MATCH1. The same MATCH1 process is used by the SPEAK program to determine whether the action associated with a parsing rule creates part of the to-be-spoken structure. This MATCH1 process is a variant of the one described in Anderson and Bower (1973, Chs. 9 & 12) and its details will not be discussed here.

However, it would be useful to describe here the representational formalisms used by HAM.2. Figure 3 illustrates how the information in the sentence A red square is above the circle would be represented with the HAM.2 network formalisms.
There are four distinct propositions in Figure 3 about two nodes X and Y: X is red, X is a square, X is above Y, and Y is a circle. Each proposition is represented by a distinct tree structure. The tree consists of a root proposition node connected by an S link to a subject node and by a P link to a predicate node. The predicate node may be decomposed into an R link pointing to a relation node and an O link pointing to an object node. The semantics of these representations are to be interpreted in terms of simple set-theoretic notions. The subject is a subset of the predicate. Thus, the individual X is a subset of the red things, the square things, and the things above Y. The individual Y is a subset of the circular things.

One other point needs emphasizing about this representation. There is a distinction made between words and the concepts which they reference. The words are connected to their corresponding ideas by links labelled W. Figure 3 illustrates all the network notation needed in the current implementation of LAS. There are a number of respects in which this representation is simpler than the old HAM representation. There are not the means for representing the situation (time + place) in which such a fact is true or for embedding one proposition within another. Thus, we cannot express in HAM.2 such sentences as Yesterday in my bedroom a red square was above the circle or John believes that a red square is above the circle.

Figure 3. An example of a propositional network representation in HAM.2.

These facilities were omitted because HAM.2 is only concerned with sentences that can be learned through ostension. In learning through ostension, the assumed time and place are the present. Concepts like belief which require embedded propositions are too abstract for ostension. In future research LAS will be extended beyond the ostensive domain. At that point, complications will be required in the HAM representations; however, when starting out on a project it is preferable to keep things as simple as possible.

There are a number of motivations for the associative network representation.
Anderson and Bower (1973) have combined this representation with a number of assumptions about the psychological processes that use it. Predictions derived from the Anderson and Bower model have proven to be generally true of human cognitive performance. However, alternatives to HAM's representation have not been thoroughly examined. One feature that recommends associative network representations for a computer system has to do with the facility with which they can be searched. Another advantage of this representation is particularly relevant to the LAS project. This has to do with the modularity of the representation. Each proposition is coded as a network structure that can be accessed and used independently of other propositions.

So far, I have shown how the HAM.2 representation encodes the episodic information that is the input to SPEAK and the output of UNDERSTAND. It can also encode the semantic and syntactic information required by the parsing system. For instance, HAM.2 would encode the facts that circle and square are both shapes, that red and blue are both colors, and that circle and red belong to the word class *CA but square and blue belong to the word class *CB. Note that the word class information is predicated of the words while the categorical information is predicated of the concepts attached to these words. The categorical information would be used if a syntactic rule applied only to shapes or only to colors. The word class information might be evoked if a language arbitrarily applied one syntactic rule to one word class and another rule to a different word class. Inflections are a common example of syntactic rules which apply to arbitrarily defined word classes.

HAM.2 has a small language of commands which cause various memory links to be built. The following four are all that are currently used:

1. (Ideate X Y) - create a W link from word X to idea Y.
2. (Out-of X Y) - create a proposition node Z; from this root node create an S link to X and a P link to Y.
3.
(Relatify X Y) - create an R link from X to Y.
4. (Objectify X Y) - create an O link from X to Y.

These commands will appear in LAS's parsing networks to create memory structures required in the conditions and actions. Often, rather than memory nodes, variables (denoted X1, X2, etc.) appear in these commands. If the variable has as its value a memory node, that node is used in the structure building. If the variable has no value, a memory node is created and assigned to it, and that node is used in the memory operation. To illustrate the use of these commands, the following is a listing of the commands that would create the structure in Figure 3:
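The listing itself falls outside this excerpt. As an illustration, here is a hedged reconstruction in Python of the four commands and a possible listing for the Figure 3 sentence; the actual implementation is in Michigan LISP and is not shown in the text, and the variable convention (names beginning with "X") and node names below are my own assumptions.

```python
import itertools

links = []                 # memory links as (source, label, target) triples
bindings = {}              # variable bindings, per the text's convention
fresh = itertools.count()  # supply of new memory nodes

def node(x):
    """Resolve a variable to its node, creating a fresh node on first use."""
    if isinstance(x, str) and x.startswith("X"):   # assumed variable convention
        bindings.setdefault(x, f"n{next(fresh)}")
        return bindings[x]
    return x

def ideate(word, idea):      # 1. W link from word to idea
    links.append((node(word), "W", node(idea)))

def out_of(subj, pred):      # 2. new proposition node with S and P links
    z = f"n{next(fresh)}"
    links.append((z, "S", node(subj)))
    links.append((z, "P", node(pred)))

def relatify(x, y):          # 3. R link from predicate to relation
    links.append((node(x), "R", node(y)))

def objectify(x, y):         # 4. O link from predicate to object
    links.append((node(x), "O", node(y)))

# A possible listing for "A red square is above the circle":
# X1 = the square individual, X2 = the circle individual, X3 = a predicate node.
ideate("red", "RED"); ideate("square", "SQUARE")
ideate("above", "ABOVE"); ideate("circle", "CIRCLE")
out_of("X1", "RED")          # X is red
out_of("X1", "SQUARE")       # X is a square
out_of("X1", "X3")           # X is above Y: predicate X3 is decomposed...
relatify("X3", "ABOVE")      # ...into a relation node
objectify("X3", "X2")        # ...and an object node
out_of("X2", "CIRCLE")       # Y is a circle
print(len(links))            # 14 links built
```

Note how the unbound variables X1, X2, and X3 get fresh memory nodes on first mention and are reused thereafter, which is the behavior the paragraph above ascribes to the commands.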