Appendix A: KSL Brochure

managerial posts and conference chairmanships in both the American Association for Artificial Intelligence (AAAI) and the International Joint Conference on Artificial Intelligence (IJCAI). Several KSL faculty and former students have received significant honors. In 1976, Ted Shortliffe received the Association for Computing Machinery Grace Murray Hopper award. In 1977, Doug Lenat was given the IJCAI Computers and Thought award, and in 1978, Ed Feigenbaum received the National Computer Conference Most Outstanding Technical Contribution award. In 1979 and 1981, Ted Shortliffe's book Computer-Based Medical Consultation: MYCIN was identified as the most frequently cited work in the IJCAI proceedings. In 1982, Doug Lenat won the Tioga prize for the best AAAI conference paper, while Mike Genesereth received honorable mention. In 1983, Ted Shortliffe was named a Kaiser Foundation faculty scholar, and Tom Mitchell received the IJCAI Computers and Thought award. In 1984, Ed Feigenbaum was elected a fellow of the American Association for the Advancement of Science (AAAS), and he and Ted Shortliffe were elected fellows of the American College of Medical Informatics (ACMI). Larry Fagan was elected a fellow of ACMI in 1985. In 1986, Ed Feigenbaum was elected to the National Academy of Engineering, and in 1987, Ted Shortliffe was elected to the Institute of Medicine of the National Academy of Sciences. The American Association for Medical Systems and Informatics Young Investigator Award for Research in Medical Knowledge Systems was presented to Glenn Rennels in 1988 and to Mark Musen in 1989.

KSL Research Environment

Funding -- The KSL is supported solely by sponsored research and gift funds. We have had funding from many sources, including DARPA, NIH/NLM, ONR, NSF, NASA, and private foundations and industry. Of these, DARPA and NIH have been the most substantial and long-standing sources of support. All, however, have made complementary contributions to establishing an effective overall research environment that fosters interchanges at the intellectual and software levels and that provides the necessary physical computing resources for our work.

Computing Resources -- Under the Symbolic Systems Resources Group, the KSL develops and operates its own computing resources tailored to the needs of its individual research projects. Current computing resources are a networked mixture of personal workstations, Lisp workstations, and central host computers and network utility servers, reflecting the evolving hardware technology available for AI research. Our central host is currently a Sun 4/280 running Sun Unix 4.0 (this is the core of the national SUMEX biomedical computing resource). It provides a central service for remote network access, electronic mail storage and routing, large-scale file storage, and printer spooling services. Increasingly, computing functions such as electronic mail reading and composition, text processing, and information retrieval are being moved to distributed user workstations. Our Lisp workstations include 34 Texas Instruments Explorers, 2 Symbolics 3600-series machines, 3 Sun 3/75 workstations, and 4 NeXT machines. Much of the routine computing is done with 80 Apple Macintosh II computers, 15 of which have Texas Instruments microExplorer Lisp co-processor boards.
Network printing, file storage, Internet gateway, and terminal interface services are provided by dedicated machines, including a VAX 11/750, a Sun 3/180, and numerous special-purpose microprocessor systems. These facilities are integrated with other computer science resources at Stanford through an extensive Ethernet, and to external resources through the ARPANET, TELENET, and the BARRNet (Bay Area Regional Research Network) link to the NSFNet. Funding for these resources comes principally from DARPA and NIH and from hardware vendor gifts.

Appendix B: Lisp Performance Studies

Performance of Two Common Lisp Programs on Several Systems
(Report KSL 89-02)
by Richard Acuff

Abstract

To assist in the evaluation of Lisp platforms for the Stanford University Knowledge Systems Laboratory, 22 Common Lisp implementations were benchmarked. Run-time and compilation-time data on two moderate-sized application programs are presented, along with data on the effect of compiler optimization levels and on the impact of display I/O on run time. For these Lisp benchmarks, several systems did not rank where we expected them based on speed ratings using other conventional measures. Also, the rankings of machines by Lisp speed differed for the two programs we tested. The data indicate that the performance of Lisp systems is very application dependent. Software environment should play at least as strong a role in machine selection as performance benchmarks.

1. Introduction

At Stanford University's Knowledge Systems Laboratory (KSL), a large amount of software is written in Lisp. Thus, the performance of Lisp systems is often crucial to the productivity of the lab. To assist us in understanding the performance of different Lisp systems, we have undertaken an informal survey of 22 Common Lisp implementations using two software packages developed in the KSL. The main goal of this survey was to understand the execution-speed performance of systems that we might use in the KSL for research and development or for dissemination of research results. Secondary goals were to evaluate the effect of compiler optimizer settings on execution speed and to evaluate the effect of reducing the amount of output on execution speed.

There have been a number of projects to measure the performance of Lisp systems. Gabriel's work [Gabriel 1985] is probably the best known and is the origin of the so-called "Gabriel benchmarks", a set of small test programs for measuring specific aspects of Lisp system performance. The Gabriel benchmarks are extremely valuable for people trying to compare Lisp systems, if used knowledgeably. However, the aspects of a Lisp system stressed by a particular program are often difficult to determine, so it is usually best, where possible, to run that program on the systems in question rather than attempting to dissect the program and forecast its performance analytically. Also, with the advent of numerous implementations of Common Lisp [Steele 1984], we can now use much larger test programs without the bother and uncertainty of porting between dialects.
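To give a sense of the scale of such micro-benchmarks, here is the well-known TAK function from the Gabriel suite, recalled here purely for illustration (consult [Gabriel 1985] for the canonical definitions); the programs used in this report are several orders of magnitude larger:

    ;; TAK: a classic small benchmark that stresses recursion and
    ;; function-call overhead rather than any application-level behavior.
    (defun tak (x y z)
      (if (not (< y x))
          z
          (tak (tak (1- x) y z)
               (tak (1- y) z x)
               (tak (1- z) x y))))

    ;; A typical timed call: (time (tak 18 12 6))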
In this survey we have focused on execution speed, which has long been an important criterion for comparing computer systems. The first comparison of two systems solving the same problem (benchmarking) was probably made shortly after the creation of the second computer, and benchmarking has been a primary differentiator among computer systems ever since. However, execution-speed benchmarks are only one aspect of the performance of systems, especially Lisp systems. Issues like programming and user environments, compatibility with other systems, the ability to handle "large" problems, and cost (hardware, software, and human) must also be considered, and, given a machine that is "fast enough", these other issues will almost always be the overriding factor.

Descriptions of the programs used in this evaluation are given in Section 2. A description of the methodology used in performing the tests is in Section 3, and information about the Lisp systems tested is in Section 4. Data on the execution speed of the test programs are presented in Section 5, followed by compilation-speed data and a comparison between compilation speed and execution speed in Section 6. The effects of choosing various values for the SPEED and SAFETY options of the OPTIMIZE declaration on the BB1 system are discussed in Section 7. The effect of reducing the screen output of the SOAR benchmark is presented in Section 8. Details of the test procedures and descriptions of the systems tested are in the appendices.

2. Test Software

The software systems used in these tests were SOAR [Laird 1987] and the BB1 blackboard core [Hayes-Roth 1985, Hayes-Roth 1988]. These test programs were chosen primarily because they are implemented in pure Common Lisp, making them extremely portable.¹ Both are systems in daily use in the KSL, but they represent two distinct research directions in terms of program function and structure. These systems were initially developed in environments other than those tested, and no attempt was made to optimize their performance for any of these tests. Neither system is an intensive user of numeric computation. A copy of the Common Lisp source code used for these tests may be obtained from the author by sending U.S. Mail to "Richard Acuff, Stanford KSL, 701 Welch Road, Bldg. C, Stanford, CA 94305" or electronic mail to "acuff@SUMEX-AIM.Stanford.EDU".

¹ There were one or two small porting difficulties that were traced to problems in the test code and had to be fixed. For instance, many systems allow (INTERN "NAME" 'USER) where others require (INTERN "NAME" (FIND-PACKAGE "USER")). Also, we were unable to get SOAR to work in either version 1.0 or 1.1 of Allegro Common Lisp for the Mac II due to unexplained software hangs, so it is omitted from SOAR-related charts.

2.1. SOAR

SOAR is a heuristic-search-based general problem-solving architecture developed by Paul Rosenbloom et al. See [Laird 1987] for more information on the SOAR system. All test runs of SOAR were done solving an eight-puzzle problem in one of three modes: Mode A (simply solve the problem), Mode B (solve the problem while "chunking", or "learning"), and Mode C (solve the problem after having "learned" in Mode B). An "eight puzzle" is a common children's game with 8 tiles, numbered 1 to 8, on a 3 by 3 grid such that a tile adjacent to the empty place can be pushed into it. "Solving the eight-puzzle problem" consists of producing a series of tile moves such that, from a given arbitrary starting configuration, the eight puzzle ends up with all the tiles in numerical order, reading from the upper left around the puzzle clockwise, with the empty place in the middle.

The version of SOAR used was 4.4.4, dated April 19, 1987. It consists of 1 large Lisp source file and 2 small SOAR files containing productions for solving the eight-puzzle problem. The Lisp source is 10,661 lines (280,050 characters) of lightly commented code.

2.2. BB1

BB1 is a blackboard-based problem-solving architecture developed by Barbara Hayes-Roth. For more information on the BB1 blackboard core, see [Hayes-Roth 1985]; for further information on BB1, see [Hayes-Roth 1988]. All references to BB1 in this document refer only to the "core" blackboard parts of the system and do not include any other layers of the problem-solving architecture or the user interface, as these components are not in pure Common Lisp. All test runs of BB1 went through three cycles of adding 10 items to the blackboard, accessing those 10 items, and then deleting them.

The version of BB1 used was 1.2. The Lisp source used consists of 10 files ranging from 36 lines (814 characters) to 3,396 lines (107,528 characters) of lightly commented code, with a total of 8,722 lines (295,199 characters) of code.

3. Methodology

All the tests were performed in as near to a "normal" working environment as could be achieved. We tried to duplicate the working conditions that a researcher would likely have, both in hardware and in software. Where possible we selected test machines configured with the amount of memory, amount and type of disk, type of display, etc. that a typical developer would purchase and use. We ran the software in the way a developer using the system would probably run it. Thus, if it was normal to run with garbage collection enabled, under a window system, within an editor, or in a multiprogramming environment, then that was done. For instance, Sun machines were tested under SunView with a couple of perfmeters running. The HP machine was tested while running in GnuEmacs on X.10. MIT-style Lisp machines were run with all networking and other background processing on, and no special process priority. No expert tuning or system configuration was done beyond what the tester could do by reading over the user documentation. All systems were tested in single-user mode, which is the way those tested are normally used for Lisp work. We feel that although this methodology results in less repeatable and less explainable results, it gives a good approximation of what the end user will experience.

Where time allowed, multiple runs were made to ensure accurate readings. Unfortunately, the collection of the raw data (i.e., arranging for machine access and making the timed runs) proved to be an extremely time-consuming process, taking a day or more for some of the systems, so the information in this report was collected over a long period of time (October 1987 to January 1989) and some of the data may be dated by now. The procedures used for running the tests are fully described in Appendix B. The TIME macro was used to collect timing information. Most times were recorded to the nearest second. When reported by the TIME macro, extra information, usually relating to paging, memory management, "kernel" time, etc., was recorded but is not analyzed here.
If several runs were made, only the best number is reported herein for the sake of brevity. Wherever possible, source files were stored on local disks (for the Sun 3/75 systems the files were on a Sun 3/180 NFS server on the same subnet).

4. Systems Under Test

The systems that we tested were chosen based on their availability to the testers as well as their suspected usefulness in future KSL programming efforts. All of the systems tested were workstations, as we were not able to obtain access to mainframe systems. It is also the case that workstations, with their bit-mapped displays and dedicated processors, currently provide the best Lisp development environments, in our opinion, and thus were more interesting to us.

A mnemonic code is used for each of the 22 systems. Usually the code is the model of the machine, except where there is more than one Lisp for a machine (as in the case of the Sun 3/75), in which case a letter is prefixed to indicate the Lisp being used. Table 1 gives a mapping between codes and machine types. See Appendix A for detailed descriptions of the system configurations.

Code      Test Date       System Type
3/260     Summer 1988     Sun 3/260 with Lucid Lisp¹
3/60      Summer 1988     Sun 3/60 with Lucid Lisp
386       Spring 1988     Compaq 386 with Lucid Lisp
386T      Spring 1988     Compaq 386 portable with Lucid Lisp
4/260     Summer 1988     Sun 4/260 with Lucid Lisp
4/280     Winter 1988     Sun 4/280 with Lucid Lisp
DEC-II    Fall 1987       DEC MicroVax II with VaxLisp
DEC-III   Fall 1987       DEC MicroVax III with VaxLisp
E-3/75    Fall 1987       Sun 3/75 with Franz Extended Common Lisp
Exp1      November 1988   Texas Instruments Explorer I
Exp2      November 1988   Texas Instruments Explorer II
Exp2+     November 1988   Texas Instruments Explorer II Plus
F-4/280   January 1989    Sun 4/280 with Franz Allegro Common Lisp
HP        Fall 1987       Hewlett Packard 9000/350
K-3/75    Fall 1987       Sun 3/75 with Kyoto Common Lisp
L-3/75    Summer 1988     Sun 3/75 with Lucid Lisp
Mac2      Spring 1988     Apple Macintosh II with Allegro Common Lisp
Maci      December 1988   Symbolics MacIvory
mX        November 1988   Texas Instruments microExplorer
RT        Spring 1988     IBM RT/APC with Lucid Lisp
Sym       Winter 1988     Symbolics 3645
XCL       Winter 1988     Xerox 1186

Table 1: Mapping between codes and system types

¹ The Lucid and Franz Extended Common Lisp products tested are versions prior to multiprogramming within the Lisp and prior to the inclusion of generation-based scavenging garbage collection in those systems. The Allegro Common Lisp was not tested with multiprogramming enabled.

5. Execution Speed

Most of the tables and charts in this report refer to elapsed times (wall-clock time) in seconds. Most of the tables and charts have the system types ordered according to what seems to be the most interesting comparison: we have attempted to group systems of allegedly comparable performance (according to our perception formed from talking to vendor representatives, talking to other users, reading reports, etc.).

It is worth noting that on almost all of the systems tested, virtual memory paging was a negligible part of the overall run time for the tests, nor was it a very significant factor during compilation. In general, we do not expect this to be true for most production systems; indeed, we would not be surprised if paging time were a major component of overall run time for most such systems.

5.1. BB1

The data for the run times of the BB1 tests² are given in Table 2. Figure 1 shows the data graphically.

² These times are for default settings of the SPEED and SAFETY optimization qualities, discussed in Section 7.
Code      Run Time       Code      Run Time
Exp2         27          RT           75
Exp2+        17          DEC-III      63
4/260        56          Exp1         87
4/280        34          3/60         73
F-4/280      56          L-3/75       90
386          47          E-3/75      211
386T         54          K-3/75       96
mX           33          HP          115
Maci        129          DEC-II      207
Sym         111          XCL         559
3/260        62          Mac2        254

Table 2: Run times for BB1 (sec)

[Figure 1: BB1 run times (sec). XCL has been left out to improve readability.]

Systems that are marketed as comparable generally came out close to each other, with the following notable exceptions:

- There was a significant difference between the 4/280 and the 4/260. Even though the 4/260 had more memory, a similar disk, more tuning effort, and was tried with several later versions of Lisp, it was consistently slower than the 4/280 tested earlier. We are at a loss to explain this discrepancy. It is also worth noting that, except for VaxLisp, Lucid Lisp seemed the most difficult to tailor to a particular machine when it was being installed.

- The DEC machines seem to be poor at running Lisp even though they are usually thought of as competitive when running FORTRAN or C.

- The microExplorer (mX) did better than expected, probably because its weak point, paging, was not stressed by this test.

- The much older Franz Lisp (E-3/75) did relatively poorly compared to Lucid Lisp on the 3/75, but the newer Franz version on the Sun 4 did well relative to the somewhat older Lucid Lisp on the Sun 4.

- XCL was over twice as slow as the nearest competitor.

- For unknown reasons, the Symbolics machines were slower than expected. The MacIvory was a bit over 4 times slower than the microExplorer, and the 3645 was slower than the Explorer I.

5.2. SOAR

The data for the SOAR run tests are given in Table 3 and presented graphically in Figure 2. The figures are for the sum of the A, B, and C modes.¹ Once again, most systems fit where expected, with the following notes:

- The Lucid Sun 4's are somewhat faster than the TI Explorer II for the SOAR test, whereas the opposite was true for the BB1 test.

- XCL and DEC-II were over twice as slow as the nearest other system.

Code      Run Time       Code      Run Time
Exp2         94          RT          177
Exp2+        62          DEC-III     454
4/260        58          Exp1        369
4/280        82          3/60        187
F-4/280     120          L-3/75      278
386         126          E-3/75      484
386T        151          K-3/75      697
mX          154          HP          219
Maci        339          DEC-II     1851
Sym         193          XCL        1519
3/260       154          Mac2       no data (see the note in Section 2)

Table 3: Aggregate run times for SOAR (sec)

¹ The A and C mode figures are for the "no trace" configuration as described in Section 8.

[Figure 2: Sum of SOAR run times (sec). DEC-II, Mac2, and XCL have been left out to improve readability.]

5.3. Normalized Run Times

A given machine, call it A, may have run the SOAR test faster than another machine, B, while B was faster for BB1. Figure 3 depicts this difference. For both BB1 and SOAR, the run times have been normalized by dividing each run time by the average of the run times for all the machines, leaving out DEC-II, Mac2, and XCL to improve readability. Lucid Lisp seemed to perform relatively better with SOAR than with BB1 in all cases, while VaxLisp and, to a much lesser extent, the dedicated Lisp machines seemed to do better with BB1.
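The normalization itself is just division by the mean; a minimal sketch of the computation (NORMALIZE-TIMES is a hypothetical helper, not part of the benchmark code):

    ;; Divide each run time by the average over the machines considered,
    ;; so 1.0 means "average speed", below 1.0 is faster, above 1.0 slower.
    (defun normalize-times (times)
      (let ((average (/ (reduce #'+ times) (length times))))
        (mapcar #'(lambda (x) (float (/ x average))) times)))

    ;; e.g. (normalize-times '(27 17 56 34))  =>  roughly (0.81 0.51 1.67 1.01)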
There are many possible explanations for these variations, but trying to analyze each of them was well beyond the scope of this study. The reasons are most likely differences among implementations in the efficiency of various operations, some of which are used by SOAR but not by BB1 and vice versa. For instance, SOAR might make heavy use of hashing while BB1 makes heavy use of list primitives, or one system might include a large number of SETQ operations while the other might be more applicative in nature. The developers of SOAR and BB1 do not currently have information on the aspects of the Lisp systems stressed by their software.

[Figure 3: Normalized run times (time/average time) for BB1 and SOAR. DEC-II, Mac2, and XCL have been left out to improve readability.]

6. Compilation Speed

Developers and researchers must worry about how fast their programs compile as well as how fast they run. SOAR and BB1 compilation times are given in Table 4 and Figure 4. Figure 5 compares run time with compile time: the ratio of compilation time to run time is shown, so a system with a high rating spends relatively more time compiling than running. The absolute values of these numbers have little meaning; they are only useful for comparing systems.

Code      SOAR    BB1        Code      SOAR    BB1
Exp2       132     89        RT         574     586
Exp2+       78     76        DEC-III    423     633
4/260      307    324        Exp1       520     327
4/280      523    482        3/60       569     551
F-4/280    535    264        L-3/75    1040     919
386        386    355        E-3/75     450     444
386T       479    416        K-3/75    1365    1234
mX         152    186        HP         237     235
Maci       906    950        DEC-II    1227    1774
Sym        252    257        XCL       1800    1927
3/260      687    540        Mac2        --     349

Table 4: Compilation times (sec); no SOAR figure is available for Mac2 (see the note in Section 2)

[Figure 4: Compilation time (sec) for SOAR and BB1.]

[Figure 5: Relative performance of compilers (compile time / run time).]

As one might expect, the specially microprogrammed Lisp machines had relatively fast compilers. Some machines with run times slower than predicted spent relatively less time compiling. For example, the VaxLisp compiler was relatively fast but generated very slow code, while the Lucid compiler seemed to take a long time but generated fast code. The Allegro Common Lisp for the Mac II took little time but still somehow generated impressively fast code for BB1.

7. Effect of OPTIMIZE Settings on BB1

The OPTIMIZE declaration is a way of controlling the behavior of a Common Lisp compiler. Two of the most significant qualities thus controlled are SPEED and SAFETY. Each of these can be set to an integer from 0 to 3.
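For illustration, such settings are written roughly as follows. This is a generic sketch, not taken from the benchmark sources (which, as noted below, contain few declarations); FROB is a hypothetical function.

    ;; One way to set the qualities globally before compiling files:
    (proclaim '(optimize (speed 3) (safety 0)))

    ;; Or scoped to a single function:
    (defun frob (x)
      (declare (optimize (speed 0) (safety 3)))   ; slow but very debuggable
      (car x))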
A high setting for SPEED tells the compiler that fast-running code is desired, which typically enables various optimizations. The Common Lisp specification doesn't require any optimizations, or even that they necessarily be controlled by this setting, but many current implementations switch on optimizers such as dead-code eliminators, tail and mutual recursion eliminators, fancy register allocators, and facilities to take advantage of type declarations. The SAFETY quality is somewhat less well understood. It has little to do with the "safety" of the program, since a correct Common Lisp program is still required to run correctly if SAFETY is low, but it does affect the debuggability of the program. A high SPEED and low SAFETY may allow, for instance, disabling number-of-arguments checking to permit faster function calls on some architectures, or type checking in system functions (such as CAR or SETQ) might be disabled. Kyoto Common Lisp (KCL) goes so far as to "hardwire" function calls, such that if FOO calls BAR and FOO is compiled, then if BAR is later redefined and FOO isn't, FOO will continue to call the old version of BAR, thereby destroying much of the flexibility of the Lisp.

We chose 4 settings of SPEED and SAFETY to study:

1. The default setting that the Lisp system has when it is initialized. This is what most people use.

2. SPEED 3, SAFETY 0 (written (3, 0) below), which should generate the fastest code.

3. SPEED 0, SAFETY 3 (written (0, 3) below), which should generate slow but very debuggable code, since the compiler should have done very few, if any, optimizations.

4. SPEED 3, SAFETY 2 (written (3, 2) below), which should generate optimized code while retaining "sanity checks".

The BB1 system used in these tests has very few declarations and does little numerical work. Both of these attributes seem common among most Common Lisp programs we use. Table 5 and Figure 6 give the results for running BB1 with the four OPTIMIZE settings. Figure 7 shows the compilation times for the various OPTIMIZE settings.

Code      Default   (3,0)   (0,3)   (3,2)
Exp2         27       25      27      25
Exp2+        17       17      18      18
4/260        56       46      47      46
4/280        34       34      48      34
F-4/280      56       56      56      54
386          47       47      52      47
386T         54       54      60      54
mX           33       34      34      30
Maci        129      129     130     130
Sym         111      109     110     111
3/260        62       62      69      62
RT           75       76      77      75
DEC-III      63       60      71      70
Exp1         87       87      90      83
3/60         73       72      76      72
L-3/75       90       90     127      90
E-3/75      211      215     206     206
K-3/75       96      165     147      88
HP          115      113     141     118
DEC-II      207      206     231     236
XCL         559      543     559     556
Mac2        254      258     261     259

Table 5: BB1 run times for various OPTIMIZE settings (sec)

[Figure 6: BB1 runs with various OPTIMIZE settings (sec). XCL has been left out to improve readability.]

[Figure 7: BB1 compilation times with various OPTIMIZE settings (sec).]

These charts reveal somewhat surprising results. In several cases, SPEED 3, SAFETY 0 did not give the best results! Lucid Lisp did consistently better when SPEED was higher than SAFETY, as did the HP 9000 and VaxLisp. KCL was definitely behaving strangely, with SPEED 0, SAFETY 3 coming out
KCL was definitely behaving strangely with SPEED 0, SAFETY 3 coming out E. H. Shortliffe 242 5 P41 RRO0785-16 Appendix B: Lisp Performance Studies a good bit faster than SPEED 3, SAFETY 0, with both of those much slower than "default" or SPEED 3, SAFETY 2. Figure 8 depicts the speedup factor between the slowest time and the fastest time for the BB1 tests with various OPTIMIZE settings. 1.00 1.10 1.20 1.30 1.40 1.50 1.60 1.70 1.80 1.90 2.00 Exp2 Exp2+ 4/260 4/280 F-4/280 386 386T mx Maci sym 3/260 RT DEC-IH Exp! 3/60 L-3/75 E-3/75 K-3/75 HP DEC-11 Mac2 XCL Figure 8: BB! Speedup Factors Due to OPTIMZE Setting: 8. Effect of Output Reduction on SOAR The eight-puzzle benchmark for SOAR was originally written when SOAR ran primarily on slower machines than those tested here. Thus it tends to generate a lot of output relative to the amount of computation for some of the modes. For some systems, particularly those with large bit-mapped displays and full-screen windows, this output can be very expensive. To understand the extent of this effect we tested SOAR in the A mode and in the C mode both with full output, and with greatly reduced output (no trace). Table 6 with Figures 9 and 10 show results of these runs. Figure 11 depicts the amount of speedup (ratio of run times) realized by SOAR with reduced output. 243 E. H. Shortliffe Appendix B: Lisp Performance Studies 5 P41 RR00785-16 Mode A Mode B Code Full Reduced Full Reduced Exp2 33 18 18 16 Exp2+ 23 11 13 11 4/260 23 13 11 9 4/280 35 15 14 11 F-4/280 36 36 20 19 386 41 27 21 19 386T 52 31 26 23 mxX 50 27 29 27 Maci 165 65 63 44 Sym 55 40 34 32 3/260 49 33 23 22 RT 61 36 32 28 DEC-HI 95 76 95 92 Exp1 90 63 75 71 3/60 66 38 34 3l L-3/75 82 67 45 41 E-3/75 124 ~109 81 80 K-3/75 186 136 120 111 HP 61 51 52 52 DEC-II 351 283 390 401 XCL 473 390 243 232 Table 6: SOAR Run Times with Full and Reduced Output 0 60 120 180 | t } Note; DEC-!1, XCL, and Mac2 nave been left off to improve readability Exp2 2 Exp2+ [iRRES 4/260 ee 4/280 | Saeed F-4/280 |e 1 386 [eee Reduced Output 386) (ees MX Serer A FAORDMMIM - sce ote te ets see cel ate ets aseaee Ges ate sta uns ats Sym ee ee ee ee ee 3/260 RT (ee DEC-||| -EREeEs Exp | —-xaeae 3/60 ; K-3/75 FRRRBReESeeSe eee ner eee eee eee eee eee HP Full Output Figure 9: SOAR A Mode (sec) E. H. Shortliffe 244 5 P41 RROO785-16 Exp2 Exp2+ 4/260 4/280 F-4/280 336 eee 386T mx Maci sym 3/260 RT DEC-II Expl 3/60 L-3/75 E-3/75 K-3/75 HP Exp2 Exp2+ 4/260 4/280 F-4/280 386 Fees 386T mx Maci sym 3/260 RT DEC-H Exp! 3/60 L-3/75 E-3/75 K-3/75 HP DEC-H XCL 60 Appendix B: Lisp Performance Studies 70 80 90 100 110 120 4 1. 1. 1 tput T t t T Note: DEC-I, XCL, and Mac2 have deen left off to improve readapilit Figure 10: SOAR C Mode (sec) 2.00 2.20 2.40 2.60 Note: No results for Mac2 (see footnote 1) I I 1 Figure 11; SOAR Speedup Due to Reduced Output 245 E. H. Shortliffe Appendix B: Lisp Performance Studies 5 P41 RROO785-16 Three factors seemed to influence the speedup with reduced output: - A fast processor, since the amount of time spent computing versus doing V/O would be reduced, causing a reduction in I/O time to be more significant. - Alarger screen or window since it is expensive to scroll a large area. - Alarge-overhead I/O system such as the MaclIvory's Dynamic Windows. 9. 
9. Future Work

Obvious areas in which this work might be extended include:

- Updating the results to reflect more recent versions of the Common Lisp systems;
- Adding more test systems, especially mainframes;
- Benchmarking other programs besides SOAR and BB1;
- Evaluating the effect of declarations on run times;
- Adding measurements of storage-management overhead;
- Collecting more data on I/O overhead;
- Understanding better why platforms vary in performance from application to application and from Lisp implementation to Lisp implementation.

10. Conclusions

Two moderate-sized applications, SOAR and BB1, were benchmarked on 22 Common Lisp systems to help in the evaluation of different Common Lisp systems. The run and compile times for these benchmarks were presented and discussed. A large variation was observed between the ranking of systems when running the SOAR test and the ranking when running the BB1 test. This leads us to conclude that while these experimental results, and ones like them, can be used to class machines together roughly, it is impossible to use such a set of benchmarks to decide in advance how a given application will perform on a given system. There is no substitute for actually running the program on the systems in question.

Figure 12 shows the average of the normalized¹ run times for the test programs, with the systems ranked in order. On the basis of these data, the systems tested may be ranked as follows:

Very Fast (< 0.50 anr -- averaged normalized run time): TI Explorer II Plus (Exp2+), TI Explorer II (Exp2), and Sun 4 with Lucid Lisp (4/280 and 4/260)

Fast (> 0.50 anr, < 1.00 anr): TI microExplorer (mX), Compaq 386 (386), Sun 4 with Franz Lisp (F-4/280), Compaq 386 portable (386T), Sun 3/260 (3/260), IBM RT/APC (RT), and Sun 3/60 (3/60)

Medium (> 1.00 anr, < 1.50 anr): Symbolics 3645 (Sym), Sun 3/75 with Lucid Lisp (L-3/75), HP 9000/350 (HP), TI Explorer I (Exp1), and DEC MicroVax III (DEC-III)

Slow (> 1.50 anr, < 2.50 anr): Symbolics MacIvory (Maci), Sun 3/75 with Kyoto Common Lisp (K-3/75), and Sun 3/75 with the older Franz Extended Common Lisp (E-3/75)

Very Slow (> 2.50 anr): Apple Macintosh II with Allegro Lisp (Mac2), DEC MicroVax II (DEC-II), and Xerox 1186 (XCL)

¹ The data were normalized by dividing each result by the average of the results for all the tested implementations.

[Figure 12: Averaged normalized run times. Mac2, XCL, and DEC-II, with scores of 3.36, 5.37, and 6.98, have been left off to improve readability.]

We were surprised at the high speed of the small 386 machines, and at the slowness of the still-early MacIvory, the DEC machines, and the Xerox machine. Dedicated Lisp machines compile relatively faster than conventional machines, and, generally, the conventional-machine systems that took more time to compile produced faster code, as one would expect.

While the experiment to measure the effect of different settings of the OPTIMIZE declaration was interesting, with such a small sample no real conclusion about the effect of various OPTIMIZE settings can be drawn. However, the indications are that, in the absence of other declarations (e.g., for TYPE), only relatively small gains are available.
It is probably best to experiment with various settings to see which gets the best speed for a given program.

Reducing the amount of output that a program generates can have a large effect on the run time of the program, especially when moving the program to a faster machine. This indicates that it is worth taking some time to consider the nature of the I/O system and the interaction needed by a program when designing a user interface for a fast-running program.

These results must be used very carefully, since they represent only one piece of information about the performance of the very complex systems tested. We have measured only execution speed, but many aspects of the software affect the development of programs: in a given amount of time, a program might be written for one machine that runs faster, and perhaps with fewer errors, than a program written in the same amount of time on another machine that ranks faster in these tests, because of the superior support given to the programmer during development. Do not underestimate the power of the programming environment.

11. Acknowledgements

This work would have been impossible without the assistance of many people and companies. Mike Kramer of Texas Instruments Inc. supplied the Explorer II Plus processor board. Eric Warner and Michael Borke of Sun Microsystems Inc. supplied access to the Sun 4 systems and the Sun 3/260 and 3/60 systems. Franz Inc. supplied a test version of Extended Common Lisp. Marty Hollander of Franz Inc. supplied a version of Allegro Common Lisp for the Sun 4. Jeff Harvey of Digital Equipment Corp. arranged access to the MicroVax systems. Susan Rosenbaum and Eric Gilbert of Lucid Inc. supplied access to the Compaq machines and the IBM RT. Bruce Hamilton of Hewlett Packard Inc. arranged access to the HP 9000. Many thanks to all of them.

12. References

[Gabriel 1985] Gabriel, R. P. Performance and Evaluation of Lisp Programs, M.I.T. Press, Cambridge, Massachusetts, 1985.

[Hayes-Roth 1985] Hayes-Roth, B. A Blackboard Architecture for Control, Artificial Intelligence, Volume 26, pp. 251-321, July 1985.

[Hayes-Roth 1988] Hayes-Roth, B., and Hewett, M. BB1: An Implementation of the Blackboard Control Architecture, in Blackboard Systems, edited by Robert Engelmore and Tony Morgan, Addison-Wesley, 1988, pp. 297-313.

[Laird 1987] Laird, J. E., Newell, A., and Rosenbloom, P. S. Soar: An Architecture for General Intelligence, Artificial Intelligence, Volume 33, Number 1, pp. 1-64, 1987.

[Steele 1984] Steele, G. L. Jr. Common Lisp: The Language, Digital Press, 1984.

Appendix A -- System Descriptions

This appendix contains detailed descriptions of the systems used in these measurements. In the descriptions, "Code" refers to a short name used to indicate the system under test. Usually it is the model of the machine, except where there is more than one Lisp for a machine (as in the case of the Sun 3/75), in which case a letter is prefixed to indicate the Lisp being used. "Timing Template" indicates how the information reported by the TIME macro was recorded: "elapsed" indicates the total elapsed time, "run" indicates CPU time used, "gc" indicates time spent in garbage collection, "user" and "system" distinguish between user-mode and kernel-mode time, and "paging" indicates time waiting for virtual memory disk operations.

Code: 3/260
Computer Type: Sun 3/260
Operating System: Sun OS 3.4
Lisp: Lucid 2.0
Disk Configuration: 280MB
Swapping Size: 60MB
Memory Configuration: 8MB
Display Configuration: Color in mono mode
Other Configuration:
Special Comments: used :EXPAND 130 :GROWTH-RATE 130
Timing Template: elapsed (user-run + system-run)
Date-of-test: Summer 1988

Code: 3/60
Computer Type: Sun 3/60
Operating System: Sun OS 3.4
Lisp: Lucid 2.1
Disk Configuration: SCSI 141MB
Swapping Size: unknown
Memory Configuration: 24MB
Display Configuration: Hi Res Color in mono mode
Other Configuration:
Special Comments:
Timing Template: elapsed (user-run + system-run)
Date-of-test: Summer 1988