December 1, 1999
Scripts are data structures that represent typical situations or activities such as eating at a restaurant or attending a birthday party. Scripts were invented in the 1970s as a way of structuring knowledge and reducing search in AI programs (Minsky, 1974; Schank & Abelson, 1977; Wilks, 1975). For example, in a story understanding program, scripts help with pronoun resolution, word sense disambiguation, and filling in missing details (Dyer, 1983). Since the invention of scripts, few have attempted to build a database of scripts. This may be because entering scripts is tedious and requires many choices to be made about the naming of concepts, the level of detail, and what alternatives to include.
We have built a database of about 100 scripts within the ThoughtTreasure commonsense platform (Mueller, 1998). Each script contains information about the events that make up the script, roles played by people and physical objects, entry conditions, results of performing the script, personal goals satisfied by the script, emotions associated with the script, where the script is performed, how long it takes to perform the script, how often the script is performed, and the cost of the script.
The remainder of the paper is organized as follows: First we describe the ThoughtTreasure platform into which scripts have been integrated. We then discuss the representation of scripts in ThoughtTreasure. Next we discuss some problems that arise when entering scripts and strategies for dealing with them. We then describe some simple applications of the script database. Finally we review related databases.
The ThoughtTreasure platform for commonsense reasoning, begun in 1994, contains 27,093 atomic concepts, 35,020 English words and phrases, 21,529 French words and phrases, and 51,305 assertions.
Concepts are organized into a hierarchy. The top-level hierarchy is as follows:
concept
   object
      abstract-object
      physical-object
         animate-object
   situation
      action
      state
   relation
      attribute

Each concept is linked to zero or more roughly synonymous lexical entries (words and phrases):

schnapps = Holland gin, Hollands gin, Hollands, schnapps

Each lexical entry is linked to zero or more concepts:

orange = color-orange, fruit-orange

An assertion is of the form:

[predicate argument1 argument2 ...]

For physical objects, assertions provide information about parts and typical attributes such as size, shape, and color:

[part-of pod-of-peas green-pea]
[diameter-of green-pea .25in]
[green green-pea]
[sphere green-pea]

The arrangement of objects in locations such as hotel rooms, kitchens, and theaters is represented using 2-dimensional grids such as:

==hotel-room1//
wwwwwwwwwwww    b:bed
wbbbbb    mw    d:lockable-door
wbbbbb     w    m:minibar
wx        Zw    w:wall
wwwwwwdddwww    x:phone
                Z:night-table
                .:hotel-room

ThoughtTreasure contains planning agents, or procedures for achieving goals, such as making a telephone call, on behalf of actors in a simulated world. Since planning agents are written in C and can invoke any function in ThoughtTreasure, they are difficult for other programs to analyze and use. Scripts provide a simpler, declarative alternative to planning agents. We have converted the existing planning agents into scripts, and added new scripts.
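The concept hierarchy, lexical links, and assertions described above can be modeled with ordinary dictionaries. The following is a minimal sketch; the data layout and helper functions are our own illustration, not the actual ThoughtTreasure C API:

```python
# Minimal sketch of ThoughtTreasure-style concepts, lexical entries, and
# assertions, using plain dictionaries.  All names are illustrative.

# ako ("a kind of") links: child concept -> parent concept
ako = {
    "physical-object": "object",
    "animate-object": "physical-object",
    "green-pea": "physical-object",   # assumed placement, for illustration
}

# concept -> lexical entries, and lexical entry -> concepts (both many-to-many)
concept_to_lex = {"schnapps": ["Holland gin", "Hollands gin", "Hollands", "schnapps"]}
lex_to_concept = {"orange": ["color-orange", "fruit-orange"]}

# Assertions of the form [predicate argument1 argument2 ...]
assertions = [
    ("part-of", "pod-of-peas", "green-pea"),
    ("diameter-of", "green-pea", ".25in"),
    ("green", "green-pea"),
    ("sphere", "green-pea"),
]

def ancestors(concept):
    """Walk ako links up to the top of the hierarchy."""
    out = []
    while concept in ako:
        concept = ako[concept]
        out.append(concept)
    return out

def facts_about(concept):
    """Return every assertion mentioning the concept as an argument."""
    return [a for a in assertions if concept in a[1:]]

print(ancestors("green-pea"))    # ['physical-object', 'object']
print(facts_about("green-pea"))  # all four green-pea assertions
```

A tuple-per-assertion representation like this makes the relational, database-like character of the knowledge explicit, which is the property the paper contrasts with Cyc's arbitrary rules below.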
Schank and Abelson (1977) originally defined a script as a data structure consisting of the following elements: entry conditions, results, props, roles, tracks, and scenes (sequences of events).
A simple ThoughtTreasure script is as follows:
Object: blackout
[English] power failure, blackout
[French] black out, panne de courant, panne d'électricité
[ako ^ disaster]
[duration-of ^ NUMBER:second:3600]
[emotion-of ^ [anger human]]
[emotion-of ^ [unhappy-surprise human]]
[emotion-of ^ [worry human]]
[event01-of ^ [anger human]]
[event01-of ^ [electronic-device-broken electricity-network]]
[event01-of ^ [unhappy-surprise human]]
[event01-of ^ [worry human]]
[event02-of ^ [fetch-from human na light-source]]
[performed-in ^ apartment]
[performed-in ^ house]
[performed-in ^ office]
[period-of ^ NUMBER:second:3.1536e+07]
[role01-of ^ human]
[role02-of ^ electricity-network]

The script concept is blackout. It is located under disaster in the hierarchy. Lexical entries for the script include power failure in English and panne de courant in French.
The script has two roles: the human experiencing the blackout and the electricity network. The role numbers correspond to positions in an assertion about an instance of a script:
[blackout John electricity-network1]
Emotions associated with the script include anger, unhappy-surprise, and worry for the human. Note that these are emotion concepts, located in the emotion hierarchy and linked to lexical entries not shown above. Anger is a kind of negative-emotion associated with the noun anger and the adjectives angry, pissed, and others. The other concepts shown above (such as electricity-network) are also placed at appropriate points in the hierarchy and associated with lexical entries.
The script consists of two events: (1) the power outage and corresponding emotional reaction and (2) obtaining an alternative lighting source. The script typically occurs in an apartment, house, or office once a year for an hour.
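Because role numbers correspond to argument positions, binding the participants of a script instance to the script's roles is a matter of positional lookup. The following sketch uses an assumed dictionary layout, not ThoughtTreasure's internal format:

```python
# Sketch: bind the arguments of a script instance to the script's numbered
# roles.  The dictionary layout is illustrative, not ThoughtTreasure's own.

blackout_script = {
    "roles": {"role01-of": "human", "role02-of": "electricity-network"},
    "emotions": ["anger", "unhappy-surprise", "worry"],
    "duration_seconds": 3600,
    "period_seconds": 3.1536e7,   # about once a year
}

def bind_roles(script, instance):
    """instance is [script-name arg1 arg2 ...]; role01-of binds arg1, etc."""
    bindings = {}
    for i, arg in enumerate(instance[1:], start=1):
        role_key = "role%02d-of" % i
        bindings[script["roles"][role_key]] = arg
    return bindings

print(bind_roles(blackout_script, ["blackout", "John", "electricity-network1"]))
# {'human': 'John', 'electricity-network': 'electricity-network1'}
```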
In general a script consists of the following fields: events (event01-of, event02-of, ...), roles (role01-of, role02-of, ...), entry conditions, results, goals (goal-of), emotions (emotion-of), locations (performed-in), duration (duration-of), frequency (period-of), and cost (cost-of).
Event numbers express temporal ordering:

[event01-of S A1]
[event01-of S A2]
[event02-of S B1]
[event02-of S B2]

specifies that A1 and A2 occur simultaneously, followed by B1 and B2, which occur simultaneously. Priority and simultaneity are the only temporal relations that can be represented.
The goto predicate can be used to indicate repetition. For example:
[event01-of S A]
[event02-of S B]
[event03-of S [goto event01-of]]

specifies the event sequence A, B, A, B, A, B, ... No mechanism for exiting the loop is currently provided.
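The event-ordering and goto conventions can be simulated by a small expander that walks the numbered events and follows goto links for a bounded number of steps (bounded because, as noted, the formalism provides no exit mechanism). The list-of-pairs encoding below is our own convenience, not ThoughtTreasure's:

```python
# Sketch: expand a script's numbered events into a linear sequence, following
# [goto eventNN-of] links.  Since the formalism has no loop exit, the number
# of emitted events is capped.  The (slot, event) encoding is illustrative.

events = [
    ("event01-of", "A"),
    ("event02-of", "B"),
    ("event03-of", ("goto", "event01-of")),
]

def expand(events, max_steps=7):
    slots = [slot for slot, _ in events]
    out, i = [], 0
    while i < len(events) and len(out) < max_steps:
        slot, ev = events[i]
        if isinstance(ev, tuple) and ev[0] == "goto":
            i = slots.index(ev[1])        # jump back to the named event slot
            continue
        out.append(ev)
        i += 1
    return out

print(expand(events))   # ['A', 'B', 'A', 'B', 'A', 'B', 'A']
```

Simultaneous events (two assertions with the same event number) are not modeled here; the sketch covers only sequencing and repetition.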
We have so far entered 93 scripts. As we entered these scripts we noticed and filled in a number of missing concepts and lexical entries in ThoughtTreasure. We also encountered several difficulties.
How should the enterer deal with variations in scripts such as mailing a letter in a mailbox versus handing it to a postal clerk?
If the variations involve major differences in the sequence of events or other fields of the script such as the goal or result, each variation can be entered as a specialization of a more general script:

mail-letter
   mail-letter-at-mailbox
   mail-letter-at-post-office

Variations that do not make a big difference can be ignored for the time being. Thus the current script for reading a book does not specify whether the reader is reading in bed or in a chair.
If the variations of a script involve a single physical object that can be generalized, the generalization can be entered: When confronted with the choice between pen and pencil, the enterer can code writing-instrument.
More variations become apparent as the enterer focuses on more detail. The enterer can limit the level of detail in a script by invoking other scripts: Instead of entering all the details about how one reaches a mailbox, a reference can be made to a transportation script. The transportation script can then be defined with specialization scripts for different modes of transportation such as walking and driving. The driving script can invoke an action for putting the key in the ignition. At this point, the enterer may consider the level of detail sufficient and move on.
It is not always clear exactly when a script begins and ends, but the enterer must make some choice: At the beginning of the script for teaching a class, should the teacher prepare for the class? Or is that part of another script? At the beginning of the script for walking a dog, should the dog walker put a leash on the dog?
Entering scripts requires many judgment calls by the enterer, and the results will likely contain various errors and confusions. As applications are built, we will learn which imperfections of the database are most in need of correction.
We have implemented two simple web-based applications using the script database, as a precursor to more advanced applications.
The first application answers the following commonsense questions:
What does a ___ do? (human)
What is a ___ used for? (physical object)
Where is a ___ found? (human or physical object)
What does ___ consist of?
What is the result of ___?
Where does one ___?
How long does ___ take?
How often does one ___?
How much does ___ cost?

The second application performs a shallow search from concepts activated by text to scripts, in order to determine the most likely scripts for some English text entered by the user. Given:

John poured shampoo on his hair.

the application produces:

score 2.0 for script take-shower based on shampoo, hair
score 2.0 for script go-for-a-haircut based on shampoo, hair

A more advanced script recognition algorithm would perform a deep search (Leacock & Chodorow, 1998).
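Such shallow script recognition can be approximated by scoring each script on how many of the concepts activated by the input words it mentions. The word-to-concept and script-to-concept tables below are toy stand-ins for ThoughtTreasure's lexicon, and one point per matching concept is our simplification of the scoring:

```python
# Sketch of shallow script recognition: activate concepts from the input
# words, then score each script by the number of activated concepts it
# mentions.  All tables here are tiny illustrative stand-ins.

word_to_concepts = {
    "shampoo": ["shampoo"],
    "hair": ["hair"],
    "poured": ["pour"],
}

script_concepts = {
    "take-shower": {"shampoo", "hair", "soap", "shower-stall"},
    "go-for-a-haircut": {"shampoo", "hair", "scissors", "barber"},
    "mail-letter": {"snail-mail-letter", "post-office"},
}

def recognize(text):
    activated = set()
    for word in text.lower().replace(".", "").split():
        activated.update(word_to_concepts.get(word, []))
    scores = []
    for script, concepts in script_concepts.items():
        hits = activated & concepts
        if hits:
            scores.append((len(hits), script, sorted(hits)))
    return sorted(scores, reverse=True)

for score, script, basis in recognize("John poured shampoo on his hair."):
    print("score %.1f for script %s based on %s" % (score, script, ", ".join(basis)))
```

A deep search would additionally follow links outward from the activated concepts (e.g. from shampoo to its ancestors and parts) before matching against scripts.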
In this section, we compare ThoughtTreasure scripts with related databases: Cyc, FrameNet, Gordon's Expectation Packages, and WordNet.
Cyc (Lenat, 1995) is a commonsense knowledge base begun in 1984. As of December 1999, it contained 89,373 constants, 10,255 nonatomic constants, 1,012,272 assertions, and 306,886 deductions.
Cyc's Events are similar to scripts. Whereas scripts are represented in ThoughtTreasure using a convenient relational-database-like format, Cyc represents events using arbitrary first-order predicate calculus assertions. This leads to considerable variation in the representation of subevents. Sometimes the subevent appears only in the left-hand side of an inference rule:
(=> (and (subEvents ?X ?U)
         (isa ?U Staining))
    (isa ?X WoodRefinishing))

while other times the subevent appears only in the right-hand side:

(=> (and (isa ?U ShapingSomething)
         (subEvents ?U ?X))
    (isa ?X CuttingSomething))

(Cyc assertions and statistics in this paper are based on the Cyc C distribution of April 4, 1997, except where otherwise noted.)
The mapping from the roles of a script to the roles of its subevents is not always provided. When the mapping is provided, several formats are used. For example, the following rule specifies that the person opening the presents at a birthday party is the same person honored by the party:
(=> (and (isa ?OPENING OpeningPresents)
         (subEvents ?PARTY ?OPENING)
         (eventHonors ?PARTY ?HONOR)
         (isa ?PARTY BirthdayParty))
    (performedBy ?OPENING ?HONOR))
The following rule specifies that a dental hygienist performs the teeth cleaning in a dental exam:
(=> (and (isa ?CLE TeethCleaning)
         (performedBy ?CLE ?AGT)
         (subEvents ?EXM ?CLE)
         (isa ?EXM DentalExam))
    (isa ?AGT DentalHygienist))

Information about the roles, places, and duration of a script is represented in a more uniform fashion:
(=> (isa ?U Dancer)
    (actsInCapacity ?U performedBy DancingProcess-Human HobbyCapacity))

(=> (isa ?FM Firefighter)
    (actsInCapacity ?FM performedBy ExtinguishingAFire JobCapacity))

(=> (and (isa ?U ChangingOil)
         (eventOccursAt ?U ?X))
    (isa ?X ServiceStation))

(=> (and (isa ?U ProducingAnAlcoholicBeverage)
         (eventOccursAt ?U ?X))
    (isa ?X Brewery))

(=> (isa ?X WeddingCeremony)
    (duration ?X (HoursDuration 0.5 2)))

(=> (isa ?U ResearchProject)
    (duration ?U (YearsDuration 0.5 10)))
The goal of the three-year FrameNet project (Baker, Fillmore, & Lowe, 1998), begun in 1997, is to build a collection of frames and annotate sentences in a corpus with links to roles of those frames. 20 frames and 28 unique roles have been defined (Johnson, 1998).
The frames, which focus on the areas of perception, cognition, communication, and motion, are:
Arriving, Awareness, Behaver-evaluating, Cogitation, Conversation, Departing, Encoding, Inchoative, Judgment, Noise, Path-shape, Placing, Questioning, Removing, Request, Response, Self-motion, Statement, Static, Transportation
The Transportation frame consists of the following roles:
Role                Example
Driver              Kim drove through the woods
Cargo & Passenger   Kim drove the kids to the store
Vehicle             Kim drove the truck to the store
Source              Kim drove out of the garage
Path                Kim drove down the street
Goal                Kim drove into the woods
Manner              Kim drove dangerously
Distance            Kim drove 500 miles
Area                Kim drove throughout the countryside

(Johnson, 1998)
FrameNet frames lack information about subevents (such as "starting the Vehicle" in the Transportation frame), locations, tracks, entry conditions, and results.
Gordon (1999) developed a database of 768 simplified scripts called Expectation Packages (EPs) to be used as part of a system for browsing photographic libraries. Each EP contains slots for Events, Places (locations and tracks), People (roles), Things (props), and Misc (entry conditions and results). Slot values are taken from the Library of Congress Thesaurus for Graphic Materials (LCTGM), which contains 5,760 subject terms.
A sample EP is:
28. Commuting on a crowded expressway
Events   Automobile driving, Automobile travel, Radio broadcasting, Traffic congestion
Places   Express highways, Toll roads
People   Commuters
Things   Automobiles, Helicopters, Horns (Communication devices)
Misc     Air pollution

(Gordon, 1999, p. 194)
Since slot values must be LCTGM subject terms and not assertions, the EPs are unable to represent arguments to subevents. Thus the fact that the "Commuter" is doing the "Automobile driving" is not captured in the above EP. Because LCTGM terms do not exist for many concepts, EPs often lack important common subevents, props, and roles of a script. EP Events often refer not to subevents, but to generalizations of the script:
Script                                Event
Commuting on a crowded expressway     Automobile driving
Going polka dancing in a dance hall   Folk dancing
Hunting for alligators in a swamp     Alligator hunting
Gordon (1999) writes that "the purpose of Expectation Packages is not to accurately represent scripts, but rather to interconnect existing terms in a thesaurus" (p. 111).
WordNet (Fellbaum, 1998) is a lexical database begun in 1985. Version 1.6 contains 129,509 lexical entries, 99,642 synonym sets, and 140,475 relations (not including inverses). It differs from ThoughtTreasure in several ways. Most notably, scripts are not included in WordNet, since concepts such as eating at a restaurant and going to a birthday party are not lexicalized in English (p. 100).
That said, since some English verbs can be decomposed into subevents, WordNet version 1.6 does break down 427 verb synonym sets into other verb synonym sets via the entailment relation (pp. 77-81). 19 of these are broken down into 2 or more subevents. They are:
arraign: indict charge
board: feed consume
breathe: inhale exhale
buy: pay choose
cast: film stage perform
eat: chew swallow
master: understand drill learn
maul: mutilate injure
postpone: cancel reschedule
push: press move
quilt: pad sew
settle: move migrate
sky_dive: glide jump descend
smoke: inhale exhale
stampede: run rush
sublime: condense evaporate
tease: arouse disappoint
trade: buy sell
wine_and_dine: eat drink
This table summarizes the above databases along with ThoughtTreasure:
Name | Scripts (#) | Subevents (#/script) | Roles (#/script) | Places (#/script) | Other (#/script) |
Cyc | 185 | 1.71 | 0.032 | 0.092 | 15.76 |
FrameNet | 20 | 0 | 4.94 | 0 | 0 |
Gordon's EPs | 768 | 3.12 | 6.14 | 1.71 | 1.29 |
ThoughtTreasure | 93 | 8.57 | 5.30 | 0.86 | 6.10 |
WordNet 1.6 | 427 | 1.06 | 0 | 0 | 0 |
The Scripts column shows the number of scripts in the database. Cyc contains 3071 event constants. Only those events having one or more subevents are considered scripts here. FrameNet frames are considered scripts. ThoughtTreasure contains 2247 action concepts. Only those actions having one or more subevents are considered scripts here. WordNet 1.6 contains 12,127 verb synonym sets. Only those verb synonym sets having outgoing entailment links are considered scripts here.
The Subevents column shows the average number of subevents per script. ThoughtTreasure subevents include arguments, unlike Gordon's EPs, WordNet, and sometimes Cyc. For example, the driving script includes:
[put-in driver key ignition-switch]
[turn driver ignition-switch]
[motor-vehicle-on motor-vehicle]

instead of simply the atomic concept "Starting an Automobile".
The Roles column shows the average number of roles per script. Roles are called frame elements in FrameNet. For Gordon's EPs, Roles includes People and Things.
The Places column shows the average number of locations per script.
The Other column shows the average number of other pieces of information provided for each script, such as entry conditions and results. For Gordon's EPs, Other is the Misc slot. Entry conditions, results, goals, and emotions are provided as assertions rather than atomic concepts in ThoughtTreasure.
Cyc statistics were obtained as follows: Event constants were obtained using the Hierarchical Browser starting from Event with microtheories set to ALL and depth set to 10. A total of 3071 unique Event constants were obtained. Each constant was then queried using the View Constant command of the Cyc Navigator. Links to "more rules" were followed. The 17,596 assertions from these queries were then collected together in a file. Subevent, Role, and Place tuples were then extracted from these assertions by looking for formulas whose predicate was subEvents, actsInCapacity, and eventOccursAt, respectively. Variables were instantiated by looking for isa formulas in the same assertion. For example given:
(=> (and (isa ?Z TurningOffWater)
         (isa ?Y WashingHair)
         (subEvents ?X ?Z)
         (subEvents ?X ?Y)
         (isa ?X Bathing))
    (startsAfterEndingOf ?Z ?Y))

the following tuples were extracted:

Bathing:subEvents:TurningOffWater
Bathing:subEvents:WashingHair

If it was not possible to extract any tuples from an assertion, then an Other tuple was generated for each Event mentioned in the assertion. 16,665 unique tuples resulted. The tuples for each Event were collected together. Only Events having at least one subEvents tuple were considered to be scripts. For each of these 185 Events, the number of Subevents (subEvents), Roles (actsInCapacity), Places (eventOccursAt), and Others were counted.
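The tuple-extraction step just described can be sketched as follows: flatten each rule into its atomic formulas, resolve variables through the isa formulas in the same assertion, and emit Event:predicate:Value tuples. The rule is given here as an already-parsed nested tuple; parsing of actual CycL text is omitted:

```python
# Sketch of the Cyc tuple extraction described above.  Rules are given as
# already-parsed nested tuples; real CycL parsing is not shown.

RULE = ("=>",
        ("and",
         ("isa", "?Z", "TurningOffWater"),
         ("isa", "?Y", "WashingHair"),
         ("subEvents", "?X", "?Z"),
         ("subEvents", "?X", "?Y"),
         ("isa", "?X", "Bathing")),
        ("startsAfterEndingOf", "?Z", "?Y"))

def formulas(expr):
    """Flatten a rule into its list of atomic formulas."""
    if expr[0] in ("=>", "and"):
        out = []
        for sub in expr[1:]:
            out.extend(formulas(sub))
        return out
    return [expr]

def extract_tuples(rule, predicates=("subEvents",)):
    fs = formulas(rule)
    # variable -> collection, taken from isa formulas in the same assertion
    isa = {f[1]: f[2] for f in fs if f[0] == "isa"}
    tuples = []
    for f in fs:
        if f[0] in predicates:
            event, value = isa.get(f[1], f[1]), isa.get(f[2], f[2])
            tuples.append("%s:%s:%s" % (event, f[0], value))
    return tuples

print(extract_tuples(RULE))
# ['Bathing:subEvents:TurningOffWater', 'Bathing:subEvents:WashingHair']
```

Passing predicates=("actsInCapacity",) or ("eventOccursAt",) would extract the Role and Place tuples in the same way.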
In future work we will add more scripts and 2-dimensional grids for representing locations where those scripts are performed. The grid locations of human and physical object roles should also be represented for each event of a script. For example, the waiter is located near the restaurant-table when taking the customer's order in the wait-tables script.
We have constructed a database and lexicon of typical situations or scripts. The database provides a rich network of interconnections that can be used in computational linguistics tasks. We also hope the database will prove useful in building context-aware digital devices.
Two further sample scripts from the database follow. The first is a specialization of mail-letter; the second is a specialization of dentist-appointment:

[ako ^ mail-letter]
[cost-of ^ NUMBER:USD:0.33]
[duration-of ^ NUMBER:second:600]
[event01-of ^ [pick-up sender snail-mail-letter]]
[event02-of ^ [ptrans sender na post-office]]
[event03-of ^ [wait-in-line sender]]
[event04-of ^ [ptrans-walk sender na postal-counter]]
[event05-of ^ [pre-sequence postal-clerk sender]]
[event05-of ^ [pre-sequence sender postal-clerk]]
[event06-of ^ [hand-to sender postal-clerk snail-mail-letter]]
[event07-of ^ [weigh postal-clerk snail-mail-letter]]
[event08-of ^ [postmark postal-clerk snail-mail-letter]]
[event09-of ^ [post-sequence postal-clerk sender]]
[event09-of ^ [post-sequence sender postal-clerk]]
[event10-of ^ [ptrans sender post-office na]]
[goal-of ^ [owner-of snail-mail-letter recipient]]
[goal-of ^ [s-employment postal-clerk]]
[performed-in ^ post-office]
[period-of ^ NUMBER:second:604800]
[role01-of ^ sender]
[role02-of ^ recipient]
[role03-of ^ snail-mail-letter]
[role04-of ^ post-office]
[role05-of ^ postal-counter]
[role06-of ^ postal-clerk]

[ako ^ dentist-appointment]
[cost-of ^ NUMBER:USD:200]
[duration-of ^ NUMBER:second:3600]
[emotion-of ^ [nervousness role-patient]]
[emotion-of ^ [pain role-patient]]
[event01-of ^ [ptrans role-patient na dental-office]]
[event02-of ^ [ptrans-walk role-patient na waiting-room]]
[event03-of ^ [wait role-patient]]
[event04-of ^ [action-call dental-assistant na role-patient]]
[event05-of ^ [ptrans-walk role-patient waiting-room dental-operatory]]
[event06-of ^ [sit-in role-patient dental-chair]]
[event07-of ^ [inject dentist novocaine mouth]]
[event08-of ^ [wait role-patient]]
[event09-of ^ [drill-tooth dentist tooth dental-drill]]
[event09-of ^ [listen role-patient elevator-music]]
[event10-of ^ [fill-tooth dentist tooth dental-filling]]
[event11-of ^ [ptrans role-patient dental-operatory na]]
[goal-of ^ [p-health role-patient]]
[goal-of ^ [s-profit dentist]]
[performed-in ^ dental-office]
[period-of ^ NUMBER:second:1.5768e+08]
[role01-of ^ role-patient]
[role02-of ^ dentist]
[role03-of ^ dental-assistant]
[role04-of ^ tooth]
[role05-of ^ mouth]
[role06-of ^ dental-office]
[role07-of ^ waiting-room]
[role08-of ^ dental-chair]
[role09-of ^ dental-operatory]
[role10-of ^ dental-filling]
[role11-of ^ novocaine]
Baker, Collin F., Fillmore, Charles J., & Lowe, John B. (1998). The Berkeley FrameNet Project. In Proceedings of COLING-ACL '98. pp. 86-90. Association for Computational Linguistics. Available: http://www.icsi.berkeley.edu/~framenet/docs/acl98.ps.
Dyer, Michael G. (1983). In-depth understanding. Cambridge, MA: MIT Press.
Fellbaum, Christiane. (Ed.). (1998). WordNet: An electronic lexical database. Cambridge, MA: MIT Press. Available (earlier versions of some chapters): http://www.cogsci.princeton.edu/~wn/papers/.
Gordon, Andrew S. (1999). The design of knowledge-rich browsing interfaces for retrieval in digital libraries. Doctoral dissertation, Northwestern University, Evanston, IL. Available: http://www.ils.nwu.edu/~gordon/Dissertation.pdf.
Johnson, Christopher. (1998). Syntactic and semantic principles of FrameNet annotation (Online). University of California, Berkeley. Available: http://www.icsi.berkeley.edu/~framenet/docs/train/annomanual/anno_manual.html.
Leacock, Claudia, and Chodorow, Martin. (1998). Combining lexical context and WordNet similarity for word sense identification. In Fellbaum, Christiane. (Ed.), WordNet: An electronic lexical database. pp. 265-283. Cambridge, MA: MIT Press.
Lenat, D. B. (1995). CYC: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11), 33-48.
Minsky, Marvin. (1974). A framework for representing knowledge (AI Laboratory Memo 306). Artificial Intelligence Laboratory, Massachusetts Institute of Technology. Available: ftp://publications.ai.mit.edu/ai-publications/0-499/AIM-306.ps.
Mueller, Erik T. (1998). Natural language processing with ThoughtTreasure. New York: Signiform.
Schank, Roger C., and Abelson, Robert P. (1977). Scripts, plans, goals, and understanding. Hillsdale, NJ: Lawrence Erlbaum.
Wilks, Yorick. (1975). A preferential, pattern-seeking, semantics for natural language inference. Artificial Intelligence, 6(1), 53-74.