A database and lexicon of scripts for ThoughtTreasure

by Erik T. Mueller

December 1, 1999


Abstract
Since scripts were proposed in the 1970's as an inferencing mechanism for AI and natural language processing programs, there have been few attempts to build a database of scripts. This paper describes a database and lexicon of scripts that has been added to the ThoughtTreasure commonsense platform. The database provides the following information about scripts: sequence of events, roles, props, entry conditions, results, goals, emotions, places, duration, frequency, and cost. English and French words and phrases are linked to script concepts.

Introduction

Scripts are data structures that represent typical situations or activities such as eating at a restaurant or attending a birthday party. Scripts were invented in the 1970's as a way of structuring knowledge and reducing search in AI programs (Minsky, 1974; Schank & Abelson, 1977; Wilks, 1975). For example, in a story understanding program, scripts help with pronoun resolution, word sense disambiguation, and filling in missing details (Dyer, 1983). Since the invention of scripts, there have been few attempts to build a database of them. This may be because entering scripts is tedious and requires many choices about concept naming, level of detail, and which alternatives to include.

We have built a database of about 100 scripts within the ThoughtTreasure commonsense platform (Mueller, 1998). Each script contains information about the events that make up the script, roles played by people and physical objects, entry conditions, results of performing the script, personal goals satisfied by the script, emotions associated with the script, where the script is performed, how long it takes to perform the script, how often the script is performed, and the cost of the script.

The remainder of the paper is organized as follows: First we describe the ThoughtTreasure platform into which scripts have been integrated. We then discuss the representation of scripts in ThoughtTreasure. Next we discuss some problems that arise when entering scripts and strategies for dealing with them. We then describe some simple applications of the script database. Finally we review related databases.

The ThoughtTreasure platform

The ThoughtTreasure platform for commonsense reasoning, begun in 1994, contains 27,093 atomic concepts, 35,020 English words and phrases, 21,529 French words and phrases, and 51,305 assertions.

Concepts are organized into a hierarchy. The top-level hierarchy is as follows:

concept
  => object
     => abstract-object
     => physical-object
        => animate-object
  => situation
     => action
     => state
        => relation
        => attribute 
Each concept is linked to zero or more roughly synonymous lexical entries (words and phrases):
schnapps = Holland gin, Hollands gin, Hollands, schnapps
Each lexical entry is linked to zero or more concepts:
orange = color-orange, fruit-orange
An assertion is of the form:

[predicate argument1 argument2 ...]
For physical objects, assertions provide information about parts and typical attributes such as size, shape, and color:
[part-of pod-of-peas green-pea]
[diameter-of green-pea .25in]
[green green-pea]
[sphere green-pea]
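
Such assertions are easy to parse mechanically. Here is a minimal Python sketch (our own illustration, not ThoughtTreasure's actual parser) that reads the bracketed notation into nested tuples:
# A sketch, not ThoughtTreasure's actual parser: read a bracketed
# assertion such as "[part-of pod-of-peas green-pea]" into nested
# Python tuples.

def tokenize(text):
    return text.replace("[", " [ ").replace("]", " ] ").split()

def parse(tokens):
    token = tokens.pop(0)
    if token == "[":
        items = []
        while tokens[0] != "]":
            items.append(parse(tokens))
        tokens.pop(0)  # discard the closing "]"
        return tuple(items)
    return token  # an atomic concept or value such as ".25in"

assertion = parse(tokenize("[emotion-of blackout [anger human]]"))
print(assertion)  # ('emotion-of', 'blackout', ('anger', 'human'))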
The arrangement of objects in locations such as hotel rooms, kitchens, and theaters is represented using 2-dimensional grids such as:
==hotel-room1//
wwwwwwwwwwww    b:bed
wbbbbb    mw    d:lockable-door
wbbbbb     w    m:minibar
wx        Zw    w:wall 
wwwwwwdddwww    x:phone 
w               Z:night-table 
wwwwwwwwwwww    Z.wd:hotel-room
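
A loader for such a grid might be sketched as follows (our own Python illustration, not ThoughtTreasure's grid code); it assumes the map occupies a fixed number of columns, with char:concept legend entries to the right, and uses a simplified grid:
# A sketch, not ThoughtTreasure's grid loader: parse a character
# grid plus legend into {concept: [(row, col), ...]}. Assumes the
# map occupies the first MAP_WIDTH columns of each line.

MAP_WIDTH = 12

LINES = [
    "wwwwwwwwwwww    w:wall",
    "wbbbbb    mw    b:bed",
    "wx        dw    m:minibar",
    "wwwwwwwwwwww    x:phone",
    "                d:lockable-door",
]

def parse_grid(lines, width):
    legend, cells = {}, {}
    for row, line in enumerate(lines):
        map_part, legend_part = line[:width], line[width:].strip()
        if legend_part:
            char, concept = legend_part.split(":", 1)
            legend[char] = concept
        for col, char in enumerate(map_part):
            if char != " ":
                cells.setdefault(char, []).append((row, col))
    return {legend[c]: locs for c, locs in cells.items() if c in legend}

print(parse_grid(LINES, MAP_WIDTH)["minibar"])  # [(1, 10)]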
ThoughtTreasure contains planning agents, or procedures for achieving goals, such as making a telephone call, on behalf of actors in a simulated world. Since planning agents are written in C and can invoke any function in ThoughtTreasure, they are difficult for other programs to analyze and use. Scripts provide a simpler, declarative alternative to planning agents. We have converted the existing planning agents into scripts, and added new scripts.

Representation of scripts

Schank and Abelson (1977) originally defined a script as a data structure consisting of the following elements: entry conditions, results, props, roles, tracks, and scenes (the sequence of events).

We mostly adopt the Schank and Abelson representation. To ease the task of entering many scripts, we do not attempt to represent alternative paths through a script. Since scripts are entered into ThoughtTreasure's concept hierarchy, script tracks (such as eat-in-fast-food-restaurant) are naturally implemented as specializations of more general scripts (such as eat-in-restaurant). The sequence of events is not divided into scenes.

A simple ThoughtTreasure script is as follows:

Object blackout

[English] power failure, blackout; [French] black out,
panne de courant, panne d'électricité
[ako ^ disaster]
[duration-of ^ NUMBER:second:3600]
[emotion-of ^ [anger human]]
[emotion-of ^ [unhappy-surprise human]]
[emotion-of ^ [worry human]]
[event01-of ^ [anger human]]
[event01-of ^ [electronic-device-broken
               electricity-network]]
[event01-of ^ [unhappy-surprise human]]
[event01-of ^ [worry human]]
[event02-of ^ [fetch-from human na light-source]]
[performed-in ^ apartment]
[performed-in ^ house]
[performed-in ^ office]
[period-of ^ NUMBER:second:3.1536e+07]
[role01-of ^ human]
[role02-of ^ electricity-network]
The script concept is blackout. It is located under disaster in the hierarchy. Lexical entries for the script include power failure in English and panne de courant in French.

The script has two roles: the human experiencing the blackout and the electricity network. The role numbers correspond to positions in an assertion about an instance of a script:

[blackout John electricity-network1]
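
To make the correspondence concrete, here is a short Python sketch (our own illustration, not ThoughtTreasure's internals) that binds the numbered roles of a script to the arguments of an instance assertion:
# A sketch, not ThoughtTreasure code: bind numbered roles of a
# script to the arguments of an instance assertion.

import re

roles = [("role01-of", "blackout", "human"),
         ("role02-of", "blackout", "electricity-network")]
instance = ("blackout", "John", "electricity-network1")

def bind_roles(roles, instance):
    bindings = {}
    for predicate, script, role_type in roles:
        n = int(re.fullmatch(r"role(\d+)-of", predicate).group(1))
        if script == instance[0] and n < len(instance):
            bindings[role_type] = instance[n]
    return bindings

print(bind_roles(roles, instance))
# {'human': 'John', 'electricity-network': 'electricity-network1'}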

Emotions associated with the script include anger, unhappy-surprise, and worry for the human. Note that these are emotion concepts, located in the emotion hierarchy and linked to lexical entries not shown above. Anger is a kind of negative-emotion associated with the noun anger and the adjectives angry, pissed, and others. The other concepts shown above (such as electricity-network) are also placed at appropriate points in the hierarchy and associated with lexical entries.

The script consists of two events: (1) the power outage and corresponding emotional reaction and (2) obtaining an alternative lighting source. The script typically occurs in an apartment, house, or office once a year for an hour.

In general a script consists of the following fields:

Roles (roleNN-of)
A human or physical object that participates in the script. Role 1 is the role from whose viewpoint the script is described. Role 1 of the eat-in-restaurant script is the customer.
Role scripts (roleNN-script-of)
A link to a script described from the viewpoint of a role other than role 1. The viewpoint of the waiter role in the eat-in-restaurant script is described by the wait-tables script.
Entry conditions (entry-condition-of)
The entry conditions of the script. For example:
[entry-condition-of attend-class [enroll student course]]
Results (result-of)
The results of performing the script. For example:
[result-of sleep [restedness sleeper]]
Goals (goal-of)
The personal goals achieved by the script. Personal goals are defined in the goal hierarchy and include the achievement, preservation, and satisfaction goals defined by Schank and Abelson (1977). For example:
[goal-of eat-in-restaurant [s-hunger customer]]
Emotions (emotion-of)
The emotions associated with the script.
Location (performed-in)
The locations where the script is performed.
Duration (duration-of)
The duration of the script.
Period (period-of)
The typical time between occurrences of the script for role 1.
Cost (cost-of)
The cost of performing the script for role 1.
Events (eventNN-of)
The events of the script.
Events are listed in chronological order, with multiple events for a given number considered roughly simultaneous. For example:
[event01-of S A1]
[event01-of S A2]
[event02-of S B1]
[event02-of S B2]
specifies that A1 and A2 occur simultaneously, followed by B1 and B2, which occur simultaneously. Precedence and simultaneity are the only temporal relations that can be represented.
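
For illustration, grouping eventNN-of assertions into an ordered list of roughly simultaneous sets might be done as follows (a Python sketch of our own, not ThoughtTreasure code):
# A sketch, not ThoughtTreasure code: group eventNN-of assertions
# into an ordered list of sets; events sharing a number are treated
# as roughly simultaneous.

import re

def event_steps(assertions):
    steps = {}
    for predicate, script, event in assertions:
        match = re.fullmatch(r"event(\d+)-of", predicate)
        if match:
            steps.setdefault(int(match.group(1)), []).append(event)
    return [steps[n] for n in sorted(steps)]

assertions = [
    ("event01-of", "S", "A1"),
    ("event01-of", "S", "A2"),
    ("event02-of", "S", "B1"),
    ("event02-of", "S", "B2"),
]
print(event_steps(assertions))  # [['A1', 'A2'], ['B1', 'B2']]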

The goto predicate can be used to indicate repetition. For example:

[event01-of S A]
[event02-of S B]
[event03-of S [goto event01-of]]
specifies the event sequence A, B, A, B, A, B, ... No mechanism for exiting the loop is currently provided.
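
Expanding such a sequence might look like the following sketch (our own); because no exit mechanism is provided, the caller must impose an arbitrary cap on the number of steps:
# A sketch, not ThoughtTreasure code: expand an event sequence
# containing a goto into a linear trace, capped at max_steps
# because the representation provides no exit test.

def expand(steps, max_steps=10):
    trace, i = [], 0
    while i < len(steps) and len(trace) < max_steps:
        event = steps[i]
        if isinstance(event, tuple) and event[0] == "goto":
            i = event[1]  # jump back to the indicated step
            continue
        trace.append(event)
        i += 1
    return trace

# [event01-of S A], [event02-of S B], [event03-of S [goto event01-of]]
steps = ["A", "B", ("goto", 0)]
print(expand(steps, max_steps=6))  # ['A', 'B', 'A', 'B', 'A', 'B']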

We have so far entered 93 scripts. As we entered these scripts we noticed and filled in a number of missing concepts and lexical entries in ThoughtTreasure. We also encountered several difficulties.

Difficulties in entering scripts

How should the enterer deal with variations in scripts such as mailing a letter in a mailbox versus handing it to a postal clerk?

If the variations involve major differences in the sequence of events or other fields of the script such as the goal or result, each variation can be entered as a specialization of a more general script:

mail-letter
  => mail-letter-at-mailbox
  => mail-letter-at-post-office
Variations that do not make a big difference can be ignored for the time being. Thus the current script for reading a book does not specify whether the reader is reading in bed or in a chair.

If the variations of a script involve a single physical object that can be generalized, the generalization can be entered: When confronted with the choice between pen and pencil, the enterer can code writing-instrument.

More variations become apparent as the enterer focuses on more detail. The enterer can limit the level of detail in a script by invoking other scripts: Instead of entering all the details about how one reaches a mailbox, a reference can be made to a transportation script. The transportation script can then be defined with specialization scripts for different modes of transportation such as walking and driving. The driving script can invoke an action for putting the key in the ignition. At this point, the enterer may consider the level of detail sufficient and move on.

It is not always clear exactly when a script begins and ends, but the enterer must make some choice: At the beginning of the script for teaching a class, should the teacher prepare for the class? Or is that part of another script? At the beginning of the script for walking a dog, should the dog walker put a leash on the dog?

Entering scripts requires many judgment calls by the enterer, and the results are likely to contain various errors and confusions. As applications are built, we will learn which imperfections of the database most need to be corrected.

Applications

We have implemented two simple web-based applications using the script database, as a precursor to more advanced applications.

The first application answers the following commonsense questions:

What does a ___ do? (human)
What is a ___ used for? (physical object)
Where is a ___ found? (human or physical object)
What does ___ consist of?
What is the result of ___?
Where does one ___?
How long does ___ take?
How often does one ___?
How much does ___ cost?
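
Several of these questions map directly onto fields of the script representation. The following Python sketch shows the dispatch (our own illustration; db is a hypothetical lookup table, not ThoughtTreasure's API):
# A sketch of the field lookup; db is a hypothetical mapping from
# (field, script) to asserted values, not ThoughtTreasure's API.

QUESTION_FIELD = {
    "What is the result of ___?": "result-of",
    "Where does one ___?":        "performed-in",
    "How long does ___ take?":    "duration-of",
    "How often does one ___?":    "period-of",
    "How much does ___ cost?":    "cost-of",
}

def answer(db, question, script):
    return db.get((QUESTION_FIELD[question], script), [])

db = {
    ("duration-of", "blackout"):  ["NUMBER:second:3600"],
    ("performed-in", "blackout"): ["apartment", "house", "office"],
}
print(answer(db, "Where does one ___?", "blackout"))
# ['apartment', 'house', 'office']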
The second application determines the most likely scripts for English text entered by the user by performing a shallow search from the concepts activated by the text to scripts. Given:
John poured shampoo on his hair.
the application produces:
score 2.0 for script take-shower based on shampoo, hair
score 2.0 for script go-for-a-haircut based on shampoo, hair
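
The shallow search can be sketched as follows (our own illustration, with a toy lexicon and toy script-to-concept links standing in for ThoughtTreasure's database):
# A sketch, not the actual application: words activate concepts via
# the lexicon, and each script is scored by how many of its role and
# prop concepts were activated.

lexicon = {"shampoo": ["shampoo"], "hair": ["hair"]}  # word -> concepts
script_concepts = {                                   # script -> concepts
    "take-shower":      {"shampoo", "hair", "shower-stall"},
    "go-for-a-haircut": {"shampoo", "hair", "barber"},
    "mail-letter":      {"snail-mail-letter", "post-office"},
}

def score_scripts(text):
    activated = {concept
                 for word in text.lower().split()
                 for concept in lexicon.get(word.strip(".,"), [])}
    scores = {}
    for script, concepts in script_concepts.items():
        hits = activated & concepts
        if hits:
            scores[script] = (float(len(hits)), sorted(hits))
    return sorted(scores.items(), key=lambda kv: -kv[1][0])

for script, (score, basis) in score_scripts(
        "John poured shampoo on his hair."):
    print(f"score {score} for script {script} based on {', '.join(basis)}")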
A more advanced script recognition algorithm would perform a deep search (Leacock & Chodorow, 1998).

Related work

In this section, we compare ThoughtTreasure scripts with related databases: Cyc, FrameNet, Gordon's Expectation Packages, and WordNet.

Cyc

Cyc (Lenat, 1995) is a commonsense knowledge base begun in 1984. As of December 1999, it contained 89,373 constants, 10,255 nonatomic constants, 1,012,272 assertions, and 306,886 deductions.

Cyc's Events are similar to scripts. Whereas scripts are represented in ThoughtTreasure using a convenient relational-database-like format, Cyc represents events using arbitrary first-order predicate calculus assertions. This leads to considerable variation in the representation of subevents. Sometimes the subevent appears only in the left-hand side of an inference rule:

(=> (and (subEvents ?X ?U) (isa ?U Staining))
    (isa ?X WoodRefinishing))
while at other times the subevent appears only in the right-hand side:
(=> (and (isa ?U ShapingSomething) (subEvents ?U ?X))
         (isa ?X CuttingSomething))
(Cyc assertions and statistics in this paper are based on the Cyc C distribution of April 4, 1997, except where otherwise noted.)

The mapping from the roles of a script to the roles of its subevents is not always provided. When the mapping is provided, several formats are used. For example, the following rule specifies that the person opening the presents at a birthday party is the same person honored by the party:

(=> (and (isa ?OPENING OpeningPresents)
         (subEvents ?PARTY ?OPENING)
         (eventHonors ?PARTY ?HONOR)
         (isa ?PARTY BirthdayParty))
    (performedBy ?OPENING ?HONOR))

The following rule specifies that a dental hygienist performs the teeth cleaning in a dental exam:

(=> (and (isa ?CLE TeethCleaning)
         (performedBy ?CLE ?AGT)
         (subEvents ?EXM ?CLE)
         (isa ?EXM DentalExam))
    (isa ?AGT DentalHygienist))
Information about the roles, places, and duration of a script is represented in a more uniform fashion:
(=> (isa ?U Dancer)
    (actsInCapacity ?U performedBy DancingProcess-Human
                         HobbyCapacity))
(=> (isa ?FM Firefighter)
    (actsInCapacity ?FM performedBy ExtinguishingAFire
                         JobCapacity))

(=> (and (isa ?U ChangingOil) (eventOccursAt ?U ?X))
    (isa ?X ServiceStation))
(=> (and (isa ?U ProducingAnAlcoholicBeverage)
              (eventOccursAt ?U ?X))
    (isa ?X Brewery))

(=> (isa ?X WeddingCeremony)
         (duration ?X (HoursDuration 0.5 2)))
(=> (isa ?U ResearchProject)
    (duration ?U (YearsDuration 0.5 10)))

FrameNet

The goal of the three-year FrameNet project (Baker, Fillmore, & Lowe, 1998), begun in 1997, is to build a collection of frames and to annotate corpus sentences with links to the roles of those frames. To date, 20 frames and 28 unique roles have been defined (Johnson, 1998).

The frames, which focus on the areas of perception, cognition, communication, and motion, are:

Arriving, Awareness, Behaver-evaluating, Cogitation,
Conversation, Departing, Encoding, Inchoative, Judgment,
Noise, Path-shape, Placing, Questioning, Removing, Request,
Response, Self-motion, Statement, Static, Transportation

The Transportation frame consists of the following roles:

Role              Example
Driver            Kim drove through the woods
Cargo & Passenger Kim drove the kids to the store
Vehicle           Kim drove the truck to the store
Source            Kim drove out of the garage
Path              Kim drove down the street
Goal              Kim drove into the woods
Manner            Kim drove dangerously
Distance          Kim drove 500 miles
Area              Kim drove throughout the countryside
(Johnson, 1998)

FrameNet frames lack information about subevents (such as "starting the Vehicle" in the Transportation frame), locations, tracks, entry conditions, and results.

Gordon's Expectation Packages

Gordon (1999) developed a database of 768 simplified scripts called Expectation Packages (EPs) to be used as part of a system for browsing photographic libraries. Each EP contains slots for Events, Places (locations and tracks), People (roles), Things (props), and Misc (entry conditions and results). Slot values are taken from the Library of Congress Thesaurus for Graphic Materials (LCTGM), which contains 5,760 subject terms.

A sample EP is:

28. Commuting on a crowded expressway
Events  Automobile driving, Automobile travel,
        Radio broadcasting, Traffic congestion
Places  Express highways, Toll roads
People  Commuters
Things  Automobiles, Helicopters, Horns (Communication devices)
Misc    Air pollution
(Gordon, 1999, p. 194)

Since slot values must be LCTGM subject terms and not assertions, the EPs are unable to represent arguments to subevents. Thus the fact that the "Commuter" is doing the "Automobile driving" is not captured in the above EP. Because LCTGM terms do not exist for many concepts, EPs often lack important common subevents, props, and roles of a script. EP Events often refer not to subevents, but to generalizations of the script:

Script                               Event
Commuting on a crowded expressway    Automobile driving
Going polka dancing in a dance hall  Folk dancing
Hunting for alligators in a swamp    Alligator hunting

Gordon (1999) writes that "the purpose of Expectation Packages is not to accurately represent scripts, but rather to interconnect existing terms in a thesaurus" (p. 111).

WordNet

WordNet (Fellbaum, 1998) is a lexical database begun in 1985. Version 1.6 contains 129,509 lexical entries, 99,642 synonym sets, and 140,475 relations (not including inverses). A key difference from ThoughtTreasure is that every WordNet concept must be lexicalized: scripts are not included in WordNet because concepts such as eating at a restaurant and going to a birthday party are not lexicalized in English (p. 100).

That said, since some English verbs can be decomposed into subevents, WordNet 1.6 does break down 427 verb synonym sets into other verb synonym sets via the entailment relation (pp. 77-81). Nineteen events are broken down into two or more subevents. They are:

arraign:       indict charge
board:         feed consume
breathe:       inhale exhale
buy:           pay choose
cast:          film stage perform
eat:           chew swallow
master:        understand drill learn
maul:          mutilate injure
postpone:      cancel reschedule
push:          press move
quilt:         pad sew
settle:        move migrate
sky_dive:      glide jump descend
smoke:         inhale exhale
stampede:      run rush
sublime:       condense evaporate
tease:         arouse disappoint
trade:         buy sell
wine_and_dine: eat drink

Summary

This table summarizes the above databases along with ThoughtTreasure:

Name               Scripts (#)   Subevents (#/script)   Roles (#/script)   Places (#/script)   Other (#/script)
Cyc                185           1.71                   0.032              0.092               15.76
FrameNet           20            0                      4.94               0                   0
Gordon's EPs       768           3.12                   6.14               1.71                1.29
ThoughtTreasure    93            8.57                   5.30               0.86                6.10
WordNet 1.6        427           1.06                   0                  0                   0

The Scripts column shows the number of scripts in the database. Cyc contains 3,071 event constants. Only those events having one or more subevents are considered scripts here. FrameNet frames are considered scripts. ThoughtTreasure contains 2,247 action concepts. Only those actions having one or more subevents are considered scripts here. WordNet 1.6 contains 12,127 verb synonym sets. Only those verb synonym sets having outgoing entailment links are considered scripts here.

The Subevents column shows the average number of subevents per script. ThoughtTreasure subevents include arguments, unlike those of Gordon's EPs and WordNet, and unlike some of Cyc's. For example, the driving script includes:

[put-in driver key ignition-switch]
[turn driver ignition-switch]
[motor-vehicle-on motor-vehicle]
instead of simply the atomic concept "Starting an Automobile".

The Roles column shows the average number of roles per script. Roles are called frame elements in FrameNet. For Gordon's EPs, Roles includes People and Things.

The Places column shows the average number of locations per script.

The Other column shows the average number of other pieces of information provided for each script, such as entry conditions and results. For Gordon's EPs, Other is the Misc slot. Entry conditions, results, goals, and emotions are provided as assertions rather than atomic concepts in ThoughtTreasure.

Cyc statistics were obtained as follows: Event constants were collected using the Hierarchical Browser, starting from Event with microtheories set to ALL and depth set to 10, yielding 3,071 unique Event constants. Each constant was then queried using the View Constant command of the Cyc Navigator, following links to "more rules". The 17,596 assertions returned by these queries were collected in a file. Subevent, Role, and Place tuples were then extracted from these assertions by looking for formulas whose predicate was subEvents, actsInCapacity, or eventOccursAt, respectively. Variables were instantiated by looking for isa formulas in the same assertion. For example, given:

(=> (and (isa ?Z TurningOffWater)
         (isa ?Y WashingHair)
         (subEvents ?X ?Z)
         (subEvents ?X ?Y)
         (isa ?X Bathing))
    (startsAfterEndingOf ?Z ?Y))
the following tuples were extracted:
Bathing:subEvents:TurningOffWater
Bathing:subEvents:WashingHair
If it was not possible to extract any tuples from an assertion, then an Other tuple was generated for each Event mentioned in the assertion. 16,665 unique tuples resulted. The tuples for each Event were collected together. Only Events having at least one subEvents tuple were considered to be scripts. For each of these 185 Events, the number of Subevents (subEvents), Roles (actsInCapacity), Places (eventOccursAt), and Others were counted.
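
This extraction step might be implemented as in the following sketch (our own reconstruction, assuming the assertions have already been parsed into nested tuples):
# A sketch, our own reconstruction of the extraction step: pull
# event:predicate:value tuples out of a parsed Cyc rule, using isa
# formulas in the same assertion to instantiate variables.

# Binary predicates handled here; actsInCapacity, which takes extra
# arguments, would be handled analogously.
TUPLE_PREDS = {"subEvents", "eventOccursAt"}

def formulas(expr):
    """Yield every tuple subformula of a nested-tuple expression."""
    if isinstance(expr, tuple):
        yield expr
        for sub in expr:
            yield from formulas(sub)

def extract_tuples(rule):
    types = {f[1]: f[2] for f in formulas(rule)
             if len(f) == 3 and f[0] == "isa"}
    return [f"{types[f[1]]}:{f[0]}:{types[f[2]]}"
            for f in formulas(rule)
            if len(f) == 3 and f[0] in TUPLE_PREDS
            and f[1] in types and f[2] in types]

rule = ("=>", ("and", ("isa", "?Z", "TurningOffWater"),
               ("isa", "?Y", "WashingHair"),
               ("subEvents", "?X", "?Z"),
               ("subEvents", "?X", "?Y"),
               ("isa", "?X", "Bathing")),
        ("startsAfterEndingOf", "?Z", "?Y"))
print(extract_tuples(rule))
# ['Bathing:subEvents:TurningOffWater', 'Bathing:subEvents:WashingHair']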

Future work

In future work we will add more scripts and 2-dimensional grids for representing locations where those scripts are performed. The grid locations of human and physical object roles should also be represented for each event of a script. For example, the waiter is located near the restaurant-table when taking the customer's order in the wait-tables script.

Conclusion

We have constructed a database and lexicon of typical situations or scripts. The database provides a rich network of interconnections that can be used in computational linguistics tasks. We also hope the database will prove useful in building context-aware digital devices.

Appendix A: mail-letter-at-post-office

[ako ^ mail-letter]
[cost-of ^ NUMBER:USD:0.33]
[duration-of ^ NUMBER:second:600]
[event01-of ^ [pick-up sender snail-mail-letter]]
[event02-of ^ [ptrans sender na post-office]]
[event03-of ^ [wait-in-line sender]]
[event04-of ^ [ptrans-walk sender na postal-counter]]
[event05-of ^ [pre-sequence postal-clerk sender]]
[event05-of ^ [pre-sequence sender postal-clerk]]
[event06-of ^ [hand-to sender postal-clerk snail-mail-letter]]
[event07-of ^ [weigh postal-clerk snail-mail-letter]]
[event08-of ^ [postmark postal-clerk snail-mail-letter]]
[event09-of ^ [post-sequence postal-clerk sender]]
[event09-of ^ [post-sequence sender postal-clerk]]
[event10-of ^ [ptrans sender post-office na]]
[goal-of ^ [owner-of snail-mail-letter recipient]]
[goal-of ^ [s-employment postal-clerk]]
[performed-in ^ post-office]
[period-of ^ NUMBER:second:604800]
[role01-of ^ sender]
[role02-of ^ recipient]
[role03-of ^ snail-mail-letter]
[role04-of ^ post-office]
[role05-of ^ postal-counter]
[role06-of ^ postal-clerk]

Appendix B: have-filling-done

[ako ^ dentist-appointment]
[cost-of ^ NUMBER:USD:200]
[duration-of ^ NUMBER:second:3600]
[emotion-of ^ [nervousness role-patient]]
[emotion-of ^ [pain role-patient]]
[event01-of ^ [ptrans role-patient na dental-office]]
[event02-of ^ [ptrans-walk role-patient na waiting-room]]
[event03-of ^ [wait role-patient]]
[event04-of ^ [action-call dental-assistant na role-patient]]
[event05-of ^ [ptrans-walk role-patient waiting-room
               dental-operatory]]
[event06-of ^ [sit-in role-patient dental-chair]]
[event07-of ^ [inject dentist novocaine mouth]]
[event08-of ^ [wait role-patient]]
[event09-of ^ [drill-tooth dentist tooth dental-drill]]
[event09-of ^ [listen role-patient elevator-music]]
[event10-of ^ [fill-tooth dentist tooth dental-filling]]
[event11-of ^ [ptrans role-patient dental-operatory na]]
[goal-of ^ [p-health role-patient]]
[goal-of ^ [s-profit dentist]]
[performed-in ^ dental-office]
[period-of ^ NUMBER:second:1.5768e+08]
[role01-of ^ role-patient]
[role02-of ^ dentist]
[role03-of ^ dental-assistant]
[role04-of ^ tooth]
[role05-of ^ mouth]
[role06-of ^ dental-office]
[role07-of ^ waiting-room]
[role08-of ^ dental-chair]
[role09-of ^ dental-operatory]
[role10-of ^ dental-filling]
[role11-of ^ novocaine]

References

Baker, Collin F., Fillmore, Charles J., & Lowe, John B. (1998). The Berkeley FrameNet Project. In Proceedings of COLING-ACL '98. pp. 86-90. Association for Computational Linguistics. Available: http://www.icsi.berkeley.edu/~framenet/docs/acl98.ps.

Dyer, Michael G. (1983). In-depth understanding. Cambridge, MA: MIT Press.

Fellbaum, Christiane. (Ed.). (1998). WordNet: An electronic lexical database. Cambridge, MA: MIT Press. Available (earlier versions of some chapters): http://www.cogsci.princeton.edu/~wn/papers/.

Gordon, Andrew S. (1999). The design of knowledge-rich browsing interfaces for retrieval in digital libraries. Doctoral dissertation, Northwestern University, Evanston, IL. Available: http://www.ils.nwu.edu/~gordon/Dissertation.pdf.

Johnson, Christopher. (1998). Syntactic and semantic principles of FrameNet annotation (Online). University of California, Berkeley. Available: http://www.icsi.berkeley.edu/~framenet/docs/train/annomanual/anno_manual.html.

Leacock, Claudia, & Chodorow, Martin. (1998). Combining lexical context and WordNet similarity for word sense identification. In Fellbaum, Christiane (Ed.), WordNet: An electronic lexical database. pp. 265-283. Cambridge, MA: MIT Press.

Lenat, Douglas B. (1995). CYC: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11), 33-48.

Minsky, Marvin. (1974). A framework for representing knowledge (AI Laboratory Memo 306). Artificial Intelligence Laboratory, Massachusetts Institute of Technology. Available: ftp://publications.ai.mit.edu/ai-publications/0-499/AIM-306.ps.

Mueller, Erik T. (1998). Natural language processing with ThoughtTreasure. New York: Signiform.

Schank, Roger C., & Abelson, Robert P. (1977). Scripts, plans, goals, and understanding. Hillsdale, NJ: Lawrence Erlbaum.

Wilks, Yorick. (1975). A preferential, pattern-seeking, semantics for natural language inference. Artificial Intelligence, 6(1), 53-74.


Copyright © 1999 Erik T. Mueller. All Rights Reserved.