LANGUAGE COMPREHENSION AS GUIDED EXPERIENCE

Rolf A. Zwaan
Barbara Kaup
Robert A. Stanfield
Carol J. Madden

Florida State University

KEY WORDS: comprehension, embodied cognition, language, mental representation, mental simulation, situation model

Address all correspondence to:
Dr. Rolf A. Zwaan
Department of Psychology
Florida State University
Tallahassee, FL 32306-1270
phone: 850-644-2768
FAX: 850-644-7739
zwaan@psy.fsu.edu

ABSTRACT (SHORT)

Language comprehension is best viewed as guided experience. The linguistic input provides cues to the human brain as to how to construct experiential simulations of the state of affairs it denotes. We show that this view of language comprehension is consistent with a range of extant evidence in a variety of fields, ranging from historical linguistics to cognitive neuroscience. We furthermore discuss new evidence that directly supports the experience-based view. We argue that the prevailing amodal view of language comprehension is unable to coherently account for this evidence.

ABSTRACT (LONG)

Most models of language comprehension assume that it entails (1) the conversion of the verbal input into amodal propositions and (2) the integration of these propositions as well as propositions derived from semantic long-term memory into a coherent network. In contrast to this "network-building" view, we propose to view language comprehension as guided experience. According to this proposal, the verbal input provides a set of cues to the comprehender on how to run mental simulations (in the manner proposed by Barsalou: "Perceptual Symbol Systems", BBS 22(4), 1999) of the events described, such that comprehension becomes equivalent to a vicarious experience of those events. We show that this proposed view is consistent with evidence from a variety of fields, ranging from historical linguistics to cognitive neuroscience.
We furthermore discuss new experimental evidence that directly supports our view and is not predicted by the network-building view. The network-building view is unable to coherently account for this evidence. We conclude that viewing language comprehension as guided experience provides a powerful searchlight onto this fundamental human cognitive skill.

1. Introduction

1.1. The nature of language comprehension

The prevailing view in cognitive science has been that language comprehension is the construction of a coherent mental representation consisting of the propositions that are expressed by the verbal input, augmented with propositions derived from background knowledge if presupposed by the verbal input. We propose a different view. Language comprehension is guided experience, whereby language forms a complex set of cues (footnote 1) on how to construct an experiential representation of the state of affairs it describes. A record of the verbal input may be kept as well.

This proposal breaks with the standard view in two ways. First, it defines the goal of language comprehension as the construction of a mental representation of the referential situation, rather than of the input itself. This constructivist view is of course not new (e.g., Bower & Morrow 1990; Gernsbacher 1990; Graesser, Millis, & Zwaan 1997; Graesser, Singer, & Trabasso 1994; Johnson-Laird 1983; Sanford & Garrod 1981; van Dijk & Kintsch 1983; Zwaan & Radvansky 1998). Second, however, our proposal is a more radical version of this view in that it assumes that the building blocks of situation models are perceptual symbols derived from how we perceive and interact with the environment (Barsalou 1999) and not amodal propositions. Until now, situation models have typically been viewed as constructed from amodal propositions (e.g., Kintsch 1998; but see Johnson-Laird 1983; Johnson-Laird, Herrmann, & Chaffin 1984; Sanford & Garrod 1998).
In our view, perceptual symbol systems provide a more natural representational format for conceptualizing situation models than amodal propositions. We will show that the view of language comprehension as guided experience is consistent with a range of literature that the amodal-propositional view cannot readily account for.

Various other researchers have proposed perceptually grounded embodied views of cognition (e.g., Barsalou 1999; Bisiach 1988; Glenberg 1997; Goldstone & Barsalou 1998; Harnad 1990; Johnson 1987; Johnson-Laird 1983; Lakoff 1987; MacWhinney 1999). Our proposal is similar in spirit. However, we focus specifically on language comprehension. Given that most of the language comprehension research has focused on narratives, we will focus on narrative comprehension in what follows. However, in section 8.3.1 we examine whether and how the view of language comprehension as guided experience can be extended to other text genres.

1.2. Main claims

Our main claims are as follows:

- Comprehending language is based on how we experience our environment. We routinely construct, maintain, and update situation models of our environment (e.g., Damasio 1999). Language comprehension allows us to do this vicariously.

- Language comprehension involves perceptual symbols (Barsalou 1999). We do not claim that perceptual symbols are only constructed under special conditions in language comprehension. Rather, we claim that such symbols are constructed routinely.

- Linguistic expressions are cues to the mind/brain as to how to construct situation models. Linguistic cues allow us to take a perspective within the referential world from which we construct our models.

- The construction of situation models based on verbal input involves the same brain areas that are involved in the construction and maintenance of experience-based situation models. This implies that language comprehension involves a variety of brain areas, many more than the traditional language areas (Broca's and Wernicke's areas).

- The mind/brain may keep a record of the verbal input, but this is not critical to comprehension once a referential experiential representation has been established. We hypothesize that the record of the verbal input is an experiential representation of the input itself and not an amodal representation of the propositions expressed by the input.

We adduce evidence from various fields, ranging from historical linguistics to neuroimaging, in support of each of these claims. However, we begin with a discussion of what might be viewed as the standard view of language comprehension in general and text comprehension in particular.

2.0. Theories of comprehension

2.1. The propositional view

The amodal conceptualization of language comprehension can be summarized as follows. Comprehenders transform the verbal input into propositions and connect these propositions when they share an argument. Whenever it is impossible to connect an incoming proposition to the propositions already in working memory (WM), a "coherence gap" occurs, and comprehenders use their background knowledge to generate a "text-connecting" or "bridging" inference, a proposition derived from semantic or episodic long-term memory (LTM) that has an argument in common with the proposition(s) currently in WM and with the incoming proposition. Furthermore, the amodal conceptualization of text comprehension assumes that readers may occasionally make "elaborative inferences," propositions from background knowledge that share an argument with the incoming proposition, but do not serve to connect it to the propositions already in WM.
This view was most comprehensively discussed in an influential article by Kintsch and van Dijk (1978), but similar ideas can be found elsewhere, for instance in an early textbook on psycholinguistics (Clark & Clark 1977), and the view still underlies various more recent conceptualizations of language comprehension (e.g., McKoon & Ratcliff 1992; Myers & O'Brien 1998). The Kintsch and van Dijk model was successful with respect to the task it was designed for: the prediction of the recall likelihood of text statements. However, although Kintsch and van Dijk were quick to acknowledge the limitations of their model and saw propositions more as a "convenient shorthand," this did not prevent many researchers from adopting their model as the model of language comprehension.

2.2. Situation models

Van Dijk and Kintsch (1983) and Johnson-Laird (1983) proposed that language comprehension entails more than merely the construction of a textbase. Specifically, it is tantamount to the construction of a mental representation of the state of affairs described in the linguistic input, a mental model or situation model. Similar ideas had been proposed in linguistics and philosophy (e.g., Habel 1986; Hankamer & Sag 1976; Jackendoff 1983; Kamp 1981). A great deal of empirical evidence has been amassed for the situation-model view (reviewed in Zwaan & Radvansky 1998). This evidence suggests that comprehenders are influenced by their experience with the type of situation that is being described, rather than by the propositional structure of the verbal input. Consider the following sentences:

1a The actress walked onto the stage. A moment later she collapsed.
1b The actress walked onto the stage. An hour later she collapsed.

In a propositional analysis, the propositions derived from these two sentences would be connected through argument overlap to yield an interconnected network of propositions, the textbase, representing the meaning of the verbal input.
For example, the predicates WALK and COLLAPSE share the argument ACTRESS. In addition, LATER can be interpreted as LATER-THAN-THAT, whereby THAT refers to the most recently processed proposition, and thus results in the complex proposition LATER-THAN([WALK[ACTRESS,ONTO STAGE]]). The time adverbs "moment" and "hour" would simply modify this proposition, but would otherwise be irrelevant with respect to the structure of the textbase.

However, Zwaan (1996) found that the two temporal expressions differentially impact processing. For example, when subjects are presented with the verb mentioned in the first sentence (e.g., "walked") and are asked to indicate (by quickly pressing a YES or NO key) whether or not the word appeared in the text, they do this reliably more quickly when the time interval is short (as in "moment") than when it is long (e.g., "hour" or "day"). Thus, the verb is more accessible to the comprehender when it denotes an event that is temporally close to the current state of affairs in the story than when it denotes a temporally more distant event (see Anderson, Garrod, & Sanford 1983; Carreiras, Carriedo, Alonso, & Fernandez 1997; Rinck & Bower, in press, for converging evidence).

Findings such as these are difficult to explain from the view that language comprehension can be modeled as the construction of a coherent network of propositions. However, they are exactly what would be predicted if we assume that comprehension involves taking a perspective within the referential situation and then constructing a mental representation (see also MacWhinney 1999). In the referential situation, the event of WALKING is still close in time to the current event (COLLAPSING) in the "moment" version, but not in the "hour" version; hence the difference in response latencies. The two decades' worth of studies reviewed by Zwaan and Radvansky (1998) show similar effects for dimensions other than time, such as space, causation, motivation, and protagonist.
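
The argument-overlap analysis of sentences 1a and 1b can be made concrete with a small sketch. The Python code below is only an illustration under simplifying assumptions: the class Proposition, the functions textbase_links and textbase_for, and the atom labels are hypothetical names introduced here and are not part of the Kintsch and van Dijk model. The sketch links any two propositions that share an atomic argument and treats the time adverb as a modifier that plays no role in linking.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Proposition:
    predicate: str
    arguments: tuple      # atoms (str) or embedded Propositions
    modifier: str = ""    # e.g. a time adverb; never consulted when linking

    def atoms(self):
        """Collect atomic arguments, descending into embedded propositions."""
        found = set()
        for arg in self.arguments:
            if isinstance(arg, Proposition):
                found |= arg.atoms()
            else:
                found.add(arg)
        return found


def textbase_links(propositions):
    """Link every pair of propositions that shares at least one atomic argument."""
    return {
        frozenset((p.predicate, q.predicate))
        for p, q in combinations(propositions, 2)
        if p.atoms() & q.atoms()
    }


def textbase_for(interval):
    """Propositions for: The actress walked onto the stage. A(n) <interval> later she collapsed."""
    walk = Proposition("WALK", ("ACTRESS", "ONTO-STAGE"))
    later = Proposition("LATER-THAN", (walk,), modifier=interval)
    collapse = Proposition("COLLAPSE", ("ACTRESS",))
    return [walk, later, collapse]


links_moment = textbase_links(textbase_for("moment"))
links_hour = textbase_links(textbase_for("hour"))
print(links_moment == links_hour)
```

Under these assumptions the "moment" and "hour" versions yield the same set of links, which is the sense in which the time adverb is irrelevant to the structure of the textbase. A difference in verb accessibility between the two versions therefore has no counterpart in this structure.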
Although this evidence is consistent with an experience-based view of language comprehension, it lacks specificity with respect to the structure and content of situation models. We believe this is due to a combination of a narrow (and often implicit) theoretical focus and methodological limitations.

2.3. Theoretical and methodological limitations

In the theoretical arena, situation-model research has been motivated more by the goal to demonstrate that the textbase is insufficient to capture the essence of comprehension than by the goal to delineate what situation models are. An uncharitable conclusion about the research might be that the situation model is whatever the textbase is not. Often, situation models appear to be viewed as mere addenda to the textbase, or as a mechanism enabling the construction of a coherent textbase. As such, this view puts the cart before the horse. In the methodological arena, situation model researchers have relied mostly on reaction-time measures. Although informative with regard to ease of integration and the activation levels of information, these measures are not the optimal tools to provide a window into the nature of the mental representation.

Two developments will enable us to overcome these limitations and examine the contents of situation models more closely and thus test the assumption inherent in the situation-model view (but rarely made explicit) that language comprehension is guided experience. In the theoretical arena, frameworks have been developed to view language as relying on basic perceptual/experiential representations, chiefly in linguistics (e.g., Heine 1997; Johnson 1987; Lakoff 1987; Lakoff & Johnson 1980; Langacker 1987; Talmy 1988) and more recently also in cognitive psychology (Barsalou 1999; Glenberg 1997; Goldstone & Barsalou 1998; Harnad 1990; Kelter, in press; MacWhinney 1999) and neuropsychology (Bisiach 1988).
The experience-based framework makes specific predictions about the content of situation models, which are amenable to empirical tests. Thus, a theoretical framework and representational format now exist that can be used to generate specific predictions with regard to the experiential nature of situation models. As such, experience-based theories throw a new searchlight (Popper 1985) on language comprehension.

Second, there are two advances in the methodological domain that will enable research into the perceptual nature of situation models. First, brain imaging techniques such as PET (positron emission tomography) and fMRI (functional magnetic resonance imaging) enable researchers to examine and localize patterns of activation in the brain as a function of exposure to verbal stimuli. As such, these methods provide information with regard to the potential overlap in brain areas activated by experience and by language comprehension. In addition, it is possible to use different behavioral measures to examine experiential aspects of comprehension, e.g., the presentation of pictorial stimuli. We will discuss recent research using these methods in sections 3 and 5.

2.4. Situation models and representations of the verbal input

Researchers have proposed multi-level kinds of mental representations that accommodate both mental representations of the verbal input (its surface structure or its propositional structure) and mental representations of the referential situation (e.g., Anderson 1983; Johnson-Laird 1983; Kintsch 1998; Paivio 1971). There seems to be general agreement that language comprehension involves both a record of the verbal input and a mental representation of the described state of affairs. However, there are no unequivocal proposals regarding the representational format of situation models.
For example, while Kintsch (1998) views situation models primarily in terms of propositional networks, he also allows for the possibility that aspects of situation models are encoded as mental images, a position that appears consistent with Anderson (1983). Johnson-Laird (1996), on the other hand, draws a distinction between mental models and mental images, the former being more abstract than the latter.

Barsalou (1999) recently proposed a view of situation models as perceptual simulations, that is, mental representations involving perceptual symbols, which are records of the neural activation that occurs when events in the environment or in the observer's body are perceived. This view offers a more natural account of situation-model construction than does the amodal view and as such, it is central to our main thesis. It will be discussed in more detail in section 2.6. We furthermore derive from Barsalou's proposal the idea that if a representation of the verbal input is stored in LTM, it is stored in the form of schematic perceptual symbols of the experiential states of hearing or seeing words. As we will argue in section 7, these representations can be used to derive referential meaning, but are as such not representations thereof.

2.5. Experience-based representations as the building blocks of cognition

Our view trades on the assumption that people are able to segment the information coming in during direct experience into meaningful units, e.g., events and objects. Researchers from various research domains have made the same assumption (e.g., Croft 1998; Donald 1991; Gibson 1979; Goldberg 1998; Johnson 1987; Lakoff 1987; Langacker 1987; Nelson 1996; Talmy 1988; Zacks & Tversky, in press; Zwaan & Radvansky 1998; Zwaan 1999). Zacks and Tversky (in press) discuss a range of evidence from a variety of disciplines attesting to the human ability to carve up the perceptual stream into events.
Specifically, the human perceptual system is attuned to changes in the environment and instances of maximal perceptual change are used as cues for where to place event boundaries (Newtson, Engquist, & Bois 1977). Thus, we assume that experience-based representations form the building blocks of language comprehension. With many others, we assume that these event representations are extracted from direct experience and are stored as such in the brain, where they can be reactivated by verbal input, leading to an experiential simulation of the event in question (cf. Barsalou 1999).

2.6. Situation models as experiential simulations

Following Barsalou (1999), we view situation models as perceptual simulations, although we use the broader, and in our view more appropriate, term experiential simulations. Barsalou (1999, p. 641) defines perceptual simulations as "the top-down activation of sensory-motor areas to reenact perceptual experience." In the case of language comprehension, the verbal input serves as a set of cues to activate perceptual symbols stored in LTM and to integrate them into experiential simulations (see section 5 for a detailed discussion). Thus, in contrast to existing situation model theories of discourse comprehension, we consider situation models to be dynamic representations (Freyd 1992), the building blocks of which are experiential event simulations.

3.0. Claim 1: Language comprehension is experientially based

3.1. Language and displacement

Language use is an exclusively human capability and perspective taking is one of its essential characteristics. There are several features that set language apart from other forms of communication, but of particular interest here is the feature of displacement (Hockett 1959). While humans must have already had the ability to mentally represent interpretations of non-present situations, the advent of language made it possible to communicate those interpretations to other minds.
Thus, with language, communication was no longer bound to the "here-and-now" of the immediate physical surroundings and humans were enabled to take different perspectives. As such, displacement allows us to convey events in ways that deviate from our everyday experience. For example, we can describe events in an order different from their chronological order, something Aristotle considered a hallmark of fiction (Poetics, trans. 1967).

3.2. Mimetic competence

The primary function of language is to convey a mental representation of an actual ("The roads are slippery"), a desired ("Please close the door"), an undesired ("Don't walk on the grass"), or even a fictitious ("If I were a millionaire ...") state of affairs from one mind to another. It makes sense to assume that some ability to mentally represent and understand events in the world was already present in hominids before the advent of language. Donald (1998) refers to this as episodic competence: "Remembering and responding to social situations is a complex multi-channel task that demands the integrated use of several brain functions. One of these functions is social event-perception, and its corollary, event parsing, which includes an implicit understanding of the agents, their interactions, their effects on the contingencies of action, and the consequences of perceived episodes." (pp. 58-59). Episodic competence makes substantial demands on attentional and WM resources, given that models of the environment need to be maintained and constantly updated as (social) situations change. It became especially important when cultures of ever-increasing complexity started developing. Donald argues that mimetic skill, the skill to mentally and physically simulate events, can be viewed as the basis for early hominid accomplishments, such as toolmaking.
To make a tool, one has to envision its purpose, choose a material, and design a shape that best meets the constraints of the human body and hand and of the object that the tool is designed to operate on. In other words, one needs to form a situation model of the event of using the tool before the tool is made. This skill provides a foundation for the development of language, because it underlies human conventionality. Donald proposes that the brain areas involved in creating this behavior form an executive suite, which includes prefrontal cortex, tertiary areas of the parietal-temporal cortex and most of the insula, cingulate gyrus and hippocampus, plus midline thalamus and basal ganglia. The availability of the left-hemisphere areas for language allowed hominids to convey situation models from one brain to another through the medium of language.

3.3. Body and environment as templates

The origin of language itself can be viewed in perceptual terms (Heine 1997). For example, the human body has functioned as a template for the development of all human number systems. In languages such as English, this is expressed only in the decimal number system, corresponding to our fingers. However, in other languages, the perceptual origin of the number system is more transparent. For example, in the Central Sudanic language Mamvu, the word for six is "the hand seizes one," for 10 it is "all hands," and for 20 it is "one whole person." Similarly, spatial terms in a variety of languages are based on three models: the human body in an upright position, environmental landmarks, and dynamic concepts (Heine 1997; Svorou 1994).

Basic perceptual representations can be metaphorically and metonymically extended to instances that ostensibly show a less direct perceptual origin.
Lakoff (1987) discusses how the Japanese classifier "hon" is typically used to classify long and thin objects, such as sticks, pencils, hair, trees, and ropes, but is extended to refer to martial arts contests involving long and rigid objects such as staffs and swords and even to martial arts contests that do not involve staffs or swords. Given that staffs and swords are the primary functional objects in martial arts such as kendo, they are associated with the primary goal of matches. Therefore, "hon" is also associated with a win. A similar conceptual extension underlies the use of "hon" with respect to baseball. For example, "hon" classifies hits in baseball that involve a straight trajectory. Lakoff argues that this extension has two motivations. First, straight trajectories resemble the image schema of a long and thin object. Second, hits are the primary goal of baseball. Because ground balls, foul balls, and pop flies do not meet either of these criteria, they are not typically classified by "hon".

3.4. The structure of verbal descriptions

The experiential origin of language is also reflected in the typical structure of verbal descriptions. For example, the default order in which events are narrated is their chronological order (Dowty 1986). Thus, the order in which events were perceived to occur in a real or fictional situation is a major mechanism for structuring verbal descriptions. It has been demonstrated empirically that deviations from chronological order, as in "Before he patted the dog, he jumped the fence," lead to (minor) processing difficulties compared to their chronology-respecting counterparts (Mandler 1986; Muente, Schiltz, & Kutas 1998). Another example is the "strong iconicity assumption" (Zwaan 1996), according to which comprehenders expect consecutively narrated events to have occurred in a temporally contiguous fashion, unless told otherwise. Consistent with this principle, Grimes (1975, p.
36) observed that in Kate, a language of Papua New Guinea, temporally contiguous events are grammatically separated from events "that are separated by a lapse in which nothing of significance for the story happens." Zwaan (1996) provided empirical support for the strong-iconicity assumption. Consecutively narrated events are more easily understood when they occurred consecutively in the narrated situation than when there is a time lapse (as in "an hour later"). Findings such as this support the claim that comprehending such a verbal description means constructing an experiential representation of the described states of affairs.

3.5. Ontogenesis of language ability

3.5.1. Mental event representations.

Pre-verbal children parse incoming information into meaningful representations not only of objects and spatial arrangements (Mandler 1996), but also of sequences of events (Nelson 1996). These representations provide a cognitive scaffolding for the child's acquisition of language. While it is relatively easy to see how children form perceptual representations of objects, events are by nature evanescent. However, Wynn (1996) demonstrated that 6-month-old infants are able to individuate actions by using perceptual cues. This is quite easy when temporal boundaries consist of a contrast between motion and nonmotion. However, the infants were even able to segment streams of continuous action. The underlying mechanisms by which they achieve this are not well understood. One hypothesis (e.g., Zacks & Tversky, in press) is that they involve being able to detect the transition from one kind of motion into another kind of motion. In other words, the infants use moments of maximal perceptual change as cues for establishing event boundaries. Another hypothesis is that events become cognitive entities after they have been repeatedly experienced in different contexts (Avrahami & Kareev 1994).

3.5.2. Construction grammar.
Atomic events underlie the acquisition not only of words, but also of argument structures (e.g., Croft 1998; Goldberg 1998). Argument structures carry meaning and correspond to basic perceptual events (Lakoff 1987; Langacker 1987; Talmy 1988). For example, in English the basic event of caused motion (X causing Y to move Z) takes the following form: Subject-Verb-Object-Oblique. This construction corresponds to the light verb "put," which is acquired by children at a young age. Once learned, this construction can be used to understand sentences such as "Pat sneezed the foam off the cappuccino" despite the fact that "sneeze" is an intransitive verb (see Goldberg 1998 for a discussion). The process here is one of metaphorical extension (e.g., Lakoff 1987).

3.6. Evidence on being "in" the referential situation

Much of the literature on situation models (as reviewed in Zwaan & Radvansky 1998) can be construed as consistent with the experience-based view advanced here. For example, Zwaan (1999) proposed that the evidence is consistent with the view that comprehenders behave as if they are "in" the narrated situation (see also Gerrig 1993). Mental representations of objects that are "in" or relevant to the current situation are more activated than those of objects that are not (e.g., Glenberg, Meyer, & Lindem 1987; MacDonald & Just 1989; Morrow, Bower, & Greenspan 1989; Morrow, Greenspan & Bower 1987; Rinck & Bower 1995); events that are currently ongoing in the situation are more activated in the comprehender's WM than events that are not (Magliano & Schleich 2000; Rinck & Bower, in press; Zwaan 1996; Zwaan et al., in press); protagonists that are in the current situation are more activated than protagonists that no longer are (Anderson, Garrod, & Sanford 1983; Carreiras et al. 1997); and goals that have not been accomplished yet are more activated than accomplished goals (Trabasso & Suh 1993).
Moreover, readers assume the spatial perspective of a protagonist in the story (Bower, Black, & Turner 1979; Bryant, Tversky, & Franklin 1992; Franklin & Tversky 1990; Morrow & Clark 1989; Rall & Harris 2000). This shift from an environmental perspective to a perspective within the referential world, made possible because of the feature of displacement (Hockett 1959), has been termed the "deictic shift" (Duchan, Bruder, & Hewitt 1995; Gerrig 1993). Recent evidence suggests that children learn perspective taking and situation model construction from language at a very early age (Rall & Harris 2000). Like adults, three- and four-year-olds recall deictic verbs of motion, such as "come" and "go" and "bring" and "take," more accurately if they are consistent with the perspective of the protagonist than if they are not, suggesting that they assume the perspective of the protagonist.

4.0. Claim 2: Perceptual symbols are used in language comprehension

4.1. Perceptual processes and representations in language comprehension

Stanfield and Zwaan (in press) conducted a direct test of the experience-based view. They presented subjects with sentences such as "Rick put the pencil in the cup" and "Rick put the pencil in the drawer." The sentence was followed after 250 ms by a picture of an object that was or was not denoted by the sentence. On critical trials, the pictured object had been mentioned in the sentence (e.g., pencil) and was presented in one of two orientations, vertical or horizontal, thus creating a match or a mismatch with the implied orientation of the object mentioned in the sentence. The subjects responded as quickly as possible whether the picture depicted an object mentioned in the sentence. Stanfield and Zwaan found that recognition responses were significantly faster when there was a match between the implied orientation and the orientation of the picture than when there was a mismatch (838 ms vs. 882 ms).
Amodal propositional theories do not predict such a finding, given that the sentences did not state anything about the object's orientation and given that no amodal theory predicts that comprehenders make inferences about object orientation when presented with verbal stimuli such as these. Of course, the finding is exactly what would be predicted by the experience-based view.

Klatzky, Pellegrino, McCloskey, and Doherty (1989) found that the comprehension of verbally described actions was facilitated by primes specifying the relevant hand shape, which subjects had been trained to enact previously. For example, sensibility judgments for phrases such as "inserting a key" or "picking a grape" were faster when preceded by the prime "finger grasp." Having ruled out alternative explanations, Klatzky et al. interpret these findings as suggesting that a cognitive/motoric simulation of the hand shape was responsible for the priming effect. In the terminology used here, the perceptual symbol for the hand shape was activated by the prime word and was then readily available for the experiential simulation of the described action.

Morrow and Clark (1989) observed that comprehenders' interpretation of motion words, such as "approach," depends on physical characteristics of the situation. Specifically, it depends on the size of the object and the landmark. For example, in "The mouse approached the fence" subjects represented the mouse as being much closer to the fence than they did the tractor in "The tractor approached the fence." Similarly, the tractor would be closer to the landmark in "The tractor approached the fence" than in "The tractor approached the farm." It seems that an embodied-perceptual account could explain this more readily than an amodal symbolic system. For how would the various meaning gradations of action verbs be represented in such a system?
It would seem unlikely that we have stored multiple copies of "approach" in which its meaning is defined relative to specific combinations of objects and landmarks (see sect. 5.5. for a discussion of compositionality). 5.0. Claim 3: Language as processing cues 5.1. Towards a taxonomy of processing cues Cognitive linguists view language as a set of processing instructions to the comprehender so as to construct a mental representation of the described states of affairs (e.g., Givon 1992; Langacker 1987; Talmy 1988). Based in part on this research, as well as on research on situation models, we propose a taxonomy of processing cues in which we distinguish between construction and profiling cues on the one hand and integration cues on the other. Construction and profiling cues both involve the construction of single event representations, whereas integration cues involve the manner in which the event currently being simulated is integrated with the evolving situation model. It should be noted that our distinction between construction/profiling on the one hand and integration on the other is based on the nature of the processes involved, not on their temporal order. Thus, we do not necessarily want to claim that construction/profiling always precedes integration. 5.2 Construction cues Apart from the nature of the representations involved, the modal and amodal views make similar claims about construction cues. Linguistic forms are cues to the comprehender to activate information from semantic or episodic LTM into WM and use it as the building blocks for mental representations. On the amodal view, these representations are stored amodally in LTM, and on the perceptual view, they are stored as perceptual symbols. Thus, the two examples provided below are not diagnostic with respect to the viability of our experiential view of language comprehension, but provide a characterization of the process we have in mind. 
The most basic construction cues are indefinite noun phrases (NPs) and verb phrases (VPs). An indefinite NP can be viewed as a cue to introduce a perceptual symbol in the situation model standing for an entity with the properties denoted by the noun and possibly included modifiers such as, for instance, adjectives. Examples for entities introduced by an indefinite NP are objects (e.g., a house, a green tomato), people (e.g., a farmer; a cute little boy), abstract entities (e.g., a good idea, freedom), or events (e.g., a festival, a brutal beating). In contrast to indefinite NPs, definite NPs do not usually trigger the construction of a new perceptual symbol but rather act as a signal to reactivate a symbol that is already part of the integrated episodic memory representation constructed for the purpose of assigning a new property to it or a new relation to it and other tokens. As another example, verb phrases with spatial prepositions instruct the comprehender how to assemble the denoted entities into a spatial configuration. Thus, for example, above is an instruction to place a perceptual symbol denoting one entity above that denoting another entity in the representation. This is the meaning of above. 5.3. Profiling cues Profiling can be likened to directing attention, and figure-ground separation (Langacker 1987). By profiling, speakers and writers control what information about a situation becomes activated in the comprehender's mind/brain by specifying which components are figures and which are ground, or by simply directing the comprehender's attention to a specific aspect of the event. Our visual attention system is extremely selective, such that we often fail to notice what afterwards appear as blatant changes in our visual field, a phenomenon known as change blindness (e.g., Rensink, O'Regan, & Clark 1997; Simons & Levin 1997). 
If our visual cognitive system constructs representations only as needed and lets them disintegrate when attention shifts, we would not expect our cognitive system to be more constructive when the input is not directly perceptual, but verbal. Thus, profiling is not just a quirk of language. Profiling implies that we do not simply convey a situation, we convey a specific interpretation of that situation. For example, "The shark is below the seal" does not yield the same representation as "The seal is above the shark" (although it would in an amodal propositional representation). In the first case, the shark is the figure and the seal is the ground, whereas in the second case figure and ground are reversed (Langacker 1987). Thus, although both sentences describe the same spatial layout, they profile the situation differently. In perceptual terms, what we are doing here is manipulating the focus of attention in the perceptual field. Consistent with this view, McKoon, Ward, Ratcliff, and Sproat (1993) found that sentences such as However, lately he's taken up deer hunting lead to a reduced accessibility of DEER compared with However, lately he's taken up hunting deer. Again, this difference would not be captured in an amodal propositional system, as both sentences would have the same propositionalese translation HUNT(he, Deer) (see Perfetti & Britt 1995 for a discussion). However, it does provide evidence for the role of syntactic structures in profiling representations. In deer hunting, the entire activity is profiled (e.g., against an implied background of other leisure activities, such as sailing or skiing). In contrast, hunting deer profiles deer with respect to a background of other potential targets, e.g., ducks or bears. Thus, one would expect DEER to be more activated in the latter case than in the former. Context might profile a property of an object. 
For example, subjects confirm more quickly that tomatoes are round than that tomatoes are red after reading a sentence such as The little girl found a tomato to roll across the floor with her nose that emphasizes their roundness (McKoon & Ratcliff 1988; see also Tabossi & Johnson-Laird 1980). Another way to profile events is through the use of verb aspect (Croft 1998; ter Meulen 1995; Vendler 1967; Verkuyl 1972). Verb aspect profiles the temporal contour of events. For example, in He swept the floor, the action of sweeping is profiled as punctual and completed. Of course, world knowledge tells us that sweeping is not a punctual activity, but linguistic analyses suggest that this is how it should be interpreted. On the other hand, He was sweeping the floor profiles the action as extended through time. The simple past tense describes the action of sweeping as punctual and can thus be viewed as "a semantic ticket to disregard change internal to the described event, to treat it as atomic and close off its internal structure to further description" (ter Meulen 1995, p. 12). On the other hand, the progressive profiles the action as ongoing. As such, it opens up its internal structure. Zwaan and Stanfield (submitted) investigated this prediction empirically. They presented subjects with sentences such as The teacher swept the floor and The teacher began sweeping the floor, embedded in short narratives. The sentences were followed by a word denoting an instrument typically used in the action (a broom in this case). In a condition where the broom had not been previously mentioned in the text, the correct recognition response was NO. Subjects were reliably slower in making a correct rejection when the action was described as ongoing than when it was described as punctual. This finding, as well as Zwaan and Stanfield's additional findings, supports the view that verb aspect is a cue to profile events. Perspective of description can also be considered a profiling cue. 
A sentence such as Susan was sitting in the living room when Bill came in profiles the situation from the perspective of Susan whereas the sentence Susan was sitting in the living room when Bill went in profiles the situation from the perspective of Bill. Both adult and preschool comprehenders have been demonstrated to assume a protagonist's perspective (Bower, Black, & Turner 1979; O'Brien & Albrecht 1992; Rall & Harris 2000). Another way to profile is to make use of pragmatically marked constructions. As was mentioned above, the definite article is usually used in cases in which the corresponding entity is already available to the addressee. Using the definite article in a context in which the corresponding entity is not already available to the addressee can be considered a means to highlight the corresponding entity and does indeed lead to enhanced activation for its mental representation (Gernsbacher & Schroyer 1989). Choosing an adequate level of grain size is also a matter of profiling. Plural expressions such as the children or they seem to function as a signal to the reader/listener not to distinguish between the individuals and to represent only one token (standing for all the children referred to), whereas partitioning plural expressions such as several of the children or both signal that the individuals are to be kept distinct (Kaup, Kelter, & Habel, submitted). This difference is, for instance, evidenced by responses to a sentence-interpretation task, in which subjects were told that each of a list of sentences refers to two particular individuals. Notwithstanding the instruction, subjects interpreted sentences differently depending on whether the subject NP was they or both. 
A sentence such as Both bought two bars of chocolate was interpreted as describing a situation in which four chocolate bars were bought, whereas a sentence such as They bought two bars of chocolate was interpreted as describing a situation in which only two chocolate bars were bought. There are various other profiling cues. One example is voice (active vs. passive). By saying "Reggie fouled Shaquille," we make Reggie the figure and Shaquille the ground whereas the reverse is accomplished by "Shaquille was fouled by Reggie." Thus, in the first case we may be led to wonder about Reggie's motives, whereas in the second case we may wonder about the consequences for Shaquille. Another example is manner of reference (e.g., proper name vs. role). Introducing a character with a proper name usually signals that this is the main character of a story whereas introducing a character by his or her role name (e.g., "the waitress") typically signals a scenario-bound character (Anderson, Garrod, & Sanford 1983), that is, a character bound to only one of the settings of a story (e.g., a restaurant). The main character is more likely to become the vantage point from which the reader experiences the story than a scenario-bound character. It is important to note that different languages may profile different aspects of a situation. For example, when English speakers use hang for hanging up a coat, they profile the orientation of the coat and not how it is attached. In contrast, when speakers of Korean use kelta to denote the same action, they profile how the coat is attached and not its orientation (Bowerman 1996). 5.4. Integration cues Integration cues guide the integration of the event representation constructed in WM with the ongoing simulation. For example, the current simulation may involve the same agent as the previous simulation, or the same location, time, or object, or all of the above. 
In all these cases, the function of integration cues is to inform comprehenders which elements from previous simulations should be reactivated into WM to be reused in the current simulation. Recent electrophysiological evidence suggests that integration at the discourse level is a rapid process (van Berkum, Hagoort, & Brown 1999). When a sentence contradicts an earlier sentence, a negative-going brain wave is generated, which peaks around 400 ms. This so-called N400 also occurs when a word does not fit in with the sentence in which it occurs. These findings suggest that integration at the discourse level is at least as rapid as integration at the sentence level. Several other theories have been proposed with regard to the integration of discourse (e.g., Gernsbacher 1990; Kintsch & van Dijk 1978; Kintsch 1988; Myers & O'Brien 1998), but none from a perceptual perspective. These theories explicitly or implicitly assume a propositional point of view and regard text comprehension as the integration of propositions, whereas we propose that comprehension involves the incorporation of elements from previous mental simulations into current mental simulations. There is a variety of integration cues in language. Some of the most obvious ones are definite NPs and sentential connectives such as and, then, because, and however. As noted in the previous section, definite NPs are cues to reactivate already represented information for the purpose of assigning further properties to the representation. Sentential connectives, on the other hand, connect simulations, typically on temporal and causal dimensions. Particular VPs explicitly tell the comprehender to discontinue a simulation, as in "John was playing the piano. When his mother came in he stopped." Here, the final verb phrase stopped tells the comprehender to discontinue the simulation of John playing the piano. If the final verb had been continued, this would have been a cue to maintain the simulation of John playing the piano. 
Indeed, Zwaan, Madden, and Whitten (in press) found that the activation level of PLAYING in the comprehender's WM decreases when the text states that the activity is discontinued. A time adverbial such as an hour later not only tells the comprehender that the upcoming event occurred later than the previously simulated event, it also specifies the lapse in time between the two events. As such, it informs the reader about a discontinuity in the simulation. Given that we do not experience such discontinuities in real life, it can be expected that they provide a processing problem for the comprehender, and indeed research shows they do (Zwaan 1996). 5.5. Compositionality The process of integration presupposes that meaning is created by combining perceptual symbols in some fashion. For instance, the perceptual symbol for zebra can be acquired by combining the perceptual symbols for horse and striped even without ever actually having seen a zebra. The principle of compositionality, according to which the meaning of the whole is a function of the meaning of its parts has been a central issue in linguistic and philosophical work on the semantics of natural language (e.g., Barwise & Etchemendy 1989; Kamp & Partee 1995; see also Estes & Glucksberg 2000; Osherson & Smith 1981; Wisniewski 1997 for psychological perspectives). Consistent with our view, linguistic analyses suggest that perceptual representations play a major role in the interpretation of particular adjective noun combinations. For instance, the adjective round can be used to describe objects of various shapes and geometries (e.g., a ring, a plate, a table, a bubble, a head etc.), which at first sight seems to suggest that the meaning of round is highly context dependent or in other words that round is a polysemous adjective. However, the flexibility of round can be explained without using polysemy if perceptual representations are taken into account (Lessmoellmann 2000). 
Specifically, round accesses shape parameters that according to Biederman's (1987) recognition-by-components theory are encoded in the 3-D model of an object and are computed during object recognition. As such, round denotes the shape of the boundary of an object's cross section and specifies it as having no vertices. Accordingly, the property of being round can be attributed to an object by using the adjective round if and only if, the representation of the object has a cross-section the boundary of which in principle can be round. This explains our ability to use round to describe all the different kinds of objects mentioned above. This also explains why we, for example, cannot use the expression the rope is round to describe a situation in which a rope is laid out on the floor so that it forms a circle, or why we cannot combine the adjective round with objects that are typically conceptualized as one dimensional, as for instance streets or trails. 6.0. Claim 4: Overlapping brain areas 6.1. Neuronal assemblies Areas in the left frontal and the left temporal lobes have traditionally been viewed as the sites of language processing, as is still the case (e.g., Binder et al. 1997). Although the involvement of these areas in language processing is undeniable, recent brain imaging studies are beginning to show that language processing extends far beyond the boundaries of Broca's and Wernicke's areas. For example, a recent PET study found that threat words, such as destroy and mutilate presented as part of a modified Stroop task, activated bilateral amygdalar regions to a greater extent than do neutral control words (Isenberg, et al. 1999). The amygdala's role in emotional processing is well documented (e.g., LeDoux 1995). In addition, activation was found in sensory-evaluative and motor-planning areas, areas that are normally activated when the organism senses danger. 
This is all the more noteworthy given that the subjects' ostensible task was not comprehending words, but naming the color of their ink. Pulvermueller (1999) proposed a Hebbian model of word recognition that accommodates findings such as these. The perception of a word activates assemblies of neurons located throughout the brain. For example, some action verbs will activate parts of the motor cortex, whereas animal nouns will activate parts of the visual cortex. In their commentary to the Pulvermueller article, Posner and DiGirolamo (1999), while in general agreement with Pulvermueller's view, argued that it is too rigid. They argued that which parts of the assembly will be activated depends on the semantic and task context in which the word is processed. This is consistent with the behavioral literature (e.g., see the tomato example of McKoon & Ratcliff 1988 discussed in sect. 5.3). Thus, a context-sensitive version of the assembly model is a useful working model from which to develop a brain-based model of discourse comprehension. In this model, reading or hearing a word activates linguistic (lexical, grammatical, phonological) representations as well as associated nonlinguistic information (motor representations in the case of action verbs, visual representations in the case of object nouns, emotional representations in the case of emotion adjectives, and so on). This clearly implies that language-based activation is not restricted to the language areas of the brain. 6.2. The role of language cues Taking Pulvermueller's proposal one step further, we suggest that language cues influence the nature of the pattern of activation. For example, as discussed earlier, our behavioral experiments (Zwaan & Stanfield, submitted) show that "The teacher swept the floor" is less likely to activate the concept of "BROOM" than "The teacher began sweeping the floor". 
Extrapolating from these findings, one might predict that the simple past tense version is less likely than the past progressive to activate visual and motor areas of the brain associated with the action of sweeping and its instrument. Thus, not only the semantic context, but also the particular form of the linguistic expression in which a word occurs might affect the nature of the pattern of activation it produces in the brain. 6.3. Neuroimaging of discourse comprehension Several recent neuroimaging studies of discourse processing show activation of non-language areas during discourse comprehension, which is consistent with the claim that language comprehension engages more than the traditional language areas of the brain. We discuss these studies along the lines of our taxonomy of processing cues. As there is to date no research on profiling cues, we focus on research on construction and integration cues. 6.3.1. Neuroimaging of construction processes. Mellet et al. (1996) conducted a PET study to examine regional cerebral blood flow changes as subjects constructed mental images from verbal input. In the experimental condition, subjects began with the mental visualization of a single cube. The verbal input consisted of (French) prepositions (up, down, left, right, front, back) specifying the relative positions of subsequent imagined blocks that the subjects used to assemble a mental image of a three-dimensional object that would ultimately consist of 12 blocks. Thus, this study explicitly instructed subjects to use prepositions as cues to construct a situation model. Relative to a passive listening condition (to nonspatial words) and relative to a resting condition, activation was found in regions that make up the dorsal route of spatial processing (superior occipital and parietal regions), a route that has been shown to be activated in the spatial processing of external visual stimuli (Mishkin, Ungerleider, & Macko 1983). 
Thus, verbal input in the form of spatial prepositions produced, in the context of the imagery task, activation in brain areas that subserve visual spatial processing. Carpenter, Just, Keller, Eddy, and Thulborn (1999), in an event-related fMRI experiment, found that a sentence comprehension task activated brain regions known to be activated during spatial processing, such as mental rotation (areas in the left parietal lobe around the intraparietal sulcus). Carpenter et al. presented subjects with sentences such as "The star is (not) above the plus" after which the subject pressed a button, which triggered the presentation of a visual display (e.g., a picture of a star above a plus) to which the subjects responded with a TRUE/FALSE response by pressing one of two buttons. The event-related paradigm enabled Carpenter et al. to draw some conclusions about the time course of activation: the parietal areas were activated while subjects were reading the sentences (rather than merely during picture verification). In addition, they found activation in the right posterior temporal lobe, which they interpreted as evidence that the pictorial referents of the sentences were activated. Mellet, et al. (2000), in an fMRI study, found that mental images of spatial layouts derived from pictures or verbal descriptions engaged the same visual mechanisms during an imagery task. In both cases there was bilateral activation of superior occipito-parietal areas, which might reflect the spatial processing required for the task and activation of the right inferior temporal gyrus, which is thought to subserve the formation of complex images, i.e., construction in our terminology. Although these results are interesting, it should be noted that the tasks were hardly naturalistic. Thus, the most prudent assessment of these results is probably that they show that brain areas used in spatial processing and imagery can be engaged during language comprehension. 
However, whether this occurs during spontaneous comprehension cannot be determined from these findings. 6.3.2. Neuroimaging of integration processes. In order to isolate the brain structures that subserve the integration of situation models, comparisons have to be made that control for basic language processing. One way to do this is by contrasting conditions that do not promote or allow for integration with conditions that do. Several recent neuroimaging studies have attempted to use this paradigm (Fletcher, et al. 1995; Maguire, Frith, & Morris 1999; Robertson, et al. 2000; St. George, Kutas, Martinez, & Sereno 1999). A tentative conclusion from these studies is that integration of information derived from verbal input involves brain areas traditionally not associated with language, e.g., areas in the right hemisphere. This conclusion has to be tentative considering that the fMRI and PET methodology puts constraints on experimental designs that would perhaps not be acceptable in experimental psychology and because of the small number of relevant studies. On the upside, the conclusion is consistent with various brain lesion studies showing that patients with right-hemisphere damage have comprehension problems at the level of discourse, as well as with experiments in which hemispheric activation is assessed by presenting verbal stimuli to the right or left visual field (see Beeman 1998 for an extensive review). 7.0. Claim 5: What is the role for a record of the verbal input? 7.1 When are records of the verbal input useful? If the situation model plays such an essential role in comprehension, is there a need to maintain the assumption that the brain for some amount of time maintains a record of the verbal input? There are arguments that there is. A verbal representation is useful in the face of indeterminacy. 
That is, when the verbal input does not sufficiently constrain the range of simulations to be made, comprehenders tend to rely on an uninterpreted representation of the verbal input until sufficient constraining information has been processed (Mani & Johnson-Laird 1982). 7.2. Evidence from brain lesion studies Recent brain lesion evidence supports a dissociation between memory for verbal input and situational information. Romani and Martin (1999) tested a subject, AB, who was operated for a left-frontal haematoma, which produced low density regions in the postero-lateral left frontal lobe and in the adjacent anterior parietal lobe. Whereas AB showed preserved abilities regarding the long-term retention of nonverbal visual information, he was impaired in the long-term memory of words, behaving at the level of amnesic control subjects. Interestingly, despite this handicap, AB exhibited a performance in the normal range with respect to comprehension of and memory for stories. His memory for individual sentences was below the normal range, however. This pattern seems consistent with the view that the language areas in the left hemisphere are involved in maintaining a record of the verbal input. When these areas are damaged, recall of verbal information is greatly impaired. However, the fact that the ability to form situation models from verbal input (as well as from pictorial input) was preserved, suggests that the maintenance of verbal information is not necessary to construct situation models (from simple stories at least). Verbal reports from spatial-neglect patients also suggest that language-like amodal propositions are not used by the brain to represent referential meaning. These patients show in their descriptions of complex objects or spatial layouts from a specific vantage point an impaired representation of the side contralateral to the lesion and show impairment of the other side when the perspective is reversed (Barbut & Gazzaniga 1987; Bisiach 1988). 
Bisiach reports about a patient who, when asked to point to the right side of his bed responded adequately, but when asked to point to the left side of his bed answered after some hesitation: "If this is the right side [the external, right surface of the right bed rail], this [the inner, left surface of the same bed-rail] must be the left!" (Bisiach 1988, p. 478). As Bisiach notes (p. 466), the surprising aspect of findings such as these is not so much that people use spatial representations, but rather that language apparently was unable to fill the gap in the spatial representation. This is consistent with the idea that language is used to convey referential meaning, but that language-like amodal propositions are not used to represent it mentally. Referential meaning is critically dependent upon input from experiential representations. 8. Conclusions and outlook 8.1. Summary We have proposed a view of language comprehension as "guided experience," a variation on Neisser's (1967, p. 136) characterization of language comprehension as "externally guided thinking." Not only is language itself built upon a perceptual foundation, but the comprehension processes it engenders are largely the same processes that are activated by direct experience. We always experience reality from a specific vantage point: our own. Language allows a speaker or writer to convey an interpretation of a situation at time X and place Y to a hearer or reader at time X' and place Y'. The verbal message provides the hearer/reader with a set of cues to perceptually reconstruct the sender's interpretation. We have adduced evidence from a range of fields that is consistent with this view, including cognitive linguistics, cognitive psychology, and cognitive neuroscience. In the first section, we made five assumptions on which our view of language comprehension as guided experience is based: 1. Comprehending language is based on how we experience our environment. 2. 
Language comprehension involves perceptual symbols. 3. Linguistic expressions are cues to the mind/brain as to how to construct situation models. 4. The construction of situation models based on verbal input involves the same brain areas that are involved in the construction and maintenance of experience-based situation models. 5. The mind/brain may keep an uninterpreted record of the verbal input, but this is not critical for comprehension once an experiential referential representation has been established. The first assumption is supported by considerations about the phylogenetic and ontogenetic development of language and by analyses of the history of language itself. In addition, it is supported by a large number of experiments on situation models. The second assumption has recently become the topic of empirical studies and has already received some direct support, but more is clearly needed. The third assumption is supported by many careful linguistic analyses, as well as by several language comprehension experiments, which show that relatively subtle linguistic differences can greatly affect the nature of mental representations. There is also support from neuroimaging research. The fourth assumption is beginning to receive support from neuroimaging studies implicating many areas hypothesized to be part of the executive suite (sect. 3.2.) in the construction and maintenance of language-derived situation models. The fifth assumption is supported by brain-lesion studies, as well as by text-comprehension experiments. Although the argumentation would have to be rather contorted, it could be argued that an amodal propositional system could account for many of these findings. For example, the fact that brain areas involved in perceptual processing are activated during language processing does not in and of itself mean that language processing involves perceptual symbols. 
Without any further evidence, it cannot be ruled out a priori that amodal propositions are activated in those areas (Zwaan, Stanfield, & Madden 1999). However, we now have initial empirical evidence that language comprehension involves perceptual symbols (sect. 4.1.), although more evidence is clearly needed. 8.2. Experience vs. language comprehension Throughout this article, we have regarded language comprehension as guided experience. The operative word here is "guided." We do not want to equate language comprehension with experience. There are at least four ways in which language comprehension is not identical to experience. First, experiential representations constructed from verbal input are less determined by the actual input than are representations constructed in direct experience. Thus, when constructing experiential representations during language comprehension, we are, for instance, more free to deliberately abstract or schematize than we are in direct experience. As a consequence, language comprehension will often (though perhaps not always) produce a lower resolution than direct experience. Second, language comprehension often involves the construction of an experiential representation of a situation that is not the comprehender's physical situation (manuals, tutorials, and drug prescriptions are exceptions). Thus, in language comprehension, the comprehender actually has to suppress the perceptual input in order to form a language-based experiential representation (Glenberg 1997). More often than not, such suppression is not needed in experience. A notable exception is when verbal input has to be suppressed when a complex task has to be performed (e.g., driving in a large and unfamiliar city). Third, language comprehenders are not free to choose a perspective or focus on the situation, as we are, to some extent, in real life. The perspective and focus are provided to us by the speaker/writer. 
As a consequence of this, language-based experiential representations are often less ambiguous or, in other words, interpreted to a stronger degree than are representations constructed in direct experience. A fourth difference is that the processes by which the representations are constructed are quite different. It is conceivable that these differences are in some way reflected in the resulting representations. In fact, research on reality monitoring has shown that subjects' ability to distinguish between different sources from which information was acquired is mainly a function of the operations involved in the construction of a particular representation (e.g., Johnson, Raye, Foley, & Foley 1981). Given these restrictions, it does not make sense to equate language comprehension with experience. There often is less detail, there is competition from actual perceptual representations, and we have less freedom to perceptually explore the situation. It is, thus, more appropriate to view language comprehension as guided experience. In addition, it may prove beneficial in understanding the nature of language-derived representations to further elucidate this distinction empirically. Given the many similarities between language representations and perceptual event representations presented here, future research aimed at revealing the differences may provide a better understanding of language comprehension. 8.3. Potential problems for a perceptual view 8.3.1. What about nonnarrative language? Our focus has been on narratives. As mentioned (sect. 1.1.), there are good reasons for this. Narratives are the most "natural" discourse genre. Every culture has produced them and they constitute the first genre that children are exposed to. People spontaneously produce narratives, whereas they, whether in grade school or in academia, only produce non-narrative texts under some pressure (publish or perish). 
However, if we view language comprehension as guided experience, then this view should extend to non-narrative genres. The experience-based view lends itself readily to explaining procedural text performance (Glenberg & Robertson 1999). Procedural texts are sets of instructions to the comprehender to physically carry out actions and the referents are actual objects in the comprehender's immediate environment: If this doesn't work either, simultaneously hold down the Control, Alt, and Delete keys; Please sign on the dotted line; Connect Panel A to Panel B using the 3/4" screws. However, the matter seems less straightforward for expository text. For example, how does one construct an experience-based representation of an article such as this? It has been noted very often that most expository texts are replete with perceptual metaphors. For example, in this article, we talked about the construction of mental representations. Thus, one might create an experience-based representation of an edifice being built, or, more likely in the case of cognitive scientists, of a network to which nodes and links are incrementally added. Lakoff and Nunez (1998) show how even mathematical concepts such as sets and functions can be traced back to everyday bodily experience. Goldstone and Barsalou (1998) discuss other relevant evidence supporting the notion of a perceptual foundation for abstract concepts. There is also evidence that people can represent seemingly abstract relations such as ownership in situation models (Radvansky, Wyer, Curiel, & Lutz 1997). Nonetheless, the question remains as to whether people routinely construct experiential simulations when they read expository text. We would argue that this is the case as long as the referents are known. Often, when we read an expository text outside of our domain of expertise, we might feel as if we are trapped in an experiment reading Bransford & Johnson's (1973) "washing clothes" passage without the title. 
In such cases, where no experiential simulation can be run, no real understanding takes place (that is, there is no indexing, Glenberg & Robertson, 1999). All the comprehender comes away with may be some rudimentary simulations and a memory record of the verbal input. This is why we often need to reread scientific passages. It takes a great deal more effort to construct experiential simulations than when we read narratives. 8.3.2. What happens during shallow processing? Prima facie, another potential problem for the experience-based view would be to explain what goes on during fast and shallow processing (McKoon & Ratcliff 1992), although this type of processing might not constitute comprehension per se as it does not necessarily involve an "effort after meaning" (Graesser, Singer, & Trabasso 1994) which, as a task analysis suggests (Garnham & Oakhill 1996), is the goal of comprehension. Nonetheless, it would seem incumbent upon an experience-based view to explain what is going on during this type of processing. After all, it would be unparsimonious to claim that no experience-based representations are activated during shallow processing. Fortunately, an experience-based account of shallow processing is quite straightforward. Shallow processing is characterized chiefly by a lack of integrative processes. However, this does not lead to the conclusion that no perceptual symbols are being activated and used. What it means is that those perceptual symbols are not consistently integrated in simulations. Thus, shallow processing poses no problem for our view of language comprehension as guided experience. It would not be much different from processing our environment in a shallow way (e.g., walking to the library while lost in thought). 8.4. Conclusion Our proposal has several advantages over theories that view comprehension as the construction of networks of amodal propositions. 
First, it is consistent with a range of findings from historical linguistics to neuroscience and is able to give a coherent account of these disparate findings. Second, it is able to account for findings that the amodal propositional view cannot account for, or would not have predicted. Third, it provides a more natural account of the comprehension process than do amodal propositional theories. If one does a task analysis of comprehension--ignoring for the moment that one would have to do such an analysis for different text genres and different reader goals--it becomes clear that we read or listen to discourse because we want to learn something about real or fictional events that took place in a different place and/or time (Garnham & Oakhill 1996). As Garnham and Oakhill argue, most psychologists have evidently failed to do such a task analysis, or else they would have realized that constructing coherent representations of the textual input is not normally the goal of comprehension. It is certainly not why language ability developed. It would be quite absurd to claim that there was selection pressure for language processing if its sole purpose was to construct mental representations of the verbal input. However, it does make sense to assume that language-based experiential simulations developed because there was selective pressure to coordinate actions and relate events across time and space. It is, thus, also quite natural to view the goal of language comprehension to be the vicarious experiencing of events from a different time/place. For these reasons, we think the notion of language comprehension as guided experience will prove to be a useful searchlight in the quest to understand the phenomenal human accomplishment of language. Author note. We thank Anders Ericsson, Stephanie Kelter, Leo Noordman, Mike Rinck, Jos van Berkum, and Jeff Zacks for very helpful comments on an earlier draft of this article. 
However, they are not necessarily endorsing all the views espoused here nor are they responsible for any errors or omissions. References Anderson, J. R. (1983) The Architecture of Cognition. Harvard UP. Anderson, A., Garrod, S. C., & Sanford, A. J. (1983) The accessibility of pronominal antecedents as a function of episode shifts in narrative text. Quarterly Journal of Experimental Psychology 35A: 427-440. Avrahami, J., & Kareev, Y. (1994) The emergence of events. Cognition 53:239-261 Barsalou, L.W. (1999) Perceptual Symbol Systems. Behavioral and Brain Sciences 22:577-660. Barwise, J. & Etchemendy, J. (1989) Model-theoretic semantics. In: Foundations of Cognitive Science, ed. M. I. Posner. MIT Press. Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review 94:115-147. Binder, J.R., Frost, J.A., Hammeke, T.A., Cox, R.W., Rao, S.M., & Prieto, T. (1997) Human brain language areas identified by functional magnetic resonance imaging. The Journal of Neuroscience 17:353-362. Bisiach, E. (1988) Language without thought. In: Thought Without Language, ed. L. Weiskrantz. Clarendon Press. Black, J. B., Turner, E., & Bower, G. H. (1979) Point of view in narrative comprehension memory. Journal of Verbal Learning and Verbal Behavior 18:187-198. Bower, G. H., Black, J. B., & Turner, T. J. (1979) Scripts in memory for text. Cognitive Psychology 11:177-220. Bower, G. H., & Morrow, D. G. (1990). Mental models in narrative comprehension. Science 247:44-48. Bowerman, M. (1996) Learning how to structure space for language. In: Language and Space, eds. P. Bloom, M.A. Peterson, L. Nadel, & M.F. Garrett. MIT Press. Bransford, J. D., Barclay, J. R., & Franks, J. J. (1972) Sentence memory: A constructive versus interpretive approach. Cognitive Psychology 3:193-209. Bransford, J. D., & Johnson, M. K. (1973) Consideration of some problems of comprehension. In: Visual Information Processing, ed. W. G. Chase. Academic Press. Bryant, D. 
J., Tversky, B. & Franklin, N. (1992) Internal and external spatial frameworks for representing described scenes. Journal of Memory and Language 31:74-98. Cabeza, R., & Nyberg, L. (2000) Imaging cognition II: an empirical review of 275 PET and fMRI studies. Journal of Cognitive Neuroscience 12:1-47. Carpenter, P.A., Just, M.A., Keller, T.A., Eddy, W.F., Thulborn, K.R. (1999) Time course of fMRI-activation in language and spatial networks during sentence comprehension. NeuroImage 10:216-224. Carreiras, M., Carriedo, N., Alonso, M. A., & Fernandez, A. (1997) The role of verbal tense and verbal aspect in the foregrounding of information in reading. Memory & Cognition 23:438-446. Clark, E. V. (1971) On the acquisition of the meaning of "before" and "after." Journal of Verbal Learning and Verbal Behavior 10:266-275. Clark, H. H. (1970) The primitive nature of children's relational concepts. In Cognition and the Development of Language, ed. J. Hayes. John Wiley. Clark, H.H., & Clark, E.V. (1977) Psychology and Language: An Introduction to Psycholinguistics. Harcourt Brace Jovanovich. Croft, W. (1998) The structure of events and the structure of language. In: The New Psychology of Language: Cognitive and Functional Approaches to Language Structure, ed. M. Tomasello. Erlbaum. Damasio, A.R. (1999) The Feeling of What Happens: Body and Emotion in the Making of Consciousness. Harcourt Brace & Company. Donald, M. (1991) Origins of the Modern Mind: Three Stages in the Evolution of Culture and Cognition. Harvard UP. Donald, M. (1998) Mimesis and the executive suite. In: Approaches to the Evolution of Language: Social and Cognitive Bases, eds. J.R. Hurford, M. Studdert-Kennedy, & C. Knight. Cambridge UP. Dowty, D. R. (1986) The effects of aspectual class on the temporal structure of discourse: Semantics or pragmatics? Linguistics and Philosophy 9:37-61. Duchan, J.F., Bruder, G.A., & Hewitt, L.E., eds. (1985) Deixis in Narrative: A Cognitive Science Perspective. Erlbaum. Estes, Z. 
& Glucksberg, S. (2000) Interactive property attribution in concept combination. Memory & Cognition 28:28-34. Fletcher, P.C., Happe, F., Frith, U., Baker, S.C., Dolan, R.J., Frackowiak, R.S.J., & Frith, C.D. (1995) Other minds in the brain: a functional imaging study of "theory of mind" in story comprehension. Cognition 57:109-128. Freyd, J. J. (1992). Five hunches about perceptual processes and dynamic representations. In: Attention and Performance XIV. Synergies in Experimental Psychology, Artificial Intelligence, and Cognitive Neuroscience, eds. D. E. Meyer & S. Kornblum. MIT Press. Franklin, N., & Tversky, B. (1990) Searching imagined environments. Journal of Experimental Psychology: General 119:63-76. Garnham, A., & Oakhill, J. V. (1996) The mental models theory of language comprehension. In: Models of Understanding Text, eds. B. K. Britton & A. C. Graesser. Erlbaum. Gernsbacher, M. A. (1990) Language Comprehension as Structure Building. Erlbaum. Gernsbacher, M. A., & Shroyer, S. (1989) The cataphoric use of the indefinite this in spoken narratives. Memory & Cognition 17:536-540. Gerrig, R. J. (1993). Experiencing Narrative Worlds. New Haven: Yale UP. Glenberg, A.M. (1997) What memory is for. Behavioral and Brain Sciences 20:1-55. Glenberg, A. M., Meyer, M., & Lindem, K. (1987) Mental models contribute to foregrounding during text comprehension. Journal of Memory and Language 26:69-83. Glenberg, A. M. & Robertson, D. A. (1999) Indexical understanding of instructions. Discourse Processes 28:1-26. Gibson, J.J. (1979) The Ecological Approach to Visual Perception. Houghton Mifflin. Givon, T. (1970) Notes on the semantic structure of English adjectives, Language, 46:816-837. Givon, T. (1992) The grammar of referential coherence as mental processing instructions. Linguistics 30:5-55. Goldberg, A. (1998) Patterns of experience in patterns of language. In: The New Psychology of Language: Cognitive and Functional Approaches to Language Structure, ed. M. Tomasello. 
Erlbaum. Goldstone, R.L., & Barsalou, L.W. (1998) Reuniting perception and cognition. Cognition, 65:231-262. Graesser, A. C., Millis, K. K., & Zwaan, R. A. (1997) Discourse comprehension. Annual Review of Psychology, 48:163-189. Graesser, A. C., Singer, M., & Trabasso, T. (1994) Constructing inferences during narrative text comprehension. Psychological Review 101:371-395. Grimes, J. (1975). The Thread of Discourse. The Hague: Mouton. Habel, C. (1986) Prinzipien der Referentialität. Springer. Hankamer, J., & Sag, I. A. (1976) Deep and surface anaphora. Linguistic Inquiry 7:391-428. Harnad, S. (1990) The symbol grounding problem. Physica D 42:335-346. Heine, B. (1997) Cognitive Foundations of Grammar. Oxford UP. Hockett, C.F. (1959) Animal 'languages' and human language. Human Biology 31:32-39. Isenberg, N., Silbersweig, D., Engelien, A., Emmerich, K., Malavade, K., Beati, B., Leon, A.C., & Stern, E. (1999) Linguistic threat activates the human amygdala. Proceedings of the National Academy of Sciences, 96:10456-10459. Jackendoff, R. (1983) Semantics and Cognition. MIT Press. Johnson, M. (1987) The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason. University of Chicago Press. Johnson, M. K., Raye, C. L., Foley, H. J., & Foley, F. A. (1981) Cognitive operations and decision bias in reality monitoring. American Journal of Psychology 94:37-64. Johnson-Laird, P. N. (1983) Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness. Harvard UP. Johnson-Laird, P.N. (1996) Mental models. In Models of Visuospatial Cognition (Counterpoints), eds. M. de Vega, M.J. Intons-Peterson, & P.N. Johnson-Laird. Oxford UP. Johnson-Laird, P.N., Herrmann, D.J., & Chaffin, R. (1984) Only connections: A critique of semantic networks. Psychological Bulletin, 96:292-315. Kamp, H. (1981) A theory of truth and semantic representation. In: Formal Methods In the Study of Language, eds. J. Groenendijk, T. Janssen, & M. Stokhof. 
Amsterdam Mathematisch Centrum. Kamp, H. & Partee, B. (1995) Prototype theory and compositionality. Cognition, 57:129-191. Kaup, B., Kelter, S., & Habel C. (submitted) Representing referents of plural expressions and resolving plural anaphors. Kelter, S. (in press) Mentale Modelle. In: Psycholinguistik - Psycholinguistics, eds. G. Rickheit, T. Herrmann, & W. Deutsch. De Gruyter. Kintsch, W. (1998) Comprehension: A Paradigm for Cognition. Cambridge UP. Kintsch, W., & van Dijk, T. A. (1978) Toward a model of text comprehension and production. Psychological Review 85:363-394. Klatzky, R.L., Pellegrino, J.W., McCloskey, B.P., & Doherty, S. (1989). Can you squeeze a tomato? The role of motor representations in semantic sensibility judgments. Journal of Memory and Language 28:56-77. Lakoff, G. (1987) Women, Fire, and Dangerous Things: What Categories Reveal About the Mind. University of Chicago Press. Lakoff, G., & Johnson, M. (1980) Metaphors We Live By. Chicago UP. Lakoff, G., & Nunez, R. (1998) Conceptual metaphor in mathematics. In: Discourse and Cognition: Bridging the Gap, ed. J.P. Koenig. CSLI Publications. Langacker, R. (1986) Foundations of Cognitive Grammar (Vol. 1). Stanford UP. LeDoux, JE (1995) Emotion: Clues from the Brain. Annual Review of Psychology 46: 209-235. Lessmoellmann, A. (2000) Der Ball ist rund: Formadjektive und Objektkonzepte. In: Räumliche Prozesse und sprachliche Strukturen, eds. C. Habel & C. von Stutterheim. Niemeyer. MacDonald, M.C., & Just, M.A. (1989) Changes in activation levels with negation. Journal of Experimental Psychology: Learning, Memory, and Cognition 15:633-642. MacWhinney, B. (1999) The emergence of language from embodiment. In: The Emergence of Language, ed. B. MacWhinney. Erlbaum. Magliano, J.P., & Schleich, M.C. (2000) Verb aspect and situation models. Discourse Processes 29:83-112. Maguire, E.A., Frith, C.D., and Morris, R.G.M. 
(1999) The functional neuroanatomy of comprehension and memory: the importance of prior knowledge. Brain 122:1839-1850. Mandler, J. M. (1986) On the comprehension of temporal order. Language and Cognitive Processes 1:309-320. Mani, K., & Johnson-Laird, P. N. (1982) The mental representation of spatial descriptions. Memory & Cognition 10:181-187. McKoon, G., & Ratcliff, R. (1988) Contextually relevant aspects of meaning. Journal of Experimental Psychology: Learning, Memory, and Cognition 4:331-343. McKoon, G., & Ratcliff, R. (1992) Inference during reading. Psychological Review 99:440-466. McKoon, G., Ward, G., Ratcliff, R. & Sproat, R. (1993) Morphosyntactic and pragmatic factors affecting the accessibility of discourse entities. Journal of Memory and Language 32:56-75. Mellet, E., Tzourio, N., Crivello, F., Joliot, M., Denis, M., & Mazoyer, B. (1996) Functional anatomy of spatial mental imagery generated from verbal instruction. The Journal of Neuroscience 16:6504-6512. Mellet, E., Tzourio-Mazoyer, N., Bricogne, S., Mazoyer, B., Kosslyn, S.M., & Denis, M. (2000) Functional anatomy of high-resolution visual mental imagery. Journal of Cognitive Neuroscience 12:98-109. Miller, G. A., & Johnson-Laird, P. N. (1976). Language and Perception. Harvard UP. Mishkin, M., Ungerleider, L.G., Macko, K.A. (1983) Objects vision and spatial vision: Two cortical pathways. Trends in Neuroscience 6:414-417. Morrow, D. G., Bower, G. H., & Greenspan, S. L. (1989) Updating situation models during narrative comprehension. Journal of Memory and Language 28:292-312. Morrow, D. G., & Clark, H. H. (1989) Interpreting words in spatial descriptions. Language and Cognitive Processes 3:275-291. Morrow, D. G., Greenspan, S. L., & Bower, G. H. (1987) Accessibility and situation models in narrative comprehension. Journal of Memory and Language 26:165-187. Münte, T.F., Schiltz, K., & Kutas, M. (1998) When temporal terms belie conceptual order. Nature 395:71-73. Myers, J. L., & O'Brien, E. J. 
(1998) Accessing the discourse representation during reading. Discourse Processes 26:131-157. Neisser, U. (1967) Cognitive Psychology. Prentice-Hall. Nelson, K. (1996) Language in Cognitive Development: Emergence of the Mediated Mind. Cambridge University Press. Newtson, D., Engquist, G., & Bois, J. (1977) The objective basis of behavior units. Journal of Personality and Social Psychology 35:847-862. O'Brien, E. J., & Albrecht, J. E. (1992) Comprehension strategies in the development of a mental model. Journal of Experimental Psychology: Learning, Memory, and Cognition 18:777-784. Osherson, D. N. & Smith, E. E. (1981) On the adequacy of prototype theory as a theory of concepts. Cognition 11:237-262. Paivio, A. (1971) Imagery and Verbal Processes. Holt, Rinehart, and Winston. Perfetti, C.A., & Britt, M.A. (1995) Where do propositions come from? In: Discourse Comprehension: Essays in Honor of Walter Kintsch, eds. C.A. Weaver, S. Mannes, & C.R. Fletcher. Erlbaum. Popper, K.R. (1985) Objective knowledge: An Evolutionary Approach (Rev. Ed.). Clarendon Press. Posner, M.I., & DiGirolamo (1999) Flexible neural circuitry in word processing. Behavioral and Brain Sciences 22:299-300. Pulvermueller (1999) Words in the brain's language. Behavioral and Brain Sciences 22:253-270. Radvansky, G. A., Wyer, R. S., Curiel, J.C., & Lutz, M. F. (1997) Situation models and abstract ownership relations. Journal of Experimental Psychology: Learning, Memory, and Cognition 23:1233-1246. Rall, J., & Harris, P.L. (2000) In Cinderella's slippers? Story comprehension from the protagonist's point of view. Developmental Psychology 36:202-208. Rensink, R.A., O'Regan, J.K., & Clark, J.J. (1997) To see or not to see: The need for attention to perceive change in scenes. Psychological Science 8:368-373. Rinck, M., & Bower, G. H. (1995) Anaphora resolution and the focus of attention in situation models. Journal of Memory and Language 34:110-131. Rinck, M., & Bower, G. H. 
(in press) Temporal and spatial distance in situation models. Memory & Cognition. Robertson, D.A., Gernsbacher, M.A., Guidotti, S.J., Robertson, R.W., Irwin, W., Mock, B.J., & Campana, M.J. (2000) Functional neuroanatomy of the cognitive process of mapping during discourse comprehension. Psychological Science 11:255-260. Romani, C., & Martin, R. (1999) A deficit in the short-term retention of lexical-semantic information: Forgetting words but remembering a story. Journal of Experimental Psychology: General 128:56-77. Sanford, A. J., & Garrod, S. C. (1981) Understanding written language: Explorations in comprehension beyond the sentence. Wiley. Sanford, A. J., & Garrod, S. C. (1998) The role of scenario mapping in text comprehension. Discourse Processes 26:159-190. St. George, M., Kutas, M., Martinez, A., & Sereno, M.I. (1999) Semantic integration in reading: Engagement of the right hemisphere during discourse processing. Brain 122:1317-1325. Simons, D.J., & Levin, D.T. (1997) Change blindness. Trends in Cognitive Science 1:261-267. Stanfield, R.A., & Zwaan, R.A. (in press) The effect of implied orientation derived from verbal context on picture recognition. Psychological Science. Svorou, S. (1994) The Grammar of Space. Benjamins. Talmy, L. (1988) Force dynamics in language and cognition. Cognitive Science 12:49-100. ter Meulen, A.G.B. (1995) Representing Time in Natural Language: The Dynamic Interpretation of Tense and Aspect. MIT Press. Trabasso, T., & Suh, S. (1993) Understanding text: Achieving explanatory coherence through on-line inferences and mental operations in working memory. Discourse Processes 16:3-34. van Berkum, J.J.A., Hagoort, P.M. & Brown, C.M. (1999). Semantic integration in sentences and discourse: Evidence from the N400. Journal of Cognitive Neuroscience 11:657-671. van Dijk, T. A., & Kintsch, W. (1983) Strategies of Discourse Comprehension. Academic Press. Vendler, Z. (1967) Linguistics in Philosophy. Cornell UP. Verkuyl, H.J. 
(1972) On the Compositional Nature of the Aspects. Reidel. Wisniewski, E. J. (1997) When concepts combine. Psychonomic Bulletin & Review 4:167-183. Wynn, K. (1996) Infants' individuation and enumeration of actions. Psychological Science 7:164-169. Zacks, J., & Tversky, B. (in press) Event structure in perception and cognition. Psychological Bulletin. Zwaan, R. A. (1996) Processing narrative time shifts. Journal of Experimental Psychology: Learning, Memory, and Cognition 22:1196-1207. Zwaan, R.A. (1999) Situation models: The mental leap into imagined worlds. Current Directions in Psychological Science 8:15-18. Zwaan, R.A., Madden, C.J., & Whitten, S.N. (in press) The presence of an event in the narrated situation affects its activation. Memory & Cognition. Zwaan, R. A., & Radvansky, G. A. (1998) Situation models in language comprehension and memory. Psychological Bulletin 123:162-185. Zwaan, R.A., & Stanfield, R.A. (submitted) Modulating the flow of situational information during comprehension. Zwaan, R.A., Stanfield, R.A., & Madden, C.J. (1999) Perceptual symbol systems: Can an empirical case be made? Behavioral and Brain Sciences 22:636-637. Footnotes 1. Theories of procedural semantics (e.g., Clark & Clark, 1977; Miller & Johnson-Laird, 1976) also treat verbal input as cues, namely as cues to construct a model and compare it to the outside world to obtain a truth value. Though similar in spirit, we believe our proposal is different, specifically with respect to the emphasis it places on the experiential aspect of comprehension.