Jacobs, Arthur M. & Grainger, Jonathan (1994) Models of visual word recognition: sampling the state of the art. Journal of Experimental Psychology: Human Perception and Performance 20 (6) 1311-1334.

Models Of Visual Word Recognition: Sampling The State Of The Art

Arthur M. Jacobs1 & Jonathan Grainger2

1 Cognitive Neuroscience Lab
Brain & Language Group
Centre National de la Recherche Scientifique (C.N.R.S.)
Marseille, France
 
 

2 Laboratoire de Psychologie Expérimentale
Centre National de la Recherche Scientifique (C.N.R.S.)
Université René Descartes
Paris, France
 
 

Short title: Models of visual word recognition
 
 

AUTHOR NOTES

Jonathan Grainger is now at:
Centre pour la Recherche en Psychologie Cognitive (CREPCO),
Université de Provence,
29 Av. Robert Schumann,
13621 Aix-en-Provence
France

Email: JACOBS@LNF.CNRS-MRS.FR;   Grainger@idf.ext.jussieu.fr
 
 

ABSTRACT

A chart of models of visual word recognition is presented that facilitates formal comparisons between models of different formats. In the light of the theoretical contributions to this special section, sets of criteria for the evaluation of models are discussed, as well as strategies for model construction.

INTRODUCTION
 

This special section on models of visual word recognition appears at a time of theoretical and methodological instability. As in other domains, complex "brain-style" simulation models compete with and try to replace simple "computer-style" stage models (cf. Newell, 1990; Rumelhart, 1989; Roberts & Sternberg, 1993). Notions of nonlinear system dynamics, chaos, and complexity theory mingle with traditional notions of information, signal detection, and choice theory. Neoconnectionist models, whose role and value for the field have been compared with those of the models constructed during the early growth period of quantum physics, conquer the theory market of the field (even though it is legitimate to ask whether the field has yet had its Kepler, Newton, or Mendel, much less its Einstein, Heisenberg, or Schroedinger). New methods of brain imaging and mapping provide a rapidly growing data base that challenges models designed to predict results obtained with traditional techniques of mental chronometry and psychophysics. The issues of model-to-data connection, model comparability, and testability become central and hotly contended: Where traditional models directly predicted a hit rate or a choice probability, we now face the problem of connecting simulated cycle times, orthographic error scores, training epochs, or trajectory lengths to standard dependent variables and to a variety of newer performance measures obtained in an increasing number of reading studies using brain mapping techniques (Posner & Carr, 1992). In sum, as in other fields, models and methods in the word recognition literature become more microscopic, more dynamic, and more complex. This trend towards complex models is accompanied by larger trends to question the pillars of scientific research in general, and psychological research in particular. Recent attacks on the classical predicate calculus view of scientific theorizing (Lakoff, 1987) and on the enterprise of modeling cognitive processes (e. g. Penrose, 1989; Uttal, 1990), or the critiques of the general scientific strategy of decomposition and localization (Bechtel & Richardson, 1993) and of the use of statistical tools as cognitive theories (Gigerenzer & Murray, 1987), are cases in point.

The purpose of this special section is to present a sample of the actual world of models of visual word recognition that reflects the above mentioned general trends, that helps to identify the strengths, weaknesses, and empirical or theoretical lacunae in the current modeling enterprise, and that works to isolate those elements in the orchestra of models that constitute a reasonable candidate set, or cohort, for a future unified theory of basic processes in reading. Ideally, this theory should be able to explain results from different data domains, e. g. behavioral and brain imaging studies using a range of different tasks, different language domains, e. g. shallow and deep orthographies, or alphabetic and syllabic writing systems, and different subject domains, e. g. children, adults, and pathological cases. A side-goal of this special section is to discuss criteria for model evaluation, and strategies for model construction. Much of the motivation for it came from lively discussions among some of the present contributors during a symposium on models of word recognition, organised by the first author within a workshop on multidisciplinary approaches to language processing in Marseille, France (Besson, Courrieu, Frenck-Mestre, Jacobs, & Pynte, 1992).

In what follows, we start with a comparison of past and current models of visual word recognition in the form of a chart, which includes proposals for future standards of model evaluation. We then remark on the distinctive theoretical aspects of the articles in the present volume. In the last section, we discuss strategies for model construction in the light of the present contributions.

COMPARISON OF MODELS AND CRITERIA OF EVALUATION

Models of visual word recognition can be clustered according to many features. One attempt at a comprehensive classification of models of word recognition was based on the fundamental distinction between parallel coding systems vs. lexical instance models (Carr & Pollatsek, 1985). Here we develop a set of more formal criteria for classifying past and current models of word recognition, and for evaluating their descriptive and explanatory adequacy, generality, simplicity and falsifiability, that complements and extends the earlier effort.

One striking difference between models in our field, and models of cognition in general, concerns their format. Some model builders prefer verbal, qualitative accounts of the processes under study, arguing, for example, that mathematical models are purely descriptive, or that algorithmic/simulation models offer little more than an existence proof that a particular computer algorithm or network architecture has the necessary computational power to achieve a successful simulation (see Note 1). However, the majority of researchers contributing to the present special section prefer mathematical (closed-form) and algorithmic formats for their models. This may reflect their belief that purely verbal theories of cognition are much too ambiguous and imprecise and should be replaced by small-scale algorithmic models (Broadbent, 1987; see Note 2), that qualitative descriptions are not falsifiable and are not distinguishable from other qualitative descriptions (Massaro, 1992), or that informal models are inferior to formal models, because they cannot distinguish between the plausible and the implausible (Uttal, 1990). This raises the question of to what extent the adequacy of the level of description depends on factors such as the extent and quality of the empirical data base (cf. Marr, 1982), and whether the area of word recognition is sufficiently developed for a successful progression from qualitative to quantitative expression of theoretical concepts, which characterizes all major scientific theories (MacKay, 1988; Weinberg, 1992). The contributions to the present special section provide a mixed answer to this question.

It is important to start with a determination of the formats of models used in this field. This makes it easier to find evaluation criteria that fit different types of models. The simplest classification distinguishes three formats: verbal, mathematical, and algorithmic models. By verbal models we mean any model that is expressed verbally or graphically without making use of closed-form or algorithmic formulations. Mathematical models are those that use closed-form expressions to represent the modeled section of reality. Finally, algorithmic models are all models that are implemented in form of a simulation program, including production systems and neural nets of the localist or distributed families. Surely, each format has its attractions and shortcomings. The verbal or boxes-and-arrows diagram format attracts the expression of creative ideas, when the data base is still too sparse to reasonably constrain more formal models. It also attracts the organisation of results coming from a broad variety of tasks, as evidenced by extant comprehensive models (e. g. Besner & McCann, 1987; Carr & Pollatsek, 1985; Ellis & Young, 1988; Morton & Patterson, 1980). As pointed out by Carr and Pollatsek, the risk that builders of comprehensive, verbal models run is that the proliferation of processing structures (as represented by boxes, for instance) makes the model either too unwieldy to use or too powerful to test.
 

Often mentioned advantages of mathematical and algorithmic models are explicitness, precision, and, last but not least, the fact that they curb the model builder's natural tendency to accept comfortable inconsistencies. Besides, psychology, like any science, needs practical frameworks, ways to turn ideas into calculations. Possible drawbacks of algorithmic models are the dangers that they fossilize thinking and restrict creativity more than verbal models; that they focus the model builder's attention too much on mathematical or implementation details that are irrelevant, and thus obscure the discovery of general principles; that, in the absence of a computational theory, they are nothing more than mimicry (Marr, 1982); or that they cannot explain much if they still have to be explained themselves (Olson & Caramazza, 1991).

The variety in format, scope, and complexity of models of word recognition reflects the creativity of the field, but also raises the issue of how to compare and test these models in a fair and constructive way. Although pluralism of models and methods is useful (Estes, 1991; Stone & van Orden, this issue), the current lack of standards for comparing verbal, mathematical, and algorithmic models cannot serve the enterprise of cognitive theory, while it gives rise to unnecessary polemics and confusion among students and scholars in this field.

A Table Summarizing 15 Models of Visual Word Recognition
 
  Model            Format  Task      Dep. var.  D/P  L/D  M/m  M/I  O/I  P/L  S/P  S/D  FE   WSE  ONE  RCE

LOGOGEN & MULTI-COMPONENT
  Morton69                 PI        PC
  Rum&Sip74                PI        PC
  Colth. et al.77          LD        RTCm

SERIAL SEARCH & VERIFICATION
  Forster76                LD        RTCm
  Paap et al.82            PI,LD     PC,RTCm

DUAL-ROUTE
  Colth&Rastle94   A       LD,N      RTCm       D    L    m    I    I    P    P    D    *    *    *    *

RESONANCE, INTERACTIVE ACTIVATION, PARALLEL DISTRIBUTED PROCESSING
  (Task entries, five recoverable for six models, in order: PI; LD; PI,LD; LD,N; LD,N)
  McC&Rum81                          PC
  Grossb&Ston86                      PC,RTCm
  Jaco&Graing92                      RTCm/d
  Norris94                           PC,RTCm
  Kawamoto93                         PC,RTCm
  Seid&McC89                         PC,RTCm

PARALLEL CODING SYSTEMS
  Carr&Poll85              PI,LD,N   PC,RTCm
  Besn&McCan87             LD,N      RTCm

FUZZY LOGICAL
  Massa&Cohe91     M       PI        PC,RTCm    D    L    M    M    I    P    P    D    -    *    -    -

Table 1. A selective overview of models of visual word recognition.

Table 1 gives a selective overview of models of visual word recognition that have been classified according to a number of features we believe to be relevant. We consider six families of models that partially overlap:

1) Logogen and multicomponent models. This family includes sophisticated guessing, and criterion bias models of the signal detection / statistical decision family (Coltheart, Davelaar, Jonasson, & Besner, 1977; Morton, 1969), which are special cases of the multicomponent model, as formally demonstrated by Rumelhart and Siple (1974).

2) Serial search and verification models (Forster, 1976; Paap, Newsome, McDonald, & Schvaneveldt, 1982).

3) Dual-route models (Coltheart & Rastle, this issue).

4) Resonance, interactive activation, and parallel distributed processing models. As argued by Stone and Van Orden (this issue), the original, deterministic interactive activation model (McClelland & Rumelhart, 1981) represents the prototype of a canonical resonance model. The models of Jacobs and Grainger (1992), and Norris (this volume) represent generalizations of this model. The models by Grossberg and Stone (1986) and Kawamoto (1993) are representative of the adaptive resonance family. The developmental model by Seidenberg and McClelland (1989) is the prototype of a parallel distributed processing model.

5) Parallel coding systems models (Besner & McCann, 1987; Carr & Pollatsek, 1985).

6) Fuzzy logical models (Massaro & Cohen, 1991).

The distinctive features associated with these classes of models are:

1) Format: This feature has the three values mentioned above: verbal (V), algorithmic (A), and mathematical (M). In accord with Estes (1975) the assumption here is that this order (V-A-M) corresponds to different ranks of explicitness.

2) Task: The task feature has three values corresponding to the three main paradigms used in the word recognition literature: The perceptual identification task (PI) and its Reicher variant, the lexical decision task (LD), and the naming task (N). To simplify an already complicated chart, tasks to which a given model can, in principle, be applied were not considered. We only counted tasks yielding data to which a model was directly connected in the original publication, e. g. Forster's (1976) model was connected to data from a table summarizing the effects of word frequency on mean correct "yes" response latencies in the lexical decision task.

3) Dependent Variable: Here we distinguish three dependent variables measured in the above three paradigms: Percent Correct (PC) and RT means and distributions (standard deviations) for correct (RTCm/d) responses. As for feature 2), we do not consider dependent variables that a model can in principle predict, or that were predicted in publications other than the one referred to in this chart, but only those directly referred to in the original tables and/or figures.

4) Simplicity: This feature comprises eight binary subfeatures, presented here in alphabetical order. It is useful to note that hybrid cases are often possible with such binary classifications. Therefore we occasionally had to make choices that can only be considered first-order approximations.

i) Deterministic / Probabilistic (D/P). This does not code whether a model uses a probabilistic decision rule (e. g. Luce's choice rule, 1959) but whether a model can generate different responses for two or more presentations of the same stimulus.

ii) Localist / Distributed (L/D). This codes whether a model uses localist or distributed representations.

iii) Macro / micro (M/m). This codes whether a model has microstructure that allows it to predict performance for individual items.

iv) Modular / Interactive (M/I). This codes the presence/absence of interactivity/recurrent feedback between and/or within different levels of representation. Some models that are weakly or locally interactive are considered (quasi-)modular here. For example, the fact that the Carr and Pollatsek model includes a bidirectional arrow between semantic memory and the phonological decision mechanism (see Figure 1 below) is neglected in view of the fact that the spirit of the model is largely modular. The case of the logogen model is particular. According to the present definition it is not an interactive model but rather an integrative one, much as the fuzzy logical model of perception (Massaro & Cohen, 1991). Another particular case is the hybrid model of Coltheart and Rastle (this volume) which is quasi-modular in the sense that the two processing routes are considered to be independent, but fully interactive, as specified by the authors themselves, as far as the lexical route is concerned. Our choice here was to consider it as interactive. Finally, the model by Seidenberg and McClelland (1989) also poses a problem, because in the original article (p. 526) it is said to be an interactive model, whereas in Seidenberg, Plaut, Petersen, McClelland, and McRae (this issue; p. XXXX) it is considered a "simple feedforward network". Since the nonimplemented model is interactive in its spirit and the implemented model contains a feedback loop from the hidden to the orthographic units, our choice is to classify it as interactive.

v) Ordinal / Interval (O/I). This codes whether a model makes ordinal or interval-scaled predictions that should be observed in the data.

vi) Performance / Learning (P/L). This codes whether a model is a performance and/or a learning model (presence/absence of a learning algorithm).

vii) Serial / Parallel (S/P). This codes the presence/absence of a serial search/verification/decision mechanism.

viii) Static / Dynamic (S/D). This codes the fact that a model explicitly specifies processing and/or learning dynamics in a way allowing predictions about intermediate processing states and the form of information accumulation / activation functions. Thus, models that have "dynamics" allowing them to make predictions only about finishing times and asymptotic accuracy are considered "static" here.

We have called this feature "simplicity", because our working hypothesis is that a DLMMOPSS model (deterministic, localist, macro, modular, ordinal, performance, serial, static) is considerably simpler to describe, comprehend, and test than a PDmIILPD model (probabilistic, distributed, micro, interactive, interval, learning, parallel, dynamic).
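The eight-letter simplicity code can be made concrete with a small sketch. The following is only an illustration under stated assumptions: the function name `complexity_count` and the idea of summarizing a profile by a simple count of "complex" values are introduced here and are not part of the original chart. The two example profiles are those of the two rows of Table 1 for which all eight values are given (Coltheart & Rastle; Massaro & Cohen).

```python
# Order of subfeatures follows the text: D/P, L/D, M/m, M/I, O/I, P/L, S/P, S/D.
# Simple pole: deterministic, localist, macro, modular, ordinal, performance, serial, static.
SIMPLE = "DLMMOPSS"
# Complex pole: probabilistic, distributed, micro, interactive, interval, learning, parallel, dynamic.
COMPLEX = "PDmIILPD"


def complexity_count(profile: str) -> int:
    """Count how many of the eight subfeatures take the 'complex' value.

    The count is a hypothetical summary index used for illustration only.
    """
    if len(profile) != len(SIMPLE):
        raise ValueError("profile must have exactly eight letters")
    count = 0
    for letter, simple, complex_ in zip(profile, SIMPLE, COMPLEX):
        if letter not in (simple, complex_):
            raise ValueError(f"unexpected value {letter!r}")
        if letter == complex_:
            count += 1
    return count


# Profiles as listed in Table 1.
coltheart_rastle = "DLmIIPPD"
massaro_cohen = "DLMMIPPD"
print(complexity_count(coltheart_rastle), complexity_count(massaro_cohen))
```

Such a count treats all eight subfeatures as equally weighted, which is itself an assumption; it serves only to show how the profiles can be compared mechanically.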

5) Effects: This feature has four values corresponding to a selection of groups of effects in the word recognition literature that figure centrally in current debates; each value indicates whether the model tries to explain the effect in question. Effects that a given model could, in principle, explain are not taken into account here, but only those that the model explicitly tried to explain in its original publication (see Taft, 1991, for a comprehensive discussion of effects that can or cannot be explained by extant models of word recognition).

- FE (frequency effect)

- WSE (word superiority effect)

- ONE (orthographic neighborhood effects). For simplicity, in this column we do not distinguish between facilitatory neighborhood density effects on "yes" RT (Andrews, 1989; 1992; Grainger & Jacobs, 1993b, 1994b; Sears, Hino, & Lupker, 1994; Snodgrass & Mintzer, 1993), inhibitory neighborhood density effects on "no" RT (Coltheart et al., 1977; Grainger & Jacobs, 1993b, 1994b), and inhibitory neighborhood frequency effects on percentage correct and "yes" RT (Grainger, 1990; Grainger & Jacobs, 1993b; 1994b; Grainger et al., 1989, 1992; Segui & Grainger, 1990; Snodgrass & Mintzer, 1993).

- RCE (regularity/consistency effect)

It should be noted that the above formal model classification was a quite delicate operation and that it can only be considered a rough, first-order approximation. We do not want to make any firm claims about its validity, and apologize to authors whose models either did not attract our attention, or who do figure in the chart but may have been misclassified in some respects. The aim of Table 1 is to provide a simplified tool for model comparison, evaluation, and theoretical debate. Further research should attempt to extend or modify it in sensible ways.

Toward Standards For Model Evaluation

The data summarized in Table 1 can be used for model comparisons and for the development of standards of model evaluation that are compatible with standard criteria elaborated in the philosophy of science (e. g. Thagard, 1988), such as descriptive and explanatory adequacy, simplicity and falsifiability, or generality. The fact that so many models compete for the explanation of standard effects in the literature (Table 1) suggests that a major goal of research in this rapidly moving, theoretically aggressive field must be the development of criteria for model evaluation. The following section does not provide criteria stricto sensu, that is, a set of motivated necessary or sufficient conditions for determining which of two hypotheses, models, or theories provides the best explanation of the relevant evidence. This is indeed a very difficult issue, whose solution would require space well beyond that given to this editorial. Nevertheless, the following paragraphs aim at providing some general guidelines.

Descriptive Adequacy
 

The values corresponding to the format, dependent variable, or static/dynamic and ordinal/interval features in Table 1 can all be used for the evaluation of the descriptive adequacy of models. This has two aspects: the degree of accuracy with which a model can predict a data set (potential descriptive adequacy), and the degree of accuracy with which it actually does so (actual descriptive adequacy). The question of how accurately a model of a given format can possibly predict data is important when considering complex models that may either undershoot or overshoot the level of accuracy useful for describing a set of data. As a heuristic inspired by early psychophysics, a model can be said to have potential descriptive adequacy when it allows predictions at the level of scale at which the dependent variable of interest is measured.

Consider the word frequency effect. Any of the three model formats allows predictions about this effect at the level of an ordinal scale. However, can this be considered satisfactory in the case of an effect that varies considerably quantitatively (Forster, this issue)? Since our dependent variable is interval-scaled (i. e. mean RT), only those models considered in Table 1 that make explicit predictions concerning the effect at the level of an interval scale have potential descriptive adequacy for any task measuring the frequency effect in terms of mean RT. All other models undershoot the level of accuracy of the dependent variable of interest (but see Van Orden & Goldinger, this issue, for a different view).
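The difference between an ordinal and an interval-scaled prediction can be illustrated with a minimal sketch. It assumes, purely for illustration, that an ordinal prediction is checked by the proportion of condition pairs ordered in the same way in the prediction and in the data; the function name `ordinal_agreement` and all numbers are hypothetical and not taken from any study cited here.

```python
from itertools import combinations


def ordinal_agreement(predicted, observed):
    """Proportion of condition pairs ordered the same way in predictions and data.

    Ties in either list count as disagreements (a conservative, assumed choice).
    """
    pairs = list(combinations(range(len(predicted)), 2))
    if not pairs:
        raise ValueError("need at least two conditions")
    agree = sum(
        (predicted[i] - predicted[j]) * (observed[i] - observed[j]) > 0
        for i, j in pairs
    )
    return agree / len(pairs)


# A verbal model predicts only the ordering of mean RT:
# high-frequency < medium-frequency < low-frequency words.
predicted_rank = [1, 2, 3]
observed_mean_rt = [520.0, 560.0, 640.0]  # made-up values in ms
print(ordinal_agreement(predicted_rank, observed_mean_rt))
```

A check of this kind says nothing about the size of the frequency effect, which is exactly the sense in which an ordinal prediction undershoots an interval-scaled dependent variable.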

Criteria for evaluating the actual descriptive adequacy depend on the models' format. Mathematical models usually can be evaluated by standard indices of goodness-of-fit. However, in the field of word recognition mathematical models are not the rule, and the criteria that are used in evaluating verbal and algorithmic models differ considerably. To make the three model formats comparable, and to prevent eye-balling from becoming the standard of model evaluation in our field, there seems to be only one fair solution: verbal models should be formulated in a way allowing predictions at the level of an ordinal scale, and algorithmic models should be specified in a way allowing us, for instance, to transform activation values into response probabilities, or cycle times into RTs, thus making the computation of goodness-of-fit indices possible (Grainger & Jacobs, this issue; Jacobs & Grainger, 1991; 1992). In this context, the analysis of RT distributions is a more powerful means for evaluating the descriptive adequacy of complex models than the analysis of RT means, and should be included whenever possible in formal model comparisons (Jacobs & Grainger, 1992; Mewhort, Braun, & Heathcote, 1992; Roberts & Sternberg, 1993).
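How such transformations might look can be sketched as follows. This is a hedged illustration, not the procedure of any particular model: it assumes response strengths obtained by exponentiating activations with a scaling constant `mu`, response probabilities obtained with Luce's choice rule, a linear mapping from cycles to milliseconds with hypothetical parameters `ms_per_cycle` and `residual_ms`, and the root-mean-square deviation as the goodness-of-fit index.

```python
import math


def response_strengths(activations, mu):
    """Assumed exponential transformation of activations into response strengths."""
    return [math.exp(mu * a) for a in activations]


def luce_choice(strengths):
    """Luce's choice rule: each strength divided by the sum of all strengths."""
    total = sum(strengths)
    if total <= 0:
        raise ValueError("strengths must sum to a positive value")
    return [s / total for s in strengths]


def cycles_to_rt(cycles, ms_per_cycle, residual_ms):
    """Assumed linear mapping from simulated cycles to predicted RT in ms."""
    return [residual_ms + ms_per_cycle * c for c in cycles]


def rmsd(predicted, observed):
    """Root-mean-square deviation between predictions and data."""
    if len(predicted) != len(observed):
        raise ValueError("length mismatch")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


# Made-up example: two simulated conditions and two observed mean RTs.
predicted_rt = cycles_to_rt([10, 20], ms_per_cycle=10.0, residual_ms=300.0)
print(luce_choice([3.0, 1.0]), predicted_rt, rmsd(predicted_rt, [410.0, 490.0]))
```

In practice the scaling parameters would themselves have to be estimated from data, so they count as free parameters when models are compared.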

The problem of descriptive adequacy or behavioral accuracy (Mewhort, 1990) becomes delicate when considering parallel distributed processing and adaptive resonance models. Some authors using such models are liable to accept a trade-off between behavioral accuracy and some form of neurobiological and/or ecological plausibility, preferring qualitative accounts over quantitative ones on the assumption that their explanation offers elements that traditional, symbolic accounts lack. Thus, it has been argued that with regard to classical accounts, parallel distributed processing models offer an additional element: the presence of a mechanism (e. g. the existence of attractors, expected to be "present in the brain"; Hinton and Shallice, 1991). However, much as the symbolic, verbal models this new generation of models is supposed to compete with or eventually replace, qualitative accounts rather than quantitative fits are offered, sometimes justified on account of technical simplifications (Hinton & Shallice, 1991; p. 88). However, some principles illustrated in technically simplified (e. g. small) networks, do not generalize to more complex (e. g. larger) networks, thereby calling into question their applicability to the brain by expansion of the network (Feldman-Stewart & Mewhort, 1994). Moreover, one can argue that neurobiological plausibility must start with behavioral accuracy (Mewhort, 1990). We will return to this important issue when discussing the articles by Van Orden and Goldinger, and Stone and Van Orden.

If we manage to homogenize the format of prediction for models of visual word recognition, such that verbal models could be evaluated with regard to the degree to which they predict the correct rank ordering of the data, while mathematical and algorithmic models could be evaluated using the standard, interval-scale measures of goodness-of-fit, at least two other challenges remain. The first concerns the evaluation of the goodness-of-fit when taking the number of free parameters into account (as with the Akaike information statistic, or the R²; see Massaro & Cohen, this issue). Here we face the problem of having to define what counts as a free parameter when verbal and algorithmic models are part of the strong inference competition. Ignoring the case of verbal models, we concentrate on a comparison between mathematical and algorithmic models. A case in point is a recent study by Massaro and Cohen (1991; see also Massaro & Friedman, 1990), in which the authors adopted a solution that consisted of reducing the algorithmic models (models of the interactive activation family) to mininetworks of a few processing units. This made it possible to fit all parameters of these mininetworks to the data in the same way as those of the mathematical models (variants of the fuzzy logical model of perception). However, even this solution is not very economical, as noted by the authors. More importantly, we have no guarantee that the behavior of these mininetworks using a certain number of free parameters corresponds to the behavior of their macro-parent (the original interactive activation model; cf. Feldman-Stewart & Mewhort, 1994; Stone & Van Orden, this issue). For this method to become a standard option, it would be useful to run simulation studies testing whether the behavior of a complex nonlinear model, such as the interactive activation model, remains unchanged across different sizes, numbers of parameters, and other structural-dynamic modifications. Alternatively, one could try to use very powerful computers to run simulations with interactive activation type or other connectionist models, in which all parameters are set free and (re)adjusted to some empirical data set. However, this option contrasts somewhat with the simplicity criterion, discussed below, and, at least for the case of interactive activation type models, faces the problem of global instability (Stone & Van Orden, this issue).
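The trade-off between goodness-of-fit and number of free parameters can be illustrated with a minimal sketch of an Akaike-type index. It assumes a least-squares fit with independent Gaussian errors, for which the index can be written as n ln(RSS/n) + 2k up to an additive constant; the function name and all data values are hypothetical.

```python
import math


def aic_least_squares(predicted, observed, n_free_params):
    """Akaike-type index for a least-squares fit, assuming Gaussian errors.

    Uses n * ln(RSS / n) + 2k and omits additive constants, which are
    identical for all models fitted to the same data set.
    """
    if len(predicted) != len(observed):
        raise ValueError("length mismatch")
    n = len(observed)
    rss = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    if rss <= 0:
        raise ValueError("a perfect fit gives log(0), which is undefined")
    return n * math.log(rss / n) + 2 * n_free_params


# Made-up mean RTs (ms) in four conditions and two hypothetical models.
observed = [500.0, 550.0, 600.0, 650.0]
model_a = [505.0, 545.0, 605.0, 645.0]  # coarser fit, 2 free parameters
model_b = [502.0, 548.0, 602.0, 648.0]  # closer fit, 6 free parameters

aic_a = aic_least_squares(model_a, observed, n_free_params=2)
aic_b = aic_least_squares(model_b, observed, n_free_params=6)
print(aic_a < aic_b)  # lower values are preferred
```

With these invented numbers the model with fewer parameters is preferred although its raw fit is worse. The sketch also makes the difficulty discussed above visible: the index is only defined once one has decided what counts as a free parameter.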

Furthermore, such time-intensive options involving the adjustment of a large number of free parameters should not prevent us from pursuing another interesting avenue towards a solution for quantitative comparisons between mathematical and algorithmic models. Basically, this consists in translating those aspects of the algorithmic model that are held responsible for the simulation of a given effect into closed-form expressions. The progress that is being made in the formal analysis of the behavior of complex algorithmic models suggests that in the future some of the models we now consider as being algorithmically too complex can be expressed as sets of equations (Hertz, Krogh, & Palmer, 1991). But even without higher-level mathematics, important aspects of algorithmic models can be summarized in the form of simple equations. These become sub-models of the original model, much as the equations derived from signal-detection and choice theory in the logogen model (Morton, 1969). An example for distributed connectionist models is the frequency overlap (sub-)model (Seidenberg, 1993b). The present papers by Kawamoto, Farrar, and Kello, and Stone and Van Orden give examples for adaptive resonance models. Examples for localist connectionist models of the interactive activation family are the letter-frequency (sub-)model (Grainger & Jacobs, 1993a), and the noninteractive dual read-out (sub-)model (Grainger & Jacobs, this volume).

The second challenge one faces in the enterprise of strong inference tests of descriptive adequacy is to find a general solution to the problem that finding that one model fits the data better than competing models does not establish this best-fitting model as the probable source of the data (Collyer, 1985).

To summarize, two major problems for the evaluation of the descriptive adequacy of cognitive models used in the area of word recognition, as in the related fields of attention and memory (Broadbent, 1987; see Note 2) are: i) the wide variety of model formats, and ii) the varying levels these models have in predicting effects (potential descriptive adequacy). We therefore should strive to develop (more) precise versions of our models (or new models) that can be compared quantitatively by use of standard methods of goodness-of-fit, while taking the necessary cautions with these procedures (Collyer, 1985).

Generality

Everybody would agree that a model that explains only one result or effect is not very interesting. So, what can we say about the generality of models in our field? Three features in Table 1 can be used for a first-order evaluation of the generality of models of word recognition. These are task, dependent variable, and effects.

However, when evaluating a model's generality in more detail, it is useful to distinguish two aspects not included in Table 1 that may be termed horizontal and vertical generality. Horizontal generality refers to a model's ability to generalize across different stimulus sets and/or configurations (stimulus generality), different tasks (task generality), or response types / measures (response generality). Vertical generality refers to a model's ability to generalise across different scales of the modeled process, e. g. (macrostructural) static-asymptotic behavior vs. microstructural dynamics, or different types or sizes of a processing structure, such as the number of entries in the lexicon of a simulation model. Vertical generality has received little attention in comparison with horizontal generality but it might become an important issue in a field that provides more and more complex algorithmic models, some of which may have severe limitations for scaling-up, e. g. distributed connectionist models (Feldman-Stewart & Mewhort, 1994).

Regarding horizontal generality, it is easy to define a general model as one that has stimulus, task, and response generality. This implies that it can account for more than just one effect, as measured by one dependent variable, in one particular task, a major problem plaguing theory construction in cognitive psychology (Jacobs, 1994; Jacobs & Grainger, 1992; Ratcliff, 1978). Not surprisingly, verbal models usually fare pretty well with regard to horizontal generality, at least to the extent, that they are used by many researchers to interpret their data in a large variety of tasks. Good examples are the models by Morton (1969), Forster (1976), Carr and Pollatsek (1985), or Besner and McCann (1987). In contrast, very few researchers have used the mathematical version of the original logogen model, or Treisman's (1978) sophisticated extension of that model. For algorithmic models, the picture is complex. On the one hand, two well-known algorithmic models (the interactive activation and activation verification models mentioned above) have been explicitly designed to (quantitatively) account for one effect (the word superiority effect), as measured by one dependent variable (2AFC), in one particular task (the Reicher paradigm). On the other hand, both models have been used by a large number of researchers for interpreting data from a variety of tasks, but as verbal models ! Thus these models can be said to be very specific, when considering their algorithmic variants, and very general when looked at in their verbal clothes. It is therefore difficult to compare the generality of these models. Nevertheless, at least as far as the interactive activation model is concerned, recent attempts to generalise it both horizontally and vertically in its algorithmic format have yielded encouraging results. 
Thus, it has successfully been used to quantitatively predict letter- and word recognition performance, as measured by either asymptotic accuracy or time-accuracy functions, in a variety of tasks, using different response measures (alphabetic and lexical decision tasks, and various perceptual identification tasks, such as the Reicher, progressive demasking, or letter search tasks), and also using 4 and 5 letter English and French lexica, or 4 letter Dutch and German lexica (Grainger, 1990; Grainger & Jacobs, 1993a,b; 1994a,b; Jacobs & Grainger, 1991; 1992; Heller & Jacobs, 1993; Norris, this issue). It can also be combined with learning mechanisms (Norris, this issue), and, more importantly, it has recently been shown that the model is able to self-organize (Murre, Phaf, & Wolters, 1992), thus undermining the critique by Grossberg (1984). In addition, it can be used for experimental modeling (Neumann, 1990), e. g. in studies simulating different possibilities of visual dyslexia (Jacobs, Heller, & Nazir, 1992).

A useful first step towards more objective judgments of generality would be that each model builder puts an upper bound on the model's horizontal and/or vertical generality by explicitly specifying the facts or classes of facts that the model cannot explain, or that it can only explain with the help of auxiliary assumptions. The present Table 1 should be a useful tool for this enterprise. Interesting new attempts to develop objective methods of evaluating the generality of models can be found in Cutting, Bruno, Brady, and Moore (1992) and Thagard (1988).

Simplicity And Falsifiability
 

A natural question that follows from the issues of descriptive adequacy and generality is to ask at what price the former two features of a model are bought. The eight "simplicity" subfeatures given in Table 1 can serve as a first-order approximation for answering this question with respect to models of visual word recognition.

Models, as we have known since the days of Copernicus, have to be as simple as possible (see Note 3). The issue of simplicity becomes all the more important because of the recent trend in psychology from simple to complex models (Roberts & Sternberg, 1993). Some even argue that we are at a point where standard cognitions of modelers ("a model that is made more and more complex will be able to explain more", and "the true model fits the data best") cease to coexist peacefully and begin to clash with one another (Collyer, 1985). Although there probably exists something like an absolute simplicity (cf. Weinberg's, 1992, discussion of beautiful theories), a model's simplicity can be evaluated more easily in relative than in absolute terms, e. g. with respect to reference points such as alternative models and the above two criteria of accuracy and generality. Thus, a model can be simpler than a competing model, and it can be relatively simple with respect to its degree of generality and/or descriptive accuracy. An example is the new generation of stage models of RT that is more general than the old generation (Sternberg, 1975) but also by far more complex (Roberts & Sternberg, 1993). On the other hand, its authors claim that it is simpler than alternative models, e. g. the cascade model that uses many free parameters (McClelland, 1979).

What criteria can be used to determine the simplicity of models of word recognition, then? As for the RT models tested by Roberts and Sternberg, the answer to this question is quite easy when we compare mathematical models. In this case, the number and length of equations (number of free parameters) are straightforward measures of simplicity (Collyer, 1985; Cutting et al., 1992). As an exercise in simplicity evaluation, we might compare three mathematical models that account for the word superiority effect by relating the number of equations and free parameters to the number of independent data points fit by each model (Chiang, 1978; Grainger & Jacobs, this issue; Massaro & Cohen, 1991; this issue).
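The bookkeeping suggested above can be sketched in a few lines of Python. This is only an illustration: the class, the function names, and all counts are hypothetical placeholders, not values taken from the cited papers, and using the ratio of free parameters to fitted data points as the index is an assumption, being one of several possible ways to relate the two quantities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSummary:
    """Bookkeeping for one mathematical model (all values are placeholders)."""
    name: str
    n_equations: int
    n_free_parameters: int
    n_data_points: int  # independent data points the model was fit to

    def parameters_per_data_point(self) -> float:
        # Lower values mean fewer free parameters spent per fitted data point.
        return self.n_free_parameters / self.n_data_points


def rank_by_simplicity(models):
    """Sort from simplest to most complex on the ratio above,
    breaking ties by the number of equations."""
    return sorted(
        models,
        key=lambda m: (m.parameters_per_data_point(), m.n_equations),
    )


# Hypothetical counts, NOT taken from the cited papers.
candidates = [
    ModelSummary("model_A", n_equations=3, n_free_parameters=4, n_data_points=40),
    ModelSummary("model_B", n_equations=5, n_free_parameters=12, n_data_points=48),
    ModelSummary("model_C", n_equations=2, n_free_parameters=6, n_data_points=30),
]

ranking = [m.name for m in rank_by_simplicity(candidates)]
print(ranking)
```

Such a ratio ignores the length of the equations and the quality of the fit, so it can at best complement the criteria of descriptive adequacy and generality discussed earlier.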

The simplicity judgment becomes harder when verbal and algorithmic models are concerned. Nevertheless, it may be useful to consider some possible solutions. For verbal models, a simplicity rating could be based on the number of hypotheses or the number of boxes and arrows used in the model diagram. Consider the different versions of the logogen model, for example. The pre-1977 version had 5 boxes, the 1977 version had 8 boxes, and the 1980 or 1985 versions have 12 boxes (Morton & Patterson, 1980; Patterson & Morton, 1985). In the logogen model, the number of boxes and their interconnections (arrows) has been continuously adjusted to new evidence. The final (?) 12-box version therefore seems like a good standard against which to test other verbal models claiming to account for the same range of phenomena, e. g. the 10-box model of Caramazza and Miceli (1989), the 10-box model of Besner and McCann (1987), or the 7-box model of Coltheart, Curtis, Atkins, and Haller (1993). Unless one is radically opposed to functional-architectural models (Seidenberg, 1988), there seems to be no good reason why such comparisons should not be made between these models in order to find out which one is the simplest. Once a minimal functional architecture has been agreed upon by way of strong inference competition, one could envisage formulating the resulting model in "harder", more explicit algorithmic or mathematical terms, for quantitative tests against the data.

Finally, for algorithmic models, one could enumerate the number of representation levels or layers, input, output, and hidden units, fixed and free parameters, type and number of inter- and intra-level connections, or design and system principles embodied in the model (Stone & Van Orden, this issue). As a last exercise in simplicity evaluation, consider two recent algorithmic models of the distributed connectionist / parallel distributed processing family, the developmental model by Seidenberg and McClelland (1989) and the "componential attractor" model by Plaut and McClelland (1993). The first model used 400 orthographic input units, 100/200 hidden units, and 460 phonological output units to simulate word and nonword pronunciation. The latter, improved model includes only 108 grapheme units, 100 hidden units, and 57 phoneme units. In purely quantitative terms, this is a significant simplification, given that the new model predicts the relevant data much better than the old one (Seidenberg et al., this issue). However, in qualitative terms the simplification is even more important. By using the redundancy contained in the English grapheme-to-phoneme correspondences (due to the strong phonotactic constraints that arise from the structure of the articulatory system) for building improved representations into their network, Plaut and McClelland enormously simplified the description of the system, while improving its simulation performance. They also considerably reduced the "emergence" part of the network's performance, if one assumes that the amount of emergent properties of a distributed connectionist learning model is negatively correlated with the amount of redundancy the model builders use in constructing the network. As Simon (1969) noted, we can enormously simplify the description of a complex system, if we find the right representation(s). Thus, in contrast to Seidenberg and McClelland (1989), Plaut and McClelland provide a positive example in this respect.
Thus, when evaluating the simplicity of two or more complex algorithmic models, quantitative comparisons (e. g. concerning the number of units) may reveal deeper levels of simplification due to the choice of representational schemes that make use of the redundancy of the modeled system. Stone and Van Orden (this issue) provide another example of how to evaluate the simplicity of algorithmic models by comparing resonance and interactive activation type models.
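The purely quantitative part of this comparison can be made explicit with a short Python sketch. The unit counts are those quoted above (taking the 200-hidden-unit variant of the older model); the connection counts rest on a simplifying assumption that is not stated in the text, namely full feedforward connectivity between adjacent layers only, with biases and any feedback or recurrent connections ignored. The function names are illustrative.

```python
def total_units(layer_sizes):
    """Total number of units across all layers."""
    return sum(layer_sizes)


def feedforward_weights(layer_sizes):
    """Number of weights if each layer is fully connected to the next one.
    ASSUMPTION: feedforward connections only; biases and recurrent or
    feedback connections are ignored."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# (input, hidden, output) unit counts as quoted in the text.
sm89 = (400, 200, 460)  # Seidenberg & McClelland (1989), 200-hidden variant
pm93 = (108, 100, 57)   # Plaut & McClelland (1993)

for label, layers in (("SM89", sm89), ("PM93", pm93)):
    print(label, total_units(layers), feedforward_weights(layers))

unit_ratio = total_units(sm89) / total_units(pm93)
weight_ratio = feedforward_weights(sm89) / feedforward_weights(pm93)
print(unit_ratio, round(weight_ratio, 1))
```

Under these assumptions the newer network has a quarter of the units and roughly a tenth of the weights. Counts of this kind say nothing about the representational choices that made the reduction possible, which is the qualitative point made above.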

Popper (1959) linked the criterion for simplicity to that of falsifiability (i.e. a model's ability to generate predictions that can be falsified). He proposed that, given two models in the same domain with equal success, we should prefer the simpler, but where simplicity is defined as a property that places the greatest restrictions on the world, that is, on how the empirical data can turn out to be. Thus, we should prefer the model that is more easily falsified (cf. Estes, 1975; Massaro & Cowan, 1993; see Note 4). An abstract, general way of defining simplicity in this sense can be derived from traditional schemas of scientific explanation (Popper, 1935; Hempel, 1965; Thagard, 1988). According to such schemas, the explanation of facts (F) by a model (M) requires also a set of auxiliary hypotheses (A) and a set of given conditions (C). If all members of C are accepted to be independent of F or M, then we have a means of determining which of two models is simpler by counting A. The main problem in cognitive science is to agree upon C and the independence of F and M from C. This raises the questions of whether cognitive models, and models of word recognition in particular, have explanatory adequacy, and if so, how we can determine and compare it for models of different formats.

Explanatory Adequacy
 

With regard to explanatory adequacy, the main problem consists in finding a set of data that one model (or class of models) can only handle by referring to ad hoc assumptions, while the other accounts for it on the basis of widely accepted empirical assumptions (Chomsky, 1965). However, it might be argued that all assumptions underlying models in this field are ad hoc. McNamara (1992) states this very clearly: "Theories in cognitive psychology are not based on natural law or on known biological mechanisms. Consequently, all theories are ad hoc and can be modified to account for troublesome data. For this reason, theory testing in cognitive psychology cannot be limited to a single experiment or even to a set of experiments on a single issue. Instead, theories must be tested in many domains and in as many ways as possible. Crucial experiments may be possible occasionally, but theory testing is usually a war of attrition: A theory is deemed successful if it accounts for most of the data most of the time, with a minimum number of unprincipled modifications".

Although one might wonder whether there is a radical difference between theories constrained by natural law and models constrained by psychological effects (cf. McClelland, 1993), McNamara's argument about theory testing in cognitive psychology must be taken seriously. As a consequence, we should try harder than in the past to develop multitask models (Jacobs, 1994), and to find effects that one model (or class of models) can only handle by referring to ad hoc assumptions, while the other accounts for them on the basis of antecedent conditions or general principles (see Note 5). In this respect it is critical to draw a clear distinction between empirical phenomena that are built into a simulation model and phenomena that can be considered predictions from it (Dell, 1988).

If we agree that a simple model is one with a minimum of ad hoc assumptions, then it is essential - in the light of McNamara's (1992) statement - that we reach consensus in our field with respect to what we call general laws or principles (see section on strategies for model construction below). In this regard, McClelland (1993) has recently made some proposals that are of interest here. For example, he refers to "Morton's law for effects of context and stimulus information in perception". We can use this "law" to set up the conditions that should produce predicted effects, such as the context-independence of the z-transformed 2AFC probabilities in a phoneme detection task. We can then confront two well-defined models with the data and decide which one gives the best explanation using the above criteria of descriptive adequacy, generality, and simplicity/falsifiability. Massaro and Cohen (1991) have made an exemplary first step in this direction that should be followed more in our field (cf. also Massaro & Friedman, 1990).
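One way to read the context-independence prediction is that, on the z scale, the effect of stimulus information is the same in every context. The Python sketch below illustrates such a check. It makes several assumptions not found in the text: the function names, the tolerance, and both data sets are hypothetical, and the first data set is generated from an additive z-scale rule so that the prediction holds by construction.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the z-transform


def z_transform(p):
    """Inverse of the standard normal CDF, applied to a probability."""
    return _STD_NORMAL.inv_cdf(p)


def stimulus_effect_by_context(prob):
    """prob[context][stimulus_level] holds a 2AFC probability.
    Returns, per context, the z-scale difference between the 'strong'
    and 'weak' stimulus levels."""
    return {
        context: z_transform(cells["strong"]) - z_transform(cells["weak"])
        for context, cells in prob.items()
    }


def is_context_independent(prob, tol=0.05):
    """True if the z-scale stimulus effect varies by at most tol
    across contexts (tol is an arbitrary illustrative threshold)."""
    effects = list(stimulus_effect_by_context(prob).values())
    return max(effects) - min(effects) <= tol


# Synthetic data generated from an additive rule z = stimulus + context.
stimulus = {"weak": 0.2, "strong": 1.0}
context = {"neutral": 0.0, "supportive": 0.6}
additive = {
    c: {s: _STD_NORMAL.cdf(sv + cv) for s, sv in stimulus.items()}
    for c, cv in context.items()
}

# Synthetic data in which context and stimulus interact on the z scale.
interactive = {
    "neutral": {"weak": 0.55, "strong": 0.80},
    "supportive": {"weak": 0.90, "strong": 0.92},
}

print(is_context_independent(additive), is_context_independent(interactive))
```

With real data the comparison would of course require a statistical test rather than a fixed tolerance.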

The key to the problem of explanatory adequacy of information processing models may well be given in Marr (1982; Rueckl & Kosslyn, 1992; see Pylyshyn, 1989, for a critical view). After a certain procrastination, there are now some initial attempts to apply Marr's ideas to higher-level cognitive phenomena including word recognition (Humphreys, Wiles, & Dennis, 1994; Jacobs, 1994).

Other criteria

There are, of course, other dimensions along which models can be evaluated. The available space prohibits going into details here, and we feel that taking into account the points discussed above would already be a considerable step towards cleaning up the jungle of models in our field. In future work, we should also develop formal criteria for evaluating the modifiability, research generativity, equivalence class, or completeness of models.

A final word about plausibility is in order, though, because it is perhaps the first criterion that comes to mind but is absent in the above considerations. We have adopted a neutral position with regard to the issue of neurobiological plausibility. That is, we consider that all models of word recognition discussed here are functional models, regardless of their format, or other features. In particular, we assume that the present connectionist models, whether of the localist or distributed type, are not more plausible than the non-connectionist models (see Note 6; Forster, this issue; Mewhort, 1990).

THE ARTICLES

Task-oriented Model Classification

To start with a simple, task-oriented classification, the theoretical contributions and models presented in this special section basically fall into three intersecting sets defined by the three major tasks currently being used. Models falling into the first two sets view word recognition as a special case of the general problem of pattern recognition and aim at predicting performance in the two major perceptual paradigms in the field, the perceptual identification task (including variants such as the Reicher paradigm; Reicher, 1969), and the lexical decision task, including the masked-priming and accuracy-measure variants of this task (Forster & Davis, 1984; Grainger & Jacobs, 1993a; Paap & Johansen, this issue; Segui & Grainger, 1990). Well-known members of these two sets include the logogen model (Morton, 1969), the interactive activation model (McClelland & Rumelhart, 1981), the fuzzy logical model of perception (Massaro & Cohen, 1991), or the bin / serial search model (Forster, 1976). The intersection of the two sets contains models such as the activation-verification model (Paap et al., 1982), and the semistochastic interactive activation model (Jacobs & Grainger, 1992). In the present special section, the papers of Massaro and Cohen; Paap and Johansen; and Grainger and Jacobs deal with these two sets.

The models of the third set are more concerned with acquisition and production aspects of reading, and focus on the prediction of performance in the naming task. The dual-route model (Coltheart, 1978; Coltheart et al., 1993) is the best-known member of this set. Analogy models (Glushko, 1979), or the developmental model of Seidenberg and McClelland (1989) might be considered examples of members of the intersections with the perceptual identification and lexical decision task sets, respectively. The present papers by Seidenberg et al.; Rastle and Coltheart; Norris; Kawamoto et al.; Stone and Van Orden; and Van Orden and Goldinger deal with this set.

Strategy-oriented Classification

What job are these models supposed to do? The response produced in a lexical decision or naming task, for instance, is the problem to solve by the information processing system under study. This response corresponds to a state description of the solution. The modeler's task is to discover a sequence of processes that will produce the goal state from an initial state. Translating from a process to a state description enables modelers to recognize when they have succeeded. There are multiple tools of process descriptions, the most prominent of which are systems of differential or difference equations, and simulation models. In a large number of cases they have provided, in physics or biology, the clue for the simple description of complex systems and behavior (Nicolis & Prigogine, 1989; Simon, 1969). An example of a successful, simple process description of behavior in cognitive psychology is the family of stage models (Massaro, 1987; Roberts & Sternberg, 1993; Sternberg, 1975).

There are two positions with regard to the modeling strategy, and the goal of psychological research in general, that have influenced the theoreticians contributing to this special section to different degrees. The first position can be summarized as follows.

The mind is a complex system that can be thought of as (nearly) decomposable, (quasi) modular, and hierarchic. In a wide range of situations, it has a simple functional structure, as suggested by the fact that behavior often has a simple quantitative structure (RT additivity is a case in point). The goal is to explain behavior functionally by modeling only that part of structure and dynamics that is crucial for abstraction.

We will call this the "in-principle reductionist" position, keeping in mind that it accepts that it is not a trivial matter to infer properties of the whole from properties of the parts and the laws of their interaction (this refers to the observation that the in-principle reductionist is also a pragmatic holist; Simon, 1969).

The alternative position possesses features of neobehaviorism, neogestaltism, neogibsonianism, neoconnectionism, and complexity theory. Next is an attempt to summarize it. We will call this the "in-principle holist" position, assuming that the in-principle holist can also be a pragmatic reductionist.

The brain is about the most complex system scientists can study. The essential level of analysis for the psychological study of its behavior is that of highly complex neural mechanisms. Because of the myriad properties, interactions and nonlinear dynamics of these mechanisms, the information that led to the final state from the initial state cannot be recovered. Therefore we cannot deduce a set of elementary causal structures that underlie performance of cognitive systems. Instead, we can only hope to describe the overall performance of the system as a whole.

The theoretical contributions and models presented in this volume can be classified with respect to their distance relative to these two positions. Representing the positions as fuzzy sets rather than points, the papers by Forster; Coltheart and Rastle; Paap and Johansen; and Massaro and Cohen belong to the first set, but vary with regard to the value of their membership function. Members representing the holist fuzzy set are the papers by Stone and Van Orden, and Van Orden and Goldinger. The remaining papers by Grainger and Jacobs, Seidenberg et al., and Kawamoto et al. belong to the fuzzy intersection.

A Standard For Progress Evaluation

About a decade ago, Carr and Pollatsek (1985) portrayed models of word recognition as models of how visual, phonological, and semantic codes are activated and either coordinated, or individually selected to support the decision or action requirements of a task. This portrayal stresses the notions of isolable systems and codes, whose nature and cooperative-competitive role has to be specified by any model of word recognition. At the end of their "tour de force" through the rich and complicated word recognition literature, they took the risk of proposing a complicated parallel coding systems model that they found to be most compatible with the data they reviewed. This comprehensive, verbal model was illustrated by nine boxes, 13 one-directional arrows, and one bidirectional arrow, representing the different component structures being part of three parallel systems involved in word recognition: two visual-orthographic pathways, one mediated by whole-word meaning and the other by morphemic decomposition, and a third pathway based on phonological recoding that was itself divided into cooperating subpaths, one mediated by whole-word phonology and the other by phonological assembly. Figure 1, adapted from Figure 6 in Carr and Pollatsek (1985) illustrates this model.

Figure 1. Information flow chart illustrating Carr and Pollatsek's (1985) parallel coding systems model of visual word recognition. The different portions of the system underlying conscious word recognition (visual and semantic code formation, phonological recoding, and morphemic decomposition) are separated by different forms and typographies. The triangle named "word recognition" (standing for conscious word recognition in working memory) represents the ability of the perceiver to gain access to the outputs of the encoding mechanisms represented by the boxes, circles, and ellipses for the purposes of decision and action or response production.

The model was the result of what still seems to be the most comprehensive review of the literature on word recognition (see also Carr, 1986). To the extent that it reflects the general constraints provided by this literature, it can tentatively be considered as a standard or working model of word recognition. Nevertheless, its authors stated that there were many points on which uncertainty existed and that work would have to continue on the details. Consequently, the model contained unacceptably vague components, such as the spelling-to-sound translation system. In what follows, we will discuss the progress made in modeling visual word recognition - as represented by the theoretical contributions and models of this special section - taking Carr and Pollatsek's (1985) model as a standard for evaluation. It will become clear that none of the verbal-metaphorical, mathematical or algorithmic models presented here is as complete/comprehensive as Carr and Pollatsek's (1985). However, considerable progress has been made with regard to filling several boxes and arrows of the "standard model" with content and formal details, or extending it, e. g. by learning mechanisms.

Visual Code Formation

We start with the contribution of visual code formation to word recognition. Three models in this special section deal explicitly with the problem, central in Carr and Pollatsek's review, of why processing of letters presented in a word or pseudoword context is better than that of letters presented in nonwords or in isolation. The three models presented by Massaro and Cohen, Paap and Johansen, and Grainger and Jacobs, give different formal accounts of the word superiority effect, and differ in many ways. However, they agree in one respect. None of them adopts the explanation of the phenomenon offered by Carr and Pollatsek (1985), i. e. phonological recoding as a unitization mechanism that abstracts information from the visual system and transforms it into a code that is safe from masking and from memory loss long enough to support the decision processes that are required in the tachistoscopic recognition environment. Even more surprisingly, all three models belong to a class that Carr and Pollatsek (1985) considered as being an unattractive explanation of the word superiority effect, namely sophisticated guessing models. The bias Carr and Pollatsek (1985) and others at that time had against sophisticated guessing models very much relied on Johnston's (1978) arguments against the explanatory adequacy of that class of models. However, as argued in the paper by Grainger and Jacobs, Johnston's arguments may have been taken for granted prematurely.

The fact that the present sophisticated-guessing models can account for the effects of word frequency, word superiority, and orthographic neighborhood on performance in perceptual identification and lexical decision may, after further testing, lead to a revision of the "standard model". In this respect it is useful to reconsider the arguments Carr and Pollatsek leveled against two of the present competing models.

As concerns the interactive activation model, Carr and Pollatsek (1985) saw two main but not necessarily insurmountable problems. The explanation of strategical modulations of the word superiority effect (Carr, Davidson, and Hawkins, 1978) based on the assumption that subjects have control over the letter-to-word excitation parameter (Rumelhart & McClelland, 1982) certainly requires further empirical testing, before any firm conclusions can be drawn (cf. Henderson, 1987). However, their second criticism concerning the assumption that responses are computed solely on the basis of activation at the letter level, which was in contrast with the results of Hawkins, Reicher, Rogers, & Peterson (1976), is now met by the present dual read-out model.

As concerns the activation-verification model, Carr and Pollatsek (1985) criticised the idea that the verification process is disabled in explanations of tachistoscopic recognition, and that therefore the interactive activation and activation-verification models are not sufficiently fundamental to warrant a concerted effort to distinguish them in this task domain. They also criticised the facts that the model takes no position on the type of code produced at the word level, and that it needs modification in order to account for the effects of word frequency, so ubiquitous in tachistoscopic recognition. This is the point that Paap and Johansen counterattack in the present special section. They present experimental evidence that word frequency does not systematically influence response accuracy in a lexical decision variant of the Reicher task. In addition, they present correlational evidence that when word frequency is measured not in logarithmic units but in plain units, neither word frequency nor bigram frequency significantly affects lexical decision times. Thus, Paap and Johansen challenge the word recognition community to justify the standard log transform of word frequency either on the basis of psychological models or broadly-based principles. This challenge is taken up by Forster (this issue) who argues that in a lexical decision experiment the effect of word frequency on response time can be shown to be a continuous, graded effect across the entire frequency spectrum. This is taken as evidence for a lexical locus of frequency effects and for the account given by the bin/lexical search model.
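Why the choice of frequency scale matters for correlational evidence can be illustrated with a small Python sketch. The data are synthetic and hypothetical: latencies are generated as an exact linear function of log frequency, so the outcome holds by construction and does not reproduce Paap and Johansen's analysis. The intercept, slope, and frequency values are arbitrary illustrative choices.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic word frequencies (occurrences per million), spanning three decades.
frequencies = [1, 3, 10, 30, 100, 300, 1000]

# Synthetic latencies in ms, linear in log10(frequency) BY CONSTRUCTION.
latencies = [700 - 40 * math.log10(f) for f in frequencies]

r_raw = pearson_r(frequencies, latencies)
r_log = pearson_r([math.log10(f) for f in frequencies], latencies)
print(round(r_raw, 3), round(r_log, 3))
```

When the underlying relation is logarithmic, the correlation with plain frequency is noticeably weaker than the correlation with log frequency, because a few very frequent words dominate the plain scale. Whether the underlying relation is in fact logarithmic is precisely what the challenge asks the field to justify.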

A word is in order concerning Paap and Johansen's challenge of the utility of perceptual identification tasks, such as Snodgrass and Mintzer's (1993) fragmentation technique, for distinguishing factors important to "primary lexical identification" (i. e. activation and verification in their model) from those that are "secondary and peculiar to only certain tasks". One may wonder whether these authors do not adopt a little too narrow, "pure perceptualist" view of the word recognition process. It may well be that threshold tasks, such as Snodgrass and Mintzer's, invite subjects to do sophisticated (unconscious) or even unsophisticated (conscious) guessing and that this "bias" is not a ubiquitous part of "primary lexical identification". On the other hand, however, any learner of a language, or imperfect speaker of a foreign language, will agree that all kinds of guessing processes are an integral and central part of everyday word recognition in reading, hearing, and speaking. It is difficult to see what compels us not to study the role of such important processes, called "secondary" by Paap and Johansen. It is indeed time to stop arguing such guessing, decisional, bias, expectational, or strategical processes away from our models and methods. Rather, if we want to achieve any substantial understanding of the reading process "outside the laboratory", i.e. one in which the distinction between seeing and saying drawn by Paap and Johansen becomes artificial, our models must explain what part these "secondary" processes play in relation to "primary" processes (Grainger & Jacobs, 1993b, 1994b; Jacobs, 1994; Stone & Van Orden, 1993).
Methods for quantifying the part guessing may play in different data-limited and resource-limited tasks provide data that can be critical for testing current models and constraining future comprehensive models of word recognition whose validity should go beyond the pure perceptualist's 2AFC-task approach to reading (Grainger & Jacobs, this issue; Jacobs, Grainger, & Nazir, 1994). Clearly, word recognition is more than and different from performance in a Reicher, lexical decision, or naming task (Carr & Pollatsek, 1985). Future algorithmic models of word recognition should acknowledge this and specify which structures and processes are task-specific and which are not (Grainger & Jacobs, 1993b, 1994b; see also section on strategies for model construction and Figure 2 below).

Among the authors concerned with the explanation of the word superiority effect, Massaro and Cohen are liable to believe that it is still too early to model the representations and processes underlying reading in explicit detail. Their way to deal with the complexity inherent in the reading process, and any empirical measures of it, is to acknowledge the impossibility of specifying or modeling the variability in information used in the reading process, and to account for it by estimating free parameters in the model-to-data fit. Their fuzzy logical model of perception describes the reading process as a prototypical pattern recognition situation in which the reader exploits multiple sources of information in perception and action. Unlike other stage models that aim at determining the locus of effect of factors under study, this model aims at a quantitative description of the contribution of multiple factors to reading performance in a wide variety of tasks, including letter search (Jacobs et al., 1994), word fragment completion (Massaro, Weldon, & Kitzis, 1991), and sentence processing (Massaro, 1987). From a pluralistic modeling perspective (Estes, 1991; Stone & Van Orden, this issue) it can therefore be seen as an analytical tool, usefully complementing comprehensive verbal and detailed algorithmic models of word recognition that cannot provide such quantitative estimates. For models of this type it might be useful to develop an estimate of the likely error in predictions due to the simplifying, heuristic assumptions of decomposability and independence, if we view the real word recognition system as a complex (dynamic, nonlinear) system (Bechtel & Richardson, 1993; p. 27).
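How multiple sources of information can be combined with free parameters estimated from the data may be easier to see in a sketch. The Python code below uses the commonly cited form of the fuzzy logical model of perception, in which the support from each source is multiplied and the products are normalized across response alternatives. The formula is not given in this section, so its use here is an assumption, and the function names and parameter values are hypothetical.

```python
from math import prod


def flmp_response_probabilities(support):
    """support[alternative] is a list of fuzzy truth values in [0, 1],
    one per information source. Sources are integrated by multiplication
    and the decision follows a relative goodness rule."""
    goodness = {alt: prod(values) for alt, values in support.items()}
    total = sum(goodness.values())
    return {alt: g / total for alt, g in goodness.items()}


def flmp_two_alternatives(bottom_up, context):
    """Two-alternative special case: support for the foil is taken to be
    the complement of the support for the target on each source."""
    probs = flmp_response_probabilities({
        "target": [bottom_up, context],
        "foil": [1 - bottom_up, 1 - context],
    })
    return probs["target"]


# Hypothetical parameter values: visual support 0.7 for the target letter.
p_neutral = flmp_two_alternatives(bottom_up=0.7, context=0.5)
p_supportive = flmp_two_alternatives(bottom_up=0.7, context=0.8)
print(round(p_neutral, 3), round(p_supportive, 3))
```

A neutral context (0.5) leaves the response probability equal to the visual support, whereas a supportive context raises it. In model fitting, values such as 0.7 and 0.8 would be the free parameters estimated from the data.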

Phonological Recoding

In Carr and Pollatsek's (1985) functional-architectural model, visual code formation is a direct source of input to two semantic processes -- morphemic decomposition and semantic memory -- and can itself be directly attended should the task require visual or orthographic decisions. It is also a source of input to a third, supplementary or alternative pathway to semantic memory and word recognition. This pathway is phonological recoding. It is important to distinguish between this view of phonological recoding and the one taken by current models in which phonological recoding is the only route to lexical and semantic knowledge (Lukatela & Turvey, 1994; Van Orden, Johnston, & Hale, 1988; Van Orden, Pennington, & Stone, 1990).

Among the present models that incorporate different phonological recoding mechanisms to explain effects of regularity/consistency on naming performance, two use algorithmic/simulation formats of either the localist, interactive activation family (Coltheart & Rastle; Norris) or the distributed connectionist type (Seidenberg et al.). A striking difference between these models concerns the number and the nature of routes that lead from a printed word to its pronunciation. The classical two-routes architecture (Coltheart, 1978; Coltheart et al., 1993) is opposed to a single-route architecture (Plaut & McClelland, 1993; Seidenberg & McClelland, 1989; but see Forster, this volume, for an alternative view), and a hybrid architecture that can either be viewed as a single-route, dual-route, or multiple-route one (Norris). The models also differ in the degree to which they use explicit symbolic representations of orthographic and phonological structure, and the importance they attach to a learning mechanism in accounting for the basic phenomena. The 144 grapheme-phoneme correspondence rules used in the dual-route cascaded model are opposed to the "improved orthographic and phonological representations" used in the distributed model (see Note 7), and six kinds of orthographic units used in Norris' model.

In Seidenberg et al., the dual-route and distributed connectionist models are examined with regard to how well they account for differences in the frequency of pronunciations of 590 monosyllabic nonwords. The outcome of the strong inference test is close to a perfect draw. Both models yield excellent fits to the data (percentage of responses that matched subjects' first, second, third or fourth pronunciations). A more detailed analysis is therefore necessary to determine the winner. Seidenberg et al. now change the dependent variable for this more detailed analysis. They examine the two models with regard to how they explain the consistency effect on naming latency (Glushko, 1979) also observed in their data. One conclusion of their detailed model comparisons is that the effect "emerges quite naturally" in parallel distributed processing models. This emergence is perhaps not as mysterious as is usually the case with such models (cf. Bechtel & Richardson, 1993). First, it is explained by the fact that the model encodes the degree of consistency in the mapping between spelling and sound during the learning phase (see Note 7). Actually, the network develops a hermaphrodite structure in the hidden units layer. It develops componential attractors, i. e. clusters of units that form separate, orthogonal sub-basins coding the onset, vowel and coda. Interestingly, the network seems to develop these componential attractors only for the pronunciation of regular words. For exception words, it develops what Plaut and McClelland (1993; p. 2) call "much less componential" attractors. Second, this effect is supposed to follow from independently-established principles about distributed representations and error-learning algorithms (Seidenberg, 1993a; see section on strategies for model construction below).

Another conclusion that Seidenberg et al. draw from their present work is that the dual-route model requires the relaxation of two strong assumptions of the original model (Coltheart, 1978), and the inclusion of several auxiliary assumptions to deal with the consistency effect. Furthermore, its underlying assumptions are considered as being inferred from neuropsychological data rather than from independently motivated principles (data-fitting approach). Thus, with regard to the criteria outlined earlier in the present article, the parallel distributed processing model looks like the better explanation.

This contrasts with Coltheart and Rastle's conclusion that parallel distributed processing models cannot account for the serial position effect on naming latency they obtain in their experiment, because activation of phonemes in these models occurs in parallel. Their revised dual-route, interactive activation model uses a cascaded assembly process that delivers information related to the left part of words earlier than information related to the right parts. It is this serial processing feature of the assembly system that leads to the critical prediction, which, according to Coltheart and Rastle, distinguishes the dual-route model from all parallel distributed processing models of naming.

In contrast to the dual-route and parallel distributed processing models, Norris' interactive activation type model quantitatively accounts for regularity/consistency and frequency effects on what seems to be the main dependent variable in naming studies: latency. Norris' model fares well in simulating data from studies by Andrews (1989), Brown (1987), Jared, McRae, and Seidenberg (1990), Kay and Bishop (1987), or Seidenberg, Waters, Barnes, and Tanenhaus (1984). It explains these effects on the basis of a few simple assumptions about competition between different levels of spelling-to-sound correspondences. A simple version of the model including a learning mechanism is also presented. If it is augmented with assumptions about the time-course of processing (necessary to account for the serial position effects observed by Coltheart and Rastle, for example), it seems like a good candidate for a general model of reading aloud, although there might be tough competition with the model of Coltheart and Rastle, if these authors succeed in building and testing a large-scale variant of their dual-route interactive activation model (possibly including semantic units) that could explain accuracy and latency data from the three major experimental paradigms used in the field (perceptual identification, lexical decision, and naming task) plus the semantic categorization task.

Although the present special section does not provide an answer to the question of which functional architecture is the right one for a general model of reading aloud, considerable progress has been made with regard to Carr and Pollatsek's (1985) conclusion on the issue. These authors stated that parallel coding systems models are clear on the existence of a rule-governed encoding mechanism but very vague about what the rule system's representation and operation looks like, and that such models must either become more specific or give way to other approaches. The three models just discussed provide a pluralistic answer to this problem, one clearly trying to do the former, while the others try different versions of the latter.

Semantic Code Formation And Morphemic Decomposition

Two contributions to the present section deal with the relationship between visual and semantic code formation (Kawamoto et al.), and the relationship between visual, phonological, and semantic "subsymbols" (Van Orden & Goldinger). Forster's and Kawamoto et al.'s papers are the only ones to consider processing of morphologically complex words (see Taft, 1991, for a verbal interactive activation type model that is supposed to account for the processing of morphologically complex words).

Kawamoto et al. present a distributed connectionist/adaptive resonance solution to the problem of modeling the ambiguity advantage, sometimes found in lexical decision experiments (Kellas, Ferraro, & Simpson, 1988; Millis & Button, 1989; Simpson & Kang, 1994). Their fully recurrent network can simulate an ambiguity advantage on response times, if a special error-correction learning algorithm (least-mean-square) is used, and the orthographic/spelling units are used as output units. However, it fails to capture the finding that ambiguous words with many meanings yield fewer errors than ambiguous words with few meanings.

An important general question raised by Kawamoto et al. is how we are to represent meaning in models of word recognition. The "standard model" (Carr & Pollatsek, 1985) is silent on that issue (but see Dagenbach & Carr, 1994). However, Kawamoto et al. also make no commitment whatsoever with regard to the nature of their semantic representations; they only commit themselves to a certain format, that is, distributed representations. In fact, had their "semantic units" been labeled lexical, phonological, morphemic, or pragmatic units, nothing in the model's behavior or its simulation of the "ambiguity advantage" would have changed, as long as the feature values representing "semantics" remain the same. This raises the issue of in what sense, and in virtue of which properties, internal states in connectionist networks can be said to represent or to have meaning. One possibility, discussed in Goschke and Koppelberg (1990), is to integrate connectionist models with a correlational account of semantic content (Dretske, 1988). Another interesting possibility is given by the topographical maps approach (Ritter, 1990). Despite these possibilities, subsymbolic/distributed connectionist models of the type proposed by Kawamoto et al. face a considerable challenge in explaining how the brain gets semantics from syntax (Dennett, 1987). There may indeed be severe limitations of any subsymbolic approach in dealing with this central issue of cognitive neuroscience (Rosenberg, 1990).
 

The papers by Van Orden and Goldinger, and Stone and Van Orden take an extreme stance on this issue of how to represent visual, phonological, or semantic codes. Similar to Kawamoto et al., their metaphorical resonance network uses subsymbols to "represent" these codes (Van Orden et al., 1990) and therefore faces the same issues mentioned above. The relative unfamiliarity of researchers in our field with resonance models justifies a more detailed discussion of this approach, which was not discussed at all in Carr and Pollatsek (1985).

The adaptive resonance account proposed by these authors is couched in notions of complexity theory (Nicolis & Prigogine, 1989). It can also be thought of as the application of an experiential (in contrast to ecological), mediated (in contrast to unmediated) variant of Gibsonian perception-action theory to reading. Armed with a few constructs derived from complexity theory and Neogibsonian theory (Kugler, Shaw, Vicente, & Kinsella-Shaw, 1990), they provide a simple and general account of representative data observed in the naming and lexical decision tasks in terms of a system that is tuned to cohere on learned fixed point attractors emerging in a hypothetical visuo-phono-semantic state space.

The challenging questions that the two resonance papers may put into the minds of students of word recognition are: What do we gain by replacing the traditional information processing language by the language of complexity theory? What are the costs and benefits of describing rules in terms of fine-grain pockets of self-consistency, representations in terms of subsymbols that have solely narrative function but no psychological reality, or processing times in terms of trajectory lengths? Should we give up the old Fechnerian dream of finding a psychological, interval-scaled ordinate for the physical abscissa? That is, should we focus on establishing ordinal-scaled performance topologies, as urged by Van Orden and Goldinger, rather than on constructing models that predict RT means and distributions, for example? Is there really no privileged grain-size, are there no atoms and quarks of the psychological reality of reading and speaking (cf. Van Orden et al., 1990)?

An obvious advantage of the meta-account these authors propose is its generality. Complexity theory can be applied to the most diverse natural or artificial systems: a good general dynamic systems model can be applied to climatic changes, animal and human group behavior, cognition, perhaps anything (Nicolis & Prigogine, 1989). That gives this account the advantage of being applicable to multiple tasks, or multiple languages, thus avoiding the standard joke of psycholinguistics "Well that's true in English, but ..." (Van Orden & Goldinger, this volume). Another advantage is its consistency with recent attempts to describe the brain as a machine that detects correlated events (Eggermont, 1990). However, the metaphorical framework they propose for the understanding of basic processes involved in reading may rise or fall with our ability to implement models that embody its metaphors and that allow us to make quantitative predictions testable in our major paradigms. For without the capacity for quantitative prediction, we could never tell whether a resonance model A or a resonance model B (e. g. Grossberg & Stone, 1986 vs. McClelland & Rumelhart, 1981), a metaphor M or a metaphor N (e. g. computer vs. resonance), a principle P or a principle Q (e. g. supervised vs. unsupervised learning) is right. Without a quantitative understanding of phenomena we will never be able to explain one principle or theory by another, a major goal of scientific research. Moreover, the capacity to make clear, testable predictions with regard to the primary effects observed in our standard paradigms may save the present resonance approach from the fate of Grossberg's (1982; Grossberg & Stone, 1986) work. As Estes (1988) stated, "it carried on unaccompanied by any substantial interactive relationship with ongoing research, and received little acceptance among psychologists...".

A major, very general issue raised by the "resonance" papers is that their authors not only urge us to abandon the method of componential analysis; they also seem to be willing to give up the universal scientific hope of discovering causal relations between phenomena, at least as far as reading is concerned (this is reminiscent of the classical, periodically recurring debates on i) teleological, holistic understanding vs. causal, analytic explanation of behavior (von Wright, 1971), and ii) the behavioral indeterminacy claim (Anderson, 1978; Pylyshyn, 1979)). However, it is useful to note that Prigogine, like other Nobel laureates of our time (e.g. Weinberg), does not seem to draw the same conclusions with respect to the initial conditions dilemma as Van Orden and Goldinger, who assure us that to recover the initial conditions that preceded a given response to a given stimulus is impossible, and that we therefore cannot deduce a set of elementary causal structures that underlie the performance of cognitive systems (Uttal, 1990).

Take a standard example from complexity theory, climate. This is considered to share the intrinsic unpredictability of systems undergoing chaotic dynamics, which Van Orden and Goldinger attribute to the language processing system. However, the fact that we cannot (yet) predict next month's weather does not mean we do not understand why the weather works the way it does (Weinberg, 1992). As Weinberg notes with respect to the emergence of chaos, "the exciting progress that has been made in this area in recent years has not taken the form solely of the observation of chaotic systems and the formulation of empirical laws that describe them; even more important has been the mathematical deduction of the laws governing chaos from the microscopic physical laws governing the systems that become chaotic". Nicolis and Prigogine (1989) urge the reader of their introduction to complexity theory to use simple mathematical models, such as the stochastic resonance model of climatic variability, or the choice model of human behavior, to specify more sharply the nature of the unpredictability of the human system, and to diminish the gap with respect to a possible, detailed analytical model. Among other things, they present analytical methods for identifying suitable sets of variables spanning a state space, give examples of applications to EEG data (which surely are no easier to analyse than RTs), and propose algorithms for reconstructing the dynamics of complex systems from time series data. In light of the above arguments for bringing the resonance approach into closer contact with the rest of the word recognition community (and with the data), these methods surely merit some more attention from the "Arizona State University perspective".

Consider the complex system sketched in Figure 2 of the paper of Van Orden and Goldinger, which can be considered representative of current, complex models of the reading process, including some of the present models. A challenging question the authors of the resonance papers must answer is what we will gain or lose by assuming that each subsystem (visual, phonological, semantic) reaches an equilibrium nearly independently of the others, i. e. by treating the matrix of the coefficients reflecting the dynamics as a nearly decomposable matrix, and assuming - much like in the theory of the thermodynamics of irreversible processes - microscopic equilibrium and macroscopic disequilibrium (Nicolis & Prigogine, 1989). We know that comparatively little information is lost by representing complex dynamic, and nearly decomposable systems as hierarchies, and by reducing high-dimensional systems to low-dimensional ones (one- and two-variable systems). However, much is gained on the side of economy of description and comprehensibility (Simon, 1969). If the task of science is to make use of nature's redundancy to describe nature simply, and if we consider ourselves as scientists, this is a nontrivial issue for the future of research on word recognition and reading.
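
The notion of near decomposability can be made concrete with a small numerical sketch. The following Python fragment is only an illustration under stated assumptions: the four units, the two subsystems, the diffusive coupling rule, and all rate values are hypothetical choices made for this example and are not taken from any of the models discussed here. It integrates a system with strong coupling within subsystems and weak coupling between them, and compares it with a two-variable aggregate that tracks only the subsystem means.

```python
def simulate(x, within=1.0, between=0.01, dt=0.01, steps=500):
    """Euler integration of dx_i/dt = sum_j a_ij * (x_j - x_i).

    Units 0-1 form one subsystem and units 2-3 another (assumed layout).
    Coupling is strong within a subsystem and weak between subsystems.
    """
    blocks = [0, 0, 1, 1]  # unit index -> subsystem index
    x = list(x)
    for _ in range(steps):
        dx = []
        for i in range(4):
            total = 0.0
            for j in range(4):
                if i == j:
                    continue
                rate = within if blocks[i] == blocks[j] else between
                total += rate * (x[j] - x[i])
            dx.append(total)
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


def simulate_reduced(means, between=0.01, dt=0.01, steps=500):
    """Two-variable aggregate: each subsystem mean drifts toward the other.

    With two units per subsystem, the effective rate is 2 * between.
    """
    m1, m2 = means
    for _ in range(steps):
        m1, m2 = (m1 + dt * 2 * between * (m2 - m1),
                  m2 + dt * 2 * between * (m1 - m2))
    return [m1, m2]


start = [1.0, 0.0, 0.2, 0.6]           # hypothetical initial activations
full = simulate(start)                 # four-variable description
reduced = simulate_reduced([0.5, 0.4]) # means of the two subsystems at start
```

In this sketch the units within each subsystem become nearly identical long before the two subsystem means converge, which is the pattern of microscopic equilibrium and macroscopic disequilibrium referred to above. Because the toy coupling is perfectly symmetric, the two-variable aggregate reproduces the subsystem means exactly; with less regular coupling the aggregate would be an approximation only.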

Can Reductionism And Holism Be Reconciled ?

Given the pluralistic modeling perspective these "resonance" authors share with other extant modelers of cognition (Estes, 1991), perhaps some promising possibilities exist for integrating reductionist information processing models with their subsymbolic / holistic account. Examples of such attempts exist in related domains, such as the neogestaltist account of visual perception (van Leeuwen, 1990) or the thermodynamics account of human thinking and problem solving (Krause, 1989). Our hope is that, in this age of transition for science (Gleick, 1987; Nicolis & Prigogine, 1989), the present special section stimulates such integrative, synthetic attempts for word recognition and reading.

In the final invited paper of this special section, Forster, whose 1976 bin/lexical search model has stimulated a great deal of research that forms our current knowledge base on word recognition, presents a thoughtful discussion of the role of algorithmic/simulation models for advancing theory in our field. He provides insights about the status of rules and representations in modeling word recognition that challenge what he calls the "radical or revisionist" connectionist school of thought (i. e. connectionist modelers using distributed representations and learning algorithms; e. g. Seidenberg & McClelland, 1989; Seidenberg et al., this issue), in contrast to the "conservative" connectionist school of thought (i. e. modelers using localist representations; McClelland & Rumelhart, 1981; Grainger & Jacobs, 1993a; 1994; Jacobs & Grainger, 1992; Murre et al., 1992; Norris, this issue). Forster draws two conclusions supported by a variety of logical, computational, and empirical arguments. The first is that localist representations (or distributed representations using a connectivity effectively equivalent to localist representations) are the appropriate structural component to model lexical processing. His analysis of the similarities between distributed connectionist models using gating functions to minimise the problem of catastrophic interference (a problem totally ignored by any of the present accounts based on learning algorithms or covariant learning principles; Sloman & Rumelhart, 1992) and his serial search model suggests the same conclusion, although from a different direction, as Seidenberg et al.'s analysis of the similarities between their "much less distributed" model and the revised model of Coltheart et al. (1993): Models of word recognition seem to enter a "synthesis stage", in which symbolic, localist, modular models ("the thesis") are reconciled with subsymbolic, distributed, interactive models ("the antithesis") in some hybrid form(s), the prototype of which probably is a quasi-symbolic, quasi-localist, quasi-modular model. Posner and McCandliss (1993) in their final commentary on a special section on reading research come to a conclusion that supports this analysis, when stating that converging evidence from behavioral and brain imaging studies supports a word recognition model closer in spirit to the interactive activation model (McClelland & Rumelhart, 1981) than to more recent (distributed) models (Seidenberg & McClelland, 1989).

Forster's second conclusion is that simulations are no explanations. In this respect it is useful to note that what is absent in almost all non-simulationist accounts is a detailed algorithmic-level description of the structures and processes underlying word recognition. The usefulness of such algorithmic/simulation models, designed to answer the question of how a system computes an output from a given input, depends on whether they are integral parts of some more abstract level of theorising which includes statements about the Why and the What of a computation. In Marr's (1982) framework there is no doubt about the hierarchy: The Why and What questions must be answered first (but see Pylyshyn, 1989, for an alternative view). The last section deals with this important general issue of the right strategy for model construction.

STRATEGIES FOR MODEL CONSTRUCTION

Gardeners and Architects

Two distinct approaches to model construction emerge from the literature on word recognition and the present special section. The first, which may be coined the gardener's approach (or, "the model is not the theory") can be caricatured as consisting in "growing" a model or network that mimics in some respect a human cognitive function, without necessarily having an explicit theory of that function (cf. McCloskey, 1991; Pinker & Prince, 1988). A pragmatic example of this modeling strategy that, as we feel, did a good job for advancing theory in our field, is the activation-verification model. Its authors argued in favor of the following modeling strategy: "find a set of simple algorithms (with plausible psychological underpinnings) that do a good job of predicting performance. Once a formal model has been developed that predicts performance with a fair amount of success this makes it easier to test and select particular psychological explanations than to start with very specific ideas about the underlying psychological processes and then derive algorithms that suit these particular assumptions" (Paap et al., 1982).

The second strategy could be coined the architect's approach (or "the model is the theory"). In line with the central dogma of cognitive science (cf. Chomsky, 1965; Pinker & Prince, 1988) some continue to argue that it is the right approach to start with a fully-specified theory (based on general principles) and then (if one wishes) to implement it as an algorithmic model, for example (McCloskey, 1991). However, in reality, the architect's approach is more like a deus ex machina approach (Pinker & Prince, 1988), in which the model is continuously reshaped to fit the facts (McNamara, 1992).

Obviously, the "the model is not the theory" strategy has the advantage of being applicable even in fields of research that lack general principles, or, to put it differently, where people still wonder what these principles are. The hope associated with this strategy is that the testing and selection of appropriate explanations will be an emergent property of the success of prediction (Paap et al., 1982; Paap, 1992). In this approach, models can be used much as experiments, providing data against which to assess more general theoretical claims (Seidenberg, 1993a).

Principles

It might be argued that since we simply do not know enough yet to be able to fully motivate all aspects of our models by independent general principles, the potential for the gardener's strategy is larger in our area than that for the architect's strategy (see Rueckl & Kosslyn, 1992, for a critical view). There are, however, some bright lights on the horizon of the architect's strategy. One star might shine from the firmament of neuropsychology and the age of cognitive neuroscience (Caramazza, 1992; Carr, 1992). In combination with behavioral results, data obtained with neuropsychological, brain mapping, and electrophysiological techniques could provide much stronger "internal" constraints for our models (Jacobs, 1994; Kosslyn & Intriligator, 1992; Posner & Carr, 1992). Another bright light is the external constraints provided by recent attempts to lay down a principle-based foundation for modeling cognition. Stone and Van Orden (1993; this issue) discuss some broad design and system principles that are particularly interesting because they apply to all types of models considered here (verbal, algorithmic, and mathematical). For localist and distributed connectionist models, McClelland (1993) and Seidenberg (1993a), respectively, have also made proposals that would render the "the model is not the theory" strategy more principled. Since the principles listed by McClelland (1993) are only provisional and lack a principle of representation, and for reasons of space, we will focus here only on the first three connectionist principles listed by Seidenberg (1993a,b). We believe a close inspection of these three principles provides some general insights.

The first principle is "knowledge representations are distributed". Before we can accept this as an independently established, fundamental principle for use in theory construction in our field we must examine its possible shortcomings. One is that it is ambiguous because both terms "distributed" and "representation" are not clearly defined. With regard to definitions of the term representation, symbolist, neogibsonian, and neoconnectionist approaches still have to be unified (Hatfield, 1990). More importantly, in the description of complex systems a multiplicity of representations is required (Marr, 1982). It is not possible to state that one of them is "true", or that it is the final one (Dalenoort, 1990; Rosen, 1978). It is also necessary to clearly define the relationships between localist and distributed representations, or symbolic and subsymbolic ones. Until it has been demonstrated that a subsymbolic performance model (Seidenberg et al., this issue; Smolensky, 1988; Van Orden & Goldinger, this issue) converges at its conceptual level to the competences of a successful symbolic model over the same initial task domain, one may conclude that it is a mistake to think of the subsymbolic approach as an alternative to the symbolic approach to cognitive modeling (Rosenberg, 1990). In this regard a question that arises from the work presented here is whether the terms localist and distributed represent opposite ends on a continuum. If componential attractor networks (Seidenberg et al., this issue; see Note 7) use representations that are less distributed than those used in the Seidenberg and McClelland (1989) network, the term "distributed" requires fuzzy-logic to be understood correctly, and the above first principle has to be redefined in more precise terms to be meaningful.

Although Seidenberg (1993a, Footnote 4) acknowledges that it is by no means clear as yet whether connectionism provides an adequate set of principles, it should be emphasized that the above "first principle" neglects the fact that there is no such thing as a general agreement among cognitive (neuro)scientists that representations are distributed. There is indeed a large body of neurobiological, neuropsychological, and psychophysical evidence arguing in favor of modular or quasi-modular systems (Caramazza, 1990; Changeux, 1983; Damasio & Damasio, 1992; Gazzaniga, 1985; Marr, 1982; see Churchland & Sejnowski, 1992, for an alternative view), and there are many reasons why introducing modular constraints and localist representations in a layered network architecture is not only advantageous but necessary for the network to learn and perform correctly (Forster, this issue; Murre et al., 1992; Rueckl, Cave, & Kosslyn, 1989; Sloman & Rumelhart, 1992). The fact that Plaut and McClelland (1993) introduce a priori knowledge in the form of a position-specific architecture (thereby introducing highly-structured, quasi-localist representations) to guide nonword pronunciation learning into the intended direction is another case in point.

Of course, there may be a way to reconcile the first principle with the above arguments. In analogy with classical logic being considered a special case of fuzzy logic (Klir & Folger, 1988), or the inclusion of symbols as a subclass of nonsymbolic representations (Hatfield, 1990), distributed representations could be considered the general case, from which all other special cases can be formally derived. Thus, once it has been demonstrated how and why results obtained with localist and quasi-modular connectionist networks transfer to systems using componential and noncomponential attractors, we could feel more at ease with accepting Seidenberg's statement as a (first) principle. Meanwhile, the labels axiom or dogma seem to be better names for it at present (cf. Stone & Van Orden, this issue).

The second and third connectionist principles can be combined as follows: "Processing involves learning through weight adjustment by error-correction". There are laws of learning (discrimination, generalization), and laws of performance (McClelland, 1993). The acceptance of these principles, however, must depend on the outcome of tests of the validity of the learning algorithms of network models with respect to the appropriate learning laws in learning experiments (Feldman-Stewart & Mewhort, 1994; Gluck & Bower, 1988; Mewhort, 1990; Ratcliff, 1990). If such tests establish the validity of the postulated learning algorithm(s), we can be more confident in the conjecture that the performance effects simulated by parallel distributed processing models are, in part, caused by the learning history (they are, of course, also a result of the network's structural and dynamical properties). We know that some learning algorithms work for some applications (least-mean square error-correction in Kawamoto et al., this issue), whereas others work for other applications (Hebbian learning in the network of Farrar & Van Orden, mentioned in Van Orden & Goldinger, this issue). While it is important to examine the Why behind this (Kawamoto et al.), there is no such thing yet as a computational theory of learning, in Marr's (1982) sense, that guarantees that distributed connectionist models do more than mimic learning.
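
To make the phrase "weight adjustment by error-correction" concrete, the following Python sketch shows the least-mean-square (delta) rule for a single linear output unit. It is a minimal illustration only, not the learning procedure of any model in this special section: the function names, the learning rate, the number of epochs, and the three toy training patterns are all assumptions introduced for this example.

```python
def train_lms(patterns, n_inputs, rate=0.1, epochs=200):
    """Least-mean-square (delta rule) learning for one linear output unit.

    After each pattern, every weight is changed in proportion to the
    output error and to the activity of its input unit.
    """
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, target in patterns:
            output = sum(w * x for w, x in zip(weights, inputs))
            error = target - output  # teacher signal minus actual output
            weights = [w + rate * error * x
                       for w, x in zip(weights, inputs)]
    return weights


def predict(weights, inputs):
    """Output of the linear unit for one input pattern."""
    return sum(w * x for w, x in zip(weights, inputs))


# Hypothetical toy patterns: (input activations, target output).
patterns = [
    ([1.0, 0.0, 1.0], 1.0),
    ([0.0, 1.0, 1.0], 0.0),
    ([1.0, 1.0, 0.0], 1.0),
]
weights = train_lms(patterns, n_inputs=3)
```

The sketch makes visible what the rule presupposes: a target value must be supplied for every training pattern. This is the supervision requirement that figures in the criticisms of error-correction learning discussed in this section.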

We will spare the reader a detailed repetition of the many criticisms that can be leveled against the neurobiological and/or psychological plausibility, as well as the computational rationality, of supervised learning by backpropagation or other error-correction algorithms (Murre et al., 1992). However, instead of passing over them in silence, builders of distributed learning models should verify whether the problems inherent in this learning procedure (slow convergence in multi-layer networks; catastrophic interference; modest discrimination and/or generalization ability; local minima; existence of more powerful algorithms, based on line-search rather than gradient descent; the need for feedback of a type that may not be available in the environment, etc.; Feldman-Stewart & Mewhort, 1994; Ratcliff, 1990; Rueckl & Kosslyn, 1992; Sloman & Rumelhart, 1992) are also characteristics of animal and human learning. Bechtel (1990) makes some interesting proposals in this direction, suggesting that connectionist models of language learning should first be tested against and constrained by data from animal, not human language (e. g. Savage-Rumbaugh, 1986). However, error-correction learning, which is the most widely used technique for distributed models, does not seem a good candidate for an implementation of a "learning principle", for at least two reasons. First, it cannot account for the fact that much learning, like the incidental storage of everyday experiences, proceeds without any form of supervision (or auto-supervision; Murre et al., 1992). Second, principles illustrated in small networks using error-correction learning cannot be applied easily to brain-size networks by scaling up as required (Feldman-Stewart & Mewhort, 1994), thereby undermining the assumption that the study of simplified distributed networks can tell us something about the functioning of larger brain-size networks (Hinton & Shallice, 1991).
Given the insights provided by the learning or "revisionist" branch of connectionist modelers (Forster, this issue) it would be silly to argue that no learning mechanisms for the acquisition of cognitive skills should be hypothesized and studied until one has a good theory of the steady-state skill itself. However, such hypotheses should be submitted to the same standards of scientific testing as those about other mechanisms.

Pragmatic approaches

At present it seems premature to say that there exists a generally accepted principle-based approach to modeling visual word recognition, much less to modeling cognition in general. However, in the literature represented by this special section there exist some interesting pragmatic attempts in this direction. One is the canonical modeling approach by Stone and Van Orden (1993; this issue). Starting with the simplest model within a given framework that fairly characterizes the qualitative behavior of other models that share its design and system principles, with respect to the data at hand (e. g. the interactive activation model as the prototype of a canonical resonance model), the cornerstone of this approach to theory development is a refinement process. However, in contrast to the deus ex machina approach mentioned above, the aim of this modification and refinement process is not to guarantee the survival of models in future attempts at falsification. Rather, in this approach models are important means for identifying principles and determining the explanatory credit and blame with respect to these principles.

A second approach aims at identifying the core processes underlying word recognition by building multilevel, multitask models (Grainger & Jacobs, 1993b; 1994b; Jacobs, 1994). A cornerstone of this approach is the concept of functional overlap, illustrated in Figure 2. This concept relates the assumptions that i) there is a reasonable overlap between functional mental structures and processes involved in making perceptual identification, lexical decision, or naming responses, and those involved in identifying isolated words, and ii) there is no theory-free way of determining this functional overlap. A straightforward way of modeling functional overlap is to construct, on the basis of a computational theory in the sense of Marr (1982), or a set of sufficiently constraining background assumptions (Posner & Carr, 1992), a readily falsifiable performance or algorithmic-level model of a task A, and, after testing the model in this task, to eliminate only the postulated task-specific process(es) and test it in a related task B, which, by hypothesis, is otherwise identical to A. If the model does well in predicting performance for both tasks, it is reasonable to assume that the difference between the two tasks is well captured by the postulated task-specific process(es), and that the model captures the basic, task-independent representations and processes (Jacobs, 1994).

Figure 2. Venn diagram illustrating the concept of functional overlap. This concept relates the two notions that i) there exists a reasonable overlap between the mental structures and processes involved in "normal" word recognition and those underlying performance in the major tasks used in the field, and ii) that there is no model-free way to determine this overlap. Accordingly, visual word recognition can only be understood by constructing and verifying multitask models that specify both the intersections of the sets and the differences between the sets illustrated in this diagram.

A recent example of this approach is the successful quantitative prediction of a large number of dependent variables in both the lexical decision and perceptual identification tasks by the three-process model of lexical decision and word recognition, which assumes that in the perceptual identification task two of the three processes underlying performance in the Yes/No lexical decision task are not operative (Grainger & Jacobs, 1993b; 1994b; Jacobs & Grainger, 1992). The multilevel, multitask modeling approach adopted in the latter papers is part of a research strategy that is expected to lead to an internally and externally constrained (Kosslyn & Intriligator, 1992; Posner & Carr, 1992), integrated model of visual word recognition (Jacobs, 1994).
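The logic of modeling functional overlap (fit a model in task A, remove only the postulated task-specific process, then test the remainder in task B) can be sketched in a few lines of Python. This is a toy illustration only: the functions `core` and `decision_stage`, the parameter values, and the "observed" latencies are all invented for the sketch and are not taken from the three-process model or from any published data set.

```python
from typing import Callable, Sequence

Process = Callable[[float], float]


def core(evidence: float) -> float:
    """Hypothetical task-independent process: latency (ms) to identify a word."""
    return 400.0 + 200.0 * (1.0 - evidence)


def decision_stage(latency: float) -> float:
    """Hypothetical process specific to task A (e.g. a yes/no decision)."""
    return latency + 150.0


def predict(evidence: float, task_specific: Sequence[Process] = ()) -> float:
    """Run the shared core, then any task-specific processes, in order."""
    out = core(evidence)
    for process in task_specific:
        out = process(out)
    return out


def mean_abs_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute deviation between predicted and observed latencies."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Invented stimuli (evidence values) and invented "observed" mean latencies.
evidence = [0.9, 0.5, 0.1]
observed_a = [575.0, 645.0, 735.0]  # task A: core plus task-specific stage
observed_b = [415.0, 505.0, 575.0]  # task B: by hypothesis, core only

# Step 1: test the full model in task A.
pred_a = [predict(e, [decision_stage]) for e in evidence]
# Step 2: eliminate only the task-specific process and test in task B.
pred_b = [predict(e) for e in evidence]

error_a = mean_abs_error(pred_a, observed_a)
error_b = mean_abs_error(pred_b, observed_b)
print(f"task A error: {error_a:.1f} ms, task B error: {error_b:.1f} ms")
```

In this sketch the same `core` function, with unchanged parameters, serves both tasks; only the list of task-specific processes differs. If both errors are acceptably small, the difference between the tasks is attributed to the removed process, which is the inference described above.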

Nested Modeling
 

It is also useful to note that before the crucial assumptions distinguishing the interactive activation model (interactivity and connectivity) from its extant precursors (logogen model) or competitors (Massaro & Cohen, 1991; Paap et al., 1982) could be submitted to thorough empirical tests, the new connectionist model generation arrived. Quantitative tests of these assumptions have now been published, and some authors are pursuing the test work (Grainger & Jacobs, 1993a; 1994; Jacobs & Grainger, 1992; Richman & Simon, 1989; Massaro & Cohen, 1991; Mewhort & Johns, 1988). However, the authors representing the "revisionist" connectionist school (Forster, this issue) have not specified how exactly localist and distributed models are related to each other, i. e. how the developmental model can be derived from the interactive activation model, or the componential attractor model can be derived from the original developmental model (cf. McClelland, 1993). This makes it very difficult to pursue the test work, and it lends support to the astrophysicists' joke mentioned by Roberts and Sternberg (1993) with respect to complex models ("never propose a theory that can be tested during your lifetime"). Clearly, a good balance is needed between theoretical creativity and a model development that is guided by the principle of nested modeling, i. e. a new model should be related to (or include), at least, its own direct precursors and be tested against the old data sets that motivated the construction of the old model before it is tested against new ones. This implies specifying why the new model is more general, simpler, and/or descriptively/explanatorily more adequate than the old one. The approach chosen by Roberts and Sternberg (1993) provides an encouraging example in this regard.
As far as models of word recognition are concerned, a good approximation to the approach of nested modeling is given by the different signal detection models of Broadbent (1967), Morton (1969), and Treisman (1978).

The policy being urged, then, is to grow models not wildly but in accord with a few general principles (Marr, 1982; McClelland, 1993; Stone & Van Orden, this issue; Rueckl & Kosslyn, 1992; Seidenberg, 1993a,b), and with a few pragmatic stratagems, such as: i) canonical modeling, ii) modeling functional overlap, and iii) nested modeling, that is, a new model should either include the old one as a special case, with a formal demonstration of the inclusion, or dispense with it after falsification of the core assumptions of the old model.
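The requirement that a new model include its precursor as a special case can be made concrete with a small Python sketch. Everything here is hypothetical: `old_model`, `new_model`, the extra parameter `w`, and the numerical values are invented for illustration and correspond to no published model. The numerical check is only a sanity test; it does not replace the formal demonstration of inclusion called for above.

```python
import itertools
import math
from typing import Sequence


def old_model(log_freq: float, base: float = 700.0, slope: float = 40.0) -> float:
    """Hypothetical precursor: latency (ms) depends on log word frequency only."""
    return base - slope * log_freq


def new_model(log_freq: float, n_neighbors: int, base: float = 700.0,
              slope: float = 40.0, w: float = 0.0) -> float:
    """Hypothetical successor: adds a neighborhood term weighted by w."""
    return base - slope * log_freq + w * n_neighbors


def reduces_to_old(log_freqs: Sequence[float], neighbor_counts: Sequence[int],
                   tol: float = 1e-9) -> bool:
    """Check on a grid of inputs that new_model with w = 0 matches old_model."""
    return all(
        math.isclose(new_model(f, n, w=0.0), old_model(f), abs_tol=tol)
        for f, n in itertools.product(log_freqs, neighbor_counts)
    )


nested = reduces_to_old([0.0, 1.5, 3.0], [0, 4, 12])
print("new model nests old model on the tested grid:", nested)
```

Because the old model is recovered by fixing `w` at zero, any data set that the old model fits is fitted at least as well by the new one, so the new model can first be run on the old data before its additional parameter is tested against new data.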

REFERENCES

Anderson, N. H. (1978). Arguments concerning representations for mental imagery. Psychological Review, 85, 249-277.

Andrews, S. (1989). Frequency and neighborhood size effects on lexical access: Activation or search? Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 802-814.

Andrews, S. (1992). Frequency and neighborhood effects on lexical access: Lexical similarity or orthographic redundancy? Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 234-254.

Bechtel, W. (1990). Multiple levels of inquiry in cognitive science. Psychological Research, 52, 271-281.

Bechtel, W., & Richardson, R. C. (1993). Discovering complexity: Decomposition and localization as strategies in scientific research. Princeton, NJ: Princeton University Press.

Besner, D., & McCann, R. S. (1987). Word frequency and pattern distortion in visual word identification and production: An examination of four classes of models. In M. Coltheart (Ed.), Attention & Performance XII: The psychology of reading (pp. 201-219). Hillsdale, NJ: Erlbaum.

Besson, M., Courrieu, P., Frenck-Mestre, C., Jacobs, A. M., & Pynte, J. (Eds.). (1992). Language Perception and Comprehension: Multi-disciplinary approaches. Marseille, France: CNRS-LNC.

Broadbent, D. E. (1987). Simple models for experimentable situations. In P. Morris (Ed.), Modelling Cognition (pp. 169 - 185). New York: John Wiley & Sons.

Brown, G. D. A. (1987). Resolving inconsistency: A computational model of word naming. Journal of Memory and Language, 26, 1-23.

Brown, R. (1987). Can brains make psychological sense of neurological data? Behavioral and Brain Sciences, 10, 175-176.

Caramazza, A. (1990). Cognitive neuropsychology and neurolinguistics: Advances in models of cognitive function and impairment. Hillsdale, NJ: Erlbaum.

Caramazza, A. (1992). Is cognitive neuropsychology possible? Journal of Cognitive Neuropsychology, 4, 80-95.

Caramazza, A., & Miceli, G. (1989). Orthographic structure, the graphemic buffer and the spelling process. In C. von Euler, I. Lundberg & G. Lennerstrand (Eds.), Brain and reading (pp. 257-268). London: Macmillan.

Carr, T. H. (1986). Perceiving visual language. In K. R. Boff, L. Kaufman & J. P. Thomas (Eds.), Handbook of perception and human performance (pp. 29.1-29.82). New York: Wiley.

Carr, T. H. (1992). Automaticity and cognitive anatomy: is word recognition "automatic"? American Journal of Psychology, 105, 201-237.

Carr, T. H., Davidson, B. J., & Hawkins, H. L. (1978). Perceptual flexibility in word recognition: Strategies affect orthographic computation but not lexical access. Journal of Experimental Psychology: Human Perception and Performance, 4, 674-690.

Carr, T. H., & Pollatsek, A. (1985). Recognizing printed words: A look at current models. In D. Besner, T. G. Waller, & G. E. MacKinnon (Eds.), Reading research: Advances in theory and practice 5 (pp. 1-82). Orlando, FL: Academic Press.

Changeux, J. P. (1983). L'homme neuronal. Paris: Fayard.

Chiang, C. (1978). Gross vision of a word enhances the perceptibility of its component letters: a model. Vision Research, 18, 1599-1600.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.

Churchland, P. S., & Sejnowski, T. J. (1992). The computational brain. Cambridge, MA: MIT Press.

Collyer, C. E. (1985). Comparing strong and weak models by fitting them to computer-generated data. Perception & Psychophysics, 38, 476-481.

Coltheart, M. (1978). Lexical access in simple reading tasks. In G. Underwood (Ed.), Strategies of information processing (pp. 151-216). London: Academic Press.

Coltheart, M., Curtis, B., Atkins, P., & Haller, M. (1993). Models of reading aloud: dual-route approaches and parallel-distributed-processing approaches. Psychological Review, 100, 589-608.

Coltheart, M., Davelaar, E., Jonasson, J. T., & Besner, D. (1977). Access to the internal lexicon. In S. Dornic (Ed.), Attention and Performance VI. London: Academic Press.

Coltheart, M., & Rastle, K. (1994). Serial processing in reading aloud: Evidence for dual-route models of reading. Journal of Experimental Psychology: Human Perception and Performance, 20, 1197-1211.

Crick, F. H. C., & Asanuma, C. (1986). Certain aspects of the anatomy and physiology of the cerebral cortex. In D. E. Rumelhart, J. L. McClelland & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1, pp. 219-232). Cambridge, MA: MIT Press.

Cutting, J. E., Bruno, N., Brady, N. P., & Moore, C. (1992). Selectivity, scope, and simplicity of models: A lesson from fitting judgements of perceived depth. Journal of Experimental Psychology: General, 121, 364-381.

Dagenbach, D., & Carr, T. (1994). Inhibitory processes in perceptual recognition: Evidence for a center-surround attentional mechanism. In D. Dagenbach & T. Carr (Eds.), Inhibitory processes in attention, memory, and language (pp. 327-357). New York: Academic Press.

Dalenoort, G. J. (1990). Towards a general theory of representation. Psychological Research, 52, 229-237.

Dell, G. (1988). The retrieval of phonological forms in production: Tests of predictions from a connectionist model. Journal of Memory and Language, 27, 124-142.

Dennett, D. C. (1987). Three kinds of intentional psychology. In D. C. Dennett (Ed.), The intentional stance. Cambridge, MA: MIT Press.

Dretske, F. I. (1988). Explaining behavior. Reasons in a world of causes. Cambridge, MA: MIT Press.

Eggermont, J. J. (1990). The correlative brain. Berlin: Springer.

Ellis, A. W., & Young, A. W. (1988). Human cognitive neuropsychology. Hillsdale, NJ: Erlbaum.

Estes, W. K. (1975). Some targets for mathematical psychology. Journal of Mathematical Psychology, 12, 263-282.

Estes, W. K. (1988). Toward a framework for combining connectionist and symbol-processing models. Journal of Memory and Language, 27, 196-212.

Estes, W. K. (1991). Cognitive architectures from the standpoint of an experimental psychologist. Annual Review of Psychology, 42, 1-28.

Feldman-Stewart, D., & Mewhort, D. J. K. (1994). Learning in small connectionist networks does not generalize to large networks. Psychological Research, 56, 99-103.

Forster, K. I. (1976). Accessing the mental lexicon. In R.J. Wales & E.W. Walker (Eds.), New approaches to language mechanisms (pp. 257-287). Amsterdam: North-Holland.

Forster, K. I. (1994). Computational modeling and elementary process analysis in visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 20, 1292-1310.

Forster, K. I., & Davis, C. (1984). Repetition priming and frequency attenuation in lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 680-690.

Gardner, M. (1994). Great essays in science. New York: Prometheus Books.

Gazzaniga, M. (1985). The social brain. New York: Basic Books.

Gigerenzer, G., & Murray, D. J. (1987). Cognition as intuitive statistics. Hillsdale, NJ: Erlbaum.

Gleick, J. (1987). Chaos. New York: Penguin Books.

Gluck, M. A., & Bower, G. H. (1988). Evaluating an adaptive network model of human learning. Journal of Memory and Language, 27, 166-195.

Glushko, R. J. (1979). The organization and activation of orthographic knowledge in reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 5, 674-691.

Goschke, T., & Koppelberg, D. (1990). Connectionist representation, semantic compositionality, and the instability of concept structure. Psychological Research, 52, 253-270.

Grainger, J. (1990). Word frequency and neighborhood frequency effects in lexical decision and naming. Journal of Memory and Language, 29, 228-244.

Grainger, J., & Jacobs, A. M. (1993a). Masked partial-word priming in visual word recognition: effects of positional letter frequency. Journal of Experimental Psychology: Human Perception and Performance, 19, 951-964.

Grainger, J., & Jacobs, A. M. (1993b). Modeling neighborhood effects in lexical decision. Paper presented at the 34th Annual Meeting of The Psychonomic Society, Washington, DC.

Grainger, J., & Jacobs, A. M. (1994). A dual read-out model of word context effects in letter perception: Further investigations of the word superiority effect. Journal of Experimental Psychology: Human Perception and Performance, 20, 1311-1334.

Grainger, J., & Jacobs, A. M. (1996). Orthographic processing in visual word recognition: A multiple read-out model. Psychological Review, 103, 518-565.

Grainger, J., O'Regan, J. K., Jacobs, A. M., & Segui, J. (1989). On the role of competing word units in visual word recognition: the neighborhood frequency effect. Perception & Psychophysics, 45, 189-195.

Grainger, J., O'Regan, J. K., Jacobs, A. M., & Segui, J. (1992). Neighborhood frequency effects and letter visibility in visual word recognition. Perception & Psychophysics, 51, 49-56.

Grossberg, S. (1982). Studies in Mind and Brain. Dordrecht, NL: D. Reidel.

Grossberg, S. (1984). Unitization, automaticity, temporal order, and word recognition. Cognition and Brain Theory, 7, 263-283.

Grossberg, S., & Stone, G. O. (1986). Neural dynamics of word recognition and recall: attentional priming, learning, and resonance. Psychological Review, 93, 46-74.

Hatfield, G. (1990). Gibsonian representations and connectionist symbol processing: Projects for unification. Psychological Research, 52, 243-252.

Hawkins, H. L., Reicher, G. M., Rogers, M., & Peterson, L. (1976). Flexible coding in word recognition. Journal of Experimental Psychology: Human Perception and Performance, 2, 380-385.

Heller, D., & Jacobs, A. M. (1993). Zur Rolle von Buchstaben bei der Worterkennung: Bestimmung des Einflusses visueller, lexikalischer und okulomotorischer Faktoren. Forschungsbericht bei der DFG, Az.: He 1192/2-1.

Hempel, C. G. (1965). Aspects of scientific explanation. New York: Free Press.

Henderson, L. (1987). Word recognition: A tutorial review. In M. Coltheart (Ed.), Attention & Performance XII: The psychology of reading (pp. 171-200). Hillsdale, NJ: Erlbaum.

Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of neural computation. Redwood City, CA: Addison Wesley.

Hinton, G. E., & Shallice, T. (1991). Lesioning an attractor network: Investigations of acquired dyslexia. Psychological Review, 98, 74-95.

Humphreys, M. S., Wiles, J., & Dennis, S. (1994). Toward a theory of human memory: Data structures and access processes. Behavioral and Brain Sciences, in press.

Jacobs, A. M. (1994). On computational theories and multilevel, multitask models of cognition: The case of word recognition. Behavioral and Brain Sciences, 17, 670-672.

Jacobs, A. M., & Grainger, J. (1991). Automatic letter priming in an alphabetic decision task. Perception & Psychophysics, 49, 43-52.

Jacobs, A. M., & Grainger, J. (1992). Testing a semi-stochastic variant of the interactive activation model in different word recognition experiments. Journal of Experimental Psychology: Human Perception and Performance, 18, 1174-1188.

Jacobs, A. M., Grainger, J., & Nazir, T. (1994). How much does context matter in letter perception? A parametric analysis of accuracy and latency data. Manuscript submitted for publication.

Jacobs, A. M., Heller, D., & Nazir, T. A. (1992). Möglichkeiten einer experimentellen Dyslexieforschung auf der Basis der aktuellen Lesepsychologie. Schweizerische Zeitschrift für Psychologie, 51, 26-42.

Jared, D., McRae, K., & Seidenberg, M. S. (1990). The basis of consistency effects in word naming. Journal of Memory and Language, 29, 687-715.

Johnston, J. C. (1978). A test of the sophisticated guessing theory of word perception. Cognitive Psychology, 10, 123-153.

Johnston, J. C., & McClelland, J. (1980). Experimental tests of a hierarchical model of word recognition. Journal of Verbal Learning and Verbal Behavior, 19, 503-524.

Kawamoto, A. H. (1993). Nonlinear dynamics in the resolution of lexical ambiguity: A parallel distributed processing account. Journal of Memory and Language, 32, 474-516.

Kawamoto, A. H., Farrar, W. T., & Kello, C. (1994). When two meanings are better than one: Modeling the ambiguity advantage using a recurrent distributed network. Journal of Experimental Psychology: Human Perception and Performance, ????.

Kay, J., & Bishop, D. (1987). Anatomical differences between nose, palm, and foot, or the body in question. In M. Coltheart (Ed.), Attention & Performance XII: The psychology of reading (pp. 449-469). London: Erlbaum.

Kellas, G., Ferraro, F. R., & Simpson, G. B. (1988). Lexical ambiguity and the timecourse of attentional allocation in word recognition. Journal of Experimental Psychology: Human Perception and Performance, 14, 601-609.

Klir, G. J., & Folger, T. A. (1988). Fuzzy sets, uncertainty, and information. New York: Prentice-Hall.

Kosslyn, S. M., & Intriligator, J. M. (1992). Is cognitive neuropsychology plausible? The perils of sitting on a one-legged stool. Journal of Cognitive Neuroscience, 4, 96-106.

Krause, W. (1989). Über menschliches Denken - Denken als Ordnungsbildung. Zeitschrift für Psychologie, 197, 1-30.

Kugler, P. N., Shaw, R. E., Vicente, K. J., & Kinsella-Shaw, J. (1990). Inquiry into intentional systems I: Issues in ecological physics. Psychological Research, 52, 98-121.

Kutas, M., & Van Petten, C. (1988). Event-related brain potential studies of language. Advances in psychophysiology, 3, 139-187.

Lakoff, G. (1987). Women, fire, and dangerous things: What categories reveal about the mind. Chicago, IL: University of Chicago Press.

Luce, R. D. (1959). Individual choice behavior. New York: Wiley.

Lukatela, G., & Turvey, M. T. (1994). Visual lexical access is initially phonological: Evidence from associative priming by words, homophones, and pseudohomophones. Journal of Experimental Psychology: General, in press.

MacKay, D. G. (1988). Under what conditions can theoretical psychology survive and prosper? Integrating the rational and empirical epistemologies. Psychological Review, 95, 559-565.

Marr, D. (1982). Vision. San Francisco: Freeman.

Massaro, D. W. (1987). Speech perception by ear and eye: A paradigm for psychological inquiry. Hillsdale, NJ: Erlbaum.

Massaro, D. W. (1992). Understanding mental processes through modeling: possibilities and limitations. In M. Besson, P. Courrieu, C. Frenck-Mestre, A. M. Jacobs, & J. Pynte (Eds.), Language Perception and Comprehension: Multi-disciplinary approaches (pp. 17-18). Marseille: CNRS-LNC.

Massaro, D. W., & Cohen, M. M. (1991). Integration versus interactive activation: the joint influence of stimulus and context in perception. Cognitive Psychology, 23, 558-614.

Massaro, D. W., & Cowan, N. (1993). Information processing models: Microscopes of the mind. Annual Review of Psychology, 44, 383-425.

Massaro, D. W., & Friedman, D. (1990). Models of integration given multiple sources of information. Psychological Review, 97, 225-252.

Massaro, D. W., Weldon, M. S., & Kitzis, S. N. (1991). Integration of orthographic and semantic information in memory retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 277-287.

McClelland, J. L. (1979). On the time relations of mental processes: an examination of systems of processes in cascade. Psychological Review, 86, 287-330.

McClelland, J. L. (1993). Toward a theory of information processing in graded, random, and interactive networks. In D. E. Meyer & S. Kornblum (Eds.), Attention & Performance XIV: Synergies in Experimental Psychology, Artificial Intelligence, and Cognitive Neuroscience (pp. 655-688). Cambridge, MA: MIT Press.

McClelland, J. L. & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part I. An account of basic findings. Psychological Review, 88, 375-407.

McCloskey, M. (1991). Networks and theories: The place of connectionism in cognitive science. Psychological Science, 2, 387-395.

McNamara, T. P. (1992). Priming and constraints it places on theories of memory and retrieval. Psychological Review, 99, 650-662.

Mewhort, D. J. K. (1990). Alice in wonderland, or psychology among the information sciences. Psychological Research, 52, 158-162.

Mewhort, D. J. K., & Johns, E. E. (1988). Some tests of the interactive activation model for word recognition. Psychological Research, 50, 135-147.

Mewhort, D. J. K., Braun, J. G., & Heathcote, A. (1992). Response time distributions and the Stroop task: A test of the Cohen, Dunbar, and McClelland (1990) model. Journal of Experimental Psychology: Human Perception and Performance, 18, 872-882.

Millis, M. L., & Button, S. B. (1989). The effect of polysemy on lexical decision time: Now you see it, now you don't. Memory & Cognition, 17, 141-147.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Morton, J., & Patterson, K. E. (1980). A new attempt at an interpretation, or, an attempt at a new interpretation. In M. Coltheart, K. E. Patterson, & J. C. Marshall (Eds.), Deep dyslexia (pp. 91-118). London: Routledge and Kegan Paul.

Murre, J. M., Phaf, H. R., & Wolters, G. (1992). CALM: Categorizing and learning module. Neural Networks, 5, 55-82.

Neumann, O. (1990). Lexical access: Some comments on models and metaphors. In D. A. Balota, G. B. Flores d'Arcais, & K. Rayner (Eds.), Comprehension processes in reading. Hillsdale, NJ: Erlbaum.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Nicolis, G., & Prigogine, I. (1989). Exploring complexity. New York: Freeman & Company.

Norris, D. (1994). A quantitative model of reading aloud. Journal of Experimental Psychology: Human Perception and Performance, ?????.

Olson, A., & Caramazza, A. (1991). The role of cognitive theory in neuropsychological research. In F. Boller & J. Grafman (Eds.), Handbook of Neuropsychology (pp. 287-309). Amsterdam: Elsevier.

Paap, K. (1992). Chickens and eggs: data-driven modeling or model-driven experimentation. In M. Besson, P. Courrieu, C. Frenck-Mestre, A. M. Jacobs, & J. Pynte (Eds.), Language Perception and Comprehension: Multi-disciplinary approaches (pp. 21-22). Marseille: CNRS-LNC.

Paap, K., Newsome, S. L., McDonald, J. E., & Schvaneveldt, R. W. (1982). An activation-verification model for letter and word recognition: the word superiority effect. Psychological Review, 89, 573-594.

Paap, K., & Johansen, L. (1994). The case of the vanishing frequency effect: A retest of the verification model. Journal of Experimental Psychology: Human Perception and Performance, ????.

Patterson, K. E., & Morton, J. (1985). From orthography to phonology: An attempt at an old interpretation. In K. E. Patterson, J. C. Marshall & M. Coltheart (Eds.), Surface dyslexia: Neuropsychological and cognitive studies of phonological reading (pp. 15-34). London: Erlbaum.

Penrose, R. (1989). The emperor's new mind: On computers, minds, and the laws of physics. New York: Oxford University Press.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73-193.

Plaut, D. C., & McClelland, J. L. (1993). Generalization with componential attractors: Word and nonword reading in an attractor network. Proceedings of the 15th Annual Conference of the Cognitive Science Society (pp. 824-829). Hillsdale, NJ: Erlbaum.

Popper, K. R. (1935). Logik der Forschung. Vienna: Springer.

Posner, M. I., & Carr, T. H. (1992). Lexical access and the brain: anatomical constraints on cognitive models of word recognition. American Journal of Psychology, 105, 1-26.

Posner, M. I., & McCandliss, B. D. (1993). Converging methods for investigating lexical access. Psychological Science, 4, 305-309.

Pylyshyn, Z. W. (1979). Validating computational models: A critique of Anderson's indeterminacy of representation claim. Psychological Review, 86, 383-394.

Pylyshyn, Z. W. (1989). Computing in cognitive science. In M. I. Posner (Ed.), Foundations of cognitive science (pp. 49-91). Cambridge, MA: MIT Press.

Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59-108.

Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97, 285-308.

Reicher, G. M. (1969). Perceptual recognition as a function of meaningfulness of stimulus material. Journal of Experimental Psychology, 81, 274-280.

Richman, H. B., & Simon, H. A. (1989). Context effects in letter perception: comparison of two theories. Psychological Review, 96, 417-432.

Ritter, H. (1990). Self-organizing maps for internal representations. Psychological Research, 52, 128-136.

Roberts, S., & Sternberg, S. (1993). The meaning of additive reaction-time effects: tests of three alternatives. In D. E. Meyer & S. Kornblum (Eds.), Attention & Performance XIV: Synergies in Experimental Psychology, Artificial Intelligence, and Cognitive Neuroscience (pp. 611-653). Cambridge, MA: MIT Press.

Rosen, R. (1978). Fundamentals of measurement and representation of natural systems. Amsterdam: North-Holland.

Rosenberg, J. F. (1990). Treating connectionism properly: Reflections on Smolensky. Psychological Research, 52, 163-174.

Rozin, P., & Gleitman, L. (1977). The structure and acquisition of reading II: The reading process and the acquisition of the alphabetic principle. In A. Reber & D. Scarborough (Eds.), Toward a psychology of reading (pp. 55-141). Hillsdale, NJ: Erlbaum.

Rueckl, J. G., Cave, K. R., & Kosslyn, S. M. (1989). Why are "What" and "Where" processed by two cortical visual systems? A computational investigation. Journal of Cognitive Neuroscience, 1, 171-186.

Rueckl, J. G., & Kosslyn, S. M. (1992). What good is connectionist modeling? A dialogue. In A. F. Healy, S. M. Kosslyn & R. M. Shiffrin (Eds.), From learning theory to connectionist theory (pp. 249-265). Hillsdale, NJ: Erlbaum.

Rumelhart, D. E. (1989). The architecture of mind: A connectionist approach. In M. I. Posner (Ed.), Foundations of cognitive science (pp. 133-159). Cambridge, MA: MIT Press.

Rumelhart, D. E., & Siple, P. (1974). The process of recognizing tachistoscopically presented words. Psychological Review, 81, 99-118.

Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context effects in letter perception: Part II. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60-94.

Savage-Rumbaugh, E. S. (1986). Ape language. From conditioned response to symbol. New York: Columbia University Press.

Sears, C. R., Hino, Y., & Lupker, S. J. (1994). Neighborhood frequency and neighborhood size effects in word recognition. Journal of Experimental Psychology: Human Perception and Performance, in press.

Seidenberg, M. S. (1988). Cognitive neuropsychology and language: The state of the art. Cognitive Neuropsychology, 5, 403-426.

Seidenberg, M. S. (1993a). Connectionist models and cognitive theory. Psychological Science, 4, 228-235.

Seidenberg, M. S. (1993b). A connectionist modeling approach to word recognition and dyslexia. Psychological Science, 4, 299-304.

Seidenberg, M. S. & McClelland, J. L. (1989). A distributed developmental model of word recognition and naming. Psychological Review, 96, 523-568.

Seidenberg, M. S., Plaut, D. C., Petersen, A. S., McClelland, J. L., & McRae, K. (1994). Nonword pronunciation and models of word recognition. Journal of Experimental Psychology: Human Perception and Performance, ?????.

Seidenberg, M. S., Waters, G. S., Barnes, M. A., & Tanenhaus, M. K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383-404.

Segui, J., & Grainger, J. (1990). Priming word recognition with orthographic neighbors: effects of relative prime-target frequency. Journal of Experimental Psychology: Human Perception and Performance, 16, 65-76.

Simon, H. A. (1969). The sciences of the artificial. Cambridge, MA: MIT Press.

Simpson, G. B., & Kang, H. (1994). Inhibitory processes in the recognition of homograph meanings. In D. Dagenbach & T. Carr (Eds.), Inhibitory processes in attention, memory, and language (pp. 359-381). New York: Academic Press.

Sloman, S. A., & Rumelhart, D. E. (1992). Reducing interference in a distributed memory model. In A. F. Healy, S. M. Kosslyn & R. M. Shiffrin (Eds.), From learning theory to connectionist theory (pp. 227-248). Hillsdale, NJ: Erlbaum.

Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11, 1-74.

Snodgrass, J. G., & Mintzer, M. (1993). Neighborhood effects in visual word recognition: Facilitatory or inhibitory? Memory & Cognition, 21, 247-266.

Spielberg, N., & Anderson, B. D. (1985). Seven ideas that shook the universe. New York: Wiley.

Sternberg, S. (1975). Memory scanning: new findings and current controversies. Quarterly Journal of Experimental Psychology, 27, 1-32.

Stone, G. O., & Van Orden, G. C. (1993). Strategic control of processing in word recognition. Journal of Experimental Psychology: Human Perception and Performance, 19, 744-774.

Stone, G. O., & Van Orden, G. C. (1994). Building a resonance framework for word recognition using design principles and system principles. Journal of Experimental Psychology: Human Perception and Performance, #, ???.

Swinney, D. (1981). The process of language comprehension: An approach to examining issues in cognition and language. Cognition, 10, 307-312.

Taft, M. (1991). Reading and the mental lexicon. Hillsdale, NJ: Erlbaum.

Thagard, P. (1988). Computational philosophy of science. Cambridge, MA: MIT Press.

Treisman, M. (1978). A theory of the identification of complex stimuli with an application to word recognition. Psychological Review, 85, 525-570.

Uttal, W. R. (1990). On some two-way barriers between models and mechanisms. Perception & Psychophysics, 48, 188-203.

Van Orden, G. C., & Goldinger, S. D. (1994). The interdependence of form and function in cognitive systems explains perception of printed words. Journal of Experimental Psychology: Human Perception and Performance, ?????.

Van Orden, G. C., Johnston, J. C., & Hale, B. L. (1988). Word identification in reading proceeds from spelling to sound to meaning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 371-385.

Van Orden, G. C., Pennington, B. F., & Stone, G. O. (1990). Word identification in reading and the promise of subsymbolic psycholinguistics. Psychological Review, 97, 488-522.

Van Leeuwen, C. (1990). Perceptual-learning systems as conservative structures: Is economy an attractor? Psychological Research, 52, 145-152.

Von Wright, G. H. (1971). Explanation and Understanding. New York: Cornell University Press.

Weinberg, S. (1992). Dreams of a final theory. New York: Vintage Books.

FOOTNOTES

NOTE 1: We adopt the terminology proposed by Marr (1982) and call simulation models algorithmic rather than computational, reserving the latter term for the highest-level description in Marr's framework (see Jacobs, 1994).

NOTE 2: Broadbent (1987) gives a striking example of "the inadequacy of verbal models" by demonstrating how a single, very simple algorithmic model can unify four patterns of results from visual and memory search that were believed to require different explanatory mechanisms.

NOTE 3: The merit of the Copernican model of planetary motion was not truth but simplicity. Thanks to the heliocentric model, motions that had previously seemed very complicated became very simple, and although the model fared less well quantitatively than the geocentric model, its simplicity eventually led to Kepler's well-known three laws (Spielberg & Anderson, 1985).

NOTE 4: Falsificationism and strong inference constitute one research strategy among many, and their drawbacks have been discussed elsewhere (e. g. Newell, 1990). However, as Massaro and Cowan (1993) point out, it may still be the best game in town as long as we are concerned with specific assumptions that can be tested at the level of fine-grained analysis (e. g. item analysis). The strong inference strategy is also useful to the extent that we succeed in constructing models that differ in one or a few identifiable properties that play a transparent role in the model's performance, such as the presence/absence of lexical feedback (Grainger & Jacobs, 1993a; this issue; Jacobs & Grainger, 1992).

NOTE 5: The terms auxiliary and ad hoc assumption are used here as synonyms for a hypothesis or algorithm that serves to explain no more effects than the one it was introduced to explain. Note that the inclusion of an ad hoc assumption does not imply that a model is useless. Famous examples from physics teach us otherwise. One is the inverse-square law in Newton's theory of gravitation, which was de facto an arbitrary assumption until it became a genuine law in Einstein's more rigidly formulated theory (Weinberg, 1992). Another nice example is Einstein's heuristic explanation of the photoelectric effect. Einstein called it heuristic because he could not justify it from accepted fundamental principles, but it worked (it was experimentally verified nine years later).

NOTE 6: One reason for this is the set of standard arguments advanced by "connectionists" themselves (e. g. Crick and Asanuma, 1986). These authors list five devices used in connectionist models that, if interpreted literally, are not justified by the available neurobiological evidence. Device one, which is part of the models belonging to the resonance and interactive activation families, violates Dale's law (Murre et al., 1992). Device three, which is part of most of the distributed and resonance models, conflicts with the fact that about 10^4 connections are available to each neuron in the human brain, whereas about 10^11 would be necessary to justify full connectivity (Murre et al., 1992). A second reason is that, with regard to the connectionist models we use in developing a theory of basic processes in reading, we adopt a weak position of "simulationism", as opposed to one of "computational realism" (Lakoff, 1987; Pylyshyn, 1989). That is, although we find that algorithmic/simulation models, whether connectionist or not, can be useful in theory development (Jacobs, 1994), we make no commitment about the mind as being algorithmic in nature (cf. Penrose, 1989).

APPENDIX

The model developed by Plaut and McClelland (1993) in response to the multiple problems of the original model (see Coltheart et al., 1993, for a review of these problems) is a compromise between a quasi-localist and a fully distributed connectionist model. It exhibits several new features relative to the first-generation developmental model of Seidenberg and McClelland (1989). Since these new features are essential for understanding the discussion of the similarities and differences between the dual-route cascaded and the parallel distributed processing model, we describe them in some detail.

Interactivity and componential attractors. Interactivity involves bidirectional connections between all 57 phoneme, 100 hidden, and 108 orthographic units. The old developmental model (Seidenberg & McClelland, 1989) was only weakly interactive in that there was no feedback between the orthographic and phonological units. Componential attractors are attractors that have substructure reflecting common grapheme-phoneme correspondences, and thus can be considered improved or simplified orthographic and phonological representations.

Position-specific grapheme and phoneme units. This is where the new model becomes quasi-localist and quasi-rule-based. Plaut and McClelland (1993) made the relevant relationships between input and output explicit in the network's structure. Thus the network is given a structure that is sensitive to grapheme-phoneme correspondence rules. This evidently helps the network to learn the task more easily and to generalize better (e. g. from words to nonwords). That is, it avoids one of the problems with strong parallel distributed processing models, which may be called the dispersion problem (Plaut & McClelland, 1993; the term "distribution problem" is avoided ?). The fact that information relevant to a particular phoneme in a particular position was distributed over different units led to some of the problems encountered with the original simple feedforward, strong parallel distributed processing network. In the new interactive/partly componential network, each unit represents a particular grapheme or phoneme within one of three positions (onset, nucleus, coda). Thus, we have a compromise here between the fully localist representational scheme, as used in interactive activation type models, and the fully distributed scheme, as used in distributed models.
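
The position-specific coding scheme can be illustrated with a minimal sketch. The grapheme inventories below are hypothetical and heavily reduced (they are not the model's actual 108 orthographic units), and the names ONSET, NUCLEUS, CODA, and encode are our own. The point is only that one unit stands for one grapheme in one of the three positions, so that the same letter in onset and in coda position activates different units.

```python
# Hypothetical, reduced grapheme inventories (illustration only; not the
# actual unit set of the Plaut and McClelland network).
ONSET = ["b", "ch", "m", "s", "t"]
NUCLEUS = ["a", "e", "i", "ea", "oo"]
CODA = ["b", "ck", "m", "n", "t"]

# One unit per (position, grapheme) pair.
UNITS = (
    [("onset", g) for g in ONSET]
    + [("nucleus", g) for g in NUCLEUS]
    + [("coda", g) for g in CODA]
)
INDEX = {unit: i for i, unit in enumerate(UNITS)}


def encode(onset, nucleus, coda):
    """Return a binary vector with one active unit per (position, grapheme)."""
    vector = [0] * len(UNITS)
    slots = (("onset", onset), ("nucleus", nucleus), ("coda", coda))
    for position, graphemes in slots:
        for grapheme in graphemes:
            vector[INDEX[(position, grapheme)]] = 1
    return vector


# "mat" and "tam" share all letters but activate different units,
# because "m" and "t" occupy different positions.
mat = encode(["m"], ["a"], ["t"])
tam = encode(["t"], ["a"], ["m"])
```

In such a scheme, all information about a given grapheme in a given position is carried by a single unit, which is the sense in which the dispersion problem is avoided.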

An augmented supervised learning algorithm. In this variant of back-propagation, a different learning rate is used for each connection. This solves problems that arise because the efficiency of the back-propagation algorithm depends on the phase of the learning period: the best values of the learning rate parameter during the initial phase may not be so good during the terminal phase.
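
The idea of connection-specific learning rates can be sketched as follows. The exact adaptation rule is not specified in the description above, so the sign-agreement heuristic used here is an assumption: a connection's rate grows by a small constant while successive gradients agree in sign and is cut multiplicatively when they disagree. The function train, its parameters, and the two-weight quadratic error surface are all hypothetical choices made for illustration.

```python
def train(weights, grad_fn, steps=50, rate0=0.08, up=0.01, down=0.5):
    """Gradient descent with one adaptive learning rate per weight.

    Assumed heuristic (not taken from the source): raise a weight's rate
    additively when its gradient keeps its sign, halve it when the sign flips.
    """
    weights = list(weights)
    rates = [rate0] * len(weights)
    previous = [0.0] * len(weights)
    for _ in range(steps):
        grads = grad_fn(weights)
        for i, g in enumerate(grads):
            agreement = g * previous[i]
            if agreement > 0:
                rates[i] += up
            elif agreement < 0:
                rates[i] *= down
            weights[i] -= rates[i] * g
        previous = grads
    return weights, rates


# Toy error surface E(w) = sum(c_i * w_i ** 2) with very different curvatures,
# so that no single global learning rate suits both weights.
CURVATURE = [0.1, 10.0]


def gradient(w):
    return [2.0 * c * wi for c, wi in zip(CURVATURE, w)]


final_weights, final_rates = train([1.0, 1.0], gradient)
```

On this toy surface the weight along the flat direction ends up with a much larger rate than the weight along the steep direction, which is the kind of phase- and connection-dependent adjustment that a single global rate cannot provide.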

Additional "exception" units. Three exception units were included in the network to handle the exceptional cases where affricates (/ps/, /ks/, /ts/) do not follow the phonotactic rules/constraints for monosyllabic English words. During the training procedure, involving 2897 + 101 monosyllabic words, whenever these affricate units were activated, their component parts were also activated (by the experimenters; Plaut & McClelland, 1993).
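
The training convention for the exception units can be sketched in a few lines. The mapping AFFRICATE_PARTS and the function target_units are hypothetical names, and phoneme units are represented simply as strings; the sketch assumes only what is stated above, namely that an active affricate unit brings its component units along in the training target.

```python
# The three exception units and their component phonemes, as described above.
AFFRICATE_PARTS = {
    "ps": ("p", "s"),
    "ks": ("k", "s"),
    "ts": ("t", "s"),
}


def target_units(active):
    """Return the training target: active units plus affricate components."""
    result = set(active)
    for unit in active:
        result.update(AFFRICATE_PARTS.get(unit, ()))
    return result


with_affricate = target_units({"b", "o", "ks"})
without_affricate = target_units({"m", "a", "t"})
```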

Grapheme-phoneme correspondence training. Because "children typically are taught them in the course of learning to read", the training corpus also included patterns consisting of each grapheme in isolation and the corresponding phoneme(s) (Seidenberg et al., this volume). It is useful to note that this stands in sharp contrast with widely accepted theoretical analyses suggesting that children do not and cannot learn grapheme-phoneme correspondences this way (Rozin & Gleitman, 1977).