School of Psychology, Birkbeck College
LEARNING IN NEURAL NETWORKS/ CONNECTIONISM.
For other Notes see the Easter Handout for Summer Term Lectures
Thorndike called himself a connectionist — is this just a co-incidence, or can comparisons be
made between modern accounts of neural networks and previous theories of animal learning?
No 9 on the March 15th list
(NB “Parallel Distributed Processing”, “PDP”, “Connectionism”, “Neo-connectionism”, “New
connectionism”, “Neural Networks” and “Neural Network Simulations” can be used almost
synonymously. The terms refer to theories about, and demonstrations of, the effects of training systems in
which large numbers of simple processing units interact only via positive or negative connections.)
[page 1 of wk 12 handout]Very basic points
[page 1 of wk 12 handout]Further notes
The theme of the earlier draft notes is similarities and differences between recent connectionist theories and associative theories of animal learning. Before getting on to this we should consider the main thrust of new connectionist theories, which is to give accounts of specifically human cognitive processing (using Rumelhart and McClelland, 1986a, on past-tense learning, as an example).
“Connectionism is ‘in’. Not since the Dark Ages of the pre-Chomskyan era have we seen so much interest in associationist models of human thinking. Streaming forth from their banishment in the Skinnerian dungeons are dozens of detailed computational models based on the new language of networks, nodes, and connections.” (from MacWhinney, B. and Leinbach, J., 1991) This is from a paper on simulations of past-tense learning, which was one of the topics in the 1986 two-volume work which attracted the most intense criticism (Rumelhart and McClelland, 1986a). Without going into any detail, it is possible to see from the claims and stated goals of this 1986 chapter why it attracted, and still attracts, so much attention.
[bottom of page 1 of wk 12 handout]1. The fact that the acquisition of English as a first language includes a stage at which children make errors by supplying regular past tense endings for irregular verbs they had initially used correctly (e.g. “goed”, “comed”, or “camed”), and can generate a regular past tense for an invented word, had been used to support the view that children make use of explicit inaccessible rules, which they discover through the use of a special-purpose, innately given language acquisition device.
2. Rumelhart and McClelland (1986a) ended up by directly challenging this for the past tense in particular and all other language processing more generally:
“We have shown that a reasonable account of the acquisition of the past tense can be provided without recourse to the notion of a ‘rule’ as anything more than a description of the language..... The child need not figure out what the rules are, or even that there are rules.”
“We view this work .... as a step toward a revised understanding of language knowledge, language acquisition, and linguistic information processing in general.”
[page 2 of handout]
3. Many of the details used by Rumelhart and McClelland (1986a) are not relevant to this overall conclusion since they have been changed in subsequent simulations (e.g. MacWhinney and Leinbach, 1991; Plunkett and Marchman, 1991; Plunkett and Juola, 1999; Joanisse & Seidenberg, 1999, 2005).
Basic points in Rumelhart and McClelland (1986a) are:
[top of page 3 of WEEK 12 handout]Criticisms of Connectionist claims
There have been many detailed and lengthy attacks on the claims made by Rumelhart et al (1986) and others subsequently (e.g. Fodor and Pylyshyn, 1988; Pinker & Ullman, 2002, 2003). For present purposes they can be condensed to the two points examined by Kaplan et al (1992):
1. “Connectionism is merely a naive, computerized revival of behaviourism.”
2. “Connectionist models are fundamentally associationist in nature, and this severely limits their cognitive potential.” (pp 91-2)
Quinlan (1991) has a short section on New connectionism and human reasoning (pp 262-5) in which he reviews the criticism that connectionist networks cannot exhibit the systematicity which is characteristic of the human understanding of sentences, and some forms of animal cognition (Fodor and Pylyshyn, 1988). The term “systematicity” is related to the concept of rule-learning, and Quinlan uses the example of the difference in rule-learning ability that apparently exists between corvids and pigeons, as discussed by Mackintosh (1988), who concluded that “associations alone do not generate rules.”
Thus, stimulus-response theories of animal learning (Thorndike, 1898; Hull, 1943; Spence, 1937) and direct input-output neural network models, have been subjected to the same kind of criticism, that they do not capture cognitive processes such as abstraction, rule-following and the use of cognitive maps.
The criticism is particularly acute for the case of human language:
(Quinlan, 1991; p. 193: my italics)
[bottom of page 3 of handout]
[top of page 4 of handout]
Kinds of Learning in Connectionist Models
[top of page 4 of handout]Unsupervised learning
Typical examples use Hebbian rules of association by contiguity, and are able to capture regularities in repeated inputs. At the behavioural level an example is habituation, where a certain stimulus is repeated and comes to be recognized: there is no external feedback for a “right” or a “wrong”. The connectionist equivalent is an “auto-associative network”. In these the same pattern is presented at both the input and the output stages of a pattern associator (Quinlan, p.52; see overhead), and eventually the network can complete the pattern if only a partial input is given.
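Pattern completion in an auto-associative network can be illustrated with a very small simulation. The following is only a sketch under stated assumptions, not a model taken from Quinlan or from the handout: units are coded +1 (on) and -1 (off), a missing input is coded 0, and the function names `train_autoassociator` and `complete` are made up for the illustration.

```python
def train_autoassociator(patterns):
    """Hebbian learning: the weight between two units grows when they agree."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no unit is connected to itself
                    weights[i][j] += p[i] * p[j]
    return weights


def complete(weights, partial):
    """One update step: each unit takes the sign of its summed weighted input.
    A unit whose summed input is exactly zero keeps its current value."""
    result = []
    for i, row in enumerate(weights):
        net = sum(w * x for w, x in zip(row, partial))
        if net > 0:
            result.append(1)
        elif net < 0:
            result.append(-1)
        else:
            result.append(partial[i])
    return result


# Assumed coding: +1 = unit on, -1 = unit off, 0 = input missing.
stored = [1, -1, 1, -1, 1, 1, -1, -1]
weights = train_autoassociator([stored])

cue = [1, -1, 1, -1, 0, 0, 0, 0]  # only the first half of the pattern is given
recalled = complete(weights, cue)
print(recalled == stored)
```

Nothing in the training step says whether an output was right or wrong: the weights change only because units were active together, which is the sense in which the learning is unsupervised.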
Simple kinds of Pavlovian conditioning can also be regarded as unsupervised learning:
the connectionist equivalent is when a pattern associator is given pairs of different patterns at the
“input” and “output” stages, and can subsequently reproduce the output pattern when given just
the input. This is unsupervised in the sense of lacking external feedback for right or wrong
responses (it is not sensitive to goals).
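The same Hebbian rule can pair two different patterns. The sketch below is an assumption-laden illustration rather than anything given in the sources: the stimulus and response names (`tone`, `light`, `salivate`, `blink`) are hypothetical labels, units are coded +1 and -1, and the two input patterns are deliberately chosen so that they do not overlap (they are orthogonal), which is the easy case for a Hebbian pattern associator.

```python
def train_pattern_associator(pairs, n_in, n_out):
    """Hebbian rule: weight from input unit i to output unit o grows
    when the two are active together in a training pair."""
    weights = [[0.0] * n_in for _ in range(n_out)]
    for input_pattern, output_pattern in pairs:
        for o in range(n_out):
            for i in range(n_in):
                weights[o][i] += output_pattern[o] * input_pattern[i]
    return weights


def recall(weights, input_pattern):
    """Present the input alone and read off the sign of each output unit."""
    return [1 if sum(w * x for w, x in zip(row, input_pattern)) > 0 else -1
            for row in weights]


# Hypothetical stimulus and response patterns (+1 = on, -1 = off).
tone = [1, 1, -1, -1]
light = [1, -1, 1, -1]
salivate = [1, -1, 1]
blink = [-1, 1, 1]

weights = train_pattern_associator([(tone, salivate), (light, blink)],
                                   n_in=4, n_out=3)
print(recall(weights, tone) == salivate)
print(recall(weights, light) == blink)
```

With overlapping input patterns a pure Hebbian associator starts to confuse the stored pairs, which is one reason error-correcting rules are of interest.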
[middle of page 4 of handout]Supervised learning
A. The “Delta Rule” (Lieberman, 2000; pp. 522-523: Quinlan, 1991; pp. 55-
Supervised learning involves methods of changing the strength (or
weight) of connections between input and output units that are more complicated than
the Hebbian rule of contiguity (or “co-activation”: when two units are active at the same time,
the weight of the connection between them is increased).
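As a rough illustration of how the delta rule differs from the Hebbian rule, the sketch below changes each weight in proportion to the error, that is, by rate × (target − output) × input, so that nothing changes once the output is already correct. The task, the learning rate, the number of passes and the use of a single linear output unit with an always-on bias input are all assumptions made for the example, not details from Lieberman or Quinlan.

```python
def delta_rule_train(patterns, n_inputs, rate=0.1, epochs=300):
    """Train one linear output unit with the delta rule."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, target in patterns:
            output = sum(w * x for w, x in zip(weights, inputs))
            error = target - output          # the "delta": desired minus actual
            for i, x in enumerate(inputs):
                weights[i] += rate * error * x
    return weights


# Assumed task: respond (1) whenever the first input is on, ignore the second.
# The third input is a bias unit that is always on.
patterns = [
    ([1, 0, 1], 1),
    ([0, 1, 1], 0),
    ([1, 1, 1], 1),
    ([0, 0, 1], 0),
]
weights = delta_rule_train(patterns, n_inputs=3)
print([round(w, 3) for w in weights])
```

In this example the weight from the irrelevant second input ends up close to zero even though that input is active on half of the reinforced trials, something a pure co-activation rule would not achieve.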
[bottom of page 4 of handout]
B. Back-propagation (Quinlan, 1991; pp. 56-
For present purposes the back-propagation method can be regarded as an elaboration of the delta rule for the purpose of supervising learning in “multi-layer” nets, where there is at least one layer of “hidden units” which intervene between the input and output units. The important points for comparison with ideas derived from studies of animal (or human) learning are:
1. Back-propagation is very widely used in connectionist modelling.
2. For connectionist modellers, back-propagation in multi-layer
nets is good since it can do things not possible for the delta-rule with direct input-output.
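A standard illustration of this last point, which is not taken from the handout itself, is the exclusive-or (XOR) problem: respond when one input is on but not when both are. The sketch below first trains a direct input-output unit with the delta rule, which must get at least one of the four cases wrong, and then trains a net with two hidden units by back-propagation. The network size, learning rate, random seed and number of passes are arbitrary assumptions, and whether a net of this kind reaches a complete solution depends on those choices, so the example only reports how the error changes.

```python
import math
import random

XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def single_layer_errors(patterns, rate=0.1, epochs=500):
    """Delta rule with direct input-output connections (plus a bias).
    Returns how many patterns fall on the wrong side of 0.5."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for inputs, target in patterns:
            x = list(inputs) + [1.0]
            out = sum(wi * xi for wi, xi in zip(w, x))
            for i in range(3):
                w[i] += rate * (target - out) * x[i]
    wrong = 0
    for inputs, target in patterns:
        out = sum(wi * xi for wi, xi in zip(w, list(inputs) + [1.0]))
        wrong += int((out > 0.5) != bool(target))
    return wrong


class TwoLayerNet:
    """Inputs -> sigmoid hidden units -> one sigmoid output unit."""

    def __init__(self, n_in=2, n_hidden=2, seed=0):
        rng = random.Random(seed)
        # The last weight in each list is a bias weight.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, inputs):
        x = list(inputs) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(ws, x))) for ws in self.w_hidden]
        h.append(1.0)
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, h)))
        return x, h, out

    def total_error(self, patterns):
        return sum(0.5 * (t - self.forward(i)[2]) ** 2 for i, t in patterns)

    def train_epoch(self, patterns, rate=0.5):
        """One pass of batch gradient descent using back-propagated errors."""
        grad_out = [0.0] * len(self.w_out)
        grad_hidden = [[0.0] * len(ws) for ws in self.w_hidden]
        for inputs, target in patterns:
            x, h, out = self.forward(inputs)
            delta_out = (target - out) * out * (1.0 - out)
            for j in range(len(self.w_out)):
                grad_out[j] += delta_out * h[j]
            for j, ws in enumerate(self.w_hidden):
                # The output error is passed back through the output weight.
                delta_h = delta_out * self.w_out[j] * h[j] * (1.0 - h[j])
                for i in range(len(ws)):
                    grad_hidden[j][i] += delta_h * x[i]
        for j in range(len(self.w_out)):
            self.w_out[j] += rate * grad_out[j]
        for j, ws in enumerate(self.w_hidden):
            for i in range(len(ws)):
                ws[i] += rate * grad_hidden[j][i]


direct_wrong = single_layer_errors(XOR)
net = TwoLayerNet()
error_before = net.total_error(XOR)
for _ in range(5000):
    net.train_epoch(XOR)
error_after = net.total_error(XOR)
print(direct_wrong, error_before, error_after)
```

The hidden units receive no target of their own: their weights change only because the output unit's error is passed back to them, which is the step that the simple delta rule lacks.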
[bottom of page 5 of wk 12 handout]Reinforcement learning
In these procedures external feedback is only given globally, to distinguish “right” from “wrong” outputs.
Hinton (1989) notes that there is a large literature on this topic “beyond the scope of this paper”. I.e. not much use was being made of reinforcement procedures in connectionist simulations of learning. There is a technical problem called “credit assignment”: if a reasonably large network produces a correct output, which local connections are responsible? This is potentially solvable, and it is not clear that reinforcement methods could not in principle be made more use of.
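The difference from supervised learning can be made concrete with a small sketch in which the network is told only whether its whole output was right, never what each output unit should have done. This is an illustrative assumption, not a procedure described by Hinton (1989): the output units respond at random with a probability set by their weights, and after a rewarded trial every unit makes whatever it just did more likely (a simple reward-modulated rule of the general kind used in REINFORCE-style algorithms). The task, learning rate, seed and function names are invented for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def firing_probabilities(weights, inputs):
    """Probability that each output unit turns on for this input."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def train_with_global_reward(tasks, n_inputs, n_outputs,
                             rate=0.5, trials=2000, seed=1):
    rng = random.Random(seed)
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]
    for _ in range(trials):
        inputs, correct = rng.choice(tasks)
        probs = firing_probabilities(weights, inputs)
        actions = [1 if rng.random() < p else 0 for p in probs]
        # One global signal for the whole output: right (1) or wrong (0).
        reward = 1.0 if actions == correct else 0.0
        for u, row in enumerate(weights):
            for i, x in enumerate(inputs):
                # Every unit shares the same reward; after a reward, each unit
                # shifts towards the action it happened to produce.
                row[i] += rate * reward * (actions[u] - probs[u]) * x
    return weights


# Assumed task: stimulus A requires output (on, off), stimulus B (off, on).
stimulus_a, stimulus_b = [1, 0], [0, 1]
tasks = [(stimulus_a, [1, 0]), (stimulus_b, [0, 1])]

weights = train_with_global_reward(tasks, n_inputs=2, n_outputs=2)
probs_a = firing_probabilities(weights, stimulus_a)
probs_b = firing_probabilities(weights, stimulus_b)
print(probs_a, probs_b)
```

With only two output units the shared reward is informative enough for each unit to settle on its part of the response; the credit assignment problem becomes serious as the number of units sharing a single reward signal grows.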
The fact that more is made of the effects of reward and punishment in analyses of biologically “real” learning may be related to the involvement of motivational factors, which are not mimicked (so far) in neural network research. However, there is currently some interest in reinforcement learning in areas such as robotics (Dean, 1998; Coleman et al., 2005 - abstract) and there was a special issue of the journal Machine Learning devoted to "Reinforcement Learning" (Kaelbling, 1996). Reinforcement learning may be used for practical purposes (Bingham, 2001; Franklin, 2006), or for simulating biologically realistic reward-related behaviours (Berthier et al, 2005; Hazy et al., 2006; Hampton & O'Doherty, 2007).
Main Sources — Animal Learning and Learning in Connectionist (Neural Network) Simulations (Week 12)
Lieberman, D. (1990/1993/2000) Learning: Behavior and Cognition. Belmont: Wadsworth. ("The Neural Network Solution": pp. 439-455 /1993 edition pp. 511-525; /2000, pp. 517-532)
Quinlan, P. (1991) Connectionism and Psychology. Harvester Wheatsheaf, Hemel Hempstead. Chapter 2 “Memory and Learning in Neural Networks” esp pp.51-56, pp.69-71, and pp. 262-266. [152 QUI & AKCHN(Qui).]
Walker, S.F. (1990/1992) A brief history of connectionism and its psychological implications. AI & Society 4, 17-38. (TIED XEROX/SLC) Or Walker, S.F. (1992) A brief history of connectionism and its psychological implications. In Clark, A. and Lutz, R. (eds) Connectionism in Context. Berlin: Springer-Verlag. 123-144. (BK library AKCHN [Cla] )
References (Not normally required for further reading)
Albright, A., & Hayes, B. (2003). Rules vs. analogy in English past tenses: a computational/experimental study. Cognition, 90(2), 119-161.
Baxt, W. G., Shofer, F. S., Sites, F. D., & Hollander, J. E. (2002). A neural network aid for the early diagnosis of cardiac ischemia in patients presenting to the emergency department with chest pain. Annals of Emergency Medicine, 40(6), 575-583.
Bechtel, W. and Abrahamsen, A. (1991) Connectionism and the Mind: An Introduction to Parallel Processing in Networks. Oxford, Basil Blackwell. (AKCHN).
Berthier, N. E., Rosenstein, M. T., & Barto, A. G. (2005). Approximate optimal control as a model for motor learning. Psychological Review, 112(2), 329-346.
Bingham, E. (2001). Reinforcement learning in neurofuzzy traffic signal control. European Journal of Operational Research, 131(2), 232-241.
Christiansen, M.H. and Chater, N. (1999) Connectionist natural language processing: The state of the art. Cognitive Science, 23, 417-437.
Christiansen, M. H., Chater, N., & Seidenberg, M. S. (1999). Special issue - Connectionist models of human language processing: Progress and prospects. Cognitive Science, 23(4), 415-415.
Coleman, S. L., Brown, V. R., Levine, D. S., & Mellgren, R. L. (2005). A neural network model of foraging decisions made under predation risk. Cognitive Affective & Behavioral Neuroscience, 5(4), 434-451.
Colunga, E., & Smith, L. B. (2005). From the lexicon to expectations about kinds: A role for associative learning. Psychological Review, 112(2), 347-382.
Desai, R., Conant, L. L., Waldron, E., & Binder, J. R. (2006). FMRI of past tense processing: The effects of phonological complexity and task difficulty. Journal of Cognitive Neuroscience, 18(2), 278-297.
Elman, J. L. (2005). Connectionist models of cognitive development: where next? Trends in Cognitive Sciences, 9(3), 111-117.
Elman, JL, Bates, EA, Johnson, MH, Karmiloff-Smith A, Parisi, D. & Plunkett K. (1996) Rethinking Innateness: A connectionist perspective on development. London: MIT Press. (155.7 ELM in Bk Library).
Fodor, J. & Pylyshyn, Z.W. (1988) Connectionism and cognitive architecture: a critical analysis. Cognition, 28, 3-71.
Franklin, J. A. (2006). Jazz melody generation using recurrent networks and reinforcement learning. International Journal on Artificial Intelligence Tools, 15(4), 623-650.
Garlick, D. (2002). Understanding the nature of the general factor of intelligence: The role of individual differences in neural plasticity as an explanatory mechanism. Psychological Review, 109(1), 116-136.
Gurney, K. (2007). Neural networks for perceptual processing: from simulation tools to theories. Philosophical Transactions of the Royal Society B-Biological Sciences, 362(1479), 339-353.
Hampton, A. N., & O'Doherty, J. P. (2007). Decoding the neural substrates of reward-related decision making with functional MRI. Proceedings of the National Academy of Sciences of the United States of America, 104(4), 1377-1382.
Harm, M. W., & Seidenberg, M. S. (1999). Phonology, reading acquisition, and dyslexia: Insights from connectionist models. Psychological Review, 106(3), 491-528.
Hartshorne, J. K., & Ullman, M. T. (2006). Why girls say 'holded' more than boys. Developmental Science, 9(1), 21-32.
Hazy, T. E., Frank, M. J., & O'Reilly, R. C. (2006). Banishing the homunculus: Making working memory work. Neuroscience, 139(1), 105-118.
Herd, S. A., Banich, M. T., & O'Reilly, R. C. (2006). Neural mechanisms of cognitive control: an integrative model of stroop task performance and fMRI data. Journal of Cognitive Neuroscience, 18(1), 22-32.
Hinton, G.E. (1989) Connectionist learning procedures. Artificial Intelligence, 40, 185-234.
Hutzler, F., Ziegler, J. C., Perry, C., Wimmer, H., & Zorzi, M. (2004). Do current connectionist learning models account for reading development in different languages? Cognition, 91(3), 273-296.
Joanisse, M. F. (2004). Specific language impairments in children - Phonology, semantics, and the English past tense. Current Directions in Psychological Science, 13(4), 156-160.
Joanisse, M. F., & Seidenberg, M. S. (1999). Impairments in verb morphology after brain injury: A connectionist model. Proceedings of the National Academy of Sciences of the United States of America, 96(13), 7592-7597.
Joanisse, M. F., & Seidenberg, M. S. (2005). Imaging the past: Neural activation in frontal and temporal regions during regular and irregular past-tense processing. Cognitive Affective & Behavioral Neuroscience, 5(3), 282-296.
Kaelbling, LP (1996) Special issue on reinforcement learning - introduction. Machine Learning, Vol.22, No.1-3, Pp.7-9
Kandel, E. R. (2001). Neuroscience - The molecular biology of memory storage: A dialogue between genes and synapses. Science, 294(5544), 1030-1038.
Kaplan, S. Weaver, M. and French, R.M. (1992) Active symbols and internal models: Towards a cognitive connectionism. In Clark, A. and Lutz, R. (eds) Connectionism in Context. Berlin: Springer-Verlag. 91-110. (TIED XEROX)
Kemp, N., & Bryant, P. (2003). Do beez buzz? Rule-based and frequency-based knowledge in learning to spell plural -s. Child Development, 74(1), 63-74.
Mackintosh, N.J. (1988) Approaches to the study of animal intelligence. British Journal of Psychology, 79, 509-25.
Mareschal, D., & Johnson, S. P. (2002). Learning to perceive object unity: a connectionist account. Developmental Science, 5(2), 151-172.
Marshall, C. R., & van der Lely, H. K. J. (2006). A challenge to current models of past tense inflection: The impact of phonotactics. Cognition, 100(2), 302-320.
Marslen-Wilson, W and Tyler, LK (1997) Dissociating types of mental computation. Nature, Vol.387, No.6633, Pp.592-594.
Marslen-Wilson, W and Tyler, LK (1998) Rules, representations, and the English past tense. Trends in Cognitive Sciences, Vol.2, No.11, Pp.428-435.
Marslen-Wilson, W. D., & Tyler, L. K. (2003). Capturing underlying differentiation in the human language system. Trends in Cognitive Sciences, 7(2), 62-63.
Marslen-Wilson, W., & Tyler, L. (2007). Morphology, language and the brain: the decompositional substrate for language comprehension. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1481), 823-836.
McClelland, J. L., & Patterson, K. (2002). Rules or connections in past-tense inflections: what does the evidence rule out? Trends in Cognitive Sciences, 6(11), 465-472.
MacWhinney, B. and Leinbach, J. (1991) Implementations are not conceptualizations: Revising the verb learning model. Cognition, 40, 121-157.
Monaghan, P., & Shillcock, R. (2004). Hemispheric asymmetries in cognitive modeling: Connectionist modeling of unilateral visual neglect. Psychological Review, 111(2), 283-308.
Monaghan, P., & Shillcock, R. (2007). Levels of description in consonant/vowel processing: Reply to Knobel and Caramazza. Brain and Language, 100(1), 101-108.
Newman, A. J., Ullman, M. T., Pancheva, R., Waligura, D. L., & Neville, H. J. (2007). An ERP study of regular and irregular English past tense inflection. Neuroimage, 34(1), 435-445.
Nicoladis, E., Palmer, A., & Marentette, P. (2007). The role of type and token frequency in using past tense morphemes correctly. Developmental Science, 10(2), 237-254.
Penke, M., & Westermann, G. (2006). Broca's area and inflectional morphology: Evidence from Broca's aphasia and computer modeling. Cortex, 42(4), 563-576.
Pinker, S. and Bloom, P. (1990) Natural language and natural selection. Behavioral and Brain Sciences, 13, 707-784.
Pinker, S., & Ullman, M. T. (2002). The past and future of the past tense. Trends in Cognitive Sciences, 6(11), 456-463.
Plunkett, K., & Bandelow, S. (2006). Stochastic approaches to understanding dissociations in inflectional morphology. Brain and Language, 98(2), 194-209.
Plunkett, K., & Juola, P. (1999). A connectionist model of English past tense and plural morphology. Cognitive Science, 23(4), 463-490.
Queller, S., & Smith, E. R. (2002). Subtyping versus bookkeeping in stereotype learning and change: Connectionist simulations and empirical findings. Journal of Personality and Social Psychology, 82(3), 300-313.
Ralph, M. A. L., Braber, N., McClelland, J. L., & Patterson, K. (2005). What underlies the neuropsychological pattern of irregular > regular past-tense verb production? Brain and Language, 93(1), 106-119.
Read, S. J., & Urada, D. I. (2003). A neural network simulation of the outgroup homogeneity effect. Personality and Social Psychology Review, 7(2), 146-169.
Rumelhart, D.E. and McClelland, J.L. (1986a) On learning the past tenses of English verbs. In McClelland, J.L. and Rumelhart, D.E (eds) Parallel Distributed Processing. Volume 2. Psychological and Biological Models. London: MIT Press, 216-271.
Rumelhart, D.E. and McClelland, J.L. (1986b) PDP Models and General Issues in Cognitive Science. In Rumelhart, D.E. and McClelland, J.L. (eds) Parallel Distributed Processing. Volume 1. Foundations. London: MIT Press, 110-46
Sutton, R.S. and Barto, A.G. (1981) Toward a modern theory of adaptive networks: expectation and prediction. Psychological Review, 88, 135-171.
Thomas, M. S. C., & Karmiloff-Smith, A. (2003). Modeling language acquisition in atypical phenotypes. Psychological Review, 110(4), 647-682.
Thomas, M., & Karmiloff-Smith, A. (2002). Are developmental disorders like cases of adult brain damage? Implications from connectionist modelling. Behavioral and Brain Sciences, 25(6), 727-+.
Thorndike, E.L. (1905/1919) The Elements of Psychology. New York, A.G. Seiler. [in 'Early Texts' at Senate House]
Tikkala, A. (2000). A connectionist word production tool for Finnish nouns with a model for vowel harmony restrictions. Computer Speech and Language, 14(1), 1-13.
Ullman, M. T., Pancheva, R., Love, T., Yee, E., Swinney, D., & Hickok, G. (2005). Neural correlates of lexicon and grammar: Evidence from the production, reading, and judgment of inflection in aphasia. Brain and Language, 93(2), 185-238.
Van Overwalle, F., & Jordens, K. (2002). An adaptive connectionist model of cognitive dissonance. Personality and Social Psychology Review, 6(3), 204-231.
Walker, S.F. (1992) A brief history of connectionism and its psychological implications. In Clark, A. and Lutz, R. (eds) Connectionism in Context. Berlin: Springer-Verlag. 123-144.
Westermann, G., Mareschal, D., Johnson, M. H., Sirois, S., Spratling, M. W., & Thomas, M. S. C. (2007). Neuroconstructivism. Developmental Science, 10(1), 75-83.
Westermann, G., Sirois, S., Shultz, T. R., & Mareschal, D. (2006). Modeling developmental cognitive neuroscience. Trends in Cognitive Sciences, 10(5), 227-232.
White, R. L., & Snyder, L. H. (2007). Spatial constancy and the brain: insights from neural networks. Philosophical Transactions of the Royal Society B-Biological Sciences, 362(1479), 375-382.