[ page 9 of handout — for a very brief account, see Lieberman, 2000, p. 528]

Rumelhart, D.E. and McClelland, J.L. (1986b). On learning the past tenses of English verbs. In McClelland, J.L. and Rumelhart, D.E. (eds). Parallel Distributed Processing. Volume 2. Psychological and Biological Models. MIT Press, London, 216-271.

Do we have explicit inaccessible rules? Is there a Language Acquisition Device (LAD) to discover such rules?

The conventional answer is yes:
- the mechanism derives such rules
- hypotheses are rejected and replaced to account for evidence
- the LAD has innate knowledge of the possible range of languages and thus only considers hypotheses within the constraints imposed by linguistic universals.

The test case is children’s acquisition of regular and irregular English past-tense verbs.

regular = wiped and pulled
irregular = came, went (goed), gave, got

There is an intermediate stage in which children produce goed and comed; in Stage 3 both regular and irregular forms are OK

– therefore “U-shaped learning”

Acquisition is actually quite gradual

Goal was to simulate the stages with a simple connectionist model

[end of page 9 of handout]

MODEL (See overhead.)
Basic part is the pattern associator. A decoding network converts a featural code to a phonological representation. All learning is in the associator; the decoding network only converts near misses into legitimate phonological representations.

Test trials
Given the phoneme string corresponding to the root of a word, the system computes the net input. It applies a linear threshold operation probabilistically, and therefore learns slowly. At high temperatures the response is highly variable.
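The probabilistic threshold can be sketched as follows. This is only an illustration: the notes do not give the formula, so the logistic form, the function names p_on and respond, and the parameter names are all assumptions.

```python
import math
import random

def p_on(net_input, threshold, temperature):
    """Probability that an output unit turns on.

    Assumed form: a logistic function of (net input - threshold) / temperature.
    """
    z = (net_input - threshold) / temperature
    if z < -700:  # guard: math.exp would overflow for very negative z
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))

def respond(inputs, weights, threshold, temperature, rng=random):
    """Sample a binary response (0 or 1) for one output unit."""
    net_input = sum(w * x for w, x in zip(weights, inputs))
    return 1 if rng.random() < p_on(net_input, threshold, temperature) else 0
```

With a low temperature the unit behaves almost like a hard threshold; with a high temperature p_on stays near 0.5 whatever the net input, which is why responses become highly variable.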

On learning trials the model is given both the root and the target output. The model compares its own answer with the target, and connection strengths are then adjusted using a discrete variant of the Delta rule. E.g. if the output is 0 and it should be 1, we increase the weights from all the active input units by a small amount N. At the same time, the threshold is reduced by N. The opposite is done if the output is 1 and should be 0. (In their simulation, N was always 1.)
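A minimal sketch of this update for a single output unit, assuming binary (0/1) input units. The function name delta_update and its argument names are made up for illustration.

```python
def delta_update(weights, threshold, inputs, output, target, n=1):
    """One learning step for a single output unit (discrete delta-style rule).

    inputs are 0/1 activations, so only weights from active units change.
    Returns the new (weights, threshold) without modifying the arguments.
    """
    if output == target:
        return list(weights), threshold  # correct answer: nothing changes
    step = n if target == 1 else -n      # +N if it should have been on, -N if off
    new_weights = [w + step * x for w, x in zip(weights, inputs)]
    new_threshold = threshold - step     # threshold moves the opposite way
    return new_weights, new_threshold

# Output was 0 but should be 1: active weights go up by N, threshold down by N.
print(delta_update([0, 0, 0], 0, [1, 0, 1], output=0, target=1))
```

The weight from the inactive middle unit is left alone, which is the sense in which only "active input units" are adjusted.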

All the values were chosen so that at one stage the system has more regular than irregular examples, and temporarily overregularizes exceptions that it may have learned previously. (This gives U-shaped learning, but by artificially structuring the input.)

They used a coarse, cut-down version of the “Wickelphone” system, based on triples of features: one from the central phoneme, one from its predecessor and one from its successor. Each triple was a “Wickelfeature”, and they used 460 of a possible 1,210 (11 x 10 x 11). They blurred the representation by turning on some additional features at random (to promote generalization). “All words, no matter how many phonemes in the word, will be represented by a subset of the 460 Wickelfeatures.”

Results of simulations
- the model captures the basic 3-stage pattern of acquisition
- captures most aspects of the differences in performance on different types of regular and irregular verbs
- is capable of responding appropriately to verbs it has never seen before, as well as to regular and irregular verbs actually experienced during training.

They used 506 verbs altogether, ordered according to frequency: the model was first trained with 10 high-frequency verbs, then 410 medium-frequency verbs, and then 86 lower-frequency verbs were added.

For this case, there is no induction problem. The child need not figure out what the rules are, or even if there are rules at all.

“We view this work on past-tense morphology as a step toward a revised understanding of language knowledge, language acquisition, and linguistic information processing in general.”

(NB there is no temporal information as such in the coding.)

This model has been heavily criticised (e.g. Pinker and Prince, 1988). BUT

Plunkett, K. & Marchman, V. (1991) U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 43-102.

This paper reports on a wide range of similar simulations of past-tense learning using several different kinds of network. The conclusion is that it is indeed unlikely that a single-layered network of the kind used above will work with input configurations that are more analogous to English in real life. However, they discuss several types of system which could result in over-generalization and U-shaped learning. These authors obtained successful simulations with networks using 20 input units, 20 output units, and 20 “hidden units” between these layers.


p234: How to represent the words


CAT is represented by #ka, kat and at#

But there are too many of these: with 35 different phonemes there would be 35^3, or 42,875. Therefore they used triples of features instead: 460 of the possible 1,210 (11*10*11).
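The Wickelphone coding and the two counts can be checked with a short sketch. The function name wickelphones and the use of one character per phoneme are illustrative assumptions, not part of the original model description.

```python
def wickelphones(phonemes, boundary="#"):
    """Split a phoneme string into context-sensitive triples (Wickelphones).

    Each triple is (predecessor, central phoneme, successor); the boundary
    symbol marks the start and end of the word.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    return ["".join(padded[i:i + 3]) for i in range(len(padded) - 2)]

print(wickelphones("kat"))   # one triple per phoneme of the word

# Size of the two candidate unit sets mentioned in the notes.
n_wickelphones = 35 ** 3           # all triples over 35 phonemes
n_wickelfeatures = 11 * 10 * 11    # all feature triples, of which 460 were used
print(n_wickelphones, n_wickelfeatures)
```

Note that the set of triples carries no explicit position information; order is recoverable only because the triples overlap.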

e.g. kam (came):

The first phoneme is an interrupted, back, unvoiced stop; the middle one is a long, low, front vowel; and the 3rd is mixed, to give 16 features turned on.

A word will have at most 16 features turned on for every phoneme. Thus “came”, with 3 phonemes, will have 3 x 16 = 48 of the 460 possible units turned on. (Temporal order is entirely implicit.)

In order to promote generalization they turned on some similar features at random.

In summary, the model had two sets of 460 Wickelfeature units, one which represented the base form of the 506 most frequent verbs in English and the other which represented the past-tense form (p239).

460*460=211,600 connections of varying weights


The 10 highest-frequency verbs were come, get, give, look, take, go, have, live and feel (only 9!) (eight irregular and two regular). Then the 410 medium-frequency verbs, of which 334 were regular and 76 irregular. Finally 86 lower-frequency verbs, of which 72 were regular and 14 were irregular.

There were 10 cycles of training through the first set of 10 (phase 1).
Then 190 more trials with the 410 medium-frequency verbs added. Then the model was tested with the further 86, without any more learning.
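The schedule can be laid out as a small sketch. It assumes that each "cycle" or "trial" here means one full pass through the current training set; the notes do not say this explicitly, and the names HIGH, MEDIUM, LOW and training_schedule are made up for illustration.

```python
HIGH, MEDIUM, LOW = 10, 410, 86   # verb counts given in the notes

def training_schedule():
    """Yield (pass_number, verbs_in_training_set) for each training pass.

    Assumption: every pass presents each verb in the current set once.
    """
    for pass_number in range(1, 11):        # phase 1: 10 passes, 10 verbs
        yield pass_number, HIGH
    for pass_number in range(11, 201):      # phase 2: 190 passes, 420 verbs
        yield pass_number, HIGH + MEDIUM

schedule = list(training_schedule())
total_presentations = sum(n for _, n in schedule)

# The 86 lower-frequency verbs are never trained, only tested afterwards.
print(len(schedule), total_presentations, HIGH + MEDIUM + LOW)
```

The jump from 10 to 420 verbs at pass 11 is the point where regular verbs suddenly outnumber irregular ones in the input.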

The Simulations

The model

Captures the three-stage pattern of acquisition

Captures most aspects of differences in performance on different types of regular and irregular verbs.

Is capable of responding appropriately to verbs it has never seen before, as well as to both irregular and regular verbs actually experienced in training.