[For a very brief account, see Lieberman, 2000, p. 528.]
Rumelhart, D.E. & McClelland, J.L. (1986b). On learning the past tenses of English verbs. In McClelland, J.L. & Rumelhart, D.E. (eds.), Parallel Distributed Processing, Volume 2: Psychological and Biological Models. MIT Press, London, 216-271.
Do we have explicit inaccessible rules? Is there a Language Acquisition Device (LAD) to discover such rules?
The conventional answer is Yes:
ALTERNATIVE: NO explicit rules. EXAMPLE: the English past tense:
regular past tenses, e.g. wiped and pulled
there is a stage (Stage 2) at which children produce goed and comed; in Stage 3 both regular and irregular forms are produced correctly
– therefore “U-shaped learning”
Acquisition is actually quite gradual.
The goal was to simulate these stages with a simple connectionist model.
MODEL (See overhead.)
All the values were chosen so that at one stage the system has more regular than irregular examples and temporarily ‘overregularizes’ exceptions that it may have learned previously. (This gives U-shaped learning, but by artificially structuring the input.)
Results of simulations
They used 506 verbs altogether, ordered according to frequency: the model was first trained on 10 high-frequency verbs, then 410 medium-frequency verbs were added, then 86 lower-frequency verbs.
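A minimal sketch of this staged vocabulary; the phase sizes are from the notes above, while the names and layout are illustrative assumptions:

```python
# Staged training vocabulary for the R&M simulation: phase sizes are from
# the notes; the dict layout itself is just an illustrative convenience.
training_phases = [
    {"phase": 1, "verbs_added": 10,  "note": "high-frequency verbs"},
    {"phase": 2, "verbs_added": 410, "note": "medium-frequency verbs added"},
    {"phase": 3, "verbs_added": 86,  "note": "lower-frequency verbs added"},
]
assert sum(p["verbs_added"] for p in training_phases) == 506
```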
“We view this work on past-tense morphology as a step toward a revised understanding of language knowledge, language acquisition, and linguistic information processing in general.”
(NB there is no temporal information as such in the coding.)
This model has been heavily criticised (e.g. Pinker and Prince, 1988). BUT —
Plunkett, K. & Marchman, V. (1991) U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 43-102.
This paper reports a wide range of similar simulations of past-tense learning using several different kinds of network. The conclusion is that it is indeed unlikely that a single-layered network of the kind used above will work with input configurations that are more analogous to English as it really occurs. However, they discuss several types of system which could result in over-generalization and U-shaped learning. These authors obtained successful simulations with networks using 20 input units, 20 output units, and 20 “hidden units” between these layers.
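For concreteness, a minimal sketch of a multi-layered perceptron of that shape, trained by backpropagation. The layer sizes are from the paper, but the sigmoid activations, learning rate, and random binary patterns are illustrative assumptions, not Plunkett & Marchman's actual settings:

```python
import numpy as np

# 20 input units -> 20 hidden units -> 20 output units, trained by
# backpropagation on a single (stem, past-tense) pattern pair.
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.5, (20, 20))   # input -> hidden weights
W2 = rng.normal(0, 0.5, (20, 20))   # hidden -> output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, t, lr=0.1):
    """One backpropagation step; returns the squared error for monitoring."""
    global W1, W2
    h = sigmoid(W1 @ x)                   # hidden activations
    y = sigmoid(W2 @ h)                   # output activations
    d_out = (y - t) * y * (1 - y)         # output-layer error signal
    d_hid = (W2.T @ d_out) * h * (1 - h)  # backpropagated hidden error
    W2 -= lr * np.outer(d_out, h)
    W1 -= lr * np.outer(d_hid, x)
    return float(np.sum((y - t) ** 2))

# Illustrative use with random binary stand-ins for phonological codings.
x = rng.integers(0, 2, 20).astype(float)
t = rng.integers(0, 2, 20).astype(float)
for _ in range(1000):
    err = train_step(x, t)
```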
But there are too many of these: with 35 different phonemes there would be 35³ = 42,875 possible Wickelphones. Therefore they used Wickelfeatures instead: 460 of the possible 1,210 (11 × 10 × 11).
e.g. kam (came): the first phoneme (k) is interrupted, back, stop, unvoiced; the middle (A) is a long, low, front vowel; and the third (m) is interrupted, front, nasal, voiced. These are mixed with the features of the neighbouring phonemes to give 16 features turned on per phoneme.
A word will have at most 16 features turned on for every phoneme; thus “came” will have 48 of the 460 possible units turned on. (Temporal order is entirely implicit.)
In order to promote generalization they turned on some similar features at random.
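A hedged sketch of this style of coding. The feature table below is a placeholder with roughly R&M-like values, not their actual table, and the rule that the two flanking features must come from the same dimension is assumed here purely to reproduce the 16-units-per-phoneme figure:

```python
# Hypothetical 4-dimension feature table; "#" marks a word boundary.
FEATURES = {
    "#": ("boundary", "boundary", "boundary", "boundary"),
    "k": ("interrupted", "back", "stop", "unvoiced"),
    "A": ("vowel", "front", "low", "long"),
    "m": ("interrupted", "front", "nasal", "voiced"),
}

def wickelfeatures(word):
    """Set of (left, centre, right) feature triples for a word, with the
    left and right features drawn from the same dimension: 4 centre
    features x 4 dimensions = 16 units per phoneme-in-context."""
    padded = ["#"] + list(word) + ["#"]
    units = set()
    for l, c, r in zip(padded, padded[1:], padded[2:]):
        for centre_feat in FEATURES[c]:
            for dim in range(4):
                units.add((FEATURES[l][dim], centre_feat, FEATURES[r][dim]))
    return units

# "came" -> 3 phonemes x 16 features = 48 units, matching the notes.
# (R&M additionally turned on some similar features at random, the
# "blurring" mentioned above, which is omitted here.)
print(len(wickelfeatures("kAm")))   # 48
```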
In summary, the model had two sets of 460 Wickelfeature units, one of which represented the base forms of the 506 most frequent verbs in English and the other the past-tense forms (p. 239).
460 × 460 = 211,600 connections of varying weights
There were 10 cycles of training through the first set of 10 verbs (Phase 1).
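A minimal sketch of this training regime on the 460 → 460 associator. R&M actually used a probabilistic logistic output rule with the perceptron convergence procedure; the hard-threshold units and random stand-in codings below are simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 460
W = np.zeros((N, N))   # 460 x 460 = 211,600 modifiable connections
b = np.zeros(N)        # output-unit thresholds

def predict(x):
    """Binary output pattern for an input Wickelfeature vector."""
    return (W @ x + b > 0).astype(float)

def perceptron_step(x, t, lr=1.0):
    """Adjust weights only on output units that disagree with the target."""
    global W, b
    err = t - predict(x)            # -1, 0, or +1 per output unit
    W += lr * np.outer(err, x)
    b += lr * err
    return float(np.abs(err).sum())

# Phase 1: 10 cycles through the 10 high-frequency verbs, here random
# binary stand-ins for their (base, past-tense) Wickelfeature codings.
verbs = [(rng.integers(0, 2, N).astype(float),
          rng.integers(0, 2, N).astype(float)) for _ in range(10)]
for cycle in range(10):
    for stem, past in verbs:
        perceptron_step(stem, past)
```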
Captures the three-stage pattern of acquisition.
Captures most aspects of the differences in performance on different types of regular and irregular verbs.
Is capable of responding appropriately to verbs it has never seen before, as well as to both irregular and regular verbs actually experienced in training.