At some point, anyone reading this must have acquired a tendency to convert letters into sounds, and to associate strings of letters with the meanings of words. We cannot be certain exactly how this happened, but there is surely no doubt at all that it involved learning from experience. There may be some aspects of human language that remain uninfluenced by individual experience, but the spoken words I am using here are a result of my having learned English. There are many other languages which could be written with the same script, such as French or Spanish, but anyone who reads Russian, Greek or Hebrew will have had to learn a different set of visual signs as the alphabet, while the Arabic script is even more unfamiliar, and reading (or writing) in ancient Egyptian or Chinese is a different kettle of fish altogether.
Thus, the use of language requires a strong element of learning, and so does the knowledge of social custom and ritual, and all beliefs, loyalties and values that can be shown to differ from one culture to another, or from one generation to the next. And in many cases, it is quite obvious that even in the same culture or
sub-culture one family may differ from another, or one person from another person, because of previous tragedies or traumas that may produce either weakness or strength of personality. More positively, they may differ because of family traditions or personal histories of practice and determination that may transmit or develop skills and expertise — whether musical, muscular or social.
Whether in academic education, specialized training, or in the more informal and unconscious adaptation to social and personal circumstances, there has always been a case to make that human life is largely dependent on individual learning — it is the flux, not the fixity, of human technologies and social institutions which most distinguishes our species from any other. Therefore, those who believe that they have found general laws, or universal principles, which apply to learning have often gone on to infer that the same general laws must underlie wider areas of human psychology, and can be used as general principles of human behaviour. The idea that the same theory can apply to all learning has often been attacked, both by those who think that innate factors determine human nature and by those who reject any ‘reductionist’ approach which tries to find underlying explanations for the complexities of life.
In spite of such criticisms, there are two quite separate reasons for continuing to study learning theories. The first is that this is still a vigorous research area, with technical advances in the design of experiments producing new answers to some of the old puzzles (Rescorla, 1980; Dickinson, 1980). The second is that the old theories themselves, despite numerous logical and metaphysical difficulties, have spawned a collection of practical measures, known as behaviour modification, or behaviour therapy, which have made a significant contribution to such areas as the treatment of severe neurotic phobias and the education of retarded or handicapped children.
The development of learning theory
As I am writing in the centenary of the year of Darwin’s death, 1882, it is appropriate to acknowledge the Darwinian roots of some of the features of learning theories. In the conclusion to The
Origin of Species, Darwin says: ‘In the distant future I see open fields for far more important researches. Psychology will be based on a new foundation, that of the necessary acquirement of each mental power and capacity by gradation’ (Darwin, 1859/1968, p. 458). A new foundation for psychology exactly like this seems distant still, and an emphasis on the gradual evolution of mental powers and capacities is not really a characteristic of the theories I shall shortly review. But it is obvious that the Darwinian theory of evolution emphasizes the continuity of human and animal psychology, and the use of evidence from laboratory experiments on animal learning to test principles and specific hypotheses of supposedly wider application could almost be used as a definition of a learning theory.
In The Descent of Man (1871/1901) Darwin has two chapters on ‘Comparison of the mental powers of man and the lower animals’ whose aim is ‘to show that there is no fundamental difference between man and the higher mammals in their mental faculties’ (p. 99). Much of these chapters is taken up with anecdotal evidence for the existence of wonder, curiosity, complacency and pride, as well as reason, abstraction and imagination, in mammals such as the baboon and the domestic dog. The conclusion is that ‘The lower animals differ from man solely in his almost infinitely larger power of associating together the most diversified sounds and ideas’ (Darwin, 1871/1901, p. 131), and ‘We must admit that there is a much wider interval between one of the lowest fishes, as a lamprey or lancelet, and one of the higher apes, than between an ape and man’ (p. 99). These assertions are used by Darwin to support his theory that human abilities could have evolved gradually from those of related species, rather than to forward the ‘new foundation’ for psychology which he had referred to earlier. However, by identifying the power of associations as a critical factor in human intelligence and firmly relating human mental capacities to those of animals, Darwin prepared the way for later ‘associationist’ theories about psychology which derive supporting evidence from animal experiments.
Pavlov and the conditioned reflex
When, in 1882, Darwin was buried in Westminster Abbey, a leader in The Times expressed full appreciation of his work. But, when
The Descent of Man first appeared in 1871, The Times had thundered that ‘morality would lose all elements of stable authority’ if the public were to believe it. Few today suggest that the theory of natural selection (or in the case of The Descent of Man, sexual selection) is a threat to public order, although neither the religious Right nor the political Left has much enthusiasm for modern varieties of Darwinism such as sociobiology (Wilson, 1975).
Another branch in the roots of learning theory also ran into ideological resistance early on. In 1866 the St Petersburg Censorial Committee banned a popular book, and prosecuted its author for undermining public morals. The author was neither a pornographer nor a political theorist, but a physiologist called Sechenov, and the book was Reflexes of the Brain, which introduced the controversial suggestion that ‘all acts of conscious or unconscious life are reflexes’. Perhaps the authorities were especially sensitive, 1866 being the year of the first assassination attempt on Alexander II, but the case against Sechenov soon collapsed, and Reflexes of the Brain later made a deep impression on the young Ivan Pavlov (Gray, 1979).
Pavlov (1849—1936) was awarded a Nobel prize in 1904 for his work on digestion. In the lecture he gave in Stockholm when he received it, he described some of his findings as ‘conditioned reflexes’, although many of the more detailed experiments were to come later, as Pavlov became less concerned with digestion, and more concerned with ‘an investigation of the physiological activity of the cerebral cortex’ (the subtitle to Pavlov’s Conditioned Reflexes, 1927). Sechenov’s idea was that even the most complex manifestations of human psychology were made up of reflexes, that is of ways of reacting to stimulation of the sensory nerves by specific muscular or glandular activities. This gave Pavlov encouragement to build up from his rigorous experiments on digestive reflexes a theory which he applied to all cerebral functions.
Although Pavlov’s work has been influential in a number of ways, the clearest contrast with Darwin is in his application of rigorous experimental method. Darwin had relied on anecdotal reports of casual and informal observations. For instance, he attributed abstract thought to dogs on the basis of a game he played with his pet terrier: Darwin would say ‘Hi, hi, where is it?’ in an eager tone, and the terrier would rush around, apparently hunting for something. ‘Now do not these actions clearly show
that she had in her mind a general idea or concept that some animal is to be discovered or hunted?’ asked Darwin (1871/1901, p. 127). This is unsatisfactory because there are several other possibilities. The eagerness of Darwin’s tone of voice may simply have excited the dog, since there were no experimental controls to show that saying ‘Where am I?’ or ‘How are you?’ to the terrier might not elicit an equal amount of rushing about.
Pavlov’s observations were the opposite of casual. As a professional research scientist under three Tzars, and then both Lenin and Stalin, he was known for his emphasis on rigorous experimental method, and the physiological tradition in which he worked required reliable and repeatable experimental demonstrations. In much of the later work, scrupulous care was taken to avoid extraneous influences on psychological experiments by keeping the human experimenter separate from the experimental animal in a different room, and by going to such lengths as building the laboratories with double walls filled with sand to achieve sound insulation. What Pavlov discovered, in the course of his systematic study of digestion, is that glandular secretions, of gastric juices or of saliva, are controlled by ‘psychic’ or psychological factors, and not simply by chemical or mechanical stimulation. The earliest way in which this was demonstrated was by ‘sham feeding’.
It was primarily his skill as an experimental surgeon which enabled Pavlov to make his Nobel-prize-winning discoveries. Others, in the 1840s, had developed the technique of permanently implanting a metal tube in a dog’s stomach, through which gastric juices could be collected. The modification introduced by Pavlov (and his co-worker, a Mrs Shumov-Simanovsky) was the surgical separation of the mouth from the stomach: the oesophagus was cut and the two cut ends were independently brought out at the throat. This meant that food eaten and swallowed by the dog dropped out of its throat before reaching the stomach, or alternatively, food could be dropped directly into the stomach without the dog’s having seen or tasted it. The original purpose of this was to solve the problem of obtaining pure gastric juices, uncontaminated with food.
But, by the time of his Nobel lecture in 1904, Pavlov’s main interest was in the psychological control of the gastric secretion. Bread dropped into a dog’s stomach without the animal noticing it
was not digested at all, but if the dog ate bread, gastric activity occurred even if the bread never reached the stomach. This all depended on the ‘appetite’ of the dog, since the mere sight of food produced stomach activity, but only if the dog took an interest in the food — the effectiveness of food in the mouth depends on how far the food ‘suits the dog’s taste’ (Pavlov, 1955, p. 142).
In order to study these psychological factors in more detail it was not necessary to continue to work with secretions of the stomach, since secretions of the salivary glands can serve just as well. As every student knows, the standard Pavlovian experiment requires the measurement of salivation in response to an external and distant stimulus, such as the sounding of a buzzer. Dogs do not normally salivate when they hear buzzers, even if they are hungry, but, if the buzzer is always sounded a few seconds before food is to be presented, a ‘conditioned reflex’, of salivating to the sound, is formed. To establish a reliable response, a dog might be given half-a-dozen pieces of meat, each preceded by the sounding of the buzzer, at five-minute intervals every day for a week or more. After this, a demonstration of the ‘conditioned reflex’ could be given by simply sounding the buzzer, without giving any meat. Now the buzzer would produce the same effects, more or less, as showing the dog real food — there would be plenty of salivation, and the dog would lick its lips, look at the food dispenser and perhaps wag its tail.
More details of Pavlov’s experimental findings will be found in chapter 3. The laboratory findings with the reflex of salivation were used as the basis for a theory about the ‘higher nervous activity’ of the mammalian cerebral hemispheres, and then for wide-ranging speculations about psychology and psychiatry. The dog salivates to the buzzer only because of its previous experience of the association in time of the buzzer with its food. The conditioned reflex could thus be seen as an atomic unit of learning from experience, capable of being ‘synthesized’ into more complex combinations by the activities of the cerebral cortex. Thus Pavlov was led to claim that ‘the different kinds of habits based on training, education and discipline of any sort are nothing but a long chain of conditioned reflexes’ (Pavlov, 1927, p. 295, my italics).
In some ways this set the pattern for subsequent learning theorists. In the last two pages of his Conditioned Reflexes, the most systematic exposition of his work, Pavlov reiterates his objective
of providing a ‘purely physiological interpretation’ of brain activity immediately after asserting that his experiments would eventually ‘throw some light upon one of the darkest points of our subjective self — namely, the relations between the conscious and the unconscious’ (Pavlov, 1927, p. 410). Obviously, this harks back to Sechenov’s slogan that ‘all acts of conscious or unconscious life are reflexes’, but, sadly, relations between conscious and unconscious processes are usually neglected in developments based on Pavlov’s work.
Thorndike (1874—1949): connectionism and the law of effect
Pavlov often described stimuli such as the buzzer, which came to elicit salivation, as ‘signals’ for food, which might direct the animal to acquire food, and assist in its adaptation to the external world. But partly because in his experiments the dogs were firmly strapped in stands, he saw the formation of conditioned reflexes as a rather passive and mechanical process. Thorndike is important in learning theory for proposing an equally mechanical process of learning, but also for emphasizing the effects of consequences of the active response of an experimental animal. We may note, however, that Pavlov was not unaware of the influences of the consequences of an animal’s actions, and made a special point in 1895 of mentioning an anecdote to illustrate this. Pavlov’s most famous operation was the construction of ‘Pavlov’s pouch’ (Gray, 1979) — a piece of the duodenum containing the outlet of the pancreas is cut away and then stitched back facing outwards so that it discharges through an opening in the abdomen, and its secretions can be subsequently collected. A difficulty with this was that the escaping digestive juices, leaking out during the night, caused erosion and bleeding of the skin of the abdomen. One of the dogs subjected to this operation and left tied up in the laboratory overnight was found, two nights in succession, to have torn a heap of plaster from the wall. On the second occasion Pavlov noticed that the dog had been sleeping on the plaster, with the result that the skin of its abdomen was in exceptionally good condition. From then on all the animals that had had a similar operation were provided with a pile of sand or old mortar to lie on, which greatly reduced the incidence of skin irritations. Pavlov remarks: ‘We gratefully acknowledge that by its manifestation of common sense
the dog had helped us as well as itself. It would be a pity if this fact were lost for the psychology of the animal world’ (Pavlov, 1955, p. 90).
Thorndike (1898) ensured that the tendency of animals to learn to help themselves was not lost to learning theory, but was reluctant to acknowledge anything approaching common sense on the part of the dogs, cats and chicks which were the subjects of his behavioural experiments. Whereas Pavlov brought to animal psychology a fully equipped physiological laboratory, Thorndike was influenced by the philosophical views of William James and by the fact that as a postgraduate student at Harvard in the 1890s he could conduct animal experiments only by keeping chickens, young cats and dogs in his own lodgings, and building his own apparatus. Not surprisingly, Thorndike was unpopular with landladies, and at one point, when Thorndike had been turned out for hatching chickens in his bedroom, William James’s household had to take in both Thorndike and chickens. James’s Principles of Psychology (1891) pours considerable scorn over Darwin’s and Romanes’ anecdotal evidence of reasoning in animals, and proposes that all their associations of ideas take place by simple contiguity. When an animal reacts intelligently to some stimulus, it is because ‘the beast feels like acting so when these stimuli are present, though conscious of no definite reason why’ (James, 1891, p. 350). In particular, when any animal opens a door or gate by biting or manipulating a latch or handle, James suggests that this is likely to be ‘a random trick, learned by habit’ (James, 1891, p. 353). Thorndike’s bedroom experiments were designed to support these views of William James, which in many ways represented a reaction against Darwinian anthropomorphism and a return to the sceptical view of animal reason put forward in the seventeenth century by the English philosopher John Locke.
The main technique which Thorndike used to provide experimental evidence in support of James’s view involved the use of problem or puzzle boxes, with cats (the barking of dogs having caused excessive trouble with landladies). The boxes were small crates, hammered together from wooden slats, about 50 cm square and 30 cm high. Anyone who has ever put a cat in a carrying box will know that they do not always take kindly to it, and Thorndike’s animals, although less than a year old, struggled violently when they were first confined in the crates. In case they were in need of
any further motivation to escape from the problem boxes, they were tested in a state described as ‘utter hunger’, with a piece of fish visible to them outside. In order for them to make their escape, it was necessary for them to find a releasing device which, when manipulated, would automatically allow a door to spring open. Several boxes were used, and for each one Thorndike had designed an ingenious arrangement of strings, pulleys and catches, which even the human observer would find hard to follow at first sight. Thus, when a cat was first put in one of the problem boxes, it did not sit back and deduce from the arrangement of pulleys which loop of string or catch had to be pulled, but rather scratched and cried, and thrust its paws or nose into any available opening, until by chance it made some movement which operated the release mechanism. On average, with one of the simpler boxes, it would take a cat five minutes of random scratching before it accidentally succeeded on the first test, but after ten or twenty trials in the same box it would consistently escape within five seconds or so (see figure 3, p. 48).
Thorndike attributed this change in behaviour to random, or ‘trial-and-error’, learning. The form which the learning took (Thorndike’s answer to the question ‘What is learned?’) he supposed to be a ‘connection between the situation and a certain impulse to act’ which is ‘stamped in when pleasure results from the act, and stamped out when it doesn’t’ (Thorndike, 1898, p. 103). Thorndike called himself a connectionist, and it was connections between stimulus input and response output which were learned, whether one thinks in terms of connections between perceptions and impulses to act or in terms of connections between neurons in the brain.
There are two things to notice about Thorndike’s explanation. First, it is a pleasure/pain, or reward and punishment, theory. It is because the cat is glad, or satisfied, when it gets out of the box, that it learns the trick. Secondly, and this is an odd thing about Thorndike’s theory, the cat is not supposed to think ahead about getting out — it has an impulse to perform the releasing action, but no anticipation that the action will lead to its release. This sounds rather unlikely, as far as the cat experiment goes, but it followed on from William James’s views, and as a simple generalization the ‘law of effect’ proved to be a very powerful assumption. The law of effect is the statement that the effects of an action (whether it
produces reward or punishment) act backwards to stamp in the connection between the action and the circumstances in which it was made. Although more recent authorities would suggest that Thorndike’s cats operated the catches and pulled strings in their boxes because they expected to get out by these means, there are other cases where the backwards-stamping-in aspect of rewards is significant, and the idea that responses can be changed by their consequences has an important place in learning theory.
Watson (1878—1958) and behaviourism
The idea that an animal’s pleasure or discomfort could encourage it to repeat, or abstain from, actions which brought about such states did not at first form part of Watson’s behaviourism. It was listed under ‘certain misconceptions’ in Watson’s book of 1914, where he said that ‘It is our aim to combat the idea that pleasure or pain has anything to do with habit formation’ (p. 257). Having devoted, by 1913, twelve years to studying animal behaviour, in the laboratory and in the field, Watson had become impatient with the difficulties of integrating his findings with the contemporary psychology based on human introspection, and attempted to throw out all talk of feelings, conscious sensations and images from psychological discussion, hoping to produce ‘a purely objective experimental branch of natural science’. This approach certainly made it easier to be systematic about such things as colour vision in animals. If a bird is exposed to red and green lights, it will never be possible to decide whether the animal subjectively sees the two colours in the same way as I do, or sees them as two shades of grey, or as anything else. But it is relatively easy to do an experiment to train the bird to respond to the red light and not to the green one, and then to make variations in the lights to find out the relative importance of the brightness and wavelength of the light as influences on the animal’s behaviour.
However, Watson’s behaviourism also led him to say things he later regretted, such as ‘thought processes are really motor habits in the larynx’ (1913, p. 174). By 1931, this extreme view had been altered slightly: ‘I have tried everywhere to emphasize the enormous complexity of the musculature in the throat and chest.... the muscular habits learned in overt speech are responsible for implicit or internal speech (thought)’ (Watson, 1931, pp. 238—9).
The title of the book from which these quotations are taken is Behaviorism, and the extract makes it quite clear that Watson wished to interpret all mental activity in terms of peripheral movements and habits of movement — this is a defining feature of behaviourism, but not of all learning theories, as we shall see.
Like most learning theorists, Watson discussed human behaviour in terms of motives and rewards. A dress designer creates a new gown, he says, not by having ‘pictures in his mind’, but by calling a model in, throwing a piece of material around her, and manipulating it until his own emotional reactions are aroused in a satisfactory way and the assistants say ‘Magnifique!’ (1931, p. 248). This of course is trial and error, or ‘trial and success’, as Thorndike often called it.
Watson did his Ph.D. thesis on animal learning a couple of years after Thorndike, and published it as Animal Education (1903). The experiments involved problem boxes and mazes which hungry animals learned to get out of for food rewards, and differed from Thorndike’s mainly in that the subjects were white rats instead of chickens and cats. But Watson resisted Thorndike’s idea that rewards ‘stamped in’ responses, and preferred to talk simply of the formation or fixation of habits. Animals learned how to get out of problem boxes, according to Watson, partly because the response they happened to make just before leaving the box had to be the last one in a series of responses to the same stimulus (the recency effect). In time, the correct solution to a problem becomes the most frequent response, and Watson left it at that. The frequency principle, especially when rendered as the importance of practice and repetition, is common to many other theorists, notably Guthrie (1886—1959).
As a propagandist, Watson was fond of going to extremes, and one can be most sympathetic to him in this when he argues against racial and familial inferiority. Since Watson was brought up as a poor white in South Carolina, he knew well enough the strength of the opposition to his view that neither Negroes in general nor distinguished white families in particular had inherited tendencies and psychological factors to thank for their social position. It is ‘millions of conditionings’ during early childhood experience in upbringing and education that are responsible, in Watson’s theories, for both the personalities and the intellectual capacities of adults. His most famous assertion was:
This is certainly a strong claim for the importance of learning in human psychology.
Skinner (1904— ) and operant conditioning
B.F. Skinner belongs to a much later academic generation than Thorndike and Watson, but he can be regarded as having amalgamated these two earlier theorists into a new blend which has outlasted the original components. Thorndike’s experiments on the law of effect have priority as the first investigations of operant conditioning, as Skinner acknowledges, but the Watsonian emphasis on habits and reflexes, and the popularity of behaviourism, left Thorndike isolated. While believing in associations and connections, Thorndike was always closer to William James than to Watson. Thorndike’s Elements of Psychology (1905) is almost entirely concerned with mental states and feelings — these two terms being used in most of the chapter headings, as in ‘Feelings of things as absent: images and memories’ and ‘Mental states concerned in the directions of conduct: feelings of willing’. In the year that Watson began his crusade for behaviourism, Thorndike was still introducing the law of effect in a chapter on ‘Consciousness, learning and remembering’. It was left to Skinner to bring the trial-and-success principle into the Watsonian world of reflexes.
This happened fairly gradually. Skinner’s first work (1931) was a behaviourist analysis of the concept of the reflex, in Pavlov’s experiments and those of other physiologists, and the results of experiments on rats in the famous ‘Skinner box’ were reported as experiments on reflexes. But in a series of papers Skinner drove a wedge between reflexes of the Pavlovian type (eventually terming these ‘respondents’) and habits of the Thorndikean kind (eventually calling these ‘operants’). In these early papers, Skinner was
fond of drawing diagrams to show the sequences of stimulus and response in conditioning. Thus S0 — R0; S1 — R1 described what happened in the first Skinner boxes. These contained an automatic dispenser to drop small pellets of food, one at a time, into a food tray, and just above the food tray a horizontal wire lever, which, when pushed down by the rat, could operate the automatic dispenser. In the sequence of Ss and Rs, S0 — R0 would be the stimulus of the sight or the touch of the lever leading to the response of the rat of pressing it down, and S1 — R1 would be the consequent stimulus of a food pellet dropping into the tray, and the response of the rat of seizing and eating the food pellet. Clearly the description could be broken down further (including even the swallowing reflexes of the animal), but Skinner’s point was that there was a ‘chain’ of responses, and the ‘getting-the-food part’ of the chain strengthened the ‘pressing-the-lever part’ which preceded it, the ‘strengthening’ being a more neutral and purely descriptive version of Thorndike’s ‘stamping in’ (see Skinner, 1938, pp. 47—54, 65—6).
By 1938, Skinner emphasized that in ‘operant’ behaviour —moving about in the environment and manipulating things — there is no static connection of a response with a previous eliciting stimulus, but rather a response is ‘emitted’ in a more or less spontaneous and voluntary way. This may be contrasted with a knee-jerk or finger-from-flame withdrawal, which is always related to the eliciting stimulus. Sometimes these distinctions become very technical. But a ‘reflex of seeking food’ (Skinner, 1935, p. 176) as active, goal-determined responding is clearly rather different from the secretion of saliva and gastric juices of a dog strapped in a stand in Pavlov’s experiments, and more like intentional trial and error.
Although operant goal seeking sounds slightly mentalistic, Skinner exceeded even Watson’s rigour in sustained scepticism about inner mental images and desires. Operant responses were not supposed to be chained together because they made up purposeful acts, but only because, in echoes of Thorndike, ‘The connections between parts are purely mechanical’ (1938, p. 55). Inner mental events, Skinner usually supposes, are no more necessary as explanations of operant behaviour than they are for the sequences of reflexes used in swallowing or the maintenance of postures (1938, p. 54, 1977). But he has been able to say
provocative things about private stimuli and functional units in thinking and speaking, in very much the same way as Watson was able to talk about ‘language habits’. The most theoretical part of Skinner’s work is his claim to have no theory at all (1950) and to be a radical behaviourist who simply describes the facts and nothing but the facts.
Hull (1884—1952) and mathematical equations in learning theory
In the 1930s, while Skinner was working out that Thorndike’s results were different from Pavlov’s, C.L. Hull, at Yale, was saying that they boiled down to the same thing. Hull’s theories were extremely influential during his lifetime, but after his death, Skinner, who had returned to Harvard from Indiana in 1948, became the most notable figure concerned with animal learning.
Hull’s theory was extremely systematic, and could often be stated in mathematical equations; these made it easier to show that the theory was, in most significant respects, wrong. It was most famously and instructively wrong over the question of needs, drives and incentives. Hull’s theory has been called ‘hypothetico-deductive’ because he believed in starting from first principles, and setting down postulates and corollaries in mathematical or logical forms. But Darwinian first principles led him astray almost immediately. ‘Animals may almost be regarded as aggregations of needs’ (1943, p. 64) is true enough, but Hull took the idea too far. In Thorndike’s cat-in-the-box experiment it seems reasonable to say that the cat, if hungry, has a need for the food outside, and that, if it struggles, it appears to have a drive to escape, as well as a ‘hunger drive’. Hull went on from this to formulate various elaborations of the law of effect in terms of need reduction. Postulate III (1952) runs:
This is very close to Thorndike’s idea of the stamping-in of connections by pleasurable consequences, but here it is a reduction of a drive which increases the likelihood of a future response
to the stimulus. It was the essence of Hull’s system that he did not add qualifications such as ‘other things being equal’ or ‘depending on whether the animal is paying attention to what it is doing’, as Thorndike did — animals were supposed to always learn response tendencies under appropriate conditions of drive reduction, and never learn anything if there was not any drive reduction. Tolman and others (see pp. 16ff.) provided evidence to show that rats apparently learned a good deal without any obvious drive reduction, and that learning does not only take the form of tendencies for stimuli to elicit responses, and Hull’s theory became more and more complicated and cumbersome. After various articles, the first in 1929, the 1936 presidential address to the American Psychological Association (1937) and the book Principles of Behavior (1943), Hull’s final system (1952) needed thirty-two separate postulates. The main one of general application was this:
This implies that the intensity or likelihood of any learned behaviour (SER) can be calculated if four other factors are known — the drive or motivation associated with it (D); the intensity of the signal for the behaviour (V); the degree of incentive (K); and the level of habit (SHR). Under laboratory conditions all the factors can be measured, and the equation checked. SER is measured by the probability or strength of a response, D by hours of deprivation or some other indicator of physical need, K by the size of the reward or some other index of its desirability, and SHR is calculated as the amount of practice given — usually as a number of reinforcements, each episode of drive reduction being one reinforcement.
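Hull’s 1952 formula combined the four factors multiplicatively, in effect SER = SHR × D × V × K. A modern reader can express the calculation in a few lines of code; the numerical values below are purely illustrative:

```python
def reaction_potential(habit, drive, stimulus_intensity, incentive):
    """Hull-style reaction potential: SER = SHR x D x V x K (1952).

    Because the factors multiply, a zero anywhere (e.g. no drive)
    predicts no learned behaviour at all, however strong the habit.
    """
    return habit * drive * stimulus_intensity * incentive

# A well-practised habit with no drive yields no performance:
assert reaction_potential(habit=0.9, drive=0.0,
                          stimulus_intensity=1.0, incentive=1.0) == 0.0
```

The multiplicative form matters: it is exactly this feature, performance vanishing whenever drive or incentive is zero, that the latent learning experiments described later in this chapter put under pressure.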
The gradual increase in response tendency or habit with repeated experience is the core of the Hullian system, and the mathematical treatment of gradually changing associations is a bit of flotsam remaining from its wreckage. When a response tendency increases, in Hull’s system, the increase equals a fraction of (the maximum of the habit minus its current level). This always gives a nice gentle approach to the final level of habit (or asymptote). If the final level is taken as 100 units, and the fraction as a tenth (Hull, 1943, p. 115), then on the first learning trial the increase will be 10 units; but by the time the habit is half formed at 50 units,
which takes 7 trials, the increase is down to 5 units (1/10 × (100 − 50)), and by the time the response tendency is 90 per cent complete (after 22 trials) each increment is of course less than 1. The closer to the maximum, the smaller the increments get.
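The arithmetic can be checked directly with a short simulation, using Hull’s own values of an asymptote of 100 and a fraction of one tenth (1943, p. 115):

```python
def habit_growth(asymptote=100.0, fraction=0.1, trials=30):
    """Hull-style incremental learning: each trial's increment is
    fraction * (asymptote - current habit strength)."""
    habit = 0.0
    history = []
    for _ in range(trials):
        habit += fraction * (asymptote - habit)
        history.append(habit)
    return history

h = habit_growth()
# First increment is 10 units; the habit passes 50 on trial 7
# and 90 on trial 22, matching the figures given in the text.
assert abs(h[0] - 10.0) < 1e-9
assert h[5] < 50 <= h[6]      # trial 7 (index 6)
assert h[20] < 90 <= h[21]    # trial 22 (index 21)
```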
This started as a matter of algebraic convenience, but by a quirk of fate ‘a modification of Hull’s account of the growth of SHR’, proposed long after his death by two of his successors at Yale (Rescorla and Wagner, 1972, p. 75), has proved to be surprisingly popular (Dickinson, 1980; Hilgard and Bower, 1981). Rescorla and Wagner were concerned with experiments of a Pavlovian kind, where two stimuli are given at once. As an example we may consider a Pavlovian dog whose food is signalled by both a buzzer and a flashing light — how much would it salivate to the light or buzzer presented alone? Hull’s algebra can be modified to treat separately the tendency of each stimulus to elicit salivation, by saying that whenever there is an increase for an individual stimulus the increase equals a fraction of (the maximum response minus the current tendency of both stimuli). This deals quite well with cases such as that where the dog has already been conditioned to salivate to the buzzer before a flashing light is made to accompany the buzzer. If the buzzer was already firmly conditioned, and close to the maximum by itself, the equation says there can be very little increase in the tendency to salivate to the light, and indeed, in this sort of experiment, the dog would probably ignore it.
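The Rescorla–Wagner rule is simple enough to state in a few lines: on every trial, each stimulus that is present changes by a fraction of (the maximum minus the combined strength of all stimuli present). The sketch below uses illustrative learning parameters, not values from the original paper:

```python
def rescorla_wagner(trials, strengths, alpha=0.3, asymptote=1.0):
    """Update associative strengths trial by trial. Each trial is the
    set of stimuli present; every present stimulus changes by
    alpha * (asymptote - combined strength of all present stimuli)."""
    for present in trials:
        combined = sum(strengths[s] for s in present)
        delta = alpha * (asymptote - combined)
        for s in present:
            strengths[s] += delta
    return strengths

v = {"buzzer": 0.0, "light": 0.0}
rescorla_wagner([{"buzzer"}] * 20, v)           # buzzer conditioned alone
rescorla_wagner([{"buzzer", "light"}] * 20, v)  # then buzzer plus light
assert v["buzzer"] > 0.9 and v["light"] < 0.05
```

Because the buzzer alone already accounts for nearly all of the available associative strength, the light gains almost none: the ‘blocking’ result described above.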
Tolman’s ideas and expectancies
When Rescorla and Wagner (1972) presented their rather Hullian model for findings in classical conditioning, they made a prefatory comment that an alternative version of the model would be expressed by saying ‘organisms only learn when events violate their expectations’ (p. 75). If a dog has already learned to expect food when it hears a buzzer, adding a flashing lamp along with the buzzer is redundant, and the dog will not pay it much attention. It is now fairly common for theories about animal learning to be presented in terms of expectations, or ‘expectancies’ (Mackintosh, 1974; Walker, 1983), and this is something which goes back to Tolman. The terminology is indicative of a larger theoretical debt.
Starting with ‘A new formula for behaviorism’ (1922), Tolman
always called himself a behaviourist, but disowned Watsonian ‘muscle twitchism’, and recommended that objective measurements of behaviour should be used to support ‘molar’ concepts or ‘intervening variables’. During his career Tolman used rather a wide variety of terms to describe these concepts, including ‘beliefs’, ‘hypotheses’ and ‘representations’, as well as the more obscure ‘sign-Gestalt’ and ‘means-ends readiness’, and the now familiar ‘cognitive map’ (Tolman, 1948). But his theoretical position is most clearly set out in the early papers in which he proposed that animal learning is determined by ‘purpose and cognition’ (1925), and that because of this it is possible to have ‘a behavioristic theory of ideas’ (1926). In animal experiments, we can observe ‘in-behaviour’ ideas, because behaviour may be caused by purposes — ‘purposes which exhibit themselves as persistences through trial and error to get to or from’ — and may express cognitions — ‘cognitions as to the nature of the environment for mediating such gettings to or from’ (1951, p. 51). The behavioural evidence from the kittens getting out of Thorndike’s puzzle box suggested to Tolman that the animals had rather simple ideas — ‘getting out of the box’ was related to ‘eating the food’, and also ‘clawing at the loop’ was related to ‘getting out of the box’ — without mechanical insight into the relation between the loop of string and the door. He thought that the cat had only ‘a representation of the very immediate consequences of the act, a prevision, perhaps of the opening of the door’ (1951, pp. 58, 60). Nowadays, it is often inferred that animals may have inner representations of future events, as in a ‘unitary representation’ of two stimuli presented together, or in ‘multiple representations’ of the same stimulus, or in ‘reinforcer-specific expectancies’, the inferences being made from an examination of detailed experimental evidence.
Tolman’s own preferred form of experiment was the study of maze-learning in rats — in his view the best way to examine ideas was not to introspect on one’s own but to look at ‘a really good rat in a really good maze’ (see Tolman’s Collected Papers, 1951, p. 62). Mazes often demonstrate something which we ought anyway to expect on the basis of an animal’s natural life — that most species can acquire a knowledge of local geography which is not coded as sequences of muscle movements. There is still a good deal of uncertainty about exactly how homing pigeons get home, but they definitely do not do it only because of a collection of wing-flapping tendencies:
visual landmarks are important, at least at close range. In the case of rats in mazes, experiments in Tolman’s laboratory showed that a rat who has learned the maze by running can successfully swim round the maze when it is flooded (and vice versa); this shows it is not the individual movements of walking or swimming that constitute the learning of the maze. Various experiments on ‘place-learning’ demonstrate that organized information about the spatial layout of mazes is available to rats, enabling them to make shortcuts or go in the correct direction towards the usual location of food from several different starting points, and this is evidence for ‘cognitive maps’ rather than habit sequences.
So far, this means that Tolman’s answer to the question ‘What is learned?’ is much more complicated than stimulus-response connections or response tendencies. But it does not bear on the questions ‘How is it learned?’ or ‘When is it learned?’, which were both answered by the law of effect or the drive-reduction postulate in the theories of Thorndike and Hull.
Latent learning

Maze experiments by Tolman and Honzik (1930) showed that learning could take place without any drive reduction, or stamping-in of connections by pleasure or pain. Rats were run on ‘multiple T’ mazes (see figure 1) in which the correct way through involved a sequence of fourteen choices (to turn left or right) in correct order.
Figure 1 A fourteen-turn maze used by Tolman. This was the kind of maze used in the first studies of 'latent learning'. To run through it without any errors, rats have to make fourteen turns in the sequence r, l, l, r, r, l, r, r, l, r, r, l, l, r. The alleys are four or five inches wide, with high walls and a system of curtains and doors which prevents the animals from seeing the next clear turn, but does not prevent them using more remote visual cues such as light sources. (After Tolman, 1948)
Hungry rats given some food at the end of the maze once a day seemed gradually to learn the correct sequence, taking up to two weeks to reduce their mistakes (turning into blind alleys) to a minimum. This could be taken to mean that correct turning habits were being stamped in by the rewards, since animals not given any food at the end hardly reduced their mistakes at all. But, if the rats who had been through the mazes without any food for ten days (being fed in their home cages) suddenly found some food at the end of it, then, on the next day, they ran through almost perfectly. The obvious interpretation is that they had learned a cognitive map of how to get through the maze during the first ten days, but had not bothered to take the quickest route — hence the learning was ‘latent’, or ‘behaviourally silent’. But, once they discovered food at the end — even after an incorrect run — then, on the next day, the expectation that food might be available again was sufficient motivation for them to manifest their previous learning by running the maze correctly.
Figure 2 Latent learning results.
The obvious way of getting a rat to learn a maze is to provide it with a food incentive at the end. Rats always rewarded in this way show a gradual decrease in errors (turning the wrong way in a maze like that shown in figure 1). Rats who are never rewarded continue to make many errors. But animals run initially without any food incentive (in this case for the first ten days), who have apparently learned nothing, show an immediate improvement in performance after just one reward, proving that some learning had taken place on the non-rewarded trials, even though this was not obvious in the rats' behaviour until rewards were given. (After Tolman and Honzik, 1930)
Clearly, the motivation, and the food, are important influences on the rats’ behaviour, or on their performance, but the acquisition of spatial knowledge can go on quite well in the absence of any obvious drive reduction, or external reward.
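The distinction between learning and performance that this experiment forces can be made concrete with a toy model. The following sketch is purely illustrative, not Tolman’s own formalism: ‘knowledge’ of the maze grows on every run, rewarded or not, while the errors an observer counts depend on knowledge and motivation together.

```python
def run_maze(days_unrewarded=10, days_rewarded=5, max_errors=10):
    """Toy learning-versus-performance model of the latent
    learning result: knowledge grows on every run; motivation
    jumps once food is found; observed errors depend on both."""
    knowledge = 0.0    # grows whether or not the run is rewarded
    motivation = 0.2   # low until food is expected at the goal
    errors_per_day = []
    for day in range(days_unrewarded + days_rewarded):
        if day >= days_unrewarded:
            motivation = 1.0   # food was found: expectation motivates
        knowledge += 0.3 * (1.0 - knowledge)
        errors_per_day.append(max_errors * (1.0 - knowledge * motivation))
    return errors_per_day

e = run_maze()
assert e[9] > 8   # errors still high on the last unrewarded day
assert e[10] < 1  # near-perfect run the day after the first reward
```

Errors stay high for the first ten days and collapse the day after food is first found, even though ‘knowledge’ grew at the same rate throughout, which is the shape of the curves in figure 2.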
A discussion of how learning from experience has been incorporated into psychological theories could be widened to include a great deal more material than I have covered in this brief and selective review. Darwinian evolution suggested that there must
be a gradual development of psychological capacities, from species to species, and this gave a boost to all studies of animal behaviour. But the learning theories developed from the results of Pavlov’s experiments on conditioned reflexes and Thorndike’s experiments on trial-and-error learning are notable because of the search for general laws of learning. First Watson, and then Skinner, emphasized overt and directly measurable behaviour, learned as habits and reflexes. Skinner has followed Thorndike in pointing to external reinforcements consisting of reward and punishment as the primary causes of trial-and-error learning, or operant conditioning, while Hull maintained for some time that internal drive reduction was the reinforcement for trial-and-error learning and Pavlov’s classical conditioning alike.
Much current work follows Tolman in deducing that learning can occur without any reward and indeed without any immediate effects on behaviour. These are all differences between learning theories, but they have much in common, and some of the common factors will be examined in the next chapter.