4 Goals, rewards and reinforcements

Learning theorists have always had difficulty in agreeing among themselves whether there is just one principle of learning—perhaps the principle of association, exemplified in classical conditioning—or whether there might be at least two principles, the other usually being the law of effect, as propounded by Thorndike. That there might be an infinite number of principles, varying according to species, experimental apparatus and season of the year, is a nightmare that is occasionally encountered, and put aside with a shudder. Some success is still being achieved by those who stick by a single principle of association, as they are willing to apply the same principle to many different kinds of event. Thus it is possible to claim that Pavlov’s dogs learned an association between the buzzer and getting food, while Thorndike’s cats, by the same principle of contiguity, learned an association between the response of pawing a latch and the welcome experience of getting out of their box (Mackintosh, 1974, p. 139; Dickinson, 1980). Others believe that there are fundamental differences between Pavlov’s dogs and Thorndike’s cats; for
instance that salivation is involuntary but pawing voluntary (Skinner, 1938), and yet others suggest that each species is a law unto itself (Seligman, 1970; Hinde and Stevenson-Hinde, 1973).

The theoretical ramifications here are difficult, complex and obscure, but in purely practical terms it is possible to make a number of simple distinctions. First of all there is a difference between involuntarily elicited emotional states, or immediate physical reactions, and goal-oriented actions. There is a difference between having an emotion, motive or drive and doing something about it. It is relatively difficult to decide to salivate, or to be angry, or to be anxious, and even more difficult to decide not to salivate or to be angry or anxious, once these reactions have begun. We are usually freer to do what we will than we are to decide what it is that we will. This distinction does not always hold up, especially in animal experiments, but there is a more physiological analogy — the difference between autonomic reactions such as salivation and skin resistance, and skeletal motor reactions such as walking about and making object-directed movements of the front paws. Descriptively, there is a world of difference between making the same old reflexive responses, such as salivation or leg flexion, to new stimuli, and learning new response skills, such as carrying a marble about the cage if you are a rat, balancing a ball on your nose if you are a performing seal, and tying your own shoes if you are a backward child.

Thus there is a case for treating separately the development of new response skills under the influence of reward and punishment, or in the course of striving to achieve more intangible goals, and the shifting about of more reflexive responses and emotional reactions due to Pavlovian associations between stimuli. In learning theory the former are referred to as instrumental learning, or operant conditioning, as opposed to classical conditioning. There is also a case for treating separately many other kinds of learning from experience, such as perceptual learning, the formation of cognitive maps of the environment and the acquisition of knowledge by imitation and by all the resources of human culture — but this takes us even further afield, and I will return to these questions in later chapters.


Stamping-in responses versus forming expectancies

Having assumed that Thorndike’s cats were learning in a different way from Pavlov’s dogs, can we now say exactly what it was that the cats were learning? Thorndike’s law of effect says that the effects or results of the cats’ responses, in pulling strings or pressing latches which opened the door to their problem box, gradually stamped in a connection between the stimuli of being in the box and the movements required to pull the strings and press the latches. Descriptively, this is a law of response consequences, which tells us that the probability of an action is determined by its previous outcomes. And this is a pretty good rule of thumb, especially if ‘consequences’ are interpreted imaginatively. In Skinner’s hands, rain dances are performed because they have occasionally been followed by rain, work is done because of consequent payments or social credit, and left undone because of the counterattractions of more rewarding alternatives. But the explanation, in terms of responses being stamped in so that they are automatically elicited by stimuli, has only limited support. Thorndike claimed that the stamping-in was obvious because the cats only learned gradually to make the correct response. This is a false claim for two reasons. First, gradualness does not necessarily imply stamping-in — it could mean, for instance, that there was a gradual change in the cats’ ideas — and, second, the cats did not learn gradually anyway (most of them learned very quickly). Thorndike made great play with one animal that learned relatively slowly, in a fairly easy problem box, which could be solved by pulling a loop of string in front of the door. The complete set of latencies of getting out of the box for this animal, in seconds, was 160, 30, 90, 60, 15, 28, 20, 30, 22, 11, 15, 20, 12, 10, 14, 8, 8, 6, 6, 7. This certainly shows some gradual improvement, but note the big difference between the first and second trials. And the average for all twelve cats tested in this problem box shows a fairly dramatic change between the first two trials and the third.


Figure 3 Rapid learning in Thorndike's experiments.
The average time taken by twelve cats to get out of Thorndike's problem box A. Individual cats showed even more striking improvements in speed of making the releasing response from one trial to the next, but some animals made their biggest improvement after the first trial, some after the second and some after the third. (After Thorndike, 1898)

As Thorndike himself said, it often happened that if a cat was paying attention to what it was doing when the release device worked, ‘a single experience stamps the association in so completely that ever after the act is done at once’ (Thorndike, 1898, p. 27). If there was an association formed, it was formed very quickly, and modern evidence would suggest that the important thing
learned by the cats was the association between the escape response and their escape. So when they performed the same response again soon afterwards it was because they expected to get out (Mackintosh, 1974; Dickinson, 1980).

Thus, for cats, we should not assume that rewards work only by stamping in responses; they also work by establishing certain expectancies of reward in the animals, linked with the response it is necessary to perform to get the reward. But, and this is a big but, response consequences do not always work by building up an animal’s (or a person’s) expectancies. There are no experiments yet which show sea slugs pulling strings to get out of marine problem boxes, but there are experiments in which octopuses pull levers to get bits of fish. Operant conditioning is sometimes claimed for a wide variety of invertebrates and, with sufficient dedication on the part of experimenters, decorticated rabbits can be laboriously trained to press levers in Skinner boxes (Oakley, 1979).


Human subjects may by the use of various subterfuges be seen to change the frequency of simple verbalizations like ‘Mmm’, ‘Yes’ and ‘Go ahead’ because these responses have changed social consequences (Rosenfeld and Baer, 1969), while professing no knowledge of either change. One could always appeal to unconscious expectancies and strategies in the learning of human skills and habits, but this still leaves decorticated rabbits, to say nothing of decapitated cockroaches (Horridge, 1962). Oakley (1979) suggests that some forms of response-learning are sub-cortical, which is certainly true, but Dickinson (1980) has put forward a more comprehensive scheme to differentiate between various results of reward training which may be observed in laboratory animals (see also Adams and Dickinson, 1981a). This is based on the distinction between actions and habits, or between ‘declarative representations’, such as ‘pulling the string opens the door’, and ‘procedural representations’, such as ‘when in the box, pull the string’. Many lengthy experiments with rats have suggested that trained rats operate according to rules which correspond closely to Thorndike’s and others’ concepts of stamped-in stimulus-response habits. However, under the right conditions, there are indications that laboratory rats are capable of working according to rules such as ‘if you want a sugar pellet, press the lever’. The technique is to train the animals fairly quickly to press a lever to obtain sugar pellets, and then afterwards to give the animals a pile of sugar pellets in a separate cage, with an injection of lithium chloride, which makes them ill. This is well known as a taste-aversion procedure, which is a reliable way of putting animals off foods which they have formerly liked (Garcia et al., 1977, chapter 5). The rats had learned to press a lever to get sugar pellets, but the assumption now was that they would no longer want sugar pellets. In some cases, especially after long previous training, rats react stupidly and automatically, in that they go on pressing the lever even though they would never consume the rewards. In other experiments, however, it is quite clear that the ‘devaluation of the reinforcer’ by poisoning suppresses the tendency to press the lever (with appropriate controls for any side effects). Thus the conclusion is that the animals were using a declarative rule that ‘pressing the lever brings sugar pellets’, and when they no longer wanted sugar pellets, they no longer pressed the lever (Adams and Dickinson, 1981b). Responses in rats rewarded by food may either
stay goal-oriented, in that changes in the value of the goal very rapidly change the responses, or they may become automatic habits and insensitive to changes in the goals themselves.
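
The logic of the devaluation experiment can be made concrete with a small illustration. The sketch below is not a model from the literature: the two agent classes and the outcome-value table are invented here purely to contrast the ‘procedural’ and ‘declarative’ modes of control just described.

```python
# Illustrative contrast between a stamped-in habit and a goal-directed
# action, in the spirit of Dickinson (1980). All names are invented.

class HabitAgent:
    """Procedural rule: 'when in the box, press the lever'.
    The response is triggered by the situation alone."""
    def act(self, situation, outcome_values):
        if situation == "lever available":
            return "press lever"      # fires regardless of the outcome's current value
        return "do nothing"

class GoalDirectedAgent:
    """Declarative rule: 'pressing the lever brings sugar pellets',
    combined with the current value of sugar pellets."""
    def act(self, situation, outcome_values):
        if situation == "lever available" and outcome_values["sugar pellet"] > 0:
            return "press lever"      # responds only while the outcome is still wanted
        return "do nothing"

values = {"sugar pellet": 1.0}        # before devaluation
for agent in (HabitAgent(), GoalDirectedAgent()):
    print(type(agent).__name__, "->", agent.act("lever available", values))

values["sugar pellet"] = -1.0         # after pairing the pellets with lithium chloride
for agent in (HabitAgent(), GoalDirectedAgent()):
    print(type(agent).__name__, "->", agent.act("lever available", values))
```

After the simulated devaluation only the goal-directed agent stops responding, the pattern Adams and Dickinson reported in briefly trained rats; the habit agent reproduces the over-trained rats that press on regardless.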

The general point is that responses changed by rewards may change for a variety of reasons. It is not enough to know that rewards are effective — we also need to know why they are effective.

Autoshaping and reflexive responses to reward

A very special case of not knowing why rewards work occurred with pigeons rewarded for pecking at keys in Skinner boxes. This set-up was used by Ferster and Skinner (1957) to establish the effects of schedules of reinforcement (see p. 53), on the assumption that presenting a pigeon with food after it had pecked at a key could be the model for all other forms of positive reinforcement, that is, attractive rewards given for any response. Although some of the effects of schedules of reinforcement are very general across species, not all are, and it has turned out that the relation between a pigeon’s key-pecking and subsequent presentations of grain is very unusual (Terrace, 1981). Pigeons will peck keys in spite of the rewarding properties of food, rather than because of them.

The first finding was that all that is needed to get pigeons to peck the key is to light up the key for a few seconds before each food presentation, following the same sort of rule as is used in classical conditioning (Brown and Jenkins, 1968). Nothing else is needed, for this appears by itself to make the bird spontaneously start pecking at the lighted key, and the effect is called autoshaping. Thereafter, of course, various schedules can be set up in which the pigeon must peck the key in order to get further rewards. But one can also set up the opposite rule, so that if the pigeon does peck at the lighted key food will not be presented on that trial, but if the pigeon is content to wait without pecking the key then food is presented to it for a few seconds at the end of every lighted-key period (which occurs about once a minute on average). This is called an ‘omission procedure’, since food is omitted when the animal responds. In terms of achieving goals, the best strategy for the birds would be to refrain from pecking at all, but almost all pigeons carry on pecking the key on well over 50 per cent of the trials, thus losing more than half the food which
they might obtain by waiting patiently (Williams and Williams, 1969). The usual suggestion is that the pecking is an involuntary and emotional reflexive response, produced by the classical conditioning which results from pairing the key light with the food. This idea is supported by the fact that when, as is usually the case, food is being used, pigeons peck the key with an open beak as if they were trying to eat it, but if water is used instead of food pigeons press the key with a closed beak, as if they were trying to drink from it. It is a fairly safe generalization that in any given experiment on animal learning, evidence can be found for both goal-directed instrumental effects and more reflexive Pavlovian conditioned associations (Mackintosh, 1974, p. 139). Pigeons are certainly not indifferent to the rewarding effects of food and peck far more, according to the various schedules (see pp. 53-5), if pecking produces the goal of food, rather than preventing it.
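
The omission contingency itself is easy to state as a rule. Here is a minimal sketch, with an invented pecking probability standing in for the bird's conditioned tendency to respond; it shows only how persistent pecking costs the bird food.

```python
import random

def omission_trial(prob_peck=0.6):
    """One lighted-key trial under the omission procedure: food arrives
    at the end of the trial only if the pigeon did NOT peck the key.
    prob_peck is an invented figure for the conditioned peck tendency."""
    pecked = random.random() < prob_peck
    food_delivered = not pecked       # pecking cancels the food on that trial
    return pecked, food_delivered

trials = [omission_trial() for _ in range(100)]
fed = sum(food for _, food in trials)
print(f"food earned on {fed} of 100 trials")  # a persistent pecker forfeits most food
```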

Also, the lesson to be learned from autoshaping is that pecking for pigeons is exceptional, not that all effects of reward are due to classical conditioning. Since pigeons spend most of their lives pecking things, it is not surprising that this response tends to occur whenever food is in the offing. Other birds, such as crows, do not autoshape in the same way, and indeed most other species, including crows and rats, tend to respond to signals for food by searching for the food, rather than investigating the signal (Lea, 1984). Two experiments in which social behaviour appears demonstrate that the effects of stimulus pairings in reward procedures may be complicated, and certainly reveal large species differences. In Jenkins et al.’s procedure (1978) dogs were free to move about in a room in which food was delivered about a metre from a loudspeaker and lamp signal source, which was activated for ten seconds before each food delivery. The dogs differed in their responses, but after some early sniffing at the audio-visual signal a typical pattern was to approach the signal, point at it and playfully bark at it with much wagging of the tail. If the reader thinks it silly of the dogs to have directed such natural social behaviour at an arbitrary signal, consider the swearing, physical abuse or verbal coaxing which people direct at similarly inanimate mechanical objects.

In Timberlake and Grant’s experiment (1975), the stimulus presented to rats as a signal that food was about to be delivered was another rat, dropped into the cage. By strict analogy with the
pigeon-eating-the-key theory of autoshaping, this should have led to cannibalism, but the result of the ‘other rat signals food’ association was to increase the amount of friendly social reaction to the new animal, in the accepted rat form of sniffing, pawing and social grooming. The overall message is thus one of species differences, and the integration of reward procedures with the natural behaviour of any particular species. Innate patterns of behaviour have a tendency to assert themselves, and the most successful training methods make use of a species’ natural abilities.

Schedules of reinforcement

Given that there are enormous species differences in learning ability, and given that for any species we may need to select its most natural response, there are certain regularities in the relation between the response and experimental schedules of rewards, or reinforcers, which apply across most higher vertebrate species tested (mammals and birds), give or take a certain amount of quantitative variation. The most profound and reliable regularity is also to some extent counter-intuitive and is therefore all the more worthy of note. This is that more is worse, if one is using rewards to generate the maximum amount of behaviour. That needs some qualification: it is rewarding more often that is worse, rather than big rewards being worse than small ones, and rewarding more often is worse at the end of training than at the beginning. When a learned response is consistently rewarded, absolutely every time it occurs, then if rewards are suddenly stopped it will be almost as if they had never been given; but when rewards are doled out inconsistently and unreliably, then the behaviour so meagrely rewarded is likely to persist through many further trials and tribulations.

There may be natural forces at work in this, since most species have to be accustomed to undertake periods of foraging, chasing or waiting sustained by only the occasional achievement of natural goals, but the study of the ‘partial reinforcement effect’ in the laboratory allows for discussion of some of the results in terms of experimental variables. The strengthening effects of intermittent reinforcement are apparent in many circumstances, and are said to be readily visible in human gambling. They have been studied extensively in rats running down alleys or through mazes
to find food at the goal only if they are in luck on a random experimental sequence. However, they are perhaps most obvious of all in the typical Skinner box experiment. In this the animal in the box makes a single response repeatedly and rewards are occasionally made available by means of a mechanical dispenser of some sort. It is thus possible to institute several simple limiting conditions relating the response to the reward, and many less obvious ones (Ferster and Skinner, 1957). A convenient visual display of the rate at which responses are repeated is obtained by plotting responses cumulatively, as steps up a vertical axis, against the time when they are made, along the horizontal axis. Skinner termed this a ‘cumulative record’ and it allows quick inspection, although more accurate quantitative methods are used nowadays for experimental work. Typical schedule performance, with subsequent patterns of responses in extinction, for five basic schedules of reinforcement is shown in figure 4, and these schedules are described below.


Figure 4 Cumulative records of reinforced responding and extinction with basic schedules of reinforcement.

1 Continuous reinforcement (CRF): with every response rewarded, the cumulative record is a straight line of shallow angle, which quickly flattens out completely when rewards are discontinued in extinction.

2 Fixed-interval schedules: training on interval schedules is relatively easy for both trainer and trainee, since only one response is necessary for reward. The interval specified is a minimum, since the rule is that a certain period must elapse after a reward before the next one is obtained, but the next reward is not given until the animal makes its first response after this period is over. On fixed intervals the period is the same every time, say one minute, and the expected result (for a rat or a pigeon after several hours of experience) is that an animal somehow uses a biological clock in order to vary its response rate, responding faster and faster as time passes since the last reinforcement and the next reward becomes due. Even in the absence of rewards, in extinction, this patterning of behaviour is sustained, and gradual accelerations of response rate can be observed. It has been shown that the passage of time, rather than the chaining together of responses, is the important variable, and the relation of response output to internal timing processes continues to be a topic for investigation (Roberts and Church, 1978).

3 Variable-interval schedules: if the specified minimum interval between any two rewards is varied at random, the time at which a
reward can be obtained must be unpredictable, and a steady and fast rate of response is eventually obtained, indicating either that the animal expects reward at any time or that consistent and sustained patterns of responding have become habitual. There is still a slight tendency for response rate to increase as time passes since the last reinforcement. When all reinforcements are discontinued, response rate declines very slowly and gradually.

4 Fixed-ratio schedules: care is necessary for training on fixed-ratio schedules. The rule is that the animal must make a certain fixed number of responses for each reward. An animal first reinforced for every response, and then put immediately on the task of making a hundred responses per reward (FR100) would almost certainly give up. However, suppose that the task requirement is increased gradually, or that, as in Skinner’s original experiment (1938), the animals are first trained on an interval schedule (which allows for backsliding because rewards are always available at some point for just one response). In this case ratios of tens or hundreds may be undertaken by rats and pigeons, and a ratio of
120,000 responses per (large) reward has been reported for a chimpanzee (Findley and Brady, 1965). The organization of responses is very obvious when an animal has learned a fixed-ratio schedule, since it pauses completely after each reinforcement, and then reels off the required fixed number of responses very rapidly. It is clear from the persistence of such step-wise patterns in extinction, and from more detailed experiments, that the grouping of responses approximates the number required, and in some cases there are indications that an animal comes to expect reward when this fixed number has been completed (Adams and Walker, 1972).

5 Variable-ratio schedules: these are the closest to gambling schedules, since responses must be made in order to obtain reward, but each response has a low probability of a successful outcome. The faster the animal responds, the more often it will be rewarded, and thus very high response rates may be obtained. However, if, at random, rewards become very scarce indeed, animals may show signs of ‘ratio strain’ and respond only in bursts (Ferster and Skinner, 1957).
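
The five rules can be summarized in code. The sketch below is illustrative only: the class names and parameter values are invented, continuous reinforcement appears as the special case FR 1, and the closing loop shows how a cumulative record (responses so far against time) might be collected for a simulated session.

```python
import random

class FixedRatio:
    """FR n: reinforcement is delivered after every nth response.
    Continuous reinforcement (CRF) is simply FR 1."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def response(self, t):            # t is ignored: ratio schedules count responses
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """VR n: each response is reinforced with probability 1/n,
    so on average every nth response pays off."""
    def __init__(self, n):
        self.n = n
    def response(self, t):
        return random.random() < 1.0 / self.n

class FixedInterval:
    """FI t: the first response made at least t seconds after the
    previous reinforcement is reinforced."""
    def __init__(self, interval):
        self.interval, self.last = interval, 0.0
    def response(self, t):
        if t - self.last >= self.interval:
            self.last = t
            return True
        return False

class VariableInterval(FixedInterval):
    """VI t: as FI, but each new interval is drawn at random
    (here exponentially) around the mean."""
    def __init__(self, mean):
        super().__init__(random.expovariate(1.0 / mean))
        self.mean = mean
    def response(self, t):
        if t - self.last >= self.interval:
            self.last = t
            self.interval = random.expovariate(1.0 / self.mean)
            return True
        return False

# One response per second for ten minutes; count reinforcers and
# keep the cumulative record (responses so far against time).
schedule = VariableInterval(mean=60.0)
cumulative, reinforcers = [], 0
for second in range(600):
    if schedule.response(float(second)):
        reinforcers += 1
    cumulative.append(second + 1)
print("reinforcers earned:", reinforcers)
```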

Biofeedback and response skills

It is often possible in everyday life to distinguish between the motives for attempting a task and the factors which allow for mastery of it. In many competitive sports, the rewards for engaging in them may have to do with the excitement of competition, the joys of winning, love of the game itself or its social fringe benefits, whereas the development of the skills necessary for taking part may involve many hours of tedious and lonely practice. Rewards obtained by exercising a skill, although useful in inducing further efforts, may not make any difference to proficiency: no amount of celebration surrounding the achievement by a golfer of a hole in one is likely to improve his swing, although it may encourage him to spend more time playing. On the other hand the swing may be improved by certain prompts and advice, such as suggestions about changes of grip, keeping the head down and so on, which are not much fun in themselves.

However, a law of response consequences may often be applied to skills. Feedback, both internal and external, which supplies information about what is actually being done is usually necessary
to provide structure and organization in response output. Moreover, knowledge of results in some form or other is logically essential in the later stages — it would be impossible for a golfer to get very far if he never ever knew what happened to the balls he hit. Even for the golfer, it is possible to distinguish between two aspects of response consequences, the motivational and the informational, however much these are intertwined in practice. Information that you are doing something the wrong way may be just as useful as the information that you are getting it right — in both cases being right or wrong can have motivational effects. However, small-scale detailed information, such as that gained internally by proprioception, or that which improves split-second timing when you succeed in keeping your eye on the ball, has beneficial effects on performance which are out of all proportion to its influence on motivation, which may be negligible.

A special case of response consequences occurs in the therapeutic technique called ‘biofeedback’. Nature has not seen fit to provide us with a great deal of detailed feedback about the state of our internal organs or the volume of our glandular secretions. We are, thankfully, not usually aware of how much we are salivating, how fast our heart is beating, what our blood pressure is, or what is happening to our skin-conductance response. These are all controlled by the autonomic part of the nervous system, and this does not have as much by way of internal and unconscious neural control circuits as does the skeletal system for balance and movement. It is partly because of this lack of internal feedback that it is extremely difficult to learn to voluntarily control autonomic responses such as heart rate. After years of dedication and self-denial yogis in India can do it; and it is easy enough to raise one’s heart rate by running up and down the stairs, but this is not the same thing as direct voluntary control of heart rate, in the same way that we have direct voluntary control of whether to run upstairs or not.

A short cut to a measure of increased self-control over bodily states of the yogi type is to provide external feedback by electronic means. Subjects, or patients, can be provided with an egg-like object to hold in their hand, which emits a tone of increasing pitch when skin resistance is lowered by anxiously sweating palms. Lights can be made to flash if blood pressure gets too high, or for experimental purposes a speedometer needle can be put in front
of patients which gives an exact read-out of their moment-to-moment heart rate. With these aids, people have a much better chance of learning to keep their skin resistance, blood pressure, or heart rate within prescribed limits, although biofeedback of this kind is by no means a universal panacea for disordered autonomic activity (Yates, 1980). How biofeedback helps is a difficult and complicated question. Both mental and physical strategies are involved, since autonomic responsiveness can vary according to voluntary breathing patterns, posture, and degree of muscle relaxation; and according to calming or exciting trains of thought and spoken or unspoken verbal rituals.

It is almost certainly not true that biofeedback works by directly reinforcing autonomic responses, according to the principles of reward and punishment and instrumental learning. Of course motivation is important, and so is practice. People have to make efforts to ‘keep the needle on the right’ or ‘make the tone stay low’, but to do this they can learn cognitive and muscular intervening states. And so can animals. It is still sometimes claimed that direct reward and punishment of autonomic responses was demonstrated by certain experiments reported some time ago by Neal Miller and Leo DiCara on rats whose entire skeletal musculature had been paralysed by the drug d-tubocurarine (e.g. DiCara and Miller, 1968; Miller and DiCara, 1967). All such effects are either fraudulent or artifactual. Undrugged rats respond very well to biofeedback procedures, and experience prior to paralysis may reveal itself in the drugged state. But although Miller himself is a scientist of undoubted integrity and repute, the early experiments on curarized rats can all be dismissed. There are many technical difficulties associated with continuous adjustments to the artificial respiration of curarized animals. No one else has been able to repeat the original results despite enormous efforts in other laboratories, DiCara for one reason or another committed suicide, and Miller now says ‘it is prudent not to rely on any of the experiments on curarized animals for evidence on the instrumental learning of visceral responses’ (1978, p. 376).

Cognitive processes in animal learning

There have been two main developments in the study of animal learning over the past decade. One is the realization that some
responses are learned more easily than others. Not only are whole body movements and object manipulations learned more easily under the influence of external rewards and punishments than the complex internal balances of the internal organs, but for each species certain kinds of natural action are more readily associated with biologically appropriate goals and deterrents (Lea, 1984; Walker, 1975). The second trend is that far greater emphasis than formerly has been placed on inner cognitive processes as important determinants of those learned actions (Mackintosh, 1974; Hulse et al., 1978; Dickinson, 1980; Roitblat et al., 1983; Walker, 1983). Much of the evidence which has supported this trend has come from experiments in which reward procedures are subtly modified in ways which reveal the importance of internal descriptions of outer events, expectancies for the immediate future and memories of the immediate past to a greater extent than was apparent in the initial experiments of Thorndike and Skinner. Detailed treatment of this topic can be found elsewhere, but the experimental techniques can be summarized in terms of spatial learning, perception and memory.

Spatial learning — mazes and maps

Tolman’s interpretation of his maze experiments — that they indicate that rats form cognitive maps of their environment independently of imposed rewards and punishment — has been broadly supported (O’Keefe and Nadel, 1978). A new technique which produces easily replicated results is the radial maze (Olton, 1979). In this, eight arms are laid out like the spokes of a cartwheel, but with no outer perimeter. A single food pellet is placed at the end of each arm, and in the basic experiment a rat is simply put in the middle of the maze and allowed to go out and back along each spoke, retrieving the food pellets. The experimental result is that it almost never retraces its steps. Thorndike’s stamping-in principle would suggest that the animal would have to repeat each rewarded response or respond at random. Searching by smell is not the answer: crude controls initially included drenching the whole maze in aftershave, which the rats found unpleasant but not confusing. More sophisticated controls (including confining the rat at the central axis while the arms were rotated, or already tried arms switched with untried ones) all
indicate that rats make use of visual landmarks to distinguish where they have already been from where food pellets remain to be found. Animals must have both some sort of mental map of their world and also something analogous to a working memory, which records what they have recently been doing in that world.
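
The force of the radial-maze result can be seen in a toy model: all that is needed is a 'map' of eight arms and a working memory of the arms already visited. Everything below is invented for illustration, including the contrast with a memoryless 'stamping-in' responder, which needs roughly twenty-two choices on average to clear eight arms where the rat needs only eight.

```python
import random

def radial_maze_run(arms=8):
    """An idealized rat with perfect working memory for visited arms:
    it chooses only among arms not yet tried, so all eight pellets
    are retrieved in eight choices, with no retracing."""
    unvisited, order = set(range(arms)), []
    while unvisited:
        arm = random.choice(sorted(unvisited))  # landmark-guided choice, idealized
        order.append(arm)
        unvisited.remove(arm)
    return order

def stamped_in_run(arms=8):
    """A memoryless responder: it samples arms at random with
    replacement until every pellet has been found."""
    found, choices = set(), 0
    while len(found) < arms:
        found.add(random.randrange(arms))
        choices += 1
    return choices

print(radial_maze_run())                                   # 8 choices, no repeats
print(sum(stamped_in_run() for _ in range(1000)) / 1000)   # about 22 on average
```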

Perception and pattern recognition

Countless experiments confirm the intuition that higher animals as well as people must have some way of picking out transiently relevant details from the blooming and buzzing confusion which would surround them if all sensory channels remained permanently open. This applies within sensory modalities, as well as between them. If tones predict rewards, then pigeons will pay more attention to tones than they would otherwise. But equally, if shapes predict rewards one minute, and colours the next, then both pigeons and monkeys learn to select shapes irrespective of colour, or colour irrespective of shape, as appropriate.

Slightly more surprising is the finding that pigeons can somehow learn to distinguish As from 2s, irrespective of which of dozens of different typefaces they might suddenly be presented with (Morgan et al., 1976). Pigeons also learn to distinguish photographs of one person (in any of 100 poses) from photographs of other individuals in the same range of surroundings and activities; holiday slides which contain people from those which do not; slides with trees from slides without trees; silhouettes of oak leaves from silhouettes of leaves from any other species (Herrnstein et al., 1976; Cerella, 1979). All of this goes to show that stimulus learning is separate from response learning, and that recognizing familiar objects and individuals, which is an imperative fact of life for most species, requires the building-up, as a result of learning from experience, of a vast array of elaborate internal descriptions (Walker, 1983). Whenever rewards are given, this will not merely motivate and inform rewarded responses, it will also affect the attention and perception of the rewarded subject.

Memory and expectancies

Clearly, spatial and perceptual learning, as well as the learning of precise and orderly response skills, requires the retention of
information in some complicated forms. The way in which retained information controls current actions is a very tricky subject, and in most cases it can be assumed that animal behaviour is impelled by needs of the moment, rather than constructed from the recall of past history and anticipatory speculation. However, Mackintosh (1974) and Adams and Dickinson (1981b) suggest that there are instances where past experience appears to result in the formation of sensible expectancies and guesses by laboratory rats. Also, specialized experimental techniques have revealed that detailed information about previous episodes of experience sometimes determines choices made by laboratory animals (Medin et al., 1976; Roitblat et al., 1983).

Weismann et al. (1980) conclude from a careful study in which different orders of coloured lights were presented to pigeons that ‘animals have more than a succession of feelings — they have a feeling of succession’. This is one way in which memorial processes can be expected to be utilized by many animal species: the continuity and ordering of perceptions over time, and the ordering and sequencing of behavioural output, are extremely general requirements. However, this would not imply much by way of remembered knowledge of events. Experiments on monkeys and apes suggest that our closest relatives have the closest approximations to our own memories, even if lacking in the ability to put memories in a verbal code. Without such verbal coding there is no doubt even more of a memorial vacuum than the one we would experience if we were prevented from ever consulting our books, notes, photocopies, file cards and minutes.

The visual memory of pig-tailed macaque monkeys for pictures of coffee mugs, shoes, spectacles, screw-drivers and other human artefacts is about 80 per cent reliable over a period of two days (Overman and Doty, 1980). This is not as good as human subjects given the equivalent test, since they were 100 per cent correct, but the pig-tailed macaques neither know names for these objects nor do they wear shoes or spectacles or drink from coffee mugs, and they understandably pay far less attention to the pictures. The monkeys only paid attention at all because they were allowed to drink orange juice from a tube when they touched the correct pictures.

It is no easy matter to interpret the results obtained with tests of animal memory, but the results seem to imply recognition processes
which decay with time and which occur in some way or other in all laboratory birds and mammals. Such memory processes may have many natural functions — although squirrels may often forget where they store their nuts, they presumably remember at least some of the time. Birds which store nuts or seeds often seem to make few mistakes in finding them, and the marsh tit has performed well on this task in a rigorous experimental test (Shettleworth and Krebs, 1982). However, memory is clearly most humanlike in monkeys and apes. On such tasks as sorting pictures into different object categories, or sorting nuts, bolts and washers into different piles according to size and type, young chimpanzees may perform as well as human 3- or 4-year-olds (Hayes and Nissen, 1971). Baboons (Beritoff, 1971) and chimpanzees (Menzel, 1978) remember where they have seen food being hidden for minutes and hours, and they remember for weeks and months exactly where in a particular room they themselves once found food, never having been back to that room in the intervening period.

The many attempts to train chimpanzees to communicate by gestural sign language, or by manipulating visual symbols for objects and actions, have not demonstrated any ability on their part to acquire the syntax and grammar which is characteristic of human speech. It is, however, possible to interpret the results of these attempts as further support for the proposition that simian intelligence includes a large measure of cognitive understanding by monkeys and apes of their experience of the world about them, and that therefore a theory of how behaviour is affected by experience cannot afford to leave out the effect of rewards on memory of past events and expectations of future ones (Shettleworth and Krebs, 1982; Walker, 1983).

Conclusions

It should go without saying that the application of learning theory to human behaviour will have to be far more circumspect today than it was for Thorndike and Skinner. For the theory of learning by reward, in order to account for experimental evidence obtained from both rats and chimpanzees, has had to develop from Thorndike’s concept of stamped-in stimulus-response connections to the consideration of perceptual categorization, the organization of memory, and expectancies which are derived from if-then
declarative representations of the relation between responses and reinforcements. But we should be careful not to forget that, in practical terms, experimental results stand apart from theory. There is little in subsequent work to deny the empirical generalization which impressed itself on the earlier learning theorists that, however it may come about, the effect of reward procedures on future behaviour is profound, dramatic and pervasive.
