Reinforcement and extinction: scarcity and absence of reinforcers



'If at first you don't succeed, try, try and try again.' It is often necessary to make several attempts before achieving a goal. However generously we define a response, it is always possible to find responses which have to be repeated without result. Several rabbits might have to be chased for each one that is caught, several shots might be needed to get a golf ball in the hole, and several shops may need to be searched before we find the pair of shoes we like. Searching and looking are the most obvious kinds of behaviour where persistence is intermittently rewarded, but the usual example of gambling, where rewards may be not only rare but apparently inadequate, suggests that very persistent behaviour may be maintained by rewards that are extremely infrequent. Subsidiary or compensatory rewards may help fill the gap. Dogs obviously enjoy chasing rabbits even if they don't catch them (see Ch. 7), individual golf shots are rewarded by getting closer to the hole, and so on. Many goals occur so infrequently that goal-directed behaviour has to be reinforced by the achievement of sub-goals and ancillary benefits. If, for instance, you decide to sail around the world single-handed, then the preparations have to be either enjoyable in themselves or pursued because they bring the final goal nearer. Many of the sub-goals and subsidiary rewards can be discussed as secondary reinforcers, or as parts of response chains that are systematically related to the major or primary reinforcers involved (Ch. 7).

However, the most dramatic discovery of research in operant conditioning is that the behavioural effects of individual reinforcers can grow with extended training, so that the same reinforcer which originally generated one response comes to command literally thousands of responses of the same kind. A chimpanzee which starts by hesitantly pushing a button once for each food reward may, after long training, get through 4,000 pushes for each reward. But what is the limit to this process? Does it mean that four hundred thousand, or four million, responses can be learned just as easily, or that eventually the rewards won't matter at all? That can't be true, because lack of reinforcement must eventually bring about the extinction, or disappearance, of the reinforced response, according to all the textbooks. There is obviously a paradox here, if lack of reinforcers can sometimes produce thousands of responses, and at other times produce the dwindling away of the response called extinction. In fact in the laboratory we can tell fairly well when cutting out reinforcers will extend the range of responses and when it will cause extinction. However, this still leaves us with another unexpected result, because extinction is slower after sparse and infrequent reinforcement than after rich and continuous reinforcement. This is called the partial reinforcement effect: one of the easiest phenomena to reproduce, but one of the most difficult to explain.

Apart from the partial reinforcement effect of greater response persistence found with less reinforcement, the interest in intermittent reinforcement schedules lies in the different patterns of behaviour produced by different schedules of intermittent reinforcement. These schedules were briefly described in Chapter 3.

Intermittent schedules of reinforcement

A convenient visual display representing performance on simple schedules of reinforcement can be obtained with a cumulative recorder. More elaborate recordings of the exact times at which various responses are made can be achieved by storing data on magnetic tape for later computer analysis, but cumulative records give a useful general impression of the patterns of responding. When the records are presented as in Figure 6.1 it is important to remember that a horizontal line means that nothing is happening, whatever the level of the line. Horizontal distance measures time, and vertical distance gives the total number of responses made. The slope of the record therefore corresponds to the rate of response, which is a valuable measure on basic schedules of reinforcement. In Figure 6.1 records for each of the four basic reinforcement schedules have been put together for comparison. Typical schedule performance, and extinction curves for each schedule, are shown. The performance curves represent samples of behaviour after considerable experience of the schedule. It usually takes many hours for animals to learn a particular schedule under conventional laboratory conditions, but after that behaviour should not vary much from day to day if the health and weight of the animal, and environmental factors such as temperature, remain constant.

Fig. 6.1 Cumulative records of reinforced responding and extinction with basic schedules of reinforcement (After Reynolds, 1968)
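The geometry of a cumulative record can be sketched numerically. The following illustration uses made-up response timestamps, not data from the figure, to show how the vertical count and the slope-as-rate reading work:

```python
# Hypothetical illustration of reading a cumulative record numerically.
# The vertical position at time t is the number of responses made so far;
# the slope over a time window is the local rate of response.

def cumulative_count(timestamps, t):
    """Number of responses made at or before time t (vertical position)."""
    return sum(1 for ts in timestamps if ts <= t)

def response_rate(timestamps, t_start, t_end):
    """Slope of the cumulative record over [t_start, t_end]."""
    made = cumulative_count(timestamps, t_end) - cumulative_count(timestamps, t_start)
    return made / (t_end - t_start)

# Example: ten responses, one per second, from t=1 to t=10.
stamps = [float(i) for i in range(1, 11)]
print(cumulative_count(stamps, 5.0))      # 5 responses by t=5
print(response_rate(stamps, 0.0, 10.0))   # 1.0 responses per second
```

A flat stretch of record gives a slope (rate) of zero, which is why a horizontal line always means that nothing is happening.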

Many species of animal, including man, have been tested on the schedules of reinforcement mentioned in Figure 6.1. When an appropriate combination of response and reinforcer is selected (see Ch. 7), the basic patterns of response, and the differences between the schedules, can be observed in all species tested. The standard training procedure is to allow for experience of continuous reinforcement of a response before intermittent reinforcement is introduced. Training on interval schedules of reinforcement is much easier than training on ratio schedules, because low response rates in the early stages of training do not prevent the subject from receiving reinforcement.

Fixed interval schedules. Eventual performance when fixed time intervals separate reinforcements (Fig. 6.1) shows the highest rates of response just before reinforcement becomes due. On average, rate of response increases as time passes since the last reinforcement, and this can often be seen on individual cumulative records. It has been shown that the passage of time, rather than chaining of responses, is the important stimulus.

Fixed ratio schedules. Care is necessary when training on fixed ratio schedules. A rat previously reinforced for every lever press would never learn to respond successfully if put immediately on to FR 100; it would probably give up responding before 100 responses had been made. A form of shaping can be used instead, increasing the size of the fixed ratio gradually, in steps within the capacity of the subject. Alternatively, a fixed ratio may be introduced after training on an interval schedule. The final performance on fixed ratio schedules has pronounced pauses after each reinforcement. Following the pause the animal reels off the required fixed number of responses very quickly. As with fixed interval schedules, there is evidence of anticipation of reward at the appropriate time once the schedule has been learned: detailed measurement has shown that the speed and force of responses increase towards the end of the run.

Variable interval schedules. If reinforcements are obtainable at unpredictable times, rate of response is much steadier than for fixed schedules (Fig. 6.1). However there is still an underlying tendency for response rate to speed up as time passes since the last reinforcement. This is related to the increasing probability of the animal's getting a reinforcement the longer it has been without one. The average rate of response depends on the average interval between reinforcements. If the average intervals are long so that reinforcements are given infrequently, rate of response is lower than if the average interval is short.

Variable ratio schedules. In these schedules an unpredictable number of responses is needed for each reinforcement. As with all ratio schedules, the sooner the subject makes the responses, the sooner he gets the reinforcement. Variable ratio schedules can therefore produce high rates of response without pausing after reinforcement, provided that very long ratios are not introduced too suddenly (Fig. 6.1).
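The four basic schedules differ only in the rule that decides whether a given response is reinforced. The sketch below simulates those rules under assumed parameters; the class names and the distributions chosen for the variable schedules are illustrative choices of mine, not procedures taken from the text:

```python
import random

# Illustrative rules for the four basic schedules. Interval schedules
# reinforce the first response after a time criterion has elapsed;
# ratio schedules reinforce after a count of responses.

class FixedInterval:
    def __init__(self, interval):
        self.interval = interval              # e.g. FI 10: criterion of 10 s
        self.last_reinforced = 0.0
    def respond(self, t):
        if t - self.last_reinforced >= self.interval:
            self.last_reinforced = t
            return True                       # this response is reinforced
        return False

class VariableInterval(FixedInterval):
    def __init__(self, mean_interval):
        super().__init__(random.uniform(0, 2 * mean_interval))
        self.mean = mean_interval
    def respond(self, t):
        reinforced = super().respond(t)
        if reinforced:                        # next criterion is unpredictable
            self.interval = random.uniform(0, 2 * self.mean)
        return reinforced

class FixedRatio:
    def __init__(self, ratio):
        self.ratio = ratio                    # e.g. FR 100: 100 responses per reward
        self.count = 0
    def respond(self, t):
        self.count += 1
        if self.count >= self.ratio:
            self.count = 0
            return True
        return False

class VariableRatio(FixedRatio):
    def __init__(self, mean_ratio):
        super().__init__(random.randint(1, 2 * mean_ratio - 1))
        self.mean = mean_ratio
    def respond(self, t):
        reinforced = super().respond(t)
        if reinforced:                        # next requirement is unpredictable
            self.ratio = random.randint(1, 2 * self.mean - 1)
        return reinforced

# One response per second for 60 seconds: FI 10 yields 6 reinforcers,
# while FR 20 yields only 3 for the same amount of work.
fi, fr = FixedInterval(10), FixedRatio(20)
print(sum(fi.respond(t) for t in range(1, 61)))   # 6
print(sum(fr.respond(t) for t in range(1, 61)))   # 3
```

The final comparison shows why ratio schedules alone reward faster responding: on FR the number of reinforcers depends directly on the number of responses, while on FI responding faster than once per criterion gains nothing.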

Multiple schedules. Two or more schedules can be learned by the same subject when a distinctive cue is used to signal which schedule is in effect. The chimpanzees shot into space in the early stages of the NASA space programme had to perform several tasks on a multiple schedule. While orbiting the earth they had, among other things, to work on a fixed ratio for food while a yellow light was on, make a separate response for water on a low-rate schedule if a green light was on, and make fast shock-avoidance responses on the water lever if a red light was on. Multiple schedules reflect real-life situations, in that different schedules operate in different situations.

Chain schedules. Responses can be chained together by a method like a multiple schedule, in which each response is performed in a particular stimulus situation. The only difference is that a complete chain of responses has to be completed for each reinforcement. A very simple chain would be a schedule requiring a pigeon to peck ten times on a left-hand button to turn on the light behind the right-hand button, and then peck the right-hand button on a variable ratio schedule to produce food.
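The pigeon chain just described can be sketched as a small state machine. The variable ratio requirement of one to nine right-hand pecks is an assumed illustration, not a value from the text:

```python
import random

# Sketch of the two-link pigeon chain: ten left pecks light the right
# button, then the right button pays off on a variable ratio (here
# assumed to require 1-9 pecks, averaging 5).

def run_chain(pecks):
    """pecks: sequence of 'L'/'R' button pecks. Returns food deliveries."""
    left_count, right_count, food = 0, 0, 0
    lit = False                               # right-hand light off initially
    requirement = random.randint(1, 9)        # current VR requirement
    for side in pecks:
        if not lit:
            if side == 'L':
                left_count += 1
                if left_count == 10:
                    lit = True                # right-hand light comes on
        elif side == 'R':
            right_count += 1
            if right_count >= requirement:
                food += 1
                # chain resets: light off, counts cleared, fresh requirement
                left_count, right_count, lit = 0, 0, False
                requirement = random.randint(1, 9)
    return food

print(run_chain(['L'] * 10 + ['R'] * 9))      # 1 food delivery
print(run_chain(['R'] * 50))                  # 0: right pecks alone do nothing
```

Note that pecks on the unlit right button are simply ineffective, which is what confines each response to its own stimulus situation.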

Concurrent schedules. If more than one schedule is operating at the same time, the subject has to make a continuous series of choices about which response to make next. For instance, a rat could be given a fixed interval schedule on the right-hand lever for food, and a fixed ratio schedule on the left-hand lever for water. In this kind of situation the two responses interfere with each other and neither response would be performed in the normal manner.

Comparison of operant with classical conditioning

The ease of establishing intermittent schedules of reinforcement in operant conditioning is in marked contrast to the weakening effects of intermittent reinforcement in Pavlovian procedures. If a stimulus sometimes signals food and sometimes doesn't, this simply means that it is an unreliable signal as far as classical conditioning goes. But if a response sometimes produces food and sometimes doesn't, it means that making more responses is a necessary strategy for obtaining food. With operant schedules of reinforcement, the activity of making numerous responses is rewarded. Although there are several opinions about the interpretation which should be given to disparities between operant and classical conditioning, it is agreed that intermittent reinforcement is an area of dispute. The distinction is between an almost passive absorbing of information in pure classical conditioning and the active organization of behaviour necessary in operant conditioning. This difference between learning about reinforcers and learning what to do to get reinforcers is amplified when reinforcers are relatively rare events.

Extinction after continuous and intermittent reinforcement

Why does a response die away when reinforcement ceases? The answer seems almost unnecessary - the response was there only because of the reinforcement, and when the reinforcement goes, so must the response. It is the supplementary questions which are difficult. Why do responses appear to continue indefinitely after some forms of negative reinforcement (p. 66), or after reinforcement by administration of narcotics to addicted animals? Why does the pattern of responding in extinction reflect the previous schedule of reinforcement? Why are meagrely-rewarded difficult tasks more persistently performed in extinction than easy tasks which have been richly rewarded?

The number of conflicting interpretations given for curious results in extinction procedures is greater than for other areas of learning, but there are two distinct themes of explanation. On the one hand, extinction performance may reflect emotional upheavals caused by the absence of rewards; on the other, it may reflect cognitive problems to do with 'finding-out' that rewards are no longer available.

Cognitive effects in extinction. The importance of 'finding-out' in extinction of the standard tasks of maze-running and bar-pressing has been demonstrated by 'latent' extinction, which occurs when animals are given experience of empty goal boxes or empty food magazines. This experience immediately reduces the vigour of responses previously rewarded from the now empty goal boxes or food magazines. Circumstances which help the subject 'notice the difference' in extinction, such as novel cues or a change in the apparatus, will speed up extinction, while difficulties in finding out about absence of reinforcement prolong extinction. The problems of extinguishing avoidance responses which take the subject out of a situation, and thus prevent contact with new information, have been mentioned already (p. 66). The perplexity of distinguishing between scarcity and absence of reinforcers must contribute to the continuance of responding after variable schedules of reinforcement (Fig. 6.1). Subjects experienced on a variable ratio schedule of reinforcement who are shifted to extinction are in the position of someone putting money into a fruit machine which has previously paid off about once every hundred goes, but has surreptitiously been fixed so that it doesn't pay off at all. Even allowing for perfect memory and ideal processing of new information, it would take some time to conclude that the machine had in fact been altered. It is not surprising, then, that if long sequences of non-rewarded responses have eventually been followed by reward during training, long sequences of non-rewarded responses are more likely to be performed in extinction.
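The fruit-machine argument can be made quantitative. Assuming, purely for illustration, that each go on the old schedule pays off independently with probability 1/100, the chance that a run of unrewarded goes arises by bad luck alone is easy to compute:

```python
# Rough quantification (assumed payoff probability, not from the text) of
# how slowly evidence accumulates that the machine has been altered.
# If each go pays off with probability 1/100, a run of n unrewarded goes
# occurs by chance alone with probability 0.99**n.

p = 1 / 100                      # assumed payoff probability per go
for n in (100, 300, 500):
    print(n, round((1 - p) ** n, 3))
# prints: 100 0.366, 300 0.049, 500 0.007
```

Even a hundred unrewarded goes are quite consistent with an unaltered machine (probability about 0.37); only after several hundred does the run become strong evidence that the schedule has changed, which is exactly why variable ratio training makes absence of reinforcement so hard to detect.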

Emotional effects in extinction. Perhaps the persistence of responding after negative reinforcement, or after reinforcement by narcotics, is due partly to the emotional intensity of the original learning. Not much work has been done on that problem, but there has been plenty of speculation about the influence of frustration and disappointment when reinforcements are discontinued. Monkeys, as well as children, may have tantrums if an accustomed reward is tampered with - there is little doubt that discontinuing rewards can produce emotional disturbance. Amsel (1972) has investigated frustration in rats running through simple mazes with two goal boxes: the rats run down an alley to get food in the first box, then run from the first to the second box to get more food. If food is left out of the first box, they reveal their frustration by running faster to the second box. Amsel's theory is that the same frustration normally suppresses responding in extinction. Animals which have experienced intermittent reinforcement will have learned to tolerate such frustration, and will therefore go on responding for long periods during extinction. This effect can be seen after intermittent or partial reinforcement has been given for running mazes, as well as with variable schedules of reinforcement. If Amsel is right, intermittent reinforcement teaches not only particular response strategies but also more general emotional reactions, so that an animal may learn to 'keep trying', and this will increase the persistence of responses in extinction (see p. 59). The existence of such general response classes might help to explain otherwise puzzling results where animals show more persistence after being trained on arduous tasks such as running up steep inclines.

It may be that experience of conditions where rewards are few and far between leads to persistence in extinction for both cognitive and emotional reasons. The absence of reward is less noticeable after scarcity, response sequences and strategies formed by schedules of reinforcement are not easily disturbed, and emotional resistance to the absence of reward may grow up as a by-product of intermittency of reinforcement.

Intermittent reinforcement and extinction in human behaviour

The idea of intermittent reinforcement is used more as an explanatory device than as a practical tool in applications of reinforcement theory to human behaviour. It is said that persistent behaviours which have very few apparent rewards may be sustained by occasional reinforcement. Examples are participation in games of chance and in superstitious rituals, where profit may be infrequent or accidental (Skinner, 1953). It may prove feasible to utilize similar schedules of reinforcement in therapy to increase the resistance to extinction of adaptive behaviours.

Extinction itself has sometimes been found effective in the management of behaviour problems in children. If parents consistently ignore behaviour such as bedtime tantrums, it may gradually disappear. In some cases, even severe or self-destructive deviant behaviour will stop when it no longer attracts the attention of onlookers (Lovaas and Simmons, 1969). However, Ferster (1961) has pointed out the theoretical dangers of insufficient reaction by parents to the activities of their children. If parents remain impassive in the face of the ordinary range of annoying behaviours, it may come about that only bizarre behaviours by the child gain parental attention. It is therefore hazardous for parents to ignore mild misbehaviours if at the same time more extreme acts provoke rewarding social exchanges. Nevertheless the extinction of deviant behaviours has sometimes been accomplished when parents carefully ignore all such responses (Wahler, 1969).


Summary and conclusions

The effects of operant reinforcement can be stretched out so that long periods of activity are devoted to each reinforcer. Standard routines for doing this, schedules of reinforcement, space out rewards in time or according to amounts of response. Although a little reinforcement can be made to go a long way by these means, no reinforcement at all usually causes a gradual withering away of the response pattern previously built up under its influence. For one reason or another, this extinction of behaviours when reinforcement is removed is less immediate after experience of sparse or variable rewards.