7 Primary and secondary reinforcers: where do they come from?

`Feels great' - `Like it OK' - `Happy button' - `Feel sick all over': these are one patient's subjective descriptions of what it is like to have reinforcement centres aroused through silver wires stuck deep into the brain. Some patients have had as many as fifty leads permanently implanted in their heads, so that very small electrical currents could be delivered to different parts of their brains. The reinforcing effects of the various electrodes are assessed by allowing the individual to press buttons which turn on brief jolts of stimulation (Heath, 1963). Investigations like this are rare, but the findings complement thousands of experiments with rats and monkeys designed to discover exactly what parts of the brain are important for motivation and reinforcement. For the human patient, pressing the `Feels great' button was accompanied by some sexual images and by the elimination of anxiety and `bad' thoughts. Animals receiving rewarding brain stimulation indulge in almost every kind of activity including exploration, eating and sexual responses, depending on the precise area or site stimulated. Although many of the most important details have yet to be settled, study of the brain mechanisms involved in reinforcement is one way of trying to answer the basic questions of `what is reinforcement?' and `why are reinforcers reinforcing?' A physiological answer may eventually assist in the matching together of other kinds of information based on behavioural experiments or subjective impressions.

A second kind of answer to the question `why are reinforcers
reinforcing?' can be made in terms of evolution. Unless Darwin was completely mistaken, all species have had to evolve methods of ensuring that they eat enough food, keep away from dangerous or damaging situations and have proper social relationships with their fellows particularly in the context of reproduction. Hull and Skinner were content to leave it at that, and get on with behavioural analyses of reinforcement, but ethologists like Tinbergen and Lorenz have investigated naturally-occurring patterns of behaviour which indicate that animals have evolved a number of methods of adapting their behaviour to the environment, including the development of 'species-specific' reinforcers. Morris (1968) among others has speculated about the results of human evolution, which may include unique social and intellectual forms of reward.

A third kind of answer is given by emphasizing the importance of individual experience, especially in human cultures. Many rewarding activities - playing the electric guitar, pretending to be a spaceman - are of such recent origin that brain physiology or the sequence of human evolution has very limited relevance to understanding why that activity in particular functions as a reinforcer. Traditionally, behaviourists have laid a great deal of weight on recently acquired secondary reinforcers because of the supposed predominance of these in modern societies.

Physiology of reinforcement

The most frequently selected location of electrodes for positive brain reinforcement is the medial forebrain bundle, which is a bundle of nerve fibres that runs through an area in the middle of the brain called the lateral hypothalamus. This gives very strong and reliable reinforcing effects, with animals pressing levers to obtain stimulation almost to exhaustion. It is not at all clear that this is a single `centre' for positive reinforcement rather than a concentrated pathway for nerve impulses connected with rewarding processes. It is possible to claim, though, that there are separate anatomical locations for positive and negative reinforcement, with the medial forebrain bundle (MFB) the main pathway for reward and the periventricular system (PVS) the area for negative reinforcement (Stein, 1969). Stimulation of the PVS has most of the effects that are produced by electrical shocks, and stimulation of the MFB is roughly equivalent to external rewards like food and water. In order to use MFB stimulation
in schedules of reinforcement it is necessary to allow rats several self-produced shots for each `reward'. The technique is to have the rat press twenty times (for instance) at the right-hand lever. As the reward for this a left-hand lever is activated so that the rat can give itself 100 brief pulses of current through wires going down into its brain. Used like this, MFB stimulation produces results similar to those of food given to hungry animals (Pliskoff et al., 1965). There is little difficulty in showing that responding can be suppressed by punishment with PVS stimulation.
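The two-lever arrangement just described can be put as a small sketch. This is an illustration only, not the apparatus or procedure of Pliskoff et al.: the class and method names are hypothetical, the figures of 20 presses and 100 pulses are simply the example values given above, and the rule that right-lever presses do not count while the left lever is live is an assumption made for simplicity.

```python
class ChainedLeverSchedule:
    """Hypothetical model: a ratio on the right lever unlocks pulses on the left."""

    def __init__(self, presses_required=20, pulses_per_reward=100):
        self.presses_required = presses_required    # right-lever presses per 'reward'
        self.pulses_per_reward = pulses_per_reward  # self-produced pulses per 'reward'
        self.right_presses = 0      # progress through the current ratio
        self.pulses_available = 0   # pulses still obtainable on the left lever
        self.pulses_delivered = 0   # running total of pulses received

    def press_right(self):
        """Count a right-lever press; activate the left lever when the ratio is met."""
        if self.pulses_available > 0:
            return  # assumption: presses are ignored while the left lever is live
        self.right_presses += 1
        if self.right_presses == self.presses_required:
            self.right_presses = 0
            self.pulses_available = self.pulses_per_reward

    def press_left(self):
        """Return True if this press delivered a pulse of stimulation."""
        if self.pulses_available == 0:
            return False
        self.pulses_available -= 1
        self.pulses_delivered += 1
        return True


schedule = ChainedLeverSchedule()
for _ in range(20):
    schedule.press_right()
delivered = sum(schedule.press_left() for _ in range(120))
print(delivered)  # only the pulses unlocked by one completed ratio are delivered
```

Treating the whole block of pulses as one `reward' is what allows the right-hand lever to be placed on an ordinary schedule, here a simple ratio.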

There are thus reasonable grounds for the hypothesis that the behavioural difference between positive and negative reinforcement is partly produced by separate physiological processes. A more tentative suggestion is that there may be chemically different systems for the incentive and response-shaping functions of positive reinforcement (Crow, 1973). It is known that the transmission of impulses in many neural pathways requires the release of the chemical noradrenalin, while certain other pathways make use of a different agent, dopamine. Both kinds have been identified in the MFB but it is possible to find places where only one kind operates. Rats pressing a lever for stimulation at 'dopaminergic' sites move forward bright-eyed and eager, sniffing at the lever and actively exploring the surrounding region. In marked contrast, rats self-stimulating at 'noradrenergic' sites press dully and mechanically with little apparent enthusiasm. Whatever the final verdict is, it looks as though neurophysiologists will eventually be able to give us a fairly detailed picture of the separate biochemical and neural processes which underlie different kinds of reinforcement effect.

At present there is still no agreement about the relative importance of the several possible aspects of reinforcing events: direct effect of stimulus input; subjective pleasure; performance of rewarding actions or release from tensions or drives. The dissociation between subjective report and unconscious motivational effects was of course a major aspect of Freud's theories and there can be little doubt that reinforcement can take place without intense pleasure or even knowledge on the part of the subject. A hint of this kind of effect is apparent in the data that go with the subjective descriptions at the beginning of this chapter. The button producing stimulation that was only described as `Like it OK' was actually pressed a third as much again as the `Happy button' and almost as much as the `Feels
great' button. Even more responses were given on a button that produced subjective irritability rather than pleasure! The behavioural reinforcing power of a stimulus cannot accurately be judged from someone's own sensations, although on the whole things which people say they like should work as reinforcers.

Release from tensions or reduction of drives is one way of describing reinforcement by escape from pain or anxiety. However, it has proved to be a very minor part of positive reinforcement with food or brain stimulation (A2, D2). Escape from hunger is less important than positive incentive for food, as far as this can be shown from laboratory experiment. A good deal of knowledge about the way the hypothalamus controls motivation for eating has now accumulated (A2; Nisbett, 1972) but it is all consistent with the simple fact that nutritional requirements are rather remote from the short-term reinforcement for eating. It is the stimulus properties of taste and smell, if not the pleasure of eating, which reinforce, as is evidenced by rats' liking for saccharin and most people's tendency to overeat, given the opportunity.

It is extremely hard to disentangle the various pushes and pulls given by sensation alone or by actions connected with sensation. Responses such as eating and drinking, often referred to as consummatory acts, may serve as reinforcers. Indeed one view is that activities are the essential thing about reinforcement, in the sense that `eating' rather than `food' is what matters most. A special kind of impulse to respond, which some authors believe is the prototype for all reinforcement, is produced when animals receive electric stimulation of the MFB and hypothalamus (Valenstein et al., 1970). The same kind of stimulation which can be used as a reward actually provokes a range of species-specific or instinctive responses. When several electrodes are implanted in the same animal, an experimenter may `turn on' different behaviours by directing current to particular points in the brain. Dramatic demonstrations have been given of 'radio-controlled' changes in the charging of fighting bulls. Rats can be made to shift from eating to copulating and back again at the flip of a switch (Caggiula, 1970).

The interesting thing as far as theories of motivation go is that such electrical control is not mechanical firing of reflexes, but induction of a mood, or incentive to respond. This is easiest to imagine in the case of sexual responses, where the brain stimulation causes sexual excitement which is a reward in itself,
but also creates incentive for further sexual activity. The same kind of thing happens, with activation of different parts of the hypothalamus, for other responses like fighting, eating or drinking. It also seems that sometimes a particular electrode produces an incentive to `do something' without strict boundaries on the behaviour, so that rats get used to doing whatever is available when they are first stimulated. An example is rats learning to run back and forth inside a large box to turn current to their hypothalamus on and off. They would do this anyway, but if given small objects at the place where the current turned on, they always carry something to the other side. The conclusion to be drawn from this set of results is not obvious, but it seems as though one of the jobs of the reinforcement mechanism in the hypothalamus is to make animals do things in their repertoire of instinctive responses. Intimately connected with this function is the facility for making animals do things which are only indirectly part of their natural repertoire, like pressing levers, and allowing them to learn to behave in these new ways.

Evolution and reinforcement

The discovery that natural patterns of behaviour like attack movements can be elicited by electrical stimulation of the brain, even in animals who have never had the chance to exhibit the behaviour before, has helped to rekindle interest in the 'wired-in' or instinctive influences on learning, by which each species of animal is `prepared' or constrained so that some responses are more easily learned than others, or that some reinforcers will only work for naturally appropriate responses (Seligman, 1970; Hinde and Stevenson-Hinde, 1973). Each species may have some idiosyncratic forms of learning or types of response. However, the general importance of defensive reactions and 'running-away' with negative reinforcement has been mentioned in Chapter 5, and approach reactions to food, and orienting and exploration in response to unfamiliar stimuli, also show some degree of uniformity between species. Evolutionary factors are of critical importance in determining what activities and stimuli serve as reinforcers for particular species, and the range of possible behaviours that are amenable to each reinforcer.

Food is used so often in experiments because it is an extremely reliable and powerful reinforcer which will motivate a wide range of responses in most animals. The type of food which is
reinforcing may of course be species-specific and certain types of 'food-getting' responses, such as pecking in birds and hunting in carnivores, are pre-learned or `prepared' to a greater or lesser extent. For other reinforcers appropriate behaviours are also constrained. Migration and reproduction must involve strong reinforcers, but the range of behaviours which occur in the natural habitat of any species is usually very narrow. This does not mean that unnatural types of response cannot be learned; although it is certainly not part of either sex's natural mating pattern, both male and female rats will learn to press levers to gain access to a sexually attractive partner, in much the same way that they press levers to obtain food. Rodents apparently press levers for almost anything that has been tried as a reinforcer, including the opportunity to dig sand or shred up paper for nests. But it is not the case that all reinforcers can reinforce all responses - grooming responses are not reinforced very well by food for instance.

Specialized reinforcement processes may be involved in the development of social attachments during infancy. A remarkably rapid fixation of social attachment to whatever object is around at the time occurs during the first twenty-four hours of the life of birds like ducks or geese, where it normally ensures that the young birds follow their mother (this is termed imprinting). A much slower process of socialization takes place in mammals, but it is often found that the proper operation of social and sexual reinforcers in adult life depends on social experience as an infant. It is a moot point whether either the rapid imprinting process or the slower development of social attachment are `highly prepared' or instinctive responses, but Hoffman and Ratner (1973) persist with the notion that certain innate forms of reinforcement provide the basis for both phenomena.

As in most instances of the nature/nurture controversy, there is no way to separate built-in and acquired influences on motivation because normal development requires that these influences act jointly. However, species-specific limitations have been suggested, even for the human species. Some facial signals, such as enlarged pupils, are said to have in-built sexual attractiveness. Desmond Morris suggested that the importance attached to female breasts derived from the primate fascination with buttocks. Rather more credibility can be given to the careful study of facial expression in primates (monkeys and apes) which has
implied that smiling and laughing in man evolved as separate emotional responses. The consensus of ethological opinion is that there is a considerable measure of innate influence on human social reinforcers, starting with contact comforts in maternal attachment (Harlow, 1958) and continuing with smiling and being smiled at, expressions of greeting and so on (Hinde, 1972). It is always worth bearing in mind, however, the constant modification and attenuation of any human `prepared' responses by social and cultural experience. Watching colour television and riding in cars are two powerful modern reinforcers. To an extent we could say that they depend on innate preferences - one has to have the physiological equipment for colour vision, and possibly some preference for bright or colourful objects. Riding in vehicles seems to be reinforcing for chimpanzees, and could thus be described as some innate primate value. But the major determinant of the power of televisions and cars as reinforcers is surely a set of experiences and learned attitudes within a particular culture.

Secondary reinforcers

Because individual experience is bound to attenuate or enhance in-built reinforcers - even the `innate' behaviours coming from brain stimulation can be changed by training - there can be no hard and fast distinction between those reinforcers determined by evolution and those relatively independent of it. But it is convenient to distinguish roughly between a category where the necessary biological function of reinforcers is fairly obvious - traditionally food, water, pain and sex - and a category where biological function is remote. Primary reinforcers are the more directly biological ones, secondary reinforcers more arbitrary or artificial.

Another scheme for classifying reinforcers, put forward by Premack (1965), removes the need for a two-way classification. He proposed that all behaviours can be considered in the light of a single scale of reinforcement value based on the preferences of an individual subject or of a particular animal species. Premack's hypothesis is that activities at the top of the scale will reinforce behaviours further down, but not vice versa. Most people would agree that the incentive for working overtime is related to more preferred activities which demand the spending of money. It is less obvious that watching television would reward mowing the lawn, but contingencies like this are often
set up for children as in `you must clean up your room before you go out to play'. However, position on a scale of preference should vary according to deprivation conditions and the factor of habituation (Ch. 1), as well as temporary shifts of circumstance.
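Premack's proposal amounts to a simple ordering rule, which can be sketched as follows. The preference values here are invented purely for illustration (no such figures are reported in the studies cited), and the function name is hypothetical; in practice the scale would have to be measured for each individual and would shift with deprivation and habituation.

```python
# Hypothetical preference scale for one child: higher means more preferred.
# The numbers are made up for the example and carry no empirical weight.
preference = {
    "playing outside": 0.9,
    "watching television": 0.7,
    "mowing the lawn": 0.2,
    "cleaning the room": 0.1,
}


def can_reinforce(scale, reinforcing_activity, target_activity):
    """Premack's rule: an activity reinforces only those lower on the scale."""
    return scale[reinforcing_activity] > scale[target_activity]


# 'You must clean up your room before you go out to play.'
print(can_reinforce(preference, "playing outside", "cleaning the room"))
# The reverse contingency should not work on this scale.
print(can_reinforce(preference, "cleaning the room", "playing outside"))
```

Because the rule depends only on relative position, any change in the scale (for instance after a long spell of television watching) can reverse which activity is able to reinforce which.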

Secondary reinforcement through pairing

One way in which stimuli which have no intrinsic biological value can become reinforcers is through pairing with a strong reinforcer. Sights and sounds which are a prelude to reinforcing tastes acquire the capacity to act as reinforcers in standard animal learning tasks. It can be arranged that arbitrary objects or signs become desirable simply because they have been presented in close contiguity with another incentive. The presence of attractive young women in close physical proximity to a motor-car, cigar or aeroplane may enhance the rewarding properties of those items, because of involuntary classical conditioning or something of the sort (see Ch. 2). Long-lasting secondary reinforcement has been demonstrated with children by pairing originally boring stimuli such as geometric shapes or `nonsense syllables' (like MYV or KEB) with sweets or money. For instance, five- to seven-year-olds were given a game of fishing envelopes out of a 'lucky-dip' apparatus. Some envelopes contained sweets, and others stones, depending on the nonsense syllables written on the outside. Even three weeks after the game, the children showed a large difference between their evaluations of the `good' and `bad' nonsense syllables. The context of the verbal evaluation was broadened by writing 'KEB', 'MYV' etc. as identifying labels across the chests of drawings of other children. After the different nonsense syllables had been paired with winning or losing money in a gambling game the subjects were asked `Would you like to play with KEB?', `Is MYV a nice boy?' and similar questions. Favourable answers were given when the imaginary playmate had been paired with winning money, and unfavourable answers if the association had been with losing (Parker and Rugel, 1973).

Two theoretical issues are linked with the formation of secondary reinforcers (sometimes called conditioned reinforcers). First, what information is supplied by the pairing?
Second, what is the importance of responses given to the secondary reinforcing stimulus? As with secondary negative reinforcers produced by giving signals for shock (p. 68), a neutral cue is a better signal, and a better secondary reinforcer
if it carries plenty of information about the primary reinforcer. If it is unreliable, or unnecessary, as a predictor of the primary reinforcer, it has less secondary reinforcing effect. On the other hand, it seems to enhance secondary reinforcing effects if the secondary stimulus calls for a definite response (that is, if it is a discriminative stimulus).

Both these theoretical factors may help to make tokens very effective as conditioned reinforcers. A token is usually a tangible object like a coin or marble, which is exchangeable for physical rewards. As they are exchangeable, tokens are a reliable index for the primary reinforcer and the responses to do with collecting or handing them in guarantee their discriminative function. Another advantage of token reinforcers is that they can be swopped for a great variety of other rewards, or back-up reinforcers. Money can be viewed as the token reward par excellence but in real life is complicated by economic and social variables like savings and investment. Enclosed arrangements called token economies have been in vogue in recent years as a method of large scale `behavioural engineering' in mental hospitals (F3 and B4). Patients are helped by shaping or instructions to perform tasks within their capabilities for the tokens. Self-care and ward-cleaning work, as well as more demanding jobs such as secretarial or laundry work, are rewarded with tokens. Written credits, points or money can be used as tokens, but in the initial attempts metal disc `coins' were used which could only be obtained on the ward (Ayllon and Azrin, 1968). Virtues ascribed to tokens include: (a) they can be given immediately after a target response, as a direct reinforcement; (b) they are an unambiguous indication of approval; (c) if sufficient back-up reinforcers are available, there are fewer problems of satiation than with food or social reinforcers.
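The bookkeeping behind such a token system can be sketched in a few lines. This is a hypothetical illustration, not the procedure of Ayllon and Azrin: the class name, the tasks, the earning rates and the prices of back-up reinforcers are all invented for the example.

```python
class TokenLedger:
    """Hypothetical record of one person's tokens in a token system."""

    def __init__(self, earn_rates, prices):
        self.earn_rates = earn_rates  # tokens given for each target response
        self.prices = prices          # token cost of each back-up reinforcer
        self.balance = 0

    def reward(self, task):
        """Give tokens immediately after a target response; return the new balance."""
        self.balance += self.earn_rates[task]
        return self.balance

    def exchange(self, item):
        """Hand in tokens for a back-up reinforcer; return True if affordable."""
        price = self.prices[item]
        if price > self.balance:
            return False
        self.balance -= price
        return True


# Invented rates and prices, for illustration only.
ledger = TokenLedger(
    earn_rates={"ward cleaning": 2, "laundry work": 5},
    prices={"sweets": 3, "magazine": 6},
)
ledger.reward("ward cleaning")
ledger.reward("laundry work")
print(ledger.exchange("magazine"), ledger.balance)
```

Keeping the earning step separate from the exchange step reflects the two virtues listed above: the token can follow the response at once, while the choice among back-up reinforcers is left until later.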

Organizational problems and difficulties of assessment make it hard to evaluate the general usefulness of the all-inclusive token economy method (Kazdin and Bootzin, 1972). But small scale `token systems' - where, for instance, children may be given tokens which can be exchanged for a variety of toys or sweets - are a valuable addition to more direct methods of reinforcement. The evils of all-inclusive token economies seem to reflect those traditionally ascribed to money: some persons may be tempted to steal other persons' tokens, or lend out tokens at exorbitant rates of interest. Very careful supervision is required.

The complexities of the therapeutic application of token reinforcement are less apparent in the experimental study of tokens as reinforcers for animals. Chimpanzees can be trained to perform on various schedules of reinforcement to obtain poker chips if these may later be used to obtain food from a slot-machine (e.g. Kelleher, 1958), though they work harder to obtain poker chips as the time approaches when the exchange can take place. Similarly, rats have been trained to press a lever to produce marbles which can later be dropped in a hole to release food.

Secondary reinforcement through remote associations

Objects which can be carried around and exchanged for food, or sights and sounds which are necessary precursors to primary reinforcement, can hardly not be associated with their back-up reinforcers. Some transfer of powers of reinforcement can also be observed when the association between the secondary and primary rewards is less obvious. It has always been assumed by Freudians that many of our civilized activities are motivated not by the apparent reinforcers, but by deeper primary drives which the notional rewards represent (D2). Nail-biting is said to occur as some kind of substitute for a more drastic form of self-mutilation, hoarding of money is enjoyed as an expression of anal retentiveness, and so on. There is no need to go to these lengths, but it is probably necessary to allow for some quite complicated mix-ups of human motives. For example, an original reinforcement by parental approval might enhance the satisfactions of stamp collecting, which might later bear fruit in a dedication to foreign affairs.

There is little of substance to be gained for reinforcement theory by speculation about individual biographies. On the other hand, an interesting sample of relatively remote secondary reinforcement has come to light as a feature of second-order reinforcement schedules. These work in the same way as ordinary schedules (p. 83) except that an alternative stimulus is substituted for the proper reinforcer most of the time. Kelleher (1966) trained pigeons to work on a fixed interval schedule with the most likely reward just a flash of light every two minutes. Every thirtieth flash of light was accompanied by a large food reward. Behaviour was maintained in the usual pattern (see Fig. 6.1) while only light flashes were given as incentives to respond, and the obvious explanation was that light flashes had become
conditioned reinforcers through pairing with food. But several other experimenters found that behaviour on similar second-order schedules was sustained even if the food reinforcement took place in the absence of the secondary stimulus. Stubbs (1971) concluded that any stimulus change could become a secondary reinforcer so long as it was systematically related to food delivery, even if it was never actually paired with food delivery. This liberates secondary reinforcers from the requirement of happening at the same time as their back-up rewards, and extends the scope of the secondary reinforcement phenomenon.
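The arithmetic of the paired version of the schedule, as in Kelleher's arrangement, can be made explicit with a short sketch. It is a simplification under stated assumptions: the function name is hypothetical, every two-minute interval is taken to be completed by a response, and the unpaired variants studied by later experimenters are not modelled.

```python
INTERVAL_MINUTES = 2      # fixed interval from the example in the text
FLASHES_PER_FOOD = 30     # every thirtieth flash is accompanied by food


def second_order_outcomes(n_intervals, flashes_per_food=FLASHES_PER_FOOD):
    """List the outcome of each completed interval: a flash, or a flash with food."""
    outcomes = []
    for interval in range(1, n_intervals + 1):
        if interval % flashes_per_food == 0:
            outcomes.append("flash+food")
        else:
            outcomes.append("flash")
    return outcomes


outcomes = second_order_outcomes(60)
food_count = outcomes.count("flash+food")
minutes_to_first_food = (outcomes.index("flash+food") + 1) * INTERVAL_MINUTES
print(food_count, minutes_to_first_food)
```

On these figures the bird works for an hour between food deliveries, with twenty-nine light flashes as the only intervening consequences, which is why the flashes themselves must be doing the reinforcing.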

Reinforcement and motivation

It will not have escaped the reader's notice that I have been using the terminology of reinforcement as a substitute for the goals, purposes, drives or intentions which are more common in everyday speech and in other areas of psychology. The relation between reinforcement and other concepts in motivation is discussed in D2 of this series, and examples of reinforcement in social motives are covered in B1. The advantages claimed for analyses in terms of reinforcement are: (a) they are clearly founded on a bedrock of reference experiments, and (b) it follows that they can easily be translated back to experimental tests or to practical therapeutic measures. The limitations lie in the narrowness of the field to which reinforcement concepts apply with certainty. It remains to be seen exactly how narrow or wide this field will become.

Reinforcement in relation to drives and incentives

Motivational states such as hunger, thirst and sexual desire are sometimes termed drives. Drives are accommodated in reinforcement theory in so far as they determine the effectiveness of relevant reinforcers. Deprivation of food makes the animal eat, or perform responses that have previously gained food. Deprivation and other factors which change the strength of reinforced behaviour are said to `change the effectiveness of the reinforcer'. There are many ways of working up a thirst, but it would be an odd sort of thirst which did not make drinking a more potent reinforcer. Incentives can be closely related to reinforcers if they describe the vigour or enthusiasm, as opposed
to skill or accuracy, of reinforced behaviour. Other sources (D2, A2; Cofer and Appley, 1964) deal with relevant theoretical and physiological issues in more depth.

A new research area concerning the effectiveness of reinforcers is the study of adjunctive behaviours (Falk, 1972). Animals given small amounts of food at intervals develop startling proclivities for other behaviours such as drinking, or gnawing inappropriate objects. Excessive drinking, termed schedule-induced polydipsia, is a powerful enough reinforcer to support fixed ratio schedules of lever-pressing for the `unnecessary' water. It has also been found that periodic brief shocks sometimes evoke eating or copulation. The possibility that schedules of one reinforcer may alter the effectiveness of another therefore has to be included as a setting operation which changes the energizing or directing influence of the second reinforcer. Such behavioural interactions have to be added to other major factors which can be said either to change drives, or to activate and alter the effectiveness of positive and negative reinforcers. Illness, brain injury, drugs and medicines can all produce drastic changes in responsiveness which may conveniently be described as changes in motivation, but are more accurately assessed by measurement of reinforced behaviours.

Reinforcement in relation to knowledge and purpose

Are all our purposes reflections of reinforcement contingencies? Skinner maintains that purpose and knowledge could be accounted for by a complete enough list of `contingencies of reinforcement' (p. 18) but has not produced the list (Skinner, 1974). Most of us are prepared to acknowledge the importance of schedules of reinforcement in the Skinner box, and possibly in children or other people, but wish to draw a line somewhere between conditioning and reasoning where our thoughts and inner purposes take over. In fact it is possible to bring a surprising proportion of psychological facts under the aegis of reinforcement theory, as Skinner has done, but this does not in itself appear to solve many of the traditional puzzles about human knowledge and purpose. Differences between impulse and foresight, conscious and unconscious motives, sensual and intellectual satisfaction and so on are still matters of philosophical as much as scientific argument. It may yet turn out to be helpful to consider all such differences as differences between types of reinforcement. Mischel (1973), however, has pointed
out in the context of theories of human personality, that psychological conceptualization needs ultimately to encompass the three perspectives of operant conditioning, personal variables such as beliefs and values, and subjective experience.

Surely knowledge can be acquired without assistance from reinforcement? We can look out of the window or read a newspaper without either purpose or reinforcement, can we not? This problem has a long history in learning theory, with the consensus being that knowledge may be acquired without reward, but actions need motivating (Ch. 1). Although this distinction is valid if rewards are only external goals, it is still possible to include information as a reinforcer itself, or even to define reinforcement as the modulation of information flow (Atkinson and Wickens, 1971). Even looking out of the window and reading newspapers may depend, if not on tangible profits, on temporary interest, sensory satisfaction or previously gained advantages. The reinforcement theorist may therefore continue to pry his way into areas of psychology where the main inhabitants feel he has no business.

Summary and conclusions

Reinforcement is essentially a behavioural concept, applicable to reliably observed evidence that responses vary according to their effects. The concept is buttressed by physiological research which is uncovering the brain mechanisms responsible for the effects of reinforcement and the subjective pleasures which may or may not accompany it. The evolutionary background to reinforcement is the necessity for arousing and directing instinctive and learned behaviours at appropriate times. Primary reinforcers are those whose biological function is clear, secondary reinforcers those whose powers have been acquired through an association with more powerful rewards or punishments. It is possible to put the question of biological function to one side and rank reinforcers according to the observed preferences of an individual or species.
