6 Biological bases of classical and instrumental conditioning
‘Trial-and-error learning, of which instrumental conditioning is the core, is a different matter from the classical conditioned reflex in a number of important respects.’
Thorpe (1963, p. 85)
General process learning theory
The comparison of classical and instrumental conditioning, with a listing of similarities and differences, is a fairly congenial activity for learning theorists, since for those desirous of parsimony the worst outcome is only two separate processes, each, in the traditional view, subject to a limited set of principles of operation. I have already suggested in the previous chapters that each behaviourally defined version of conditioning will have to be given its own separate explanation, according to whether it is working on spinal reflexes or on the cognitive processes of the whole animal. This violates a strict version of parsimony, but means that no further losses of simplicity are likely to come from comparisons across the procedural boundaries of stimulus-pairing and response-rewarding techniques. However, the discussion of the biological bases of either kind of learning threatens general process learning theories, because it raises the spectre of species differences. As Seligman (1970) pointed out, general process theorists such as Pavlov and Skinner not only assumed that they could apply to other species results obtained from a small number of domesticated dogs, or rats
or pigeons, but chose very narrow samples of the behaviour of the animals so favoured. Thus, ‘It is obvious that the reflex activity of any effector organ can be chosen for the purpose of this investigation, since signalling stimuli can get linked up with any of the inborn reflexes’ (Pavlov, 1927, p. 17), and, ‘The general topography of operant behaviour is not important, because most if not all specific operants are conditioned. I suggest that the dynamic properties of operant behaviour may be studied with a single reflex’ (Skinner, 1938, p. 45).
It would be foolish to deny that the research strategies adopted by Pavlov and Skinner led to striking achievements: nevertheless it is difficult to sustain the view that all remaining questions about learning will eventually be answered by further experiments on dogs in conditioning stands and pigeons and rats in Skinner boxes. The points made under the heading of ‘constraints on learning’ or ‘biological boundaries of conditioning’ (see for instance Hinde and Stevenson-Hinde, 1973; Seligman and Hager, 1972) have by now been acknowledged in almost all quarters, and there is no reason to resist the argument that species may differ in motives, in sensory, motor and cognitive capacities, and certainly in the degree to which their natural style of life depends on individually acquired information. This last difference is perhaps the most fundamental, and theories about the evolution of the capacity to learn, such as that put forward by Plotkin and Odling-Smee (1981), are covered in the final chapter of this book. For the present it is enough to acknowledge first that we should not expect the psychology of a tick, which can probably succeed admirably in its chosen psychological niche without learning anything at all, to overlap a great deal with the psychology of a chimpanzee, which must learn about the territory it lives in and the food to be found there, and, probably most crucially, about the exact ways it should interact with the other individuals with whom it shares its long life. As Darwin would have said, the difference in the importance of learning in chimpanzee and human societies, though immense, should be compared with the difference in the importance of learning for the chimpanzee and the tick. But second, we must also take heed of the warnings of Hodos
and Campbell (1969) that we cannot simply string out all animal life between the tick and the chimpanzee on any kind of single scale. In fact ticks are related to spiders, and there are blood-sucking degenerate flies with similar habits, and neither spiders nor flies should be close to the bottom of the scale. More important is the point that for any class or family of animal species, small changes in ecological niche may lead to large changes in psychology: it is very difficult to predict, for instance, which song-bird species should learn their songs from scratch and which should inherit each note (Marler, 1970; Green and Marler, 1979).
Thus, we must agree to be cautious about any generalization from one species to another, and generalizations which apply to all species, though usually vague enough to be less dangerous, must always allow for exceptions. Is it therefore possible to make any meaningful comparisons at all, between categories of learning as global as classical conditioning by stimulus-pairing, and instrumental conditioning by response-reward-pairing? Is not each category made up of several different kinds of association, or levels of neural representation, of the relevant stimuli and responses? And would we not be better off instead if we looked at naturalistic categories of learning, such as social imprinting in ducks and geese, vocal learning in canaries and parrots, co-operative hunting in wolves and lions, foraging strategies in insectivores versus herbivores, language-learning in human infants, or second language-learning in immigrant adults — all of which, we might suppose, are governed by their own peculiar and specialized principles and laws? Maybe so. All those areas repay specific investigation for their own sake. But we cannot tell until we’ve tried it whether or not there are principles of learning theory, or at least types of question in learning theory, which apply to or can be asked about all the more lifelike categories. Some phenomena which look specialized at first sight turn out to be a new form of old general principles, and thus an accommodation can be reached between naturalistic explanations and laboratory testing. This appears to be so for the two phenomena, taste aversion learning and autoshaping, which led Seligman (1970) and others to call for the abandoning of general discussions of learning. I shall
come to these two cases later in the chapter, in the context of more general questions about classical and instrumental conditioning.
The comparison of classical and instrumental conditioning
The most interesting point of general comparison between Pavlovian conditioning and goal-directed learning is that the latter is voluntary and the former is not. This of course is extremely difficult to pin down in anything but a purely subjective sense, but one of the reasons for examining closely the classical-instrumental distinction is that it might supply some clues about where our subjective sense of the voluntary comes from. Clearly part, perhaps a major part, of a human act of will arises from the possibility of saying (out loud or to oneself) ‘I am voluntarily going to get up and open the door’, before doing it, and the analysis of this is a philosophical or even a legal exercise, rather than a problem for the experimental laboratory. But we do not always have to verbally justify voluntary actions — we cannot very well do this for voluntary verbalizations themselves in any case, and the separate volition of the right hemisphere, which lacks (relatively) the necessary verbal skills, in split-brained human patients, has been a major source of recent philosophical puzzles. Therefore, the question is not just a matter of verbal determination, and biologically more general issues of purpose and goal direction have to be addressed. Slightly simpler questions concern types of response and degrees of response change. Why is it hard to voluntarily control perspiration or (usually) ear wiggling, but easy to successfully raise an arm? Almost everyone can voluntarily raise an arm, but no one can play the violin, or copy John McEnroe’s serve, just by wanting to.
Different types of response in conditioning
Aristotle (Movement of Animals, 703b) supposed that all geographical movements of all animal species are voluntary, but that certain bodily organs, most noticeably the heart and the penis, undergo involuntary movements, since they are not
at our bidding, but appear to have their own reasons for moving, when certain stimuli present themselves. This corresponds roughly to the physiological distinction between the autonomic and the skeletal nervous system, the former being responsible generally for the internal organs of the body (digestion, regulation of blood supply, glandular secretions and so on), but the latter controlling the muscles which move the body about. This distinction is anatomically and biochemically fairly clear-cut, at least in vertebrates, though for medical reasons most is known about mammals. One can take pills which affect one system more than the other (or affect mainly one part of the autonomic nervous system). The most obvious exception is breathing, which is normally autonomic, that is, literally, self-governing, but can also be changed at will, since the relevant muscles are fed from both parts of the nervous system.
It is thus possible to follow Aristotle in this instance, since he put breathing in a separate category, along with sleeping and waking, of activities which are not normally the product of purpose and rationality, but which are not as resistant to voluntary control as movements of the heart or penis. Since Pavlovian conditioning was initially studied with salivation and other digestive secretions, and is frequently measured in terms of other autonomic responses such as heart rate or electrical resistance of the skin, it is understandable that several theorists have wondered whether the difference between classical and instrumental conditioning, and/or the difference between voluntary and involuntary responding, can be explained entirely on neuropsychological grounds: classical conditioning applying exclusively to the autonomic nervous system, and instrumental or operant learning being possible only in the brain systems that control the skeletal musculature (Konorski and Miller, 1937; Skinner, 1938; Mowrer, 1947). It turns out that very much more needs to be said than just this (e.g. Mackintosh, 1974, 1983), but it is nevertheless important in the context of biological constraints on learning to acknowledge that some response systems are more conditionable than others (with a given set of procedures) and that the autonomic nervous system is extremely sensitive to stimulus-pairing procedures, as Pavlov’s work on digestion
first showed (see chapter 3, pp. 58–63), and barely (if at all) influenced by the Law of Effect. The justification for saying this has to be given by the rest of this chapter, but to begin with it is essential to stress that only the autonomic nervous system is restricted, according to this hypothesis. The skeletal muscles are certainly used in the standard instrumental learning test of maze-learning or Skinner-box-responding, but they are also involved in obviously classically conditioned responses, such as the human knee jerk, or eyeblink, or the conditioned flexing of the leg of a dog in a conditioning stand. The motor system as such can perform both classically and instrumentally conditioned responses. But there may be other constraints or restrictions, in so far as certain instinctive motor patterns in animals, such as face-washing or scratching in rats or cats, are relatively resistant to food rewards. It is now generally accepted that a given species will be biologically ‘prepared’ to do certain kinds of things to achieve certain kinds of goals (Seligman, 1970; Breland and Breland, 1961). Thus cats are naturally prepared to stalk and crouch and spring at bird-like or mouse-like prey, and to snarl and arch their back when threatened or afraid: it would be easy to train cats to stalk artificial moving objects for food reward, or to arch their back to make an artificially threatening object disappear — but not vice versa. Similarly it is very easy to train pigeons to peck at buttons for food rewards (see ‘autoshaping’, below) but very difficult indeed to train them to peck a similar key to escape from electric shocks (see chapter 7). This now seems so obvious that one wonders why any fuss needed to be made about it. The answer, regrettably, is that the fuss had to be made because influential theorists resisted the implication that different species might more easily learn different kinds of behaviours for biologically appropriate rewards.
Skinner (1977), for instance, while freely admitting the importance of evolutionary factors in determining instinctive behaviour, wishes to still keep instinctive behaviour in a different category from learned behaviour, and is thus led to the absurd and unsupported claim that, with the right contingencies of reinforcement, a cow could be trained to stalk and pounce on ‘an animated bundle of corn’ (Skinner, 1977, p. 1011). Skinner also refuses to accept Herrnstein’s
(1977) suggestion that one of the differences between cows and cats is that cats innately enjoy stalking and chasing suitably sized objects. It is true that one cannot observe the enjoyment directly, but as many animals can be trained to press levers to gain the opportunity of engaging in instinctive behaviours (e.g. mice pressing levers to obtain lengths of paper which they then make into nests), there is behavioural data to support the conclusion that instinctive behaviours are ‘self-reinforcing’ (Herrnstein, 1977).
There are various theories of why some motor responses are more easily learned than others, under the influence of reward and punishment, but not all these are directly relevant to the question of differences between classical and instrumental conditioning as psychological processes. The more serious claim in this context is that the autonomic nervous system is readily amenable to the classical conditioning process, but not directly susceptible to instrumental conditioning. In practice, of course, many autonomic responses may be determined by prior acts of will — in a sense one can voluntarily raise one’s heart rate by running on the spot, lower it by certain techniques of relaxation, or by taking a suitable pill, and deliberately change many other internal secretions by muscular or mental exertions. There is an enormous amount of medical and applied work done on techniques of ‘biofeedback’ because of the potential benefits of having indirect voluntary control of one’s viscera, without recourse to drugs, although the benefits have not yet been fully realized (Yates, 1980). The first point is that special techniques of biofeedback were devised precisely because autonomic responses are not normally under voluntary control, and do not respond readily to reward contingencies. It is easy enough to condition someone’s skin resistance by pairing a visual stimulus with an electrical shock (Ohman et al., 1975a, see pp. 79–80), in a matter of minutes, but training the same response by using rewarding consequences is difficult, and, by comparison, involves much more by way of special mental and physical strategies. Thus the second point is that, to the degree that practical biofeedback techniques work at all, there is still no reason to suppose that any visceral responses can be directly conditioned in the same way that
some motor responses can. A considerable amount of confusion still surrounds this point because reports by Miller and his associates (Miller and DiCara, 1967; Miller, 1969) once suggested that instrumental conditioning of autonomic responses such as increases and decreases in heart rate could be obtained by using rewarding brain stimulation or electric shock punishment on rats whose entire skeletal musculature was paralysed by a curare-like drug. The theoretical reason for performing these experiments was to demonstrate that the autonomic nervous system could respond to reward and punishment directly, even if the motor system was inactive. This idea is suspect, because paralysis of the peripheral muscles would not necessarily preclude the conditioning of central motor instructions, which might have effects on the heart, but in any case it now appears that the original results were unreliable since they have not proved to be capable of replication. Miller himself therefore came to the conclusion that ‘it is not prudent to rely on any of the experiments on curarized animals for evidence on the instrumental learning of visceral responses’ (Miller, 1978, p. 376; Dworkin and Miller, 1986).
In the absence of the evidence from these experiments, we may retain the conclusion that one of the differences between classical and instrumental conditioning is that classical conditioning effects may indubitably be obtained both from motor responses and from the autonomic systems involved in the various emotional and metabolic phenomena previously listed (chapter 3), whereas the procedures of instrumental conditioning have strong and obvious direct effects on many (though not all) motor actions, but are relatively ineffective for producing learned changes in the glandular and other visceral activities controlled by the autonomic division of the central nervous system.
Degrees of response change in conditioning
Since one kind of conditioning, called instrumental or operant, appears to have differential effects on the motor and autonomic control mechanisms of the brain, but the other kind, called classical, does not, it would seem there is adequate justification for saying that the two kinds of
procedure engage two different kinds of psychological process. But there are many complications. As we shall see, there are sound reasons for believing that the classical kind of process is engaged by many if not all forms of instrumental training, and therefore some have thought that there is really only one process — a classical process depending upon stimulus pairings — which in some way applies in a more elaborate form when rewards and punishments are applied to motor behaviour (e.g. Pavlov, 1927; Spence, 1956; Moore, 1973; Dickinson, 1980). For very much less sound reasons, other theorists have ignored the implication of the last section, and proposed that there is a single process, which is an instrumental mechanism related to drive reduction, that is responsible for the phenomena observed with both stimulus-pairing and response-reward procedures (Hull, 1943, 1952; Perkins, 1968; Miller and Balaz, 1981).
The conclusion I shall attempt to support here is that there are two distinct kinds of psychological process, even though both kinds of process — one of stimulus-associations and the other the linking of responses to pay-offs — may be engaged by the same training procedures. This is a ‘two-factor’ or ‘two-process’ theory, as originally put forward by Konorski and Miller (1937) and Skinner (1938) and revived with some modifications by Gray (1975) and Mackintosh (1974, 1983). Before getting on to the theoretical complexities of two-factor theories, it may be helpful to draw attention to a relatively straightforward category of evidence which on the face of it provides direct support for the two-factor view. This is simply the magnitude of behavioural effect, in terms of changes in response repertoire that can be brought about by the two processes. Even the modified stimulus-substitution theory of the classical conditioning process provides no obvious mechanism for the learning of skills such as riding a bicycle or for the other performances of circus animals. If human manual skills are interpreted as examples of instrumental learning (e.g. Mackintosh, 1983, p. 43), it becomes even more odd that anyone should try to explain all instrumental learning as due to the process responsible for salivary conditioning. The essential feature of both the procedure and the theoretical process of Pavlovian conditioning is that some
portion of the already present response given to a significant stimulus becomes transferred to an associated but hitherto insignificant signal. The most that can happen directly in terms of originality and novelty of response is that a known behaviour occurs more often, or at different times.
By contrast, according to both the theory and practice of trial-and-error learning and the Law of Effect, random variations in behaviour can be permanently acquired, and thus, by degrees, if not occasionally faster (see pp. 138–9 on porpoise learning), radically new skilled forms of responding can be acquired. There may certainly be limits to this, set by any species’ natural propensities and capacities, but there is no shortage of examples, from cows opening gates (Romanes, 1883) to monkeys and chimpanzees learning to use wrenches or turn keys in locks (Savage-Rumbaugh et al., 1978b) which demonstrate the acquisition by individuals of qualitatively new kinds of behaviour. Thus, at any rate at the procedural level, we can say that instrumental training techniques produce a wide range of learned behaviours, while the signalling of one stimulus by another, which is the defining feature of the Pavlovian conditioning procedure, should only produce minor alterations in already present responses. This strongly suggests that there is a difference in processes which accounts for the different effects of the two procedures.
Pay-offs versus classically conditioned effects in instrumental procedures
The reason why it has been possible to argue against this strong suggestion, on theoretical grounds, can be gathered from close inspection of Figure 5.8 in the chapter on instrumental conditioning (p. 160). This sketch assumes that food rewards are obtained in the presence of a discriminative stimulus. At (3) it is assumed that the stimulus thus elicits a representation of the reward. This would appear to be little different from the assumption that the buzzer in a Pavlovian procedure elicits a representation of the signalled reward, and indeed in this diagram (3) can be regarded as a classical conditioning process. Similarly, the learned connection at (6) between the stimulus and emotional arousal can be taken as
an example of classical conditioning of a central motivational state (Rescorla and Solomon, 1967, see pp. 77–80). For some activities, in particular that of a rat running down an alley towards food, the argument can be made that only these two classical conditioning processes are needed to explain instrumentally rewarded performance. Spence (1951, 1956) used the Hullian theoretical elaboration of an ‘anticipatory goal response’ (or rg-sg) to serve as what in Figure 5.8 is listed at (3) as a representation of the wanted reward. He also used the concept of conditioned incentive motivation, which would correspond to the conditioned emotional effect of rewards (6) except that for Spence the amount of incentive motivation depended upon the anticipatory goal response mechanism. Spence thus provides an example of the view that ‘classical conditioning is really an inherent part of instrumental conditioning’ (1956, p. 49). Additional explanation is required for why the rat actually runs down the alley. Spence (1951, 1956) appealed mainly to contiguity — that is, because the rat runs at all in the presence of the external stimuli of the alley, it is more likely to do so again, and the degree of enthusiasm with which it does so is a function of classically conditioned emotional arousal, or incentive. A further and plausible suggestion is that animals have a built-in tendency to approach wanted rewards. Less plausible is the deduction that the classical conditioning of approach responses is all that is required to explain the behaviour of rats in mazes, but this has been attempted (Deutsch, 1960; Mackintosh, 1983, p. 39). Generally speaking, the appeal to classical conditioning depends on the similarity between what the animal is trained to do, and what its instinctive response to the reward is.
If ‘run towards the food’ is regarded as already associated with any representation of food in Figure 5.8, then only the classically conditioned connection at (3) would be needed to cause the rats’ behaviour of running down a plank towards food they have previously experienced at the end of it. But, and this is a large but, this is hardly enough to explain the skills of circus animals, or even the various different escape responses performed by Thorndike’s cats.
As a simpler and more tractable example, consider the experiment already discussed in which rats learn to press one
lever for food rewards when a clicker is on, and another for sugar rewards when a tone sounds. How could this be explained using only the classical conditioning process? Well, this experiment was first performed (Trapold, 1970; see pp. 155–6) with Spence’s ‘anticipatory goal responses’ in mind as providing two different representations of reward, conditioned to the clicker and the tone respectively. Thus, in the course of successful learning, we can assume that the clicker elicits an expectancy of dry food, and the tone an expectancy of sugar solution, by a process of stimulus-associations. But how does the rat achieve successful learning — that is, how does it know which lever to press when? There are several possibilities but most require that reinforcement selectively strengthens an arbitrary response which happens to precede it, which is a different process from the stimulus-substitution possibilities of purely Pavlovian conditioning. Trapold (1970) was content with a Hullian stamping-in of responses which happened to precede reward, which is more effective when there are differential response outcomes because the clicker and tone are thus made more distinctive stimuli. This is one of a number of possibilities arbitrarily excluded from Figure 5.8. The alternative is that at (1), it is assumed that the representation of a particular wanted event will elicit an impulse to perform the response associated with getting that event, and thus the left lever response (if that is correct) is directly associated with the reward it obtains.
There is a fair measure of consensus that either rewards stamp in arbitrary responses, by the Law of Effect, which is a different process from stimulus-substitution in classical conditioning, or that particular responses become themselves associated with rewards, so that there are processes of both response-reinforcer associations and stimulus-reinforcer associations (Dickinson, 1980; Mackintosh, 1983). Dissidents may continue to insist that both response-reinforcer and stimulus-reinforcer association are forms of classical conditioning, but these forms eventually become stretched rather far apart. Initially, it looks possible to say for the Trapold (1970) experiment that a stimulus of the sight of the left lever, or the kinesthetic sensation of pressing down on the left lever, becomes a signal for reward, and that these Pavlovian
processes account for the selective effect of rewards on this arbitrarily correct response. There are, however, numerous indications that kinesthetic feedback is not an essential part of instrumental learning, although it must have its uses. The early investigation of maze-learning, for instance, suggests no kinesthetic specificity is necessary (see p. 141), and the experiment of Taub et al. (1965) is often quoted as an example of lever-pressing learned without kinesthetic or proprioceptive feedback, which was surgically eliminated. (Polit and Bizzi, 1978, confirmed this finding.) Mackintosh (1983, p. 40) points out that although dogs learn to flex their leg away from a shock to the paw, which may be a classical conditioned reflex, the fact that they readily learn to flex a leg to avoid a puff of air to the ear is less easily attributed to stimulus-substitution. Even decerebrate duck embryos can learn something similar, since they will learn to flex their foot in order to terminate an electric shock applied to the wing (Heaton et al., 1981). For the dogs, it might seem worthwhile to speculate along the lines that a straight leg becomes associated with unpleasant air puffs, but a flexed leg becomes a Pavlovian signal for no puffs. Then an account of the selection of the response with the desired associations is still required. For the duck embryos it makes more sense, surely, to include some other lower-level mechanism for the selection of muscular activities which terminate intense stimulation.
This brings us to the essence of the theoretical difference between the processes of stimulus-substitution (classical conditioning) and response selection (instrumental conditioning), which is that evaluated pay-offs are the critical element of response-selection but not of the stimulus-substitution process. For response-selection to work, there has to be a system which performs the function of measuring immediate advantage and disadvantage, and selecting responses accordingly. It is usually part of the background to classical conditioning theories that the process should be useful in the evolutionary sense, but the stimulus-substitution mechanism can in principle be built in without any hedonic (pleasure/pain) system, or anything corresponding to it. This is not to imply that there is no motivational angle to stimulus-substitution — on the contrary, the classical conditioning
process in two-factor theory is often held to be responsible for changing motivation, via conditioned emotional effects. But stimulus-substitution is the agent for changing emotions rather than their product. In salivary conditioning, either food (good) or acid (bad) can be put in the dog’s mouth after a buzzer sounds, to induce salivation to the buzzer. The point of the stimulus-substitution theory is that both have conditioned effects, irrespective of their goodness or badness. But if the dog receives food for turning its head to the left but acid for turning its head to the right (or vice versa), then it will very quickly learn to move its head to the left rather than to the right (or vice versa) because of the goodness or badness of the stimulus experiences. The food is a wanted event, and this will determine the quality of the emotion conditioned to a signal for it, but the argument is that the motor-control system has evolved in ways such that efforts are made to make wanted stimuli happen (and unwanted stimuli not happen).
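The contrast drawn in this paragraph can be made concrete with a toy simulation. What follows is only a sketch under stated assumptions: the numerical values, the learning rate, the linear update rule and all the function names are hypothetical, chosen for illustration, and are not taken from any experiment cited here. The first function pairs a buzzer with food or with acid and never consults the goodness or badness of the stimulus. The second rewards left head turns with food and right head turns with acid, and uses the hedonic value of the outcome to select the response.

```python
import random

# Assumed values for illustration only: both food and acid elicit
# salivation as an unconditioned response, but only food is "wanted".
UNCONDITIONED_SALIVATION = {"food": 1.0, "acid": 1.0}
HEDONIC_VALUE = {"food": +1.0, "acid": -1.0}


def buzzer_pairing(us, trials=50, rate=0.1):
    """Stimulus-substitution: the buzzer acquires the response given to `us`.

    The hedonic value of the stimulus is never consulted.
    """
    strength = 0.0
    for _ in range(trials):
        strength += rate * (UNCONDITIONED_SALIVATION[us] - strength)
    return strength


def head_turn_training(trials=200, rate=0.1, seed=0):
    """Response-selection: left turns earn food, right turns earn acid.

    Returns the final probability of turning left.
    """
    rng = random.Random(seed)
    outcome_for = {"left": "food", "right": "acid"}
    p_left = 0.5  # assumed starting point: no initial preference
    for _ in range(trials):
        turn = "left" if rng.random() < p_left else "right"
        payoff = HEDONIC_VALUE[outcome_for[turn]]
        # Pay-off rule: a good outcome makes the turn just made more likely,
        # a bad outcome makes it less likely (and so favours the other turn).
        if (turn == "left") == (payoff > 0):
            target = 1.0
        else:
            target = 0.0
        p_left += rate * (target - p_left)
    return p_left


if __name__ == "__main__":
    print("salivation to buzzer after food pairings:", round(buzzer_pairing("food"), 3))
    print("salivation to buzzer after acid pairings:", round(buzzer_pairing("acid"), 3))
    print("probability of a left head turn:", round(head_turn_training(), 3))
```

In this sketch the buzzer comes to elicit the same amount of salivation whether it is paired with food or with acid, whereas the head turn drifts to the left only because the outcomes are evaluated as good or bad.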
It is hardly necessary to enquire very deeply into the question of why the motor system should have evolved this property rather than the autonomic nervous system, or indeed the sensory parts of the nervous system. It is the job of the sensory system to detect whether something wanted has happened or not, or to detect other useful signals, and it is the job of the autonomic nervous system to adjust the internal environment in accordance with many in-built checks and balances, for which anticipatory reflexes, via stimulus substitution, may often be helpful (e.g. release of digestive and other juices in good time). The motor system must also accomplish a great many reflexive adjustments (blinking, flinching, fight and flight) which can usefully be shifted to anticipatory signals, but, in addition, there are obvious long-term evolutionary advantages in some niches if there is flexibility in behavioural efforts modulated by on-the-spot cost-benefit analyses conducted by some or other version of the pleasure/pain principle in an individual animal. Thus it was assumed in Figure 5.8 that an essential feature of instrumental learning is that representations of wanted events should be intrinsically linked to the performance of motor responses associated with making the wanted event happen. Many of these responses may themselves already be built in, but in species which display trial-and-error learning, the assumption is that relatively arbitrary actions, such as pulling loops of string or pressing levers, can, through experience, be included in the ‘making it happen’ system, either because they become associated with the wanted event or because some simpler mechanism stamps in arbitrary responses to arbitrary stimuli.
That, I hope, makes some sense as an abstract theory, but the previous sections of this chapter will have shown that there are many competing theories. There are two sources of behavioural evidence in favour of a special pay-off principle acting on response selection, that is, in addition to the circumstantial evidence based on the difference between motor and visceral control in the nervous system, and the plain fact of response flexibility in trial-and-error learning. Several standard laboratory responses, such as maze-running and lever-pressing in rats and key-pecking in pigeons, are sufficiently close to reflexive motor patterns in the relevant species to require close investigation. Two experimental procedures have theoretical importance: omission schedules and intermittent reinforcement as in Skinnerian schedules of reinforcement.
The term ‘omission schedule’ often appears to cause confusion, possibly because of the uncertainty of its relationship to ‘schedules of reinforcement’. Although the experimental set-up requires some attention to detail, the theoretical meaning of an omission schedule is simply that of ‘no pay-offs for responding’. The original purpose of these arrangements was to demonstrate that classically conditioned responses such as salivation in dogs were not due to a pay-off principle (Sheffield, 1965), but they may in other circumstances provide evidence that instrumental responses are influenced by their consequences. Hull (1943) had assumed that all conditioning involved some form of pay-off, ultimately via drive reduction. Thus it could be argued that the reason why Pavlov’s dogs salivated is that by accident salivation preceded drive reduction, or that prior salivation had the advantage of making food taste better, and acid taste not so bad (e.g. Perkins, 1968). Now, suppose that we conduct
the standard Pavlovian experiment, in which a buzzer signals the imminence of food to a dog, but with an ‘omission contingency’ which makes it greatly to the advantage of the dog that it should not salivate, because if it does, we do not give it any food, but if it can manage to sit through the buzzer signal without salivating, we do. If Perkins’s hypothesis was correct, that all classically conditioned responses occur because they make life more attractive for the animal concerned (1968, pp. 163—5), then it is quite clear that the dog should never salivate to the buzzer in this experiment, since if ever it does, it gets no food, while there is every reason in terms of the pay-off of food to delay salivation until food is actually presented. But Sheffield’s experiments (1965) with dogs along these lines suggested that the reward factor has very little influence on salivation. Some dogs do indeed wait to salivate until food is actually presented, as Pavlov (1927) suggested, but this is based on temporal discrimination. There is good reason to believe that salivation is an involuntary response, because in some cases dogs continue to salivate frequently to the sound of the buzzer, even though this prevents them getting food, because of the omission procedure (Sheffield, 1965). A rough analogy is asking someone who is very hungry to choose something from an attractive menu, with the catch that if they salivate while making the choice, they get nothing. Not everyone salivates the same amount while reading a menu, but those who salivate a lot would not find it easy to inhibit salivation as a voluntary decision.
Motor movements can also be involuntary, and the omission schedule provided an important test of the hypothesis that the pigeon’s key peck in a standard Skinner box, far from being a pure example of operant or instrumental learning, is influenced by stimulus-pairings with no pay-offs almost as much as salivation. The experimental procedure here is to present hungry pigeons with food (from an illuminated grain hopper) for brief periods — say for 3 seconds once a minute on average. The pigeon key (a round disc a few inches above the grain hopper) is lit for 10 seconds before food presentations. If the bird never pecks the key, food will continue to be presented after the signal, at irregular intervals. If the bird ever pecks the key when the light is on, the light goes out,
and it misses the food it would otherwise have had. There is thus no conceivable advantage to the bird, in terms of the frequency of food presentations, or overall amount of food obtained, in key-pecking behaviours. Notwithstanding this lack of favourable outcomes for responding, pigeons very reliably develop and maintain the habit of pecking at the illuminated key, to the extent that they miss about half the food available (Williams and Williams, 1969; review by Williams, 1981). This is hardly optimal foraging, or optimal anything else, in the microcosm of the autoshaping procedure. It is much more like a reflexive or knee-jerk-like involuntary response, which pigeons find as difficult to inhibit as dogs do salivation. The key pecks show other reflexive properties, since when food is the reinforcer pecks look like pecks at grain (beak open), but when the reinforcer is water, pecks are slower, and look as if the bird is attempting to drink (beak closed). Thus it has been proposed that this behaviour is just a classically conditioned motor response and nothing else (Moore, 1973; Jenkins and Moore, 1973), that possibly pigeons can only learn by classical conditioning (Gray, 1975), or that all instrumental learning is mainly classical conditioning (Bindra, 1968; Spence, 1956).
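The omission contingency just described can be put in a few lines of code. What follows is only a minimal sketch, assuming that each trial is reduced to a single record of whether or not the bird pecked while the key was lit; the function names are hypothetical and the records are illustrative, not data from Williams and Williams (1969).

```python
def food_delivered(pecked_while_lit: bool) -> bool:
    """Omission contingency: a peck at the lit key cancels the food for that trial."""
    return not pecked_while_lit


def food_total(peck_record):
    """Count the food presentations obtained over a sequence of trials."""
    return sum(food_delivered(pecked) for pecked in peck_record)


# Illustrative records only (assumed, not experimental data).
never_pecks = [False] * 20
pecks_on_half = [True, False] * 10

print(food_total(never_pecks))    # 20
print(food_total(pecks_on_half))  # 10
```

On this sketch a bird which pecks on half the trials forfeits half the food, which is roughly the loss reported above; no pattern of pecking can do better than never pecking at all.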
All these proposals are misguided and mistaken. The omission procedure itself can be used to show that other responses, unlike the key peck or salivation, are extremely sensitive to their consequences. For instance, the other Skinner-box standard, lever-pressing by rats, is effectively demolished if rats lose rather than gain food by its performance (Zeiler, 1971; Locurto et al., 1976). This is such a strong result, and one potentially obtainable with many responses in many species, that discussion of it tends to go by default. But if the omission schedule can be used as evidence that some behaviours are insensitive to their consequences, and therefore should be regarded as a product of classical rather than instrumental learning (Sheffield, 1965; Mackintosh, 1974, 1983), then equal weight ought to be given to all the behaviours which would immediately cease if they produced outcomes as unfavourable to the individual animal as that provided by an omission schedule. Moreover, it is not simply that there are other cases, or other species. Many other response patterns in the pigeon,
which are less intrinsically related to food-getting than a directed peck of the beak, can be shaped by food rewards and are sensitive to the omission of reward (e.g. the raising of the head, as studied by Skinner as superstition, and closely examined in an omission schedule by Jenkins, 1977). And the pigeon’s key peck itself, which has been the focus of a good deal of scepticism about the Law of Effect (Seligman, 1970; Herrnstein, 1977) is, surprising as this may seem, influenced both by the food-signalling stimulus-pairing operation which produces reflexive autoshaping and by a host of other procedures which indicate choice and response selection according to pay-offs. Using an ingenious inversion of the usual ‘autoshaping’ procedure, Jenkins (1977) measured the key-pecking of pigeons when their key was lit for 8 seconds as a signal twice a minute on average, but the meaning of the signal was that food would only be delivered if the pigeons did peck the key, with the crucial elaboration that food was presented anyway, with an equal probability in the periods without the signal, whatever the pigeons did. Thus, colloquially, the only message in the signal was ‘Now you must earn the food you otherwise get freely’ (Jenkins, 1977, p. 54). It was not a signal for food, only a signal that the response was necessary for food. The birds learned to peck when the signal was on, and not to peck when it wasn’t, thus demonstrating a grasp of the instrumental necessities of the procedure which goes beyond merely a conditioned anticipation of food (suggesting an association at (1) in Figure 5.8 rather than at (3)).
Choice and effort on intermittent schedules of reinforcement
The causes of behaviour observed under Skinnerian schedules of reinforcement (see pp. 128—37) are not always easy to establish, partly because of the complexity of the procedures, but partly also because the evidence suggests that several alternative psychological processes, notably automatic habits and more goal-sensitive responding, may have roughly equivalent effects on observed behaviour (see pp. 156—63). Several standard results, such as reactions to a discriminative
stimulus which signals that otherwise unavailable rewards may be procured by responding, can be interpreted in terms of the effects of stimulus-pairing, that is, in terms of the psychological process typical of classical conditioning. But there are other phenomena which cannot be so straightforwardly attributed to the classical process, and yet others which directly implicate the pay-off process of instrumental learning. In the former category is the success of ‘multiple schedules’ (Ferster and Skinner, 1957), in which a signal indicates which of several schedules is in operation, so that a green light may produce responding characteristic of a variable-interval schedule, but a red light typical fixed-ratio performance, in the same animal. This would appear to require at least a Thorndikean stimulus-response association, by which different patterns of response were stamped in to different discriminative stimuli.
A further very general finding with schedules of reinforcement has from time to time been put up as a useful practical touchstone by which to distinguish the operation of classical (stimulus-substitution) or instrumental (response-selection) psychological processes. This is the finding that any schedule in which there is only inconsistent and intermittent reinforcement of a given response almost invariably results in an increase in the amount of behaviour per reward during training, and also may spectacularly increase the persistence of behaviour when rewards are withheld, by comparison with the procedure of giving the identical reward consistently every time the response is performed. The effect most of interest is the persistence of unrewarded behaviour (resistance to extinction) since this is less logically necessary, but the degree to which animals will comply with the logical necessities of reward schedules is also a form of experimental evidence. When animals receive rewards only occasionally for successfully running down a straight alley or through a maze, the fact that they carry on running longer than animals rewarded continuously and consistently, when all rewards are stopped, is known as ‘the partial reinforcement effect’ (Amsel, 1962; Mackintosh, 1983), but this result clearly shares all the essential features of what is called intermittent reinforcement in Skinner boxes.
Pavlov (1927, pp. 384ff.) found that if a dog was fed after only every other presentation of a signal or even after every third presentation, then the conditioned reflex of salivating to every signal developed quickly, but that if reinforcement was less frequent than this (food after every fourth buzzer) no conditioned response could be obtained, and the dog appeared simply to ignore the signal (not even demonstrating inhibition). The exact numbers here are probably not significant, but it has usually been claimed (e.g. Kimble, 1961) that partial reinforcement in the stimulus-pairing procedure seriously reduces the effects of this on autonomic conditioned reflexes. It would be helpful if we could say simply that intermittent reinforcement increases the effectiveness of response pay-off processes but decreases responses supported only by stimulus-substitution or other classical processes. There are two reasons why we cannot make things quite that simple. First, there is the probability that classical conditioning involving motor behaviour differs in some ways from that involving autonomic responses. Intermittent pairings sometimes, it is claimed, produce more persistent behaviour in the pigeon autoshaped key-peck paradigm (Boakes, 1977; Mackintosh, 1983). A difficulty here is that the key peck is probably substantially influenced by instrumental effects (Williams, 1981; Jenkins, 1977; see pp. 180–2 above). The more serious but related complication is that there is insufficient agreement about whether intermittency in pairings should generally be regarded as weakening any classical conditioning process. In fact Pearce and Hall (1980) have proposed a theory that classical conditioning in general is more effective when there is some uncertainty attached to the reliability of the signal, and this aspect of the theory has already received considerable experimental support (e.g. Kaye and Pearce, 1984).
My own view, nevertheless, is that there are strong theoretical reasons for assuming that intermittency of associations will have rather different implications for stimulus-substitution and response-selection processes, and that a great deal of the inconsistency of experimental evidence can be circumvented by taking degree of intermittency into account, specifically, in accordance with Pavlov’s original observation, by
distinguishing uncertainty, as in the 50:50 examples, from probable absence, when intermittency is of the order of 10:1 or 100:1. There are few experimental reports I know of in which a response shown to be independent of pay-offs, and therefore an indicator of classical associations, has been found to be either learned or maintained when the signalled event follows the signal with a probability of less than 10 per cent: on the other hand I could go on quoting for the rest of the book experiments in which responses demonstrably under the control of wanted consequences are maintained, if not originally learned, when hundreds of responses are necessary before one is associated with an event of equivalent motivational significance. This may depend partly on the details of foraging strategies, but it is also likely that more general rules about stimulus-signalling and goal-seeking biological functions can be applied. There would be very little point, either for the immediate advantage of an individual dog, or for the longer-term value of the instincts to be built in to its surviving relatives, in stimulus-signalling mechanisms which ensured that dogs carried on salivating at the sound of a buzzer (and expecting food then), if the buzzer was followed by food once every 100 times. It would be a waste of the animal’s mental and physical effort to pay attention to the stimulus (and in this case a waste of saliva). On the other hand, if it is the dog’s own behaviour which is correlated with the receipt of food, even if only occasionally, then, if it is hungry enough, it is worth the repeated efforts of the individual animal if it persists in responding until it gets the food; and animals which have built-in characteristics of stamina and patience will enjoy certain advantages over their fellows in the circumstances of an ungenerous but eventually rewarding environment.
At fairly extreme levels of intermittency, then, the limited evidence available supports the hypothesis that the psychological processes by which an animal correlates its own motor behaviour with wanted outcomes are to some extent different and separate from the processes by which emotional, visceral and reflexive reactions are automatically shifted from one external stimulus to another. This is not to say there are no parallels at all to be seen: Dickinson (1980) was able to sustain a very orderly account of animal learning by speaking only of event-event associations (‘E1–E2 associations’ was the exact notation), assuming that for most purposes it is irrelevant whether the first signalling event is the receipt of an external stimulus or the performance of a motor act. What the two kinds of association have in common will be left until chapter 8; here I wish to stress my own view that whatever the kinds of association may have in common, it is arguable that the variables of volition, effort and personal involvement are related to a ‘making-it-happen’ system which is involved in the response-consequence association in some forms of instrumental learning.
Lea (1979, 1984a), among others, has drawn attention to the possibility that all the principles of learning examined in the abstract in the laboratory evolved under the constraint that they should produce optimal behaviour, in particular optimal foraging for food (Krebs and Davies, 1983; Kamil and Sargent, 1981; Maynard-Smith, 1984), in a given species’ natural environment. This, Lea suggests, may apply both to specializations (waiting and pouncing in cats and other ambush hunters; long continued unexciting searches in grazers and browsers) and to more general principles such as the distinction between pay-off-sensitive motor behaviour and the more automatic and reflexive anticipatory response shifts referred to above. Intermediate principles, such as that of paying most attention to stimuli that are fairly closely associated with significant events, but in a way not yet totally predictable (Pearce and Hall, 1980) are also presumably only confirmable in laboratory experiments because they serve a theoretically important biological function. The problem is that the discovery of what should be a mathematically important biological or psychological function is partly a matter of the ingenuity of the theorist (Pyke et al., 1977). It is rare that what is claimed to be an optimal behaviour is actually shown to be so by naturalistic experiment instead of mathematical simulation. However, there is some agreement on what kinds of foraging are generally optimal, and I mention this here because of the strong possibility that the assessment of pay-offs held to be the defining feature of instrumental conditioning is in some cases theoretically necessary for individuals to adjust their foraging behaviour in natural environments.
One principle of foraging that has been found to roughly match observed behaviour applies to the simplified case of the choice between two alternative food items — whether animal or vegetable. If one is better than the other (more nutritious), then the animal ought to pick it. This sounds completely obvious, but it means that a given species must be able to detect the difference, possibly making use of post-ingestional effects, and then use sight or taste cues to make the choices. It may be that the two kinds of items are in different places, require different search strategies and so on, and in this case the choice has become more complicated and the learning-what-is-where task — forming Tolman’s cognitive maps — may be needed. However, choosing the better of two alternatives can be studied from another theoretical angle, since there is the problem of when to give up on the better alternative if it becomes too scarce or difficult to get. The easiest rule to specify is that, provided there is no cost to the business of taking a food item when it is available, then the best item should always be taken when it is there, and the worse of two items should not be taken if there are plenty of the better kind, but should be taken with more and more enthusiasm as the better kind becomes scarce. The only surprising thing about this rule is that how much there is around of the less-preferred item is irrelevant — it is only the availability of the more-preferred item that really matters.
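One conventional way of writing this rule down, which is an assumption brought in here rather than anything stated in the text, is the two-item diet model of the optimal foraging literature. In the sketch below it is assumed that the better item is encountered at rate `lam1`, yields energy `e1` and takes handling time `h1`, and that the worse item yields `e2` in handling time `h2`; all of these names are hypothetical. Handling time is itself a kind of cost, so the sketch is rather less simple than the rule given above.

```python
def rate_taking_only_better(lam1, e1, h1):
    """Long-run energy intake per unit time if only the better item is taken.

    Assumed model: items are met at rate lam1 while searching, and each
    one taken costs h1 units of handling time during which no search occurs.
    """
    return lam1 * e1 / (1.0 + lam1 * h1)


def should_take_worse(lam1, e1, h1, e2, h2):
    """Take the worse item only if its yield per unit handling time beats
    what the forager would earn by ignoring it.

    The encounter rate of the worse item is deliberately not a parameter:
    on this model it has no bearing on the decision.
    """
    return e2 / h2 > rate_taking_only_better(lam1, e1, h1)


# Better item plentiful: the worse item should be passed over.
print(should_take_worse(lam1=1.0, e1=10.0, h1=1.0, e2=4.0, h2=1.0))  # False
# Better item scarce: the worse item is now worth taking.
print(should_take_worse(lam1=0.1, e1=10.0, h1=1.0, e2=4.0, h2=1.0))  # True
```

The point of the design is that `should_take_worse` has no argument for the abundance of the less-preferred item, which is the surprising feature of the rule noted above.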
Shorebirds (redshank) choosing to eat shrimp rather than worms (Goss-Custard, 1977), pigeons choosing immediate rather than delayed rewards in a Skinner box (Lea, 1979) and rats with a similar task (Collier and Rovee-Collier, 1981) all follow this principle roughly, but not exactly. The exactness is not at issue; the point here is that some system of choosing is used, and motor behaviours of varying degrees of complexity and novelty are required in the making of the choices (e.g. Krebs et al., 1978). This provides a biological rationale for the existence of psychological processes by which behaviours are selected according to a fairly sophisticated assessment of differing reward outcomes. That laboratory animals are capable of modulating their learned behavioural choices in this general kind of way is made obvious by the research on concurrent interval schedules discussed above (pp. 133–7) among other results, whatever the precise equation used to fit the data. It seems extremely likely that both classical and instrumental processes of association would need to be appealed to in any explanation for all of these phenomena of choice between reward outcomes, and very unlikely that either one or the other alone would be sufficient. The assessment of pay-offs as an instrumental reward process would appear to be one of the most obvious requirements for learned optimal foraging.
Molar and molecular correlations between behaviour and reward
There should be a good measure of agreement that learned behaviours are somehow correlated with rewarding outcomes; this after all is what Thorndike (1898) started off with. There is as yet likely to be little agreement on the exact mechanisms involved, but it is worth mentioning one area of argument. This concerns the scale of the behavioural units which are susceptible to instrumental reinforcement. The theoretical choice is between small-scale or molecular units (almost of individual muscle twitches) and larger-scale extended actions or response strategies, referred to, originally by Tolman (1932) and Hull (1943), as ‘molar’ behaviours. The first question is whether the larger chunks of useful activity can be regarded as aggregates of muscle contractions, or whether, alternatively, the cognitive representations of actions are relatively independent of the small-scale co-ordination of individual muscles which are needed to put them into effect. The conclusion reached in the last chapter (e.g. pp. 140–5) has been that Tolman was right about muscle-twitches, and that, especially in spatial learning, what is actually learned consists partly of the macro-activities of ‘getting to and from’ rather than specific bodily movements.
In the next chapter, under the heading of ‘Learned helplessness’, we shall see that Seligman and others have proposed that the correlation between what an animal does and the things which happen to it as a consequence brings about not
only particular goal-directed actions, but also a more general belief that doing things is useful. This comes close to an attributional theory of learning which includes various kinds of beliefs about the utility of trying to make things happen, and moods of optimism and pessimism about life in general (Miller and Norman, 1979; Seligman and Weiss, 1980). Both Alloy and Tabachnik (1984) and Dickinson (1985) have proposed that typical learning-by-pay-off tasks in the animal laboratory, as well as most of human psychology, involve initial beliefs and preconceptions as to what causes what, modified according to fairly global assessments of the validity of these beliefs in practice. The particular theoretical tack taken in these proposals is that modification of beliefs or expectancies occurs as a product of the ‘assessment of covariation’ between one category of events and another, such assessments having fallibility as one of their defining characteristics, in both people and animals. Dickinson (1985) has however been able to make fairly direct experimental tests of hypotheses derived by applying this speculation to the performance of rats on schedules of reinforcement, by arguing that it is experienced global correlations between variations in responding and variation in rewards received which encourage goal-sensitive behaviour.
When an animal is well-trained on any particular schedule, both its behaviour and the consequent delivery of reward occur with great consistency, and since there is thus no longer any opportunity for the animal to assess whether changes in its behaviour lead to changes in the receipt of rewards, responding becomes a matter of habit, and can be shown to be insensitive to the de-valuing of rewards (see pp. 156–63).
Although there is strong evidence that under some circumstances animals may develop molar cognitions about what general sort of behaviour is worth making for what expected source of reward, Dickinson’s own theory includes an alternative mechanism for the maintenance of consistently rewarded habits. It is therefore no threat to his theory that experimental evidence supports the view that relatively molecular and Thorndikean mechanisms of stamping in small-scale response characteristics appear to be responsible for a good deal of the behaviour exhibited when stereotyped behaviour is established using schedules of reinforcement, at least in the case where pigeons peck the keys in Skinner boxes for food rewards. The difference between behaviour on variable ratio and variable interval schedules has already been referred to (pp. 132–3), and is relevant here because when rewards are given for a certain number of responses, even when the number varies randomly, it is clear that there is more correlation between rate of response and rate of getting rewards than there is on an equivalent interval schedule, since whenever rate of response varies this will directly change reward rate. By comparison, interval schedules specifically arrange that there is little correlation between reward rate and responding, since the delivery of rewards is determined primarily by the value of the interval, and remains roughly the same over wide variations in actual responding. Despite this overt characteristic of interval schedules, many explanations of the ‘matching law’ observed for concurrent interval schedules have emphasized that there could be some mechanism by which response efforts are adjusted according to associated rewards received, and that assessment of correlations between response rates and reinforcements is likely to be a part of such a mechanism (Baum, 1981; Rachlin, 1978; Herrnstein, 1970).
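The contrast between the two kinds of schedule can be illustrated by a rough calculation of reward rate as a function of response rate. The sketch below rests on assumptions not made in the text: the ratio schedule rewards each response with a constant probability of one in `mean_ratio`, the interval schedule sets up rewards at random with a mean of `mean_interval` minutes, and the animal responds at random moments at an average of `response_rate` per minute. The function names are hypothetical.

```python
def reward_rate_ratio(response_rate, mean_ratio):
    """Ratio schedule: rewards per minute rise in direct proportion to responding."""
    return response_rate / mean_ratio


def reward_rate_interval(response_rate, mean_interval):
    """Interval schedule: each reward takes the programmed interval to set up,
    plus the average wait (1 / response_rate) until the next response collects it.
    """
    return 1.0 / (mean_interval + 1.0 / response_rate)


# Doubling responding from 60 to 120 per minute:
print(reward_rate_ratio(60, 30), reward_rate_ratio(120, 30))  # 2.0 4.0
print(round(reward_rate_interval(60, 1.0), 3),
      round(reward_rate_interval(120, 1.0), 3))               # 0.984 0.992
```

Under these assumptions a doubling of response rate doubles the rewards on the ratio schedule but changes them by less than one per cent on the interval schedule, which is the lack of correlation referred to above.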
The idea that variations in response-rate are assessed by the animal according to variations in reinforcement rate that are thus brought about is very definitely a ‘molar’ factor, implying sophisticated sensitivities to average values without necessarily specifying detailed cognitive mechanisms of calculation. An alternative, much more molecular explanation for the effects of reinforcement schedules is that an important element of what is learned is the time that should elapse between successive responses (inter-response time or IRT; Ferster and Skinner, 1957; Anger, 1956; Platt, 1979; Shimp, 1969). In this case sophisticated calculations must often be performed by the human theorist to discover which reinforcement schedule should do what, but it is straightforward in principle that under variable-interval schedules, the longer the time between two responses, the more likely it is that the second response will be reinforced (since more of the interval
will have elapsed). This does not apply to variable-ratio schedules. Thus a molecular explanation for the fact that variable-interval schedules produce slower response rates than variable ratios with equivalent frequencies of obtained reward is that longer inter-response times are selectively stamped in under the interval schedule. Theories such as this soon become elaborate and mathematical. However, a number of extremely elegant experimental comparisons, plus a computer simulation, recently conducted by Peele et al. (1984), suggest very strongly that the selective effect of reinforcement on particular bands of inter-response times is indeed observed after long Skinner-box training, and is the main factor responsible for the different response rates observed under variable-interval and variable-ratio schedules.
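The dependence of reinforcement on inter-response time can be made concrete under one simplifying assumption, which is mine and not that of the authors cited: that the interval schedule sets up rewards at random moments with a constant probability per unit time (so that the waiting time is exponentially distributed), and that the ratio schedule rewards each response with a constant probability. The function and parameter names below are hypothetical.

```python
import math


def p_reinforced_interval(irt, mean_interval):
    """Chance that a response is reinforced on the assumed random-interval
    schedule, given the time (irt) since the previous response.

    The longer the pause, the more likely a reward has been set up meanwhile.
    """
    return 1.0 - math.exp(-irt / mean_interval)


def p_reinforced_ratio(irt, mean_ratio):
    """Chance that a response is reinforced on the assumed random-ratio schedule.

    The irt argument is accepted only for symmetry; it has no effect.
    """
    return 1.0 / mean_ratio


for irt in (1.0, 5.0, 20.0):  # seconds since the last response
    print(irt, p_reinforced_interval(irt, 60.0), p_reinforced_ratio(irt, 30))
```

On these assumptions the probability of reinforcement rises steadily with the length of the pause under the interval schedule and stays constant under the ratio schedule, so that only the former could selectively strengthen long inter-response times.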
There is no reason to object to the experimental reliability of this finding. But there is every reason to resist the implication, drawn by Peele et al. (1984), that similar molecular explanations of instrumental conditioning will provide satisfactory accounts of all goal-directed learning, or even of all phenomena observable for the food-reinforced key-pecking of the pigeon, which is a narrow and unrepresentative paradigm, even though very widely used. The molecular explanation, that rewards stamp in the details of a preceding behaviour, is the mechanical version of the Law of Effect originated by Thorndike. Much of this chapter has been devoted to the evidence against it, for example in spatial learning and in goal-sensitive actions. The tension between the molar and molecular explanations for pay-off-controlled activities should therefore be resolved, not by choosing one or the other, but by a form of attribution which is apparently deeply uncongenial to the human intellect: by saying that we can attribute instrumental learning sometimes to one process, and at other times to another.
Levels and types of association and brain mechanisms in conditioning
Theories of learning have not always implied very much about the real neural mechanisms which many of us assume make the theories possible. One of Aristotle’s most glaring bloomers
was his hypothesis that the brain had no psychological functions (since he thought it was a sort of radiator for cooling the blood); he rashly did not hedge his bets on this one and declared that the brain had no more to do with mental processes than a piece of excrement. In modern times few theorists have done quite so badly on brain function, but several learning theorists have managed to ignore the brain altogether: Skinner as a matter of principle and Tolman and Hull by default. The early attempts of Pavlov and Thorndike to tie down theories of learning to putative neural mechanisms did not gain them many admirers. The most successful and influential theory of brain processes in learning is probably that of Hebb (1949), followed closely by the not dissimilar model of Konorski (1967), and these have influenced Bindra (1976) and Wagner (1976, 1978), while remaining rather remote from the theoretical comparisons of conditioning processes which have been the concern of this chapter. I shall follow the mainstream tradition of saying little or nothing about the details of the brain processes involved in learning, while expressing the hope that more will be known at some point in the distant future: however, I wish to refer briefly to brain anatomy for the purpose of buttressing my speculative conclusion to this chapter, which is predictably similar to that reached in previous chapters, since it takes the form of saying that learning involves quite different levels of representation of whatever it is that is learned. In addition, in this chapter, I have tended to stress that there are several different types of possible learned association, and that those types by which motor behaviours are connected to motivational pay-offs are what differentiate instrumental processes from classical processes, while accepting, of course, that both stimulus-pairing and response-rewarding laboratory procedures typically arouse impure mixtures of these internal processes.
Brain structures and psychological processes
The vertebrate brain consists of several identifiable parts, each containing thousands or millions of neurons. Readers unfamiliar with this fact should consult a suitable textbook
of physiological psychology or, for more directly relevant discussions, Walker (1983a), Macphail (1982) or Yeo (1979). Not much more detail is needed for present purposes than was regarded as well established by Herrick or Bayliss in 1918. Primarily sensory and primarily motor tracts go up the outside and down the inside of the spinal cord. Stimulus-response reflexes of some degree of complexity, especially in lower vertebrates, are accomplished within the spinal cord itself, but its main function is clearly to transmit information to and from the higher centres of the brain. Just at the top of the spinal cord several cranial nerves go in and out of the medulla, first of all to various cranial nuclei: these control both input and output for the internal organs and some parts of the sensory and motor systems and therefore they can be involved in learned associations, as well as relaying to other parts of the brain. At the back of the brain is a large crumpled lump with grey matter on the outside and large white fibre tracts on the inside: the cerebellum. Although there is no agreement as to exactly what happens inside it, no one doubts that it plays a special part in balance and in fast co-ordinated muscle movements, including both learned and unlearned motor skills. Cerebellar learning is likely to be both anatomically and functionally separate from learning in other parts of the brain. Within the cerebellum, there are specialized regions dealing with different parts of the body: hindleg, foreleg, face and head, etc. (The left and right sides of the body are done on the left and right of the cerebellum, respectively.)
There are a great many known pathways in which sensory and motor information is transmitted up and down from the spinal cord and cerebellum through the mid-brain, thalamus and basal ganglia, to the top and front of the brain, where lie what are often referred to as the ‘highest centres’, located in mammals in the cerebral cortex. And there are other bits, connected by known neural tracts, which are identified with motivation and emotion, and happen to be called the limbic system. Best known of these are the hippocampus, which responds to novelty and therefore is expected to be needed for memory and for learned cognitive maps; and the hypothalamus, which controls the pituitary master-gland, and thus the autonomic nervous system, and has parts which seem specialized for hunger, sexual excitement and thirst, and thus may be regarded as necessary for drive and incentive.
There is not necessarily a one-to-one correspondence between brain anatomy and brain function, and it is true that detailed theories of brain function are not very advanced. However, what is already known and certain is sufficient to support the following crudest possible deduction: there is more than one part of the brain, and therefore there might be more than one kind of learning, depending on what or which part is most closely involved in it. A somewhat less crude theory of brain function was put forward by the father of British neurology, Hughlings Jackson (Jackson, 1888/1931; see Walker, 1986), which has yet to be contradicted, and which is one of the reasons for assuming that associated events may occur at different levels of representation. Jackson’s theory was that there was a hierarchy of representation in the brain, particularly representations of movements, since localized epileptic movements were one of his medical specializations. The hierarchy was specifically related to a vague notion of evolution, and specifically related to the question of the voluntary control of actions. An illustrative example is the case of a patient who had lost voluntary control of tongue movement and thus could not comply with the standard doctor’s instruction of ‘put out your tongue’, but whose tongue was capable of efficient, but involuntary, licking of the lips. Thus Jackson believed that there were at least three, and possibly more, separate brain representations of the tongue: the lowest being involved with reflexive movements, such as those necessary in swallowing and lip-licking, and physically based mainly on the cranial nuclei; the middle being required for more complicated movements, including those which need to be learned for speech; and the top being needed for delicate voluntary control.
Without making any unnecessarily firm commitments to Jackson’s or any other detailed account of brain function, it is surely very safe to assume that bodily movements may be initiated or influenced either by the spinal cord, or by the cerebellum, or by the basal ganglia, or by various higher motor centres of the cerebral hemispheres, which are in the frontal cerebral cortex of mammals, which have rough equivalents in the cerebral hemispheres of birds and reptiles, and possibly though not definitely very rough equivalents indeed in the cerebral hemispheres of the brains of frogs and fish (Macphail, 1982; Walker, 1983a). If this assumption is correct, it is almost equally safe to take it for granted that there are psychological consequences of the anatomical differentiation, and that behaviours depending on one level of representation will be demonstrably different from behaviours characteristic of a different level. Then spinal reflexes would be different from voluntary actions, and conditioned digestive reflexes would be different from learned and novel methods of goal-seeking, which gives one kind of biological basis to the same sort of conclusion already reached about classical and instrumental conditioning.
The types of learned association — conclusions
It is now possible to summarize the conclusions of this chapter in terms of the biological functions and possible anatomical implications of learned associations, categorized according to the traditional types of S-S, S-R, and R-S, but with reservations about the adequacy of the typology.
S-S associations and learning about stimuli
This is useful to begin with, since it calls into question not only the typology but also the concept of an association that only comes in pairs. Stimulus-stimulus associations can be made in principle without any response, and therefore are obvious candidates for behaviourally silent or latent learning (p. 150). In these cases learning may be presumed to take place simply by exposure to stimuli. This covers also perceptual learning (p. 41) and of course the learning to recognize stimuli which is the theoretical basis of many kinds of habituation (p. 39). In both these cases it would be better to explicitly rule out the implication that pairing two separate stimuli engages utterly different brain processes from exposure to one or three; that is clearly most unlikely, and S-S is better understood as covering learned relations between, within and about sensed events. For novelty, spatial learning, and certain kinds of memory for stimuli, it has often been suggested that the hippocampus of the mammalian brain plays a special part (see chapter 10). There are limits to the usefulness of this kind of assertion, but no one makes alternative suggestions along the lines of memory being in the spinal cord or cerebellum, and the relation of the hippocampus to the rest of the limbic system prompts a number of interesting speculations about how memory interacts with motivation (e.g. Gray, 1982). Even in the most rudimentary neural systems currently studied, associations between individual sensory nerves at synapses suggest S-S associations at a drastically simpler level of representation (Hawkins et al., 1983; see p. 99), although these possibly ought to be included in the next sub-category.
Stimulus-reinforcer reflexive shifts (S-S*)
(The asterisk indicates that the stimulus or response which it accompanies has motivational significance.)
It is a moot question whether any stimulus is without any motivational effects at all. But there is clearly a difference of degree at least between familiar geographical and social surroundings and the commonly used motivationally significant stimuli or ‘reinforcers’ of food for a hungry animal and electric shocks for a fearful one. The pairing of a relatively neutral signal with another event of some moment, which elicits external or internal unconditioned responses, is common to the various kinds of Pavlovian conditioning (chapter 3) and to the very similar, perhaps identical, effects which occur when similar events are used as goals for instrumental responding. The theory and description of what results from these stimulus-pairings is ‘stimulus-substitution’ (p. 100). The essential feature of this is that it is relatively automatic and involuntary, and, as argued in this chapter, unaffected by the pay-offs, that is, the evaluated costs or benefits, of the reinforcing event. By contrast, the hedonic properties of the reinforcing event may be one of the attributes which are transmitted to the signal: the hedonic shift (Garcia et al., 1977a; see chapter 7). In this case reactions of the limbic motivational system compose part of the set of attributes shifted to a new stimulus. These can hardly be put as a necessary part of stimulus-pairing effects if the same terms are to be used for animals such as the sea-slug which do not have a limbic system. Hawkins et al. (1983) suggest that there is in this animal a set of diffusely projecting ‘facilitator neurons’ which for this purpose is equivalent to the vertebrate limbic system. This may be so, but again it is unduly limiting to suppose that the phenomena of stimulus-substitution can only be observed with very strong defensive reflexes, or with strong motivation in any species, since it is more likely that stimulus-reinforcer pairings blend into pairings of much less powerful stimuli (e.g. second-order conditioning; see p. 86).
Although S-R associations have been under a cloud for several years, due to disillusion with theories in which they were the main or indeed the only component, there can be no question of leaving them out of learning theory altogether. Everyday habits, good and bad, are often describable as associations between stimulus and response, and learned skills such as swimming, driving, or riding a bicycle (and even social skills) patently require some sort of automatic pilot, which ensures that the right responses are made at the right times. The main point of doubt is whether the formation of habits is governed by strong motivational stimuli via a backwards stamping-in principle, as suggested by Thorndike, or whether some other more complex mechanism ensures the performance of responses in the first place, habits being then acquired merely by repetition (e.g. Spence, 1956). It seems wisest to keep both possibilities open, since Dickinson (1980, 1985), among others, has argued strongly that there is often a shift from initial voluntary control to repeated habits, and the mechanism of repetition without massive motivational feedback seems important for human skills; but the less complex stamping-in device would appear to be available to the restricted neural systems, such as those possessed by decorticate rabbits, decerebrate duck embryos, or decapitated cockroaches, which selectively make responses with favourable outcomes (p. 158). These preparations should not be regarded as identical; indeed it is arguable that decorticate rabbits or rodents, since they retain the basal ganglia and limbic system and other subcortical structures (which certainly are involved with motivation and the control of movement, and are even said by some to contain the seat of self-awareness: Penfield and Roberts, 1959; Creutzfeldt, 1979), should possess the essential equipment for voluntary goal-directed responses, even if lacking in much sensory and motor information processing capacity. There may in fact be more than one form of stamping-in, or immediate confirmation mechanism. Anatomical evidence suggests that the cerebellum has detailed wiring for its own stamping-in of input and output connections, which may be independent of the motivational drive/satisfaction devices of the limbic system. The limbic system itself is likely to have its most direct effects on forebrain structures (e.g. the sensory thalamus and the motor basal ganglia), and the vital centres lower down in the mid-brain and brainstem have their own homeostatic imperatives which may influence reflexive associations. Eventually, for example in the mammalian spinal cord, one gets down to a level which has no obvious mechanism for stamping in by pay-offs, but which may be capable of anticipatory response shifts between closely related sensory inputs. The point is that there is a clear difference between the biological functions of the bottom and top anatomical levels of a mammal’s brain, and an equally clear neurological disparity between the brain of a monkey and the ganglia of a cockroach or slug.
It is merely the beginnings of any comprehensive biological account of learning to distinguish between reflexive habits, which are independent of knowledge of the goals which they may nevertheless achieve, and the possibilities of more considered, flexible and delicate adjustments of means to ends which have been offered in the course of brain evolution.
R-S* associations: responses made as means to ends
Although it is conventional to emphasize that instrumental learning is partly a matter of association between responses and their practical consequences, it is useful theoretically to reverse the order here, and say that goal-directed behaviours will ensue if, when a given end is desired, the means by which it can be obtained are activated. Thus the order of the association can be that the wanted goal comes first, and the necessary response second, and this could be regarded as the essential minimum of a voluntary behaviour: that a representation of the consequences of responding directs the performance of the response. In this sense means-to-ends associations differ from stimulus-response impulses mainly in that an internal representation of the goal is part of the stimulus. It is possible, however, that some species may be capable of acquiring if-then knowledge in a rather more elaborate form, in which what is learned is that if a certain response is made, such and such consequences follow. In principle, this could be a ‘declarative representation’ formed and utilized without the response ever having to be made. But for this kind of representation to be practically useful, it would need to be attached in some way to response instructions, to ensure that when the consequences of responding assume a high priority, so also does response performance. The conclusion from experiments on latent learning (p. 150) is that this form of representation is normally available to laboratory rats, for the purposes of maze learning.
The role of motivational factors in means-ends learning is bound to be strong in the context of response performance, almost by definition, and thus the psychology of drives or the mechanisms of wanting interacts with the learning of this kind of association. It is arguable that willed actions of any kind are based on the biological mechanisms which evolved to ensure that needed goals are achieved, by a built-in set of instructions to perform activities associated with gaining those goals. There is little doubt that the limbic system of higher vertebrate brains can function as a device of this kind. Mammals rapidly learn to perform an artificial response, such as pressing a lever, if this action switches on electrical current to an electrode strategically placed in the limbic system (e.g. in the medial forebrain bundle between the septal area and the hypothalamus), but the receipt of electrical stimulation at the same location elicits goal-oriented behaviours (eating of food, copulation) and, most crucially, also arouses motivational states which are sufficient to promote the learning of new arbitrary responses to gain access to food, or to the opportunity for copulation (Olds, 1961; Caggiula, 1970; Mendelson, 1966; Roberts et al., 1967). Thus this type of instrumental learning is not so much a stamping-in of habits as a form of wish fulfilment.
R-R* associations: working rewarded by doing
At this point it needs to be acknowledged that a goal can be and often is the opportunity to do something, rather than the passive receipt of external stimulation. Glickman and Schiff (1967) and Herrnstein (1977) have stressed the importance of this for instinctive behaviours in animals, and Premack’s laboratory demonstrations that one activity can reward another (Premack, 1965) are often acknowledged by reference to the ‘Premack principle’. A mouse will press a lever so that it can run in a wheel, but the popularity of the Premack principle is no doubt due in part to the ubiquity, and perhaps the effectiveness, of instructions to children in the form ‘first do your homework, then you can ride your bicycle’.
R-S associations: feedback and latent expectancies
It would be a strong restriction to imply that means-ends relations could only be learned when the ends are being actively desired. One of the demonstrations of latent learning had rats pressing a lever for water which happened to be salty, then going back to this when they became salt-deprived (Krieckhaus and Wolf, 1968; see p. 152). As a form of latent expectancy learning, or anticipation of feedback for responses where the feedback is informative without being strongly motivating, especially in motor skills, associations between responses and subsequent stimulation should be common. There are of course a number of built-in physiological mechanisms for supplying this kind of information, in proprioception, mechanisms of balance in the inner ear, and in the visual control of movement and posture (eye-hand co-ordination, also evident in the difficulty of standing on one leg with closed eyes).
R-R associations: response chains and skills
Very often, chains of responses involve a great deal of internal and external sensory feedback, and therefore can be said to include S-R and R-S associations. The function of the feedback is however to produce integrated sequences of movements: what is referred to as ‘a response’ may in some experiments be the twitch of a single muscle, but typically will require an unseen and unremarked feat of muscular coordination. Much of this is innately programmed, but most human skills demonstrate the possibility of complex learning of motor sequences. The control of some sequences without re-afference or proprioceptive feedback from response completion is evident in the playing of musical arpeggios, when this is done too fast for there to be any time for neural information to be sent back and forth. With prolonged training, the minimal form of response habits may dispense with as much feedback as possible: the early experiments by Carr and Watson (1908) on maze-running in rats were continued until the animals tried to run in the same pattern when the maze was lengthened or shortened. On the other hand, finely tuned perceptual and motor skills, such as playing arpeggios, or fast bowling, may require daily practice involving constant attention to feedback.
Conclusion: relation of classical and instrumental conditioning to other forms of learning
Under the heading of ‘varieties of memory’, Oakley (1983) has placed the kinds of learning covered in this chapter about half-way up on a continuum with genetically determined instincts and reflexes and learning in the course of growth below, and culturally acquired information, such as may be obtained through higher education, above. This is a wide-ranging and comprehensive account of learning, which gives classical and instrumental conditioning a context as a bridge between biology on the one hand and culture on the other.
However, even in this context, and perhaps especially in this context, it is important to recognize that what is simply described as conditioning covers a considerable variety of possible psychological processes. The general rule of thumb which applies to habituation to a single stimulus, anticipatory shifts in given responses due to stimulus pairings, and the development of new behaviours by response selection, is that these procedures correspond to biological problems for which there are vastly different psychological solutions. I have compared these solutions on a scale of levels of representation of the procedural events, which vary from responses in single neurons to cognitive representation and descriptions of perceived events and purposive actions which defy reductions to simpler levels. Cutting across these levels of representation there are however types of process which can be categorized in terms of what kinds of association are learned. Most simply, in stimulus-pairings, many of the behavioural effects of classical conditioning can be interpreted as the automatic and involuntary transfer of the properties of one stimulus to another, whereas in response-reward procedures, although there is often opportunity for this kind of stimulus-substitution effect, there is also the chance of an effortful and goal-directed response-selection process, as a rudimentary form of voluntary behaviour. Not surprisingly, the system of motivated effort is more evident in motor movements than in visceral responses, but the motor system also takes part in more automatic habits and skills, and the sensory systems have their own specializations of information-gathering, including the formation of associations between stimuli. Conditioning experiments may thus engage biologically useful processes of perceptual learning, motivated goal-seeking, and automatic habit formation.
In later chapters I shall examine to what extent these processes overlap with mechanisms of perception and memory, and whether these more complex psychological processes can be considered as further elaborations of principles already inherent in the more basic biological solutions to the functional problems of anticipating important events, automatically ignoring irrelevant events, and discovering new ways of bringing about needed goals.