Discrimination learning: reinforcement for paying attention



`Telling the difference' between stimuli is a prerequisite for most other kinds of behaviour. Habituation (see p. 20) allows us in a sense to `tell the difference' between familiar and unfamiliar stimuli, and it is necessary to take notice of positive or negative reinforcers before any kind of conditioning can begin. In these cases however, we tell the difference without any special effort. Habituation, by definition, does not demand our attention, and an immediate reaction to attractive or aversive stimuli is something that we take for granted. Discrimination learning applies to distinctions that we do not make to start with, and might never make but for an adequate programme of training or experience. This description would cover most academic and manual skills, but as the reader may by now expect, the theoretical issues I shall deal with revolve around animal experiments. Discrimination has long been a focus for theoretical analysis, and it is worth looking at some of the behavioural techniques and results divorced from their possible implications in the first instance.

Methods and terms

In both classical and operant conditioning, the occurrence of reinforcement with one stimulus but not with another is likely to have the effect of intensifying responses to the reinforced stimulus and eroding responses to the other one. In either case the reinforced signal may be referred to as the positive stimulus or S+, and the non-reinforced cue as the negative stimulus or S-. A positive stimulus of this kind, or a stimulus which has come to act as a sign or cue for a particular response, is said to be a discriminative stimulus. In its most obvious form, the discriminative stimulus may call for action in a GO:NO-GO or successive discrimination. Waiting at traffic lights is a typical GO:NO-GO, with the green light a discriminative stimulus for pulling away, and the red light the S-. This example is a good illustration of a frequent characteristic of GO:NO-GO discriminations, which is that waiting around in the NO-GO part is annoying if not actually aversive. To continue with the motoring metaphor, another form of discrimination requires choice between alternatives, as in deciding which direction to take at a junction or choice-point. For this choice, or simultaneous discrimination, there is no waiting period required, although if you don't know the way you might take some time to make up your mind. In laboratory tests of simultaneous discrimination, subjects would be presented with two or more stimuli, and asked to choose the `correct' one either by naming it or pointing to it, or for animals by touching the correct stimulus or making some similar indicative response. The simultaneous presentation of `right' and `wrong' stimuli allows for detailed comparisons of the stimuli to be made and for relational effects to be more prominent. Relational discrimination means responding to `the bigger' or `the brighter' rather than to a particular size or brightness. In the GO:NO-GO or successive discrimination method, only one stimulus is given at a time, and so comparisons are more difficult. It is much easier to detect a forged five-pound note if you have a real one to compare it with.

Another method involving comparisons is the matching-to-sample procedure, which is usually given as a choice between two alternatives on either side of a centre sample, but can be much more complicated if for instance children are asked to `point to a letter like this one'. Pigeons might be given three buttons to look at, and rewarded if they peck whichever button is `the same as' the middle one. Depending on the number of samples and choices given, there is much more to learn in a matching-to-sample task than in a single `right or wrong?' discrimination.

Stimulus generalization and stimulus control. Measuring the results of discrimination learning requires finding out if certain stimuli make any difference to behaviour. This is not always as easy as it seems, since a subject who has apparently learned one discrimination may in fact have achieved his success by an alternative method. `Clever Hans' is a nineteenth-century horse now famous because he was supposed to be able to solve arithmetic problems by tapping out the correct answer, but was found to depend on subtle cues from his trainer. In such cases we say that the question to be answered is `which stimulus is controlling the behaviour?' Is it the ostensible signal or some other clue to the correct answer? An alternative way of putting the question is to ask `what stimulus is the subject attending to?' The answer has to be given by carefully isolating the critical feature or features of the environment which are being picked up. In most cases the technique is to alter some aspect of the stimulus and note any changes in behaviour.

Sometimes it is valuable to make a series of systematic alterations in a discriminative stimulus to judge which feature (or dimension) is controlling responses. If a pigeon has learned to peck at a green square and we wish to know whether `green' or `position of square' is important, we could present the bird with red, orange, green, yellow and blue squares, in five different positions. We should probably find that the position of the square made no difference, but the curve of number of responses against colour was shaped like figure 8.1, with a dropping off of responses as stimuli become less similar on the dimension of wavelength. Figure 8.1 shows steep gradients of stimulus generalization. Provided we are sure the stimuli did not differ in brightness or any other indirect clues, we can say that the steep gradient shows stimulus control for wavelength or, more loosely, `the pigeon was attending to colour'.

Fig. 8.1 Stimulus generalization gradients. These examples could be obtained by reinforcing a pigeon's responses to a particular colour and then showing a number of colours at random, without reinforcement (After Guttman and Kalish, 1956).
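The logic of the generalization test described above can be sketched in a few lines of Python. This is only an illustration: the wavelengths, positions and peck counts are invented, and the `control_index` measure is a hypothetical convenience for comparing gradients, not a statistic taken from Guttman and Kalish.

```python
from collections import defaultdict

# Invented test data: five wavelengths (nm) shown in five positions.
# 550 nm stands in for the trained `green'; all peck counts are hypothetical.
WAVELENGTHS_NM = [490, 520, 550, 580, 610]
POSITIONS = [1, 2, 3, 4, 5]
PECKS_BY_WAVELENGTH = {490: 10, 520: 40, 550: 100, 580: 40, 610: 10}

# One tally per (wavelength, position) pair; here position has no effect at all.
counts = {(w, p): PECKS_BY_WAVELENGTH[w] for w in WAVELENGTHS_NM for p in POSITIONS}


def marginal_totals(counts, axis):
    """Sum responses over the other dimension (axis 0 = wavelength, 1 = position)."""
    totals = defaultdict(int)
    for key, n in counts.items():
        totals[key[axis]] += n
    return dict(totals)


def control_index(totals):
    """Crude steepness measure: (max - min) / max of the marginal totals.

    0 means a flat gradient (no control by that dimension);
    values near 1 mean a steep gradient.
    """
    values = list(totals.values())
    top = max(values)
    return (top - min(values)) / top if top else 0.0


by_wavelength = marginal_totals(counts, axis=0)
by_position = marginal_totals(counts, axis=1)
wavelength_control = control_index(by_wavelength)
position_control = control_index(by_position)
```

With these invented figures the wavelength gradient is steep and the position gradient is flat, which is the pattern the text describes as `stimulus control for wavelength'.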

Learning with and without inhibition

Inhibition is a theoretical term, and its exact interpretation is not needed here. It is something which is held responsible for suppressing responses to non-reinforced stimuli. In practice it may consist of strengthening opposite or `antagonistic' responses, which may be seen as a deliberate `holding back' of the response, or a turning away from the negative stimulus. An extremely visible kind of inhibition is reported by Konorski (1948). A dog was given food after the signal of a metronome, but if the experimenter bent the dog's leg while the metronome ticked, food was withheld. As the dog learned that having a flexed leg meant there would be no food, he began to actively stiffen the leg, to `inhibit' the bending. After further learning `... we are almost able to raise the animal into the air by its extended limb' (Konorski, p. 227).

A less obvious kind of inhibition of responding can be attributed to the NO-GO stimulus in a successive discrimination. It may be related to frustration, or aversive emotional state, since the NO-GO stimulus serves as a negative reinforcer; furthermore, tranquillizing drugs often `release' the inhibition, so that the formerly prohibited responses are made once more. Pavlov's procedure for producing stimuli which were `conditioned inhibitors' (p. 38) continues to be used to demonstrate that a negative stimulus may continue to inhibit responding when combined with a positive stimulus (Hearst et al., 1970).

A phenomenon which was initially predicted on the basis of a theory of inhibition is the change in preference for positive stimuli produced by discrimination training. Peak shift (Purtle, 1973) occurs when the peak of a generalization gradient (cf. Fig. 8.1) is shifted away from a negative stimulus. For instance if a line pointing north-east is used as the `correct' stimulus, but a line pointing north-west is the `wrong' cue, a generalization test with lines pointing to sixteen points of the compass might show that east by north-east is the `most-preferred' inclination. The idea is that the inhibition of the `wrong' westerly cue has pushed the point of preference even further in the `correct' easterly direction. An experiment like this with seven- to eleven-year-old children suggested that peak shift was a sign of emotional immaturity (Nicholson and Gray, 1972) but it has been found in adults for some tasks and is regularly obtained with animals.
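One way to see how inhibition could `push' the peak is to subtract an inhibitory gradient centred on the negative stimulus from an excitatory gradient centred on the positive one. The sketch below does this with bell-shaped curves. Everything in it is an assumption made for illustration: the Gaussian shape, the widths, the strength of inhibition, and the treatment of compass bearings as a simple straight-line scale (north-east = 45 degrees, north-west = -45 degrees) are not taken from the experiments cited above.

```python
import math


def gaussian(x, centre, width):
    """Bell-shaped gradient with value 1.0 at its centre (assumed shape)."""
    return math.exp(-((x - centre) ** 2) / (2 * width ** 2))


def net_tendency(angle, s_plus=45.0, s_minus=-45.0, width=40.0, inhibition=0.6):
    """Excitation around S+ minus scaled inhibition around S- (hypothetical values)."""
    return gaussian(angle, s_plus, width) - inhibition * gaussian(angle, s_minus, width)


def preferred_angle(angles, **kwargs):
    """Return the test angle with the strongest net response tendency."""
    return max(angles, key=lambda a: net_tendency(a, **kwargs))


test_angles = range(-90, 136)  # bearings in degrees, in 1-degree steps

peak_with_inhibition = preferred_angle(test_angles)
peak_without_inhibition = preferred_angle(test_angles, inhibition=0.0)
```

Without the inhibitory term the preferred bearing is the trained one; with it, the maximum moves a few degrees further east, away from the negative stimulus. The size of the shift depends entirely on the assumed widths and inhibition strength.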

A phenomenon often found under the same conditions as peak shift is behavioural contrast, which is an exaggeration of responding to the positive stimulus when it is intermingled with less favourable or negative situations. Vast amounts of research have so far failed to make very much sense of this, perhaps because there are several different reasons why response rates to a discriminative stimulus should be elevated (Mackintosh, 1974). But on the whole it is a fairly safe guess that mixing up `good' conditions with `less-good' or `bad' periods for the same individual enhances the value or effectiveness of the `good' circumstances. The trials of adversity may make one more appreciative of an otherwise unrelieved sufficiency. The adversities imposed in animal experiments have included absent or less frequent food rewards, more difficult tasks to obtain food rewards, reduced intensities of rewarding brain stimulation or the addition of electric shocks. Exposure to these `hardships' has often produced more enthusiastic responding in subsequent standard conditions.

It is not clear that all the different ways of reducing the attractiveness of the situation should be regarded as producing a single kind of `inhibition', but they point up the fact that `not responding' to the negative stimulus in a simple discrimination may be accompanied by an unpleasant emotional state. Is this strictly necessary? Both Skinner (1938) and more recently Terrace (1966) have thought that the typical discrimination learning experiment makes the task more difficult than it need be by first training the subject to respond to the negative stimulus (S-) and then requiring him to go into reverse. Surely one could train the subject not to respond to the `wrong' stimulus in the first place? This, it turns out, produces faster learning of the discrimination task, with very few mistakes, and fewer unpleasant emotional consequences, as far as can be assessed by the absence of peak-shift and behavioural contrast. The techniques needed for this `errorless discrimination learning' are:


(a) never reinforce responses to S- from the start of training;
(b) introduce S- early in training, before there is much tendency for the subject to respond to it by generalization;
(c) fade in the S- gradually, starting with short exposures and low intensities;
(d) for a difficult discrimination, start off with an easy task and then fade in the difficult discrimination by superimposing it on the easier task.

These results indicate that `inhibitory' emotional states are not a necessary concomitant of discrimination learning, or of a sustained or very well learned discrimination. If, however, discriminations are established by associating one stimulus with `good' outcomes and another with `bad', it is not surprising that early stages of learning are accompanied by a variety of emotional effects.
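Techniques (a) and (c) for errorless discrimination learning lend themselves to a small sketch. The code below builds a fading schedule for S- and encodes the rule that only responses to S+ are reinforced. The linear ramp, the number of steps, and the durations (in seconds) and intensities (as a fraction of full brightness) are all hypothetical; the text says only that S- should be faded in gradually.

```python
def fading_schedule(steps, start_duration, full_duration,
                    start_intensity, full_intensity):
    """Return (duration, intensity) pairs for S-, rising linearly over `steps`.

    A linear ramp is an assumption; any gradual progression would fit the
    description of starting with short exposures and low intensities.
    """
    if steps < 2:
        raise ValueError("need at least two steps to fade anything in")
    schedule = []
    for i in range(steps):
        fraction = i / (steps - 1)  # 0.0 on the first step, 1.0 on the last
        duration = start_duration + fraction * (full_duration - start_duration)
        intensity = start_intensity + fraction * (full_intensity - start_intensity)
        schedule.append((round(duration, 2), round(intensity, 2)))
    return schedule


def is_reinforced(stimulus, responded):
    """Rule (a): a response is reinforced only if it was made to S+."""
    return responded and stimulus == "S+"


# Hypothetical example: S- grows from a dark 5-second flash to a
# full-brightness 180-second presentation over five steps.
schedule = fading_schedule(steps=5, start_duration=5, full_duration=180,
                           start_intensity=0.0, full_intensity=1.0)
```

In practice the experimenter would presumably move to the next step only while the subject continues to withhold responses to S-; that contingency is left out of this sketch.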

Attending and not-attending

The inhibition concept applies to the control of response output - Konorski's dog stiffened its leg to inhibit the movement of bending, and the definition of inhibition requires withholding of response. Equally important, but more difficult to measure, is the modulation of stimulus input, which is roughly what is meant in this context by attention. The subject's external methods of control of stimulus input are easy enough to observe - opening and closing of eyes, direction of gaze, eye movements etc. - but it is assumed that selection of different aspects of the environment can go on in the absence of such outward signs. `Paying attention' to colour rather than shape, or listening for the sound of high-pitched but not low-pitched tones may have little effect on the movements of looking or listening. Whether or not `attention' is actually happening has therefore to be deduced rather than directly observed.

The complete learning of a discrimination generally implies both that attention is being paid to the correct stimuli and that the responses are being directed accordingly. Complications arise because there are many degrees of correct analysis of stimuli and degrees of response restriction. You may walk on the grass either because you have misunderstood the sign which says `Keep off' or you are walking in spite of having correctly read the sign. Conceivably you have carefully avoided looking at the sign in case it says `Keep off'. Possibilities similar to this make interpretation of discrimination learning in terms of attentional processes rather speculative, but some progress has been made in designing experiments which distinguish changes in attention from changes in response output.

Some stimuli get more attention than others

This is one way in which attentional processes are fairly obvious. Strong reinforcing stimuli such as shock or food are not likely to be ignored, but some aspects of discriminative stimuli command immediate attention, irrespective of reward and punishment. For many animals smells are `more important' than sights or sounds, and discriminations based on smells may more easily be learned than those based on the visual sense (p. 112). Bright colours on the other hand may more readily be attended to than shapes or sounds in some species; this seems to be true of the pigeon. Pigeons given food for pecking a yellow key `notice' the yellowness, to the extent that they tend to stop pecking if the colour is changed (Fig. 8.1). If a similar experiment is done using sound, with a note of top C sounded instead of the yellow colour, the top C is ignored in that the birds carry on pecking in exactly the same way if the pitch of the tone is changed considerably in either direction (Jenkins and Harrison, 1960). By and large, vivid or intense stimuli will attract more attention from any species, but it has been proposed that each species could have its own stimulus preferences. Innate factors necessarily at least place limits on stimulus input, in the sense that animals with no colour receptors will be insensitive to colour. Even more detailed aspects of shape perception are built in to the nervous system (Hubel and Wiesel, 1963) and therefore it is possible that certain shapes, or types of movement, are particularly vivid for a particular species.

However, sensitivity to types of stimuli can certainly be modified by experience. Conditioning factors can produce short-term changes in attention value of particular stimuli or stimulus categories (see below). Another kind of environmental influence, which may produce irreversible changes in the way the brain deals with sensory information, is experience early in life. Kittens allowed to see only vertical stripes during the first months after birth are unable to play with sticks held horizontally because brain neurones sensitive to horizontal lines have not developed (Blakemore, 1973). Rats reared with lots of experience of circles and triangles are much better at discrimination problems using these shapes than others, so it may be that early experience has a great deal to do with perceptual abilities in later life.

Whether or not stimulus analysers are wired-in at birth or established by experience, it is possible that they will be sensitive to the motivational demands of the moment. It is likely that animals pay more attention to food when they are hungry than at other times and are alerted to any strange external stimulus by pain or danger (Bindra, 1969). Illness, as well as hunger, appears to predispose rats to notice tastes (p. 73). Effects such as these may have various causes, but it is convenient to summarize them by stating that the attention given to stimuli varies with their importance. The importance may change directly as a function of internal drives or more indirectly by the learning processes discussed in following sections. Diverse motivational effects are found in studies of human attention, including blocking of attention to obscene words, and enhanced sensitivity to the stimulus of one's own name (A4).

The effects of habituation and classical conditioning on attention

According to Sokolov (1963) it is possible for the conditioned response to become automatic after a time so that it is made with less `attention' in this sense. On the whole, though, it is true to say that our attention to a stimulus declines as habituation progresses and increases during classical conditioning. But this meaning of `attention', as similar to the orienting response, is not quite the same as selective attention which is the basis of a recent theory of discrimination learning (Sutherland and Mackintosh, 1971). The orienting reaction has more to do with alertness and arousal, or even awareness, produced by external stimuli, whereas selective attention deals with which aspect of a particular stimulus (for instance colour, shape or brightness) is perceived or processed. When you consider that there is yet another separate meaning of `attention' - that of mental effort or concentration - it is less surprising that some psychologists try to avoid using the word altogether, by referring to `stimulus control' instead.

Operant reinforcement for paying attention

The theory of Sutherland and Mackintosh (1971) is too complicated to be properly dealt with here but one of their basic assumptions is shared by a number of other theorists. This is the assumption that attention is paid to aspects of stimuli which are important while other aspects are ignored. Tests of this assumption can be made when what is important is the reinforcement in discrimination learning. The implication is that in all forms of discrimination learning, stimulus information of certain types is only processed for as long as the processing `pays off' in terms of associations with reinforcement. In more specialized terms `an analyser is strengthened when its outputs consistently make predictions about further events of importance to the animal' (Sutherland and Mackintosh, 1971). The selective utilization of certain aspects of stimulus displays is most obvious when animals are specifically trained to pay attention to different kinds of stimulus at different times, or to ignore some types of visual information. Reynolds (1961), for instance, trained pigeons to pay attention to the colour of a disc and ignore the shape of a superimposed figure some of the time, but the rest of the time to ignore colour and respond according to shape. The signal for which feature of the disc to attend to was given by the intensity of a side lamp. The method of training was to give reinforcements according to colour in one phase, but according to shape in the other. In a similar experiment, Ray (1969) first trained monkeys with separate stimuli. They had to press the left-hand lever if a vertical line was presented but the right-hand for a horizontal line, in the first problem. Then one of two colours was shown and they had to press left for red, but right for green. Now the monkeys were put in a situation of conflict, by presenting them with a vertical line on a green background. One stimulus told them to press right, and another to press left. The quandary could be resolved, because the rewards were obtained by pressing left for the green vertical display, but right for the red horizontal display. Tests showed that the monkeys accomplished this correct solution by ignoring the colours altogether and working on the basis of the vertical and horizontal lines, instead of learning that the colours now meant the opposite of their original training.

It seems as though dimensions or features such as colour, shape or angle can be `switched in' or `switched out' at will, given appropriate training. This is consonant with our expectations that we can instruct someone else to limit or expand their attention if we ask them to `ignore the treble for a moment and listen to the resonance of the bass' or `never mind the quality, feel the width'.


`Dimensions' and analysers. According to Piaget, it takes some time for children to `pay attention' to physical dimensions such as volume, width and number in the right kind of way (C2) but we often assume that the environment can always be classified in terms of dimensions such as these, and in terms of other aspects of stimuli which we can verbally label, such as colour, angle, shape and so on. This is not justified, either with children or with animals. The behavioural justification of our statements that an animal is paying attention to colour, or a child is `noticing how many there are', should depend on systematic tests, which reveal how the behaviour varies as a function of variations in the physical stimuli (p. 104).

Even when we are sure that there is a close correspondence between a physical stimulus dimension and a behavioural index of response, it may be difficult to tell exactly how the environmental information is being dealt with. A well-known example is the experiment by Lashley (1938) which showed that rats were discriminating between a square and a diamond shape. If rats are doing this it is tempting to assume that they are somehow `seeing' the diamond and square in the same way that we do. Lashley found that his rats certainly weren't, in so far as they were not looking at the tops of the shapes at all, but only at the bottom, as became apparent when he presented the tops and bottoms separately. The physiological or perceptual theories of how a rat or a machine could discriminate between patterns such as the diamond and square deal with the features of the patterns that are processed - in Lashley's experiment, only the flat or pointed bottoms of the figures. Analysers are hypothetical mechanisms capable of picking out particular features (Sutherland and Mackintosh, 1971, and A4). It is worth remembering that focusing on one dimension of the same stimulus source - e.g. noticing a shape but ignoring its colour - is relatively difficult. On the other hand it is quite easy to switch attention between stimulus sources: closing your eyes to listen instead of look, or even listening with one ear rather than the other (Treisman, 1969).

Improved attention to a specific dimension. There are a number of reasons for believing that experience at learning some discrimination based on a particular dimension helps when it comes to learning another discrimination on the same dimension (Mackintosh, 1974). A simple way of showing this is to test subjects with a series of problems using the same kind of stimuli to see if they show progressive improvements in learning ability. For instance, if monkeys first learn to choose a red object instead of a green one (to find food), then a black form not a blue, then an orange rather than a brown, they improve with each problem (Shepp and Schrier, 1969). They are becoming connoisseurs of colour, not merely adapting to the situation, since changing the relevant dimension of the objects every time, from colour to shape and back again, results in no improvement over the first few problems. It is possible that improved attention to a specific dimension, such as colour or brightness, is responsible when improvement in learning with a particular type of stimuli results from prolonged training on one pair of `correct' and `incorrect' cues (overtraining) or when the `correct' and `incorrect' cues are repeatedly reversed after the subject has learned them (serial reversal learning).

Non-specific improvements in attention. The best-known form of general improvement in learning ability with experience is the learning set phenomenon (Harlow, 1949). This was discovered in a series of experiments with monkeys like the Shepp and Schrier one just mentioned. If rhesus monkeys are given several hundred problems, of choosing one from two objects presented to them, they learn increasingly quickly until, when presented with a new pair of objects, it only takes them one or two trials to find out which is the right one. It is as if they were able to say after the first trial `this is the right one' if they chose correctly, or `the other one is the right one' if they had picked the wrong one. When subjects show this ability to make any new choice, they are sometimes said to have learned a win-stay lose-shift strategy (Mackintosh, 1974). Some results apparently confirm estimates that rats, cats and different kinds of monkey can be placed on an ascending scale of `learning set ability'. Although a battery of learning tasks would probably confirm this ranking, it probably depends overmuch on visual skills, since rats have demonstrated rapid formation of learning sets when choices between two smells have to be learned (Slotnick and Katz, 1974). Perhaps learning sets are not quite so non-specific as we think and the monkeys are not so much `learning to think' as `learning what to look for'. Although it is less generally useful, other animals with poor vision may be able to `learn what to smell for'.


While `knowing what to look for' can make all the difference in some tasks it is really more like a specialized perceptual skill than a general increase in `attentiveness'.
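The win-stay lose-shift strategy mentioned above is simple enough to write down directly. The sketch below is an idealized illustration, not a model of any particular monkey: it assumes two objects labelled 0 and 1, a pure guess on the first trial of each new problem, and perfect application of the rule thereafter. The function names and the number of problems and trials are hypothetical.

```python
import random


def next_choice(previous_choice, previous_rewarded, rng):
    """Pick object 0 or 1 according to win-stay lose-shift."""
    if previous_choice is None:      # first trial of a new problem: guess
        return rng.choice([0, 1])
    if previous_rewarded:            # win: stay with the same object
        return previous_choice
    return 1 - previous_choice       # lose: shift to the other object


def run_problem(correct, trials, rng):
    """Return one boolean per trial, True where the choice was rewarded."""
    outcomes = []
    choice, rewarded = None, False
    for _ in range(trials):
        choice = next_choice(choice, rewarded, rng)
        rewarded = (choice == correct)
        outcomes.append(rewarded)
    return outcomes


rng = random.Random(0)  # seeded only so that the first-trial guesses are repeatable
results = [run_problem(correct=rng.choice([0, 1]), trials=6, rng=rng)
           for _ in range(200)]

# Mistakes can occur on the guessed first trial, but never afterwards.
errors_after_first_trial = sum(not outcome for r in results for outcome in r[1:])
```

Under these assumptions the only possible error in any problem is the first-trial guess, which corresponds to the near-perfect second-trial performance reported for experienced monkeys.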

Narrowing of attention span. In so far as attention corresponds to what we mean by being aware of a stimulus, or concentrating on a particular cue, it is usually accepted that we can only `attend to one thing at a time'. This is not really to say that nothing else is learned or remembered apart from the thing attended to, but it is usually easier to learn `one thing at a time'. This can be seen in conditioning experiments in the overshadowing and blocking of one stimulus by another. If two stimuli are given at the same time in a classical conditioning context, but one of them is `better' than the other, animals often concentrate on the better one and ignore the other, as far as we can tell by separate tests. For example, a rather dim light might normally work perfectly well as a conditioned stimulus in a Pavlovian experiment (p. 36). But if a very loud buzzer is always sounded at the same time as the dim light is turned on, the buzzer may overshadow the light so that a dog does not salivate if the light is turned on by itself. Alternatively, if we used buzzers and lights of equivalent intensities, but trained the dog first with the buzzer, shining the light along with the buzzer might make very little impression on the animal, because listening carefully for the buzzer may block out attention to other stimuli (as happens when someone closes their eyes in order to listen more attentively).

The main point about narrowing of the attention span is the idea that there is a limit to our capacity to process information and attend to stimuli, which makes it necessary to cut down on attention to some sources in order to devote the maximum effort to the most important information. In-built salience, high intensity, novelty and past associations with reinforcement are all factors which seem to add to the `importance' of particular stimulus categories. The activity of looking for or expecting a certain kind of feature, or of `paying attention' more generally, is a very significant reflection of the importance of stimuli, apart from the avidity with which the information is processed once it is available. In this sense, the `importance' of a stimulus can be assessed as the degree to which the stimulus reinforces the behaviour of attending.


Application of discrimination procedures - programmed learning.

Although discrimination learning is an area of considerable theoretical interest, the procedures and routines studied in this context are not without practical implications. Complex discriminations shade imperceptibly into concepts and cognitions, as we shall see in the next chapter; and rules of thumb which represent optimum conditions for successful discrimination learning have been found to provide useful guidelines in the design of educational tools such as programmed texts and computer-assisted instruction (Atkinson, 1968). Programmed texts are books in which the reader is invited to answer questions by writing in missing words as he goes along. This question-and-answer method is especially useful if the reader needs to memorize a good deal of the information in the book. Computer-assisted instruction is the latest form of the teaching machines advocated by Skinner. Simple teaching machines merely facilitate the presentation of the same type of material used in programmed texts, but a high-technology version controlled by a computer may include batteries of coloured slides, light-sensitive screens so that the student may register answers by `pointing' to them with a special pen, and stereophonic headphones, not only for spoken instructions but also for the delivery of bursts of music as reinforcers.

Writing a programmed text or designing material for computer-assisted instruction requires at least as much skill and experience as traditional teaching, but there are several principles, dear to the hearts of behaviourists, which are generally adhered to.

(1) Each individual proceeds at his own pace. Few doubt that personal tuition has considerable advantages over group methods, and these are emphasized by an approach based on active learning. One-to-one training with a skilled teacher has yet to be superseded, but it is very expensive. Teaching machines and programmed texts are usually claimed to be cheaper, as well as better, than their traditional equivalents. But it is essential that they are designed for individuals rather than groups.

(2) Each individual makes active responses. It is stressed that one should write in answers at every step of a programmed text. This ensures that nothing is missed and leads to greater involvement, like underlining or writing in the margin. It also allows for reinforcement of correct items. However, even in computer-assisted instruction the responses are very restricted, compared with writing essays or `learning by discovery' with miniature scientific or artistic projects. The argument is not that programmed learning should replace such alternatives, but rather that by taking care of the routine teaching jobs it releases more resources for additional goals.

(3) Immediate reinforcement for correct learning. Immediate and frequent reinforcement by `getting it right' should lead to more accurate learning, since students cannot `get hold of the wrong end of the stick' and carry on regardless. It should also add incentive to the task. Further incentives include an interesting and colourful presentation of material, including the use of puppets and cartoon characters in some cases. `Playing with the machine' may be an additional reinforcer with the computer-controlled consoles.

Reinforcement should be positive, not negative or punishing, as far as possible: `getting it right' can be encouraged but `getting it wrong' should not be penalized. Children disparaged for getting their sums wrong are less likely to try to get them right than they are to dislike both sums and the teacher (or machine) which does the disparaging.

(4) Minimizing errors. Discriminations are less painful if they can be accomplished without errors. This contradicts the truism that we `learn by our mistakes' and some programmes called branching or intrinsic are designed not so much to minimize mistakes as to provide remedial sub-routines for different types of mistake. In programmed texts this is rarely possible, and a linear type of programme is used, which breaks down the material into extremely small and easy steps. If very small steps are used, and easy concepts are established before being elaborated into more difficult ones, few errors need be made, and the student progresses in an atmosphere of success. The ideal programme allows more gifted students to go through fast without getting bored, but provides ample time and material to ensure that slower learners do not get discouraged. Many linear programmes are very much spoonfeeding operations, but it is valuable to have spoonfeeding that works, since more challenging undertakings can always be added.


On the whole, programmed texts and teaching machines have had a limited impact on educational practices. However, the basic idea of trying to make learning easier and more enjoyable is still gaining ground (C5) and advancing technology may mean that the computer-assisted teaching machine eventually becomes commonplace.

Summary and conclusions

Learning a discrimination implies that we learn to perceive the difference between two similar things, and that we also learn to make separate responses to the two cases. Recent theories assume that perceptual learning and response learning are distinct from one another. The perceptual kind may involve learning to be alert and learning to pay attention to particular aspects of events such as colour or shape. Response learning frequently requires the inhibition of responses because the task demands that responses are not given to the incorrect stimulus. The practical techniques which bring about discrimination learning, and which may profitably be adapted for use in educational aids, are dominated by the need to associate reinforcement with a limited set of stimuli.