8   Discrimination, attention and perception

‘Thus, from the point of view of the conditioned reflexes, the cerebral hemispheres appear as a complex of analysers, whose purpose is to decompose the complexity of the internal and external worlds into separate elements and moments, and then to connect all these with the manifold activity of the organism.’

Pavlov (1955, p. 300)

The conditioning/extinction theory of discrimination

It is arguable that most, if not all, forms of learning involve differential reaction to certain classes of stimulus input. It is advantageous if certain response skills, such as those involved in driving a golf ball, can be called into action at will, under different external circumstances (as at different golf courses), but usually the deployment of such skills will require modification according to special kinds of informational input (as in the assessment of the desired distance and direction of the stroke). Although the learning of wild animals is often demonstrated by skilled motor accomplishments, such as the aerial acrobatics of the crow family, or the shell-cracking techniques of the oyster-catcher, laboratory analysis of animal learning can almost always be defined in terms of responses given to certain classes of environmental stimuli. Habituation can be described as the differentiation of familiar from novel events: often fine limits of discriminatory ability (for instance in
human infants) can be conveniently established by habituating the first response to a certain stimulus-pattern, and discovering the smallest change in this pattern that leads to recovery of the response. Any sort of Pavlovian conditioning demonstrates at least a crude distinction between the presence or absence of the conditioned stimulus. Goal-directed or instrumental learning is less obviously tied to a given external cue, but a rat or cat inside an experimental box only goes through the motions of pressing a lever to escape when the lever is present; and it is easy to demonstrate that this or any other rewarded action can be limited to a specific external signal such as a buzzer or light, and thus made a measure of the detection by the animal of these events.

The issues in learning theory which have been covered in previous chapters cannot therefore be divorced from the sensory capacities of the species doing the learning; it is, rather, a matter of having previously taken for granted certain kinds of perceptual sensitivity which will here be examined in more detail. However, it is also true that simple conditioning experiments, such as may involve responses to buzzers or lights, rarely tax the perceptual abilities of their subjects (indeed, the buzzers and lights may be chosen precisely because they present no discriminatory problems), and that new theoretical issues may be raised by experiments which are designed to stretch in some way the limits of an animal’s perceptual apparatus. Sometimes, if not always, ‘discriminative experiments, by introducing the element of choice and decision, must involve some new processes’ (Mackintosh, 1983, p. 259). It is thus possible to introduce the issues of discrimination learning by reference to the now thoroughly discredited theory of Spence (1936, 1937, 1940) and Hull (1952), which assumed that all of the phenomena of choice and discrimination could be explained by appeal to the strengthening and weakening of different sets of stimulus-response associations by reinforcement and non-reinforcement.

Suppose that a rat learns to press a lever for food rewards in a Skinner box while a light is on, but learns also to refrain from pressing when the light is off, in darkness, since no rewards are then delivered (Skinner, 1938). The simplest
conditioning/extinction explanation of this is that a connection between light shining in the eyes and the responses of lever-pressing is strengthened when this conjunction of events is followed by reward, while the absence of rewards for presses in the dark leaves this second association weak. In practice, Spence (1937) and Hull (1952) had to talk of ‘the relative strengthening of the excitatory tendency of a certain component of the stimulus complex’ (Spence, 1936, p. 430), since the rat in question would not make pressing movements if lights of appropriate brightness were to be shone in its eyes in its home cage. It is conceivable that the explanation in terms of direct connections between certain stimulus inputs and response outputs, and/or the modification in terms of response connections to elements of a stimulus complex, may yet be needed to account for motor reflexes differentially conditioned to tactile or olfactory stimuli in the slug Aplysia californica or in other invertebrates (Carew et al., 1983; Sahley et al., 1981). But there are several different ways in which the conditioning/extinction explanation of discriminative learning is inadequate as an account of results typically obtained with vertebrates in the laboratory. Spence (1936, 1937) originally proposed the conditioning/extinction theory as an explicitly defensive measure against two claims of more cognitively inclined psychologists: first, that discrimination learning was discontinuous, the animal being only able to start learning once it had selected an appropriate ‘hypothesis’, or begun attending to the appropriate signals (Krechevsky, 1932, 1938; Lashley, 1942); second, that one kind of appropriate hypothesis concerned comparisons between two or more stimuli, so that animals might select stimuli according to relative values (by choosing the brighter, the bigger or the smaller and so on) instead of reacting only according to the absolute physical magnitudes of the external events (e.g. Kohler, 1925, 1929). By 1950, Spence himself had become even more defensive, and it is now rarely disputed that some or other analogue of an active attentional process is typically engaged by a discrimination task, and that in this and other ways the animal imposes internal perceptual organization on the physical events which may impinge on its sense organs (Sutherland and Mackintosh, 1971; Walker, 1983a, chapter 7).


Generalization along innate stimulus dimensions

Even the conditioning/extinction theories of Hull and Spence made use of the internal organizing factors implied in the notion of stimulus generalization. Pavlov (1927, p. 113) found that if a tone of a particular pitch was learned as a conditioned stimulus ‘many other tones spontaneously acquire similar properties’, these spontaneously acquired properties being predictable as a systematic function of the degree of similarity of a stimulus to the already learned signal. This degree of similarity is an internal and to a large extent innately determined aspect of stimulus organization. For pitch, this is obvious in the case of species differences in upper and lower limits of sensitivity. Tactile generalization over the body surface, also observed by Pavlov (1927), requires some kind of sensory homunculus or body image. This may seem a relatively straightforward consequence of cutaneous sensation, but the brain organization required for many obvious-seeming stimulus dimensions is actually complex. Strong generalization along a dimension such as colour, much studied in pigeons, requires not only specialized receptors in the retina, but also methods of ordering and comparing the outputs from these further on in the visual system (e.g. Karten, 1979; Emmerton, 1983; Jassik-Gerschenfeld et al., 1977). Position in the visual field, the location of a sound source to the left or right of the receiver, assessment of the distance of an auditory or visual signal from the receiver — all stimulus dimensions of this kind are only possible with the benefit of complicated internal brain circuitry (see Masterton and Glendenning, 1979). One way of acknowledging this is to talk of hypothetical ‘analysers’ for given stimulus dimensions, assuming these analysers to be largely innate (Sutherland and Mackintosh, 1971; Pavlov, 1927). This is a convenient, though not terribly revealing, strategy.

Generalization gradients and peak-shift

The spontaneous but systematic change in reactions over a whole stimulus dimension can be graphed as a ‘generalization gradient,’ which under the right conditions can be a roughly
symmetrical and bell-shaped distribution about the stimulus value previously experienced. Smooth curves were obtained by Hanson (1959) and easily replicated subsequently, which show the generalization of the pigeons’ key-pecking response to other hues after being trained with a green of a particular wavelength, and serve to demonstrate the smoothness and cohesiveness of that species’ internal scale of wavelength, which begins peripherally with five different colours of oil-droplets in the retina, functioning as cut-off filters, plus the three different types of visual pigment (absorbing maximally ‘red’, ‘green’ or ‘blue’ light) in the retinal cone cells, which are more reminiscent of the primate system (see Figure 8.1).

Figure 8.1 Generalization gradients and peak shift.

Number of responses to a range of coloured stimuli, by pigeons only ever rewarded for pecking at a stimulus of wavelength 550 nm. Various groups had previous experience of a specific non-rewarded stimulus (S-) of the wavelength value indicated. See text. After Hanson (1959).


The heavy line in Figure 8.1 shows data from birds which were trained on a variable interval schedule to peck a key illuminated with light at a wavelength of 550 nm, which
appears to the human observer as a rather yellowy leaf-green. After this they were shown the complete range of 13 hues, in random order, no further rewards being given, and responded to these at rates indicated on the figure. In this case the roughly bell-shaped curve was asymmetrical, with more responses being given to the green side than the yellow side of the yellowy-green maximum, but the usual result is a neatly symmetrical curve around this hue (Guttman and Kalish, 1956). Other birds were given up to 25 days’ extra discrimination training, after the initial experience with 550 nm, in which half the time they continued to be shown this particular yellow-green, getting rewards on the same schedule, but for the other half an even more yellow colour was presented (560, 570 or 590 nm for separate groups) with no rewards at all obtainable in its presence. This procedure produces behavioural evidence of discrimination, since the birds learn to respond when rewards are available, but not when they aren’t. However, the dotted lines on Figure 8.1 show data obtained following the discrimination training (with two stimuli), when all 13 hues were shown randomly, and rewards never given at all. In this test, pigeons with previous experience that a greeny-yellow was bad news, even though yellowy green had remained a signal for reward, showed relatively little responding to the exact stimulus they had been rewarded for (550 nm) but had a high peak level of responding for 540 nm, a much greener green. A reduced effect of this kind was observed after training with a more orangey yellow (590 nm) as the previously negative stimulus. The fact that the peak level of responding in a generalization test is not given to the originally rewarded stimulus is referred to in the name given to this phenomenon, which is ‘peak shift’.
Curves are not always so clear and consistent as those obtained for pigeons after colour discriminations, but a roughly similar change in generalization gradients following reward/no reward or ‘GO/NO GO’ successive discriminations has been found for line tilt with pigeons (Bloomfield, 1966) and children (Nicholson and Gray, 1972), for visual intensities with pigeons (Ernst et al., 1971), for auditory intensities with rats (Pierrell and Sherman, 1960), and for gravitational forces (produced in a centrifuge) with squirrel monkeys (McCoy and Lange, 1969).


The explanation of the peak shift effect may vary, but it is always powerful and incontrovertible evidence that the physical dimensions being experimentally varied are indeed sensed by the species concerned and are moreover internally organized on some kind of interval scale, so that for one reason or another suppression of responding to one side of a standard stimulus can be converted to an increase on the opposite side. Terrace (1966) showed that the result obtained by Hanson (1959) was temporary — if the two-value discrimination is continued for long enough (up to 60 sessions in the experiment by Terrace, 1966) then the subsequent generalization gradient is symmetrical exactly about the rewarded value. This piece of evidence implicates a temporary emotional and inhibitory effect of non-reward, and thus suggests that a version of the inhibition/excitation formula originally proposed by Spence (1937) is responsible for the peak shift effect. The general idea is that there are two separate generalization gradients, an excitatory, pro-response gradient produced about the exact stimulus value present for rewards; and an inhibitory, anti-response gradient centred on the negative stimulus which signals reward absence. When the inhibitory gradient overlaps with the rewarded stimulus, but only just, then it is possible to perform a subtraction of inhibitory influence from excitatory influence which results in a shift of peak response values (see Figure 8.2). Thus Spence’s theory of simple gradients about absolute stimulus values has merits in the context of the successive, GO/NO GO discrimination procedure.


Figure 8.2 The conditioning/extinction theory applied to peak shift.

Uniform generalization of both positive and negative response tendencies could produce asymmetrical generalization gradients. See text. After Spence (1937).
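
The subtraction of an inhibitory from an excitatory gradient can be made concrete with a minimal numerical sketch. It assumes, purely for illustration, that both gradients are Gaussian curves of equal width along the wavelength scale; the heights, the width, the test range and the function names are all hypothetical, and are neither taken from Spence (1937) nor fitted to the data of Hanson (1959).

```python
import math


def gradient(wavelength, centre, height, width):
    """Bell-shaped generalization gradient (the Gaussian form is an assumption)."""
    return height * math.exp(-((wavelength - centre) ** 2) / (2 * width ** 2))


def net_tendency(wavelength, s_plus, s_minus=None,
                 exc_height=1.0, inh_height=0.6, width=15.0):
    """Excitation centred on S+ minus inhibition centred on S- (if any)."""
    net = gradient(wavelength, s_plus, exc_height, width)
    if s_minus is not None:
        net -= gradient(wavelength, s_minus, inh_height, width)
    return net


def peak_wavelength(s_plus, s_minus=None, lo=500, hi=600):
    """Whole-nm wavelength in a hypothetical test range with the highest net tendency."""
    return max(range(lo, hi + 1),
               key=lambda w: net_tendency(w, s_plus, s_minus))


control_peak = peak_wavelength(550)             # rewarded stimulus only
near_peak = peak_wavelength(550, s_minus=560)   # negative stimulus close to S+
far_peak = peak_wavelength(550, s_minus=590)    # negative stimulus further away

print(control_peak, near_peak, far_peak)
```

Under these assumptions the peak stays at 550 nm when there is no negative stimulus, and moves below 550 nm, away from the negative stimulus, in both discrimination cases, with the larger displacement for the nearer negative stimulus. The sketch shows only that simple subtraction is sufficient to displace the peak; it says nothing about why the effect should be temporary.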


Generalization and transposition

But there is general agreement that Spence was wrong to assume that all perception can be explained in terms of learning restricted to exact stimulus values, and that his inhibition/excitation theory does not work for the task he designed it for, that of explaining transposition, and the discrimination of relative stimulus values (Riley, 1968; Mackintosh, 1983). Early work on stimulus discrimination (e.g. Coburn, 1914; Johnson, 1914) had suggested that animals often discriminate relative values, especially if given a simultaneous
choice between two displays. Thus a rat or chicken rewarded for choosing the darker of two shades of grey card is likely to transpose this relationship immediately to vastly different pairs — if it has learned to choose the darker of two fairly light greys it is more than likely to choose also the darker of two much darker ones, even though in an absolute sense this is further away from the brightness originally rewarded. Kohler (1929) emphasized this as an aspect of the Gestalt theory of perception. In a sense this should be quite uncontroversial, since brightness contrast, and other forms of context effect, are accepted as basic perceptual phenomena, as indeed are other Gestalt principles of grouping by proximity and similarity (Dember and Warm, 1979). Although animal discrimination learning often reveals good abilities for exact sensory discriminations, many species have demonstrated
capacities for also discriminating relational cues such as ‘darker than’, ‘greener than’ or ‘larger than’. In primates certainly, and possibly in other species, this extends to such abstract aspects of perceived displays as oddity and similarity (Bernstein, 1961; Wright et al., 1968; see pp. 278-9 below).
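
The difference between responding to absolute and to relative stimulus values in a transposition test can be sketched as follows. The reflectance figures and function names are hypothetical, and the ‘absolute’ rule is a deliberately crude nearest-value rule rather than Spence’s full account in terms of overlapping gradients.

```python
def absolute_choice(pair, rewarded_value):
    """Pick the stimulus physically closest to the value rewarded in training."""
    return min(pair, key=lambda value: abs(value - rewarded_value))


def relational_choice(pair):
    """Pick the darker stimulus (lower reflectance), whatever its absolute value."""
    return min(pair)


# Hypothetical reflectances (per cent of light reflected; lower means darker).
training_pair = (80, 60)   # two fairly light greys
rewarded_value = 60        # the darker of the two was rewarded
test_pair = (30, 10)       # two much darker greys

absolute_pick = absolute_choice(test_pair, rewarded_value)
relational_pick = relational_choice(test_pair)
print(absolute_pick, relational_pick)
```

Both rules select the rewarded card in training, but on the darker test pair the nearest-value rule selects the lighter card (30), whereas the relational rule selects the darker one (10), which is the choice described in the text as transposition.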

Selective attention to simple stimulus dimensions and problem reversals

The theories of Spence and Hull could be regarded as attempts to minimize the role of the animal in discrimination learning. Physical events impinge on the animals’ sense organs, and may or may not find their way through a network of stimulus-response connections to emerge as behaviour potentials; as little as possible is said about active processes of search, inference and judgment as organizing factors in animal learning: ‘what has been termed intelligent or insightful learning differs only in degree from blind or slow learning’ (Spence, 1940, pp. 287-8). There was a long-drawn-out argument between supporters of this sort of stimulus-response theory and their opponents, initially Lashley (1929, 1942), Krechevsky (1932, 1938) and Tolman (1932, 1948), which at the time was referred to as the ‘continuity-noncontinuity controversy’ (see Osgood, 1953, pp. 446ff. and Sutherland and Mackintosh, 1971, chapter 4). Lashley (1929) had noticed that rats in discrimination experiments often appeared to change suddenly from random behaviour to their maximally efficient performance. Thus he proposed that learning was not a continuous process of strengthening correct responses; rather, in the case of discrimination learning, it was a case of selecting or attending to the appropriate sensory features. In Krechevsky’s experimental studies (1932, 1938) rats were given a visual discrimination task such as choosing a black rather than a white card (placed to the right or left at random) for successive choices. It appeared that there were rapid shifts between ‘hypotheses’ which at first were wrong, for instance position habits such as always going left, or always going right. Lashley (1942) took the extreme view that hypotheses were switched on and off as an ‘all-or-nothing’ process, only one being possible at a time. Spence (1940) was
from the start willing to admit that for choices between visual stimuli animals must point their eyes and head in the appropriate direction, as a preliminary to learning the task, but wished to include such ‘receptor exposure adjustments’ as merely another kind of gradually learned stimulus-response process. As Hull (1952, p. 93) put it in his chapter on discrimination learning, ‘exposing the receptors to the relevant stimuli in such a problem situation, will be referred to as receptor adjustment acts. The detailed theory of the evolution of this type of habit will be presented later in connection with an account of compound trial-and-error learning, of which it is a small-scale example.’

The continuity-noncontinuity controversy is partly a matter of terminology, and it is now conventional to talk of selective attention in discrimination learning, or of the switching in and out of perceptual analysers (Sutherland and Mackintosh, 1971), or of changes in the associability of certain classes of stimuli (Mackintosh, 1983). In terms of experimental predictions, the continuity-noncontinuity question boils down to the question of exactly how quickly and how abruptly such attentional processes change, and the answer, perhaps surprisingly, is that the changes tend to be moderately continuous (Sutherland and Mackintosh, 1971).

If the change in attention from an irrelevant to a relevant cue is an all-or-nothing discrete process, then it should not matter to animals if a problem is changed before they have hit on the correct solution. Thus if rats are being trained to choose a black but not a white door, because food is found behind the black one, but are responding at random, if this is because they are paying no attention at all to the nature of the visual cues it will not delay learning if the discrimination is reversed, food being always put behind the white door instead. This is termed ‘presolution reversal’. As it turns out, changing the problem in this way almost always delays learning, and this implies both that learning to pay attention to the correct cue is gradual, and that animals may show some sensitivity to the correct cue before they have begun to use it efficiently (Sutherland and Mackintosh, 1971). It should be pointed out that although rapid changes in discrimination performance are observed occasionally, as a
rule the degree of correctness in discrimination learning improves gradually, even for monkeys and chimpanzees, on the initial experiences with the tasks (Harlow, 1950).
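
The logic of the presolution reversal test can be illustrated with a toy calculation. It assumes a single ‘preference strength’ running from -1 (white) to +1 (black), updated by a simple linear rule; the learning rate, the criterion, the number of presolution trials and the function names are hypothetical choices made only for the sketch, and not a model proposed by any of the authors cited.

```python
def trials_to_criterion(strength, target, rate=0.1, criterion=0.9):
    """Trials needed to bring strength to the criterion for target (+1 black, -1 white)."""
    trials = 0
    while strength * target < criterion:
        strength += rate * (target - strength)   # move a fraction of the way to target
        trials += 1
    return trials


def presolution_strength(n_trials, gradual, rate=0.1):
    """Strength towards black after n_trials of black-rewarded training at chance level."""
    strength = 0.0
    if gradual:
        # Continuity view: something is learned on every trial, even before solution.
        for _ in range(n_trials):
            strength += rate * (1.0 - strength)
    # All-or-nothing view: nothing is learned until the cue is attended to.
    return strength


fresh = trials_to_criterion(0.0, target=-1.0)
after_gradual = trials_to_criterion(presolution_strength(5, gradual=True), target=-1.0)
after_all_or_nothing = trials_to_criterion(presolution_strength(5, gradual=False), target=-1.0)

print(fresh, after_gradual, after_all_or_nothing)
```

On the all-or-nothing assumption the reversed problem takes exactly as long as a fresh one, while on the gradual assumption it takes longer, because the weak preference built up before the reversal has first to be undone. The delay normally observed after presolution reversal corresponds to the second case.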

A second straightforward experimental test might be called ‘post-solution reversal’, if a U-turn in choice is required after correct performance has become established. Two phenomena are observed in this case, one rather ephemeral but the other highly reliable, both of which support the proposition that attentional processes may change very gradually if not continuously. The ‘overtraining reversal effect’ appears to be obtained only in rats, and only when this species learns a moderately difficult discrimination for an appreciably large reward (Mackintosh, 1974, pp. 602-4). Little theoretical weight should therefore be attached to it, but its occurrence is consistent with assumptions amply confirmed by different kinds of tests. If, once rats have learned to choose a black door instead of a white door, irrespective of spatial position, the task is switched so that the white door is correct, rats may take an inordinate amount of time to alter their original preference for black (100 or more trials). In some circumstances, however, prolonged experience with the original problem (overtraining) results in a more speedy alteration of performance when the problem is eventually switched (70 instead of 138 trials to reversal in the original report by Reid, 1953). It is possible to argue that this is because the period of overtraining enhances attention to the relevant stimulus dimension, although what this involves precisely is unclear (Mackintosh, 1974, 1983).

Although the effects of extra experience on the first reversals of a two-choice discrimination are only occasionally obvious, gradual improvements in performance on repeated reversals of the same discrimination are virtually certain in mammals and birds (see Figure 8.3). It is not clear whether gradual improvements in performance on serial reversal learning are due to increased attention to a particular stimulus dimension, as opposed to more elaborate learned changes such as the development of win-stay/lose-shift response strategies (Mackintosh, 1974; Mackintosh et al., 1985). However, the improvements must in some way be attentional even in the latter case, since this strategy requires that attention is concentrated on
the choice made and its outcome on the immediately preceding trial. It is thus difficult to disentangle the perceptual from the memorial aspects of this phenomenon, but it is none the less of considerable theoretical interest, as there are reasonably consistent differences in serial reversal performance between fish, which show very limited improvements, birds and mammals, which generally improve more dramatically, and monkeys and apes, who are likely to demonstrate exceptionally accurate performance under these conditions (Mackintosh et al., 1985; Gossette et al., 1966, 1968; see ‘Learning sets’, p. 270 below).
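
What a win-stay/lose-shift strategy amounts to in a serial reversal task can be shown with a short simulation. This is an idealized chooser with perfect memory for the preceding trial, offered only as an illustration of the strategy; the function names and the numbers of problems and trials are hypothetical.

```python
def other(side):
    """Return the alternative of a two-choice (left/right) problem."""
    return "right" if side == "left" else "left"


def errors_per_problem(n_problems, trials_per_problem,
                       first_choice="left", first_correct="right"):
    """Errors on each successive problem for a strict win-stay/lose-shift chooser."""
    choice, correct = first_choice, first_correct
    errors = []
    for _ in range(n_problems):
        count = 0
        for _ in range(trials_per_problem):
            if choice != correct:
                count += 1
                choice = other(choice)   # lose-shift: switch after non-reward
            # win-stay: the choice is left unchanged after a reward
        errors.append(count)
        correct = other(correct)         # the problem is reversed
    return errors


errors = errors_per_problem(n_problems=5, trials_per_problem=20)
print(errors)
```

A chooser of this kind makes a single error each time the problem is reversed, since one unrewarded trial is enough to switch it to the other side. The strategy depends entirely on the choice and outcome of the immediately preceding trial, which is why it is hard to separate attentional from memorial factors in the improvement actually observed.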


Figure 8.3 Gradual improvements in serial reversal learning.


The performance of homing pigeons which received either a large reward (4 seeds) or a small reward (1 seed) for displacing the correct choice of 2 blocks which covered left and right foodwells. The birds received 20 trials per day of training to respond to their non-preferred side, until they reached a criterion of less than 3 errors per day. Then the correct side was switched until the same criterion was reached, and this was continued for 24 reversals. After Gossette and Hood (1968).


It is always difficult to be sure that minor variations in
procedural detail are not the source of species differences in accuracy or speed of learning, leading some to conclude that all species differences are illusory (Macphail, 1982), but there are strong suggestions that fish show relatively little improvement when two-choice discriminations are serially repeated, by comparison with rats, and that commonly studied primates (rhesus monkeys, chimpanzees and children) show much greater improvement, again by comparison with rats (see Mackintosh et al., 1985; Harlow, 1949, 1959; Woodard et al., 1971; Gossette et al., 1968). One of the advantages of the serial reversal procedure for species comparisons is that such factors as the level of motivation and the discriminability of the stimulus choices used are to some extent controlled for in the data from the very first discrimination. If this is learned without undue difficulty, but leaving room for significant improvements which are not forthcoming, it seems fair to conclude that the species in question is lacking not in any basic capacity for discrimination but in some higher-order change in attentional strategy or use of memory, which enables the serial reversal improvement seen in other species to take place (see ‘Learning sets’, pp. 270-4 below).

Selective attention to stimulus dimensions and transfer effects

The reversal of two-stimulus discrimination tasks may not in fact be the most suitable way to demonstrate selective attention to a particular perceptual dimension. A rat or pigeon which has just learned to respond to a black but not a white card for its food rewards has considerable difficulty when the rule is first changed, and could be forgiven for ignoring the appearance of the cards altogether. In practice there is evidence that greater attention is paid to card appearance over a series of reversals, but at the same time the animals need to develop a form of cynicism about reliability of the dimension they are attending to. There are several other lines of evidence which bear on theories of selective attention in discrimination learning, and one of these concerns transfer effects when two or more problems are learned in succession,
the different problems not being simple reversals of the same two stimuli.

It was typical of experiments in Pavlov’s laboratory that the same dog should experience many different signals. One of the main reasons for this was that most dogs only stayed awake during experiments if a wide variety of stimuli was used each day (Pavlov, 1927, pp. 285-6; this precaution is not usually necessary with hungry rats and pigeons, even if they are restrained), which points to the connection between processes of selective attention and overall arousal or alertness (Sokolov, 1963). Pavlov took it for granted that a major aspect of brain function was that ‘it selects out of the whole complexity of the environment those units which are significant’ (1927, p. 110), and did not need to be convinced of the importance of attention in dogs, but noted two results which apply also to other species, and which indicate something about the details of attentional processes. First is the fact that a contrast between positive and negative stimuli produces an overwhelmingly more rapid and precise discriminative behavioural reaction than merely the repetition of a single positive stimulus (Pavlov, 1927, p. 117). This is explicable in terms of a mechanism by which discrepancies between expected and experienced outcomes are resolved, or which reduces more general uncertainties about motivationally significant outcomes. Any such mechanism would be involved in changes in arousal and alertness, and in selective attention to those stimulus dimensions containing information which reduces uncertainty (Sutherland and Mackintosh, 1971; see pp. 105-13, chapter 4).

Transfer from easy to difficult cases

A second phenomenon noted by Pavlov specifically implies that attention to a particular dimension can somehow be usefully switched in (1927, pp. 121-3; 396). Dogs shown a circle of white paper as a signal for food continued to salivate to an off-white circle of the same size even when this had invariably signalled the absence of food on dozens of occasions. If, however, a dark grey circle was first used as the negative signal, and then a couple of lighter shades of grey,
the original off-white circle could be readily and completely accurately distinguished from the white circle, in that it elicited no salivation. This might be regarded as the use of the traditional ‘method of limits’ in establishing just noticeable differences in animal sensation: Hodos and Bonbright (1972) used a graded series of intensity filters for precisely this purpose, and established that pigeons could detect whether or not a plain glass slide was inserted between them and a light source. But a more important inference for present purposes is that, although the dogs may in the first instance have been looking at the white and off-white circles (since they salivated only when they were presented) and thus have had appropriate peripheral ‘receptor adjustments’ (Spence, 1940), training with a brief progression of easier cases seems to have encouraged the switching in of a more central analyser. Transfer from easy to difficult cases is a reliable result (Lawrence, 1952; Mackintosh, 1974; Terrace, 1966) with many species. It applies not only to brightness, of course, but to colour (Marsh, 1969) and to shape (Pavlov, 1927, p. 122 used easy and then more difficult circle/ellipse discriminations). Vision in particular (but other senses also) involves many different stimulus dimensions, which reflect different methods of internal analysis of the same input, and not necessarily selection between different sources of input (noticing the colour as opposed to the shape of a circle does not require greatly different methods of inspecting the stimulus, especially in species with limited eye-movements). Separate brain mechanisms may of course be involved in different sorts of perceptual analysis (Walker, 1983a) and therefore the switching in of analysers may be almost literal, although on the grounds of behavioural evidence alone the analyser is a hypothetical though convenient construct (Sutherland and Mackintosh, 1971).

Transfer to alternative problems using the same dimension

The effects of transfer from an easy to a hard discrimination may be particularly noticeable, since performance on the hard discrimination without prior training is often extremely poor. But statistical comparison can reveal transfer effects in other
instances. For instance, when rhesus macaque monkeys are allowed to find food under one of two objects presented together, they make significantly fewer errors on the later pairs if they are trained first to choose a red object instead of a green one, then a black form instead of a blue one, then an orange rather than a brown, that is, if they are given four colour problems in succession. However, if they are given two colour problems and two shape problems (e.g. circle versus square, cross versus ‘T’), shape and colour alternating, no improvements are observed (Shepp and Schrier, 1969). This kind of experiment is described as a comparison between ‘intradimensional and extradimensional shifts’, and it is very frequently found that the intradimensional shifts are learned more easily than extradimensional shifts, implying that consistent attention to the same dimension is an advantage (Mackintosh, 1974, p. 597).

Two carefully designed and influential experiments were reported by Lawrence (1949, 1950). In these, transfer was studied between problems in which very similar pairs of stimuli had to be responded to rather differently. In the first case rats were initially trained to enter one of two side-by-side compartments in a simultaneous discrimination. For different groups choice was on the basis of the walls being black or white, or on the floor being made of fine or coarse wire mesh, or in response to the compartments being of different widths. After this, the animals were trained on successive discriminations in an enclosed T-maze in which only one stimulus value was present at a time, and the rule was of the form, ‘turn left if the walls are white, turn right if the walls are black.’ In the initial simultaneous discrimination the most obvious equivalent rule would be ‘choose the black compartment whether it is on the left or the right’, and therefore the animals could not transfer exactly what they had learned on the first problem to the second. However, all rats trained on a black-white discrimination initially showed positive transfer (made fewer errors) in learning the second kind of black-white discrimination. The experiment was balanced so that each of the three stimulus dimensions used in the first problem was tested with either of the other two present but irrelevant in the second problem: the expected positive transfer was


observed in all six cases, by comparison with control groups. A second experiment confirmed that positive transfer of a similar kind also occurred when the first problem was a successive discrimination (one stimulus at a time) and the second was a two-choice simultaneous discrimination, in the same apparatus (Lawrence, 1950).

It may perhaps seem odd that such results should require systematic confirmation: being trained on a black-white discrimination in one context certainly ought to be of use in learning further black-white problems. But the experiments were performed at Yale, with the advice of Clark Hull, and were therefore a very careful and quantitative analysis of what Lawrence called ‘the acquired distinctiveness of cues’. This is a useful descriptive term, but clearly the physical cues themselves do not change, or acquire new characteristics, except in the sense that prior experience leads animals to treat them differently. Lawrence made the theoretical distinction between selection via ‘orientation behaviour’ such as moving the head or focusing the eyes, and ‘mediating processes’ which are internal and unobservable. He pointed out that although changes in orienting behaviour could not be eliminated as an explanation for his results, his choice of diffuse stimuli (wall colour, floor texture, and apparatus width) made it highly unlikely that overt peripheral bodily adjustments were important. His mediating processes might of course just as well be referred to as selective attention.

Transfer to alternative problems using different dimensions — learning sets

For theories of selective attention, the most convenient results are those demonstrating that prior experience with one particular dimension, such as visual intensity or colour, facilitates performance on subsequent tasks involving that particular dimension, but not on similar tasks requiring discriminations on alternative dimensions. Lawrence’s results (1949, 1950) and many other results comparing performance across stimulus dimensions (for instance, intra- versus extra-dimensional shifts, Mackintosh, 1974) provide unequivocal evidence for some processes of transfer which are indeed selective.


However this does not exclude the possibility of more general transfer effects. Learning not to attend to a single wrong dimension, such as the left-right position of displays which are always left-right randomised, may obviously improve performance on every other relevant dimension — for instance any individual visual feature which happens to be made relevant on the left-right displays. It is also conceivable that the learning of one discrimination problem produces nonspecific changes in alertness or attentiveness, which are of benefit in any subsequent task. Thomas et al. (1970, 1971) have suggested a factor of non-specific attentiveness on the grounds that a second GO/NO GO successive discrimination is likely to be more quickly learned than the first, but in these cases just the learning that rewarded and non-rewarded periods alternate over time may be responsible for part of the improvement. Rogers and Thomas (1982) found that nonspecific transfer effects only occurred when discrimination tasks were unaltered (i.e. when one successive task was followed by another), and suggested that what may appear to be transfer of general attentiveness is better described as transfer of ‘task-appropriate response tendencies’.

Harlow (1959) would have been able to include all effects like these in what he called ‘error factor theory’. He pointed out that there are a number of general features of correct performance in discrimination learning tasks which the experimenter may take for granted, but which the animal may have to learn gradually by trial and error, and which may be responsible for errors until they are learned. Even learning to expect a reward for a correct response is necessary in the first instance; usually the next phase is learning that there is a class of stimuli to which responses should not be made. Depending on the exact experimental procedure, there may be other general features of task solution such as the location of stimuli, the frequency of stimulus changes and the necessary strategy of responding which, when learned during prior tasks, may facilitate performance on future tasks. None of this is problematical — the question is simply exactly which features are responsible for particular transfer of training effects, and if and when selective attention to a salient stimulus dimension should be put down as one of these


general features of task solution. The phenomenon that Harlow (1959) was concerned with was his previous discovery (1949) of ‘learning sets’. This is a transfer of training effect which on the face of it cannot involve selective attention, since it involves progressive improvement over a succession of problems which explicitly require that there is no single salient dimension.

Learning sets

These may be observed in a succession of standard two-choice simultaneous discriminations. Typically they are obtained when primates are presented with a tray on which there are two objects, a raisin or peanut being under one of these. The monkey (or ape) then reaches out an arm and pushes away an item, to retrieve its food incentive if it is correct. The apparatus is known as the WGTA (Wisconsin General Test Apparatus) (see Figure 8.4). A wide range of stimulus items is needed for the study of learning sets, which may vary on known dimensions — stars, circles, squares, pyramids and so on, varying in colour, height and size — or which may be more easily discriminable ‘junk objects’ — cups, bottles, toys, kitchen implements and other human artifacts. Whichever pair of objects is chosen for a monkey’s first discrimination problem, the animal will take a considerable number of trials — certainly more than four or five — before it settles to the habit of always pushing away first the one which the experimenter has chosen to associate with the food incentive on this problem. The learning set effect is observed only in animals who have received practice on literally dozens of different pairs of stimulus items. As experience is gained on different pairs, the rate of learning each new pair gradually speeds up, to the extent that, after 200 or 300 pairs, learning is instantaneous. The first time a new pair is presented the animal has only a 50 per cent chance of making a correct choice. But theoretically, the second time a new pair is presented, the animal has 100 per cent chance of being correct, provided it remembers the results of the first trial and makes its choice accordingly. Harlow’s finding (1949) was that rhesus monkeys did indeed become 100 per cent


correct (or virtually so, at over 97 per cent) on the second trial of all new problems, but only after they had had experience of learning with 250 previous pairs of objects.
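The logic of the ideal second-trial choice can be put as a minimal sketch. It assumes an idealised chooser with perfect memory of the immediately preceding trial; the function name and the object labels 'A' and 'B' are illustrative only and are not part of Harlow's procedure.

```python
import random


def trial2_accuracy_with_perfect_memory(n_problems, seed=0):
    """Proportion correct on trial 2 when trial 1 is a guess and is remembered."""
    rng = random.Random(seed)
    correct_on_trial2 = 0
    for _ in range(n_problems):
        baited = rng.choice(("A", "B"))        # object the experimenter rewards
        first_choice = rng.choice(("A", "B"))  # trial 1 can only be a guess
        if first_choice == baited:
            second_choice = first_choice       # rewarded: repeat the choice
        else:
            # not rewarded: switch to the other object
            second_choice = "B" if first_choice == "A" else "A"
        correct_on_trial2 += second_choice == baited
    return correct_on_trial2 / n_problems


accuracy = trial2_accuracy_with_perfect_memory(1000)
```

Under these assumptions the second trial is always correct, whatever happened on the first, which is why trial 2 performance is the conventional measure of a learning set.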


Figure 8.4 The Wisconsin General Test Apparatus (WGTA).

A widely used method of testing visual or spatial discriminations in primates is to conceal food under only one of two or more objects presented to the animal on a given trial. The development of this method at the University of Wisconsin has led to the use of the acronym WGTA. Similar object-displacement methods can be used with some non-primate species (see Figure 8.3). After Harlow (1959).


There are two important theoretical aspects to this finding, which Harlow initially described as ‘learning to learn’. First, although avoiding simple sources of error such as fixed position habits may be useful in the early improvement of performance, the explanation for the eventual high level of performance is probably that the animals have adopted a strategy of making full use of their memory of the immediately preceding trial — since this high level of performance can be seriously disrupted merely by lengthening the interval between successive trials (Deets et al., 1970; Bessemer and Stollnitz, 1971). In a sense the only question here is the effect


of time on immediate memories, since how else could second trial performance be correct if the animal did not remember the result of the first trial? However, the second theoretical aspect of learning set is more controversial, and this concerns its use as a measure of species differences. Hayes et al. (1953) trained three chimpanzees with a procedure similar to Harlow’s and obtained over 95 per cent performance after 150 problems. Fisher (1962), using the same apparatus, but pictures cut from magazines instead of three-dimensional objects, found less impressive performance in two young gorillas (80 and 84 per cent correct on trial 2 after 232 problems), but there is good evidence that Old World monkeys and apes can generally reach over 90 per cent correct performance after 200 or 300 problems with procedural details like those in Harlow’s original experiments. Attempts to obtain learning sets in analogous procedures with species other than primates have generally produced trial 2 figures very much less than 90 per cent correct even after 500 or more problems, and both Harlow (1959) and Warren (1965) deduced that Old World primates must possess some quantitatively superior capacity which allowed for this difference in performance. Warren (1973) and others (e.g. Macphail, 1982) have since maintained that this was unjustified, and that the better performance of the Old World primates, in so far as it stands up to detailed comparisons, is due only to better visual perception or to some similar relatively peripheral or contextual advantage.

It is certainly difficult to use learning set performance on its own to argue for some fundamental intellectual superiority in our closest relatives. However, critics of the comparative use of learning sets may have overstated their case. For example, one study which was procedurally appropriate in other respects, and appeared to show that mink and ferrets could reach primate levels of performance, failed to control for olfaction, since these carnivores demonstrated their abilities only by pushing with their nose at that door of the two available behind which a piece of meat had been placed (Doty et al., 1967). The tuning in of a very acute sense of smell may be as biologically advantageous to carnivores of the weasel family as immediate visual memory is to primates, but it is


not the same thing. Slotnick and Katz (1974) showed that rats could rapidly be trained to make almost immediate discriminations between members of 16 pairs of smells delivered to them alternately in 5-second puffs in a special apparatus, using a GO/NO GO discrimination procedure. It is fair to regard this as evidence that rats are likely to learn olfactory discriminations much faster than visual ones, but it is naive to suppose that this rapid learning results from the same kind of cognitive capacity as that employed by primates during learning set tasks. One-trial learning can be observed in many laboratory paradigms, including avoidance of objects or locations associated with electric shock, and in the taste-aversion procedure (see pp. 232ff.). But it is not the one-trial aspect of learning at the end of a learning set experiment which accounts for Harlow’s initial interest in the phenomenon, but the very gradual improvements which indicate ‘learning to learn’. The present theoretical position is that primate learning set performance should be explained by switches in attentional processes, which lead eventually to the use of immediate memory of the choice made and the resulting outcome on the previous trial.

Claims that the intellectual capacities necessary for this cognitive strategy are widely distributed in the animal kingdom should be treated with caution. For instance, Morrow and Smithson (1969) reported that they had discovered ‘learning sets in an invertebrate’ on the grounds that they had succeeded in training eight small woodlouse-like crustaceans to creep round a ‘T’-maze, and observed a statistical decline in the errors made over several reversals of the direction of the correct turn. There are a number of possible accounts of how the nervous systems of these creatures might accomplish improvements in locomotor adjustments of this kind, but it is surely unlikely that any of them would have a great deal in common with the explanation of primate learning sets. In this case the behavioural test reported as ‘learning set’ hardly justified the term. (The major features of the learning set phenomenon are: (a) eventual performance on a given problem is well above chance on the second trial; and (b) this performance develops gradually as a result of experience with several hundred pairs of objects.)


However, others have taken care to use a primate-style procedure when testing non-primate species. Hunter and Kamil (1971) used 700 pairs of junk objects in two-choice discrimination problems presented to blue jays, which obtained an invertebrate reward (half a mealworm) only if they displaced the object designated as correct on any given problem. This is closely analogous to the two-choice discriminations rewarded with raisins or peanuts for monkeys, and there appears to be a genuine improvement of a learning set kind in blue jays, since trial 2 performance improved from chance on the first few problems to 73 per cent correct after 700 problems. It is thus reasonable to claim that blue jays and also mynah birds (Kamil and Hunter, 1970; Kamil et al., 1977) are capable of learning to employ win-stay, lose-shift strategies which make use of memories of immediately preceding trials. But this is hardly a challenge to the assumption of primate superiority, since the level of performance achieved (roughly 70 per cent after up to 1,000 problems) is no match for the almost perfect performance in rhesus monkeys after 250 problems that was reported by Harlow (1949). It ought always to be remembered that chance performance on these tasks is 50 per cent, and that 90 per cent can therefore be regarded as double the improvement on chance represented by 70 per cent.
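The arithmetic behind this comparison can be made explicit. The sketch below expresses a trial 2 score as the fraction of the distance from chance to perfect performance that has been achieved; the function name and the choice of 100 per cent as the ceiling are illustrative assumptions, not measures used in the studies cited.

```python
def fraction_of_possible_gain(percent_correct, chance=50.0, ceiling=100.0):
    """Share of the range between chance and perfect performance that was achieved."""
    return (percent_correct - chance) / (ceiling - chance)


monkey_gain = fraction_of_possible_gain(90.0)  # round figure used in the text
jay_gain = fraction_of_possible_gain(70.0)     # round figure used in the text
ratio = monkey_gain / jay_gain
```

On this measure 90 per cent correct represents four-fifths of the possible improvement over chance and 70 per cent represents two-fifths, so the ratio of the two is 2.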

Direct switching of selective attention

Transfer experiments like those of Lawrence (1949, 1950), in which training on one problem with a certain kind of stimulus improves performance on a second problem with the same kind of stimulus, strongly suggest that an active, ‘top-down’ change in the receiving system during the first problem modifies the way in which external stimuli are detected during the second problem. They thus support two-stage theories, such as that of Sutherland and Mackintosh (1971), in which the receipt of external stimuli is a variable process subject to learning (and the output of certain responses governed by stimuli as received is a second process of learning). But transfer experiments are certainly not the only, and perhaps not even the best, form of support for the hypothesis of selective


attention in discrimination learning. Several quite different types of experimental manipulation can be used to examine variations in responsiveness to chosen categories of stimuli. Bond (1983) performed very straightforward experiments on visual search in pigeons, which he suggests support a hypothesis of attentional thresholds; he supposes that attention to a particular stimulus category can be switched in when this category is frequently encountered. His technique was to present birds with 20 grains consisting of various proportions of two types, black gram beans and red wheat, placed on a background of mixed gravel of a similar size, allowing them enough time to peck up about half the 20 grains available. The data show a clear bias in the proportion of each type of grain taken; when a 50:50 mixture was presented the birds retrieved equal amounts of the two types, and responded relatively slowly; but when, say, 80 per cent of the grain presented was of one type, then the birds had an exaggerated preference for this type, picking it on more than 95 per cent of their successful pecks, and also responded much faster. The theory is that the animals are able to switch in selective attention to a particular stimulus category when the frequency of discovery of the category exceeds some threshold. (Conversely, the speed at which a single instance of a certain stimulus type can be found in a visual display may be progressively and adversely affected as the number of alternative ‘distractor elements’ is increased: Blough, 1977, 1984).

Conditional discriminations

A similar sort of explanation, in terms of the switching-in of a stimulus analyser, or the temporary turning-on of sensitivity to a particular configurational cue, has been applied to many other kinds of discrimination learning. Pavlov’s method of contrasts itself, requiring a distinction between rewarded and non-rewarded stimuli, is the simplest possible demonstration of the conditional nature of reactivity to stimulus variation. Another well-known example was reported by Jenkins and Harrison (1960). Pigeons rewarded for pecking at a key when a 1,000 Hz tone is on are subsequently indifferent to large changes in the frequency of this tone. But if the tone is made


a signal for reward, because its absence indicates no rewards are obtainable, then this appears to enhance attention to the tone considerably, since the birds show sharp drops in responding whenever the frequency of the tone is changed, in either direction, producing very steep generalization gradients. Although it is usually concluded that pigeons are poor at utilizing auditory cues, the correlation of an auditory stimulus with food rewards appears to greatly increase auditory analysis. We may confidently expect pigeons to be normally indifferent to the music of both Bach and Stravinsky, but if the distinction between Bach and Stravinsky is a necessary preliminary to food, pigeons learn to make it, or rather, they analyse heard sounds sufficiently to generalize from Bach to Buxtehude (Porter and Neuringer, 1984; see below).

The rapid switching in and out of attention to alternative aspects of the outer environment is sometimes appealed to as an explanation for various kinds of effect under the heading of conditional discrimination. Lashley (1938) trained rats to jump towards an upright triangle on a black background, and to an inverted triangle on a striped background, but not towards an upright triangle on a striped background, or an inverted triangle on a black background. A probable explanation of this effect in rats is that each compound cue is learned separately (Sutherland and Mackintosh, 1971). However, there are other cases of conditional discrimination which look as though one kind of stimulus has become a signal for attention to be paid or not paid to another kind. For instance Yarczower (1971) studied how often pigeons pecked at stimuli made up of a white line which could be tilted at five different angles and projected on either a red or a green background. The training procedure specified that food could be obtained by pecking the green background, irrespective of the angle of the line projected thereon, but that when the background was red, reward could be obtained by pecking at it only when the background contained a vertical white line; a similar line at a 40° slant meant that pecks would go unrewarded. As one might expect, when the key was red, any deviation of the white line from the vertical suppressed the level of responding; but a consistent rate of response was given to the green key, whatever the tilt of a


white line superimposed on it. This led Yarczower to suggest that something had been learned which was roughly equivalent to ‘if red, pay closer attention to line tilt than if not red’.

Blough (1972) reported a much more elaborate set of data, in which wavelength of a visual stimulus was combined with the frequency of a tone, or a timing variable. He concluded that the interactions between any pair of dimensions were multiplicative, in that if one component of a compound stimulus was very different from the value which signalled reward, then changes in the other component had little effect — this can be taken as implying attentional changes in the context of the statistics of signal-detection theory. On the face of it, a much more direct use of attentional processes is implied by the simpler experiment performed by Reynolds (1961), who trained pigeons with four stimuli presented on the same key for 3 minutes each, in cycles which allowed for each stimulus to get 12 3-minute presentations in a daily session. The four compound stimuli consisted of white triangles and circles on blue or red backgrounds. One of these compound stimuli, the red triangle, always signalled reward; another, the blue circle, never did. The consequences of the other two compound stimuli varied, since there was an additional rule: there were two sidelights, and when one was on redness signalled reward, and when the other was on the presence of the triangle signalled reward. The main result of this training was that birds responded to the red circle and the blue triangle only when these stimuli signalled reward, as well as always responding to the red triangle and never responding to the blue circle. The simplest mechanism of accomplishing this differentiation, as it would appear to the human observer, involves selective attention to either the figure or the ground of the compound stimulus, according to the additional cues, one of which indicates that any triangle would be rewarded, and the other that any red background signals reward. It is not clear however that this is what occurred. 
A replication of a very similar compound discrimination by Reynolds and Limpo (1969) found inconsistent patterns of response when the coloured backgrounds or white forms were presented alone, with the birds responding more to coloured backgrounds alone than to the white forms alone.
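The reward contingencies described for Reynolds (1961) can be summarised as a short rule, which helps to show why two of the four compounds have fixed consequences and two depend on the sidelight. This is a sketch of the schedule only, not of whatever mechanism the birds used; the function name and the sidelight labels 'colour_light' and 'form_light' are hypothetical.

```python
def reward_available(background, form, sidelight):
    """Return True if the compound stimulus signals reward under the given sidelight."""
    if sidelight == "colour_light":
        return background == "red"      # redness signals reward
    if sidelight == "form_light":
        return form == "triangle"       # the triangle signals reward
    raise ValueError("unknown sidelight: " + repr(sidelight))


schedule = {
    (background, form): [reward_available(background, form, light)
                         for light in ("colour_light", "form_light")]
    for background in ("red", "blue")
    for form in ("triangle", "circle")
}
```

Each entry in `schedule` lists the consequence under the two sidelights in turn: the red triangle is rewarded under both, the blue circle under neither, and the red circle and blue triangle under one sidelight each.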


The simpler method of demonstrating conditional discrimination with compound stimuli is that of Lashley (1938). This was used by Born et al. (1969). A circle and a triangle in white outline were projected with either a red or a green background. Only two of these four possibilities signalled reward for any one of the pigeon subjects: for instance the red circle and the green triangle might signal reward, but the red triangle and the green circle not. After this rule had been learned in Born et al.’s experiment, uniform red- and green-coloured stimuli and circle and triangle outlines on a dark background were presented alone as a test. None of the subjects responded to the colours alone, but all three responded vigorously to one of the shapes, but not to the other. This is exactly what would be expected if the colours were being used as the first stage of a two-stage strategy, to indicate which of the two shapes should be responded to, there being a bias towards one or other of the shapes in the absence of the usual colour cue.

There is an alarming variety of visual discrimination experiments on pigeons, and also on laboratory monkeys: the most reliable conclusion is probably that no single explanation will account for all results. Both the learning of elaborate compounds (see below) and isolated learning of individual elements of compound stimuli are well substantiated, in both species. However there is some support for the notion that pigeons respond most to the literal appearance of individual visual displays while by comparison laboratory monkeys are capable of more abstract representations. This is clearest in the case of the rules of responding according to similarity or oddity in ‘matching to sample’ tests, using three displays. Suppose the middle of three displays is illuminated red. This display is touched or pecked, and goes off, but the two side displays are now lit red and green. A rule of similarity or matching to the sample would require that the red display be chosen, whatever side it was on, such choices being the only ones rewarded. A rule of oddity would require that the green alternative be chosen after a red sample, but a red alternative chosen after a green sample. Primates are generally capable of abstracting such rules, in that after being trained with several colours they can immediately apply the


rules to new colours, or to new sorts of visual stimuli (Bernstein, 1961; Premack, 1983). Although there is not a complete consensus on how data obtained from pigeons should be interpreted, obtaining evidence for any degree of abstract learning in this context is very difficult. A review of previous experimentation by Carter and Werner (1978) concludes that pigeons in these circumstances almost invariably learn a set of ‘sample-specific rules’. That is, for matching to sample, they learn that if a red sample, choose a red alternative, and if a green sample, choose a green alternative, without being able to apply a general rule of similarity when other visual stimuli are tested (Mackintosh et al., 1985; Wilson et al., 1985). This sounds very much like a form of temporary priming for recognition of particular stimulus displays, but since it works just as well, if not better, for the oddity rule, it might be regarded as a very quick and temporary switching-in of receptivity to particular stimulus patterns.
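The difference between an abstract rule and a set of sample-specific rules can be illustrated with a minimal sketch. The function names are hypothetical, and returning `None` for an untrained sample is simply a way of marking that no learned rule applies (in practice the animal would presumably choose at chance).

```python
def match_by_similarity(sample, left, right):
    """Abstract rule: choose whichever side display is the same as the sample."""
    return "left" if left == sample else "right"


def choose_by_oddity(sample, left, right):
    """Abstract rule: choose whichever side display differs from the sample."""
    return "left" if left != sample else "right"


def make_sample_specific_matcher(trained_colours):
    """Separate rules of the form 'if red sample, choose red', one per trained colour."""
    rules = {colour: colour for colour in trained_colours}

    def choose(sample, left, right):
        target = rules.get(sample)
        if target is None:
            return None  # no rule was ever learned for this sample
        return "left" if left == target else "right"

    return choose


pigeon_like = make_sample_specific_matcher(["red", "green"])
trained_choice = pigeon_like("red", "green", "red")
novel_choice = pigeon_like("blue", "yellow", "blue")
abstract_novel_choice = match_by_similarity("blue", "yellow", "blue")
```

Both kinds of chooser behave identically on the trained colours; they come apart only when new stimuli are introduced, which is why transfer tests are needed to distinguish them.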

Theories of attention in conditioning

Although the more complicated procedures of discrimination learning experiments may bring into play mechanisms not normally activated in the simplest of conditioning experiments (Mackintosh, 1983), there are a number of points of contact between theories of discrimination learning and theories of basic associative processes such as those of Rescorla and Wagner (1972), Mackintosh (1975), Wagner (1978) and Pearce and Hall (1980) which were discussed in chapter 4 (pp. 105 ff.). A very general feature of all these theories is that they are attempts to account for the waxing and waning of the ‘effectiveness’ or ‘associability’ of the conditioned and unconditioned stimuli (CS and US) in classical conditioning procedures. In terms of behavioural predictions to be made, there is rarely anything to choose between theories which refer to variations in the effectiveness or associability of stimuli and those which make similar points in terms of selective attention given to stimuli, or analysers for these same stimuli being switched in (thus making the stimuli more effective and more capable of being associated


with other things). Mackintosh (1975) notes that there is a formal equivalence between assuming that changes in the associations made to specific stimuli vary according to a ‘learning rate parameter’ which depends on previous experience, and assuming that the probability of learning anything about a stimulus varies according to the amount of attention given to it, which also depends on previous experience. The main advantages in continuing to use the phraseology of attention are that, first, it allows easier comparisons with the present concerns of discrimination learning, and second, that it points to additional sources of evidence, such as variations in the observed degree of alertness of animals, or in their orienting behaviour to specific sights and sounds. Thus, the theory of Pearce and Hall (1980) was proposed primarily in terms of ‘variations in effectiveness’ of conditioned stimuli, but informally they suggest that the variations in effectiveness arise because associations depend on a limited capacity processor, and that potential conditioned stimuli will compete for access to this central processor, which is very much compatible with a selective attention approach (Broadbent, 1958, 1984). And in practice, some of the support for the Pearce and Hall model is provided by overt measures of attention, such as bodily orientation towards and physical contact with a light source (Kaye and Pearce, 1984).

Therefore there is a case that both the phenomena of discrimination learning, more usually discussed in terms of attention, and the findings of simpler conditioning experiments should be applied as tests to the same theories. The theories examined in chapter 4 in the context of classically conditioned associations may be reviewed as follows.

Rescorla and Wagner (1972)

The equation given on p. 106 above is usually interpreted in terms of the powers of the unconditioned stimulus (US) (Dickinson, 1980). The acquisition of a conditioned response reaches a limit in this account because only surprising or unpredicted motivationally significant events influence previous stimuli. In the early stages of learning, food received by a Pavlovian dog is relatively unexpected — therefore extra associations are formed to a preceding buzzer signal: in the


final stages of learning however the dog now expects food because of the buzzer — the buzzer will have reached its limit and any other signals added in conjunction with the buzzer will not acquire further predictive properties. Nothing much is said here about selective attention to the signalling stimuli, because the explanation is in terms of the predictability of the reinforcement.
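The way in which a limit on learning produces this result can be shown with a minimal simulation. It assumes the standard form of the Rescorla and Wagner rule, in which the change in the strength of each signal present on a trial is proportional to the difference between the maximum supportable strength and the summed strength of all signals present; the single `rate` parameter (standing in for the product of the two learning-rate parameters), the cue names and the number of trials are illustrative choices.

```python
def rescorla_wagner(trials, rate=0.3, lam=1.0, strengths=None):
    """Update associative strengths over a sequence of reinforced trials.

    Each trial is a tuple of the cues present; every trial ends in reinforcement.
    """
    v = dict(strengths or {})
    for cues in trials:
        predicted = sum(v.get(cue, 0.0) for cue in cues)
        surprise = lam - predicted            # shared error term for all cues present
        for cue in cues:
            v[cue] = v.get(cue, 0.0) + rate * surprise
    return v


# Stage 1: buzzer alone is followed by food until learning is near its limit.
after_buzzer = rescorla_wagner([("buzzer",)] * 50)
# Stage 2: a light is added to the already trained buzzer.
pretrained = rescorla_wagner([("buzzer", "light")] * 50, strengths=after_buzzer)
# Comparison: buzzer and light trained together from the outset.
untrained = rescorla_wagner([("buzzer", "light")] * 50)
```

Under these assumptions the added light gains almost no strength in `pretrained`, because the food is already fully predicted by the buzzer, whereas in `untrained` the two signals share the available strength about equally.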

Mackintosh (1975)

Mackintosh proposed (as did Sutherland and Mackintosh, 1971) that subjects increase attention to relevant stimuli and decrease attention to irrelevant ones, relevance being defined in terms of the degree to which a stimulus dimension can be used to predict the occurrence or non-occurrence of reinforcement. The factor of predictability of motivationally significant stimuli, as used by Rescorla and Wagner (1972), is included by assuming that attention to a stimulus dimension is increased only if it allows for the prediction of otherwise unexpected reinforcement (or predicts the omission of otherwise expected ones).
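A rough sketch of this idea follows. It assumes the commonly cited form of the Mackintosh (1975) rules: each stimulus is updated from its own prediction error, weighted by its own attention value, and attention rises for a stimulus that predicts the outcome better than the other stimuli present and falls for one that predicts it worse. The fixed step size, the limits on attention, the treatment of ties as no change, and the trial sequence are all simplifying assumptions made for illustration.

```python
def mackintosh_attention(trials, theta=0.5, step=0.05, floor=0.05, ceiling=1.0):
    """Return (strengths, attention) after a sequence of (cues, outcome) trials."""
    v, alpha = {}, {}
    for cues, lam in trials:
        for cue in cues:
            v.setdefault(cue, 0.0)
            alpha.setdefault(cue, 0.5)
        # Errors are measured from the strengths held before this trial.
        own_error = {cue: abs(lam - v[cue]) for cue in cues}
        others_error = {
            cue: abs(lam - sum(v[other] for other in cues if other != cue))
            for cue in cues
        }
        for cue in cues:
            v[cue] += alpha[cue] * theta * (lam - v[cue])
            if own_error[cue] < others_error[cue]:
                alpha[cue] = min(ceiling, alpha[cue] + step)  # better predictor
            elif own_error[cue] > others_error[cue]:
                alpha[cue] = max(floor, alpha[cue] - step)    # worse predictor
    return v, alpha


# 'A' accompanies reward, 'B' accompanies non-reward, 'X' is present on every trial.
trials = [(("A", "X"), 1.0), (("B", "X"), 0.0)] * 20
strengths, attention = mackintosh_attention(trials)
```

With this sequence attention to the irrelevant stimulus 'X' falls to its lower limit while attention to the relevant stimuli 'A' and 'B' rises, which is the pattern the verbal statement of the theory requires.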

Pearce and Hall (1980)

These authors emphasized what is implicit in some earlier theories — that attention to a particular signal will be reduced when it has very high predictive accuracy, and maximum attention will be given to a stimulus when its outcome is uncertain.
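The decline of attention to an accurate predictor can be sketched as follows. The sketch assumes the usual statement of the Pearce and Hall rule, in which the attention paid to a signal on one trial equals the absolute discrepancy between the outcome and what was predicted on the preceding trial. Inhibitory learning is omitted, so the sketch is meaningful only for the simple sequence shown; the `salience` value and the trial numbers are arbitrary.

```python
def pearce_hall_attention(outcomes, salience=0.2):
    """Return the attention value carried forward after each trial.

    outcomes: 1.0 for a reinforced trial, 0.0 for a non-reinforced trial.
    """
    strength = 0.0
    alpha = 1.0                          # a novel signal starts with full attention
    carried_forward = []
    for lam in outcomes:
        surprise = abs(lam - strength)   # discrepancy with what was predicted
        strength += salience * alpha * lam   # excitatory learning only
        alpha = surprise                 # attention for the next trial
        carried_forward.append(alpha)
    return carried_forward


# Twenty consistently reinforced trials, then one unexpected omission.
alphas = pearce_hall_attention([1.0] * 20 + [0.0])
```

Attention falls towards zero as the signal comes to predict the outcome accurately, and is restored almost to its starting level by the single unexpected omission on the last trial.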

Wagner (1976, 1981)

Wagner’s elaborate theorizing makes use of several concepts derived from studies of human information-processing, including: the shift from controlled to automatic processing (Shiffrin and Schneider, 1984) which may be related to the decline in active attention given to a well-learned conditioned stimulus, emphasized by Pearce and Hall (1980); the active representation of to-be-associated events (‘rehearsal’, after Atkinson and Shiffrin, 1968) in a short-term memory store; and the decrease of such activity produced as a consequence of the presentation of a particular stimulus if its representation has already been ‘primed’, which is similar to the notion


involved in the other theories above that an already expected event will not arouse much additional attentional effort.

Discrepancy and expectancy theories and discrimination learning

Of the theories very briefly encapsulated above, none except that of Mackintosh (1975) was designed to handle the basic phenomena of discrimination learning, and therefore few specific predictions from the theories about discrimination learning are made. It is however possible to list some sources of overall consensus and a few points of disagreement.

Paradoxes of knowledge and arousal

All the theories above include some kind of acknowledgment that when learning is complete the flow of information into the learner is somehow restricted. In the extreme case, in Pearce and Hall’s (1980) theory, a signal which predicts an event of great motivational significance with perfect accuracy is at some level no longer processed. This is in conflict with the initial assumption of Sutherland and Mackintosh (1971) that an analyser capable of predicting rewards perfectly should at the limit of learning be switched in to the maximum extent possible. A number of arguments arise from this conflict. The discriminability of the signal, and the question of exactly what is switched in or out, require further specification (see below). But it will always have to be acknowledged that the early stages of learning, when curiosity and uncertainty are highest, may involve more rapid change in knowledge, or more rapid formation of associations, than the later stages of learning, when the prior acquisition of knowledge may only be rewarded by routine and automatic (but correct) responses. There may thus be more learning actually going on in the early stages. But this does not mean that information has been lost in the later stages: performance which appears to be routine, with little sign of alertness or strong orientation to relevant stimuli, can change very rapidly when accustomed outcomes are changed, for instance when a discrimination is reversed. For this reason, it may be necessary for theories of
attention to distinguish between arousal and orientation and the switching in of an analyser — it may be necessary to allow for an analyser to be fully switched in when there are few signs of active attention. This is of course implicit in theories of habituation which are able to account for the ‘missing stimulus effect’ — no response at all may be given to a predictable train of stimuli, but the minimal but functionally vital degree of attention usually given is revealed by arousal and orientation when the stimulus is missed out (see p. 51).

This can be related to two phenomena of discrimination learning, the ‘errorless learning technique’ (Terrace, 1963) and the effect of unrewarded exposure to shape stimuli on subsequent shape discrimination learning (Channell and Hall, 1981). Terrace’s errorless learning result suggests that high arousal and orientation to stimuli in the early stages of learning is not strictly necessary. Very gradual changes to the stimuli in an easy discrimination task allow the animal to learn, painlessly and without apparent arousal, a more difficult discrimination that might otherwise be difficult or impossible (cf. Pavlov, 1927, p. 122). The orientation and arousal part of attention is involved in searches for the correct predicting cue; once the correct predicting cue is known, it is no longer so necessary. Gibson and Walk (1956) reared young rats in cages containing cut-out triangles and circles and showed that these animals were better able than others to learn a subsequent circle/triangle discrimination. However in other cases prior exposure to the stimuli to be used in a discrimination retards learning (Hall, 1980; Bateson and Chantrey, 1972). Channell and Hall (1981) demonstrated that rats exposed to stimulus objects in their home cages learned a subsequent simultaneous discrimination (in a Lashley jumping stand, between horizontal and vertical stripes) better than control subjects; but if animals were given experience of the stimuli in the discrimination apparatus itself, but without differential reward or punishment, the learning of the discrimination was retarded. This might be interpreted in terms of the learning in the home cage of processes of perceptual analysis (‘formation of a neuronal model’ in the version of Sokolov, 1975) which turn out to be useful if applied in the discrimination apparatus. But when animals encounter
the stimuli in the experimental apparatus itself, any perceptual learning may be vitiated by the additional factor of learned irrelevance (see below) — having become accustomed to the stimuli in the absence of differential reward, there will be less reason for these animals to attend to these now relevant sources of information when differential rewards are introduced, by comparison with those introduced to the whole task at the same time (see Dickinson, 1985 for a hypothesis about the correlation between signalling experiences and outcome experiences as a necessary factor in learning).

Learned irrelevance

Mackintosh (1975) pointed out that learning to ignore irrelevant stimuli may be as much part of selective attention as learning to pay more attention to those stimuli which are useful. The term ‘latent inhibition’ is usually used for cases where, for instance, unrewarded presentation of a buzzer retards subsequent learning that the buzzer is a signal for food. If this is interpreted as due mainly to habituation of attention to the initially irrelevant signal, then clearly the effect should apply just as well to discrimination learning. Halgren (1974) reported that prior exposure to either the positive or the negative stimulus retarded learning of a subsequent discrimination. ‘Learned irrelevance’ may refer to a stronger effect, in which a stimulus and a reinforcer are both experienced prior to conditioning, but with no correlation between them (Baker and Mackintosh, 1977). Thus if a dog receives food occasionally and hears a buzzer occasionally, without any statistical relationship between one and the other, it may be even slower to respond to a signalling relationship when one is introduced than a dog that has merely become habituated to a tone without food. Mellgren and Ost (1969) showed that the same effect occurs in the context of discrimination learning, in that rats which experienced alternations of a tone and a light, with water reinforcements given at random during both, afterwards took longer to learn to press a bar for water during one stimulus but not during the other, by comparison with others that had the same prior experience of the stimuli but without random
reinforcements. Generally speaking, either as a function of habituation (in a particular context) or as a function of habituation due to lack of correlation with reward, the decline in attention to a stimulus dimension has similar consequences in straightforward conditioning (with one positive stimulus) or in discrimination learning (with one positive and one negative stimulus), and the same theory, whatever its details, could apply in both cases (Mackintosh, 1983, p. 251).

Learned relevance and attentional sharpening

With learning to pay more rather than less attention to a particular set of stimuli, something remarkably different occurs when one negative (non-rewarded) stimulus is added to one positive (rewarded) signal, as Pavlov (1927, p. 117; see p. 265 above) first pointed out. This brings up the question of ‘what is the stimulus?’, the complete answer to which, it was said long ago (Stevens, 1951, p. 31), would solve all the problems that there are in psychology. The answer is therefore unlikely to be an easy one. But it is worth emphasizing that ‘the stimulus’ which in theory becomes more effective, more associable, or has more attention paid to it, is difficult to define, both for the subject of an experiment and for the theorist. If a light signals the arrival of a food pellet to a rat, is it the position of the light bulb, the overall change in light intensity, or the shadow cast, that is important? There is more than one answer, since different animals, and different experimenters, may quite justifiably come to different conclusions. Now, if a light on the left signals food, but a similar light on the right does not, the number of possible answers to Stevens’s famous question is reduced. Even more so if, for one of Pavlov’s dogs, a metronome which has only clicked at 100 beats a minute before food deliveries now beats half as fast and there is no food. Before this contrast was introduced, a dog, and therefore an experimenter, had no way of knowing whether the speed of click was important, whether a click was needed at all or if any noise from that direction would do. Once a single alternative stimulus has been introduced, it can then become clear to the receiving animal that the rate of clicking is a relevant part of the correct
signal. Given a big enough range of perceptual capacities in the receiving subject, then in a sense a stimulus dimension has to be defined by two separate stimulus instances, just as a line must be defined by two points. And in attentional theory, it is first a dimension, or rather dimensions, of the outer environment that must be specified.

The method of contrast between two stimuli can in practice sharpen attention to a particular stimulus dimension or quality, so that the exact rate of clicking of a metronome will have a pronounced effect on the exact number of drops of saliva secreted by the Pavlovian dog (see ‘intradimensional shifts’, p. 267 above). It is possible also that differential reward and punishment may set up particular values on stimulus dimensions as being especially wanted (or unwanted). The animal may not so much be paying attention to clicks in general as waiting hopefully for clicks of 100 but not 50 beats per minute. The dog listening to his master’s voice is not so much sampling various phonetic dimensions as identifying a particular pattern of sound, which is very probably associated with characteristic patterns of smells and sights as well.

Information-processing strategies

A dog given food after tape recordings of its master’s voice could be said to be undergoing a simple conditioning procedure, but might very well encode the information received in an unnecessarily complex fashion, because of its prior experience (and innate dog-like predispositions): the use of a simple conditioning procedure does not guarantee only simple responses to it (Davey, 1983, 1986). However, other things being equal, the procedures of discrimination learning are such as to allow and to encourage the development of more complex processes of perception and learning than those which might on occasion suffice for the acquisition and extinction of a simpler conditioned response (Mackintosh, 1983, chapter 9). The procedure of repeating simultaneous, two-choice discriminations, with an endless succession of pairs of objects, may, it has been argued here, result in the development of a strategy of taking detailed note of the results of the immediately preceding choice, in a way that does not occur
ab initio and which may never occur at all in sufficiently lowly species (see ‘Learning set’, pp. 270-4). When a two-stimulus discrimination procedure is modified into a four-stimulus task (peck at the red triangle or the blue circle; but not at the blue triangle or the red circle) then new (and not yet answered) questions arise about the possibility of learning configurations, as the conjunction of specified levels of more than one dimension, and the alternative possibility of using individual levels of one stimulus dimension to set global or particular scans in motion (if blue respond only to small circles; if red respond according to angularity). Thus various procedures under the heading of conditional discriminations can be expected to arouse internal reactions to environmental circumstances that may be conveniently described (although by no means satisfactorily explained) by reference to strategies of information-processing. This also applies to the evidence described below, in which the behavioural procedures may be as straightforward as is possible for discrimination learning, but the classes of stimuli chosen require that theories be couched in terms of pattern recognition, and perceptual complexity.

Perceptual complexity in animal learning

The natural environment of most species tested in the animal laboratory requires perceptual capacities vastly different from those engaged merely by tones of different pitch, or lights of different colours. For instance, recognition of individual conspecifics, for species such as the pigeon or monkey which do not take much interest in smells, is roughly as demanding a task for them as it is for us, although they may not necessarily accomplish it in precisely the same way. It should not therefore occasion much surprise or alarm if these species, when presented with discrimination tasks of the same order of difficulty as those which they are confronted with in the wild, display considerable expertise. On the contrary, this provides an opportunity for framing and testing hypotheses about discriminatory capacities which are more closely related to a species’ evolutionary and ecological opportunities than those of traditional learning theory. Visual pattern recognition, of
one sort or another, takes up most of this section, since it is both theoretically challenging and practically convenient. Many animals specialize in touch (especially via whiskers) or olfaction, as opposed to vision, and these modalities are not as extensively researched. However, it is possible to quote evidence from hearing to begin with, to demonstrate that perceptual complexity is not exclusive to the modality of vision.

Music discriminations by pigeons and speech perception by monkeys

A rather charming report by Porter and Neuringer (1984) suggests that pigeons’ responses to auditory events may be more complex than is usually assumed, on the grounds that several individual birds were demonstrated to respond differentially to any of Bach’s ‘Toccatas and Fugues in D minor and F’ on the one hand, and Stravinsky’s ‘Rite of Spring’ on the other. This is not a very powerful reason for assuming similarity between pigeon and human hearing, but the experiments performed suggest that auditory pattern recognition in the pigeon may go beyond coos and clicks. With many other birds apart from Columba livia, we should expect sophisticated hearing because of their own vocal productions; although pigeons’ own vocalizations are limited, they are distantly related to parrots, and have no noticeable degeneration of the auditory apparatus.

After some preliminary relevant experience, the birds used by Porter and Neuringer (1984) were tested in the following manner. In a conventional Skinner box, tape-recorded music was played continuously, either from the 20-minute selection of Bach organ music, or from Stravinsky’s ‘Rite of Spring’ for orchestra. These two alternatives alternated at random intervals, but on average once per minute. During Bach, pecks on the left key were occasionally rewarded by access to grain (VI-30 seconds), but right-key pecks were not; conversely, while the Stravinsky was playing, right-key pecks occasionally paid off, but left-key pecks were wasted effort. Occasionally, novel pieces of music were inserted: pre-1750 works by Buxtehude, Scarlatti and Vivaldi for organ, harpsichord
and violin and orchestra respectively; and twentieth-century pieces for organ and chamber group, plus Stravinsky’s ‘Firebird Suite’ for orchestra. The results were that performance on the standard Bach v. Stravinsky discrimination was reasonable but not perfect, about 70-75 per cent of responses being made on the correct key. It is impossible to say exactly what auditory cue or pattern of cues was responsible for the results. Overall loudness was controlled, but it is likely that harmonic differences between organ and orchestra were detected, since during Vivaldi excerpts all the birds made 80 per cent of their pecks on the right (Stravinsky) key. However, this cannot have been the only source of discrimination, since the pigeons did the same thing when they heard the modern piece for organ, Walter Piston’s ‘Chromatic Study on the Name of Bach’. It is certainly not necessary to conclude that pigeons have any notion of musical style; but it is equally unnecessary to assume that their auditory system can detect only pitch and intensity, and nothing about sound patterns.

Others have shown that the chinchilla (a not especially vocal rodent) can discriminate human speech sounds; or to be more specific, that the stop consonants ‘t’ and ‘d’ can be successfully used as positive and negative stimuli in a discrimination task (Kuhl and Miller, 1975). However, the training required the animals to distinguish ‘t’ and ‘d’ sounds associated with three different vowels, and as produced by four different talkers, and the discrimination generalized to new talkers and other vowels. Further evidence indicated that the rodents, like humans, detected the difference between these voiced and unvoiced consonants on the basis of the timing of the onset of voicing, and did this also, at slightly different boundaries, for the other stop-consonant pairs of ‘b’ versus ‘p’ and ‘g’ versus ‘k’.
Rhesus monkeys have also been trained to discriminate ‘b’ from ‘p’ and ‘g’ from ‘k’, using synthesized speech stimuli that are convincing for people (Waters and Wilson, 1976) and, without training, strongly react to changes between ‘d’, ‘b’ and ‘g’ during an habituation series (Morse and Snowdon, 1975). The primary interest of these studies is in suggesting that experience of producing speech sounds oneself, though undoubtedly helpful, is by no means necessary for the accomplishment of basic phonetic auditory discriminations,
and that the evolution of any specialized human subsystem for speech perception must have been made very much easier by the fact that mammalian auditory pathways were already capable of prerequisite forms of categorization. None the less, these results are equally useful in emphasizing that few perceptual systems evolved under pressures to detect differences between pure tones, or between black and white cards. On the contrary, perceptual systems evolved while they performed complex but necessary discriminations in natural environments, such as immediately detecting the location of the sound of a cracking twig, which requires binaural comparisons (either for time of arrival, or frequency spectrum differences; Harrison, 1978). Thus it may be necessary to use naturalistic stimuli in order to discover some of the essential characteristics of discrimination learning. Conversely, discrimination learning techniques may be useful in analysing the bases of natural perceptual abilities. It comes as no surprise, for instance, that monkeys can learn an auditory discrimination task which requires them to respond differentially according to the functional category of the tape-recorded cries of their own species, but specialized laboratory techniques are needed in order to assess whether they might have a right ear advantage in this task, as people do for human speech (Peterson et al., 1978), and more directly, whether left-hemisphere rather than right-hemisphere lesions cause greater reductions in the accuracy of performance (Heffner and Heffner, 1984).

Visual pattern recognition

The method of operation of biological visual systems presents several puzzles, since they still outperform artificial apparatus by a wide margin (Marr, 1982; Frisby, 1979; Ballard et al., 1984; Feldman, 1985). Visual discrimination learning by animals provides necessary evidence about the limits and the plasticity of recognition performance, and its relation to specific physiological mechanisms, and has added theoretical interest because it separates visual perception from language, and from any other uniquely human cognitive specializations.


Letter stimuli

Recognition of letters is an example of this — although there can be no human specialization for letter perception, as there might be for speech perception, human letter recognition must be more than a visual detection task, as it is bound up with the skills of reading and language, and the phonetic organization of speech. Both Morgan et al. (1976) and Blough (1984) have examined letter perception in pigeons, where neither of these complications need be considered.

Morgan et al.’s procedure was unusual in that free-living pigeons responded to stimuli projected to a screen in a window of a laboratory, but apart from this the training followed a conventional GO/NO GO discrimination schedule. Positive stimuli alternated with negative stimuli, each projected for varying intervals averaging 30 seconds. At the end of positive stimuli only, rewards were obtained by pecking the stimulus screen. During training, positive stimuli could be any of 18 different ‘A’s, and negative stimuli any of 18 different ‘2’s, differing because they were made up from different typefaces. When birds were clearly responding very much more to positive than to negative stimuli, new ‘A’s and ‘2’s from 22 further typefaces were introduced, each presented only once. All birds continued to respond vigorously to ‘A’s but not to ‘2’s with little difficulty despite considerable variation in the appearance of the new ‘A’s (see Figure 8.5).

This suggests that something about the pattern of an ‘A’ was being distinguished from something about the pattern of a ‘2’, but does not provide much information about how this discrimination might have been accomplished. Further evidence was sought by presenting partial and rotated forms of ‘A’s and ‘2’s. An inverted ‘A’, an upright ‘A’ with no cross bar and an upright triangle all elicited a high response rate, but on the other hand ‘A’s lying on their sides and an inverted triangle were hardly responded to at all. After this two birds were tested with all the letters of the alphabet in the same unelaborate typeface (Helvetica Medium). The order of preference was R,H,X,K,W,N,M,B,U,Y,T,F,D,V,O,P,Q,E,S,C,G,I,J,L,Z, with the first nine letters (R to U) eliciting appreciable responding and the last nine (Q to Z) virtually
none at all. As the authors of this report point out, there does not appear to have been a single feature of the stimuli which determined response. However, the features of ‘legs’ and ‘apex’ in positive stimuli, and ‘curvature’ and ‘flat bottom’ in negative displays, would go a long way towards accounting for the results of transfer tests, apart from the relatively high level of response to ‘B’ and ‘U’. Therefore, they argue that the basis of the pattern recognition performance is not a single critical feature, but rather a ‘polymorphous concept’ utilizing several features, which may occur in several combinations.
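
The idea of a ‘polymorphous concept’ can be made concrete with a small sketch. What follows is only an illustration under stated assumptions, not the analysis performed by Morgan et al. (1976): the feature names are taken from the paragraph above, but the feature coding of each example stimulus, the scoring rule and the threshold are all hypothetical. The point it shows is that no single feature need be necessary or sufficient when a response depends on a count across several features.

```python
# Hypothetical sketch of a polymorphous rule: respond when enough 'A'-like
# features outweigh '2'-like features. Feature codings below are assumptions
# made for illustration, not data from Morgan et al. (1976).
POSITIVE_FEATURES = {"legs", "apex"}
NEGATIVE_FEATURES = {"curvature", "flat_bottom"}


def polymorphous_score(features):
    """Number of 'A'-like features present minus number of '2'-like features."""
    features = set(features)
    return len(features & POSITIVE_FEATURES) - len(features & NEGATIVE_FEATURES)


def responds(features, threshold=1):
    """Respond if the net feature count reaches the (hypothetical) threshold."""
    return polymorphous_score(features) >= threshold


# Hypothetical feature codings for a few stimuli.
EXAMPLES = {
    "upright A": {"legs", "apex"},
    "legs only": {"legs"},
    "apex only": {"apex"},
    "typical 2": {"curvature", "flat_bottom"},
    "legs plus curvature": {"legs", "curvature"},
}

for name, features in EXAMPLES.items():
    print(name, polymorphous_score(features), responds(features))
```

Under this rule either ‘legs’ or ‘apex’ alone is enough to produce a response, while a positive feature can be cancelled by a negative one, which is the sense in which several features ‘may occur in several combinations’.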


Figure 8.5 Discrimination of ‘A’s in 22 typefaces.

Rates of response for 3 pigeons to slides of the letter ‘A’ in 22 novel typefaces. All had previously been trained to respond to ‘A’s, but not to ‘2’s, in 18 other typefaces. The scores for the 3 birds are plotted as circles, squares and triangles, with the stimuli ordered according to the responses plotted as circles. After Morgan et al. (1976).


Blough (1982, 1985) used a choice procedure to train pigeons to in turn discriminate each letter of the alphabet from all other letters. The target letter, suppose ‘E’, could appear on any one of three keys, and another distractor letter,
say ‘O’, would be in the other two positions, the distractor letter varying from trial to trial. A limited amount of time was given with each of the 26 letters as a target, so that error data was used from a stage when easy pairs (e.g. E and O) were usually distinguished correctly, but difficult pairs (e.g. U and V) often confused. Computer programs were then used to analyse the error data. These confirmed Morgan et al.’s (1976) suggestion that a number of different features are used by the pigeon for letter discriminations. Thus straight letters, such as I, T and L, were often confused with each other, and there were also clusters of confusions around M, N and W; A, R, P and B; C, G and S; and D, O and Q, which seem likely to be related to the features of oblique angularity, small enclosed regions, curved openness to the right, and large loops (Blough, 1984, 1985).

Human reaction time data also give this sort of picture of letter similarity (Podgorny and Garner, 1979). It is arguable that only extremely peculiar algorithms for letter recognition would not, but Anderson and Mozer (1981) have in fact proposed such a system for letter recognition, based on the counting of squares touched by standard letters on a standard-sized grid, and even this produces confusions between F and P, G and S, and N and W; but it also produced clumps such as G, O and S with V, and Q with Y, which can less obviously be derived from a feature analysis system (Neisser, 1967) for letter recognition.

Anderson and Mozer (1981) argue against the existence of specific feature analysers in the nervous system, and for a rather more diffuse and global method of categorization by inter-connected matrices of neurons. The data certainly supports the view that, if there are features analysed, then categorization such as that needed for letter recognition makes use of multiple features, in many combinations. An even more direct experimental demonstration of this has been provided by Gaffan (1977a) who trained rhesus monkeys on a form of GO/NO GO discrimination with the visual ‘wordlike’ displays of RIM, LID, RAD and LAM, as alternative positive stimuli, and RID, LIM, RAM and LAD, as alternative negative stimuli. The monkeys learned to respond to all four positive stimuli (being then rewarded by sugar pellets) and
not to respond to any of the negative displays, with a high degree of accuracy (over 90 per cent correct trials). The point is that it would be impossible for them to do this by adding up positive and negative weights for any individual letter, or pair of letters, since all letters and pairs of letters were equally often positive and negative, as may be checked by inspecting the lists above. The only way to solve the problem is by learning to recognize each individual combination of letters: in Gaffan’s terms, by using ‘visual configurational cues arising from the interaction of the stimulus elements’ (p. 594), unless, as seems most unlikely, there is some mysterious visual property which is common to all of RIM, LID, RAD and LAM, and none of RID, LIM, RAM and LAD. It is thus fair to assume that the monkeys learned a number of different complex patterns, rather than one or two key features.
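
The claim that single letters and pairs of letters are equally often positive and negative in Gaffan’s displays can be checked mechanically. The sketch below uses only the two word lists given in the text; the function names and the choice to treat a cue as a set of (position, letter) pairs are assumptions introduced for the illustration.

```python
from collections import Counter
from itertools import combinations

# Stimulus lists as given in the text (Gaffan, 1977a).
POSITIVE = ["RIM", "LID", "RAD", "LAM"]
NEGATIVE = ["RID", "LIM", "RAM", "LAD"]


def cue_counts(words, size):
    """Count every cue made of `size` letters, keeping track of letter position."""
    counts = Counter()
    for word in words:
        for positions in combinations(range(len(word)), size):
            cue = tuple((p, word[p]) for p in positions)
            counts[cue] += 1
    return counts


def balanced(size):
    """True if cues of this size occur equally often in both lists."""
    return cue_counts(POSITIVE, size) == cue_counts(NEGATIVE, size)


for size in (1, 2, 3):
    print(f"cues of {size} letter(s) balanced across lists: {balanced(size)}")
```

Single letters and letter pairs come out balanced, so any scheme that sums weights attached to them must give the same total to positive and negative displays; only the full three-letter combination separates the two lists.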

Picture stimuli

An appeal to the learning of complex patterns is hardly very satisfactory as an explanation for anything, and becomes even less useful when the patterns become large in number and variable in structure. A large amount of data from problems which answer this description has been reviewed by Herrnstein (1984, 1985) and Cerella (1982). Herrnstein and Loveland (1964) trained pigeons to distinguish coloured slides containing people from an otherwise similar set not containing people. With a conventional GO/NO GO successive discrimination procedure, 80 slides were presented to the birds each day, one at a time, for roughly a minute each, with a random order of those containing people and those not. Food rewards could be obtained by pecking a key only when slides containing people were being shown, and then only on a variable interval schedule (VI-1 minute). After several weeks of this training (by no means exceptionally long for a visual discrimination), the birds had a high response rate for most of the positive slides and a much lower response rate, in some cases zero, for the negative slides. It should be noted that this is not perfect or 100 per cent correct performance, but it is certainly sufficiently accurate to justify a claim for a categorization process depending on the presence of people; the phrase
often used, but rarely clearly defined, is that ‘pigeons have a concept of’ people. This result appears to be replicable, and not artifactual (e.g. Mallott and Sidall, 1972) but has as yet no agreed theoretical interpretation (Herrnstein, 1985). It is quite clear, however, that the pigeon’s capacity for classification of large numbers of slides is not confined to detection of the human form. Herrnstein et al. (1976) used several hundred slides in each case to demonstrate classification on the basis of tree versus non-tree scenes, and similarly bodies of water versus none. They also showed learning of slides containing one particular person as against similar scenes containing other persons. There is a temptation to assume that the pigeon’s recognition of water and trees is innate, but clearly no pattern recognition schema or template for individual persons could be innate in the pigeon. There is also the question of how far previous experience before the experiment had established the discriminatory capacities then demonstrated. Herrnstein and de Villiers (1980) conclusively eliminated both the possibility of innate conceptual categories and the influence of previous individual experience by using a set of slides taken by a scuba diver, only half of which showed the presence of a fish. Thus although the perceptual processes involved in categorization and visual recognition may be innate (and probably are; Weiskrantz, 1985), the content of individual categories cannot possibly be, and therefore must be learned or constructed on the spot.

Cerella (1979, 1982) has performed similar experiments with pigeons, using line drawings of cubes, drawings of Charlie Brown and other characters from the ‘Peanuts’ comic strip, and silhouette outlines of oak and other trees’ leaves. The birds are good at oak leaves, reasonable at whole or partial or scrambled sections of Charlie Brown, but no good at recognizing line drawings of cubes depicting the cube at a substantially different orientation from the one they had already learned. Cerella (1982) concludes that fairly local features of two-dimensional patterns are responsible for the categorical discriminations. This may be too limited a theory, but one of the results which led to it should be noted. With slightly more stringent conditions than in the other cases (requiring the birds to peck directly at the positive stimulus),
it was possible for pigeons to make general classifications of any oak leaf outline as opposed to the outline of leaves from other trees, after having had only the experience of being rewarded for pecks at a single oak leaf, with no negative instances, that is without the benefit of the method of contrasts (see p. 265 above). On the other hand, under similar conditions birds failed to discriminate between one particular oak leaf and outlines of other leaves from the same species. This implies that the categorization process is not necessarily inductive, that is, it does not have to be cumulatively developed on the basis of numerous instances, providing that there is a strongly salient perceptual feature — in this case the lobulation of the oak-leaf outline.

As Herrnstein (1984) stresses, the results obtained in his laboratory are sufficient to contradict frequently expressed views of the type, ‘the human visual system is the only effective pattern classification system known’ (Howard and Campion, 1978, p. 32). But does this imply that we need to attribute to pigeons a large dose of high-level intellectual abstraction? Not necessarily. Herrnstein (1984) appeals to a rather powerful-sounding process of categorization. But Greene (1983) and Vaughan and Greene (1984) have provided evidence for a model which is weak on abstraction and very strong on visual memory, in order to account for the classificatory abilities of pigeons. The experimental result on which this model rests is that, after very extensive training, birds can perform adequately with up to 160 pairs of slides in the same procedure as the categorization experiments, but where there is no known category of pattern which links the set of positive or the set of negative slides (Vaughan and Greene, 1984). It is thus necessary to assume that pigeons can make use of a large visual recognition memory for individual slides, or individual features. Now, this would not in itself explain the transfer of classificatory performance when one set of already learned displays is replaced by a large new set (Cerella, 1982; Herrnstein, 1985). But there is a limit to the degree of physical difference between the already learned set in these cases and the transfer set. It is therefore not implausible to propose that what might be attributed to ‘abstraction of a concept’ is accomplished by a brute force
mechanism of exact memorization ‘coupled with generalization along certain physical dimensions’ (Vaughan and Greene, 1984).
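
A brute-force account of this kind can be sketched as an exemplar memory plus a generalization gradient. The code below is a minimal illustration under stated assumptions and is not the model of Vaughan and Greene (1984): the representation of a slide as a short vector of physical measurements, the exponential gradient over Euclidean distance, the `scale` parameter and the class name `ExemplarMemory` are all hypothetical.

```python
import math


def similarity(a, b, scale=1.0):
    """Generalization gradient: falls off exponentially with distance (assumed form)."""
    return math.exp(-math.dist(a, b) / scale)


class ExemplarMemory:
    """Stores every training slide individually; no category is ever abstracted."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.items = []  # list of (feature vector, +1 if rewarded else -1)

    def store(self, features, rewarded):
        self.items.append((tuple(features), 1 if rewarded else -1))

    def evidence(self, features):
        """Similarity-weighted sum of remembered outcomes for a test slide."""
        return sum(sign * similarity(features, stored, self.scale)
                   for stored, sign in self.items)

    def respond(self, features):
        return self.evidence(features) > 0


# Hypothetical two-dimensional 'slides': rewarded ones happen to lie near (0, 0).
memory = ExemplarMemory()
memory.store((0.0, 0.0), rewarded=True)
memory.store((0.2, 0.1), rewarded=True)
memory.store((3.0, 3.0), rewarded=False)
memory.store((3.2, 2.9), rewarded=False)

print(memory.respond((0.3, 0.3)))  # novel slide physically close to rewarded ones
print(memory.respond((2.8, 3.1)))  # novel slide physically close to unrewarded ones
```

In such a scheme transfer to new slides depends entirely on their physical closeness to remembered ones, so an arbitrary set of slides can still be learned item by item while apparent ‘concept’ transfer arises only where the new set resembles the old.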

While this is a very useful theoretical model, it is less explanatory than it appears to be, since a large role is given to ‘generalization’ which in a sense is what must itself be explained. It is likely, however, that visual memory is an extremely important component of visual classificatory performance, even if, as Lea and Ryan (1983) emphasize, there is also good evidence for abstraction in the sense of feature analysis, when known visual features of letters (in quite different typefaces) can be isolated from experimental data.

For comparison with human visual pattern recognition performance it is undoubtedly more sensible to select data from primates rather than pigeons (even though birds have a superior ratio of visual ability to maintenance cost). There is strong evidence that when great apes (chimps and orang-utans) look at pictures, they recognize familiar objects in a way which allows transfer to touch and manipulation, and that this is more humanlike than rote memory of individual pictures (Hayes and Hayes, 1953; Davenport et al., 1973, 1975). However, there is relatively little data showing classificatory ability for large numbers of pictures in primates, by comparison with pigeons. Schrier et al. (1984) have now reported on the performance of stump-tailed macaque monkeys in procedures closely analogous to those used by Herrnstein and others with pigeons. Categories classified were humans present or absent in slides, monkeys present or absent, and the letter A versus the figure 2 in many typefaces. It is, as always, difficult to make exact comparisons of the performance of the different species in this case. Schrier et al. summarize their data by saying that the level of transfer of classificatory skill to new sets of slides of ‘humans’ and ‘monkeys’ was lower in their monkeys than had been reported in pigeons, but they seem to have used a stricter criterion for their animals, in that no response at all was allowed on negative trials. The transfer tests make it quite clear, though, that the monkeys had come to depend to a large extent on the learning of individual slides during initial training, which would


support the application of Greene’s (1983) rote-memory theory to monkeys. The monkeys appeared to have much greater transfer to the classification of ‘A’s versus ‘2’s to new typefaces than they did with the more naturalistic categories, and for ‘A’s versus ‘2’s Schrier et al. (1984) suppose that the performance of their monkeys was at least as good as that reported for pigeons. On the basis of Schrier et al.’s data, all that can be properly concluded about cross-species comparisons between pigeons and macaque monkeys is that there is as yet little evidence for substantial differences in performance in two-way classifications of large numbers of visual displays. For both species, however, theories of classificatory performance increasingly involve loose reference to visual memory. Further discussion of cognitive organization in animal memory will be found in the next chapter. However, it is appropriate here to mention the general theoretical contrast between accounts of pattern recognition based on template-matching and those which rely instead on feature analysis (e.g. Neisser, 1967), since the issues involved are very similar to those arising in the case of class-concept learning by some sort of abstraction of common features on the one hand or by rote-learning of all class members on the other. Yet another instance of a similar contrast is that between ‘viewer- centred’ and ‘object-centred’ internal descriptions in the computational theory of visual perception (e.g. Marr, 1982). In all cases it is arguable that templates, rote-learning and viewer- centred descriptions are inadequate by themselves, since we (or an animal) would need far too many of them to do any good, and in any case they could never help with novel examples of a known concept, mental rotations and other kinds of creative imagination, or simply with unusual or occluded views of a familiar object. 
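The contrast between rote-learning and feature analysis can be made concrete with a toy sketch. It is only an illustration under stated assumptions: the feature names (‘apex’, ‘crossbar’ and so on), the four training ‘slides’ and the function names are all hypothetical, and nothing here is a model proposed by any of the authors cited. It shows a rote store failing on an untrained typeface while a crude feature tally still classifies it.

```python
from collections import Counter

# Hypothetical toy stimuli: each "slide" is a set of named visual features.
TRAINING = [
    (frozenset({"apex", "crossbar", "serif"}), "A"),
    (frozenset({"apex", "crossbar", "bold"}), "A"),
    (frozenset({"curve", "baseline", "serif"}), "2"),
    (frozenset({"curve", "baseline", "bold"}), "2"),
]

def train_rote(examples):
    """Rote memory: store every trained slide together with its category."""
    return dict(examples)

def classify_rote(memory, stimulus):
    """Return the stored category, or None if this exact slide is unfamiliar."""
    return memory.get(stimulus)

def train_features(examples):
    """Feature analysis: count how often each feature occurs in each category."""
    counts = {}
    for features, label in examples:
        counts.setdefault(label, Counter()).update(features)
    return counts

def classify_features(counts, stimulus):
    """Choose the category whose trained features overlap most with the stimulus."""
    scores = {label: sum(c[f] for f in stimulus) for label, c in counts.items()}
    return max(scores, key=scores.get)

rote = train_rote(TRAINING)
feats = train_features(TRAINING)

# An 'A' in a typeface never seen in training (hypothetical "italic" feature).
novel = frozenset({"apex", "crossbar", "italic"})
print(classify_rote(rote, novel))       # None
print(classify_features(feats, novel))  # A
```

The rote store answers correctly for any trained slide but has nothing to say about the novel one, which is the sense in which templates alone ‘could never help with novel examples of a known concept’; the feature tally generalizes, but only because the relevant features were assumed to be available in advance.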
Without denying the sense of these arguments, it is worth pointing out that the empirical evidence from discrimination learning experiments suggests that the rote-learning of large numbers of individual examples is a common perceptual strategy, both in pigeons (Vaughan and Greene, 1984) and in primates (e.g. Gaffan, 1977b, p. 509; Schrier et al., 1984). Recent physiological evidence obtained from primates points to the usefulness of relatively non-abstract, concrete representations of the visual scene. In the studies of Perrett et al. (1985) macaque monkeys were first extensively trained on a visual discrimination task. Real objects or photographs were presented behind a large-aperture shutter. If either a monkey or a human face or head, at any angle or position, was seen, the experimental monkeys could lick a tube to obtain sweetened water. If any other class of object was seen (food, parts of the body, junk objects including a football, a fur coat and so on) then the monkeys were trained not to lick. Hence this is a form of the usual GO/NO GO discrimination task. When the monkeys were well-trained, recordings were made via brain electrodes from individual cells in the temporal lobe (in the anterior superior temporal sulcus). This is a long way from the main visual reception area, but has been known for some time to be an important site for visual categorization (Weiskrantz, 1974; Rolls et al., 1977). Many cells were found which fired when faces or heads were shown, but did not fire or fired much less for other objects. There are many important aspects of the results obtained, since visual transformations that make it difficult for people to recognize faces (e.g. use of photographic negatives) reduced the response of face-sensitive cells, and it is thus likely that these cells correspond to a late stage in human face perception. However, for present purposes it is especially interesting that the authors of these reports (Perrett et al., 1985) stress the large proportion of cells sensitive to particular views of the head (e.g. either full-face or profile, viewed from above or below) and argue that viewer-centred descriptions may be valuable even at high levels of visual analysis.
Faces may be something of a special case, since it is not clear that any object-centred description derived from a small number of canonical views would be able to generate adequate information about other views — it would be very difficult to predict someone’s profile from knowledge of their full face, and much simpler to recognize profile and full face separately in the first instance. At any rate, this is what seems to happen in the brain of the monkey, on the evidence to hand.

It is a useful general point, that may apply to many other kinds of object recognition, that category identity may be established by pooling a number of very different descriptions. This is what is implied by the term ‘polymorphous concept’: it has often been observed that a lower-case and an upper-case ‘A’ have little in common, and that many easy-to-use words, such as ‘chair’, ‘dog’, or ‘game’, apply to an extremely wide variety of instances (Lea, 1984b). Thus what end up as seemingly natural and homogeneous concepts may be held together by initially quite arbitrary associations, with flame and heat going together only in so far as we have experienced their conjunction. Associationists such as Hume and Hull may have overstressed this possibility, but it remains true that very general learning abilities, which allow for the stringing together of initially unlike elements, may be exceedingly useful in nature, and are certainly demonstrated by pigeons and monkeys learning to identify certain letters with food in laboratory experiments. Generalization across typefaces is something different, and amounts to association by similarity as opposed to association by contiguity. But in discrimination learning at least, there may be no clear-cut point where arbitrary associations based on contiguity end and extensions and generalization based on similarity begin, and therefore the notions of template versus feature abstraction, or rote memory versus concept learning, may turn out to be complementary in practice, even though contradictory in theory.

Discrimination, attention and perception — conclusions

The main theoretical conclusion to be derived from the study of animal discrimination learning is that processes which can be referred to in terms of selective attention, while not necessarily absent from any other kind of learning, are made much more obvious by specialized training techniques. Several experimental results point to differential sensitivity to physically constant environmental cues caused by prior experience, which has to be interpreted as a switching in or out of internal perceptual processes, or at the very least as a stage of learning which is separate from response selection, and which results in changes to the effectiveness or associability of external stimuli. Most of these results are obtained by transfer experiments of one kind or another: training with one stimulus value shows up in generalization gradients along many other values; there is transfer from easy to difficult cases or from one position to another on the same stimulus dimension; and attention to a stimulus dimension seems to transfer from one response requirement to another. But much more subtle and rapid variation in attentional processes can be inferred from performance on more complex discrimination tasks: both when these use conditional combinations of relatively straightforward stimulus dimensions such as colour or line orientation, and when the patterning of stimuli to be distinguished is sufficiently elaborate to require theories proposing that in one form or another a special set of stimulus dimensions is processed simultaneously.



