The Science of Everything Podcast - Episode 29: Operant Conditioning

Starting point is 00:00:33 You're listening to The Science of Everything podcast, episode 29, operant conditioning, and I'm your host, James Fodor. This is a follow-up to episode 28 on classical conditioning. In this episode, I'm going to now cover operant conditioning, which is a different type of learning. And I'll also at the end talk a little bit about observational learning, which is an even more recent discovery. But before we get to that, we'll start with operant conditioning, which is also known as instrumental conditioning. This is a more recent discovery. It's generally associated with the behaviorist or behavioralist school of psychology, for example, B. F. Skinner in particular, who did a lot of work on this in the early to mid-20th century. Operant conditioning is a form of learning, during which an individual

Starting point is 00:01:19 modifies its own behavior due to association of the behavior with some stimulus or some consequence. So what I want to do first is clearly distinguish between operant conditioning and classical conditioning, because this can be a source of confusion. Remember, classical conditioning requires the existence of some kind of innate reflex. It basically causes an initially neutral stimulus to be conditioned so that it gives rise to some innate reflexive response. Operant conditioning has nothing to do with reflexes. It's about conscious or what we might call voluntary behavior, although the behaviorist school wouldn't actually like using those words, but that's how we describe it.

Starting point is 00:01:58 It's not reflex, it's voluntary behavior. So anything that involves a reflex or involuntary action is going to be classical conditioning. Anything that involves some kind of decision is going to be operant conditioning, or for the most part. Another difference is that operant conditioning basically involves reinforcers and punishments. So operant conditioning is consequential. It's about what happens after the act or the response occurs. Classical conditioning is the opposite.

Starting point is 00:02:21 It's pretty much all about what happens before. There's some stimulus which then gives rise to the response. It's about the stimulus being a predictor for another stimulus, which then gives rise to a response. Operant conditioning has nothing to do with stimuli as predictors. It's all about the consequences. I want to do this, or I'm going to do this, because I will get a good consequence, or in order to avoid this bad consequence. So classical conditioning is sort of backward-looking at stimuli being predictors. Operant conditioning is consequential, looking at what's going to happen after the action

Starting point is 00:02:50 and anticipating that. Okay, so I said that operant conditioning was about reinforcers and punishments. Now I need to talk about what those are. A reinforcer is defined as some event or stimulus that makes a following behavior or a succeeding behavior more likely to occur. And the behavior that is being reinforced is called the target response. A punishment is basically the opposite of a reinforcer. It's some event or stimulus that makes a succeeding behavior less likely to occur or less likely to reoccur. So basically, a reinforcer is something that the organism likes and therefore they want it again. A punishment is something that the organism doesn't like and therefore they don't want it again.

Starting point is 00:03:28 Particularly with reinforcers, although I guess you could have this with punishments as well, you can have primary or secondary reinforcers. A primary reinforcer is basically some stimulus or event that does not require pairing with any other reinforcer in order to act as a reinforcer. So basically they're sort of primal needs or things that have a natural appeal to the organism, basically things that they've been evolutionarily selected to be attracted to. They're basically the standard things like sex, water, food, sleep, maybe shelter, stuff like that. Secondary reinforcers are most important for humans, but can apply to animals as well. They're some situational stimulus that's been paired with a primary

Starting point is 00:04:05 reinforcer or another secondary reinforcer and therefore gains its sort of function as a reinforcer as a result of being a means to that primary reinforcer or indeed to another secondary reinforcer. So money would be a good example of that. Or status or something. Well, I guess you maybe even could call status a primary reinforcer because humans are quite social. But there are some things where it's a little bit iffy whether it's a primary or secondary reinforcer. So don't worry about that distinction too much, but just the basic concept that some things are more basic than others, and the ones that are less basic, that are sort of more instrumental to another reinforcer, are referred to as secondary reinforcers, money being a good example.

Starting point is 00:04:41 Okay, but more important than that primary, secondary distinction is the positive and negative distinction. Okay, so remember, we've got reinforcement, that's something good, makes the behavior more likely, and punishment makes the behavior less likely. Either way, by the way, whether you use punishment or reinforcement, the behavior that we're talking about is called the target response. Don't get confused there because the target response could be something that we want to happen or could be something that we don't want to happen, depending on whether we're using punishment or reinforcement.

Starting point is 00:05:06 So target doesn't mean we want it, it just means that we're sort of targeting on it, we're interested in it. It could be to minimize it or increase it. There are two types of reinforcement, positive and negative. There are two types of punishment, positive and negative punishment. So these words are a bit confusing, so we're going to go through them. Basically, in this context, positive means that you introduce it. So if it's positive reinforcement, it means you,

Starting point is 00:05:26 give them something good, like it's a reward, it's money, it's food, or something like that. So negative means taking something away. So negative reinforcement would be taking away an aversive stimulus. So reinforcement, the second word, implies that it's a good thing, but negative means it's being taken away. So that doesn't mean we're taking away a good thing because it's still a reinforcement, it's still a good thing. It means the way we're getting the good thing is by taking away. So clearly, if it's taking away, but it's still a good thing, it must be taking away a bad thing. So that's what negative reinforcement is. It's taking away something the organism doesn't like. So it could be removal of pain or removal of loud noise. That would be a

Starting point is 00:06:03 reinforcer. So then there are also positive and negative punishments. Remember, a punishment is just something that the organism doesn't like, which makes the behavior less likely to occur. Positive punishment means: positive, introduce; punishment, bad thing. So you introduce a bad thing. That could be introducing the loud noise or the electric shock or whatever. Negative punishment is taking away, negative. But punishment means bad thing. So you're taking away something such that the organism doesn't like it. So that obviously means you're taking away something that's good. So this could be, for example, confiscating a child's toy following undesired behavior or something like that. So this is really confusing. A positive

Starting point is 00:06:38 punishment doesn't mean it's a good thing and negative reinforcement doesn't mean it's a bad thing. Positive and negative just indicate whether something's being added or taken away. Probably the best way of remembering it is to remember that the second word is sort of more fundamental. That determines whether it's good or bad, whether it's a reinforcement or whether it's a punishment. The first word, the adjective, the positive or negative, doesn't determine whether it's good or bad. The positive or negative just determines whether something's being added or taken away to achieve that purpose. So I emphasize this because these are the terms that are used in discussing operant conditioning and they're often confused, so it's important to get them clear.
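The two-word naming rule described above can be written out as a small lookup. This is only an illustrative sketch: the function name `classify_consequence` and its string arguments are made up for this example and are not standard terminology from the episode.

```python
def classify_consequence(stimulus_change: str, behavior_effect: str) -> str:
    """Name a consequence using the two-word rule from the episode.

    stimulus_change: "added" or "removed" (decides positive vs negative)
    behavior_effect: "more_likely" or "less_likely"
                     (decides reinforcement vs punishment)
    """
    first_word = {"added": "positive", "removed": "negative"}[stimulus_change]
    second_word = {
        "more_likely": "reinforcement",
        "less_likely": "punishment",
    }[behavior_effect]
    return f"{first_word} {second_word}"


# The four examples mentioned in the episode:
print(classify_consequence("added", "more_likely"))    # giving food
print(classify_consequence("removed", "more_likely"))  # removing a loud noise
print(classify_consequence("added", "less_likely"))    # delivering an electric shock
print(classify_consequence("removed", "less_likely"))  # confiscating a toy
```

The point of the sketch is that the two words are decided independently: the first only records whether something was added or removed, and the second only records the effect on the target response.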

Starting point is 00:07:12 Okay, so that's the idea of reinforcers and punishers. Basically, operant conditioning is all about getting organisms or animals, humans, whatever, to do certain behaviours and not to do other behaviours by providing reinforcers and punishers of various sorts in order to elicit the desired behaviour. I need to talk about the concept of a Skinner box quickly, because this is sort of how the concept of operant conditioning was originally demonstrated and many of the theories were developed. A Skinner box is also called an operant conditioning chamber. It's just a laboratory apparatus used in experimental analyses of animal behaviour. It's usually some sort of sealed or relatively sealed container, which allows manipulation of the environment.

Starting point is 00:07:53 Usually it'll permit provision of punishments or reinforcers and some action that you want the animal to perform. It could be turning a lever or pushing a button or something like that. They could be more or less complicated. They might have lights on them to provide various signals or to act as stimuli or sounds or whatever. Rats and mice are common subjects of choice, but you could do this basic thing with any type of animal. Having discussed the basic concepts, I now want to talk about some of the more specific aspects of operant conditioning including shaping and backward chaining

Starting point is 00:08:30 and also avoidance conditioning and the different schedules of reinforcement, which have some very interesting applications. So first of all, we'll talk about shaping. Shaping is a method of successive approximations. Shaping, well actually shaping and backward chaining go together because they're both mechanisms of how you actually teach something. Because remember, operant conditioning just provides a consequence, good or bad, that tries to shape behaviour. The trouble is if you're working with animals, arguably even some humans, but definitely when you're working with animals, you can't tell them what you want them to do.

Starting point is 00:09:00 You can't say, well, run around this object three times, then walk over to this side of the cage, sniff here once, and then go over and press this red button, and then you'll get your reward. That's too complicated. Nor can you really wait until the animal spontaneously does that entire thing, because it's not really going to happen. What you need to do is gradually approximate what you want the animal to achieve. So you do it bit by bit, in a sense,

Starting point is 00:09:24 and there are two aspects of that. They're shaping and backward chaining. So shaping is a method of successive approximations whereby you get the animal to achieve the behavior that you want by rewarding or reinforcing successive approximations of what you want. So for example, it may be that you want the animal, say it's a mouse, you want the mouse to press a button or pull a lever with a particular paw. So first of all, you just reinforce the mouse looking in the direction of the lever. Then you reinforce them walking towards the lever.

Starting point is 00:09:56 Then you stop reinforcing them walking towards the lever, and now they have to actually touch the lever to be reinforced. Once you get that to be conditioned, you require them to touch it with the right paw. And then finally, you stop reinforcing that and only reinforce if they pull down the lever. So basically, you take it in small, tiny steps such that there's a reasonably good chance

Starting point is 00:10:13 that the animal will do the next step spontaneously. You know, if there's only four sides of a box and you reinforce the mouse looking in the direction of one of those, then there's a pretty good chance it's going to just spontaneously look at one of those. As soon as it does that, you reinforce. And then you keep doing that until it's learned that it has to look at that side of the box to be reinforced. Then once again, once it knows a direction to look at, there's a pretty good chance that eventually it's going to walk in that direction. So once it does, you reinforce that, and so on. Each little step is sufficiently close to the previous one that the behavior that you want will happen by chance across that sort of small gap, and you reinforce that.

Starting point is 00:10:50 And so you gradually shape by approximating the final behavior that you want. Now, backward chaining is sort of a... It's kind of similar to shaping, but it's the next step up. It's used for training or teaching complicated steps of movements, or complex sequences of behaviors. Basically, what happens is that you start with the final behavior that you want. So say it's a mouse pulling a lever. But you want it to do some things,

Starting point is 00:11:15 before that. You want it to climb a ladder to get to the lever, but before that you want it to jump in some water, but before that you want it to find its way around a maze. What you have to do, in the backward chaining method, is first of all condition it, perhaps using shaping, to pull the lever, so that behavior, the target behavior, is pulling the lever. And the rat knows, I called it a mouse, but call it a rat, the rat knows that pulling the lever, that behavior, the target behavior, is associated with a reward, because it's been reinforced for that. So now what you can do is use that opportunity to pull the lever, that target behavior, as a reinforcer in itself for some other action. So, for example, if the rat

Starting point is 00:11:57 knows that pulling the lever will yield reinforcement, food is often used, then it will be willing, in a sense, to climb a ladder in order to have the opportunity to pull the lever, because then that will get food. So having the opportunity to pull the lever acts as a reinforcement in itself. This would be an example of a secondary reinforcer because pulling the lever itself is not what the rat wants. The rat wants the food that it gets from pulling a lever, but it's been conditioned, by instrumental conditioning, to know that pulling a lever will get the food. So now you can use pulling a lever, the opportunity to do that, as a secondary reinforcer.

Starting point is 00:12:32 And so say now that you've conditioned it using shaping to climb the ladder, because maybe it only first touched the ladder and then climbed a bit up and then halfway up and you had to condition it, shape it to climb all the way up the ladder. Now you've got it climbing up the ladder and pulling the lever, but now you want it to jump in the water first. So once again, you use the opportunity to climb the ladder as a reinforcer for the target behavior, which is now jumping in the water.

Starting point is 00:12:55 And once again, you can get the jumping in the water to occur through shaping. And so you build backwards like this, using each successive behavior that the animal has learnt as a reinforcer for the one that comes before that. And so basically, you can teach the rat to do the entire process because it'll run through the maze because it knows that once it does that, it will have the opportunity to jump in the water,

Starting point is 00:13:15 which it wants to do because it will have the, once it does that, it gets the opportunity to climb the ladder, and once it climbs the ladder, it has the opportunity to pull the lever, which then it wants to do because it can get the primary reinforcement, which is the food. This is the only reliable way that's been shown of training animals to do arbitrarily complex procedures or behaviors. And whenever you see animals in films or in the circus or wherever playing with balls or really doing anything that's not a natural behavior, any complicated behavior like that, they've been trained to do it through operant conditioning, namely backward chaining and shaping.
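The order of training in backward chaining can be laid out as a short sketch. The function name `backward_chaining_plan` and the step labels are hypothetical, chosen to match the maze, water, ladder and lever example above; the sketch only shows which behavior is trained when and what serves as its reinforcer, not how the shaping itself is done.

```python
def backward_chaining_plan(sequence, primary_reinforcer="food"):
    """Return (behavior_to_train, reinforcer) pairs in training order.

    `sequence` lists the behaviors in the order the animal will
    eventually perform them. Training starts with the last behavior,
    reinforced by the primary reinforcer; each earlier behavior is then
    reinforced by the opportunity to perform the one that follows it.
    """
    plan = []
    reinforcer = primary_reinforcer
    for behavior in reversed(sequence):
        plan.append((behavior, reinforcer))
        # The behavior just trained becomes a secondary reinforcer
        # for whatever is trained next (the step that precedes it).
        reinforcer = f"opportunity to {behavior}"
    return plan


steps = ["run the maze", "jump in the water", "climb the ladder", "pull the lever"]
for behavior, reinforcer in backward_chaining_plan(steps):
    print(f"train: {behavior:<18} reinforced by: {reinforcer}")
```

Reading the printed plan from top to bottom gives the training order, which is the reverse of the order in which the finished sequence is performed.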

Starting point is 00:13:45 So that's why it's important to have an understanding of how these things work, because they're actually used quite a lot. Now I want to talk about avoidance conditioning, which is a very important and at the same time quite disturbing aspect of learning or of operant conditioning. So before we discuss that, I need to talk about escape conditioning. Escape conditioning occurs when an animal learns to perform some action to terminate an aversive stimulus. It's basically like a get-me-out-of-here, shut-this-off reaction. You know, if the animal is shocked or experiences a loud noise that it doesn't like, it learns to avoid that by running away or whatever. That's important because you can convert escape conditioning into an alternative

Starting point is 00:14:24 type of conditioning called avoidance conditioning by providing a signal before the aversive stimulus begins. So for example, suppose the aversive stimulus is an electric shock and you provide a noise, not a really loud one, just a noise that the animal would notice before the electric shock. The animal now has a signal that's a reliable indicator of the electric shock. By the way, that could actually lead to classical conditioning, if those are paired enough times, but it could also lead to operant conditioning because what the animal might start to do is run off or move into a different spot where it won't experience the electric shock, even before the electric shock occurs, merely upon hearing the signal that indicates that it will

Starting point is 00:14:59 occur. Now, the sort of good and bad thing, or the interesting thing about avoidance conditioning is that the target response provides its own reinforcement. Basically, the target response is the same as its reinforcement. So in the case of an animal avoiding an electric shock, the target response would be moving away from the electric shock or moving to a different location in the cage, say. But the reinforcer is also moving away to another spot in the cage. Or more specifically, the reinforcer is not experiencing the electric shock. But that will occur if the animal moves into that other part of the cage. The key thing is that the animal will experience that relief

Starting point is 00:15:39 or will be reinforced regardless of whether or not the aversive stimulus actually occurs or actually would have occurred. So for example, if you pair the noise, the signal, with the electric shock for a while and the animal learns how to escape the shock by moving away as soon as it hears a signal, then you can set off a signal

Starting point is 00:15:58 and not even bother with the shock anymore and the animal will keep running away from the signal because the reinforcement is avoiding that aversive stimulus, but the reinforcement occurs regardless of whether or not the target action actually contributed to avoiding the aversive stimulus. So avoidance conditioning or avoidance behaviors are self-reinforcing and therefore are incredibly persistent. It's very hard for them to be eradicated. They've done experiments whereby basically animals will continue to respond hundreds and hundreds of times, even after the shock generator or whatever the aversive stimulus is, is never used again.

Starting point is 00:16:40 They just continue to respond hundreds of times in order to avoid essentially an aversive stimulus, which is no longer even present. So this is a very important application of operant conditioning or learning more generally, because this is common in humans too. Think about how many things that you or someone you know perhaps don't do, as a result of avoidance conditioning. You don't do a certain behavior because last time you did that, something bad happened, and therefore you're avoiding that situation,

Starting point is 00:17:07 whether it be a person or an activity or a place or whatever, as a result of avoidance conditioning. Now, think to yourself how likely it is that whatever it was that actually caused your initial negative experience is actually still going to occur now. For example, if it was a person you haven't seen in 10 years, how do you know that they're still going to be unpleasant to you, for example? Knowing avoidance conditioning and knowing how that works, how you can have a behaviour

Starting point is 00:17:31 that basically is self-reinforcing can, in a sense, empower you, I think, into changing your behaviour or at least understanding it better. Now I want to talk about schedules of reinforcement, because this is a very important aspect of operant conditioning. Just as the timing and presentation and intensity of the conditioned stimulus and unconditioned stimulus in classical conditioning determined the extent of the conditioning and how easy it was to do, similarly, the timing and rate of reinforcement determine the success and intensity of operant conditioning, and the different basically ways or time schedules of presenting the reinforcements are known as the schedules of reinforcement. So it's sort of like

Starting point is 00:18:16 the rule or the program you're following to determine when you reinforce. Now continuous reinforcement just means you reinforce every occurrence of the desired response. Fixed ratio reinforcement means that you deliver reinforcement after every nth response. So, for example, you could deliver reinforcement every 10th response, after every second response, or after every 100th response, or whatever. Variable ratio reinforcement means that you reinforce after a certain number of trials,

Starting point is 00:18:42 or after a certain number of responses, but the actual number varies. It's random, random with some average, so there's some distribution. So that's when you have a ratio. The reinforcement is determined by the number of responses that have occurred, which could be a fixed ratio, or a variable ratio. The other main one is the interval method where you have a fixed amount of time

Starting point is 00:19:01 that occurs and you're only reinforced, for example, after the first successful response that occurs at least two minutes after the last reinforcement, or at least 30 seconds after the last reinforcement. Variable interval, of course, you have an interval, but the interval changes. Fixed or variable ratio and fixed or variable interval are very different modes of reinforcement, and they produce quite different behaviors. But before I talk about that, I'll just note that extinction, similar to in classical conditioning, refers to a period during which the target response is never reinforced. Similarly, if you stop pairing the unconditioned and conditioned stimulus, the classical conditioning will become extinct or be extinguished. Extinction will occur in operant conditioning if the target response is never reinforced.
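The schedules just described can each be written as a rule that decides which responses get reinforced. The following is a rough sketch, not a laboratory protocol. All function names are made up for illustration, and it rests on several assumptions not stated in the episode: the interval clock runs from the previous reinforcement (or the start of the session), the variable ratio schedule reinforces each response with probability 1/mean, and the variable interval schedule draws its waiting times from an exponential distribution.

```python
import random


def continuous(n_responses):
    """Continuous reinforcement: every response is reinforced."""
    return [True] * n_responses


def fixed_ratio(n_responses, n):
    """Fixed ratio: every n-th response is reinforced."""
    return [i % n == 0 for i in range(1, n_responses + 1)]


def variable_ratio(n_responses, mean_n, rng):
    """Variable ratio: each response is reinforced with probability
    1/mean_n, so on average one response in mean_n is reinforced
    (an assumed distribution; others are possible)."""
    return [rng.random() < 1.0 / mean_n for _ in range(n_responses)]


def fixed_interval(response_times, interval):
    """Fixed interval: reinforce the first response made at least
    `interval` seconds after the previous reinforcement."""
    reinforced, last = [], 0.0
    for t in response_times:
        hit = t - last >= interval
        reinforced.append(hit)
        if hit:
            last = t
    return reinforced


def variable_interval(response_times, mean_interval, rng):
    """Variable interval: like fixed_interval, but the required wait is
    redrawn after each reinforcement (exponential, an assumed choice)."""
    reinforced, last = [], 0.0
    wait = rng.expovariate(1.0 / mean_interval)
    for t in response_times:
        hit = t - last >= wait
        reinforced.append(hit)
        if hit:
            last = t
            wait = rng.expovariate(1.0 / mean_interval)
    return reinforced


# Ten responses on a fixed ratio of 5: the 5th and 10th are reinforced.
print(fixed_ratio(10, 5))
# Responses at these times (seconds) on a two-minute fixed interval.
print(fixed_interval([10, 50, 120, 130, 250], 120))
```

In the ratio functions the outcome depends only on how many responses have been made, whereas in the interval functions it depends only on when they are made, which is the distinction the episode draws between the two families of schedule.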

Starting point is 00:19:43 That's why, to backtrack for a second, avoidance conditioning is so hard to extinguish because the target response is always reinforced, regardless of what the initial aversive stimulus is actually doing. Even if it's gone, avoiding will still be reinforced by not experiencing the aversive stimulus. Now, as I was saying, ratio reinforcements and interval reinforcements and also continuous reinforcements are the three sort of main categories of schedules, and they result in very different types of conditioning, very different behaviours. Ratio-based schemes both evoke much more rapid responses than interval-based

Starting point is 00:20:21 schedules, which makes sense because if you get reinforced depending on how many responses you make, it's in your interest to make as many responses as possible, as quickly as possible. Whereas if it's a time-based interval, so a variable, excuse me, interval-based schedules, they evoke much fewer responses because it's not about how many responses you make, it's about how much time has passed since the last one. Fixed interval schedules evoke the fewest responses of all because say you've got a two-minute interval between reinforcements, there's no point at all making any responses until you're right at the end of each interval, because none of them will be reinforced until the time interval's ticked over again.

Starting point is 00:21:02 And so that's indeed what we find in animals. So this doesn't require intelligent behavior. This is just all about learning. You find that you get very few responses during an interval, and then a very rapid peak of responses, just as the interval is approximately about to end, when you have a fixed interval. When you have a variable interval schedule, you get more constant responses because the animal doesn't know exactly when the next interval will tick over, but you still have less activity, basically because

Starting point is 00:21:33 the amount of reinforcement you get doesn't depend on how many responses you make. It's still time-based, so you tend to get less response there. Ratio-based schedules, as I said, elicit many more responses. The best ones, in terms of the number of responses they elicit, are variable ratio schedules, because fixed ratio schedules provoke or produce rest periods after each reinforcement. And rest periods become longer and more likely as the number of target responses that have to be made before a reinforcement increases. So, for example, if you're only reinforced after every hundredth push-up or whatever you're doing, or every hundredth push of the button, likely what you'll do is you'll make 100 and then you'll get the reinforcement and then

Starting point is 00:22:19 you'll sort of slack off a bit or wait a bit. All right. It's going to be a while until I get my next reinforcement so I'm just going to have a bit of rest. That seems to be what's going on in fixed ratio schedules. In variable ratio schedules though, it's basically random whenever you get reinforcement. You could get reinforcement immediately two trials in a row, three trials in a row or it could go a thousand trials without reinforcement. It's random. Yeah, but the key thing is it has nothing to do with time. So still, the more responses you make, the more reinforcements you'll get, but you don't know exactly when you'll get them. So they tend to provoke the most responses, basically

Starting point is 00:22:54 because you get that motivation to produce many responses, but you don't get the rest period, because even after making 100 trials, it still could be that the next response could generate reinforcement, whereas in a fixed ratio, that won't happen. Continuous reinforcement also elicits very strong response, obviously, because every trial is reinforced. The trouble with continuous reinforcement, though, is that extinction occurs very quickly. Because if every target response is reinforced and then, you know, they stop being reinforced, after only a few trials, the animal will figure out this isn't working anymore, and they'll stop responding. The other thing is that the reinforcement, whatever it is,

Starting point is 00:23:31 can become ineffective if, in a sense, the animal or human is satiated with it, which can occur if you have continuous reinforcement, so you're providing lots of it. Whereas with variable ratio schedules, you don't actually have to provide that much reinforcement. They still provoke very high levels of response, even if the actual overall rate of reinforcement is low, as long as, first of all, the amount of reinforcement they get depends on the number of responses that they make. And second of all, any given response could elicit reinforcement. If both of those conditions are met, as they always are in variable ratio schedules, then variable ratio schedules will elicit very high levels of response. And the other thing is

Starting point is 00:24:09 that variable ratio and indeed any of the other types of schedules apart from continuous reinforcement are more resistant to extinction than continuous reinforcement because you can't tell in these other conditions, especially a variable ratio condition. You can't tell if they've stopped reinforcing or if it's just not time yet or you haven't made enough responses yet. So variable ratio is sort of the optimal in that it's hard to extinguish and produces a very rapid, consistent rate of response. And it's no accident that a variable ratio schedule is exactly the type of reinforcement that is used in casinos, in slot machines and that sort of thing, because, especially slot machines, but most types of gambling really use this method of reinforcement

Starting point is 00:24:50 because it's based on a ratio, so it's based on how many times you play, not intervals of time. So the more times you pull the lever down on the slot machine, the more absolute number of wins you'll have, not a larger proportion of money that you win, but a larger absolute number of reinforcements you'll have. But it's variable ratio in the sense that it's random as to which pull yields a reinforcement, because any pull of the lever or any individual act of gambling can produce a reinforcement. So that's why gambling is addictive for many people, or for a large number of people, at least, because it's hard to extinguish, and it is really designed to reinforce us

Starting point is 00:25:30 in such a way that we want to keep doing it. Because there's always that feeling, well, only one more response and I could get reinforced. Okay, so just one final thing I want to talk about very briefly is a third type of learning, observational learning. Don't worry, there's not too much to say about this one because there's a lot less research on it than classical or operant conditioning, or at least the type of research that's done on it often is categorized differently, so I'm not going to talk about it now. Observational learning, though, it's also called social learning or vicarious learning, is a type of learning that occurs as a function of observing what someone else does and then replicating that behavior. It's not the same as mimicry.

Starting point is 00:26:04 It has to actually involve some sort of degree of understanding or comprehension of what the other person is doing. The other thing is that it's been shown that you don't actually have to perform the action itself or even be directly reinforced in order to learn it. That's why observational learning is not the same as operant conditioning, because you don't have to be reinforced to do it necessarily, certainly not directly reinforced. Different species, most especially humans, although it can occur to some extent in other species, will just spontaneously copy, observe, remember, and later on reproduce behaviors that they see.

Starting point is 00:26:38 The classic experiment which demonstrated this was by Albert Bandura in the 1960s, who conducted his Bobo doll experiment. Basically, he presented children with a Bobo doll. That's like an inflated doll that you can play around with. It's weighted at the bottom, so it sort of rocks from side to side and so on. It sort of looks like a clown. A clown in the shape of a bowling pin, if you can sort of imagine that. Might want to look it up.

Starting point is 00:26:59 He presented one group of children with a video of an adult playing nicely, just in a fairly subdued way with the Bobo doll, and a second group, he exposed them to a video of the adult playing, same adult, playing in a violent way, hitting the doll and so on, kicking it and whatever. And indeed, he found that the children who were exposed to the violent video behaved in a much more violent way towards the doll. They played with it in a more violent way, substantially more violent than the children who had observed the other video.

Starting point is 00:27:29 And they weren't instructed to do this. This was purely they observed adults doing it, and they copied that behavior. Nor were they reinforced in doing this. This was purely their choice. Neither group got any greater benefit or any reward from behaving in one way or the other. It was purely the children observed the behavior, and then sometime later on, not immediately, but later on when they were allowed to play with the doll, they reproduced that behavior.

Starting point is 00:27:52 So this is a case of observational learning. Observational learning is also made much easier by the use of language, where you can effectively tell people what to do or describe to them what you want to do and they take that in and then reproduce it later on. It does occur in non-human and indeed non-primate species in some special situations like, for example, adolescent or children of various animals learning to hunt or build a nest or something like this from their parents. Perhaps even some degree of language skills and a few other activities have been shown to be passed on by observational learning in primates. So there's a limited

Starting point is 00:28:25 amount of observational learning in animals, but by far this is most important in humans. And it's greatly facilitated, as I said, by the use of language, which is a uniquely human characteristic. And observational learning is how you get the most complex and most long-lasting, prolonged behaviours, obviously, because you can demonstrate in much more precise detail how you want things done and explain why it's done. I just wanted to discuss a couple of other aspects of observational learning. One important thing to understand is the concept of vicarious reinforcement and vicarious punishment. This refers to the fact that individuals will shape their behavior in response to reinforcers that are received by others rather than themselves, or punishments received

Starting point is 00:29:03 by others as opposed to themselves. This is particularly common when people imitate successful people: companies imitate other successful companies, or individuals imitate sports stars or movie stars and so on. It's one of the reasons why celebrities like that are so popular, because people want to imitate them because of the effect of vicarious reinforcement. When the movie star or the sports star experiences success, the people imitating them, or people who may be thinking of imitating them, or who are learning from them, are reinforced vicariously.

Starting point is 00:29:31 Vicarious punishment can occur also. It's one of the principles that underlies, for example, deterrence in the legal system. You punish people in order to deter others from committing the same crime. The trouble is it doesn't seem to work nearly as well as vicarious reinforcement. One theory for that is that people will only learn from or mimic the behavior of those they identify with, those they consider to be similar to themselves. And people tend not to identify with unsuccessful people or unlucky people, because most people tend to think that they're pretty good, pretty successful, pretty

Starting point is 00:30:03 competent, and those who are not successful or who get caught at something or who fail are of course not competent and therefore they're not perceived as good role models. So people don't associate with those who are punished and therefore the punishment, the vicarious punishment doesn't seem to be as successful as the vicarious reward or reinforcement. One final concept I want to talk about is self-efficacy. Self-efficacy basically refers to the belief that one is capable of performing something or of attaining a certain goal. The extent to which one believes in one's own capabilities is referred to as one's degree of self-efficacy.

Starting point is 00:30:34 Now, the concept of self-efficacy is relevant to observational learning because people will generally only attempt to imitate or learn from the behavior of others if they consider themselves capable of doing whatever it is that the other person's doing, and therefore if they have the requisite self-efficacy to achieve that task or to do whatever that is. This is why role models are so important,

Starting point is 00:30:53 as I sort of mentioned before, because even if people might be positively reinforced by what they're seeing another person doing or the results of another person's efforts or some activity that the other person is doing, if they don't perceive themselves as having the same self-efficacy as that other person or having the same ability, then they will not, or will be much less likely to, attempt that task. So role models, particularly for young people, are very important, and it's one reason why it's considered to be a very important goal to get more, say, women and other minority groups, racial and ethnic minority groups into higher positions of, say, political power or economic power

Starting point is 00:31:28 or positions in different organizations. Because without those role models for younger aspiring people in those fields, the self-efficacy is reduced. For example, young aspiring female physicists may be less likely to compare themselves or to feel similar to a male role model than a female role model in that role. Therefore, they feel that as they are a female, they have lower self-efficacy, because no past females have gone into that position or very few, and therefore they're less likely to actually attempt to get into that position, less likely to attempt to replicate the behaviours of becoming a physicist, as one example,

Starting point is 00:32:05 because females are very underrepresented in physics and engineering and many of the other physical sciences, and this may be one reason why. There's a lack of self-efficacy because of a lack of role models. So it's sort of self-reinforcing, in that women don't feel that they're as good at, say, physics or maths as men are, and therefore they don't pursue careers in those areas, and therefore there are few women in those areas, and therefore the cycle sort of repeats itself.

Starting point is 00:32:29 So that's an example of social learning, observational learning, and self-efficacy role-modelling in action. Okay, so that's about all I wanted to say about observational learning. I hope you enjoyed this episode. Please spread the word about the podcast by posting a review on iTunes or inviting a friend to listen. Thanks for listening, and I'll talk to you next time.
