The Science of Everything Podcast - Episode 29: Operant Conditioning
Episode Date: January 25, 2012
Continuing on from episode 28 on Classical Conditioning, in this episode I discuss another form of learning called Operant Conditioning, including the concepts of reinforcers, punishers, shaping, and schedules of reinforcement. I then apply Operant Conditioning theory to understand phenomena such as animal training and how punishments can be effectively used. The episode concludes with a brief look at a third form of learning, Observational Learning.
Transcript
You're listening to The Science of Everything podcast, episode 29, operant conditioning, and I'm your host, James Fodor.
This is a follow-up to episode 28 on classical conditioning. In this episode, I'm going to now cover
operant conditioning, which is a different type of learning. And I'll also at the end talk a little bit
about observational learning, which is an even more recent discovery. But before we get to that,
we'll start with operant conditioning, which is also known as instrumental conditioning. This is
a more recent discovery. It's generally associated with the behaviorist or behavioralist school
of psychology, for example, B.F. Skinner in particular, who did a lot of work on this in the
early to mid-20th century. Operant conditioning is a form of learning, during which an individual
modifies its own behavior due to association of the behavior with some stimulus or some consequence.
So what I want to do first is clearly distinguish the difference between operant conditioning
and classical conditioning, because this can be a source of confusion.
Remember, classical conditioning requires the existence of some kind of innate reflex.
It basically causes an initially neutral stimulus to be conditioned so that it gives rise to some innate reflexive response.
Operant conditioning has nothing to do with reflexes.
It's about conscious or what we might call voluntary behavior, although the behaviorist school wouldn't actually like using those words,
but that's how we describe it.
It's not reflex, it's voluntary behavior.
So anything that involves a reflex or involuntary action is going to be classical conditioning.
Anything that involves some kind of decision is going to be operant conditioning,
at least for the most part.
Another difference is that operant conditioning basically involves reinforcers and punishments.
So operant conditioning is consequential.
It's about what happens after the act or the response occurs.
Classical conditioning is the opposite.
It's pretty much all about what happens before.
There's some stimulus which then gives rise to the response.
It's about the stimulus being a predictor for another stimulus, which
then gives rise to a response. Operant conditioning has nothing to do with stimuli as predictors.
It's all about the consequences. I want to do this, or I'm going to do this,
because I will get a good consequence, or in order to avoid this bad consequence.
So classical conditioning is sort of backward-looking at stimuli being predictors.
Operant conditioning is consequential looking at what's going to happen after the action
and anticipating that. Okay, so I said that operant conditioning was about reinforcers and
punishments. Now I need to talk about what those are. A reinforcer is defined as some event,
or stimulus that makes a following behavior or a succeeding behavior more likely to occur.
And the behavior that is being reinforced is called the target response.
A punishment is basically the opposite of a reinforcer.
It's some event or stimulus that makes a succeeding behavior less likely to occur or less likely to reoccur.
So basically, a reinforcer is something that the organism likes and therefore they want it again.
A punishment is something that the organism doesn't like and therefore they don't want it again.
Particularly with reinforcers, although I guess you could have
this with punishments as well, you can have primary or secondary reinforcers. A primary
reinforcer is basically some stimulus or event that does not require pairing with any other
reinforcer in order to act as a reinforcer. So basically they're sort of primal needs or things that
have a natural appeal to the organism, basically things that they've been evolutionarily
selected to be attracted to. They're basically the standard things like sex, water, food,
sleep, maybe shelter, stuff like that. Secondary reinforcers are most important for humans, but
can apply to animals as well. They're some situational stimulus that's been paired with a primary
reinforcer or another secondary reinforcer and therefore gains its function as a reinforcer as a
result of being a means to that primary reinforcer or indeed to another secondary reinforcer. So
money would be a good example of that. Or status or something. Well, I guess you maybe even could
call status a primary reinforcer because humans are quite social. But there are some things where
it's a little bit iffy whether it's a primary or secondary reinforcer. So don't worry about
that distinction too much, but just the basic concept is that some things are more basic than others,
and the ones that are less basic, that are sort of more instrumental to another reinforcer,
are referred to as secondary reinforcers, money being a good example.
Okay, but more important than that primary, secondary distinction is the positive and negative distinction.
Okay, so remember, we've got reinforcement, that's something good, makes the behavior more likely,
and punishment makes the behavior less likely.
Either way, by the way, whether you use punishment or reinforcement,
the behavior that we're talking about is called the target response.
Don't get confused there because the target response could be something that we want to happen
or could be something that we don't want to happen,
depending on whether we're using punishment or reinforcement.
So target doesn't mean we want it, it just means that we're sort of targeting on it,
we're interested in it.
It could be to minimize it or increase it.
There are two types of reinforcement, positive and negative.
There are two types of punishment, positive and negative punishment.
So these words are a bit confusing, so we're going to go through them.
Basically, in this context, positive means that you introduce it.
So if it's positive reinforcement, it means
you give them something good, like it's a reward, it's money, it's food, or something like that.
Negative means taking something away. So negative reinforcement would be taking away an
aversive stimulus. So reinforcement, the second word, implies that it's a good thing, but negative
means it's being taken away. So that doesn't mean we're taking away a good thing because
it's still a reinforcement, it's still a good thing. It means the way we're getting the good thing
is by taking away. So clearly, if it's taking away, but it's still a good thing, it must be
taking away a bad thing. So that's what negative reinforcement is. It's taking away something the
organism doesn't like. So it could be removal of pain or removal of loud noise. That would be a
reinforcer. So then there are also positive and negative punishments. Remember, a punishment is
just something that the organism doesn't like, which makes the behavior less likely to occur. Positive
punishment means that you introduce (positive) a bad thing (punishment). So you introduce a bad thing.
That could be introducing the loud noise or the electric shock or whatever. Negative punishment
is taking away, negative. But punishment means a bad
thing. So you're taking away something such that the organism doesn't like it. So that obviously means
you're taking away something that's good. So this could be, for example, confiscating a child's toy
following undesired behavior or something like that. So this is really confusing. A positive
punishment doesn't mean it's a good thing and negative reinforcement doesn't mean it's a bad thing.
Positive and negative just indicate whether something's being added or taken away.
Probably the best way of remembering it is to remember that the second word is sort of more
fundamental. That determines whether it's good or bad, whether it's a reinforcement or whether it's a
punishment. The first word, the adjective, the positive or negative, doesn't determine whether
it's good or bad. The positive or negative just determines whether something's being added or taken
away to achieve that purpose. So I emphasize this because it's the terms that are used in
discussing operant conditioning and they're often confused, so it's important to get them clear.
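To make the four combinations concrete, here is a minimal sketch in Python. The names Change and classify are hypothetical, made up purely for illustration. It just encodes the two questions described above: was something added or taken away, and is it something the organism likes?

```python
from enum import Enum


class Change(Enum):
    """First word: was a stimulus introduced or taken away?"""
    ADDED = "positive"
    REMOVED = "negative"


def classify(change: Change, organism_likes_stimulus: bool) -> str:
    """Name the consequence from the two questions discussed in the episode.

    Illustrative helper only; the function name and signature are assumptions.
    """
    # Adding something liked, or removing something disliked, is a good
    # outcome for the organism, so it is a reinforcement. The other two
    # combinations are bad outcomes, so they are punishments.
    good_outcome = (change is Change.ADDED) == organism_likes_stimulus
    kind = "reinforcement" if good_outcome else "punishment"
    return f"{change.value} {kind}"


print(classify(Change.ADDED, True))     # giving food
print(classify(Change.REMOVED, False))  # switching off a loud noise
print(classify(Change.ADDED, False))    # delivering an electric shock
print(classify(Change.REMOVED, True))   # confiscating a toy
```

The second word falls out of whether the outcome is good or bad for the organism, and the first word only records whether a stimulus was added or removed.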
Okay, so that's the idea of reinforcers and punishers. Basically, operant conditioning is all about
getting organisms or animals, humans, whatever, to do certain behaviors and not
to do other behaviours by providing reinforcers and punishers of various sorts in order to elicit
the desired behaviour. I need to talk about the concept of a Skinner box quickly, because this is
sort of how the concept of operant conditioning was originally demonstrated and many of the theories
were developed. A Skinner box is also called an operant conditioning chamber. It's just a laboratory
apparatus used in experimental analyses of animal behaviour. It's usually some sort of sealed
or relatively sealed container, which allows manipulation of the environment.
Usually it'll permit provision of punishments or reinforcers and some action that you want the animal to perform.
It could be turning a lever or pushing a button or something like that.
They could be more or less complicated.
They might have lights on them to provide various signals or to act as stimuli or sounds or whatever.
Rats and mice are common subjects of choice,
but you could do this basic thing with any type of animal.
Having discussed the basic concepts, I now want to talk about some of the more
specific aspects of operant conditioning including shaping and backward chaining
and also avoidance conditioning and the different schedules of reinforcement,
which have some very interesting applications. So first of all, we'll talk about shaping.
Shaping is a method of successive approximations. Well, actually shaping and backward
chaining go together because they're both mechanisms of how you actually teach something.
Because remember, operant conditioning just provides
a consequence, good or bad, that tries to shape behaviour.
The trouble is if you're working with animals, arguably even some humans, but definitely
when you're working with animals, you can't tell them what you want them to do.
You can't say, well, run around this object three times, then walk over to this side of the cage,
sniff here once, and then go over and press this red button, and then you'll get your
reward.
That's too complicated.
Nor can you really wait until the animal spontaneously does that entire thing, because it's
not really going to happen.
What you need to do is gradually approximate
what you want the animal to achieve. So you do it bit by bit in a sense,
and there are two aspects of that. They're shaping and backward chaining. So shaping is a method
of successive approximations whereby you get the animal to achieve the behavior that you want
by rewarding or reinforcing successive approximations of what you want. So for example, it may be that
you want the animal, say it's a mouse, you want the mouse to press a button or pull a lever
with a particular paw.
So first of all, you just reinforce the rat
looking in the direction of the lever.
Then you reinforce them walking towards the lever.
Then you stop reinforcing them walking towards the lever,
and now they have to actually touch the lever to be reinforced.
Once you get that to be conditioned,
you require them to touch it with the right paw.
And then finally, you stop reinforcing that
and only reinforce if they pull down the lever.
So basically, you take it in small, tiny steps
such that there's a reasonably good chance
that the animal will do the next step spontaneously.
You know, if there are only four sides of a box and you reinforce the mouse looking in the direction of one of those,
then there's a pretty good chance it's going to just spontaneously look at one of those. As soon as it does that, you reinforce.
And then you keep doing that until it's learned that it has to look at that side of the box to be reinforced.
Then once again, once it knows a direction to look at, there's a pretty good chance that eventually it's going to walk in that direction.
So once it does, you reinforce that, and so on. Each little step is sufficiently close to the previous one
that the behavior that you want will happen by chance across that sort of small gap,
and you reinforce that.
And so you gradually shape by approximating the final behavior that you want.
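The trainer's side of shaping can be sketched as a simple rule. This is only an illustration under assumptions of my own: the STEPS list mirrors the lever example, each observed behaviour is coded by its position in that list, and the criterion is raised after a hypothetical successes_to_advance reinforced responses. It doesn't model the animal at all, just when the trainer reinforces.

```python
# Successive approximations from the lever example, easiest first.
STEPS = [
    "look toward the lever",
    "walk toward the lever",
    "touch the lever",
    "touch the lever with the right paw",
    "pull the lever down",
]


def shape(observed, successes_to_advance=2):
    """Decide which observed behaviours get reinforced.

    observed: indices into STEPS, one per behaviour the animal shows.
    Returns (list of reinforced flags, index of the final criterion).
    Hypothetical helper for illustration only.
    """
    criterion = 0   # the approximation currently required
    successes = 0   # reinforced responses since the criterion last moved
    reinforced = []
    for step_index in observed:
        # Only behaviour at least as close to the goal as the current
        # criterion is reinforced; earlier approximations no longer count.
        ok = step_index >= criterion
        reinforced.append(ok)
        if ok:
            successes += 1
            if successes == successes_to_advance and criterion < len(STEPS) - 1:
                criterion += 1   # raise the bar by one small step
                successes = 0
    return reinforced, criterion


flags, final = shape([0, 0, 0, 1, 1, 2])
print(flags, "-> now requiring:", STEPS[final])
```

Notice that the third response, just looking toward the lever again, is no longer reinforced once the criterion has moved on to walking toward it.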
Now, backward chaining is sort of a...
It's kind of similar to shaping, but it's the next step up.
It's used for training or teaching complicated steps of movements,
or complex sequences of behaviors.
Basically, what happens is that you start with the final behavior that you want.
So say it's a mouse pulling a lever.
But you want it to do some things
before that. You want it to climb a ladder to get to the lever, but before that you want it
to jump in some water, but before that you want it to find its way around a maze. What you
have to do, in the backward chaining method, is first of all you condition it, perhaps using
shaping, to pull the lever. So that behavior, the target behavior, is
pulling the lever. And the rat knows (I called it a mouse before, but call it a rat), the rat knows
that pulling the lever, that behavior, the target behavior, is associated with a reward, because
it's been reinforced for that. So now what you can do is use that opportunity to pull the lever,
that target behavior, as a reinforcer in itself for some other action. So, for example, if the rat
knows that pulling the lever will yield reinforcement, food is often used, then it will be willing,
in a sense to climb a ladder in order to have the opportunity to pull the lever, because then
that will get food. So having the opportunity to pull the lever acts as a reinforcement in itself. This would be
an example of a secondary reinforcer because pulling the lever itself is not what the rat wants.
The rat wants the food that it gets from pulling the lever, but it's been conditioned,
through instrumental conditioning, to know that pulling the lever will get the food.
So now you can use pulling the lever, the opportunity to do that, as a secondary
reinforcer.
And so say now that you've conditioned it using shaping to climb the ladder, because maybe
it only first touched the ladder and then climbed a bit up and then halfway up and you had to
condition it, shape it to climb all the way up the ladder.
Now you've got it climbing up the ladder and pulling the lever,
but now you want it to jump in the water first.
So once again, you use the opportunity to climb the ladder
as a reinforcer for the target behavior,
which is now jumping in the water.
And once again, you can get the jumping in the water
to occur through shaping.
And so you build backwards like this,
using each successive behavior that the animal has learnt
as a reinforcer for the one that comes before that.
And so basically, you can teach the rat
to do the entire process because it'll run through the maze
because it knows that once it does that, it will have the opportunity to jump in the water,
which it wants to do because once it does that, it gets the opportunity
to climb the ladder, and once it climbs the ladder, it has the opportunity to pull the lever,
which it then wants to do because it can get the primary reinforcement, which is the food.
This is the only reliable way that's been shown of training animals to do arbitrarily complex
procedures or behaviors.
And whenever you see animals in films or in the circus or wherever playing with balls
or really doing anything that's not a natural behavior, any complicated behavior like that,
they've been trained to do it through operant conditioning, namely backward chaining and shaping.
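The order of training in backward chaining can be written down as a small sketch. The function name backward_chaining_plan is hypothetical, and the behaviours are just the maze, water, ladder and lever example from above. The point it illustrates is that you train the last behaviour first, and each learned behaviour then serves as the reinforcer for the one before it.

```python
def backward_chaining_plan(sequence, primary_reinforcer):
    """Return (behaviour, reinforcer) pairs in the order they are trained.

    sequence: behaviours in the order the animal will finally perform them.
    Illustrative helper only, not an established API.
    """
    plan = []
    reinforcer = primary_reinforcer
    # Work from the final behaviour back toward the first one.
    for behaviour in reversed(sequence):
        plan.append((behaviour, reinforcer))
        # Once learned, the chance to do this behaviour becomes a
        # secondary reinforcer for whatever has to come before it.
        reinforcer = f"opportunity to {behaviour}"
    return plan


steps = ["run the maze", "jump in the water", "climb the ladder", "pull the lever"]
for behaviour, reinforcer in backward_chaining_plan(steps, "food"):
    print(f"train '{behaviour}', reinforced by: {reinforcer}")
```

Each individual behaviour in the plan would still be taught by shaping; the plan only fixes the order and what reinforces what.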
So that's why it's important to have an understanding of how these things work, because they're
actually used quite a lot. Now I want to talk about avoidance conditioning, which is a very important
and at the same time quite disturbing aspect of learning or of operant conditioning. So before we
discuss that, I need to talk about escape conditioning. Escape conditioning occurs when an animal
learns to perform some action to terminate an aversive stimulus. It's basically like a get-me-out-of-here,
shut-this-off reaction. You know, if the animal is shocked or
experiences a loud noise that it doesn't like, it learns to escape that by running away
or whatever. That's important because you can convert escape conditioning into an alternative
type of conditioning called avoidance conditioning by providing a signal before the aversive
stimulus begins. So for example, suppose the aversive stimulus is an electric shock and you provide
a noise, not a really loud one, just a noise that the animal would notice before the electric
shock. The animal now has a signal that's a reliable indicator of the electric shock.
By the way, that could actually lead to classical conditioning, if those are paired enough
times, but it could also lead to operant conditioning because what the animal might start to do
is run off or move into a different spot where it won't experience the electric shock,
even before the electric shock occurs merely upon hearing the signal that indicates that it will
occur. Now, the sort of good and bad thing, or the interesting thing about avoidance
conditioning is that the target response provides its own reinforcement. Basically, the target
response is the same as its reinforcement. So in the case of an animal avoiding an
electric shock, the target response would be moving away from the electric shock or moving to a
different location in the cage, say. But the reinforcer is also moving away to another spot in the
cage. Or more specifically, the reinforcer is not experiencing the electric shock. But that will occur
if the animal moves into that other part of the cage.
The key thing is that the animal will experience that relief
or will be reinforced regardless of whether or not
the aversive stimulus actually occurs
or actually would have occurred.
So for example, if you pair the noise, the signal
with then the electric shock for a while
and the animal learns how to escape the shock
by moving away as soon as it hears a signal,
then you can set off a signal
and not even bother with the shock anymore
and the animal will keep running away from the
signal because the reinforcement is avoiding that aversive stimulus, but the reinforcement occurs
regardless of whether or not the target action actually contributed to avoiding the
aversive stimulus. So avoidance conditioning or avoidance behaviors are self-reinforcing
and therefore are incredibly persistent. It's very hard for them to be eradicated. They've done
experiments whereby basically animals will continue to respond hundreds and hundreds of times,
even after the shock generator or whatever the aversive stimulus is is never used again.
They just continue to respond hundreds of times in order to avoid essentially an aversive stimulus,
which is no longer even present.
So this is a very important application of operant conditioning or learning more generally,
because this is common in humans too.
Think about how many things that you or someone you know perhaps don't do,
as a result of avoidance conditioning.
You don't do a certain behavior because last time you did that,
something bad happened, and therefore you're avoiding that situation,
whether it be a person or an activity or a place or whatever,
as a result of avoidance conditioning.
Now, think to yourself how likely it is that whatever it was
that actually caused your initial negative experience
is actually still going to occur now.
For example, if it was a person you haven't seen in 10 years,
how do you know that they're still going to be unpleasant to you, for example?
Knowing avoidance conditioning and knowing how that works, how you can have a behaviour
that basically is self-reinforcing can, in a sense, empower you, I think, into changing your
behaviour or at least understanding it better.
Now I want to talk about schedules of reinforcement, because this is a very important aspect
of operant conditioning. Just as the timing and presentation and intensity of the
conditioned stimulus and unconditioned stimulus in classical conditioning determined the extent
of the conditioning and how easy it was to do, similarly, the timing and rate of reinforcement
determine the success and intensity of operant conditioning, and the different ways or time
schedules of presenting the reinforcements are known as the schedules of reinforcement. So it's sort of like
the rule or the program you're following to determine when you reinforce. Now continuous reinforcement
just means you reinforce every occurrence of the desired response. Fixed ratio reinforcement means that
you deliver reinforcement after every nth response.
So, for example, you could deliver reinforcement
every 10th response, after every second response,
or after every 100th response, or whatever.
Variable ratio reinforcement means that
you reinforce after a certain number of trials,
or after a certain number of responses,
but the actual number varies.
It's random with some average, so there's some distribution.
So that's when you have a ratio.
The reinforcement is determined by the number of responses
that have occurred,
which could be a fixed ratio,
or a variable ratio. The other main one is the interval method where you have a fixed amount of time
that has to pass, and you're only reinforced, for example, for the first successful response that occurs
at least two minutes after the last reinforcement, or at least 30 seconds after the last
reinforcement. With variable interval, of course, you have an interval, but the interval changes.
Fixed or variable ratio and fixed or variable interval are very different modes of reinforcement,
and they produce quite different behaviors. But before I talk about that, I'll just note that
extinction, similar to in classical conditioning, refers to a period during which the target response is never reinforced.
Just as, if you stop pairing the unconditioned and conditioned stimulus, the classical conditioning will become extinct or be extinguished,
extinction will occur in operant conditioning if the target response is never reinforced.
That's why, to backtrack for a second, avoidance conditioning is so hard to extinguish because the target response is always reinforced,
regardless of what the initial aversive stimulus is actually doing.
Even if it's gone, avoiding will still be reinforced by not experiencing the
aversive stimulus.
Now, as I was saying, ratio reinforcements and interval reinforcements and also continuous
reinforcements are the three sort of main categories of schedules, and they result in
very different types of conditioning, very different behaviours.
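Since a schedule is really just a rule for deciding whether a given response gets reinforced, the main ones can be sketched as small Python functions. Everything here is an illustrative assumption on my part: the function names, the idea of passing in the time of each response, the use of an independent 1-in-n chance for the variable ratio, and exponentially distributed waits for the variable interval. Intervals are timed from the last reinforcement.

```python
import random


def continuous():
    """Every response is reinforced."""
    return lambda now: True


def fixed_ratio(n):
    """Reinforce every nth response."""
    count = 0

    def respond(now):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return respond


def variable_ratio(mean_n, rng):
    """Reinforce on average once per mean_n responses, at random.

    Assumption: each response independently has a 1/mean_n chance.
    """
    return lambda now: rng.random() < 1.0 / mean_n


def fixed_interval(seconds):
    """Reinforce the first response at least `seconds` after the last reinforcement."""
    last = 0.0

    def respond(now):
        nonlocal last
        if now - last >= seconds:
            last = now
            return True
        return False
    return respond


def variable_interval(mean_seconds, rng):
    """Like fixed_interval, but the required wait is redrawn at random each time.

    Assumption: waits are exponentially distributed around mean_seconds.
    """
    last = 0.0
    wait = rng.expovariate(1.0 / mean_seconds)

    def respond(now):
        nonlocal last, wait
        if now - last >= wait:
            last = now
            wait = rng.expovariate(1.0 / mean_seconds)
            return True
        return False
    return respond


# One response per second for ten minutes on a two-minute fixed interval:
fi = fixed_interval(120)
print(sum(fi(t) for t in range(1, 601)), "reinforcements for 600 responses")
```

In the two ratio rules the time argument is ignored entirely, and in the two interval rules the number of responses is ignored entirely, which is the distinction that drives the different behaviours.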
Ratio-based schedules both evoke much more rapid responses than interval-based
schedules, which makes sense because if you get reinforced depending on how many
responses you make, it's in your interest to make as many responses as possible, as quickly
as possible. Whereas interval-based
schedules evoke far fewer responses because it's not about how many responses you
make, it's about how much time has passed since the last one. Fixed interval schedules evoke the
fewest responses of all because, say you've got a two-minute interval between reinforcements,
there's no point at all making any responses until you're right at the end of each interval,
because none of them will be reinforced until the time interval's ticked over again.
And so that's indeed what we find in animals.
So this doesn't require intelligent behavior.
This is just all about learning.
You find that you get very few responses during an interval,
and then a very rapid peak of responses, just as the interval is approximately about to end,
when you have a fixed interval. When you have a variable interval schedule, you get more constant
responses because the animal doesn't know exactly when the next interval will tick over,
but you still have less activity, basically because
the amount of reinforcement you get doesn't depend on how many responses you make. It's still
time-based, so you tend to get less response there. Ratio-based schedules, as I said, elicit many more
responses. The best ones, in terms of the number of responses they elicit, are variable
ratio schedules, because fixed ratio schedules provoke or produce rest periods after each
reinforcement. And rest periods become longer and more likely as the number of target responses
that have to be made before a reinforcement increases. So, for example, if you're only reinforced
after every hundredth push-up or whatever, whatever you're doing, or every hundredth push of the
button. Likely what you'll do is you'll make 100 and then you'll get the reinforcement and then
you'll sort of slack off a bit or wait a bit. All right, it's going to be a while until I get
my next reinforcement, so I'm just going to have a bit of a rest. That seems to be what's going on
in fixed ratio schedules. In variable ratio schedules though, it's basically random when
you get reinforcement. You could get reinforcement immediately, two trials in a row, three trials in a row,
or you could go a thousand trials without any reinforcement. It's random. But the key thing
is it has nothing to do with time. So still, the more
responses you make, the more reinforcements you'll get, but you don't know
exactly when you'll get them. So they tend to provoke the most responses, basically
because you get that motivation to produce many responses, but you don't get the rest
period, because even after making 100 trials, it still could be that the next response
generates reinforcement, whereas with a fixed ratio, that won't happen. Continuous
reinforcement also elicits a very strong response, obviously, because
every trial is reinforced. The trouble with continuous reinforcement, though, is that extinction
occurs very quickly. Because if every target response is reinforced and then, you know, they stop
being reinforced, after only a few trials, the animal will figure out this isn't working anymore,
and they'll stop responding. The other thing is that whatever the reinforcement is
can become ineffective if, in a sense, the animal or human is satiated with it, which can occur
if you have continuous reinforcement, so you're providing lots of it. Whereas with variable ratio
schedules, you don't actually have to provide that much reinforcement. It's just that
they still provoke very high levels of response, even if the actual overall rate of reinforcement is
low, as long as, first of all, the amount of reinforcement they get depends on the number of
responses that they make. And second of all, any given response could elicit reinforcement.
If both of those conditions are met, as they always are in variable ratio schedules,
then variable ratio schedules will elicit very high levels of response. And the other thing is
that variable ratio and indeed any of the other types of schedules apart from continuous reinforcement
are more resistant to extinction than continuous reinforcement because you can't tell in these other
conditions, especially a variable ratio condition, whether they've stopped reinforcing
or whether it's just not time yet or you haven't made enough responses yet. So variable
ratio is sort of the optimal in that it's hard to extinguish and produces a very rapid,
consistent rate of response. And it's no accident that a variable ratio schedule is
exactly the type of reinforcement that is used in casinos, in slot machines and that sort of thing,
because, especially slot machines, but most types of gambling really use this method of reinforcement
because it's based on a ratio, so it's based on how many times you play, not intervals of time.
So the more times you pull the lever down on the slot machine, the larger the absolute number of wins
you'll have. Not a larger proportion of money that you win, but a larger absolute number of reinforcements
you'll have. But it's variable ratio in the sense that it's random as to
which pull yields a reinforcement, because any pull of the lever or any individual act of gambling
can produce a reinforcement.
So that's why gambling is addicting for many people, or for a large number of people,
at least, because it's hard to extinguish, and it is really designed to reinforce us
in such a way that we want to keep doing it.
Because there's always that feeling, well, only one more response and I can get reinforced.
Okay, so just one final thing I want to talk about very briefly is a third type of learning, observational learning.
Don't worry, there's not too much to say about this one because there's a lot less research on it than classical or operant conditioning,
or at least the type of research that's done on it often is categorized differently, so I'm not going to talk about it now.
Observational learning, though, it's also called social learning or vicarious learning,
is a type of learning that occurs as a function of observing what someone else does and then replicating that behavior.
It's not the same as mimicry.
It has to actually involve some sort of degree of understanding or comprehension of what the other person is doing.
The other thing is that it's been shown that you don't actually have to perform the action itself
or even be directly reinforced in order to learn it.
That's why observational learning is not the same as operant conditioning,
because you don't have to be reinforced to do it necessarily, certainly not directly reinforced.
Different species, most especially humans,
but it can occur to some extent in other species,
will just spontaneously copy, observe, remember, and later on reproduce behaviors that they see.
The classic experiment which demonstrated this was by Albert Bandura in the 1960s,
who conducted his Bobo doll experiment.
Basically, he presented children with a Bobo doll.
That's like an inflated doll that you can play around with.
That's weighted at the bottom, so it sort of rocks from side to side and so on.
It sort of looks like a clown.
A clown in the shape of a bowling pin, if you can sort of imagine that.
Might want to look it up.
He presented one group of children with a video of an adult playing nicely,
just in a fairly subdued way with the Bobo doll,
and a second group, he exposed them to a video of the adult playing,
same adult, playing in a violent way, hitting the doll and so on, kicking it and whatever.
And indeed, he found that the children who were exposed to the violent video
behaved in a much more violent way towards the doll.
They played with it in a more violent way,
substantially more violent than the children who had observed the other video.
And they weren't instructed to do this.
It was purely that they observed adults doing it, and they copied that behavior.
Nor were they reinforced in doing this.
This was purely their choice.
Neither group got any greater benefit or any reward from behaving in one way or the other.
It was purely that the children observed the behavior,
and then sometime later on, not immediately, but later on when they were allowed to play with the doll,
they reproduced that behavior.
So this is a case of observational learning.
Observational learning is also made greatly easy by the use of language
where you can effectively tell people what to do
or describe to them what you want to do and they take that in and then reproduce it later on.
It does occur in non-human and indeed non-primate species in some special situations like,
for example, adolescent or children of various animals learning to hunt or build a nest or something
like this from their parents. Perhaps even some degree of language skills and a few other
activities have been shown to be passed on by observational learning in primates. So there's a limited
amount of observational learning in animals, but by far this is most important in humans.
And it's greatly facilitated, as I said, by the use of language, which is a uniquely
human characteristic. And observational learning is how you get the most complex and most long-lasting,
prolonged behaviours, obviously, because you can demonstrate in much more precise detail how
you want things done and explain why it's done. I just wanted to discuss a couple of other aspects
of observational learning. One important thing to understand is the concept of vicarious reinforcement
and vicarious punishment. This refers to the fact that individuals will shape their behavior in response
to reinforcers that are received by others rather than themselves, or punishments received
by others as opposed to themselves.
This is particularly common when people imitate successful people, companies imitate
other successful companies, or individuals imitate sports stars or movie stars and so on.
It's one of the reasons why celebrities like that are so popular, because people want to
imitate them because of the effect of vicarious reinforcement.
When the movie star or the sports star experiences success, the people imitating them, or
people who may be thinking of imitating them, or who are learning from them,
are reinforced vicariously.
Vicarious punishment can occur also.
It's one of the principles that underlies, for example, deterrence in the legal system.
You punish people in order to deter others from committing the same crime.
The trouble is it doesn't seem to work nearly as well as vicarious reinforcement.
One theory for that is that people will only learn from or mimic the behavior of those they identify with,
those they consider to be similar to themselves.
And people tend not to identify with unsuccessful people or unlucky people,
because most people tend to think that they're pretty good, pretty successful, pretty
competent, and those who are not successful or who get caught at something or who fail are
of course not competent and therefore they're not perceived as good role models. So people
don't associate with those who are punished and therefore the punishment, the vicarious
punishment doesn't seem to be as successful as the vicarious reward or reinforcement.
One final concept I want to talk about is self-efficacy. Self-efficacy basically refers to
the belief that one is capable of performing something or of attaining a certain goal.
The degree to which one believes in one's own capabilities is
referred to as one's degree of self-efficacy.
Now, the concept of self-efficacy is relevant to observational learning
because people will generally only attempt to imitate
or learn from the behavior of others
if they consider themselves capable of doing whatever it is
that the other person's doing,
and therefore if they have the requisite self-efficacy
to achieve that task or to do whatever that is.
This is why role models are so important,
as I sort of mentioned before,
because even if people might be positively reinforced by what they're seeing
another person doing or the results of another person's efforts or some activity that the other
person is doing, if they don't perceive themselves as having the same self-efficacy as that other
person or having the same ability, then they will not, or are much less likely to, attempt that
task. So role models, particularly for young people, are very important, and it's one reason why
it's considered to be a very important goal to get more, say, women and other minority groups,
racial and ethnic minority groups into higher positions of, say, political power or economic power
or positions in different organizations.
Because without those role models for younger aspiring people in those fields, then the self-efficacy
is reduced because people don't, that people are less likely perhaps to, for example,
young aspiring female physicists may be less likely to compare themselves or to feel similar
to a male role model than a female role model in that role.
Therefore, they feel that as they are a female, they have lower self-efficacy, because no past females have gone into that position or very few,
and therefore they're less likely to actually attempt to get into that position,
less likely to attempt to replicate the behaviours of becoming a physicist, as one example,
because females are very underrepresented in physics and engineering and many of the other physical sciences,
and this may be one reason why.
There's a lack of self-efficacy because of a lack of role models.
So it's sort of self-reinforcing, in that women don't feel that they are able to,
that they're as good at, say, physics or maths as men do,
and therefore they don't pursue careers in those areas,
and therefore there are few women in those areas,
and therefore the cycle sort of repeats itself.
So that's an example of social learning, observational learning,
and self-efficacy, role-modelling in action.
Okay, so that's about all I wanted to say about observational learning.
I hope you enjoyed this episode.
Please spread the word about the podcast
by posting a review on iTunes or inviting a friend to listen.
Thanks for listening, and I'll talk to you next time.
