KEY DIFFERENCES
“Unfortunately, making the shift towards positive reinforcement with horses isn’t as simple as adding a click and a treat after every release you would normally give. There are some key differences you should be aware of.” – Maddy
EFFECT ON MOTIVATION
In this post, I summarize 3 key application differences between positive and negative reinforcement that I learned while studying marine mammal training. But before we dive into the differences in application, I think it’s important to touch on the differences in the effect on motivation, or something called discretionary effort, to be exact.
Discretionary effort is the effort someone could give if they wanted to, above and beyond the minimum required. Using positive reinforcement creates a relaxed environment in which creativity can thrive. The effect is that animals, or staff in a workplace, who are positively reinforced for their efforts go out of their way to do more than is required of them. Any time the threat of an aversive (something the animal or person wants to avoid) is present, however, the animal does the bare minimum required to escape the pressure. That is a fundamental difference in the type of motivation produced by negative reinforcement versus positive reinforcement. Now let’s move into some differences in their application.
“#1 Ending the session can be punishment in positive reinforcement training and is a big reward in negative reinforcement training”
I explained this concept in detail in my last post. If you are training with positive reinforcement effectively, where the animal is engaged, and you end the session abruptly, you can accidentally apply negative punishment to the horse because you are taking away (subtracting, hence the term “negative”) something the horse wants, like taking a favorite toy away from a toddler. So the behavior that you end the session on can risk being damaged unless careful efforts are taken. For this reason, it’s best to give the horse a big reward for achieving a complex target behavior, and then cue them for some other easier, favorite behaviors before putting them away, with reinforcements offered in the putting away process. Another technique that will avoid accidental negative punishment from ending a fun session is making sure the horse has access to hay at all times, and ideally social contact with other horses. If you put your horse back into an empty stall with no enrichment, you are going to be much more likely to run into the issues of negative punishment and the creation of anxiety.
If you are using solely negative reinforcement, on the other hand, the horse is working to avoid something he doesn’t want (an aversive stimulus, aka pressure). In this case, ending the session altogether is the biggest reward you can give a horse. Cowboys would oftentimes use this tool in their training. They’d work the horse through some new or complex behavior, and as soon as the horse had a breakthrough, the cowboy would stop and take a smoke break while the horse got to rest and “soak it in” (hence I refer to such extended rewards as a cowboy smoke break). This method worked much faster, and without mindless drilling, because the horse was getting clear rewards. I used this method for a long time and it worked wonders, creating quiet, relaxed, and willing horses. But during my dolphin training experience, I began to question the underlying assumption. The cowboy smoke break was only effective assuming my horse did not want to work. As explained in my last post, the dolphin trainers would say, “if it’s reinforcing for you to leave and for behaviors to end, you’ve failed.” I wanted my horses to want to work and enjoy the behaviors, not just do them so that I’d take the saddle off and hang up my bridle.
#1 APPLICATION WITH AMIRA
I taught Amira this bridleless and bareback sliding stop by ending several rides the moment she stopped by using her hind end, which greatly sped up learning and avoided drilling.
#1 SUMMARY
- Ending a training session abruptly when using positive reinforcement is negative punishment.
- Ending a training session abruptly when using negative reinforcement is the biggest reward (release of pressure) you can offer the horse.
#2 Negative reinforcement requires control, positive reinforcement requires choice.
Here’s another key difference that may be hard to wrap your head around if you’re used to traditional pressure and release style training!
In negative reinforcement, you must maintain control of the behavior. This means that if you begin asking for a behavior you must stick with the escalation of pressure until the horse responds in some way. A good trainer will reward the horse’s smallest tries towards the correct behavior and be able to identify these in order to set the horse up for success and avoid frustration and fear. But either way, you must stick with the pressure until the horse shows a sign of responding correctly, otherwise you will accidentally reward the resistance. I call this an “accidental release.” For example, the horse pulls back, the lead rope snaps, and he gets a release of pressure, hence he is rewarded for pulling back. So even though a sensitive trainer will be able to read and reward a horse’s small efforts towards the desirable behavior, releasing pressure and giving up on a behavior before the horse answers correctly is really only damaging to the horse as he is rewarded for an incorrect behavior and becomes even more confused.
In positive reinforcement training, however, if the animal isn’t getting the behavior, no problem. The animal is given the choice to simply perform a different behavior or a smaller approximation with no aversive consequence. I was shocked and impressed with this new concept as I saw it appear throughout my time with the marine mammals. Of course, why hadn’t I thought of that before now! The trainers called this concept the “redirection technique.” When the animal becomes confused or withdrawn, the trainer asks for other, easier behaviors before returning to the original behavior, in order to maintain behavioral momentum and prevent behavior breakdown and aggression.
The first time I observed the redirection technique at work was with Cosmo, a red macaw (the place I studied at trained not just dolphins, but sea lions, exotic birds, dogs, and reptiles). The trainer asked him to fly to her (as he had previously done), but he just bobbed his head in anticipation, clearly a little nervous about performing the behavior. So the trainer instead asked him for a series of other behaviors: spin, target, and so on. Then she asked him to fly to her again. And he flew. I later learned why the redirection technique is successful. It lets go of control in the training by preventing the trainer from dwelling on a single behavior they want. If the animal isn’t offered other options to get reinforced, you can easily create behavior breakdown and aggression. And what I found even more interesting about why it works involves the term behavioral momentum.
Behavioral momentum is a strategy for keeping “momentum” flowing: you make requests that are easy for the animal before making requests that are more challenging. When confusion sets in, you return to a behavior they can easily perform to increase their confidence, then return to the behavior they struggled with, perhaps asking for it at a lower level (successive approximation). There was actually a study done with children taking tests that involved redirection and behavioral momentum. If the children were struggling on a particular question during the test, they were given 6 questions that built their confidence and then were presented with the challenging question again. Children who were taken through this process had a higher success rate and oftentimes were able to answer the challenging question correctly when it was presented to them again.
#2 SUMMARY
- Negative reinforcement training requires the trainer to maintain control of the behavior by continuing to apply an aversive (pressure) until the animal responds correctly; otherwise, the trainer accidentally rewards the resistance.
- Positive reinforcement training requires the trainer to provide the animal with choice by implementing the redirection technique and/or smaller approximations of the behavior in order to provide the animal with other opportunities to get reinforced.
Note: In both negative and positive reinforcement training, shaping, the process of breaking down a behavior into tiny steps, is absolutely key if an animal gets frustrated or withdrawn. This is such an important concept that it is Rule #3 in my 5 Golden Rules to Horse-Human Connection.
“#3 In negative reinforcement training, the Sd is introduced at the beginning of teaching the behavior; in positive reinforcement training, the Sd is introduced at ~80% mastery of the behavior”
A discriminative stimulus (Sd) is a learned signal for a specific conditioned (trained) behavior that discriminates one learned behavior from another. For example, you lean forward and wave your finger and the horse backs up.
The process of teaching an Sd using negative reinforcement looks something like this. You begin by facing the horse. You ask with the first step and continue through the sequence until the horse takes a step back:
- Lean forward and wave your finger (Sd)
- Wave your wrist back and forth, sending pressure into the line
- Wave your forearm back and forth, sending more pressure into the line
- Wave your entire arm back and forth, sending even more pressure into the line
This is a familiar “natural horsemanship” way of asking the horse to back up using an escalation of pressure. Notice that the Sd is introduced at the very beginning, before the horse really knows what it means. Through training, the horse learns to respond to step 1 before having to feel the pressure in the following steps.
In positive reinforcement, however, the Sd is not introduced until the behavior is approximately 80% mastered (depending on the trainer). This is known as “keeping the Sd clean.” Keeping the Sd clean is an effort to prevent “poisoning the cue,” which can happen if the Sd is introduced too soon in the process of mastering the behavior and the animal associates the Sd with:
- An undesirable behavior
- A lower approximation of the behavior
- Or the frustration it felt while trying to work out the behavior
This came into play when we were teaching Archie, a dog, to wipe his paws on the mat. We first taught him to paw by placing a small treat under a mat. As he approached the mat, we rewarded him for sniffing, then for a quick pawing movement with his right paw. We then reinforced him for alternating paws (right, then left) in more of a wiping motion. At this point, we decided to add the Sd, which would be the trainer wiping their feet. It turned out that we had introduced the Sd a little too early, because Archie then began wiping his paws at the very edge of the mat instead of in the center. We stopped using the Sd until he understood he would only be reinforced for wiping his paws in the center of the mat, so that he wouldn’t associate the Sd with wiping his paws off center or with any accompanying emotions as he worked towards figuring this out.
#3 SUMMARY
- In negative reinforcement training, the Sd is introduced at the very start of training a behavior.
- In positive reinforcement training, the Sd is not introduced until the behavior is roughly 80% mastered, to prevent poisoning the cue.