One of the mysteries of dog training seems to be how to get a reliable level of off-leash control in your police dog. As a veteran seminar speaker, I have trained all around the world with police K9 handlers and trainers who struggle with this mystery. At conferences, handlers and trainers are bombarded with PowerPoint slides of operant conditioning theory and classical conditioning theory, techniques for free-shaping behaviors, and lure-reward systems. If you don’t know what that means, you need to run, not walk, to a good book or article on canine learning theory (free on tarheelcanine.com in our media area), read it, and own the material. You cannot be a K9 handler in this day and age and not understand the four quadrants of operant conditioning, understand that this is learning by dogs experiencing consequences, and be able to define those quadrants. You must be comfortable with examples of how operant conditioning is used to train your police dog. The same goes for classical conditioning: you need at least an understanding of how learning by association works in your dog’s training. You must understand reward ratios, the effects of compulsion, and how compulsion is most effective in a coherent training program. But this article is not about learning theory. It is about how to approach the holy grail of good, consistent, reliable off-leash control in your dog. It is about your mindset on how to create it in your training. With a scientifically incorrect mindset, you will not be successful.

If you find yourself constantly repeating commands, nagging your dog with leash corrections, raising your voice to get through to your dog, or struggling with heeling, outs, and recalls in certification, you are probably in the group that needs to radically rethink your training. Recently at a seminar, when I saw a number of the dogs struggle with verbal outs, I told my class: “Imagine that I took my competition dog out and worked him in a bite scenario, and he engaged the decoy (in a bite suit), and the decoy put pressure on him with a drive or stick hits or rolled around with him on the ground. If, when the decoy got up and I said ‘out,’ my dog refused to release the decoy on the first command, I would be completely mortified. It just never happens. It never happens in training or trials.” This is a fact. The same goes for heeling. If I gave my dog a heel command, regardless of where I was, on a city sidewalk or on a soccer field, and he failed to obey, I would be stunned. My dogs bite civilly, do muzzle work, and are serious, powerful, and high drive. Anyone would like them for a police dog. I don’t say that to boast. They are very strong dogs.

But despite their strength and power and sometimes willfulness, I have excellent off-leash control. That control isn’t because I use e-collars in my training, though I do. It isn’t because I use toy rewards in my training, though I do. It is because my sole focus when training any behavior in my dogs is to create a habit so strong that it is difficult, if not impossible, to shake the foundation of that behavior. If my dogs fail to perform a behavior consistently and under distraction, it is my fault for not creating a habit powerful enough to be robust to those distractions and environments. I have seen dogs wearing e-collars that are so collar wise that when the e-collar comes off, the dogs fail to perform. I have seen dogs that are frequently rewarded fail to perform when the dog realizes that the reward is not coming when he expects it. Therefore, it is not the tool of reward or the tool of compulsion that results in off-leash control. It is whether the trainer/handler can create habit by using those tools and having an understanding of how they play into training theory and the science of learning. Human beings are bad at habits. We are inconsistent with ourselves oftentimes, and thus we become inconsistent with training and are poor at creating habits in our dogs. Consistency takes effort, knowledge, and discipline. When people watch me train, they often ask, “Yeah, but what happens when the e-collar comes off, and the toy is not given? What will happen then?” That is the goal of my training, of course! If you watch me trial where neither one of these things is allowed, generally my dogs perform at a high level. Therefore, I would never abandon my training progression to “see how my dog will do” until he is ready to perform out of habit. 

People actually think their dogs only perform because they are afraid of getting a correction. They try to trick their dogs into thinking they have an e-collar on for certification by putting on a really tight choke collar. They glue pebbles to the inside of flat collars so the dogs think they are feeling the contact points of an e-collar. They refuse to use a pinch collar or an e-collar in training because only choke collars can be worn in a certification. As a result, they use a correction device that for most dogs cannot deliver the positive punishment needed to dissuade a dog from a poor choice in training. The dogs then do not perform well in training day after day, and inconsistent habits are created because handlers think they have to trick the dog into performing in certification.

I often have the following discussion with people: Imagine you are in the house with a pet dog that has lived with you for a few years, and you ask the dog, “Wanna go outside?” What does the dog generally do? He runs to the back door. No e-collar, no jute tug, no treat — he just goes. He goes because it is a habit. It is always where he goes to go outside. Does he ever, upon hearing those words, jump up and run to the bathroom, get up on the toilet, and stare at the tiny window above the commode? No, he doesn’t, because that is never the route to getting outside. He wouldn’t think of doing anything besides going to the back door and being ready to shoot into the back yard as soon as you open the door. 

Creating habits, therefore, is your job as a K9 handler or trainer. You create habits through repetition of behaviors at the quality level you wish them to be performed. If your dog performs a command inconsistently, and you allow that inconsistency to continue, the inconsistent performance of the command, rather than the consistent correct performance, becomes the habit. Your job as a trainer/handler is to ensure the dog consistently performs correctly, and the performance is either rewarded or is in itself self-rewarding. (See my article in Working Dog Magazine, “Drive Neutrality for Police Dogs: Keeping Your K9 Composed and Focused,” March/April 2017, for examples of intermediate rewards, using the Premack principle in obedience, and methods for training around competing motivations.) Getting to go outside in the back yard is rewarding for the pet because he is allowed to smell the odors of nature, run around a bigger space in the fresh air, roll around in the grass, and most of all, feel the relief of emptying his bladder. It’s self-rewarding. So every repetition of the cue “Wanna go outside?” is rewarded, and as well, noncompliance is negatively punished. I said you will want to know the four quadrants of operant conditioning. Negative punishment is the withholding of a desirable consequence to decrease the likelihood of a behavior. So imagine you said to your dog early in his life with you, “Wanna go outside?” and he did zoomies around the house out of excitement. What did you do? Wait at the door for him to come to it. You withheld outside until he came to the door. Over time, that whittled the behaviors down to going immediately to the door. You probably, out of fun, proofed that behavior. Maybe when you said the key phrase, you went the opposite direction, and after a minute, the dog came looking for you and then ran back to the back door. Then you went there and opened it, reinforcing his choice. 
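For readers who like to see the four quadrants laid out mechanically, here is a minimal sketch in Python. It is only an illustration of the standard definitions, not a training tool: the names `Quadrant` and `classify` are made up for this example, and the two yes/no questions (was a consequence added or taken away, and does the behavior become more or less likely) are the usual textbook way of sorting the quadrants.

```python
from enum import Enum


class Quadrant(Enum):
    """The four quadrants of operant conditioning (standard textbook names)."""
    POSITIVE_REINFORCEMENT = "add something desirable; behavior becomes more likely"
    NEGATIVE_REINFORCEMENT = "remove something unpleasant; behavior becomes more likely"
    POSITIVE_PUNISHMENT = "add something unpleasant; behavior becomes less likely"
    NEGATIVE_PUNISHMENT = "withhold something desirable; behavior becomes less likely"


def classify(consequence_added: bool, behavior_more_likely: bool) -> Quadrant:
    """Sort a consequence into a quadrant (hypothetical helper for illustration).

    consequence_added: True if something was added, False if it was removed or withheld.
    behavior_more_likely: True if the behavior should increase, False if it should decrease.
    """
    if behavior_more_likely:
        if consequence_added:
            return Quadrant.POSITIVE_REINFORCEMENT
        return Quadrant.NEGATIVE_REINFORCEMENT
    if consequence_added:
        return Quadrant.POSITIVE_PUNISHMENT
    return Quadrant.NEGATIVE_PUNISHMENT


if __name__ == "__main__":
    # The dog does zoomies, so the door stays shut: access is withheld
    # and the zoomies become less likely.
    print(classify(consequence_added=False, behavior_more_likely=False).name)
    # The dog goes to the door, so the door opens: access is added
    # and going to the door becomes more likely.
    print(classify(consequence_added=True, behavior_more_likely=True).name)
```

In the back-door example, waiting at the closed door falls in the negative punishment quadrant, and opening the door once the dog arrives falls in positive reinforcement.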

It’s like proofing a dog in detection when he commits to the odor source and you present another area, and the dog refuses to come off the source. You created a habit. Habits once created are incredibly hard to extinguish. Entire industries are built around trying to help people break bad habits like overeating, gambling, drinking, and smoking. All of these are intrinsically self-rewarding behaviors. All of them remain habits for people for their entire lives. Often, I hear people say when they have been working on training a behavior that they want to see their progress. For instance, after heeling for a little while, they want to take the collars off the dog and heel free to see what he does. More than likely, the dog doesn’t have the habit of heeling yet. He probably still needs boundaries set around distractions (corrections) and needs to be rewarded at a high frequency. By doing this experiment, we remove the rewards and punishments to “see what he does.” Now we introduce the concept to the dog that if the collars are off, he can get out of position, maybe explore some competing motivation like a sleeve or ball on the ground, and there is no consequence for doing so. By engaging in this kind of bad training — taking the leash off too early — under the guise of “practicing off leash,” we relinquish any and all manner of control of the dog’s responses to the consequences provided by the environment (distractions). This ceases to be training and becomes testing. Too much of this testing too soon, and the dog learns that he can do what he wants when he wants. We accomplish the opposite of the desired outcome (off-leash control). We teach the dog to misbehave off leash. We may also teach him to be collar wise. 

To create habits, the handler must have the ability to use his correction devices immediately when needed and to deliver rewards on a variable schedule. Many of the behavior sequences we do in police dog training revolve around fixed action patterns, such as hunting, which are inherently desirable to the dog. The act of hunting, repeated over time, becomes desirable, not just its end result of finding target odor. This is often referred to as the emotional seeking system in dogs becoming turned on. The act of searching or hunting actually becomes more desirable than the finding of the target odor. It’s like when you decide you are going to buy a new car, your desire for the item becomes intense. You research, watch videos, seek out information and reviews, think about the car, and read about the car for weeks and weeks until you finally get the car. Getting the car means you turn off your seeking system, and people often feel an emotional letdown at that time (called buyer’s remorse) and then need to generate an appetite for seeking something else. People who lack goal-directed behavior become depressed. 

Dogs’ brains are much more primitive, and once they find the target, they can be redirected to reengage hunting the next odor, searching for the next decoy, over and over. The search becomes the reward (see my article in Working Dog Magazine, “Detection Training: Why Direct Reward Methods Are Superior to Modern Indirect Reward Methods,” May/June 2017). The anticipation of the finding becomes more important than the finding. This idea that the anticipation of the reward drives the brain to engage more fully and concentrate on the path to the reward with more focus is borne out in neuroscience research. When a signal is given to cue a behavior, like the command to hunt for a decoy or a target odor, upon hearing that command, the brain manufactures a load of a neurochemical called dopamine. Dopamine is the “happy chemical” in our dogs’ brains, leading them to feel pleasure, and it helps build neural pathways that when followed lead to future efficient release of that chemical. 

We used to think it was the receipt of the actual reward that made the dopamine levels rise, but research shows that it is the anticipation of the reward that causes these chemicals to be manufactured and their levels to increase. In fact, the dopamine levels drop when the reward is given. Repeatedly practicing the pattern that results in rewards teaches the dog to duplicate those patterns ever more efficiently. We vary rewarding the dog to sustain the dopamine levels over longer and longer periods. It keeps the dog engaged in the behaviors (heeling, hunting, and tracking) over increasing periods of time with intense focus and anticipation but without getting a reward or needing a correction! In other words, off-leash control! The high levels of dopamine in a task that the dog understands keep the dog focused, engaged, and performing. When we teach the dog to do a task, such as sit, and we reward the dog with food or a toy each time he follows the command properly, we say the reward is on a fixed ratio of 100%. Research shows that dopamine levels almost double when reward ratios are moved to just 50%. Stanford neuroscience researcher Robert Sapolsky reviews this research in this short lecture clip on YouTube: youtube.com/watch?v=axrywDP9Ii0

The upshot of this talk, in regard to animals, is that when we introduce variability into the equation in terms of the variable obtainment of the reward, the dopamine levels go even higher! This is why when we work a dog on a sequence of obedience commands and throw the ball only at the end, we are taking advantage of only a fraction of the dog’s possible drive, concentration, and engagement. If we varied the placement of the rewards throughout the obedience routine, we could expect twice the effort, concentration, and task orientation — or more! In practical terms as a trainer, in a routine with multiple tasks, the reward variation needs to span time and tasks, making rewarding only at the end of a sequence of tasks even more problematic. 
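To make the idea of varying reward placement concrete, here is a minimal sketch in Python of how a handler might plan a session on paper. Everything in it is an assumption for illustration: the function name `reward_plan`, the six-task routine, and the choice to reward a fixed share of tasks at randomly chosen positions are mine, and this is a simplification rather than a formal reinforcement schedule. It says nothing about what happens in the dog's brain; it only shows the difference between rewarding every task and rewarding half of them at unpredictable points.

```python
import random


def reward_plan(tasks, reward_ratio, seed=None):
    """Pick which tasks in a routine get rewarded (hypothetical planning helper).

    tasks: ordered list of task names in the routine.
    reward_ratio: share of tasks to reward, e.g. 1.0 for every task, 0.5 for half.
    seed: optional seed so a plan can be reproduced.
    Returns a list of (task, rewarded) pairs in routine order.
    """
    rng = random.Random(seed)
    count = max(1, round(len(tasks) * reward_ratio))  # always reward at least one task
    rewarded_positions = set(rng.sample(range(len(tasks)), count))
    return [(task, index in rewarded_positions) for index, task in enumerate(tasks)]


if __name__ == "__main__":
    # An example routine; the task names are placeholders.
    routine = ["heel", "sit", "down", "recall", "out", "heel"]

    # Rewarding every task, as when first building a behavior.
    for task, rewarded in reward_plan(routine, reward_ratio=1.0):
        print(f"{task:8s} reward" if rewarded else f"{task:8s} no reward")

    print()

    # Rewarding half the tasks at positions that change from session to session.
    for task, rewarded in reward_plan(routine, reward_ratio=0.5):
        print(f"{task:8s} reward" if rewarded else f"{task:8s} no reward")
```

Running the second plan several times gives a different placement of rewards each time, which is the point: the dog cannot predict that the ball only comes at the end of the routine.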

Imagine rewarding detection only on the last hide you put out, every single time. If I told you to start doing that in your training, rewarding only the very last final response on target odor, no matter how many training problems you put out, you would call me crazy. But good detection trainers know that if you always reward final response on target odor in a fixed ratio of 100%, it makes the behavior easier to extinguish by removing the reward and breaking the dog’s 100% expectation. So good detection trainers reward correct final responses variably, just like good obedience trainers do. Remember this saying: If it’s good enough for detection training, it should be how you train obedience. Variably reward your dog, and focus on each individual behavior. 

Let’s now circle back to the original question of how to approach reliable off-leash control. Here’s what we’ve learned: The work has to be properly cued to create anticipation of the reward, and the rewards need to be given variably. You need to set boundaries on the performance of the tasks so that the dog rehearses the entire task and doesn’t take shortcuts by self-rewarding on equipment on the ground, for example, while you are heeling. This means every session of training should mimic your deployments, and you should have a ritual that cues the work when you come out of the vehicle. It means you are creating habits, first by reward predictability as the training is building the behaviors. Once the tasks are being rewarded properly, and the dog is not deviating (you are not giving many corrections at all), move to spacing the rewards out and a variable reward system, or reward unpredictability, for the tasks you want performed at a high level. 

In all of your in-service training, you must keep the option to use compulsion to keep the dog on task. Don’t remove leashes and collars until required in certification trials and on certain kinds of deployments, and when you get back to training after certification or a deployment, immediately return to keeping on collars and leashes. But even more critical, always focus on rewards and reward variability to create the habit that is impossible to derail. Set an expectation of the quality of the behavior you want to have; use corrections if the dog understands the task but gets off task, and reward the dog for being on task in a variable way. If you come to training with no reward to give or remove your leash before you start giving commands, you are setting up for failure. You have an incorrect mindset. Off-leash control is created by habitually doing the same task (or task piece, such as the start of the track, middle of the track with turns and breaks and surface changes, and suspect encounter that can all be trained in components) to the same high standard the same way all the time. That task is rewarded with some degree of unpredictability to keep the animal working harder and harder in anticipation of getting the reward. This means off-leash control in obedience is taught by having leashes on and rewards at the ready as you systematically build strong habits. 
