120 second teardown after command with excessively long output

The condition is a command that takes an excessively long time to complete while emitting a continuous stream of data during execution. In this example, the command is ping 192.168.42.1 size 18024 repeat 2147483647. The read_timeout, set to 2 seconds, works as expected. During teardown, however, the call to check_config_mode, which in turn calls read_channel_timing, introduces a 120 second delay. This is due to read_channel_timing’s default read_timeout value of 120.0.
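
To illustrate why a channel that never goes quiet can hold a timing-based read for the full timeout, here is a simplified model. It is not Netmiko's actual implementation: read_until_quiet, FakeClock, and the poll parameter are hypothetical names, and the fake clock only exists so the sketch runs instantly. The assumption is that the read stops either when no data has arrived for last_read seconds or when read_timeout seconds have elapsed in total.

```python
class FakeClock:
    """Deterministic clock (hypothetical helper) so the sketch runs instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def read_until_quiet(read_chunk, clock, last_read=2.0, read_timeout=120.0, poll=0.1):
    """Simplified model of a timing-based read; returns (output, elapsed seconds).

    Stops when the channel has been quiet for `last_read` seconds,
    or when `read_timeout` seconds have elapsed overall.
    """
    start = clock.time()
    quiet_since = start
    output = ""
    while True:
        now = clock.time()
        if now - start >= read_timeout:
            break  # absolute timeout reached
        chunk = read_chunk()
        if chunk:
            output += chunk
            quiet_since = now  # new data arrived, restart the quiet period
        elif now - quiet_since >= last_read:
            break  # channel stayed quiet long enough
        clock.sleep(poll)
    return output, clock.time() - start


# A never-ending ping keeps producing data; an idle prompt produces none.
_, busy = read_until_quiet(lambda: "!", FakeClock())
_, idle = read_until_quiet(lambda: "", FakeClock())
print(f"busy channel: {busy:.1f}s, idle channel: {idle:.1f}s")
```

In this model the busy channel only returns once read_timeout expires, while the idle channel returns after roughly last_read seconds.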

import os
import time

import netmiko

def send_command_with_timeout_context(command: str, read_timeout: int):
    pre_start = time.time()  # before the SSH session is set up
    with netmiko.ConnectHandler(
        host="192.168.42.254",
        device_type="cisco_ios",
        username=os.getenv("USERNAME"),
        password=os.getenv("PASSWORD"),
    ) as session:
        error = None
        start = time.time()  # session established, command about to run
        try:
            session.send_command(command, read_timeout=read_timeout)
        except netmiko.ReadTimeout:
            error = 'timeout'
        finally:
            pre_finish = time.time()  # send_command() has returned or timed out
    finish = time.time()  # context manager exit (teardown) has completed
    print(f"{command=} {read_timeout=} setup={start-pre_start:.02f} execution={pre_finish-start:.02f} teardown={finish-pre_finish:.02f} {error=}")

send_command_with_timeout_context("ping 192.168.42.1 size 18024 repeat 2147483647", 2)

The result:

command='ping 192.168.42.1 size 18024 repeat 2147483647' read_timeout=2 setup=0.78 execution=2.05 teardown=120.01 error='timeout'

Using a lower read_timeout of 2.0 (chosen to match the default value of last_read) when read_channel_timing is called from check_config_mode would solve this issue; under normal conditions, the response to the carriage return it sends shouldn’t take more than 2 seconds. Attaching a pull request. Willing to adjust if there’s more that needs to be taken into consideration.

The result after adjusting the read_timeout:

command='ping 192.168.42.1 size 18024 repeat 2147483647' read_timeout=2 setup=0.73 execution=2.05 teardown=2.00 error='timeout'

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

ktbyers commented, Apr 22, 2022

Okay, makes sense.

I still would want it to be 5 to 10 seconds in this particular failing case as well as in general.

In other words, you really do want to gracefully exit the device here (and the context is a fairly abnormal one, i.e. the CLI is totally messed up due to a previous long-running command). That is, it is much closer to the 1% case than the other 99% of cases where you just check_config_mode() normally and then exit the SSH session.

Yeah, the teardown is an issue and I will tweak the check_config_mode() timeout downward to probably 5 to 10 seconds.

You really shouldn’t see the teardown time (in your measurement) unless you are doing your timing measurement wrong though. In other words, proper measurement of the time requires that you place the end time measurement as close as possible to when send_command() has failed which would be inside the finally clause like you did here:

        finally:
            pre_finish = time.time()

I am not sure there is much we can do about that i.e. if someone is measuring read_timeout and does the measurement wrong.

I am not particularly concerned about a 10 second disconnect time (in a rare and odd situation). In other words, reliability in the 99% case matters a lot more.
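
The measurement point described above can be shown without a real device. This is a minimal sketch in which SlowExitSession and timed_run are hypothetical stand-ins (not Netmiko classes), and sleeps simulate the command and the disconnect. The timestamp taken in the finally clause excludes the context manager's exit, while a timestamp taken after the with block includes it.

```python
import time


class SlowExitSession:
    """Hypothetical stand-in for a connection whose cleanup is slow."""

    def __init__(self, teardown_seconds):
        self.teardown_seconds = teardown_seconds

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        time.sleep(self.teardown_seconds)  # simulated disconnect delay
        return False

    def run(self, seconds):
        time.sleep(seconds)  # simulated command execution


def timed_run(command_seconds, teardown_seconds):
    """Return (execution, teardown) durations measured separately."""
    with SlowExitSession(teardown_seconds) as session:
        start = time.monotonic()
        try:
            session.run(command_seconds)
        finally:
            command_done = time.monotonic()  # excludes __exit__
    session_closed = time.monotonic()  # includes __exit__
    return command_done - start, session_closed - command_done


execution, teardown = timed_run(0.05, 0.2)
print(f"execution={execution:.2f} teardown={teardown:.2f}")
```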

zohassadar commented, Apr 22, 2022

The purpose is to better understand the behavior of the new read_timeout feature. I need to replace several instances of max_loops and delay_factor at some point, most of which were implemented when I didn’t understand what was going on under the hood at all. I was following along with this blog post when I came across the issue of the delay in the context manager exit.

A better example would be a valid command that would complete in a reasonable amount of time if given the chance and leaving everything else default, similar to the show tech example in the blog. With my 3750 running ipbase, I’m not aware of any long running commands, but 5000 large pings could be the stand-in for a large ‘show tech’, as that takes about 80 seconds to complete on my network.

command='ping 192.168.42.1 size 18024 repeat 5000' read_timeout=20 setup=0.73 execution=20.06 teardown=60.30 error='timeout'

During the 80 seconds that it took to complete, 20 seconds were spent in send_command and the remaining 60 seconds were spent in read_channel_timing due to the teardown. From the perspective of a beginner with no idea of what’s happening, I would think that read_timeout was broken, when it’s working exactly as it’s supposed to.

As check_config_mode is called for reasons other than just gracefully shutting down the connection, I can see why 2 seconds wouldn’t work for all cases. Another approach may be to distinguish calls to check_config_mode with a flag that indicates that it’s being called as part of the cleanup process and not as a precursor to entering config mode. The flag can then be used to specify a much more aggressive timing strategy during cleanup.
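
The flag idea could look roughly like the sketch below. Everything here is hypothetical: choose_read_timeout, check_config_mode_sketch, the cleanup parameter, and the read callable are illustrative names, not part of Netmiko's API. The 120.0 value is the default mentioned in this issue and 2.0 is the value proposed here; the maintainer's preference of 5 to 10 seconds would slot into the same constant.

```python
NORMAL_READ_TIMEOUT = 120.0  # default read_timeout mentioned in the issue
CLEANUP_READ_TIMEOUT = 2.0   # value proposed in the issue (assumption; 5-10 s was also suggested)


def choose_read_timeout(cleanup: bool = False) -> float:
    """Pick a more aggressive timeout when called as part of session cleanup."""
    return CLEANUP_READ_TIMEOUT if cleanup else NORMAL_READ_TIMEOUT


def check_config_mode_sketch(read, cleanup: bool = False):
    """Hypothetical wrapper; `read` stands in for a timing-based channel read
    that accepts a read_timeout keyword argument."""
    return read(read_timeout=choose_read_timeout(cleanup))


def fake_read(read_timeout):
    # Stand-in for the real channel read; just reports the timeout it was given.
    return f"read with read_timeout={read_timeout}"


print(check_config_mode_sketch(fake_read))                # normal call
print(check_config_mode_sketch(fake_read, cleanup=True))  # call during teardown
```

Keeping the choice in one small function means the cleanup value can be tuned without touching the callers that check config mode for other reasons.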
