Long log lines slow down the runner
Describe the bug
Outputting very long log lines seems to cause the runner to slow down significantly. A step that takes less than a second to actually execute can easily take tens of minutes or even hours to be considered complete, often hitting timeouts.
https://github.com/hamishforbes/actions-test This repo has a super simple Python script that outputs the current timestamp and "a" repeated for various lengths.
Turn on timestamps and you can see a discrepancy. Once a log message gets up to about 6k characters, it looks like the runner starts to take a few seconds to process it. At 100k characters this can be 10 minutes or so.
Running the script locally on a MacBook Pro, even with 100k characters, completes within a couple hundred ms.
Runner timestamp | Python timestamp
Tue, 30 Mar 2021 01:34:19 GMT | 2021-03-30 01:34:16.831361 - Message length: 10240 - aaaaa{snip}
Tue, 30 Mar 2021 01:34:30 GMT | 2021-03-30 01:34:16.831413 - Message length: 20480 - aaaaa{snip}
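To put a number on the discrepancy in the two rows above, here is a minimal sketch that subtracts the script's timestamp from the runner's. It assumes both clocks are in UTC (the runner column says GMT; the script's timezone is not stated in the issue), and the helper name `lag_seconds` is made up for illustration. Runner timestamps only have one-second resolution, so the result is approximate.

```python
from datetime import datetime

# Format of the runner-side timestamps as they appear in the table above.
RUNNER_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"


def lag_seconds(runner_ts: str, script_ts: str) -> float:
    """Return how far the runner timestamp trails the script timestamp.

    Assumption: both timestamps are in the same timezone (UTC).
    """
    runner = datetime.strptime(runner_ts, RUNNER_FORMAT)
    script = datetime.fromisoformat(script_ts)
    return (runner - script).total_seconds()


# The two rows quoted in the issue: (runner, script, message length).
rows = [
    ("Tue, 30 Mar 2021 01:34:19 GMT", "2021-03-30 01:34:16.831361", 10240),
    ("Tue, 30 Mar 2021 01:34:30 GMT", "2021-03-30 01:34:16.831413", 20480),
]

for runner_ts, script_ts, length in rows:
    lag = lag_seconds(runner_ts, script_ts)
    print(f"{length:>6} chars: runner lags by about {lag:.1f}s")
```

On these two rows the lag grows from roughly 2 seconds at 10240 characters to roughly 13 seconds at 20480, even though the script emitted both lines within the same millisecond.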
To Reproduce
Copy the script and workflow from the above repo 😃
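For readers who cannot reach the repo, the following is a rough sketch of the kind of script described, not the actual file from it. The line format is taken from the log excerpt above; the list of lengths and the names `LENGTHS` and `make_line` are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical message lengths; the issue only says "various lengths",
# with slowdowns noticed from about 6k characters up to 100k.
LENGTHS = [1024, 2048, 5120, 10240, 20480, 51200, 102400]


def make_line(length, now=None):
    """Build one log line: timestamp, message length, then 'a' repeated."""
    now = now or datetime.now()
    return f"{now} - Message length: {length} - {'a' * length}"


def main():
    for length in LENGTHS:
        # flush=True so each line reaches the runner as soon as it is written
        print(make_line(length), flush=True)


if __name__ == "__main__":
    main()
```

Run as a workflow step with timestamps enabled, each line carries the script's own clock reading, which can then be compared with the timestamp the runner attaches.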
Expected behavior
I would expect the runner to be able to handle long log lines even if truncating them.
At the very least it shouldn't delay how long the step takes to run. The real-world workflow that caused this was completing its test suite in < 10 mins, but the step would never finish in Actions because it would hit timeouts.
Runner Version and Platform
GitHub-hosted, but tested with the latest self-hosted runner as well.
Seems to affect both steps executed in Docker and those run directly on ubuntu-latest, although slightly less severely.
https://github.com/hamishforbes/actions-test/tree/no_docker
Top GitHub Comments
I figured it would be something like that. Would it be reasonable to set a max log length at some performant level (e.g. 10 or 20K chars) and truncate to that length before scanning?
No, it's essentially instant.
I’m here because it happened in the real world! 😄
We have some tests around maximum message lengths in our application, which involve sending very large messages. You could argue that it's our fault for logging the full message and we should truncate in the application, and disabling that log message is the workaround I've put in place.
But I only discovered what was causing the problem by downloading the full log archive, where I eventually noticed the timestamp discrepancy, after several days of tinkering, tuning, deploying our own runners with more resources, etc.
This test suite is currently being run on CircleCI with no issues, and it would've been very easy to just give up and say GitHub Actions is broken for us. A warning that log lines have been truncated, or even that long lines were detected and could cause slowdowns, would've made troubleshooting much easier and made the whole experience of switching to Actions far smoother.
It would be nice to do a performance test to see if we can do anything or provide guidance, but outside of that we would start by recommending that you reduce the number of characters you output.
We will have to stack this priority against other features/bugs (right now it seems lower priority than other work we can do).
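Until the runner handles this itself, one way to follow the recommendation to output fewer characters is to truncate in the application. Below is a minimal sketch using Python's standard `logging` module. The class name `TruncateFilter`, the logger name `app`, and the 4096-character limit are assumptions; the limit is chosen only to stay under the roughly 6k characters where slowdowns were first noticed. It also marks truncated lines, as the reporter asked for.

```python
import logging

MAX_CHARS = 4096  # hypothetical limit, below the ~6k where slowdowns appeared


class TruncateFilter(logging.Filter):
    """Shorten over-long log messages and say how much was dropped."""

    def __init__(self, max_chars=MAX_CHARS):
        super().__init__()
        self.max_chars = max_chars

    def filter(self, record):
        message = record.getMessage()  # message with any %-args applied
        if len(message) > self.max_chars:
            dropped = len(message) - self.max_chars
            record.msg = f"{message[:self.max_chars]}... [truncated {dropped} chars]"
            record.args = None  # args are already merged into the message
        return True  # never drop the record, only shorten it


logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.addFilter(TruncateFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("payload: %s", "a" * 100_000)  # emitted as a single short line
```

Attaching the filter to the handler rather than the logger means it also applies to records propagated from child loggers.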