Request for improved handling of decode errors during builds.
Hi,
Recently, my team encountered an error during a docker build where we hit the docker build log limit and the output was clipped. The log limit non-deterministically clipped the content in the middle of a UTF-8 character, which left an invalid (non-UTF-8) byte sequence in the output. Our logs contained the character ➤ (U+27A4), which is encoded in UTF-8 as 3 bytes: 0xE2 0x9E 0xA4 (see Find all Unicode Characters from Hieroglyphs to Dingbats – Unicode Compart). The 1MiB log clipping terminated the output at 0xE2 0x9E. It took us some time and troubleshooting to arrive at this finding.
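For anyone who wants to see the failure in isolation, here is a minimal sketch in plain Python. The log content is made up for illustration, and the clipping is simulated by dropping the last byte. It is not the actual build code path. The exact error reason may also differ from the one in our stack trace, depending on which bytes follow the clipped character.

```python
# Hypothetical reproduction: the log text below is invented for illustration.
arrow = "➤"
encoded = arrow.encode("utf-8")
print(encoded)  # the three bytes 0xE2 0x9E 0xA4

# Simulate the log limit cutting the output in the middle of the character.
log = ("step 1 " + arrow).encode("utf-8")
clipped = log[:-1]  # now ends with 0xE2 0x9E, an incomplete sequence

error = None
try:
    clipped.decode("utf-8")  # strict decoding is the default
except UnicodeDecodeError as exc:
    error = exc
    print(f"decode failed at byte offset {exc.start}: {exc.reason}")
```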
We are asking whether these decoding errors could be handled more gracefully. We believe the current handling masked the log limit error and instead produced a stack trace with the not-so-helpful message: UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 10-11: invalid continuation byte
Thank you!
Issue Analytics
- State:
- Created a year ago
- Reactions:2
- Comments:9 (7 by maintainers)
Top GitHub Comments
It might be better to use one of the other decode error handling options, e.g. ignoring errors instead of raising (see https://docs.python.org/3/library/codecs.html#error-handlers). Alternatively, we could explicitly handle the cases where decode errors are ‘expected’ by checking that the error occurs at the end of the string and just trimming the end off.
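A rough sketch of both options, using only the standard bytes.decode error handlers and the attributes of UnicodeDecodeError. The helper name decode_clipped and the sample log text are hypothetical and not taken from the project's code.

```python
def decode_clipped(data: bytes) -> str:
    """Decode UTF-8, tolerating an undecodable tail of at most 3 bytes.

    Hypothetical helper: the name and the policy are assumptions.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # A UTF-8 character is at most 4 bytes long, so clipping can leave
        # at most 3 dangling bytes. Only trim when the error reaches the
        # very end of the data; anything earlier is re-raised unchanged.
        # Note: this also trims a lone invalid byte at the very end.
        if exc.end == len(data) and len(data) - exc.start <= 3:
            return data[: exc.start].decode("utf-8")
        raise


# Made-up log content, clipped in the middle of the 3-byte character.
clipped = "build log ➤".encode("utf-8")[:-1]

ignored = clipped.decode("utf-8", errors="ignore")    # drops the bad bytes
replaced = clipped.decode("utf-8", errors="replace")  # inserts U+FFFD
trimmed = decode_clipped(clipped)                     # trims only the tail
```

The trade-off is that errors="ignore" silently hides corruption anywhere in the output, while the trimming approach still raises for invalid bytes in the middle of the log.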
I’m planning to do it on Thursday 😃