inconsistent with bz2.open on files containing vertical tab ^K
Problem description
Be sure your description clearly answers the following questions:
- What are you trying to achieve? Use smart_open as a replacement for bz2.open.
- What is the expected result? The same behavior as bz2.open with respect to recognizing line breaks.
- What are you seeing instead? A long line is truncated because it contains ^K (vertical tab), which is not a line break.
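As background for the ^K behavior described above, here is a minimal stdlib-only sketch of one well-known place where Python treats a vertical tab as a line boundary. The sample string is made up for illustration, and whether this is what actually happens inside smart_open is an assumption, not something this report establishes.

```python
import io

# Hypothetical sample: one newline-terminated line with an embedded
# vertical tab (^K, 0x0b).
text = "abc\x0bdef\n"

# A file-like readline() stops only at "\n", so the vertical tab stays
# inside the line.
first_line = io.StringIO(text).readline()

# str.splitlines() uses a broader set of line boundaries that includes
# \x0b, so the same text is cut into two pieces.
pieces = text.splitlines()

print(repr(first_line))
print(pieces)
```

Any reader that splits decoded text with str.splitlines() would therefore report a shorter first line than one that relies on readline().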
Steps/code to reproduce the problem
In order for us to be able to solve your problem, we have to be able to reproduce it on our end. Without reproducing the problem, it is unlikely that we’ll be able to help you.
Include full tracebacks, logs and datasets if necessary. Please keep the examples minimal (minimal reproducible example).
Take, for instance, the following uncompressed binary text and compress it with bz2. The number of columns in the line, as recognized by smart_open(…).readline(), differs before and after bz2 compression.
\xe5\x93\x81\x0b\xe3\x80\n
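A minimal sketch of the expected baseline, using only the standard library: it writes the bytes above to a temporary .bz2 file and reads the first line back with bz2.open in binary mode. The file name and the helper read_first_line_bz2 are hypothetical. To compare, one would open the same path with smart_open in place of the second bz2.open call; that call is left out here because its exact form depends on the installed smart_open version.

```python
import bz2
import os
import tempfile

# Bytes from the report: a vertical tab (0x0b) inside a line ending in "\n".
RAW = b"\xe5\x93\x81\x0b\xe3\x80\n"


def read_first_line_bz2(data: bytes) -> bytes:
    """Compress data into a temporary .bz2 file and return its first line."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt.bz2")  # hypothetical file name
        with bz2.open(path, "wb") as fout:
            fout.write(data)
        # Binary mode avoids decoding; the sample is not complete UTF-8.
        with bz2.open(path, "rb") as fin:
            return fin.readline()


line = read_first_line_bz2(RAW)
print(len(RAW), len(line))
```

With bz2.open the first line comes back whole, vertical tab included, so both lengths match.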
Versions
Please provide the output of:
import platform, sys, smart_open
print(platform.platform())
print("Python", sys.version)
print("smart_open", smart_open.__version__)
print("smart_open", smart_open.version)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'smart_open' has no attribute 'version'
Instead:
pip show smart_open
Name: smart-open
Version: 1.7.1
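When a package exposes no version attribute, the installed version can also be read from its distribution metadata inside Python. This is a small sketch assuming Python 3.8 or later (for importlib.metadata); the helper name package_version is hypothetical, and the distribution name is taken from the pip output above.

```python
from importlib import metadata
from typing import Optional


def package_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version, or None when smart-open is not installed.
print("smart_open", package_version("smart-open"))
```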
Checklist
Before you create the issue, please make sure you have:
- Described the problem clearly
- Provided a minimal reproducible example, including any required data
- Provided the version numbers of the relevant software
Issue Analytics
- State:
- Created 4 years ago
- Comments:6 (2 by maintainers)
I think this is a duplicate of https://github.com/RaRe-Technologies/smart_open/issues/269
Please post a minimal reproducible example, including any required data.