Rebase TSV parser on CSV parser
Overview
test.py:

```python
from dataflows import Flow, load

file_path = "/path/to/test.tsv"
flows = [load(file_path, name="res", format="tsv", skip_rows=["#"])]
print(Flow(*flows).results())
```
with the file test.tsv (tab-separated):

```
# This is a comment
#
Lat	Lon
33.6062	-117.9312
33.6062	-117.9312
33.6062	-117.9312
```
I get the error:

```
Traceback (most recent call last):
  File "/home/conrad/.virtualenvs/laminar/lib/python3.8/site-packages/tabulator/stream.py", line 757, in __extract_sample
    row_number, headers, row = next(self.__parser.extended_rows)
  File "/home/conrad/.virtualenvs/laminar/lib/python3.8/site-packages/tabulator/parsers/tsv.py", line 65, in __iter_extended_rows
    for row_number, item in enumerate(items, start=1):
  File "/home/conrad/.virtualenvs/laminar/lib/python3.8/site-packages/tsv.py", line 51, in un
    if check_line_consistency(columns, values, i, error_bad_lines):
  File "/home/conrad/.virtualenvs/laminar/lib/python3.8/site-packages/tsv.py", line 84, in check_line_consistency
    raise ValueError(message)
ValueError: Expected 1 fields in line 3, saw 2
```
It seems like the TSV parser strictly fixes the number of allowed fields when it is initialized (https://github.com/frictionlessdata/tabulator-py/blob/master/tabulator/parsers/tsv.py#L63). Since the first item in this file is a comment with no tabs, the parser errors as soon as a line shows up with a seemingly larger number of fields.
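A minimal sketch of the failure mode (this is illustrative, not tabulator's actual code): if the expected field count is taken from the first physical line, a tab-free comment line locks it at one field, and the real header then looks inconsistent.

```python
# Hypothetical reconstruction of the consistency check: the first line
# (a comment with no tabs) sets the expected field count to 1.
lines = ["# This is a comment", "#", "Lat\tLon", "33.6062\t-117.9312"]

expected = len(lines[0].split("\t"))  # 1 field, taken from the comment line
errors = []
for i, line in enumerate(lines[1:], start=2):
    values = line.split("\t")
    if len(values) != expected:
        errors.append(f"Expected {expected} fields in line {i}, saw {len(values)}")

print(errors[0])  # matches the message in the traceback above
```

This reproduces exactly the message seen in the traceback: the header on line 3 has two fields while the comment implied one.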
I would fall back to just using the csv module with `\t` as the delimiter (https://stackoverflow.com/questions/42358259/how-to-parse-tsv-file-with-python), but I keep getting the error `"delimiter" must be a 1-character string` - not sure if that is a result of custom code or not.
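For reference, the fallback described above looks roughly like this with only the stdlib csv module; the comment filtering is my own addition, not part of dataflows, and the delimiter must be the literal one-character tab `"\t"`, not the two-character string `"\\t"`:

```python
import csv
import io

# Sketch of the csv-module fallback: tab as delimiter, skipping
# comment lines manually (csv itself has no comment handling).
data = "# This is a comment\n#\nLat\tLon\n33.6062\t-117.9312\n"

rows = [
    row
    for row in csv.reader(io.StringIO(data), delimiter="\t")
    if row and not row[0].startswith("#")
]
print(rows)  # [['Lat', 'Lon'], ['33.6062', '-117.9312']]
```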
Please preserve this line to notify @roll (lead of this repository)
Issue Analytics
- Created: 3 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
Just to follow back on this, I realized that some front-end library I was using was changing `"\t"` to `"\\t"` before making the request to the server. Just a note that `\t` is now working, but it is still not possible to delimit on a multi-character string.
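For anyone hitting the same thing, the difference between the two strings is easy to check, and it is the stdlib csv module itself that rejects the two-character version with the error quoted above:

```python
import csv

real_tab = "\t"   # one character: a literal tab
escaped = "\\t"   # two characters: backslash + 't' (what the front end sent)
print(len(real_tab), len(escaped))  # 1 2

# csv rejects any delimiter that is not exactly one character:
try:
    csv.reader([], delimiter=escaped)
except TypeError as exc:
    print(exc)  # "delimiter" must be a 1-character string

# If the escaped form is what you received, it can be decoded back:
recovered = escaped.encode().decode("unicode_escape")
print(recovered == real_tab)  # True
```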
Merged into https://github.com/frictionlessdata/frictionless-py/issues/398