Rebase TSV parser on CSV parser
Overview
test.py:

```python
from dataflows import Flow, load

file_path = "/path/to/test.tsv"
flows = [load(file_path, name="res", format="tsv", skip_rows=["#"])]
print(Flow(*flows).results())
```
with the file test.tsv (tab-separated):

```
# This is a comment
#
Lat	Lon
33.6062	-117.9312
33.6062	-117.9312
33.6062	-117.9312
```
I get the error:

```
Traceback (most recent call last):
  File "/home/conrad/.virtualenvs/laminar/lib/python3.8/site-packages/tabulator/stream.py", line 757, in __extract_sample
    row_number, headers, row = next(self.__parser.extended_rows)
  File "/home/conrad/.virtualenvs/laminar/lib/python3.8/site-packages/tabulator/parsers/tsv.py", line 65, in __iter_extended_rows
    for row_number, item in enumerate(items, start=1):
  File "/home/conrad/.virtualenvs/laminar/lib/python3.8/site-packages/tsv.py", line 51, in un
    if check_line_consistency(columns, values, i, error_bad_lines):
  File "/home/conrad/.virtualenvs/laminar/lib/python3.8/site-packages/tsv.py", line 84, in check_line_consistency
    raise ValueError(message)
ValueError: Expected 1 fields in line 3, saw 2
```
It seems like the TSV parser strictly fixes the number of allowed fields when it is initialized (https://github.com/frictionlessdata/tabulator-py/blob/master/tabulator/parsers/tsv.py#L63). Since the first item in this file is a comment with no tabs, the parser errors as soon as a line shows up with a seemingly larger number of fields.
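A minimal sketch of the failure mode (this is illustrative, not tabulator's actual code): if the expected field count is taken from the first physical line, a tab-free comment line locks it at one field, and the real header then looks inconsistent.

```python
# Hypothetical reconstruction of the consistency check: the first line
# (a comment with no tabs) sets the expected field count to 1.
lines = ["# This is a comment", "#", "Lat\tLon", "33.6062\t-117.9312"]

expected = len(lines[0].split("\t"))  # 1 field, taken from the comment line
errors = []
for i, line in enumerate(lines[1:], start=2):
    values = line.split("\t")
    if len(values) != expected:
        errors.append(f"Expected {expected} fields in line {i}, saw {len(values)}")

print(errors[0])  # matches the message in the traceback above
```

This reproduces exactly the message seen in the traceback: the header on line 3 has two fields while the comment implied one.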
I would fall back to just using the csv module with `\t` as the delimiter (https://stackoverflow.com/questions/42358259/how-to-parse-tsv-file-with-python), but I keep getting the error `"delimiter" must be a 1-character string` - not sure if that is a result of custom code or not.
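For reference, the fallback described above looks roughly like this with only the stdlib csv module; the comment filtering is my own addition, not part of dataflows, and the delimiter must be the literal one-character tab `"\t"`, not the two-character string `"\\t"`:

```python
import csv
import io

# Sketch of the csv-module fallback: tab as delimiter, skipping
# comment lines manually (csv itself has no comment handling).
data = "# This is a comment\n#\nLat\tLon\n33.6062\t-117.9312\n"

rows = [
    row
    for row in csv.reader(io.StringIO(data), delimiter="\t")
    if row and not row[0].startswith("#")
]
print(rows)  # [['Lat', 'Lon'], ['33.6062', '-117.9312']]
```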
Please preserve this line to notify @roll (lead of this repository)
Issue Analytics
- Created: 3 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
Just to follow back on this, I realized that some front-end library I was using was changing `"\t"` to `"\\t"` before making the request to the server. Just a note that `\t` is now working, but it is still not possible to delimit on a multi-character string.
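For anyone hitting the same thing, the difference between the two strings is easy to check, and it is the stdlib csv module itself that rejects the two-character version with the error quoted above:

```python
import csv

real_tab = "\t"   # one character: a literal tab
escaped = "\\t"   # two characters: backslash + 't' (what the front end sent)
print(len(real_tab), len(escaped))  # 1 2

# csv rejects any delimiter that is not exactly one character:
try:
    csv.reader([], delimiter=escaped)
except TypeError as exc:
    print(exc)  # "delimiter" must be a 1-character string

# If the escaped form is what you received, it can be decoded back:
recovered = escaped.encode().decode("unicode_escape")
print(recovered == real_tab)  # True
```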
Merged into https://github.com/frictionlessdata/frictionless-py/issues/398