Rebase TSV parser on CSV parser

Overview

test.py

from dataflows import Flow, load


file_path = "/path/to/test.tsv"
flows = [load(file_path, name="res", format="tsv", skip_rows=["#"])]
print(Flow(*flows).results())

with file test.tsv:

#  This is a comment
#  
Lat	Lon
33.6062	-117.9312
33.6062	-117.9312
33.6062	-117.9312

I get the error:

Traceback (most recent call last):
  File "/home/conrad/.virtualenvs/laminar/lib/python3.8/site-packages/tabulator/stream.py", line 757, in __extract_sample
    row_number, headers, row = next(self.__parser.extended_rows)
  File "/home/conrad/.virtualenvs/laminar/lib/python3.8/site-packages/tabulator/parsers/tsv.py", line 65, in __iter_extended_rows
    for row_number, item in enumerate(items, start=1):
  File "/home/conrad/.virtualenvs/laminar/lib/python3.8/site-packages/tsv.py", line 51, in un
    if check_line_consistency(columns, values, i, error_bad_lines):
  File "/home/conrad/.virtualenvs/laminar/lib/python3.8/site-packages/tsv.py", line 84, in check_line_consistency
    raise ValueError(message)
ValueError: Expected 1 fields in line 3, saw 2

It seems that the TSV parser fixes the allowed number of fields when it is initialized (https://github.com/frictionlessdata/tabulator-py/blob/master/tabulator/parsers/tsv.py#L63). Since the first line of this file is a comment with no tabs, the expected count is 1, and the parser raises an error as soon as it reaches a line with more fields.
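To make the failure mode concrete, here is a minimal sketch of a strict field-count check. It is a simplified stand-in written for illustration, not the actual code of tabulator or the tsv package; the `strict_rows` function and `LINES` data are hypothetical names. It assumes the expected count is taken from the first line, as the traceback suggests.

```python
def strict_rows(lines):
    """Yield tab-split rows; raise if a row's field count differs from the first row's."""
    expected = None
    for number, line in enumerate(lines, start=1):
        values = line.rstrip("\n").split("\t")
        if expected is None:
            # The first line (here, a comment) fixes the expected field count.
            expected = len(values)
        elif len(values) != expected:
            raise ValueError(
                f"Expected {expected} fields in line {number}, saw {len(values)}"
            )
        yield values


# The first lines of test.tsv: two comment lines, then the real header.
LINES = ["#  This is a comment\n", "#  \n", "Lat\tLon\n", "33.6062\t-117.9312\n"]

try:
    list(strict_rows(LINES))
    error = None
except ValueError as exc:
    error = str(exc)

print(error)  # Expected 1 fields in line 3, saw 2
```

Because the check runs before any comment filtering in this sketch, the comment line decides the column count and the header line is the first to fail.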

I would fall back to just using the CSV module with \t as the delimiter (https://stackoverflow.com/questions/42358259/how-to-parse-tsv-file-with-python), but I keep getting the error "delimiter" must be a 1-character string. I'm not sure whether that is a result of custom code or not.
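The fallback described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not how tabulator implements its parsers: the `read_tsv` helper and `SAMPLE` string are hypothetical, and comment lines are assumed to start with "#".

```python
import csv
import io

# Sample data mirroring test.tsv from the report.
SAMPLE = (
    "#  This is a comment\n"
    "#  \n"
    "Lat\tLon\n"
    "33.6062\t-117.9312\n"
    "33.6062\t-117.9312\n"
    "33.6062\t-117.9312\n"
)


def read_tsv(text, comment_prefix="#"):
    """Parse tab-separated text with csv.reader, dropping comment lines first."""
    lines = (
        line for line in io.StringIO(text) if not line.startswith(comment_prefix)
    )
    # csv.reader does not enforce a fixed field count, so rows are returned as-is.
    return list(csv.reader(lines, delimiter="\t"))


rows = read_tsv(SAMPLE)
print(rows)
```

Filtering comments before the rows reach the reader means the header line, not the comment, is the first row the parser sees.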


Please preserve this line to notify @roll (lead of this repository)

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

cschloer commented, Oct 5, 2020 (1 reaction)

Just to follow up on this: I realized that a front-end library I was using was changing “\t” to “\\t” before making the request to the server, so the delimiter arrived as two characters (a backslash and a "t") rather than a single tab. Just a note that \t is now working, but it is still not possible to delimit on a multicharacter string.
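The difference between the two strings can be checked with the standard csv module. This is a small sketch; the `try_delimiter` helper is a hypothetical name used only for illustration, and the exact wording of the error message may vary between Python versions.

```python
import csv


def try_delimiter(delimiter):
    """Parse one tab-separated line, or return the TypeError text on failure."""
    try:
        return next(csv.reader(["a\tb"], delimiter=delimiter))
    except TypeError as exc:
        return f"TypeError: {exc}"


single_tab = try_delimiter("\t")    # one character: a real tab
escaped_tab = try_delimiter("\\t")  # two characters: backslash followed by "t"

print(len("\t"), single_tab)
print(len("\\t"), escaped_tab)
```

The same TypeError is raised for any delimiter longer than one character, which is why multicharacter delimiters are rejected by the csv module.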

