question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

IMP - TFX cant handle a simple quoted CSV

See original GitHub issue

System information

  • Have I specified the code to reproduce the issue (Yes, No): Yes
  • Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Colab
  • TensorFlow version: 2.5
  • TFX Version: 0.30
  • Python version: 3.8
  • Python dependencies (from pip freeze output): tfx 0.30

Describe the current behavior Upon using a quoted CSV file, the CSVExampleGen component considers commas placed inside quotes( “abcde , abcde” ) as a column change. Most of the TF functions like TF.Dataset already have capabilities( use_quote_delim ) to handle this but TFX can’t handle such a basic thing ?

It just crashes with this error: Columns do not match specified csv headers

See example below: Columns do not match specified csv headers: ['drugName', 'condition', 'review', 'rating', 'usefulCount'] -> [b'I did get a Rx for the pilocarpine and then switched to Evoxac. At first I could not perceive a benefit of this drug', b' but now I can tell when my dose (every eight hours) has worn off.'] [while running 'InputToRecord/InferColumnTypes/KeyWithVoid']

Example CSV file: https://raw.githubusercontent.com/rafiqhasan/AI_DL_ML_Repo/master/Datasets/uci_drugs_nlp/eval.csv

Describe the expected behavior: Absolutely no error should be raised, its a simple quoted CSV.

Standalone code to reproduce the issue

from tfx.utils.dsl_utils import external_input
CsvExampleGen(input=external_input(<path to any quote delimited CSV file>))

Issue Analytics

  • State:closed
  • Created 2 years ago
  • Comments:8 (4 by maintainers)

github_iconTop GitHub Comments

1reaction
arghyagangulycommented, Jul 13, 2021

@rafiqhasan ,An internal PR has been raised and is currently under review.

0reactions
google-ml-butler[bot]commented, Nov 18, 2021

Are you satisfied with the resolution of your issue? Yes No

Read more comments on GitHub >

github_iconTop Results From Across the Web

How to deal with csv file whoese fields are enclosed by quote?
TeX can be instructed to ignore a character, by assigning it the category code 9. This will not affect usages of \" in...
Read more >
Bulk Insert Partially Quoted CSV File in SQL Server
Unfortunately SQL Server interprets the quoted comma as a delimiter. This applies to both BCP and bulk insert .
Read more >
How to fix common CSV data formatting issues
FIX: Open the CSV file using any text editor like Notepad++ (https://notepad-plus-plus.org/) and just replace semi-colons with commas or vice- ...
Read more >
TFX: CsvExampleGen does not work with simply example ...
I am trying out TFX pipelines, and the first step is to ingest data. In this simple example I am ingesting data from...
Read more >
Bulk Insert from csv having issues with double quotes
BCP cannot handle double quotes sometimes and not other times. The string fields must be ALL double quoted all the time. The only...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found