IMP - TFX can't handle a simple quoted CSV
System information
- Have I specified the code to reproduce the issue (Yes, No): Yes
- Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Colab
- TensorFlow version: 2.5
- TFX Version: 0.30
- Python version: 3.8
- Python dependencies (from pip freeze output): tfx 0.30
Describe the current behavior
When given a quoted CSV file, the CsvExampleGen component treats commas placed inside quotes ("abcde , abcde") as column separators. Most TF functions, such as tf.data.experimental.CsvDataset, already have an option (use_quote_delim) to handle this, but TFX can't handle such a basic thing?
It just crashes with this error: Columns do not match specified csv headers
See example below:
Columns do not match specified csv headers: ['drugName', 'condition', 'review', 'rating', 'usefulCount'] -> [b'I did get a Rx for the pilocarpine and then switched to Evoxac. At first I could not perceive a benefit of this drug', b' but now I can tell when my dose (every eight hours) has worn off.'] [while running 'InputToRecord/InferColumnTypes/KeyWithVoid']
Example CSV file: https://raw.githubusercontent.com/rafiqhasan/AI_DL_ML_Repo/master/Datasets/uci_drugs_nlp/eval.csv
Describe the expected behavior: Absolutely no error should be raised; it's a simple quoted CSV.
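To make the difference concrete, here is a minimal sketch using only Python's standard csv module. It is not how TFX parses files. The sample row is made up for illustration (only the five header names come from the error message above), and naive_split and quote_aware_rows are hypothetical helper names. It shows that splitting on every comma yields one field too many, while a quote-aware parser returns the expected five columns.

```python
import csv
import io

# Made-up sample: the header matches the error message above; the data row
# is hypothetical and only illustrates a quoted field containing a comma.
SAMPLE = (
    'drugName,condition,review,rating,usefulCount\n'
    'Evoxac,Dry Mouth,"At first I could not perceive a benefit, but now I can.",9,12\n'
)


def naive_split(line):
    """Split on every comma, ignoring quotes (the behavior reported above)."""
    return line.rstrip("\n").split(",")


def quote_aware_rows(text):
    """Parse with the standard csv module, which honors double-quoted fields."""
    return list(csv.reader(io.StringIO(text)))


header_line, data_line = SAMPLE.splitlines()
naive_fields = naive_split(data_line)
header, record = quote_aware_rows(SAMPLE)

# Header count, naive field count, quote-aware field count.
print(len(header), len(naive_fields), len(record))
```

The naive count no longer matches the header count, which is the kind of mismatch that the "Columns do not match specified csv headers" error reports.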
Standalone code to reproduce the issue
from tfx.components import CsvExampleGen
from tfx.utils.dsl_utils import external_input
CsvExampleGen(input=external_input(<path to any quote delimited CSV file>))
Issue Analytics
- State:
- Created 2 years ago
- Comments:8 (4 by maintainers)
Top GitHub Comments
@rafiqhasan, an internal PR has been raised and is currently under review.