Comments in data file break parsing

  • Operating System: Mac OS X
  • Node Version: 10.15.0
  • NPM Version: 6.4.1
  • csv-parser Version: 2.1.0

Expected Behavior

USGS files should be parsed, but their leading hash (#) comment lines break the parsing. A simple data file and the code are linked below.

I think skipLines might let me ignore those leading hash lines, but what if the number of comment lines changes (e.g. when the government shutdown finally ends, hopefully)? Is there any way to ignore or filter out the lines that start with a hash?

[
  Row { agency_cd: '5s', site_no: '15s', datetime: '20d', tz_cd: '6s', '174907_72019': '14n', '174907_72019_cd': '10s' },
  Row { agency_cd: 'USGS', site_no: '174237064474900', datetime: '2019-01-21 00:00', tz_cd: 'AST', '174907_72019': '14.09', '174907_72019_cd': 'P' },
  Row { agency_cd: 'USGS', site_no: '174237064474900', datetime: '2019-01-21 01:00', tz_cd: 'AST', '174907_72019': '14.09', '174907_72019_cd': 'P' },
  Row { agency_cd: 'USGS', site_no: '174237064474900', datetime: '2019-01-21 02:00', tz_cd: 'AST', '174907_72019': '14.07', '174907_72019_cd': 'P' },
  Row { agency_cd: 'USGS', site_no: '174237064474900', datetime: '2019-01-21 03:00', tz_cd: 'AST', '174907_72019': '14.07', '174907_72019_cd': 'P' },
  Row { agency_cd: 'USGS', site_no: '174237064474900', datetime: '2019-01-21 04:00', tz_cd: 'AST', '174907_72019': '14.07', '174907_72019_cd': 'P' }
]

Actual Behavior

[
  Row { '# ---------------------------------- WARNING ----------------------------------------': '# Some of the data that you have obtained from this U.S. Geological Survey database' },
  Row { '# ---------------------------------- WARNING ----------------------------------------': '# may not have received Director\'s approval. Any such data values are qualified' },
  Row { '# ---------------------------------- WARNING ----------------------------------------': '# as provisional and are subject to revision. Provisional data are released on the' },
  Row { '# ---------------------------------- WARNING ----------------------------------------': '# condition that neither the USGS nor the United States Government may be held liable' },
  Row { '# ---------------------------------- WARNING ----------------------------------------': '# for any damages resulting from its use.' },
  Row { '# ---------------------------------- WARNING ----------------------------------------': '#' },
  Row { '# ---------------------------------- WARNING ----------------------------------------': 'agency_cd' },
  Row { '# ---------------------------------- WARNING ----------------------------------------': '5s' },
  Row { '# ---------------------------------- WARNING ----------------------------------------': 'USGS' },
  Row { '# ---------------------------------- WARNING ----------------------------------------': 'USGS' },
  Row { '# ---------------------------------- WARNING ----------------------------------------': 'USGS' },
  Row { '# ---------------------------------- WARNING ----------------------------------------': 'USGS' },
  Row { '# ---------------------------------- WARNING ----------------------------------------': 'USGS' }
]

How Do We Reproduce?

Super simple test data file: https://gist.githubusercontent.com/mishawagon/d8047ae5aaf29e63bcf1b6348318b7a4/raw/c9ddff257eef6ba430e6024bdea8813c00dd7f2a/USGS%2520Test%2520file

Super simple node.js program to parse it: https://gist.githubusercontent.com/mishawagon/6ff68b94c87c8e3fddd703c92a92e4af/raw/afc6b0858f4bf2e452beffc03d00d7df2d7dd922/gistfile1.txt
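
On the question of ignoring hash lines when their count is not known in advance: one possible workaround is to filter the stream before it reaches the parser, so no line count is needed. The sketch below uses only Node's built-in stream and string_decoder modules. createCommentFilter is a hypothetical helper name, not part of csv-parser, and the '#' prefix is an assumption based on the USGS sample. Newer csv-parser releases also document a skipComments option; whether it is available in the 2.1.0 release reported here would need to be checked.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper (not part of csv-parser): returns a Transform stream
// that drops every line starting with `prefix` and passes the rest through.
function createCommentFilter(prefix = '#') {
  const decoder = new StringDecoder('utf8'); // handles multi-byte chars split across chunks
  let pending = ''; // partial line carried over from the previous chunk

  const keep = (line) => !line.startsWith(prefix);

  return new Transform({
    transform(chunk, _encoding, callback) {
      const lines = (pending + decoder.write(chunk)).split('\n');
      pending = lines.pop(); // the last piece may be an incomplete line
      const kept = lines.filter(keep);
      if (kept.length > 0) this.push(kept.join('\n') + '\n');
      callback();
    },
    flush(callback) {
      // Emit the final line when the input does not end with a newline.
      // Note: this always terminates that last line with '\n'.
      const last = pending + decoder.end();
      if (last.length > 0 && keep(last)) this.push(last + '\n');
      callback();
    },
  });
}
```

Assuming the data is saved locally as usgs.txt (a placeholder file name), the filter would sit between the file stream and the parser: fs.createReadStream('usgs.txt').pipe(createCommentFilter()).pipe(csv({ separator: '\t' })). The separator option is set to a tab because the USGS sample is tab-delimited. The format row ('5s', '15s', ...) does not start with a hash, so it would still come through as the first data row, matching the expected output above.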

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6

Top GitHub Comments

2 reactions
mishawagon commented, Feb 27, 2019

Thank you very much and totally amazing how you can do this while daddying. Truly you are a gentleperson and a scholar. Thanks again

1 reaction
mishawagon commented, Feb 12, 2019

I realized I never thanked you for taking the time to look into this, thanks again!


