io.ascii guessing tries formats it doesn't need to
I made a file that is not a valid ASCII table for testing and tried to read it in using:
>>> from astropy.io.ascii import read
>>> read('test', delimiter=',')
I noticed that some of the formats that were tried are ones for which delimiter=','
shouldn’t be a valid option:
InconsistentTableError:
ERROR: Unable to guess table format with the guesses listed below:
Reader:Ecsv delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:FixedWidthTwoLine delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:RST delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:FastBasic delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:Basic delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:FastRdb delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:Rdb delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:FastTab delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:Tab delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:Cds delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:Daophot delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:SExtractor delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:Ipac delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:Latex delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:AASTex delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] strict_names: True
Reader:FastCommentedHeader delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] quotechar: '"' strict_names: True
Reader:FastCommentedHeader delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] quotechar: "'" strict_names: True
Reader:CommentedHeader delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] quotechar: '"' strict_names: True
Reader:CommentedHeader delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] quotechar: "'" strict_names: True
Reader:FastBasic delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] quotechar: '"' strict_names: True
Reader:FastBasic delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] quotechar: "'" strict_names: True
Reader:Basic delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] quotechar: '"' strict_names: True
Reader:Basic delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] quotechar: "'" strict_names: True
Reader:FastNoHeader delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] quotechar: '"' strict_names: True
Reader:FastNoHeader delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] quotechar: "'" strict_names: True
Reader:NoHeader delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] quotechar: '"' strict_names: True
Reader:NoHeader delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')] quotechar: "'" strict_names: True
Reader:Basic delimiter: ',' fast_reader: {'enable': True} fill_values: [('', '0')]
************************************************************************
** ERROR: Unable to guess table format with the guesses listed above. **
** **
** To figure out why the table did not read, use guess=False and **
** fast_reader=False, along with any appropriate arguments to read(). **
** In particular specify the format and any known attributes like the **
** delimiter. **
************************************************************************
Specifically, shouldn’t Ipac and Tab be skipped, since delimiter=',' isn’t a valid option for those? I thought maybe these were still being skipped internally, but the read trace shows that they are actually executed:
{'kwargs': {'Reader': astropy.io.ascii.ipac.Ipac,
            'delimiter': ',',
            'fast_reader': {'enable': True},
            'fill_values': [('', '0')],
            'strict_names': True},
 'status': 'ValueError: At least one header line beginning and ending with delimiter required',
 'dt': '3.011 ms'},
There might be room for optimization here by checking whether the reader actually supports the specified options?
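To make the suggestion concrete, here is a minimal sketch of what such a check could look like. It is not astropy's implementation: ReaderSpec, filter_guesses, and the allowed_delimiters table are hypothetical names invented for illustration, and the delimiter sets are assumptions rather than values taken from the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReaderSpec:
    """Hypothetical description of a reader; not an astropy class."""
    name: str
    # None means "any delimiter is acceptable"; otherwise the allowed set.
    allowed_delimiters: Optional[frozenset] = None


def filter_guesses(specs, user_kwargs):
    """Drop readers that cannot honour the user-supplied delimiter."""
    delimiter = user_kwargs.get("delimiter")
    kept = []
    for spec in specs:
        restricted = spec.allowed_delimiters is not None
        if delimiter is not None and restricted and delimiter not in spec.allowed_delimiters:
            continue  # this reader would be tried in vain, so skip it
        kept.append(spec)
    return kept


# Illustrative table only: the delimiter sets below are assumptions.
SPECS = [
    ReaderSpec("Ipac", frozenset({"|"})),
    ReaderSpec("Tab", frozenset({"\t"})),
    ReaderSpec("Basic"),
    ReaderSpec("NoHeader"),
]

remaining = filter_guesses(SPECS, {"delimiter": ","})
print([spec.name for spec in remaining])
```

With delimiter=',' the sketch keeps only the readers that place no restriction on the delimiter. When no delimiter is given, every reader stays in the guess list.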
Issue Analytics
- State:
- Created 5 years ago
- Comments:5 (5 by maintainers)
Top GitHub Comments
We really don’t know either, since we don’t count which function is called how often by which user. You’ll also notice that the most specific readers (ECSV, LaTeX) are at the top. Those should fail fast, while the generic readers that read almost anything are at the bottom. (That’s because a table that parses as ECSV will almost certainly be ECSV, while a table that the CSV reader can parse could also be, e.g., a Daophot table.)
All in all, this is a complex issue where a lot of obscure corner cases will come up. I’m not saying @bhavyakh can’t do it - but it’s probably much more involved than it sounds at first!
Yes! I tried looking at some of the tests and the reader functions and got a rough idea of how many issues can come up. I totally understand your point.
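The ordering argument from the comments above (specific readers first because they fail fast, generic readers last because they accept almost anything) can be illustrated with a toy guessing loop. This is a sketch under stated assumptions, not astropy code: guess_format, parse_pipe_table, and parse_comma_table are hypothetical helpers written only for this example.

```python
def parse_pipe_table(text):
    """Strict toy parser: rejects input as soon as the header looks wrong."""
    lines = text.splitlines()
    if not lines or not (lines[0].startswith("|") and lines[0].endswith("|")):
        raise ValueError("header line must begin and end with '|'")
    return [[cell.strip() for cell in line.strip("|").split("|")] for line in lines]


def parse_comma_table(text):
    """Generic toy parser: accepts anything with a consistent column count."""
    rows = [line.split(",") for line in text.splitlines()]
    if not rows or len({len(row) for row in rows}) != 1:
        raise ValueError("inconsistent number of columns")
    return rows


def guess_format(text, parsers):
    """Try each (name, parser) pair in order and return the first success."""
    trace = []
    for name, parse in parsers:
        try:
            rows = parse(text)
        except ValueError as err:
            trace.append((name, f"ValueError: {err}"))
            continue
        trace.append((name, "success"))
        return name, rows, trace
    raise ValueError(f"unable to guess table format; tried {[n for n, _ in parsers]}")


# Specific parser first, generic parser last.
PARSERS = [("pipe", parse_pipe_table), ("comma", parse_comma_table)]

name, rows, trace = guess_format("a,b\n1,2", PARSERS)
print(name, rows, trace)
```

Note that the generic comma parser would also accept the pipe-delimited text "|a|b|\n|1|2|" as a one-column table, which is why putting it first would give a misleading result.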