Inaccessible file path for data catalog fails silently
When trying to load a catalog yml file hosted on S3, missing credentials will silently lead to an empty catalog. So the user executes,
In [1]: from intake import Catalog
In [2]: cat = Catalog("s3://foo/bar.yml")
In [3]: cat.yaml()
Out[3]: "sources:\n bar:\n args:\n path: s3://foo/bar.yml\n description: ''\n driver: yaml_files_cat\n metadata: {}\n"
and it’s not obvious why there are no sources.
For the case of s3 specifically, a quick check looks like:
import s3fs


def verify_credentials(cat_uri):
    """
    FIXME copied from data-layer-api
    The intake catalog will appear empty when trying to read a catalog yml
    file on s3 if the credentials are not supplied correctly.
    """
    # Only s3 URIs need the credentials check.
    if "s3:" not in cat_uri:
        return True
    fs = s3fs.S3FileSystem()
    cat_exists = fs.exists(cat_uri)
    if not cat_exists:
        err_msg = "Catalog file not found at {}. Check AWS credentials.".format(cat_uri)
        raise EnvironmentError(err_msg)
    # Return True on success too, so both passing paths behave the same.
    return True
There are probably related issues as well when trying to open a local yml file in a directory that doesn’t exist.
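For the local case, a comparable pre-check might look like the sketch below. It is only an illustration: `verify_local_catalog` is a hypothetical helper name, not part of intake, and it uses nothing beyond the standard library. It distinguishes a missing directory from a missing file so that the error message points at the actual typo.

```python
from pathlib import Path


def verify_local_catalog(cat_path):
    """Hypothetical helper: fail loudly if a local catalog file is missing."""
    path = Path(cat_path)
    # A missing parent directory usually means a typo earlier in the path.
    if not path.parent.is_dir():
        raise FileNotFoundError(f"Directory does not exist: {path.parent}")
    # The directory exists, so the file name itself is wrong or missing.
    if not path.is_file():
        raise FileNotFoundError(f"Catalog file not found at {path}")
    return True
```

Calling this before constructing the catalog would turn the silent empty catalog into an immediate `FileNotFoundError`.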
Issue Analytics
- State:
- Created 5 years ago
- Comments:7 (4 by maintainers)
I think there might be an unnecessary linkage between the handling of a catalog directory and a catalog file. A catalog directory might be initially empty and have new files appear during operation (like conda installing a data package while a notebook is running). A catalog file, local or remote, should exist when the catalog object for that file is created, otherwise simple typos in filenames are very hard to catch.
Indeed, that could be the way to do it, @stsievert: either have different defaults for allow_empty for the two file Catalog classes, or raise an exception in YAMLFileCatalog._load if nothing is found (and not change anything elsewhere). The latter is easier to implement.
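The distinction discussed in these comments could be sketched roughly as follows. This is not intake's implementation and does not touch `YAMLFileCatalog._load`. The function `find_catalog_files` is a hypothetical stand-in that handles local paths only, and it borrows the `allow_empty` name from the discussion: a directory defaults to allowing an empty result, while a single file path must exist.

```python
import glob
import os


def find_catalog_files(path, allow_empty=None):
    """Hypothetical sketch: list YAML catalog files for a directory or file path.

    If allow_empty is None, directories may be empty but a file path must exist.
    """
    is_dir = os.path.isdir(path)
    if allow_empty is None:
        # Directories can fill up later; a single file should already exist.
        allow_empty = is_dir
    if is_dir:
        found = sorted(
            glob.glob(os.path.join(path, "*.yml"))
            + glob.glob(os.path.join(path, "*.yaml"))
        )
    else:
        found = [path] if os.path.isfile(path) else []
    if not found and not allow_empty:
        raise FileNotFoundError(f"No catalog file found at {path!r}")
    return found
```

With these defaults an empty catalog directory yields an empty list, while a mistyped file name raises immediately instead of producing an empty catalog.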