Inaccessible file path for data catalog fails silently


When trying to load a catalog yml file hosted on S3, missing credentials silently lead to an empty catalog. So the user executes,

In [1]: from intake import Catalog

In [2]: cat = Catalog("s3://foo/bar.yml")

In [3]: cat.yaml()
Out[3]: "sources:\n  bar:\n    args:\n      path: s3://foo/bar.yml\n    description: ''\n    driver: yaml_files_cat\n    metadata: {}\n"

and it’s not obvious why there are no sources.

For the case of s3 specifically, a quick check looks like:

import s3fs


def verify_credentials(cat_uri):
    """
    FIXME copied from data-layer-api
    The intake catalog will appear empty when trying to read a catalog yml
    file on s3 if the credentials are not supplied correctly.
    """
    # Only S3 URIs need this check; anything else passes through.
    if "s3:" not in cat_uri:
        return True

    fs = s3fs.S3FileSystem()
    cat_exists = fs.exists(cat_uri)
    if not cat_exists:
        err_msg = "Catalog file not found at {}. Check AWS credentials.".format(cat_uri)
        raise EnvironmentError(err_msg)
    # The file is reachable, so the credentials work for this URI.
    return True

There are probably related issues when trying to open a local yml file in a directory that doesn’t exist.
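
For the local case, a similar pre-flight check can be sketched with the standard library alone. The function name `verify_local_catalog` is hypothetical and is not part of intake; it only illustrates failing loudly on a bad path instead of ending up with an empty catalog.

```python
from pathlib import Path


def verify_local_catalog(cat_path):
    """Hypothetical helper (not part of intake): fail loudly on a bad local path."""
    path = Path(cat_path).expanduser()
    if not path.parent.is_dir():
        # The containing directory is missing, the case described above.
        raise FileNotFoundError(
            "Directory does not exist: {}".format(path.parent)
        )
    if not path.is_file():
        # The directory exists but the catalog file itself does not.
        raise FileNotFoundError("Catalog file not found at {}".format(path))
    return True
```

Calling this before constructing the catalog turns a typo in the filename or directory into an immediate `FileNotFoundError`.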

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
seibert commented, Sep 13, 2018

I think there might be an unnecessary linkage between the handling of a catalog directory and a catalog file. A catalog directory might be initially empty and have new files appear during operation (like conda installing a data package while a notebook is running). A catalog file, local or remote, should exist when the catalog object for a file is created; otherwise simple typos in filenames are very hard to catch.

0 reactions
martindurant commented, Sep 13, 2018

Indeed, that could be the way to do it, @stsievert: have different defaults for allow_empty for the two file Catalog classes, or raise an exception in YAMLFileCatalog._load if nothing is found (and not change anything elsewhere). The latter is easier to implement.
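
The idea of raising when nothing is found can be sketched as follows. This is a standalone illustration, not intake's code: the class `FileCatalogSketch`, its `load` method, and the exact meaning given to `allow_empty` here are assumptions made for the example. A file-backed catalog raises when its path matches nothing, while a directory-style catalog can opt in to starting empty.

```python
import glob


class FileCatalogSketch:
    """Hypothetical stand-in for a file-backed catalog (not intake's class)."""

    def __init__(self, path, allow_empty=False):
        self.path = path                # a file path or a glob pattern
        self.allow_empty = allow_empty  # assumed flag: tolerate zero matches
        self.files = []

    def load(self):
        # Expand the path (it may be a glob pattern) into concrete files.
        self.files = sorted(glob.glob(self.path))
        if not self.files and not self.allow_empty:
            # Fail loudly instead of silently producing an empty catalog.
            raise FileNotFoundError(
                "No catalog files found at {}".format(self.path)
            )
        return self.files
```

In this sketch, a directory-style catalog would pass `allow_empty=True` so that files appearing later are picked up on a subsequent `load()`, while a single-file catalog keeps the default and surfaces typos right away.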
