Add abfss in the list of cloud protocols

Description

abfss is not in the list of cloud protocols in the module kedro/io/core.py.

Context

I’m currently testing kedro on my project and I’m running into an issue when trying to load an ExcelDataSet from abfss cloud storage.

This works:

import pandas as pd
pd.read_excel("abfss://container/path/to/excel/file.xlsx", engine="openpyxl")

This doesn’t:

from kedro.extras.datasets.pandas import ExcelDataSet
dataset = ExcelDataSet(filepath="abfss://container/path/to/excel/file.xlsx", load_args={"engine": "openpyxl"})
dataset.load()

Possible Implementation

Simply add abfss to the list of cloud protocols.

Currently, in kedro/io/core.py, line 31:

CLOUD_PROTOCOLS = ("s3", "gcs", "gs", "adl", "abfs")

Update with

CLOUD_PROTOCOLS = ("s3", "gcs", "gs", "adl", "abfs", "abfss")

This change fixed my issue.
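To illustrate why a missing entry in the protocol tuple can break loading, here is a minimal sketch. It is not Kedro's actual code: `parse_filepath` is a hypothetical helper, and the assumption is that the container (the URL's network location) is only kept as part of the path when the scheme appears in the cloud protocol list.

```python
from urllib.parse import urlsplit

# Tuples taken from the issue text above.
OLD_PROTOCOLS = ("s3", "gcs", "gs", "adl", "abfs")
NEW_PROTOCOLS = OLD_PROTOCOLS + ("abfss",)


def parse_filepath(filepath, cloud_protocols):
    """Hypothetical helper: split a URL into protocol and path.

    Assumption: the netloc (bucket or container) is prepended to the
    path only for schemes listed in ``cloud_protocols``.
    """
    parts = urlsplit(filepath)
    path = parts.path
    if parts.netloc and parts.scheme in cloud_protocols:
        path = parts.netloc + path
    return {"protocol": parts.scheme or "file", "path": path}


URL = "abfss://container/path/to/excel/file.xlsx"
before = parse_filepath(URL, OLD_PROTOCOLS)
after = parse_filepath(URL, NEW_PROTOCOLS)
print(before["path"])
print(after["path"])
```

Under that assumption, the old tuple yields a path without the container name, so the storage backend would be asked for a file that does not exist, while the extended tuple keeps the container in the path.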

Possible Alternatives

If adding abfss to the list of cloud protocols is not an option to fix this issue, I can provide more information on my issue.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

merelcht commented, Mar 14, 2022 (1 reaction)

@datajoely We can definitely look at optimising the list of allowed protocols, but currently we have separation between “cloud” and “http” protocols, so it would require a bigger change to handle a generic list like the one above. Also, it doesn’t look like abfss is included, so we’d have to add that anyway.

philomine commented, Mar 15, 2022 (0 reactions)

Thanks for your fast answer! I’ll open a PR then 😃
