Add abfss in the list of cloud protocols
Description
abfss is not in the list of cloud protocols in the module kedro/io/core.py.
Context
I’m currently testing kedro on my project and I am facing an issue when trying to load an ExcelDataSet from an abfss cloud storage.
This works
import pandas as pd
pd.read_excel("abfss://container/path/to/excel/file.xlsx", engine="openpyxl")
This doesn’t
from kedro.extras.datasets.pandas import ExcelDataSet
dataset = ExcelDataSet(filepath="abfss://container/path/to/excel/file.xlsx", load_args={"engine": "openpyxl"})
dataset.load()
Possible Implementation
Simply add abfss to the list of cloud protocols.
Currently, in kedro/io/core.py, line 31:
CLOUD_PROTOCOLS = ("s3", "gcs", "gs", "adl", "abfs")
Update with
CLOUD_PROTOCOLS = ("s3", "gcs", "gs", "adl", "abfs", "abfss")
This change fixed my issue.
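For anyone wondering why a missing entry in that tuple matters, here is a minimal sketch of one plausible mechanism. It assumes the cloud-protocol list decides whether the container (the URL netloc) is kept as part of the path handed to the filesystem layer. `split_filepath` and the two tuple names are hypothetical and written only for illustration; this is not a copy of the code in kedro/io/core.py.

```python
from urllib.parse import urlsplit

# The tuple as quoted in the issue, and the proposed version.
CLOUD_PROTOCOLS_BEFORE = ("s3", "gcs", "gs", "adl", "abfs")
CLOUD_PROTOCOLS_AFTER = CLOUD_PROTOCOLS_BEFORE + ("abfss",)


def split_filepath(filepath, cloud_protocols):
    """Hypothetical helper: split a URL into protocol and path.

    Assumption: the container/bucket (netloc) is only prepended to the
    path when the scheme is a recognised cloud protocol.
    """
    parsed = urlsplit(filepath)
    protocol = parsed.scheme or "file"
    path = parsed.path
    if parsed.netloc and protocol in cloud_protocols:
        # Keep the container name as the first path component.
        path = parsed.netloc + path
    return {"protocol": protocol, "path": path}


url = "abfss://container/path/to/excel/file.xlsx"
before = split_filepath(url, CLOUD_PROTOCOLS_BEFORE)
after = split_filepath(url, CLOUD_PROTOCOLS_AFTER)
print(before["path"])  # container name is dropped
print(after["path"])   # container name is kept
```

Under that assumption, the unrecognised scheme loses the container name, which would explain why the same URL works with plain pandas but not through the dataset class.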
Possible Alternatives
If adding abfss to the list of cloud protocols is not an option to fix this issue, I can provide more information on my issue.
Issue Analytics
- State:
- Created 2 years ago
- Comments:5 (5 by maintainers)

@datajoely We can definitely look at optimising the list of allowed protocols, but currently we have a separation between “cloud” and “http” protocols, so it would require a bigger change to handle a generic list like the one above. Also, it doesn’t look like abfss is included, so we’d have to add that anyway.
Thanks for your fast answer! I’ll open a PR then 😃