Add ability for all operators to interact with storages of AWS/GCP/AZURE

Description

Currently every transfer operator is written as XToY, so if someone wrote MySQLToGCS, it doesn’t help someone who needs MySQLToS3.

Use case / motivation

It would be great if, when a PR is raised, the author only needed to handle the X part and provide a list of supported Y parts. For example, contributors would write XToDataframe or XToFile, and a built-in integration in Airflow would handle FileToS3, FileToGCS, etc.

So when a user submits a PR for MySQLToFile, Airflow would use it to automatically create MySQLToGCS and MySQLToS3.

The idea is to build the infrastructure layer once so that it works automatically for every operator.
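A minimal sketch of the idea described above, not an existing Airflow API: each source contributes only an "X to file" function, and destinations are registered once and shared. Every name here (`Extractor`, `Uploader`, `UPLOADERS`, `make_transfer`, `sqlite_to_file`) is hypothetical, SQLite stands in for MySQL, and a local file copy stands in for FileToS3 or FileToGCS so that the example runs without cloud credentials.

```python
import csv
import os
import shutil
import sqlite3
import tempfile
from typing import Callable, Dict

# Hypothetical signatures, not Airflow APIs:
#   an extractor writes source data to a local file path (the "X" part);
#   an uploader moves a local file to a destination (the "Y" part).
Extractor = Callable[[str], None]
Uploader = Callable[[str, str], None]


def upload_to_local_dir(local_path: str, destination: str) -> None:
    """Stand-in uploader; a real FileToS3 or FileToGCS step would go here."""
    shutil.copyfile(local_path, destination)


# Registry of destinations, written once and shared by every source.
UPLOADERS: Dict[str, Uploader] = {"local": upload_to_local_dir}


def make_transfer(extract: Extractor, target: str) -> Callable[[str], None]:
    """Combine one extractor with one registered uploader into an XToY callable."""
    upload = UPLOADERS[target]

    def transfer(destination: str) -> None:
        # Stage the extract in a temporary file, then hand it to the uploader.
        with tempfile.TemporaryDirectory() as tmp_dir:
            staged = os.path.join(tmp_dir, "extract.csv")
            extract(staged)
            upload(staged, destination)

    return transfer


def sqlite_to_file(path: str) -> None:
    """Example extractor; SQLite stands in for MySQL to keep the sketch runnable."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
        conn.execute("INSERT INTO users VALUES (1, 'ada'), (2, 'bob')")
        rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    finally:
        conn.close()
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


# One transfer per registered destination, derived without extra code per pair.
sqlite_transfers = {name: make_transfer(sqlite_to_file, name) for name in UPLOADERS}
```

Under these assumptions, adding a new destination means registering one more uploader, and every existing extractor picks it up without changes.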

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

2 reactions
jrderuiter commented, Dec 12, 2020

Would be interesting to implement this using fsspec (https://filesystem-spec.readthedocs.io/en/latest/). Such an implementation would provide an interface to all filesystems supported by fsspec, which includes Azure Blob/GDL2, S3 and GCS. You also get others like FTP and SFTP for free too (https://filesystem-spec.readthedocs.io/en/latest/api.html#implementations).
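As a rough illustration of the fsspec suggestion, the sketch below copies a file between two locations identified only by URL. `copy_between_filesystems` is a hypothetical helper, not part of Airflow or fsspec; it relies on `fsspec.open`, which selects the filesystem implementation from the URL protocol. Remote protocols typically need an extra backend package installed (for example `s3fs` for `s3://` or `gcsfs` for `gs://`).

```python
import shutil


def copy_between_filesystems(source_url: str, target_url: str) -> None:
    """Copy one file between any two fsspec-supported locations (hypothetical helper)."""
    # Imported lazily so this module still loads when fsspec is not installed.
    import fsspec

    # fsspec.open picks the filesystem from the URL protocol (s3://, gs://, sftp://, ...).
    with fsspec.open(source_url, "rb") as source, fsspec.open(target_url, "wb") as target:
        shutil.copyfileobj(source, target)
```

With the relevant backends and credentials configured, a call such as `copy_between_filesystems("s3://some-bucket/data.csv", "gs://other-bucket/data.csv")` would cover what a dedicated S3ToGCS operator does today; the bucket names are placeholders.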

1 reaction
dinigo commented, May 11, 2020

@turbaszek I see how this is related, and I face this question frequently. But I think #8059 depends on having a common FS interface like #3526 suggests.

Right now we have (I might be missing some) the following file providers: S3, GCS, Azure, SFTP/SSH, FTP, Samba. If we want to allow all operations between them, that means N^2 operators, which is 36 operators. And there will be more as other cloud providers are added (Alibaba, DO, …). I’ll take a look at the PR, @ashb, thanks.
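The count above can be checked with a few lines of Python. The last figure assumes that a common file interface would need one reader and one writer per provider; that is an assumption made for this illustration, not something stated in the thread.

```python
from itertools import permutations, product

providers = ["S3", "GCS", "Azure", "SFTP/SSH", "FTP", "Samba"]

# Every ordered (source, destination) pair, same-provider pairs included (N^2).
all_pairs = len(list(product(providers, repeat=2)))

# Ordered pairs between two different providers only (N * (N - 1)).
cross_pairs = len(list(permutations(providers, 2)))

# Assumed cost with a common interface: one reader plus one writer per provider (2N).
with_common_interface = 2 * len(providers)

print(all_pairs, cross_pairs, with_common_interface)  # 36 30 12
```

The direct approach grows quadratically with each new provider, while the common-interface approach grows linearly.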
