Wildcard retrieval of files

I am attempting to access data in our S3 data lake. Because production systems write the data I want alongside other files that I don’t want, a prefix alone is insufficient to select the data I need.

Dask allows this sort of wildcarding, for example:

import dask.dataframe as dd

bucket = "my-bucket"  # placeholder; substitute the real bucket name
dd.read_parquet(f's3://{bucket}/prod-system-*/*/parquet/*.parquet')

Using awswrangler for the above task isn’t viable. I know that boto3 doesn’t support wildcard filtering, but surely it must be doable if Dask is able to implement that functionality?
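One way the wildcard behaviour asked about above could be approximated is to list keys under the longest literal prefix of the pattern and then filter them on the client side. The sketch below is only an illustration under stated assumptions, not how Dask or awswrangler implement it: the helper names (`literal_prefix`, `key_matches`, `filter_keys`, `list_keys`) are hypothetical, and `list_keys` assumes it is handed a `boto3.client("s3")` object. Matching is done per path segment with the standard library's `fnmatch`, so `*` does not cross a `/`.

```python
from fnmatch import fnmatchcase


def literal_prefix(pattern):
    """Return the part of the pattern that precedes its first wildcard."""
    for i, ch in enumerate(pattern):
        if ch in "*?[":
            return pattern[:i]
    return pattern


def key_matches(key, pattern):
    """Compare key and pattern segment by segment so '*' never spans a '/'."""
    key_parts = key.split("/")
    pattern_parts = pattern.split("/")
    if len(key_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(k, p) for k, p in zip(key_parts, pattern_parts))


def filter_keys(keys, pattern):
    """Keep only the keys that match the wildcard pattern."""
    return [k for k in keys if key_matches(k, pattern)]


def list_keys(client, bucket, prefix):
    """Yield every key under prefix.

    Assumption: client is a boto3 S3 client; only its list_objects_v2
    paginator is used here.
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Example with keys held in memory (no S3 access needed).
pattern = "prod-system-*/*/parquet/*.parquet"
example_keys = [
    "prod-system-a/2020/parquet/part-0.parquet",
    "prod-system-a/2020/parquet/_SUCCESS",
    "prod-system-b/2020/csv/data.csv",
    "other-system/2020/parquet/part-0.parquet",
]
matched = filter_keys(example_keys, pattern)
```

With a real client, the two steps would be combined as `filter_keys(list_keys(client, bucket, literal_prefix(pattern)), pattern)`. Listing under the literal prefix narrows the request to S3 as far as the pattern allows, and the wildcard part is resolved locally.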

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 15 (12 by maintainers)

Top GitHub Comments

1 reaction
nicholas-miles commented, Jul 23, 2020

@igorborgest I’m happy to take it on! I’d love any guidance you can give on how to do this, though. I’m assuming we want to avoid additional imports/dependencies.

1 reaction
ghost commented, Jul 22, 2020

Sorry, @igorborgest. I am unassigning myself from this because my team gave me a heavy task today, so I won’t be able to contribute for the next 3-4 weeks. Please reassign it. I will pick up a new task once I am free. I apologize again for the inconvenience.

Wildcard retrieval of files · Issue #322 · aws/aws-sdk-pandas