ADLS Gen2 FileSystemClient.get_paths() returns only 5000 paths (1 page in PageIterator)
- Package Name: azure.storage.filedatalake
- Package Version: 12.2.2
- Operating System: Azure Databricks, Ubuntu 16.04.6 LTS
- Python Version: 3.7.3
Describe the bug
I get the FileSystemClient for a particular container in ADLS Gen2 that contains more than 5000 files and folders. When I try to retrieve all paths in this container with the get_paths() method, the returned iterator contains only 5000 PathProperties items, and calling by_page() on it yields only one page.
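To make the symptom concrete, here is a minimal, self-contained simulation of continuation-token paging. It does not use the Azure SDK and does not reproduce its internals: list_paths_page is a hypothetical stand-in for one listing call that returns at most 5000 paths (the page size seen in this issue) plus a continuation marker. A pager that never follows the marker behaves like the reported bug (5000 items, one page), while one that loops until the marker is empty returns every path.

```python
from typing import Iterator, List, Optional, Tuple

PAGE_SIZE = 5000  # per-page limit observed in the issue


def list_paths_page(all_paths: List[str], continuation: Optional[int]) -> Tuple[List[str], Optional[int]]:
    """Hypothetical stand-in for a single listing call: one page plus a continuation marker."""
    start = continuation or 0
    end = start + PAGE_SIZE
    # A marker is returned only while more paths remain.
    next_marker = end if end < len(all_paths) else None
    return all_paths[start:end], next_marker


def iter_paths_first_page_only(all_paths: List[str]) -> Iterator[str]:
    """Mimics the symptom: the continuation marker is ignored, so only page 1 is seen."""
    page, _ignored = list_paths_page(all_paths, None)
    yield from page


def iter_paths_all_pages(all_paths: List[str]) -> Iterator[str]:
    """Follows the continuation marker until the listing reports no more pages."""
    marker: Optional[int] = None
    while True:
        page, marker = list_paths_page(all_paths, marker)
        yield from page
        if marker is None:
            break
```

For 12,345 simulated paths, the first generator stops at 5000 items and the second returns all 12,345.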
To Reproduce
Steps to reproduce the behavior:
- Connect to the ADLS Gen2 storage account. In my case I used DataLakeServiceClient with the storage account key as the credential.
- Get the FileSystemClient of a container that holds more than 5000 files and folders, using the get_file_system_client() method.
- Use the get_paths() method to get the PathProperties iterator.
- Convert the iterator to a list and check the number of retrieved paths.
- Optional: check the number of pages of retrieved paths using the by_page() method.
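The reproduction steps can be sketched with the SDK roughly as follows. This assumes azure-storage-file-datalake is installed and that the account name, key and container are placeholders you supply; make_file_system_client and count_paths_and_pages are hypothetical helper names, not part of the library. get_paths() returns a pager that can be iterated item by item or, through by_page(), page by page; the sketch walks the pages once and counts both pages and items.

```python
from typing import Tuple


def make_file_system_client(account_name: str, account_key: str, container: str):
    """Hypothetical helper: build a FileSystemClient using the storage account key."""
    # Imported lazily so this module can be loaded without the SDK installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url=f"https://{account_name}.dfs.core.windows.net",
        credential=account_key,
    )
    return service.get_file_system_client(file_system=container)


def count_paths_and_pages(file_system_client) -> Tuple[int, int]:
    """Hypothetical helper: return (number of paths, number of pages) from get_paths()."""
    n_paths = 0
    n_pages = 0
    # by_page() yields one iterator of PathProperties per page.
    for page in file_system_client.get_paths().by_page():
        n_pages += 1
        n_paths += sum(1 for _ in page)
    return n_paths, n_pages
```

Usage would look like fs = make_file_system_client("<account>", "<key>", "<container>") followed by print(count_paths_and_pages(fs)). With version 12.2.2 and a container holding more than 5000 paths, the issue reports the equivalent of (5000, 1).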
Expected behavior
I expect get_paths() to return an iterator of PathProperties whose number of items matches the actual number of paths in the container (>5000).
Optional: I expect the PageIterator to contain the correct number of pages (>1) when the container holds more than 5000 paths.
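For reference, a correct pager should yield one page per 5000 paths, rounded up. The helper below (a hypothetical name) only does that arithmetic, taking 5000 as the page size observed in the issue and assuming the container holds at least one path.

```python
def expected_page_count(n_paths: int, page_size: int = 5000) -> int:
    """Pages a correct pager should yield for n_paths items (ceiling division)."""
    if n_paths < 1 or page_size < 1:
        raise ValueError("n_paths and page_size must both be positive")
    return (n_paths + page_size - 1) // page_size


print(expected_page_count(5000))   # 1
print(expected_page_count(5001))   # 2
print(expected_page_count(12345))  # 3
```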

Hi @siarblack, the fix got merged and we will be doing a patch release for this very soon. I will keep you updated.
Hi @siarblack, datalake 12.2.3 has been released!
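To confirm that an environment has picked up the patch release, a sketch like the following compares a version string against 12.2.3, the release the maintainers announced after the fix was merged. has_pagination_fix is a hypothetical helper that only handles plain numeric versions; packaging.version.Version is the more robust choice for full PEP 440 strings.

```python
import re
from typing import Tuple

FIXED_VERSION = (12, 2, 3)  # release announced by the maintainers after the fix


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part, e.g. "12.2.3" -> (12, 2, 3); suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_pagination_fix(version: str) -> bool:
    """True if the given azure-storage-file-datalake version is 12.2.3 or later."""
    return parse_version(version) >= FIXED_VERSION
```

On Python 3.8+, the installed version can be read with importlib.metadata.version("azure-storage-file-datalake"), and upgrading is a matter of pip install --upgrade "azure-storage-file-datalake>=12.2.3".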