413 Client Error: Payload Too Large when using upload_folder on a lot of files
### Describe the bug
When trying to commit a folder with many CSV files, I got the following error:
HTTPError: 413 Client Error: Payload Too Large for url: https://huggingface.co/api/datasets/nateraw/test-upload-folder-bug/preupload/main
I assume there is a limit on the total payload size when uploading a folder, and that I am going over it here. I confirmed it has nothing to do with the number of files, but rather with the total size of the files being uploaded. In the short term, it would be great to clearly document this limit in the `upload_folder` docs.
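Until then, one possible workaround (a sketch, not an official usage pattern: the `batch_size=50` below is an arbitrary guess, since the actual limit is undocumented) is to split the folder into several smaller commits with `create_commit` and `CommitOperationAdd`, so that no single preupload request has to describe the whole folder:

```python
from pathlib import Path
from huggingface_hub import HfApi, CommitOperationAdd

api = HfApi()

def upload_folder_in_batches(folder_path, repo_id, repo_type='dataset', batch_size=50):
    """Upload a folder as several smaller commits.

    batch_size=50 is an arbitrary guess: the actual payload limit is
    undocumented, so tune it to whatever stays under the 413 threshold.
    """
    files = sorted(p for p in Path(folder_path).rglob('*') if p.is_file())
    for start in range(0, len(files), batch_size):
        batch = files[start:start + batch_size]
        operations = [
            CommitOperationAdd(
                path_in_repo=p.relative_to(folder_path).as_posix(),
                path_or_fileobj=str(p),
            )
            for p in batch
        ]
        api.create_commit(
            repo_id=repo_id,
            repo_type=repo_type,
            operations=operations,
            commit_message=f'Upload batch {start // batch_size + 1}',
        )

# e.g. upload_folder_in_batches('./data', 'nateraw/test-upload-folder-bug')
```

The trade-off is that the folder lands as several commits rather than one.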
### Reproduction
The following fails on the last line. I wrote it so you can run it yourself without updating the repo ID or anything, so if you're logged in, the snippet below should work (assuming you have torchvision installed).
```python
import os

from torchvision.datasets.utils import download_and_extract_archive
from huggingface_hub import upload_folder, whoami, create_repo

# Create a throwaway dataset repo under your own namespace
user = whoami()['name']
repo_id = f'{user}/test-upload-folder-bug'
create_repo(repo_id, exist_ok=True, repo_type='dataset')

# Download and extract a dataset containing many CSV files
os.mkdir('./data')
download_and_extract_archive(
    url='https://zenodo.org/api/files/f7f7377b-8405-4d4f-b814-f021df5593b1/hyperbard_data.zip',
    download_root='./data',
    remove_finished=True,
)

# Fails with HTTPError: 413 Client Error: Payload Too Large
upload_folder(
    folder_path='./data',
    path_in_repo="",
    repo_id=repo_id,
    repo_type='dataset',
)
```
### Logs
```
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-2-91516b1ea47f> in <module>()
18 path_in_repo="",
19 repo_id=repo_id,
---> 20 repo_type='dataset'
21 )
3 frames
/usr/local/lib/python3.7/dist-packages/huggingface_hub/hf_api.py in upload_folder(self, repo_id, folder_path, path_in_repo, commit_message, commit_description, token, repo_type, revision, create_pr)
2115 token=token,
2116 revision=revision,
-> 2117 create_pr=create_pr,
2118 )
2119
/usr/local/lib/python3.7/dist-packages/huggingface_hub/hf_api.py in create_commit(self, repo_id, operations, commit_message, commit_description, token, repo_type, revision, create_pr, num_threads)
1813 token=token,
1814 revision=revision,
-> 1815 endpoint=self.endpoint,
1816 )
1817 upload_lfs_files(
/usr/local/lib/python3.7/dist-packages/huggingface_hub/_commit_api.py in fetch_upload_modes(additions, repo_type, repo_id, token, revision, endpoint)
380 headers=headers,
381 )
--> 382 resp.raise_for_status()
383
384 preupload_info = validate_preupload_info(resp.json())
/usr/local/lib/python3.7/dist-packages/requests/models.py in raise_for_status(self)
939
940 if http_error_msg:
--> 941 raise HTTPError(http_error_msg, response=self)
942
943 def close(self):
HTTPError: 413 Client Error: Payload Too Large for url: https://huggingface.co/api/datasets/nateraw/test-upload-folder-bug/preupload/main
```
### System Info

```shell
Colab
```
### Top GitHub Comments
It should be soon!! cc @Wauplin
@fcakyon or use

```shell
pip install huggingface_hub==0.11.0rc0
```

which is about to be publicly released and will be a more robust, future-proof fix 😃
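If you want to verify that the installed version includes that release candidate before retrying the upload, a minimal check (assuming the `packaging` package is available in your environment):

```python
import huggingface_hub
from packaging import version  # assumed available; used only for the comparison

installed = version.parse(huggingface_hub.__version__)
required = version.parse("0.11.0rc0")
if installed < required:
    raise RuntimeError(
        f"huggingface_hub=={installed} predates the fix; "
        "upgrade with `pip install huggingface_hub==0.11.0rc0`"
    )
```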