Build: improve `upload` step
We are using django-storages to delete/upload/sync the files from the builders into S3. This is good because it uses the standard Django storages API, regardless of the backend behind the scenes.
However, the S3 API does not support "in bulk" uploads/deletions, so a lot of API requests have to be made to delete or upload a full directory. The code that does this is at https://github.com/readthedocs/readthedocs.org/blob/0a9afda92e38331d21f6915909818be0f7e74e17/readthedocs/builds/storage.py#L48-L136
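For context, the expensive pattern is roughly one API request per file. A minimal sketch of the idea (illustrative names, not the actual Read the Docs code):

```python
import os


def copy_directory(storage, source, destination):
    # Illustrative only: each save() call below becomes at least one
    # S3 API request, so a build with thousands of output files issues
    # thousands of sequential requests.
    for root, _dirs, files in os.walk(source):
        for filename in files:
            local_path = os.path.join(root, filename)
            remote_path = os.path.join(
                destination, os.path.relpath(local_path, source)
            )
            with open(local_path, "rb") as fd:
                storage.save(remote_path, fd)  # one request per file
```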
This amount of API requests makes the upload process slow, particularly when there are many files. We talked about improving this by using something like rclone (https://rclone.org/) or similar. There is a django-rclone-storage package (https://pypi.org/project/django-rclone-storage/) that we can take some inspiration from.
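As a rough illustration of the rclone approach, a single invocation can sync the whole tree, with rclone handling batching and parallelism internally. The helper and the `s3remote` remote name below are hypothetical; rclone would need to be pointed at the S3 credentials separately:

```python
import subprocess


def rclone_sync(local_dir, remote_path):
    # Hypothetical helper: one rclone invocation replaces many
    # individual per-file API calls. "s3remote" must be defined in
    # rclone.conf (or via RCLONE_CONFIG_* environment variables).
    subprocess.run(
        ["rclone", "sync", local_dir, f"s3remote:{remote_path}"],
        check=True,
    )
```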
Slightly related to: https://github.com/readthedocs/readthedocs.org/issues/9179
Noting here that we should take into account the symlink issue that we found in the last few weeks when swapping the backend. In #9800 all the logic was moved into `safe_open` (used for opening users' configuration files and also when uploading the files to the storage), which will help with this. We will need to apply the same logic [^1] immediately before uploading the files with `rclone`; a minimal sketch of that check follows below.
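A sketch of that check, assuming a Python 3.9+ environment (the helper name and `DOCROOT` value are illustrative; the real logic lives in `safe_open`):

```python
from pathlib import Path

DOCROOT = Path("/home/documentation_root")  # assumed location, illustrative


def check_suspicious_path(path):
    # Resolve symlinks and refuse any file whose real target escapes
    # the allowed root, mirroring the idea behind safe_open.
    resolved = Path(path).resolve()
    if not resolved.is_relative_to(DOCROOT.resolve()):  # Python 3.9+
        raise ValueError(f"Suspicious file outside DOCROOT: {path}")
```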
[^1]: Check that the symlink target is inside `DOCROOT` (or, more specifically, the project's path if possible).

I also assumed we were talking about this solution as the first one to try.
I’m rather hesitant to fiddle with threading in our own application implementation, or really trying to get too technically correct here. We don’t have many projects being negatively affected by this issue, so our solution should match the severity.
A drop-in replacement is a great option. Even if it's only twice as fast, that's good value for little effort.
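That is also what keeps the drop-in route cheap to try: assuming the new backend implements the standard Django storage API, swapping it is a configuration change rather than new application code (the dotted path below is hypothetical):

```python
# settings.py — hypothetical dotted path; any backend implementing the
# standard Django storage API can be swapped in like this.
DEFAULT_FILE_STORAGE = "django_rclone_storage.RcloneStorage"
```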