Endpoint pod crashed during download of 10GB files
Environment info
- NooBaa Version: master-20210607
- Platform: OCP 4.6.16
Actual behavior
- Downloading 10GB files crashes the endpoint pod
Expected behavior
- Downloads of any size should not crash the endpoint pod
Steps to reproduce
- Created 10GB files in bucket-11 using setpriv
- Started the downloads, which caused an endpoint panic:
download failed: s3://bucket-11/medium_file_2 to medium_read/medium_file_2 An error occurred (503) when calling the GetObject operation (reached max retries: 2): Service Unavailable
download failed: s3://bucket-11/medium_file_3 to medium_read/medium_file_3 An error occurred (503) when calling the GetObject operation (reached max retries: 2): Service Unavailable
download failed: s3://bucket-11/medium_file_1 to medium_read/medium_file_1 An error occurred (503) when calling the GetObject operation: Service Unavailable
download failed: s3://bucket-11/medium_file_10 to medium_read/medium_file_10 An error occurred (503) when calling the GetObject operation (reached max retries: 2): Service Unavailable
urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 's3-noobaa.apps.ocp-akshat-1.cp.fyre.ibm.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
download failed: s3://bucket-11/medium_file_6 to medium_read/medium_file_6 An error occurred (503) when calling the GetObject operation: Service Unavailable
download failed: s3://bucket-11/medium_file_8 to medium_read/medium_file_8 An error occurred (503) when calling the GetObject operation: Service Unavailable
download failed: s3://bucket-11/medium_file_5 to medium_read/medium_file_5 An error occurred (503) when calling the GetObject operation: Service Unavailable
download failed: s3://bucket-11/medium_file_7 to medium_read/medium_file_7 An error occurred (503) when calling the GetObject operation: Service Unavailable
download failed: s3://bucket-11/medium_file_9 to medium_read/medium_file_9 An error occurred (503) when calling the GetObject operation: Service Unavailable
download failed: s3://bucket-11/medium_file_4 to medium_read/medium_file_4 An error occurred (503) when calling the GetObject operation: Service Unavailable
[root@hpo-node1 ~]# s3u1 ls s3://bucket-11/
urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 's3-noobaa.apps.ocp-akshat-1.cp.fyre.ibm.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
2021-06-08 22:31:19 10240000000 medium_file_1
2021-06-08 22:36:09 10240000000 medium_file_10
2021-06-08 22:32:45 10240000000 medium_file_2
2021-06-08 22:33:10 10240000000 medium_file_3
2021-06-08 22:33:36 10240000000 medium_file_4
2021-06-08 22:34:01 10240000000 medium_file_5
2021-06-08 22:34:26 10240000000 medium_file_6
2021-06-08 22:34:52 10240000000 medium_file_7
2021-06-08 22:35:18 10240000000 medium_file_8
2021-06-08 22:35:43 10240000000 medium_file_9
More information - Screenshots / Logs / Other output
Jun-9 5:46:12.721 [Endpoint/8] [L0] core.endpoint.s3.ops.s3_get_object:: request aborted: undefined
Jun-9 5:46:12.722 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_4', start: 293601280, end: 301989888 }
Jun-9 5:46:12.722 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_4', start: 301989888, end: 310378496 }
Jun-9 5:46:12.722 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_3', start: 83886080, end: 92274688 }
Jun-9 5:46:12.729 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_10', start: 33554432, end: 41943040 }
Jun-9 5:46:12.730 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_10', start: 109051904, end: 117440512 }
Jun-9 5:46:12.731 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_4', start: 318767104, end: 327155712 }
Jun-9 5:46:12.732 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_2', start: 109051904, end: 117440512 }
Jun-9 5:46:12.733 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_10', start: 0, end: 8388608 }
Jun-9 5:46:12.734 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_3', start: 0, end: 8388608 }
...
Jun-9 5:46:13.005 [Endpoint/8] [L0] core.endpoint.s3.ops.s3_get_object:: request aborted: undefined
Jun-9 5:46:13.018 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_2', start: 50331648, end: 58720256 }
Jun-9 5:46:13.018 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_4', start: 343932928, end: 352321536 }
Jun-9 5:46:13.019 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_1', start: 50331648, end: 58720256 }
Jun-9 5:46:13.019 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_10', start: 117440512, end: 125829120 }
Jun-9 5:46:13.019 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_4', start: 335544320, end: 343932928 }
Jun-9 5:46:13.020 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_4', start: 33554432, end: 41943040 }
Jun-9 5:46:13.021 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_1', start: 167772160, end: 176160768 }
Jun-9 5:46:13.022 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_4', start: 369098752, end: 377487360 }
Jun-9 5:46:13.023 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_4', start: 377487360, end: 385875968 }
Jun-9 5:46:13.099 [Endpoint/8] [LOG] CONSOLE:: NamespaceFS: read_object_stream { file_path: '/nsfs/nsfs-nsr-1/bucket-11/medium_file_10', start: 41943040, end: 50331648 }
Jun-9 5:46:13.100 [Endpoint/8] [LOG] CONSOLE:: /nsfs/nsfs-nsr-1/bucket-11/medium_file_2 { dev: 242, ino: 84978811, mode: 33206, nlink: 1, uid: 5000, gid: 800, rdev: 0, size: 10240000000, blksize: 4194304, blocks: 20000000, atimeMs: 1623217572505.5771, ctimeMs: 1623217553595.954, mtimeMs: 1623216765096.5105, birthtimeMs: 1623217553595.954, atime: 2021-06-09T05:46:12.506Z, mtime: 2021-06-09T05:32:45.097Z, ctime: 2021-06-09T05:45:53.596Z, birthtime: 2021-06-09T05:45:53.596Z }
Jun-9 5:46:13.101 [Endpoint/8] [LOG] CONSOLE:: /nsfs/nsfs-nsr-1/bucket-11/medium_file_3 { dev: 242, ino: 84978770, mode: 33206, nlink: 1, uid: 5000, gid: 800, rdev: 0, size: 10240000000, blksize: 4194304, blocks: 20000000, atimeMs: 1623217564642.244, ctimeMs: 1623217553596.276, mtimeMs: 1623216790533.7551, birthtimeMs: 1623217553596.276, atime: 2021-06-09T05:46:04.642Z, mtime: 2021-06-09T05:33:10.534Z, ctime: 2021-06-09T05:45:53.596Z, birthtime: 2021-06-09T05:45:53.596Z }
Jun-9 5:46:13.101 [Endpoint/8] [LOG] CONSOLE:: /nsfs/nsfs-nsr-1/bucket-11/medium_file_4 { dev: 242, ino: 84978720, mode: 33206, nlink: 1, uid: 5000, gid: 800, rdev: 0, size: 10240000000, blksize: 4194304, blocks: 20000000, atimeMs: 1623217562935.9092, ctimeMs: 1623217553595.932, mtimeMs: 1623216816087.8848, birthtimeMs: 1623217553595.932, atime: 2021-06-09T05:46:02.936Z, mtime: 2021-06-09T05:33:36.088Z, ctime: 2021-06-09T05:45:53.596Z, birthtime: 2021-06-09T05:45:53.596Z }
Jun-9 5:46:13.101 [Endpoint/8] [LOG] CONSOLE:: /nsfs/nsfs-nsr-1/bucket-11/medium_file_4 { dev: 242, ino: 84978720, mode: 33206, nlink: 1, uid: 5000, gid: 800, rdev: 0, size: 10240000000, blksize: 4194304, blocks: 20000000, atimeMs: 1623217562935.9092, ctimeMs: 1623217553595.932, mtimeMs: 1623216816087.8848, birthtimeMs: 1623217553595.932, atime: 2021-06-09T05:46:02.936Z, mtime: 2021-06-09T05:33:36.088Z, ctime: 2021-06-09T05:45:53.596Z, birthtime: 2021-06-09T05:45:53.596Z }
Jun-9 5:46:13.101 [Endpoint/8] [LOG] CONSOLE:: /nsfs/nsfs-nsr-1/bucket-11/medium_file_4 { dev: 242, ino: 84978720, mode: 33206, nlink: 1, uid: 5000, gid: 800, rdev: 0, size: 10240000000, blksize: 4194304, blocks: 20000000, atimeMs: 1623217562935.9092, ctimeMs: 1623217553595.932, mtimeMs: 1623216816087.8848, birthtimeMs: 1623217553595.932, atime: 2021-06-09T05:46:02.936Z, mtime: 2021-06-09T05:33:36.088Z, ctime: 2021-06-09T05:45:53.596Z, birthtime: 2021-06-09T05:45:53.596Z }
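The read_object_stream lines above show each GetObject being served in fixed-size slices; every logged start/end offset is a multiple of 8388608 bytes (8 MiB). That chunk size is inferred from the logs, not taken from NooBaa source. A minimal sketch of how such ranges could be produced:

```javascript
// Chunk size inferred from the logged offsets (8388608 = 8 MiB).
const CHUNK = 8 * 1024 * 1024;

// Yield { start, end } slices covering an object of the given size,
// matching the pattern seen in the read_object_stream log lines.
function* chunk_ranges(size) {
    for (let start = 0; start < size; start += CHUNK) {
        yield { start, end: Math.min(start + CHUNK, size) };
    }
}

// First slice of a 10240000000-byte object (the file size in the logs):
const it = chunk_ranges(10240000000);
console.log(it.next().value); // { start: 0, end: 8388608 }
```

As a sanity check, logged offsets like 293601280 are exact multiples of 8388608, consistent with this slicing.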
PANIC: FS::FileWrap::dtor: file not closed _path=/nsfs/nsfs-nsr-1/bucket-11/medium_file_4 _fd=38 Success (0) ~FileWrap() at ../src/native/fs/fs_napi.cpp:603
/noobaa_init_files/noobaa_init.sh: line 74: 8 Aborted $*
######################################################################
Wed Jun 9 05:46:13 UTC 2021 NooBaa: Process exited RIP (RC=134)
######################################################################

@liranmauda, I would attempt this post fix for 6637 to confirm.
@akmithal, as we discussed, duping this into #6624.