Concurrent IO upload failed: invalid XML received
Environment info
- NooBaa Version: 5.9.0 (downstream ODF 4.9 RC build)
- Platform: OpenShift 4.9.0 (Kubernetes v1.22.0-rc.0+894a78b)

oc version
Client Version: 4.9.0
Server Version: 4.9.0
Kubernetes Version: v1.22.0-rc.0+894a78b

noobaa status
INFO[0000] CLI version: 5.9.0
INFO[0000] noobaa-image: quay.io/rhceph-dev/mcg-core@sha256:6ce2ddee7aff6a0e768fce523a77c998e1e48e25d227f93843d195d65ebb81b9
INFO[0000] operator-image: quay.io/rhceph-dev/mcg-operator@sha256:cc293c7fe0fdfe3812f9d1af30b6f9c59e97d00c4727c4463a5b9d3429f4278e
INFO[0000] noobaa-db-image: registry.redhat.io/rhel8/postgresql-12@sha256:b3e5b7bc6acd6422f928242d026171bcbed40ab644a2524c84e8ccb4b1ac48ff
INFO[0000] Namespace: openshift-storage
Actual behavior
Concurrent 30G uploads fail with "invalid XML received" and an InternalError response while the noobaa-endpoint deployment is being scaled down.
Expected behavior
The uploads should complete successfully; IO should continue without disruption while the endpoints are scaled down.
Steps to reproduce
Copy a file (dd_file_30G) concurrently from the INF node to the 3 user buckets; the uploads ran into the error shown below.
Note: MetalLB is configured
AWS_ACCESS_KEY_ID=7JKcByiKGPoEiZ31T9JL AWS_SECRET_ACCESS_KEY=P1prfehwoxOEIpZoSv3qJm6pUXe5MwS24gOJ2uDo aws --endpoint https://10.21.30.46:443 --no-verify-ssl s3 cp /root/dd_file_30G s3://newbucket-11k001-ha &
AWS_ACCESS_KEY_ID=EljFZq2k2yMqqcrBLghp AWS_SECRET_ACCESS_KEY=AxmUJQqnRDCe1YqSkaoTC7EDRH2U0zD2tzRaPYP/ aws --endpoint https://10.21.30.47:443 --no-verify-ssl s3 cp /root/dd_file_30G s3://newbucket-11k002-ha &
AWS_ACCESS_KEY_ID=FrFX7OsjtxL1MLDiAw6i AWS_SECRET_ACCESS_KEY=n4bB5QU6fSnRvbHAkq4vFtxiS6gpQjqPQu2BPTbk aws --endpoint https://10.21.30.48:443 --no-verify-ssl s3 cp /root/dd_file_30G s3://newbucket-11k000-ha &
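For completeness, the test object appears to be a dd-generated 30G file; a minimal way to create one (block size and count are assumptions, only the total size matters):

# create the ~30G test file used in the uploads above
dd if=/dev/urandom of=/root/dd_file_30G bs=1M count=30720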
Three IP addresses are assigned to the 3 worker nodes. The noobaa-endpoint pods (9 in total, three per node) were scaled down to 6 while the 30G file was being copied.
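For reference, a scale-down like the one issued during this run would presumably go through the endpoint counts in the NooBaa CR; a sketch (field names per the noobaa-operator CRD, adjust if your CR differs):

# reduce the endpoint deployment from 9 to 6 pods via the NooBaa CR
oc -n openshift-storage patch noobaa noobaa --type merge \
  -p '{"spec":{"endpoints":{"minCount":6,"maxCount":6}}}'
# watch the endpoint pod count settle at 6
oc -n openshift-storage get pods | grep noobaa-endpoint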
The complete log file of the IO run is attached, along with the noobaa admin logs.
upload failed: ../../dd_file_30G to s3://newbucket-11k001-ha/dd_file_30G Unable to parse response (XML or text declaration not at start of entity: line 1, column 39), invalid XML received. Further retries may succeed:
b'<?xml version="1.0" encoding="UTF-8"?> <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/newbucket-11k001-ha/dd_file_30G?uploadId=82f5bcce-1aee-4b4c-97ef-1890ae6263e9</Resource><RequestId>kwm6c96w-cm4cl6-rir</RequestId></Error>'
urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host '10.21.30.47'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
(warning repeated for each subsequent request)
upload failed: ../../dd_file_30G to s3://newbucket-11k002-ha/dd_file_30G Unable to parse response (XML or text declaration not at start of entity: line 1, column 39), invalid XML received. Further retries may succeed:
b'<?xml version="1.0" encoding="UTF-8"?> <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/newbucket-11k002-ha/dd_file_30G?uploadId=1fe614f8-260f-4bff-b402-3d6dc4e737fc</Resource><RequestId>kwm6c9oe-6kxpcr-hoi</RequestId></Error>'
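The response body contains two XML declarations back to back, which is what the client's XML parser rejects ("XML or text declaration not at start of entity: line 1, column 39"). A quick way to confirm this, assuming xmllint is available:

# feed the doubled-declaration body to a strict XML parser
printf '%s' '<?xml version="1.0" encoding="UTF-8"?> <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code></Error>' | xmllint --noout -
# xmllint flags the second declaration; with it removed, the document parses cleanly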
noobaa-endpoint df -h output is shown below. The endpoints are down to a total of 6 from 9 after the scale-down was issued; however, that should not matter, as IO should continue without any disruption.
sh-4.4# df -h
Filesystem Size Used Avail Use% Mounted on
overlay 250G 25G 226G 10% /
tmpfs 64M 0 64M 0% /dev
tmpfs 16G 0 16G 0% /sys/fs/cgroup
shm 64M 0 64M 0% /dev/shm
tmpfs 16G 64M 16G 1% /etc/hostname
tmpfs 3.0G 8.0K 3.0G 1% /etc/mgmt-secret
tmpfs 3.0G 8.0K 3.0G 1% /etc/s3-secret
remote-sample 200G 93G 108G 47% /nsfs/noobaa-s3res-4080029599
/dev/vda4 250G 25G 226G 10% /etc/hosts
tmpfs 3.0G 20K 3.0G 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs 16G 0 16G 0% /proc/acpi
tmpfs 16G 0 16G 0% /proc/scsi
tmpfs 16G 0 16G 0% /sys/firmware
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/ibm-spectrum-scale-das-ip-worker0-hpo-cp-fyre-ibm-com LoadBalancer 172.30.215.25 10.21.30.46 80:32115/TCP,443:30016/TCP,8444:31512/TCP,7004:32279/TCP 4d7h
service/ibm-spectrum-scale-das-ip-worker1-hpo-cp-fyre-ibm-com LoadBalancer 172.30.38.191 10.21.30.47 80:32492/TCP,443:30056/TCP,8444:30247/TCP,7004:31515/TCP 4d7h
service/ibm-spectrum-scale-das-ip-worker2-hpo-cp-fyre-ibm-com LoadBalancer 172.30.220.45 10.21.30.48 80:30798/TCP,443:31452/TCP,8444:30929/TCP,7004:30096/TCP 4d7h
service/noobaa-db-pg ClusterIP 172.30.213.87 <none> 5432/TCP 4d7h
service/noobaa-mgmt LoadBalancer 172.30.10.150 <pending> 80:31641/TCP,443:32499/TCP,8445:32647/TCP,8446:31382/TCP 4d7h
service/odf-console-service ClusterIP 172.30.231.250 <none> 9001/TCP 4d8h
service/odf-operator-controller-manager-metrics-service ClusterIP 172.30.167.200 <none> 8443/TCP 4d8h
service/s3 LoadBalancer 172.30.33.11 <pending> 80:31322/TCP,443:32747/TCP,8444:31794/TCP,7004:30363/TCP 4d7h
[root@api.hpo.cp.fyre.ibm.com inf-s3]# oc get pods -n openshift-storage -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
noobaa-core-0 1/1 Running 0 4d7h 10.254.19.106 worker1.hpo.cp.fyre.ibm.com <none> <none>
noobaa-db-pg-0 1/1 Running 0 4d7h 10.254.13.74 worker0.hpo.cp.fyre.ibm.com <none> <none>
noobaa-default-backing-store-noobaa-pod-2cd6196a 1/1 Running 0 4d7h 10.254.19.109 worker1.hpo.cp.fyre.ibm.com <none> <none>
noobaa-endpoint-6bf8f4bb8f-95vhw 1/1 Running 0 106m 10.254.22.144 worker2.hpo.cp.fyre.ibm.com <none> <none>
noobaa-endpoint-6bf8f4bb8f-bc4sf 1/1 Running 0 4d7h 10.254.19.110 worker1.hpo.cp.fyre.ibm.com <none> <none>
noobaa-endpoint-6bf8f4bb8f-hpcqx 1/1 Running 0 106m 10.254.19.50 worker1.hpo.cp.fyre.ibm.com <none> <none>
noobaa-endpoint-6bf8f4bb8f-qdq7z 1/1 Running 0 4d7h 10.254.13.76 worker0.hpo.cp.fyre.ibm.com <none> <none>
noobaa-endpoint-6bf8f4bb8f-vdssm 1/1 Running 0 4d7h 10.254.20.205 worker2.hpo.cp.fyre.ibm.com <none> <none>
noobaa-endpoint-6bf8f4bb8f-vpb7c 1/1 Running 0 106m 10.254.13.136 worker0.hpo.cp.fyre.ibm.com <none> <none>
noobaa-operator-7868fc9bcb-kgh2f 1/1 Running 26 (74m ago) 4d8h 10.254.19.87 worker1.hpo.cp.fyre.ibm.com <none> <none>
ocs-metrics-exporter-849cc696d7-rftz6 1/1 Running 0 4d8h 10.254.13.60 worker0.hpo.cp.fyre.ibm.com <none> <none>
ocs-operator-58c5d98867-szfdg 1/1 Running 30 (3h14m ago) 4d8h 10.254.13.59 worker0.hpo.cp.fyre.ibm.com <none> <none>
odf-console-9b698b47-h7ncb 1/1 Running 0 4d8h 10.254.19.89 worker1.hpo.cp.fyre.ibm.com <none> <none>
odf-operator-controller-manager-6cb768f45b-wskpc 2/2 Running 33 (3h14m ago) 4d8h 10.254.20.198 worker2.hpo.cp.fyre.ibm.com <none> <none>
rook-ceph-operator-6ff88bd68d-4x2s4 1/1 Running 0 4d8h 10.254.19.88 worker1.hpo.cp.fyre.ibm.com <none> <none>
endpoint logs snippet …
2021-11-30 14:04:24.583405 [PID-14/TID-21] [L0] FS::FSWorker::Execute: WARNING FileFsync _wrap->_path=/nsfs/noobaa-s3res-4080029599/newbucket-11k002-ha/.noobaa-nsfs_61a5c9f04145e6002afe7ca2/multipart-uploads/1fe614f8-260f-4bff-b402-3d6dc4e737fc/part-2365 took too long: 100.431 ms
2021-11-30 14:04:30.868407 [PID-14/TID-21] [L0] FS::FSWorker::Execute: WARNING FileFsync _wrap->_path=/nsfs/noobaa-s3res-4080029599/newbucket-11k002-ha/.noobaa-nsfs_61a5c9f04145e6002afe7ca2/multipart-uploads/1fe614f8-260f-4bff-b402-3d6dc4e737fc/part-2423 took too long: 117.055 ms
2021-11-30 14:05:57.992646 [PID-14/TID-21] [L0] FS::FSWorker::Execute: WARNING FileFsync _wrap->_path=/nsfs/noobaa-s3res-4080029599/newbucket-11k002-ha/.noobaa-nsfs_61a5c9f04145e6002afe7ca2/multipart-uploads/1fe614f8-260f-4bff-b402-3d6dc4e737fc/part-3230 took too long: 107.036 ms
2021-11-30 14:06:19.917411 [PID-14/TID-24] [L0] FS::FSWorker::Execute: WARNING FileFsync _wrap->_path=/nsfs/noobaa-s3res-4080029599/newbucket-11k002-ha/.noobaa-nsfs_61a5c9f04145e6002afe7ca2/multipart-uploads/1fe614f8-260f-4bff-b402-3d6dc4e737fc/part-3432 took too long: 134.502 ms
grep "1fe614f8-260f-4bff-b402-3d6dc4e737fc" /tmp/noobaa-endpoint-worker1.log | wc -l
22
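Since more than one endpoint pod may have served parts of this multipart upload, it can also help to grep the same uploadId across all endpoint pods; a rough sketch, selecting the pods by name prefix as listed above:

# count references to the failed uploadId in every endpoint pod's log
for pod in $(oc -n openshift-storage get pods -o name | grep noobaa-endpoint); do
  echo "== $pod =="
  oc -n openshift-storage logs "$pod" | grep -c "1fe614f8-260f-4bff-b402-3d6dc4e737fc"
done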
More information - Screenshots / Logs / Other output:
noobaa-endpoint-worker1.log
must-gather-ioerror.gz
Comments: 6 (1 by maintainers); created 2 years ago

Thank you for trying, Jenia… The sleep 600 was somewhat arbitrary; it is the maximum time the upload should need to complete while running. I understand that because the upload fails, the subsequent delete gives the NO SUCH KEY message, which is probably what you are referring to.
@rkomandu Attempted this yesterday and everything worked on my end. I used smaller files because I did not have much space to work with.
Also, I saw that your bash script relied on arbitrary sleeps, which is not a valid way to synchronize. I changed the script to run the uploads and deletes in parallel and to actually wait for completion between the stages, instead of sleeping and hoping that they finish within a certain time.
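A minimal sketch of that pattern, using the bucket names and file from this issue (credentials omitted; in the original commands each bucket uses its own keys and endpoint address):

FILE=/root/dd_file_30G
# start all uploads in the background, then block until every one has finished
for b in newbucket-11k000-ha newbucket-11k001-ha newbucket-11k002-ha; do
  aws --endpoint https://10.21.30.46:443 --no-verify-ssl s3 cp "$FILE" "s3://$b/" &
done
wait   # returns only when all uploads have completed (or failed)

# same for the deletes: no fixed sleep, just wait for the background jobs
for b in newbucket-11k000-ha newbucket-11k001-ha newbucket-11k002-ha; do
  aws --endpoint https://10.21.30.46:443 --no-verify-ssl s3 rm "s3://$b/dd_file_30G" &
done
wait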