
Concurrent IO upload failed invalid XML received


Environment info

  • NooBaa Version: 5.9.0 (Downstream - ODF 4.9 RC build)
  • Platform: OpenShift 4.9 (Kubernetes v1.22.0-rc.0)

oc version
Client Version: 4.9.0
Server Version: 4.9.0
Kubernetes Version: v1.22.0-rc.0+894a78b

Noobaa version (Downstream - ODF 4.9 RC build)

noobaa status
INFO[0000] CLI version: 5.9.0
INFO[0000] noobaa-image: quay.io/rhceph-dev/mcg-core@sha256:6ce2ddee7aff6a0e768fce523a77c998e1e48e25d227f93843d195d65ebb81b9
INFO[0000] operator-image: quay.io/rhceph-dev/mcg-operator@sha256:cc293c7fe0fdfe3812f9d1af30b6f9c59e97d00c4727c4463a5b9d3429f4278e
INFO[0000] noobaa-db-image: registry.redhat.io/rhel8/postgresql-12@sha256:b3e5b7bc6acd6422f928242d026171bcbed40ab644a2524c84e8ccb4b1ac48ff
INFO[0000] Namespace: openshift-storage

Actual behavior

Expected behavior

Steps to reproduce

Copying a file (dd_file_30G) concurrently from the INF node to the 3 user buckets ran into the error shown below.

Note: MetalLB is configured

AWS_ACCESS_KEY_ID=7JKcByiKGPoEiZ31T9JL AWS_SECRET_ACCESS_KEY=P1prfehwoxOEIpZoSv3qJm6pUXe5MwS24gOJ2uDo aws --endpoint https://10.21.30.46:443 --no-verify-ssl s3 cp /root/dd_file_30G s3://newbucket-11k001-ha &
AWS_ACCESS_KEY_ID=EljFZq2k2yMqqcrBLghp AWS_SECRET_ACCESS_KEY=AxmUJQqnRDCe1YqSkaoTC7EDRH2U0zD2tzRaPYP/ aws --endpoint https://10.21.30.47:443 --no-verify-ssl s3 cp /root/dd_file_30G s3://newbucket-11k002-ha &
AWS_ACCESS_KEY_ID=FrFX7OsjtxL1MLDiAw6i AWS_SECRET_ACCESS_KEY=n4bB5QU6fSnRvbHAkq4vFtxiS6gpQjqPQu2BPTbk aws --endpoint https://10.21.30.48:443 --no-verify-ssl s3 cp /root/dd_file_30G s3://newbucket-11k000-ha &

Three IP addresses are assigned to the 3 worker nodes. There were initially 9 noobaa-endpoint pods (three per node); they were scaled down to 6 while the 30G file was being copied.

The complete log file of the IO is attached, and the noobaa adm logs are attached as well.


upload failed: ../../dd_file_30G to s3://newbucket-11k001-ha/dd_file_30G Unable to parse response (XML or text declaration not at start of entity: line 1, column 39), invalid XML received. Further retries may succeed:
b'<?xml version="1.0" encoding="UTF-8"?> <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/newbucket-11k001-ha/dd_file_30G?uploadId=82f5bcce-1aee-4b4c-97ef-1890ae6263e9</Resource><RequestId>kwm6c96w-cm4cl6-rir</RequestId></Error>'
urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host '10.21.30.47'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
upload failed: ../../dd_file_30G to s3://newbucket-11k002-ha/dd_file_30G Unable to parse response (XML or text declaration not at start of entity: line 1, column 39), invalid XML received. Further retries may succeed:
b'<?xml version="1.0" encoding="UTF-8"?> <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/newbucket-11k002-ha/dd_file_30G?uploadId=1fe614f8-260f-4bff-b402-3d6dc4e737fc</Resource><RequestId>kwm6c9oe-6kxpcr-hoi</RequestId></Error>'
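The parse failure is consistent with the response bodies above: each one contains two XML declarations back to back, and an XML declaration is only legal at the very start of a document. A minimal sketch reproducing the client-side error with Python's standard-library parser (the shortened Error body here is a stand-in for the full response):

```python
import xml.etree.ElementTree as ET

# Two XML declarations back to back, as in the captured response body.
body = (b'<?xml version="1.0" encoding="UTF-8"?> '
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<Error><Code>InternalError</Code></Error>')

try:
    ET.fromstring(body)
except ET.ParseError as err:
    # Same class of error the AWS CLI reports:
    # "XML or text declaration not at start of entity"
    print(err)
```

This suggests the endpoint emitted a second complete error document after having already started a response, which is why the CLI notes "Further retries may succeed".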

df -h output from a noobaa-endpoint pod is shown below. The endpoints are down to 6 in total from 9, as we issued a scale-down… However, that shouldn’t matter, as IO should continue without any disruption.


sh-4.4# df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay         250G   25G  226G  10% /
tmpfs            64M     0   64M   0% /dev
tmpfs            16G     0   16G   0% /sys/fs/cgroup
shm              64M     0   64M   0% /dev/shm
tmpfs            16G   64M   16G   1% /etc/hostname
tmpfs           3.0G  8.0K  3.0G   1% /etc/mgmt-secret
tmpfs           3.0G  8.0K  3.0G   1% /etc/s3-secret
remote-sample   200G   93G  108G  47% /nsfs/noobaa-s3res-4080029599
/dev/vda4       250G   25G  226G  10% /etc/hosts
tmpfs           3.0G   20K  3.0G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs            16G     0   16G   0% /proc/acpi
tmpfs            16G     0   16G   0% /proc/scsi
tmpfs            16G     0   16G   0% /sys/firmware

NAME                                                            TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                    AGE
service/ibm-spectrum-scale-das-ip-worker0-hpo-cp-fyre-ibm-com   LoadBalancer   172.30.215.25    10.21.30.46   80:32115/TCP,443:30016/TCP,8444:31512/TCP,7004:32279/TCP   4d7h
service/ibm-spectrum-scale-das-ip-worker1-hpo-cp-fyre-ibm-com   LoadBalancer   172.30.38.191    10.21.30.47   80:32492/TCP,443:30056/TCP,8444:30247/TCP,7004:31515/TCP   4d7h
service/ibm-spectrum-scale-das-ip-worker2-hpo-cp-fyre-ibm-com   LoadBalancer   172.30.220.45    10.21.30.48   80:30798/TCP,443:31452/TCP,8444:30929/TCP,7004:30096/TCP   4d7h
service/noobaa-db-pg                                            ClusterIP      172.30.213.87    <none>        5432/TCP                                                   4d7h
service/noobaa-mgmt                                             LoadBalancer   172.30.10.150    <pending>     80:31641/TCP,443:32499/TCP,8445:32647/TCP,8446:31382/TCP   4d7h
service/odf-console-service                                     ClusterIP      172.30.231.250   <none>        9001/TCP                                                   4d8h
service/odf-operator-controller-manager-metrics-service         ClusterIP      172.30.167.200   <none>        8443/TCP                                                   4d8h
service/s3                                                      LoadBalancer   172.30.33.11     <pending>     80:31322/TCP,443:32747/TCP,8444:31794/TCP,7004:30363/TCP   4d7h


[root@api.hpo.cp.fyre.ibm.com inf-s3]# oc get pods -n openshift-storage -o wide
NAME                                               READY   STATUS    RESTARTS         AGE    IP              NODE                          NOMINATED NODE   READINESS GATES
noobaa-core-0                                      1/1     Running   0                4d7h   10.254.19.106   worker1.hpo.cp.fyre.ibm.com   <none>           <none>
noobaa-db-pg-0                                     1/1     Running   0                4d7h   10.254.13.74    worker0.hpo.cp.fyre.ibm.com   <none>           <none>
noobaa-default-backing-store-noobaa-pod-2cd6196a   1/1     Running   0                4d7h   10.254.19.109   worker1.hpo.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-6bf8f4bb8f-95vhw                   1/1     Running   0                106m   10.254.22.144   worker2.hpo.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-6bf8f4bb8f-bc4sf                   1/1     Running   0                4d7h   10.254.19.110   worker1.hpo.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-6bf8f4bb8f-hpcqx                   1/1     Running   0                106m   10.254.19.50    worker1.hpo.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-6bf8f4bb8f-qdq7z                   1/1     Running   0                4d7h   10.254.13.76    worker0.hpo.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-6bf8f4bb8f-vdssm                   1/1     Running   0                4d7h   10.254.20.205   worker2.hpo.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-6bf8f4bb8f-vpb7c                   1/1     Running   0                106m   10.254.13.136   worker0.hpo.cp.fyre.ibm.com   <none>           <none>
noobaa-operator-7868fc9bcb-kgh2f                   1/1     Running   26 (74m ago)     4d8h   10.254.19.87    worker1.hpo.cp.fyre.ibm.com   <none>           <none>
ocs-metrics-exporter-849cc696d7-rftz6              1/1     Running   0                4d8h   10.254.13.60    worker0.hpo.cp.fyre.ibm.com   <none>           <none>
ocs-operator-58c5d98867-szfdg                      1/1     Running   30 (3h14m ago)   4d8h   10.254.13.59    worker0.hpo.cp.fyre.ibm.com   <none>           <none>
odf-console-9b698b47-h7ncb                         1/1     Running   0                4d8h   10.254.19.89    worker1.hpo.cp.fyre.ibm.com   <none>           <none>
odf-operator-controller-manager-6cb768f45b-wskpc   2/2     Running   33 (3h14m ago)   4d8h   10.254.20.198   worker2.hpo.cp.fyre.ibm.com   <none>           <none>
rook-ceph-operator-6ff88bd68d-4x2s4                1/1     Running   0                4d8h   10.254.19.88    worker1.hpo.cp.fyre.ibm.com   <none>           <none>

endpoint logs snippet …

2021-11-30 14:04:24.583405 [PID-14/TID-21] [L0] FS::FSWorker::Execute: WARNING FileFsync _wrap->_path=/nsfs/noobaa-s3res-4080029599/newbucket-11k002-ha/.noobaa-nsfs_61a5c9f04145e6002afe7ca2/multipart-uploads/1fe614f8-260f-4bff-b402-3d6dc4e737fc/part-2365 took too long: 100.431 ms
2021-11-30 14:04:30.868407 [PID-14/TID-21] [L0] FS::FSWorker::Execute: WARNING FileFsync _wrap->_path=/nsfs/noobaa-s3res-4080029599/newbucket-11k002-ha/.noobaa-nsfs_61a5c9f04145e6002afe7ca2/multipart-uploads/1fe614f8-260f-4bff-b402-3d6dc4e737fc/part-2423 took too long: 117.055 ms
2021-11-30 14:05:57.992646 [PID-14/TID-21] [L0] FS::FSWorker::Execute: WARNING FileFsync _wrap->_path=/nsfs/noobaa-s3res-4080029599/newbucket-11k002-ha/.noobaa-nsfs_61a5c9f04145e6002afe7ca2/multipart-uploads/1fe614f8-260f-4bff-b402-3d6dc4e737fc/part-3230 took too long: 107.036 ms
2021-11-30 14:06:19.917411 [PID-14/TID-24] [L0] FS::FSWorker::Execute: WARNING FileFsync _wrap->_path=/nsfs/noobaa-s3res-4080029599/newbucket-11k002-ha/.noobaa-nsfs_61a5c9f04145e6002afe7ca2/multipart-uploads/1fe614f8-260f-4bff-b402-3d6dc4e737fc/part-3432 took too long: 134.502 ms

grep "1fe614f8-260f-4bff-b402-3d6dc4e737fc" /tmp/noobaa-endpoint-worker1.log | wc -l
22

More information - Screenshots / Logs / Other output

noobaa-endpoint-worker1.log
must-gather-ioerror.gz

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
rkomandu commented, Dec 8, 2021

Thank you for trying, Jenia… The sleep 600 was posted somewhat randomly; it is the maximum time in which the upload would be able to complete while running. I understand that, because the upload fails, it would give a NO SUCH KEY message when it gets to the delete, which is what you might be referring to.

0 reactions
jeniawhite commented, Dec 8, 2021

@rkomandu I attempted this yesterday and everything worked on my end. I used smaller files because I did not have much space to work with.

upload: ./temp_5GB_file to s3://jenia1/temp_5GB_file
upload: ./temp_5GB_file to s3://jenia3/temp_5GB_file              
upload: ./temp_5GB_file to s3://jenia2/temp_5GB_file              
Run deletes 47
delete: s3://jenia3/temp_5GB_file
delete: s3://jenia1/temp_5GB_file
delete: s3://jenia2/temp_5GB_file
Done loop 47
Run uploads 48
upload: ./temp_5GB_file to s3://jenia1/temp_5GB_file
upload: ./temp_5GB_file to s3://jenia2/temp_5GB_file              
upload: ./temp_5GB_file to s3://jenia3/temp_5GB_file              
Run deletes 48
delete: s3://jenia2/temp_5GB_file
delete: s3://jenia3/temp_5GB_file
delete: s3://jenia1/temp_5GB_file
Done loop 48
Run uploads 49
upload: ./temp_5GB_file to s3://jenia3/temp_5GB_file
upload: ./temp_5GB_file to s3://jenia1/temp_5GB_file              
upload: ./temp_5GB_file to s3://jenia2/temp_5GB_file              
Run deletes 49
delete: s3://jenia2/temp_5GB_file
delete: s3://jenia3/temp_5GB_file
delete: s3://jenia1/temp_5GB_file
Done loop 49
Run uploads 50
upload: ./temp_5GB_file to s3://jenia3/temp_5GB_file
upload: ./temp_5GB_file to s3://jenia1/temp_5GB_file              
upload: ./temp_5GB_file to s3://jenia2/temp_5GB_file              
Run deletes 50
delete: s3://jenia1/temp_5GB_file
delete: s3://jenia2/temp_5GB_file
delete: s3://jenia3/temp_5GB_file
Done loop 50

Also, I saw that your bash script relied on fixed time sleeps, which is not valid. I’ve changed the script to do the uploads and deletes in parallel and actually wait for completions between the stages, instead of sleeping and hoping that they complete within a certain time.

#!/bin/bash
for i in {1..50}
do
    echo "Run uploads $i"
    aws --endpoint http://localhost:6001 --no-verify-ssl s3 cp /Users/jenia/Documents/GitHub/noobaa-core/temp_5GB_file s3://jenia1 &
    aws --endpoint http://localhost:6001 --no-verify-ssl s3 cp /Users/jenia/Documents/GitHub/noobaa-core/temp_5GB_file s3://jenia2 &
    aws --endpoint http://localhost:6001 --no-verify-ssl s3 cp /Users/jenia/Documents/GitHub/noobaa-core/temp_5GB_file s3://jenia3 &
    wait

    echo "Run deletes $i"
    aws --endpoint http://localhost:6001 --no-verify-ssl s3 rm s3://jenia1/temp_5GB_file &
    aws --endpoint http://localhost:6001 --no-verify-ssl s3 rm s3://jenia2/temp_5GB_file &
    aws --endpoint http://localhost:6001 --no-verify-ssl s3 rm s3://jenia3/temp_5GB_file &
    wait

    echo "Done loop $i"
done
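One detail worth noting about the loop above: a bare `wait` does not propagate the exit status of every job, so a failed upload can go unnoticed. A small sketch of one way to surface failures, where `upload_cmd` is a hypothetical stand-in for the `aws s3 cp` calls:

```shell
#!/bin/bash
# upload_cmd is a hypothetical stand-in for `aws s3 cp`; it exits with the
# status passed as its argument so the failure path can be exercised.
upload_cmd() { sleep 0.1; return "$1"; }

pids=()
for status in 0 0 1; do   # simulate two successful uploads and one failure
    upload_cmd "$status" &
    pids+=("$!")
done

failed=0
for pid in "${pids[@]}"; do
    wait "$pid" || failed=1   # `wait PID` returns that job's exit status
done
echo "failed=$failed"         # prints failed=1
```

Collecting each `$!` and waiting on the PIDs individually lets the loop stop (or at least report) on the first failed upload instead of silently continuing to the delete stage.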