edgeHub crashes after changing config.yaml when configured for additional offline storage
Expected Behavior
Changing RuntimeLogLevel to “debug” in /etc/iotedge/config.yaml and restarting the iotedge runtime (systemctl restart iotedge) while the edgeHub is configured for additional storage should change the log level of the edgeAgent and edgeHub without causing the edgeHub to crash repeatedly.
Current Behavior
If I configure the edgeHub to use non-container storage (https://docs.microsoft.com/en-us/azure/iot-edge/offline-capabilities), then edit the config.yaml file (/etc/iotedge/config.yaml) to change the RuntimeLogLevel and restart the runtime with systemctl restart iotedge, the edgeHub fails to come back up completely, logging that it received 500 Internal Server Error when calling the workload API to decrypt.
Changing the config.yaml file back to the way it was before and restarting again does not resolve the issue.
Steps to Reproduce
Create & change permissions on the directory for offline/non-container edgeHub storage if you haven’t already done so
sudo mkdir /etc/iotedge/storage && sudo chown -R 1000:1000 /etc/iotedge/storage
Update your deployment.json to use this directory for storage. My systemModules section looks like the following:
"systemModules": {
  "edgeAgent": {
    "settings": {
      "image": "mcr.microsoft.com/azureiotedge-agent:1.0.6",
      "createOptions": ""
    },
    "type": "docker"
  },
  "edgeHub": {
    "settings": {
      "image": "mcr.microsoft.com/azureiotedge-hub:1.0.6",
      "createOptions": "{\"HostConfig\":{\"Binds\":[\"/etc/iotedge/storage/:/iotedge/storage/\"],\"PortBindings\":{\"8883/tcp\":[{\"HostPort\":\"8883\"}],\"443/tcp\":[{\"HostPort\":\"443\"}],\"5671/tcp\":[{\"HostPort\":\"5671\"}]}}}"
    },
    "type": "docker",
    "env": {
      "storageFolder": {
        "value": "/iotedge/storage/"
      }
    },
    "status": "running",
    "restartPolicy": "always"
  }
}
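The createOptions value above is a JSON document serialized into a string, which is easy to get wrong when escaped by hand. Here is a minimal Python sketch, assuming the deployment manifest is generated by a script. The helper name build_create_options is hypothetical and not part of any IoT Edge tooling.

```python
import json


def build_create_options(host_dir, container_dir, ports):
    """Return Docker create options serialized as the string a deployment manifest expects."""
    options = {
        "HostConfig": {
            # Bind-mount the host directory into the container.
            "Binds": [f"{host_dir}:{container_dir}"],
            # Publish each container port on the same host port number.
            "PortBindings": {
                f"{port}/tcp": [{"HostPort": str(port)}] for port in ports
            },
        }
    }
    # Compact separators keep the embedded string short.
    return json.dumps(options, separators=(",", ":"))


create_options = build_create_options(
    "/etc/iotedge/storage/", "/iotedge/storage/", [8883, 443, 5671]
)
edge_hub_settings = {
    "image": "mcr.microsoft.com/azureiotedge-hub:1.0.6",
    "createOptions": create_options,
}
# json.dumps escapes the inner quotes when the settings are written out.
print(json.dumps(edge_hub_settings, indent=2))
```

Building the options as a dictionary and serializing them twice (once for the inner string, once for the manifest) means the escaping is handled by the JSON library rather than by hand.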
Deploy and make sure everything is working/that you are using the additional storage.
Edit /etc/iotedge/config.yaml and change RuntimeLogLevel to debug (or, if you already have it at debug, change it to info or whatever your favorite log level happens to be). Save your changes, then restart the edge runtime with systemctl restart iotedge so the changes are applied.
If you follow the edgeHub and edge daemon logs, you should now see errors. journalctl -u iotedge:
May 06 14:19:51 betsyc-iotedge9 iotedged[4327]: 2019-05-06T14:19:51Z [ERR!] - Internal server error: Could not decrypt
May 06 14:19:51 betsyc-iotedge9 iotedged[4327]: caused by: A error occurred in the key store.
May 06 14:19:51 betsyc-iotedge9 iotedged[4327]: caused by: HSM failure
May 06 14:19:51 betsyc-iotedge9 iotedged[4327]: caused by: HSM API failure occurred: 417
May 06 14:19:51 betsyc-iotedge9 iotedged[4327]: 2019-05-06T14:19:51Z [INFO] - [work] - - - [2019-05-06 14:19:51.665964641 UTC] "POST /modules/%24edgeHub/genid/636927468235895329/decrypt?api-version=2018-06-28 HTTP/1.1" 500 Internal Server Error 150 "-" "-" pid(5198)
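To spot this failure quickly in a long daemon log, the access-log lines can be filtered for decrypt calls that returned a server error. This is a rough sketch, assuming the line format shown above. The function name failed_decrypt_calls and the regular expression are my own and are not part of iotedged.

```python
import re
from urllib.parse import unquote

# Matches the request part of an access-log line, e.g.
#   "POST /modules/<name>/genid/<id>/decrypt?api-version=... HTTP/1.1" 500
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def failed_decrypt_calls(log_lines):
    """Yield (module, status) for workload API decrypt calls that returned a 5xx status."""
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        status = int(match.group("status"))
        # Drop the query string, then split the path into its segments.
        parts = match.group("path").split("?")[0].strip("/").split("/")
        # Expected shape: modules/<name>/genid/<id>/decrypt
        if len(parts) == 5 and parts[0] == "modules" and parts[4] == "decrypt" and status >= 500:
            yield unquote(parts[1]), status


sample = [
    'May 06 14:19:51 betsyc-iotedge9 iotedged[4327]: 2019-05-06T14:19:51Z [INFO] - [work] - - - '
    '[2019-05-06 14:19:51.665964641 UTC] "POST /modules/%24edgeHub/genid/636927468235895329/'
    'decrypt?api-version=2018-06-28 HTTP/1.1" 500 Internal Server Error 150 "-" "-" pid(5198)',
]
print(list(failed_decrypt_calls(sample)))
```

In practice the input could be the saved output of journalctl -u iotedge, read line by line.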
docker logs edgeHub:
2019-05-06 14:19:15.488 +00:00 [INF] [EdgeHub] - Starting Edge Hub
2019-05-06 14:19:15.489 +00:00 [INF] [EdgeHub] -
█████╗ ███████╗██╗ ██╗██████╗ ███████╗
██╔══██╗╚══███╔╝██║ ██║██╔══██╗██╔════╝
███████║ ███╔╝ ██║ ██║██████╔╝█████╗
██╔══██║ ███╔╝ ██║ ██║██╔══██╗██╔══╝
██║ ██║███████╗╚██████╔╝██║ ██║███████╗
╚═╝ ╚═╝╚══════╝ ╚═════╝ ╚═╝ ╚═╝╚══════╝
██╗ ██████╗ ████████╗ ███████╗██████╗ ██████╗ ███████╗
██║██╔═══██╗╚══██╔══╝ ██╔════╝██╔══██╗██╔════╝ ██╔════╝
██║██║ ██║ ██║ █████╗ ██║ ██║██║ ███╗█████╗
██║██║ ██║ ██║ ██╔══╝ ██║ ██║██║ ██║██╔══╝
██║╚██████╔╝ ██║ ███████╗██████╔╝╚██████╔╝███████╗
╚═╝ ╚═════╝ ╚═╝ ╚══════╝╚═════╝ ╚═════╝ ╚══════╝
2019-05-06 14:19:15.489 +00:00 [INF] [EdgeHub] - Version - 1.0.6.19913336 (8288bc9bd6f6e15295fea506cd3f99d7f6347a6a)
2019-05-06 14:19:15.491 +00:00 [INF] [EdgeHub] - Loaded server certificate with expiration date of "2019-08-04T13:52:25.0000000+00:00"
2019-05-06 14:19:15.523 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.Storage.MessageStore] - Created new message store
2019-05-06 14:19:15.523 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.Storage.MessageStore] - Started task to cleanup processed and stale messages
2019-05-06 14:19:15.592 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.DeviceConnectivityManager] - Created DeviceConnectivityManager with connected check frequency 00:05:00 and disconnected check frequency 00:02:00
2019-05-06 14:19:20.350 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connecting socket /var/run/iotedge/workload.sock
2019-05-06 14:19:20.351 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connected socket /var/run/iotedge/workload.sock
2019-05-06 14:19:20.351 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Sending request http://workload.sock/modules/%24edgeHub/genid/636927468235895329/decrypt?api-version=2018-06-28
2019-05-06 14:19:20.354 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Response received InternalServerError
2019-05-06 14:19:20.354 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Retrying Http call to unix:///var/run/iotedge/workload.sock to Decrypt because of error Error, retry count = 3
2019-05-06 14:19:30.062 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connecting socket /var/run/iotedge/workload.sock
2019-05-06 14:19:30.063 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connected socket /var/run/iotedge/workload.sock
2019-05-06 14:19:30.063 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Sending request http://workload.sock/modules/%24edgeHub/genid/636927468235895329/decrypt?api-version=2018-06-28
2019-05-06 14:19:30.066 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Response received InternalServerError
Unhandled Exception: System.AggregateException: One or more errors occurred. (Error calling Decrypt: Could not decrypt
caused by: A error occurred in the key store.
caused by: HSM failure
caused by: HSM API failure occurred: 417) ---> Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadCommunicationException: Error calling Decrypt: Could not decrypt
caused by: A error occurred in the key store.
caused by: HSM failure
caused by: HSM API failure occurred: 417
at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient.Execute[T](Func`1 func, String operation) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClient.cs:line 109
at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient.DecryptAsync(String initializationVector, String encryptedText) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClient.cs:line 83
at Microsoft.Azure.Devices.Edge.Storage.EncryptedStore`2.<>c__DisplayClass17_0.<<IterateBatch>b__0>d.MoveNext() in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Storage/EncryptedStore.cs:line 89
--- End of stack trace from previous location where exception was thrown ---
at Microsoft.Azure.Devices.Edge.Storage.RocksDb.ColumnFamilyDbStore.IterateBatch(Action`1 seeker, Int32 batchSize, Func`3 callback, CancellationToken cancellationToken) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Storage.RocksDb/ColumnFamilyDbStore.cs:line 162
at Microsoft.Azure.Devices.Edge.Util.TaskEx.TimeoutAfter(Task task, TimeSpan timeout) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/TaskEx.cs:line 142
at Microsoft.Azure.Devices.Edge.Hub.Core.DeviceScopeIdentitiesCache.ReadCacheFromStore(IKeyValueStore`2 encryptedStore) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Core/DeviceScopeIdentitiesCache.cs:line 135
at Microsoft.Azure.Devices.Edge.Hub.Core.DeviceScopeIdentitiesCache.Create(IServiceProxy serviceProxy, IKeyValueStore`2 encryptedStorage, TimeSpan refreshRate) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Core/DeviceScopeIdentitiesCache.cs:line 55
at Microsoft.Azure.Devices.Edge.Hub.Service.Modules.CommonModule.<Load>b__17_9(IComponentContext c) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/modules/CommonModule.cs:line 211
at Microsoft.Azure.Devices.Edge.Hub.Service.Modules.RoutingModule.<Load>b__20_10(IComponentContext c) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/modules/RoutingModule.cs:line 193
at Microsoft.Azure.Devices.Edge.Hub.Service.Modules.RoutingModule.<Load>b__20_12(IComponentContext c) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/modules/RoutingModule.cs:line 225
at Microsoft.Azure.Devices.Edge.Hub.Service.Modules.RoutingModule.<Load>b__20_25(IComponentContext c) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/modules/RoutingModule.cs:line 392
at Microsoft.Azure.Devices.Edge.Hub.Service.Modules.RoutingModule.<Load>b__20_28(IComponentContext c) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/modules/RoutingModule.cs:line 450
at Microsoft.Azure.Devices.Edge.Hub.Service.Program.MainAsync(IConfigurationRoot configuration) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 62
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at Microsoft.Azure.Devices.Edge.Hub.Service.Program.Main() in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 30
Context (Environment)
Device (Host) Operating System: Ubuntu 18.04 LTS
Container Operating System: Linux Containers
Runtime Versions
iotedged: iotedge 1.0.6.1 (3fa6cbef8b7fc3c55a49a622735eb1021b8a5963)
Edge Agent: 1.0.6
Edge Hub: 1.0.6
Docker: 3.0.5
Logs
Additional Information
When the edgeHub is not configured for additional storage, I don’t see this issue. I can change the RuntimeLogLevel repeatedly and restart the runtime without issue.
After getting into this bad state, I can return to a working state by stopping the runtime, clearing out the storage directory, and restarting the runtime.
systemctl stop iotedge
sudo rm -rf /etc/iotedge/storage/*
systemctl restart iotedge
Top GitHub Comments
Thank you for the bug report. You are running into this bug: https://github.com/Azure/iotedge/pull/1082
This fix is in the 1.0.7 release, which will be out today (in a couple of hours).
Well, looks like the actual issue is different, just the symptom is the same:
I forgot we bind-mount a named volume for the edgeHub storage instead of /etc/iotedge/storage. Stopping iotedge, deleting the named volume, and restarting iotedge caused the deployment to be re-applied and the named volume to be re-created, at which point edgeHub successfully managed to start up. I’ll create a new issue for this.
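Since the symptom is the same whether storage lives in a host directory or in a named volume, it can help to confirm which one the container actually mounts. A small Python sketch follows, assuming the mounts are obtained with docker inspect --format '{{json .Mounts}}' edgeHub. The helper describe_mounts and the sample volume name are made up for illustration.

```python
import json


def describe_mounts(inspect_mounts_json):
    """Summarize container mounts from the JSON printed by docker inspect for .Mounts."""
    summary = []
    for mount in json.loads(inspect_mounts_json):
        # Named volumes carry a "Name"; bind mounts are identified by their host "Source" path.
        if mount.get("Type") == "volume":
            origin = mount.get("Name")
        else:
            origin = mount.get("Source")
        summary.append((mount.get("Type"), origin, mount.get("Destination")))
    return summary


# Hypothetical inspect output; the volume name is an assumption, not taken from the issue.
sample = json.dumps([
    {
        "Type": "volume",
        "Name": "example-hub-storage",
        "Source": "/var/lib/docker/volumes/example-hub-storage/_data",
        "Destination": "/iotedge/storage",
    },
])
for mount_type, origin, destination in describe_mounts(sample):
    print(f"{mount_type}: {origin} -> {destination}")
```

A mount reported with type volume means the data to clear lives in the Docker volume, not in a host directory such as /etc/iotedge/storage.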