Tasks don't complete in Device - Channels module
Observed behavior
Tasks no longer complete after an abrupt shutdown (power to the server was unplugged) while an update task was in progress. Before the abrupt shutdown, tasks were completing normally and had even been cleared from the task list.
Expected behavior
Tasks should complete.
User-facing consequences
Can’t accomplish any task in the Device -> Channels module.
Errors and logs
Attached are the logs from the last few days, since the problem started: kolibri-2020-01-25.txt kolibri-2020-01-26.txt kolibri-2020-01-27.txt kolibri-2020-01-23.txt kolibri-2020-01-24.txt
Steps to reproduce
- Start a channel content update task
- Simulate an abrupt shutdown of Kolibri while the task is still underway (Unplug the device)
- Start device and Kolibri
- From then on, no new task completes
Context
Kolibri server - 0.3.6 Kolibri - 0.13.0 Ubuntu 18.04.3
Issue Analytics
- State:
- Created 4 years ago
- Comments:10 (6 by maintainers)
Top GitHub Comments
After some live debugging with the @intelliant01 installation, the problem was on this line: https://github.com/learningequality/kolibri/blob/release-v0.13.x/kolibri/core/content/management/commands/importchannel.py#L179

`importchannel` tries to acquire a lock to do the import and could not, because the database said a lock was already held by another process with a different `pid`. Diskcache does not return any error in this case; it just keeps the acquire routine in an infinite loop (https://github.com/grantjenks/python-diskcache/blob/master/diskcache/recipes.py#L145-L154) until the lock is released.

The problem appears because, when the abrupt shutdown happened, Kolibri was interrupted while doing the import and had inserted its pid into `cache.db` to lock the process. After Kolibri restarted, the pid was not the same and we were in a deadlock.
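To make the failure mode concrete, here is a minimal sketch of a persisted lock whose owner is recorded by pid. It is not Diskcache's or Kolibri's code: the table layout, the helper names (`make_store`, `try_acquire`, `acquire`), and the pids are all assumptions for illustration, and an in-memory SQLite database stands in for the on-disk `cache.db`. The retry loop is bounded only so that the demo terminates; the behavior described above corresponds to an unbounded loop.

```python
import sqlite3


def make_store():
    # Stand-in for the on-disk cache.db (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE locks (key TEXT PRIMARY KEY, owner_pid INTEGER)")
    return conn


def try_acquire(conn, key, pid):
    """Take the lock if it is free, or report success if this pid already holds it."""
    row = conn.execute(
        "SELECT owner_pid FROM locks WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        conn.execute("INSERT INTO locks VALUES (?, ?)", (key, pid))
        conn.commit()
        return True
    return row[0] == pid


def acquire(conn, key, pid, max_attempts=1000):
    """Retry until the lock is acquired; bounded here so the demo ends."""
    for _ in range(max_attempts):
        if try_acquire(conn, key, pid):
            return True
    return False


store = make_store()
old_pid, new_pid = 1111, 2222  # made-up pids: before and after the restart

acquired_before_crash = acquire(store, "db_task_write_lock", old_pid)
# Power is cut here: the lock is never released, so the row survives.
acquired_after_restart = acquire(store, "db_task_write_lock", new_pid)
```

In this sketch the second `acquire` never succeeds, because nothing will ever remove the row owned by the old pid. With an unbounded loop that is exactly a task that never completes.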
I think the solution should be deleting `cache.db` whenever Kolibri starts, to avoid this kind of problem, but I'd like to hear more opinions, especially from @rtibbles, who implemented the locking here.

By the way, this bug may happen with all the tasks using db_task_write_lock, not only importing channels: also deleting them, trying to vacuum, importing content, etc.
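The proposal to delete `cache.db` at startup could look roughly like the sketch below. This is only an illustration of the idea, not the fix that was merged: the function name `remove_stale_cache` and the `home_dir` parameter are hypothetical, and a real cache store may keep companion files next to the database that this sketch does not handle.

```python
import os
import tempfile


def remove_stale_cache(home_dir, filename="cache.db"):
    """Delete the lock cache file left over from a previous run.

    Returns True if a file was removed, False if there was nothing to remove.
    Meant to run once at startup, before any task tries to take a lock.
    """
    path = os.path.join(home_dir, filename)
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


# Demo against a temporary directory standing in for the Kolibri home folder.
demo_home = tempfile.mkdtemp()
open(os.path.join(demo_home, "cache.db"), "w").close()

first_call = remove_stale_cache(demo_home)   # a leftover file exists
second_call = remove_stale_cache(demo_home)  # nothing left to delete
```

Deleting the file discards every cached lock, which is safe only if no other Kolibri process is running against the same home folder at that moment.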
I am attaching here the bad db to help debugging
My experience is similar to point 1 therein, though I have not tried the uninstall and reinstall step.
However I do remember being able to start and complete multiple imports. They did execute sequentially but they did get enlisted in the tasks list simultaneously and did complete. It was only after a hard shutdown that I experienced the problem.
I shall try to provide a gif soon, as I may be away from the affected system for the next two days.