Incorrect calculation of disk usage and availability during import
Observed behavior
This issue reports two possibly related bugs:
1/ Incorrect calculation of available disk space:
Actual: 0 Expected: 33G
2/ Misleading calculation of the file size already on disk
Context: a channel import of a large channel, KA (ru), failed about 70% of the way through (due to a network error, see below). So 70% of the files are already downloaded to /storage but haven’t been marked as “available” in the Kolibri DB, and are therefore not included in the calculation.
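To illustrate bug 2/, here is a minimal sketch of counting the bytes that are physically present on disk, regardless of any "available" flag in the database. The directory layout (first character, second character, then `<checksum>.<extension>`) is inferred from the URLs in the logs below; the function names and the `(checksum, extension)` input format are hypothetical and are not Kolibri's actual API.

```python
import os


def storage_path(storage_root, checksum, extension):
    # Layout inferred from the log URLs, e.g. storage/7/4/7489c3...jpg
    filename = "{}.{}".format(checksum, extension)
    return os.path.join(storage_root, checksum[0], checksum[1], filename)


def bytes_already_on_disk(storage_root, files):
    """Sum the size of every expected file that exists on disk.

    `files` is an iterable of (checksum, extension) pairs (hypothetical input).
    Database availability flags are deliberately ignored.
    """
    total = 0
    for checksum, extension in files:
        path = storage_path(storage_root, checksum, extension)
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total
```

A count like this would include files from an interrupted import, which a calculation based only on database flags misses.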
Expected behavior
- 1/ The available disk space, as queried from the OS, should be reported accurately
- 2/ The channel import/update user interface should show the content nodes available and the size of the files already downloaded to /storage (since this is what users need to know to judge the size of the download).
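For point 1/, a minimal sketch of asking the OS for free space at the moment of the check, using the standard library's `shutil.disk_usage` (Python 3.3+). This is not a claim about how Kolibri does it; the helper name `free_bytes` is hypothetical.

```python
import shutil


def free_bytes(path):
    # disk_usage returns (total, used, free) in bytes for the
    # filesystem that contains `path`.
    return shutil.disk_usage(path).free
```

On Python 2, which lacks `shutil.disk_usage`, the POSIX equivalent is `os.statvfs(path)`, multiplying `f_bavail` by `f_frsize`. Since the reported value only corrected itself after a restart, one guess is that re-querying on every check, rather than reusing an earlier reading, would help; that cause is an assumption, not something the logs confirm.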
User-facing consequences
Users see a wrong disk-space-available figure (until Kolibri is restarted).
Users see misleading “already downloaded” information that doesn’t account for files that were downloaded but not marked available.
Errors and logs
@laurenlichtman was importing through the web starting around 14:31:29 UTC; the first import then failed, possibly due to a network timeout:
INFO 2018-06-22 14:31:29,076 importchannel Downloading data for channel id 303df4e42aac519796a3f49bed613cb4
INFO 2018-06-22 14:31:30,282 channel_import Importing ContentTag data
INFO 2018-06-22 14:31:30,285 channel_import Importing ContentNode_has_prerequisite data
INFO 2018-06-22 14:31:30,286 channel_import Importing ContentNode_related data
INFO 2018-06-22 14:31:30,287 channel_import Importing ContentNode_tags data
INFO 2018-06-22 14:31:30,288 channel_import Importing ContentNode data
INFO 2018-06-22 14:31:31,348 channel_import Importing Language data
INFO 2018-06-22 14:31:31,353 channel_import Importing File data
INFO 2018-06-22 14:31:32,500 channel_import Importing LocalFile data
INFO 2018-06-22 14:31:42,336 channel_import Importing AssessmentMetaData data
INFO 2018-06-22 14:31:42,339 channel_import Importing ChannelMetadata data
INFO 2018-06-22 14:31:43,186 annotation Setting availability of File objects based on LocalFile availability
INFO 2018-06-22 14:31:43,296 annotation Setting availability of non-topic ContentNode objects based on File availability
INFO 2018-06-22 14:31:43,582 annotation Setting availability of ContentNode objects with children for 2 levels
INFO 2018-06-22 14:31:43,583 annotation Setting availability of ContentNode objects with children for level 2
INFO 2018-06-22 14:31:43,587 annotation Setting availability of ContentNode objects with children for level 1
ERROR 2018-06-22 15:47:55,066 importcontent An error occured during content import: 504 Server Error: Gateway Time-out for url: https://studio.learningequality.org/content/storage/7/4/7489c3176ca502d0c91f7d9a67829d5e.jpg
ERROR 2018-06-22 16:28:13,775 importcontent An error occured during content import: HTTPSConnectionPool(host='studio.learningequality.org', port=443): Max retries exceeded with url: /content/storage/b/3/b30e4ce8c480f46e27b0eabddc06ac95.jpg (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_read_bytes', 'sslv3 alert handshake failure')],)",),))
ERROR 2018-06-22 16:35:12,585 importcontent An error occured during content import: HTTPSConnectionPool(host='studio.learningequality.org', port=443): Read timed out. (read timeout=20)
WARNING 2018-06-22 16:35:13,426 base Job 331d2d0e55aa4dffa59c47cb26b602ab raised an exception: Traceback (most recent call last):
File "/home/kolibri/.pex/install/kolibri-0.10.0b5-py2.py3-none-any.whl.231bf69099f195f7a4092238814111ee2fcdd688/kolibri-0.10.0b5-py2.py3-none-any.whl/kolibri/dist/iceqube/worker/backends/inmem.py", line 75, in handle_finished_future
result = future.result()
File "/home/kolibri/.pex/install/kolibri-0.10.0b5-py2.py3-none-any.whl.231bf69099f195f7a4092238814111ee2fcdd688/kolibri-0.10.0b5-py2.py3-none-any.whl/kolibri/dist/py2only/concurrent/futures/_base.py", line 422, in result
return self.__get_result()
File "/home/kolibri/.pex/install/kolibri-0.10.0b5-py2.py3-none-any.whl.231bf69099f195f7a4092238814111ee2fcdd688/kolibri-0.10.0b5-py2.py3-none-any.whl/kolibri/dist/py2only/concurrent/futures/thread.py", line 62, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/kolibri/.pex/install/kolibri-0.10.0b5-py2.py3-none-any.whl.231bf69099f195f7a4092238814111ee2fcdd688/kolibri-0.10.0b5-py2.py3-none-any.whl/kolibri/dist/iceqube/worker/backends/inmem.py", line 149, in wrap
raise e
ReadTimeout: HTTPSConnectionPool(host='studio.learningequality.org', port=443): Read timed out. (read timeout=20)
The second import attempt failed for an unknown reason:
INFO 2018-06-23 06:42:01,437 importchannel Downloading data for channel id 303df4e42aac519796a3f49bed613cb4
WARNING 2018-06-23 06:42:03,542 channel_import Version 1 of channel 303df4e42aac519796a3f49bed613cb4 already exists in database; cancelling import of version 1
INFO 2018-06-23 06:42:03,547 annotation Setting availability of File objects based on LocalFile availability
INFO 2018-06-23 06:42:03,805 annotation Setting availability of non-topic ContentNode objects based on File availability
INFO 2018-06-23 06:42:04,360 annotation Setting availability of ContentNode objects with children for 2 levels
INFO 2018-06-23 06:42:04,361 annotation Setting availability of ContentNode objects with children for level 2
INFO 2018-06-23 06:42:04,365 annotation Setting availability of ContentNode objects with children for level 1
INFO 2018-06-23 06:42:05,911 apps Running Kolibri with the following settings: kolibri.deployment.default.settings.base
ERROR 2018-06-23 07:09:38,228 importcontent An error occured during content import: [Errno 5] Input/output error
After restarting Kolibri, the disk space available was correct (33G). Still, the UI import logic did not allow me to tick the “select all” checkbox and then import (because it thinks there is not enough disk space).
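The arithmetic behind this refusal can be sketched as follows. The 33G free figure and the 70G channel size come from this report; treating "70% of the files" as 70% of the bytes (49G) is an assumption made only for illustration, and the function names are hypothetical.

```python
GB = 10**9


def remaining_download(channel_bytes, on_disk_bytes):
    # Bytes still to fetch once files already on disk are counted.
    return max(channel_bytes - on_disk_bytes, 0)


def can_import(channel_bytes, on_disk_bytes, free_bytes):
    return remaining_download(channel_bytes, on_disk_bytes) <= free_bytes


channel = 70 * GB
on_disk = 49 * GB  # assumption: 70% of the channel's bytes are present
free = 33 * GB

naive_ok = channel <= free                        # ignores files on disk
adjusted_ok = can_import(channel, on_disk, free)  # counts files on disk
```

Under these assumptions the naive check compares 70G against 33G and refuses, while the adjusted check compares the remaining 21G against 33G and allows the import.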
The third attempt, via the command line, stalled at first, but after Kolibri was restarted 4 minutes later it seems to have finished the task OK:
INFO 2018-06-23 12:17:04,666 importchannel Downloading data for channel id 303df4e42aac519796a3f49bed613cb4
WARNING 2018-06-23 12:17:06,791 channel_import Version 1 of channel 303df4e42aac519796a3f49bed613cb4 already exists in database; cancelling import of version 1
INFO 2018-06-23 12:17:06,796 annotation Setting availability of File objects based on LocalFile availability
INFO 2018-06-23 12:17:07,267 annotation Setting availability of non-topic ContentNode objects based on File availability
INFO 2018-06-23 12:17:08,041 annotation Setting availability of ContentNode objects with children for 2 levels
INFO 2018-06-23 12:17:08,043 annotation Setting availability of ContentNode objects with children for level 2
INFO 2018-06-23 12:17:08,046 annotation Setting availability of ContentNode objects with children for level 1
INFO 2018-06-23 12:17:10,825 apps Running Kolibri with the following settings: kolibri.deployment.default.settings.base
INFO 2018-06-23 12:22:05,964 annotation Setting availability of 7464 LocalFile objects based on passed in checksums
INFO 2018-06-23 12:22:06,336 annotation Setting availability of File objects based on LocalFile availability
INFO 2018-06-23 12:22:06,764 annotation Setting availability of non-topic ContentNode objects based on File availability
INFO 2018-06-23 12:22:07,770 annotation Setting availability of ContentNode objects with children for 2 levels
INFO 2018-06-23 12:22:07,772 annotation Setting availability of ContentNode objects with children for level 2
INFO 2018-06-23 12:22:08,229 annotation Setting availability of ContentNode objects with children for level 1
So the demo server is in a working state now: http://ka-ru-demo.learningequality.org/learn/
Steps to reproduce
I'm not sure what caused the network error, so this is difficult to reproduce. Presumably, if the network import task had finished correctly, the files would have been marked available, so bug 2/ would not be visible.
If it helps in chasing bug 1/, I can create a demo server identical to the one above (100G disk, trying to import a 70G channel). The Kolibri UI should report 30G available after the import finishes.
Context
- Kolibri version: Kolibri 0.10.0b5
- Operating system: Linux
- Browser: Chrome
Issue Analytics
- Created 5 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
For the first problem, I wonder if it would be less confusing to say:
rather than
because ‘Your remaining space’ is ambiguous about whether it refers to post- or pre-import
I agree with @indirectlylit - this text change is urgently needed. I know that we have a string freeze in 0.10.x, but we could fix this by only adding text and changing the order?