
ChiaDog does not recover after a remote harvester goes down


Hi, I have ChiaDog running on a CentOS box. I mapped my harvesters' log folders to local folders. Works great.

However, when a harvester box is restarted, ChiaDog no longer sees that log file until I restart ChiaDog for that harvester. Maybe when ChiaDog detects that a harvester is down (no access to the file), it should periodically check whether file access has been restored?
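A minimal sketch of the suggested check, assuming ChiaDog could simply poll the mounted path until `os.stat()` succeeds again. `wait_for_file` and its parameters are hypothetical names, not part of ChiaDog.

```python
import os
import time
from typing import Callable, Optional


def wait_for_file(path: str,
                  poll_interval: float = 5.0,
                  timeout: Optional[float] = None,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until `path` can be stat()'d again (hypothetical helper).

    Returns True as soon as the file is reachable, or False if `timeout`
    seconds pass first (timeout=None waits forever).
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            # Raises OSError (e.g. "Host is down") while the mount is unreachable.
            os.stat(path)
            return True
        except OSError:
            if deadline is not None and time.monotonic() >= deadline:
                return False
            sleep(poll_interval)
```

For example, `wait_for_file("/mnt/chia_logs/ox/debug.log")` would block until the harvester's share is reachable again, after which the log tailer could be restarted instead of staying stuck.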


Setup:
- One box for the harvester, one for ChiaDog
- Map the harvester log folder to a local folder on the ChiaDog box
- Run ChiaDog
- Unplug the network cable from the ChiaDog box
  - ChiaDog starts sending “Harvester Down” notifications
- Reconnect the ChiaDog box to the network
  - ChiaDog keeps sending “Harvester Down” notifications

Environment:

- OS: CentOS (ChiaDog box)
- Python version: 3.9.6
- ChiaDog version: probably the latest? It might help to include the ChiaDog version in the notifications, or in the first log line at startup.
- Harvester: remote, but its log folder is mapped to a local folder, so ChiaDog sees it as local. (Maybe this is why ChiaDog does not check whether file access has been restored: it assumes a catastrophic failure caused by a reboot?)
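On the version question: if ChiaDog were installed as a Python distribution named `chiadog` (an assumption; it is often run from a git checkout instead), the standard library could supply the version for the first log line. `startup_version` is a hypothetical helper.

```python
import logging
from importlib.metadata import PackageNotFoundError, version


def startup_version(dist_name: str = "chiadog") -> str:
    """Log and return the installed version of `dist_name`, or "unknown"."""
    try:
        found = version(dist_name)
    except PackageNotFoundError:
        # Not installed as a distribution, e.g. running from a source checkout.
        found = "unknown"
    logging.getLogger(__name__).info("Starting ChiaDog (version %s)", found)
    return found
```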

Here is the exception raised when the harvester went down:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/chia_logs/chiadog/ox/venv/lib/python3.9/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/mnt/chia_logs/chiadog/ox/venv/lib/python3.9/site-packages/retry/api.py", line 73, in retry_decorator
    return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter,
  File "/mnt/chia_logs/chiadog/ox/venv/lib/python3.9/site-packages/retry/api.py", line 33, in __retry_internal
    return f()
  File "/mnt/chia_logs/chiadog/ox/src/chia_log/log_consumer.py", line 75, in _consume_loop
    for log_line in Pygtail(self._expanded_log_path, read_from_end=True, offset_file=self._offset_path):
  File "/mnt/chia_logs/chiadog/ox/venv/lib/python3.9/site-packages/pygtail/core.py", line 89, in __init__
    if self._offset_file_inode != stat(self.filename).st_ino or \
OSError: [Errno 112] Host is down: '/mnt/chia_logs/ox/debug.log'
Exception ignored in: <function Pygtail.__del__ at 0x7f8f87633c10>
Traceback (most recent call last):
  File "/mnt/chia_logs/chiadog/ox/venv/lib/python3.9/site-packages/pygtail/core.py", line 97, in __del__
    if self._filehandle():
  File "/mnt/chia_logs/chiadog/ox/venv/lib/python3.9/site-packages/pygtail/core.py", line 179, in _filehandle
    self._fh = open(filename, "r", 1)
OSError: [Errno 112] Host is down: '/mnt/chia_logs/ox/debug.log'
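The traceback shows the `_consume_loop` thread dying: Pygtail's `stat()` on the mounted log raised `OSError` errno 112 (`EHOSTDOWN` on Linux), and the exception escaped the `retry` wrapper and ended Thread-1, which suggests the wrapper was not set up to keep retrying on this error. The second block ("Exception ignored in ... `__del__`") is Python reporting and suppressing a failure in Pygtail's destructor, so it is a side effect rather than a separate problem. A minimal sketch of a loop that survives the outage, assuming `read_new_lines` is a hypothetical hook standing in for iterating a fresh `Pygtail(path, read_from_end=True, offset_file=offset_path)` on each pass:

```python
import errno
import logging
import time
from typing import Callable, Iterable

log = logging.getLogger("log_consumer_sketch")


def consume_forever(read_new_lines: Callable[[], Iterable[str]],
                    handle_line: Callable[[str], None],
                    should_stop: Callable[[], bool],
                    poll_interval: float = 1.0,
                    max_backoff: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> None:
    """Tail a log source, backing off on OSError instead of letting the thread die."""
    backoff = poll_interval
    while not should_stop():
        try:
            for line in read_new_lines():
                handle_line(line)
            backoff = poll_interval  # source reachable again: reset the backoff
        except OSError as exc:
            # e.g. errno 112 (EHOSTDOWN) while the remote mount is unreachable
            log.warning("Log source unavailable (%s); retrying in %.0fs",
                        errno.errorcode.get(exc.errno, exc.errno), backoff)
            sleep(backoff)
            backoff = min(backoff * 2, max_backoff)  # exponential backoff, capped
            continue
        sleep(poll_interval)
```

Catching `OSError` inside the loop, rather than relying on an outer retry decorator, keeps the same thread alive, so tailing resumes once the mount comes back.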


Top GitHub Comments

Jacek-ghub commented, Aug 26, 2021

I would also suggest sending just one notification per harvester-down event. We all know what to do once we are notified, so the repeated notifications are both redundant and (maybe only to me?) annoying.
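The one-alert-per-event behavior can be sketched as a small state tracker that only speaks on transitions (down, then back up). `DownAlertDeduplicator` is a hypothetical helper, not ChiaDog's notification code.

```python
from typing import Dict, Optional


class DownAlertDeduplicator:
    """Emit one message when a harvester goes down and one when it recovers."""

    def __init__(self) -> None:
        self._is_down: Dict[str, bool] = {}

    def update(self, harvester: str, reachable: bool) -> Optional[str]:
        was_down = self._is_down.get(harvester, False)
        self._is_down[harvester] = not reachable
        if not reachable and not was_down:
            return f"Harvester {harvester} is down"
        if reachable and was_down:
            return f"Harvester {harvester} is back up"
        return None  # no state change: stay quiet
```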

That said, I would also like to see a notification when a batch of plots is added, which would indicate that a new drive with plots was connected (e.g. when moving HDDs around). That notification would usually complement the one sent when plots disappear from the harvester (a drive unplugged from the plotter). It would also confirm that the harvester recognized the added drive, so we would not need to rely on the rather hopeless full node UI.
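A minimal sketch of the proposed "plots added" check, assuming the previous and current total plot counts for a harvester are available between log reads. `plots_added_message` and its `min_increase` threshold are hypothetical.

```python
from typing import Optional


def plots_added_message(previous_total: int, current_total: int,
                        min_increase: int = 2) -> Optional[str]:
    """Return a notification text when the plot count jumps up, else None.

    `min_increase` is meant to separate "a drive was (re)connected" from a
    single freshly plotted file being copied in.
    """
    added = current_total - previous_total
    if added >= min_increase:
        return f"{added} plots added (total {previous_total} -> {current_total})"
    return None
```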

sorenfriis commented, Aug 25, 2021

@Jacek-ghub I only let ChiaDog in over SSH, using a dedicated user that has read-only access to the log file.