Failed plots from Chia GUI crash plotman interactive
Describe the bug
I fired up interactive mode in a terminal last night and everything was looking good. I only have NVMe / SSD drives as temp and internal HDDs as dest. Separately, in the Chia GUI, I decided to experiment with plotting to an external HDD, giving it just 1 thread; I was mainly curious how bin count affects throughput on a platter drive. I went to bed, and two hours later the external HDD unmounted (I think it overheated). That crashed the plotman interactive session I had left running in the terminal, and no new jobs were scheduled all night.
Why did a chia plot kicked off outside plotman, on a drive that doesn’t appear in the yaml, crash it? Is that intended?
To Reproduce
Steps to reproduce the behavior, e.g.:
- Set up config with normal plotting parameters
- Start a plot from the Chia GUI on a drive not in the plotman.yml
- Unmount the drive that the additional plot was running on
- See error:
Traceback (most recent call last):
  File "/home/username/chia-blockchain/venv/bin/plotman", line 8, in <module>
    sys.exit(main())
  File "/home/username/chia-blockchain/venv/lib/python3.8/site-packages/plotman/plotman.py", line 173, in main
    interactive.run_interactive()
  File "/home/username/chia-blockchain/venv/lib/python3.8/site-packages/plotman/interactive.py", line 334, in run_interactive
    curses.wrapper(curses_main)
  File "/usr/lib/python3.8/curses/__init__.py", line 105, in wrapper
    return func(stdscr, *args, **kwds)
  File "/home/username/chia-blockchain/venv/lib/python3.8/site-packages/plotman/interactive.py", line 261, in curses_main
    jobs_win.addstr(0, 0, reporting.status_report(jobs, n_cols, jobs_h,
  File "/home/username/chia-blockchain/venv/lib/python3.8/site-packages/plotman/reporting.py", line 106, in status_report
    plot_util.human_format(j.get_tmp_usage(), 0),
  File "/home/username/chia-blockchain/venv/lib/python3.8/site-packages/plotman/job.py", line 351, in get_tmp_usage
    with os.scandir(self.tmpdir) as it:
FileNotFoundError: [Errno 2] No such file or directory: '/media/username/easystore/chia-plots'
Expected behavior
Errors with plots scheduled on drives unknown to plotman shouldn’t halt scheduling.
One drive disconnecting shouldn’t halt scheduling for plotman (if there are destination drives remaining).
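For illustration, the traceback ends in an unguarded `os.scandir(self.tmpdir)` call, so one way to get the behavior described above might be to treat a vanished tmp dir as "no usage" instead of letting the exception reach the curses loop. The following is only a minimal sketch under that assumption: `tmp_usage_bytes` is a hypothetical standalone helper, not plotman's actual `get_tmp_usage`, and it sums every regular file in the directory rather than only the files belonging to one plot.

```python
import os


def tmp_usage_bytes(tmpdir):
    """Sum the sizes of regular files directly inside tmpdir.

    Hypothetical helper: returns 0 if the directory cannot be read
    (e.g. the drive was unmounted) instead of raising.
    """
    total = 0
    try:
        with os.scandir(tmpdir) as it:
            for entry in it:
                try:
                    if entry.is_file():
                        total += entry.stat().st_size
                except OSError:
                    # The file disappeared between listing and stat; skip it.
                    continue
    except OSError:
        # Covers FileNotFoundError, PermissionError and similar failures.
        return 0
    return total
```

With something like this, a status report could still render a row for the affected job (showing zero usage) while scheduling continues for the remaining drives.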
System setup:
- OS: Linux Mint
- Method of archiving: none
Config
full configuration
# Default/example plotman.yaml configuration file
# Options for display and rendering
user_interface:
        # Call out to the `stty` program to determine terminal size, instead of
        # relying on what is reported by the curses library. In some cases,
        # the curses library fails to update on SIGWINCH signals. If the
        # `plotman interactive` curses interface does not properly adjust when
        # you resize the terminal window, you can try setting this to True.
        use_stty_size: True
# Where to plot and log.
directories:
        # One directory in which to store all plot job logs (the STDOUT/
        # STDERR of all plot jobs). In order to monitor progress, plotman
        # reads these logs on a regular basis, so using a fast drive is
        # recommended.
        # log: /home/username/chia/logs
        log: /home/username/.chia/mainnet/plotter
        # One or more directories to use as tmp dirs for plotting. The
        # scheduler will use all of them and distribute jobs among them.
        # It assumes that IO is independent for each one (i.e., that each
        # one is on a different physical device).
        #
        # If multiple directories share a common prefix, reports will
        # abbreviate and show just the uniquely identifying suffix.
        tmp:
                - /home/username/plotter-1/chia-plot-temp
                - /media/username/plotter-2/chia-plot-temp
                - /media/username/plotter-3/chia-plot-temp
                - /media/username/plotter-4/chia-plot-temp
                - /media/username/ssd-os/home/username/ssd-chia-plot-temp
        # Optional: Allows overriding some characteristics of certain tmp
        # directories. This contains a map of tmp directory names to
        # attributes. If a tmp directory and attribute is not listed here,
        # it uses the default attribute setting from the main configuration.
        #
        # Currently support override parameters:
        # - tmpdir_max_jobs
        # tmp_overrides:
        #         # In this example, /mnt/tmp/00 is larger than the other tmp
        #         # dirs and it can hold more plots than the default.
        #         "/mnt/tmp/00":
        #                 tmpdir_max_jobs: 5
        # Optional: tmp2 directory. If specified, will be passed to
        # chia plots create as -2. Only one tmp2 directory is supported.
        # tmp2: /mnt/tmp/a
        # One or more directories; the scheduler will use all of them.
        # These again are presumed to be on independent physical devices,
        # so writes (plot jobs) and reads (archivals) can be scheduled
        # to minimize IO contention.
        dst:
                - /media/username/farmer-01/chia-plots
                - /media/username/farmer-02/chia-plots
                - /media/username/farmer-03/chia-plots
                - /media/username/farmer-04/chia-plots
                - /media/username/farmer-05/chia-plots
                - /media/username/farmer-06/chia-plots
                - /media/username/farmer-07/chia-plots
                - /media/username/farmer-08/chia-plots
        # Archival configuration. Optional; if you do not wish to run the
        # archiving operation, comment this section out.
        #
        # Currently archival depends on an rsync daemon running on the remote
        # host.
        # The archival also uses ssh to connect to the remote host and check
        # for available directories. Set up ssh keys on the remote host to
        # allow public key login from rsyncd_user.
        # Complete example: https://github.com/ericaltendorf/plotman/wiki/Archiving
        # archive:
        #         rsyncd_module: plots   # Define this in remote rsyncd.conf.
        #         rsyncd_path: /plots    # This is used via ssh. Should match path
        #                                # defined in the module referenced above.
        #         rsyncd_bwlimit: 80000  # Bandwidth limit in KB/s
        #         rsyncd_host: myfarmer
        #         rsyncd_user: chia
        #         # Optional index. If omitted or set to 0, plotman will archive
        #         # to the first archive dir with free space. If specified,
        #         # plotman will skip forward up to 'index' drives (if they exist).
        #         # This can be useful to reduce io contention on a drive on the
        #         # archive host if you have multiple plotters (simultaneous io
        #         # can still happen at the time a drive fills up.) E.g., if you
        #         # have four plotters, you could set this to 0, 1, 2, and 3, on
        #         # the 4 machines, or 0, 1, 0, 1.
        #         index: 0
# Plotting scheduling parameters
scheduling:
        # Run a job on a particular temp dir only if the number of existing jobs
        # before [tmpdir_stagger_phase_major : tmpdir_stagger_phase_minor]
        # is less than tmpdir_stagger_phase_limit.
        # Phase major corresponds to the plot phase, phase minor corresponds to
        # the table or table pair in sequence, phase limit corresponds to
        # the number of plots allowed before [phase major : phase minor].
        # e.g., with default settings, a new plot will start only when your plot
        # reaches phase [2 : 1] on your temp drive. This setting takes precedence
        # over global_stagger_m
        tmpdir_stagger_phase_major: 2
        tmpdir_stagger_phase_minor: 1
        # Optional: default is 1
        tmpdir_stagger_phase_limit: 1
        # Don't run more than this many jobs at a time on a single temp dir.
        tmpdir_max_jobs: 3
        # Don't run more than this many jobs at a time in total.
        # Setting 6 because each plotting drive (2 currently) has room for 3, maybe 4 if optimized
        # global_max_jobs: 0
        global_max_jobs: 15
        # Don't run any jobs (across all temp dirs) more often than this, in minutes.
        # (default was 30)
        global_stagger_m: 10
        # How often the daemon wakes to consider starting a new plot job, in seconds.
        polling_time_s: 60
# Plotting parameters. These are pass-through parameters to chia plots create.
# See documentation at
# https://github.com/Chia-Network/chia-blockchain/wiki/CLI-Commands-Reference#create
plotting:
        k: 32
        e: False             # Use -e plotting option
        n_threads: 2         # Threads per job
        n_buckets: 128       # Number of buckets to split data into
        job_buffer: 3389     # Per job memory (default: 3389)
        # If specified, pass through to the -f and -p options. See CLI reference.
        # farmer_pk: ...
        # pool_pk: ...
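As a reading aid for the scheduling comments in the config above, the per-tmpdir stagger rule could be sketched roughly as follows. This is an interpretation of those comments only, not plotman's scheduler code: the function name `can_start_on_tmpdir` and the representation of each job's progress as a `(major, minor)` tuple are assumptions made for the example, and the global limits (`global_max_jobs`, `global_stagger_m`) are deliberately left out.

```python
def can_start_on_tmpdir(job_phases, stagger_phase=(2, 1), stagger_limit=1, max_jobs=3):
    """Decide whether one more job may start on a tmp dir.

    job_phases: list of (major, minor) tuples, one per job already
    running on this tmp dir (hypothetical representation).
    The defaults mirror the values in the config above.
    """
    # tmpdir_max_jobs: hard cap on concurrent jobs per tmp dir.
    if len(job_phases) >= max_jobs:
        return False
    # Count jobs that have not yet reached the stagger phase.
    # Tuples compare element by element, so (1, 7) < (2, 1) < (2, 2).
    early_jobs = sum(1 for phase in job_phases if phase < stagger_phase)
    # tmpdir_stagger_phase_limit: allow a new job only while fewer than
    # this many jobs are still before the stagger phase.
    return early_jobs < stagger_limit
```

For example, with the values above, a tmp dir whose only job is at phase `(1, 3)` would not accept a new job, while one whose only job has reached `(2, 1)` would.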
Additional context & screenshots
Issue Analytics
- State:
- Created 2 years ago
- Comments:6 (1 by maintainers)
Top GitHub Comments
Thanks for the stack trace and for filing an issue. It sounds like the basic problem is that if one of the dirs plotman depends on disappears (i.e., the drive gets unmounted), we die instead of cleanly recovering?
Hi sorry, I thought I subscribed to this issue, but I’ve somehow subscribed to all activity in the repo and lost your reply in the noise.
It’s possible that is the case, but I think what I’m observing was slightly different.
I was using the Chia GUI to experiment with plotting to an external hard drive. That was logging to a default location in .chia/mainnet/plotter.
Separately, I had plotman configured to log into that same directory, because I noticed it would scan the logs of plots from other sources. That way I could keep an eye on the experimental plots and let plotman take them into account when scheduling.
Some kind of error happened with the external hard drive and the mount got really messed up. I eventually had to force unmount it. Any calls to stat that drive were locking up processes.
So it’s possible that was related. Even just trying to list the root directory of the drive with
ls /dev/sdm
would just hang forever. Trying to check the SMART data and grab the temperature, for instance, would also hang forever. So if plotman is doing any of that under the hood, maybe it was stuck waiting on a hung process.
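To illustrate the hung-`stat` scenario, here is a minimal sketch of probing a path with a timeout so that the caller is not blocked indefinitely. It is not something plotman is known to do; `path_responds` and its `timeout_s` parameter are hypothetical names for this example. The probe runs `os.stat` in a daemon thread and gives up waiting after the timeout. Note the limitation: a thread stuck in the kernel on a broken mount is not cleaned up, the caller simply stops waiting for it.

```python
import os
import threading


def path_responds(path, timeout_s=2.0):
    """Return True if os.stat(path) succeeds within timeout_s seconds.

    Returns False if the path is missing, unreadable, or the stat call
    has not returned in time (e.g. a hung mount).
    """
    result = {}

    def _probe():
        try:
            os.stat(path)
            result["ok"] = True
        except OSError:
            result["ok"] = False

    # Daemon thread, so a stuck probe is not joined at interpreter exit.
    worker = threading.Thread(target=_probe, daemon=True)
    worker.start()
    worker.join(timeout_s)
    # If the thread is still blocked, "ok" was never set.
    return result.get("ok", False)
```

A monitoring loop could call this before touching a job's tmp or dst directory and skip the job for that refresh if the path does not respond.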