Help running napari with IPykernel on GitHub actions
Background
First the good news! Thanks to the efforts of the executable book community, and of @GenevieveBuckley, @potating-potato, and @tlambert03 in napari/napari.github.io#95, I’ve got a very nice workflow going for the skan docs: authoring content in markdown within a jupyter notebook, and executing the notebooks in GitHub Actions, including grabbing screenshots “invisibly” (using :tags: ["remove-input"] at the start of the code cell containing the napari.utils.nbscreenshot() call, as well as any calls setting up e.g. the camera position). I really think this is going to be amazing for our docs!
Also, an aside: I don’t know when this started, but when you restart builds, GitHub Actions now lets you see previous builds!
Ok, the bad news: I am seeing sporadic timeouts in cells that interact with napari. Unfortunately they happen at a high enough frequency that you’d never get a complete docs build for napari itself. So I’m really interested in stamping them out!
I’m starting this as a thread for people to provide ideas, since I know a few people on here have dealt with timeouts in the past, and I’m volunteering the skan repo as a nice place to experiment, since it has just a single page that depends on napari.
Authoring with myst markdown
Here’s the PR implementing myst markdown on skan: jni/skan#151. Summary for a new page:
- install jupytext
- install myst-nb (both are in the docs requirements)
- open a jupyter notebook
- click file > jupytext > pair with myst-nb
- work on your notebook as normal
- when done, close it, delete the ipynb (the .md contains all the info)
- commit the .md file
For editing an existing page, you can either edit the markdown, or
- launch jupyter notebook (again, ensuring you have jupytext and myst-nb installed)
- click on the .md file
- edit as usual
Here’s an example myst markdown source page, and here’s the rendered version.
Rendering napari windows on myst-nb
Any time you want to add a napari screenshot, you can add a cell with contents like so:
:tags: ["remove-input"]
viewer.camera.angles = (-30, 30, -135)
viewer.camera.zoom = 6.5
napari.utils.nbscreenshot(viewer)
This causes a screenshot to appear but the code to not be shown. You can see an example page with napari screenshots here, though the last screenshot is currently missing because of the timeout issue.
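For anyone editing the .md directly rather than through the notebook UI, here is a rough sketch of what such a cell looks like in the MyST source. It just builds the text in Python so the layout is explicit. Note that hidden_cell is a hypothetical helper for illustration, not part of jupytext or myst-nb, and the ipython3 language name is an assumption about the notebook's kernel.

```python
FENCE = "`" * 3  # built at runtime to avoid a literal fence inside this example


def hidden_cell(code, tags=("remove-input",)):
    """Return MyST-NB source text for a code cell carrying the given tags."""
    tag_list = ", ".join(f'"{tag}"' for tag in tags)
    lines = [
        f"{FENCE}{{code-cell}} ipython3",  # opening line of the directive
        f":tags: [{tag_list}]",            # tags go right after the opening line
        "",
        code.strip(),
        FENCE,
    ]
    return "\n".join(lines)


screenshot_code = """
viewer.camera.angles = (-30, 30, -135)
viewer.camera.zoom = 6.5
napari.utils.nbscreenshot(viewer)
"""
cell_text = hidden_cell(screenshot_code)
print(cell_text)
```

The tags line has to come straight after the opening line of the cell, before any code.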
Rendering on CI
In addition to needing Talley’s “setup Qt libs” action, I also copied from @GenevieveBuckley’s napari/napari.github.io#95 starting a display and setting the DISPLAY env variable.
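In the workflow itself this is done in the YAML steps, but the idea can be sketched in Python for anyone who wants to reproduce it locally on Linux. This is only an illustration under a few assumptions: the display number :99 and the screen size are arbitrary choices, and xvfb_command and start_virtual_display are hypothetical helper names, not anything from the skan or napari repos.

```python
import os
import shutil
import subprocess


def xvfb_command(display=":99", screen="1920x1200x24"):
    """Build the Xvfb command line for a single virtual screen."""
    return ["Xvfb", display, "-screen", "0", screen]


def start_virtual_display(display=":99"):
    """Start Xvfb if it is installed and point DISPLAY at it.

    Returns the Popen handle, or None when Xvfb is not available.
    """
    if shutil.which("Xvfb") is None:
        return None
    process = subprocess.Popen(xvfb_command(display))
    # Set DISPLAY before napari (and therefore Qt) is started, so the
    # viewer connects to the virtual display.
    os.environ["DISPLAY"] = display
    return process


print(" ".join(xvfb_command()))
```

Calling start_virtual_display() before launching the docs build, and terminating the returned process afterwards, would mimic the CI setup described above.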
Timeouts
So that just leaves the timeouts. They seem to happen arbitrarily on any cells that have napari interaction, not necessarily screenshot cells. Here are two builds during the PR adding that page:
- In the first build, I get the message “ERROR: Execution Failed with traceback saved in /home/runner/work/skan/skan/doc/_build/html/reports/visualizing_3d_skeletons.log”. The build artifact is unfortunately not preserved, but, spoiler alert, it was a timeout error. Here’s the full traceback:
nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 300 seconds.
The message was: Cell execution timed out.
Here is a preview of the cell contents:
-------------------
skeleton_layer.edge_color = 'branch-distance'
skeleton_layer.edge_colormap = 'viridis'
# for now, we need to set the face color as well
skeleton_layer.face_color = 'branch-distance'
skeleton_layer.face_colormap = 'viridis'
-------------------
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/nbclient/client.py", line 618, in _async_poll_for_reply
msg = await ensure_async(self.kc.shell_channel.get_msg(timeout=new_timeout))
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/nbclient/util.py", line 96, in ensure_async
result = await obj
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/jupyter_client/channels.py", line 230, in get_msg
raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/jupyter_cache/executors/utils.py", line 51, in single_nb_execution
executenb(
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/nbclient/client.py", line 1085, in execute
return NotebookClient(nb=nb, resources=resources, km=km, **kwargs).execute()
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/nbclient/util.py", line 84, in wrapped
return just_run(coro(*args, **kwargs))
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/nbclient/util.py", line 62, in just_run
return loop.run_until_complete(coro)
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/nbclient/client.py", line 551, in async_execute
await self.async_execute_cell(
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/nbclient/client.py", line 830, in async_execute_cell
exec_reply = await self.task_poll_for_reply
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/nbclient/client.py", line 642, in _async_poll_for_reply
await self._async_handle_timeout(timeout, cell)
File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/nbclient/client.py", line 689, in _async_handle_timeout
raise CellTimeoutError.error_from_timeout_and_cell(
nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 300 seconds.
The message was: Cell execution timed out.
Here is a preview of the cell contents:
-------------------
skeleton_layer.edge_color = 'branch-distance'
skeleton_layer.edge_colormap = 'viridis'
# for now, we need to set the face color as well
skeleton_layer.face_color = 'branch-distance'
skeleton_layer.face_colormap = 'viridis'
-------------------
- Re-running the workflow produced no timeout errors and correctly built the page. (build artifact here)
The workflow on main after merging the PR got a timeout in the final cell.
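Since the same error is repeated several times in the report log, it may help when comparing builds to pull out just which cell timed out. Here is a minimal sketch that scans log text for the CellTimeoutError message and the cell preview that follows it. It assumes the log always has the layout shown above (the error line, then the preview between two rules of dashes); timed_out_cells is a hypothetical helper name.

```python
import re

TIMEOUT_RE = re.compile(
    r"CellTimeoutError: A cell timed out while it was being executed, "
    r"after (\d+) seconds\."
)
RULE = "-------------------"


def timed_out_cells(log_text):
    """Return unique (timeout_seconds, cell_preview) pairs found in a log."""
    found = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue
        # The preview sits between the next two rule lines after the error.
        rules = [j for j in range(i + 1, len(lines)) if lines[j].strip() == RULE][:2]
        if len(rules) < 2:
            continue
        preview = "\n".join(lines[rules[0] + 1 : rules[1]])
        entry = (int(match.group(1)), preview)
        if entry not in found:  # the log repeats the same error several times
            found.append(entry)
    return found


sample_log = """\
nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 300 seconds.
The message was: Cell execution timed out.
Here is a preview of the cell contents:
-------------------
skeleton_layer.edge_color = 'branch-distance'
skeleton_layer.edge_colormap = 'viridis'
-------------------
"""
for seconds, preview in timed_out_cells(sample_log):
    print(f"timed out after {seconds}s:\n{preview}")
```

Running this over the logs from several builds would show whether the timeouts cluster on particular cells or really are spread arbitrarily.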
Help!
If anyone has ideas on how to deal with the timeouts, I’m all ears and deeply interested in implementing them! Thank you all! 🙏
@tlambert03
This was on the right track but not quite! TIL that steps and jobs can have a working-directory: tag, and that xvfb-action can’t do bashy things like if/then/fi, and that each xvfb run step is run independent of the others so cd has no effect, and that xvfb-action also has a working-directory: parameter. 😜 You can see the final diff at jni/skan#156.
Anyway, after that change I’ve had two successful builds without timeouts, so maybe that’s fixed it…??? 🤞 Gonna keep fiddling, I’ll very happily close this if I get a few more uneventful builds…!
Possibly related: on the first screenshot cell I get the following warning (this was the case before the xvfb-action change also):
Do I just add XDG_RUNTIME_DIR='/tmp/runtime-runner' to my env? Or is there something cleaner here?
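One alternative to hard-coding a path would be to create a private directory at the start of the build and point XDG_RUNTIME_DIR at it. The sketch below does that in Python. It relies on the XDG Base Directory specification's requirement that the directory be owned by the user with mode 0700, which is what tempfile.mkdtemp produces. Whether this silences the warning on the GitHub runners is an assumption I haven't verified, and ensure_xdg_runtime_dir is a hypothetical helper name.

```python
import os
import tempfile


def ensure_xdg_runtime_dir():
    """Set XDG_RUNTIME_DIR to a private directory if it is not already set."""
    existing = os.environ.get("XDG_RUNTIME_DIR")
    if existing:
        return existing
    # mkdtemp creates a directory that is readable, writable, and
    # searchable only by the current user (mode 0o700 on POSIX).
    path = tempfile.mkdtemp(prefix="runtime-")
    os.environ["XDG_RUNTIME_DIR"] = path
    return path


runtime_dir = ensure_xdg_runtime_dir()
print(runtime_dir)
```

This would need to run before Qt starts, e.g. in a hidden first cell of the notebook or in whatever script launches the build.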
Dang, 3/4 built without error, but timeout happened again.
https://github.com/jni/skan/runs/5171591210?check_suite_focus=true#step:7:46