onnxmodel pwrite broken pipe on CUDAExecutionProvider
Describe the bug
When starting OP on Ubuntu for simulation, the following error sometimes occurs and modeld stops working. This is likely caused by a race condition somewhere, since it sometimes works fine.
_modeld: selfdrive/modeld/runners/onnxmodel.cc:63: void ONNXModel::pwrite(float *, int): Assertion `err >= 0' failed.
According to errno, the specific reason for this error is a write to a broken pipe, but I don’t know why exactly this happens.
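For reference, a minimal sketch of what "writing to a broken pipe" means on a POSIX system: once every read end of a pipe is closed, a write fails with errno EPIPE. This is a standalone illustration, not the openpilot code; the function name `write_to_closed_pipe` is hypothetical. It relies on Python ignoring SIGPIPE by default, so the failure surfaces as an exception instead of killing the process.

```python
import errno
import os


def write_to_closed_pipe() -> int:
    """Write to a pipe that has no reader left and return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # no read end remains, so the pipe is now "broken"
    try:
        os.write(write_fd, b"\x00" * 4)  # stand-in for a small float buffer
    except BrokenPipeError as exc:
        return exc.errno  # EPIPE on POSIX systems
    finally:
        os.close(write_fd)
    return 0  # only reached if the write unexpectedly succeeded


if __name__ == "__main__":
    err = write_to_closed_pipe()
    print(errno.errorcode.get(err, "no error"))
```

If the assertion in onnxmodel.cc fires with this errno, the process or file descriptor at the reading end of the pipe was already gone when the write happened.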
How to reproduce or log data
Run OP on Ubuntu with CUDAExecutionProvider
Expected behavior
modeld works properly
Additional context
I tested for this bug multiple times. From what I’m seeing so far, it rarely happens on onnx’s normal CPUExecutionProvider, but once I switch to the faster CUDAExecutionProvider, the problem happens more frequently. This could be the race condition itself, or maybe it’s just my testing environment.
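One way to narrow this down is to record what happened to the reader when the write fails. The sketch below assumes the reading end of the pipe is a child process, which this report does not confirm. The helper names `write_with_diagnostics` and `demo` are hypothetical, and the child is a throwaway Python one-liner that exits with code 3 without reading its input.

```python
import subprocess
import sys


def write_with_diagnostics(proc: subprocess.Popen, payload: bytes) -> str:
    """Write payload to the child's stdin; on a broken pipe, report the child's exit code."""
    try:
        proc.stdin.write(payload)
    except BrokenPipeError:
        code = proc.wait(timeout=5)  # the reader is gone, so collect its exit status
        return f"broken pipe: reader exited with code {code}"
    return "ok"


def demo() -> str:
    # Hypothetical reader that exits immediately without reading its input.
    child_code = "import sys; sys.exit(3)"
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdin=subprocess.PIPE,
        bufsize=0,  # unbuffered, so the write itself raises the error
    )
    proc.wait()  # make the demo deterministic: the reader is gone before we write
    try:
        return write_with_diagnostics(proc, b"\x00" * 16)
    finally:
        proc.stdin.close()


if __name__ == "__main__":
    print(demo())
```

A non-zero exit code here would point to the reader crashing (for example during provider initialization) and away from a timing problem in the writer.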
Operating system: Ubuntu 20.10
Top GitHub Comments
We haven’t seen this after the fixes in that PR. @iejMac will be spending some more time this week on making the sim stuff high quality, so if this is still an issue, we’ll see it and make sure it’s fixed.
I still get this error when running on the fresh openpilot-sim container (sha256:1a618f25cd) on CPU. My steps to repro are:
(Note: a. fixed ghcr.io docker repo; b. no --gpus all flag; c. create instead of run, as my carla server is on a server and I need to modify the IP address in bridge.py)
followed by:
The model works approximately one time out of five: I see the UI and can control the car with WASD, but OP commands do not work. Pressing 1 gives the “open pilot unavailable communication issue between processes” message, which I’m failing to resolve by commenting out lines in controlsd.py as indicated in the wiki.
Edit: OP commands do work. I managed to fix controlsd.py; it was line 214 that needed commenting out. 👍