onnxmodel pwrite broken pipe on CUDAExecutionProvider

Describe the bug

When starting openpilot (OP) on Ubuntu for simulation, the following error sometimes occurs and modeld stops working. It is likely caused by a race condition somewhere, since the same setup sometimes works fine.

modeld: selfdrive/modeld/runners/onnxmodel.cc:63: void ONNXModel::pwrite(float *, int): Assertion `err >= 0' failed.

According to errno, the write fails because it targets a broken pipe (EPIPE), but I don’t know why the pipe breaks in the first place.
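For readers unfamiliar with this failure mode, here is a minimal sketch of how a broken-pipe write arises in general. It assumes a POSIX system and uses a hypothetical child process that exits immediately as a stand-in for a model runner; it is not openpilot's code, and the function name is made up for illustration.

```python
import errno
import os
import subprocess
import sys


def write_to_exited_child() -> int:
    """Write to the stdin pipe of a child that has already exited.

    Returns the errno raised by the write, or 0 if the write succeeded.
    """
    # Hypothetical stand-in for a runner process that dies during startup.
    child = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.exit(1)"],
        stdin=subprocess.PIPE,
    )
    child.wait()  # the child is gone, so nothing holds the read end of the pipe

    try:
        # Python ignores SIGPIPE by default, so the failed write surfaces
        # as BrokenPipeError instead of killing the parent.
        os.write(child.stdin.fileno(), b"\x00" * 16)
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        child.stdin.close()
    return 0


if __name__ == "__main__":
    code = write_to_exited_child()
    print(errno.errorcode.get(code, "no error"))
```

On Linux this should report EPIPE, which suggests that the interesting question is usually why the reading process went away, not the write itself.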

How to reproduce or log data

Run OP on Ubuntu with CUDAExecutionProvider

Expected behavior

modeld works properly

Additional context

I tested this multiple times. So far, the error rarely happens with ONNX Runtime’s default CPUExecutionProvider, but it becomes more frequent once I switch to the faster CUDAExecutionProvider. This could be the race condition itself, or it could be specific to my testing environment.
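One way to narrow this down is to report why the reading process exited whenever a write fails. The sketch below assumes the model runs in a child process fed through a pipe; `send_input` and `start_failing_runner` are hypothetical helpers for illustration, not part of openpilot or ONNX Runtime.

```python
import subprocess
import sys


def send_input(child: subprocess.Popen, payload: bytes) -> None:
    """Write payload to the child's stdin; if the child is dead, report why."""
    try:
        if child.poll() is not None:
            # Child already exited; a write now would fail with EPIPE.
            raise BrokenPipeError
        child.stdin.write(payload)
        child.stdin.flush()
    except BrokenPipeError:
        code = child.wait()
        stderr = child.stderr.read().decode(errors="replace").strip()
        raise RuntimeError(f"runner exited with code {code}: {stderr}") from None


def start_failing_runner() -> subprocess.Popen:
    """Hypothetical runner that fails at startup, e.g. while loading a provider."""
    script = "import sys; sys.stderr.write('provider init failed'); sys.exit(3)"
    return subprocess.Popen(
        [sys.executable, "-c", script],
        stdin=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


if __name__ == "__main__":
    runner = start_failing_runner()
    runner.wait()  # make the failure deterministic for this demo
    try:
        send_input(runner, b"\x00" * 16)
    except RuntimeError as exc:
        print(exc)
```

Note that the liveness check cannot catch every case: if the child dies after the write has been buffered by the kernel, the failure only shows up on a later write. The captured exit code and stderr are what help distinguish a startup crash from a timing problem.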

Operating system: Ubuntu 20.10

Issue Analytics

Created 3 years ago
Reactions: 2
Comments: 10 (2 by maintainers)

Top GitHub Comments

2 reactions

adeebshihadeh commented, Mar 8, 2021

We haven’t seen this after the fixes in that PR. @iejMac will be spending some more time this week on making the sim stuff high quality, so if this is still an issue, we’ll see it and make sure it’s fixed.

1 reaction

psarka commented, Mar 10, 2021

I still get this error when running on the fresh openpilot-sim container (sha256:1a618f25cd) on CPU. My steps to repro are:

docker create --net=host \
  --name openpilot_client \
  --rm \
  -it \
  --device=/dev/dri \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  --shm-size 1G \
  -e DISPLAY=$DISPLAY \
  -e QT_X11_NO_MITSHM=1 \
  ghcr.io/commaai/openpilot-sim \
  /bin/bash -c "cd /openpilot/tools/sim && ./tmux_script.sh $*"

(Notes: a. the ghcr.io Docker repo path is fixed; b. there is no --gpus all flag; c. I use create instead of run because my CARLA server runs on a separate server and I need to modify the IP address in bridge.py.)

followed by:

docker cp bridge.py openpilot_client:/openpilot/tools/sim/bridge.py
docker start openpilot_client -i

The model works approximately one time out of five. I see the UI and can control the car with WASD, but OP commands do not work: pressing 1 gives the “open pilot unavailable communication issue between processes” message, which I have not been able to resolve by commenting out lines in controlsd.py as indicated in the wiki.

Edit: OP commands do work now. I managed to fix controlsd.py; it was line 214 that needed to be commented out. 👍