
onnxmodel pwrite broken pipe on CUDAExecutionProvider

See original GitHub issue

Describe the bug

When starting OP on Ubuntu for simulation, the following error sometimes occurs and modeld stops working. This is likely caused by a race condition somewhere, since the same setup works fine on other runs.

_modeld: selfdrive/modeld/runners/onnxmodel.cc:63: void ONNXModel::pwrite(float *, int): Assertion `err >= 0’ failed.

According to errno, the assertion fails because of a write to a broken pipe, but I don't know exactly why that happens.
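For context, this failure mode is easy to reproduce in isolation: once the process on the read end of a pipe is gone, the next write returns -1 with errno set to EPIPE, which is exactly the condition that trips an `err >= 0` assertion. A minimal Python sketch of the same situation (the stand-in subprocess here is hypothetical, not openpilot code):

# Sketch: writing to a pipe whose reader has exited fails with EPIPE.
# (Python ignores SIGPIPE by default, so the failed write raises OSError
# instead of killing the process.)
import errno
import os
import subprocess

# Stand-in for a runner subprocess that is supposed to read from its stdin.
runner = subprocess.Popen(["true"], stdin=subprocess.PIPE)
runner.wait()  # simulate the runner crashing before the write happens

try:
    os.write(runner.stdin.fileno(), b"\x00" * 1024)
except OSError as e:
    assert e.errno == errno.EPIPE
    print("write failed with EPIPE: the reader side of the pipe is gone")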

How to reproduce or log data

Run OP on Ubuntu with CUDAExecutionProvider
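To check whether CUDA initialization itself is the part that falls over, the provider can also be forced directly in onnxruntime, outside of openpilot. A rough sketch, where the model path and the float32 input dtype are assumptions:

# Sketch: force CUDAExecutionProvider (with CPU fallback) and run one dummy
# inference to confirm the CUDA provider actually initializes.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "supercombo.onnx",  # hypothetical path to the driving model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # CUDAExecutionProvider should be listed first

inputs = {}
for i in sess.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in i.shape]
    inputs[i.name] = np.zeros(shape, dtype=np.float32)  # dtype assumed

outputs = sess.run(None, inputs)
print("inference OK on", sess.get_providers()[0])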

Expected behavior

modeld works properly

Additional context

I have tested for this bug multiple times. From what I'm seeing so far, it rarely happens with onnx's normal CPUExecutionProvider, but once I switch to the faster CUDAExecutionProvider the problem starts happening much more frequently. That could be the race condition itself, or it could just be my testing environment; a rough way to tell the two apart is sketched at the end of this section.

Operating system: Ubuntu 20.10
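One way to separate a genuine race from an environment problem is to capture the runner's exit status and stderr at the moment the write fails, instead of only seeing the assertion. A rough diagnostic sketch with hypothetical names, since the real wiring lives in onnxmodel.cc and the ONNX runner script:

# Diagnostic sketch (hypothetical wrapper, not openpilot code): when a write to
# the runner's stdin breaks, report why the runner died before asserting.
import subprocess
import sys

runner = subprocess.Popen(
    [sys.executable, "onnx_runner.py", "supercombo.onnx"],  # hypothetical command
    stdin=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

def send_frame(buf: bytes) -> None:
    try:
        runner.stdin.write(buf)
        runner.stdin.flush()
    except BrokenPipeError:
        code = runner.wait(timeout=5)  # the reader is gone; reap it
        err = runner.stderr.read().decode(errors="replace")
        raise RuntimeError(f"onnx runner exited with code {code}: {err}")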

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 10 (2 by maintainers)

Top GitHub Comments

2 reactions
adeebshihadeh commented, Mar 8, 2021

We haven’t seen this after the fixes in that PR. @iejMac will be spending some more time this week on making the sim stuff high quality, so if this is still an issue, we’ll see it and make sure it’s fixed.

1 reaction
psarka commented, Mar 10, 2021

I still get this error when running on the fresh openpilot-sim container (sha256:1a618f25cd) on CPU. My steps to repro are:

docker create --net=host \
  --name openpilot_client \
  --rm \
  -it \
  --device=/dev/dri \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  --shm-size 1G \
  -e DISPLAY=$DISPLAY \
  -e QT_X11_NO_MITSHM=1 \
  ghcr.io/commaai/openpilot-sim \
  /bin/bash -c "cd /openpilot/tools/sim && ./tmux_script.sh $*"

(Note: a. the ghcr.io docker repo path is corrected; b. there is no --gpus all flag; c. create is used instead of run because my Carla server runs on a separate machine and I need to modify the IP address in bridge.py; a sketch of that edit follows the commands below.)

followed by:

docker cp bridge.py openpilot_client:/openpilot/tools/sim/bridge.py
docker start openpilot_client -i
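For reference, the bridge.py change mentioned above just points the Carla client at the remote machine; something along these lines, where the host value is a placeholder and the variable names may not match the actual file:

# Hypothetical excerpt of the edit in tools/sim/bridge.py: connect to a
# remote Carla server instead of localhost.
import carla

CARLA_HOST = "192.168.1.50"  # placeholder for the Carla server's address
CARLA_PORT = 2000

client = carla.Client(CARLA_HOST, CARLA_PORT)
client.set_timeout(10.0)
world = client.get_world()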

The model works approximately one time out of five: I see the UI and can control the car with WASD, but OP commands do not work. Pressing 1 gives the "openpilot unavailable: communication issue between processes" message, which I'm failing to resolve by commenting out lines in controlsd.py as indicated in the wiki.

Edit: OP commands do work now. I managed to fix controlsd.py; it was line 214 that needed commenting out. 👍

Read more comments on GitHub >

Top Results From Across the Web

The onnxruntime_backend from triton-inference-server
Hello, I have an ONNX model. I am sharing the input and output dimensions of this model below. ... I need to deploy...
Read more >
