jaxlib v0.1.68 causing nondeterministic segfault for only macOS on GitHub Actions and Azure Pipelines servers

Hi. This is a bit of a strange (possible) bug report that we’ve held off on for a few days until we could try to get better reporting information. Both pyhf and awkward have been seeing segfaults on GitHub Actions jobs (for pyhf) and Azure Pipelines jobs (for awkward) since the release of jaxlib v0.1.68. The segfaults happen only with v0.1.68 (c.f. https://github.com/scikit-hep/pyhf/issues/1501)

$ pip list | grep jax
jax                    0.2.16
jaxlib                 0.1.68

and go away if we downgrade to jaxlib<0.1.68 (c.f. https://github.com/scikit-hep/awkward-1.0/pull/963 and https://github.com/scikit-hep/pyhf/pull/1502).

$ pip list | grep jax
jax                    0.2.16
jaxlib                 0.1.67
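A minimal sketch of how a test suite could detect the affected release before running, assuming only that the installed distribution is named jaxlib as in the pip output above. The helper names (parse_version, is_affected, installed_jaxlib_is_affected) are hypothetical, the tuple parsing assumes plain numeric versions such as 0.1.68, and importlib.metadata requires Python 3.8 or newer.

```python
from importlib import metadata

# jaxlib release reported to segfault in this issue
AFFECTED = (0, 1, 68)


def parse_version(text):
    """Turn a plain numeric version string such as '0.1.68' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_affected(version_text):
    """Return True if the given jaxlib version is the one reported in this issue."""
    return parse_version(version_text) == AFFECTED


def installed_jaxlib_is_affected():
    """Check the installed jaxlib; return None if jaxlib is not installed."""
    try:
        return is_affected(metadata.version("jaxlib"))
    except metadata.PackageNotFoundError:
        return None
```

A CI job could call installed_jaxlib_is_affected() early and skip or warn when it returns True, instead of failing with an unexplained segfault.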

The bizarre part is that I am unable to replicate these segfaults on a MacBook Air that I’ve borrowed to debug this.

Minimal Failing Examples on GitHub Actions

The pyhf test suite has been segfaulting during runs as documented in https://github.com/scikit-hep/pyhf/issues/1501. To look at the environment in which this was happening I connected to a tmate session on the GHA servers using the mxschmitt/action-tmate@v3 GHA and I was able to replicate the segfault behavior on GHA with the following examples using just pure JAX

# debug_32b.py
import jax  # noqa: F401
import jax.numpy as jnp

print(jnp.asarray([-2, -1], dtype=jnp.float32))
print(jnp.asarray([-2, -1], dtype=jnp.float64))

# debug_64b.py
import jax  # noqa: F401
from jax.config import config

config.update('jax_enable_x64', True)
import jax.numpy as jnp

# 32b first
jnp.asarray([-2, -1])
# then switch to 64b
jnp.asarray([-2, -1], dtype=jnp.float64)

and the following commands (with the bash-3.2 prefix removed from before the $ for formatting), first with debug_32b.py

$ python debug_32b.py
Segmentation fault: 11
$ python debug_32b.py
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
[-2. -1.]
/Users/runner/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/jax/_src/numpy/lax_numpy.py:3062: UserWarning: Explicitly requested dtype <class 'jax._src.numpy.lax_numpy.float64'> requested in asarray is not available, and will be truncated to dtype float32. To enable more dtypes, set the jax_enable_x64 configuration option or the JAX_ENABLE_X64 shell environment variable. See https://github.com/google/jax#current-gotchas for more.
  lax._check_user_dtype_supported(dtype, "asarray")
[-2. -1.]
$ python debug_32b.py
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
[-2. -1.]
/Users/runner/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/jax/_src/numpy/lax_numpy.py:3062: UserWarning: Explicitly requested dtype <class 'jax._src.numpy.lax_numpy.float64'> requested in asarray is not available, and will be truncated to dtype float32. To enable more dtypes, set the jax_enable_x64 configuration option or the JAX_ENABLE_X64 shell environment variable. See https://github.com/google/jax#current-gotchas for more.
  lax._check_user_dtype_supported(dtype, "asarray")
[-2. -1.]
$ python debug_32b.py
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
[-2. -1.]
/Users/runner/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/jax/_src/numpy/lax_numpy.py:3062: UserWarning: Explicitly requested dtype <class 'jax._src.numpy.lax_numpy.float64'> requested in asarray is not available, and will be truncated to dtype float32. To enable more dtypes, set the jax_enable_x64 configuration option or the JAX_ENABLE_X64 shell environment variable. See https://github.com/google/jax#current-gotchas for more.
  lax._check_user_dtype_supported(dtype, "asarray")
[-2. -1.]
$ python debug_32b.py
Segmentation fault: 11

and the debug_64b.py

$ python debug_64b.py
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
$ python debug_64b.py
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
$ python debug_64b.py
Segmentation fault: 11
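Because the failure is nondeterministic, it can help to measure how often it occurs rather than rerunning by hand. The following is a rough sketch, not part of the original report: the function name count_segfaults is hypothetical, and it relies on the POSIX convention that a child killed by signal N is reported by subprocess with return code -N.

```python
import signal
import subprocess
import sys


def count_segfaults(script_args, runs=10):
    """Run the current Python with `script_args` `runs` times; count SIGSEGV deaths."""
    segfaults = 0
    for _ in range(runs):
        result = subprocess.run(
            [sys.executable, *script_args],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, a negative return code -N means the child was killed by signal N.
        if result.returncode == -signal.SIGSEGV:
            segfaults += 1
    return segfaults
```

For example, count_segfaults(["debug_32b.py"], runs=20) would report how many of 20 runs of the first script crashed on the current machine.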

For the GHA server the environment is as follows (the same happens on the Python 3.8 jobs)

$ python --version --version
Python 3.7.10 (default, Feb 16 2021, 11:44:40)
[Clang 11.0.0 (clang-1100.0.33.17)]
$ printenv
GITHUB_JOB=test
GITHUB_EVENT_PATH=/Users/runner/work/_temp/_github_workflow/event.json
RUNNER_OS=macOS
XCODE_12_DEVELOPER_DIR=/Applications/Xcode_12.4.app/Contents/Developer
ANDROID_HOME=/Users/runner/Library/Android/sdk
GITHUB_BASE_REF=
NVM_CD_FLAGS=
CHROMEWEBDRIVER=/usr/local/Caskroom/chromedriver/91.0.4472.101
SHELL=/bin/bash
TERM=screen-256color
PIPX_BIN_DIR=/usr/local/opt/pipx_bin
GITHUB_REPOSITORY_OWNER=scikit-hep
INPUT_SUDO=true
TMPDIR=/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/
GITHUB_ACTIONS=true
GITHUB_RUN_NUMBER=7368
ANDROID_SDK_ROOT=/Users/runner/Library/Android/sdk
JAVA_HOME_8_X64=/Users/runner/hostedtoolcache/Java_Adopt_jdk/8.0.292-10/x64/Contents/Home
RCT_NO_LAUNCH_PACKAGER=1
RUNNER_WORKSPACE=/Users/runner/work/pyhf
NUNIT_BASE_PATH=/Library/Developer/nunit
RUNNER_PERFLOG=/usr/local/opt/runner/perflog
GITHUB_REF=refs/heads/fix/test-jax-version-that-breaks-ci
GITHUB_WORKFLOW=CI/CD
LC_ALL=en_US.UTF-8
NUNIT3_PATH=/Library/Developer/nunit/3.6.0
JAVA_HOME_11_X64=/Users/runner/hostedtoolcache/Java_Adopt_jdk/11.0.11-9/x64/Contents/Home
RUNNER_TOOL_CACHE=/Users/runner/hostedtoolcache
GITHUB_ACTION_REPOSITORY=mxschmitt/action-tmate
JAVA_HOME_14_X64=/Users/runner/hostedtoolcache/Java_Adopt_jdk/14.0.2-12/x64/Contents/Home
NVM_DIR=/Users/runner/.nvm
USER=runner
GITHUB_API_URL=https://api.github.com
GITHUB_EVENT_NAME=push
GITHUB_SHA=2e371805064fc961c95106c4098702b3696827c3
XCODE_10_DEVELOPER_DIR=/Applications/Xcode_10.3.app/Contents/Developer
RUNNER_TEMP=/Users/runner/work/_temp
pythonLocation=/Users/runner/hostedtoolcache/Python/3.7.10/x64
ANDROID_NDK_ROOT=/Users/runner/Library/Android/sdk/ndk-bundle
ANDROID_NDK_LATEST_HOME=/Users/runner/Library/Android/sdk/ndk/22.1.7171670
ImageVersion=20210620.1
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.I24a5PqIL3/Listeners
GITHUB_SERVER_URL=https://github.com
HOMEBREW_NO_AUTO_UPDATE=1
__CF_USER_TEXT_ENCODING=0x1F5:0:0
AGENT_TOOLSDIRECTORY=/Users/runner/hostedtoolcache
GITHUB_HEAD_REF=
GITHUB_GRAPHQL_URL=https://api.github.com/graphql
TMUX=/tmp/tmate.sock,1968,0
PATH=/Users/runner/hostedtoolcache/Python/3.7.10/x64/bin:/Users/runner/hostedtoolcache/Python/3.7.10/x64:/usr/local/opt/pipx_bin:/Users/runner/.cargo/bin:/usr/local/lib/ruby/gems/2.7.0/bin:/usr/local/opt/ruby@2.7/bin:/usr/local/opt/curl/bin:/usr/local/bin:/usr/local/sbin:/Users/runner/bin:/Users/runner/.yarn/bin:/Users/runner/Library/Android/sdk/tools:/Users/runner/Library/Android/sdk/platform-tools:/Users/runner/Library/Android/sdk/ndk-bundle:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/usr/bin:/bin:/usr/sbin:/sbin:/Users/runner/.dotnet/tools:/Users/runner/.ghcup/bin:/Users/runner/hostedtoolcache/stack/2.7.1/x64
INPUT_LIMIT-ACCESS-TO-ACTOR=false
GITHUB_RETENTION_DAYS=90
PERFLOG_LOCATION_SETTING=RUNNER_PERFLOG
CONDA=/usr/local/miniconda
DOTNET_ROOT=/Users/runner/.dotnet
EDGEWEBDRIVER=/usr/local/share/edge_driver
PWD=/Users/runner/work/pyhf/pyhf
VM_ASSETS=/usr/local/opt/runner/scripts
JAVA_HOME=/Users/runner/hostedtoolcache/Java_Adopt_jdk/8.0.292-10/x64/Contents/Home
JAVA_HOME_12_X64=/Users/runner/hostedtoolcache/Java_Adopt_jdk/12.0.2-10.3/x64/Contents/Home
VCPKG_INSTALLATION_ROOT=/usr/local/share/vcpkg
LANG=en_US.UTF-8
ImageOS=macos1015
TMUX_PANE=%0
XPC_FLAGS=0x0
PIPX_HOME=/usr/local/opt/pipx
GECKOWEBDRIVER=/usr/local/opt/geckodriver/bin
GITHUB_ACTOR=matthewfeickert
XPC_SERVICE_NAME=0
HOME=/Users/runner
SHLVL=4
ACTIONS_RUNTIME_TOKEN=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Ik9ta3lYbmJnM05RTE1nMGZMaTBSNnJxdzlxdyJ9.eyJuYW1laWQiOiJkZGRkZGRkZC1kZGRkLWRkZGQtZGRkZC1kZGRkZGRkZGRkZGQiLCJzY3AiOiJBY3Rpb25zLkdlbmVyaWNS
ZWFkOjAwMDAwMDAwLTAwMDAtMDAwMC0wMDAwLTAwMDAwMDAwMDAwMCBBY3Rpb25zLlVwbG9hZEFydGlmYWN0czowMDAwMDAwMC0wMDAwLTAwMDAtMDAwMC0wMDAwMDAwMDAwMDAvMTpCdWlsZC9CdWlsZC8yMzU5MSBMb2NhdGlvblNlcnZpY2UuQ29ubmVjdCBSZWFkQW5
kVXBkYXRlQnVpbGRCeVVyaTowMDAwMDAwMC0wMDAwLTAwMDAtMDAwMC0wMDAwMDAwMDAwMDAvMTpCdWlsZC9CdWlsZC8yMzU5MSIsIklkZW50aXR5VHlwZUNsYWltIjoiU3lzdGVtOlNlcnZpY2VJZGVudGl0eSIsImh0dHA6Ly9zY2hlbWFzLnhtbHNvYXAub3JnL3dzLz
IwMDUvMDUvaWRlbnRpdHkvY2xhaW1zL3NpZCI6IkRERERERERELUREREQtRERERC1ERERELURERERERERERERERCIsImh0dHA6Ly9zY2hlbWFzLm1pY3Jvc29mdC5jb20vd3MvMjAwOC8wNi9pZGVudGl0eS9jbGFpbXMvcHJpbWFyeXNpZCI6ImRkZGRkZGRkLWRkZGQtZ
GRkZC1kZGRkLWRkZGRkZGRkZGRkZCIsImF1aSI6ImE1YTFiNzdhLWFhYTktNDFiNi05ZTRjLWQ2OWI4NzRiMmRkNCIsInNpZCI6IjU0OGJjZjNlLWU3MWYtNDI2YS1iYTY0LTNkNmZjNjVhM2JkYSIsImFjIjoiW3tcIlNjb3BlXCI6XCJyZWZzL2hlYWRzL2ZpeC90ZXN0
LWpheC12ZXJzaW9uLXRoYXQtYnJlYWtzLWNpXCIsXCJQZXJtaXNzaW9uXCI6M30se1wiU2NvcGVcIjpcInJlZnMvaGVhZHMvbWFzdGVyXCIsXCJQZXJtaXNzaW9uXCI6MX1dIiwib3JjaGlkIjoiMDhhMmQ1NDEtNTU5NC00NjBlLWFlOGQtMjM3YjUyZTYyYjY1LnRlc3Q
ubWFjb3MtbGF0ZXN0XzNfNyIsImlzcyI6InZzdG9rZW4uYWN0aW9ucy5naXRodWJ1c2VyY29udGVudC5jb20iLCJhdWQiOiJ2c3Rva2VuLmFjdGlvbnMuZ2l0aHVidXNlcmNvbnRlbnQuY29tfHZzbzo1YmY5NjQ5Zi01MjlhLTRhYmMtODAwYS1iNThhMDNjZDNlM2IiLC
JuYmYiOjE2MjQ5MTIzOTEsImV4cCI6MTYyNDkzNTE5MX0.o0oeOn2M2Dbx8-K3yhe4JGA7k9KmR9KBoVAujCk29uptx7HOPfB1kba1l4Ofylm1DeKuB0xfMF5Y8ttibvDTgH2HitCC3BMdL64LZ99IUNnjngkuUsGuQsFI3E3uwT3SF6OpQcaeLjtCV3Qx2iUGkPsWM8Tpt
XD0TH4IXw5NJsbx3rKHHC2aSM6384Im-Nu965w_7539XkaIyLkg8MFK9MTIBr0O0HfRxJqvvareP7ufdqDnvY9EVupoVCdSEs3Xe5fuYW_GJvsKHImbsGoRTOgTFgiwOFxYIiMvcjyU1PDjg3ttjBF0JiMmReypLgSsQqUD-BrPIvjKuHYuzQplTg
RUNNER_TRACKING_ID=github_f9075cf8-002b-4c98-a7e2-61bcc0d94891
ANDROID_NDK_18R_PATH=/Users/runner/Library/Android/sdk/ndk/18.1.5063045
GITHUB_WORKSPACE=/Users/runner/work/pyhf/pyhf
CI=true
GITHUB_ACTION_REF=v3
GITHUB_RUN_ID=980389940
ACTIONS_RUNTIME_URL=https://pipelines.actions.githubusercontent.com/7egiF0eguRHanWqGVl5G5J1mX1k4YmsTgFLGKvP1guMOJIVNqS/
LOGNAME=runner
ACTIONS_CACHE_URL=https://artifactcache.actions.githubusercontent.com/7egiF0eguRHanWqGVl5G5J1mX1k4YmsTgFLGKvP1guMOJIVNqS/
GITHUB_ENV=/Users/runner/work/_temp/_runner_file_commands/set_env_af7b2b08-a369-43cf-87cc-23e9c7f65cbc
LC_CTYPE=en_US.UTF-8
HOMEBREW_CLEANUP_PERIODIC_FULL_DAYS=3650
JAVA_HOME_13_X64=/Users/runner/hostedtoolcache/Java_Adopt_jdk/13.0.2-8.1/x64/Contents/Home
HOMEBREW_CASK_OPTS=--no-quarantine
POWERSHELL_DISTRIBUTION_CHANNEL=GitHub-Actions-macos1015
ANDROID_NDK_HOME=/Users/runner/Library/Android/sdk/ndk-bundle
BOOTSTRAP_HASKELL_NONINTERACTIVE=1
XCODE_11_DEVELOPER_DIR=/Applications/Xcode_11.7.app/Contents/Developer
GITHUB_REPOSITORY=scikit-hep/pyhf
GITHUB_PATH=/Users/runner/work/_temp/_runner_file_commands/add_path_af7b2b08-a369-43cf-87cc-23e9c7f65cbc
GITHUB_ACTION=mxschmittaction-tmate
DOTNET_MULTILEVEL_LOOKUP=0
_=/usr/bin/printenv

However, I am unable to replicate this at all on the MacBook Air

$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.13.6
BuildVersion: 17G14042
$ python --version --version
Python 3.8.10 (default, Jun 27 2021, 18:38:01)
[Clang 10.0.0 (clang-1000.10.44.4)]
$ printenv
SSH_AGENT_PID=533
TERM_PROGRAM=iTerm.app
PYENV_ROOT=/Users/cerylinae/.pyenv
TERM=xterm-256color
SHELL=/bin/bash
TMPDIR=/var/folders/rx/t5jm47z56bxfxmbp2qs6fsj80000gn/T/
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.dWj7SOkSaA/Render
TERM_PROGRAM_VERSION=3.3.12
OLDPWD=/Users/cerylinae/Code
TERM_SESSION_ID=w0t0p0:FEA1A898-9304-451B-9F5E-765940B67423
PYENV_VERSION=pyhf-debug
USER=cerylinae
SSH_AUTH_SOCK=/var/folders/rx/t5jm47z56bxfxmbp2qs6fsj80000gn/T//ssh-g3V3yN8vZC0o/agent.532
__CF_USER_TEXT_ENCODING=0x0:0:0
PYENV_VIRTUALENV_INIT=1
VIRTUAL_ENV=/Users/cerylinae/.pyenv/versions/3.8.10/envs/pyhf-debug
PYENV_VIRTUAL_ENV=/Users/cerylinae/.pyenv/versions/3.8.10/envs/pyhf-debug
PATH=/Users/cerylinae/.pyenv/plugins/pyenv-virtualenv/shims:/Users/cerylinae/.pyenv/shims:/Users/cerylinae/.pyenv/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin
PWD=/Users/cerylinae/Code/pyhf
LANG=en_US.UTF-8
ITERM_PROFILE=Default
_OLD_VIRTUAL_PS1=\h:\W \u\$
XPC_FLAGS=0x0
PS1=(pyhf-debug) \h:\W \u\$
XPC_SERVICE_NAME=0
PYENV_SHELL=bash
SHLVL=1
HOME=/Users/cerylinae
COLORFGBG=7;0
LC_TERMINAL_VERSION=3.3.12
ITERM_SESSION_ID=w0t0p0:FEA1A898-9304-451B-9F5E-765940B67423
LOGNAME=cerylinae
LC_TERMINAL=iTerm2
DISPLAY=/private/tmp/com.apple.launchd.39ujEqef0g/org.macosforge.xquartz:0
PYENV_ACTIVATE_SHELL=1
COLORTERM=truecolor
_=/usr/bin/printenv

We thought that we would report this to the JAX team as we are unable to replicate this behavior with older versions of jaxlib. However, since we’re also unable to replicate it locally with jaxlib v0.1.68, we’re happy to open complementary issues with the GitHub Actions virtual environments team as well if you’d like.

We’re happy to do whatever we can to try to help debug this, and if it is of any help there is a branch of the pyhf repo (fix/test-jax-version-that-breaks-ci) that has the examples shown here on it.

cc @lukasheinrich @kratsg @jpivarski

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 13 (5 by maintainers)

Top GitHub Comments

3 reactions
hawkinsp commented, Jul 7, 2021

It looks like this is related to the alignment of an AVX instruction:

(lldb) run t.py
Process 2977 launched: '/Users/runner/hostedtoolcache/Python/3.7.10/x64/python' (x86_64)
Process 2977 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x000000010525c983 xla_extension.so`std::__1::vector<tfrt::internal::EventCount::Waiter, std::__1::allocator<tfrt::internal::EventCount::Waiter> >::vector(unsigned long) + 99
xla_extension.so`std::__1::vector<tfrt::internal::EventCount::Waiter, std::__1::allocator<tfrt::internal::EventCount::Waiter> >::vector:
->  0x10525c983 <+99>:  vmovaps %ymm0, (%rcx)
    0x10525c987 <+103>: vmovaps %ymm0, 0x40(%rcx)
    0x10525c98c <+108>: vmovaps %ymm0, 0x20(%rcx)
    0x10525c991 <+113>: vmovaps %ymm0, 0x80(%rcx)
Target 1: (python) stopped.
(lldb) register read
General Purpose Registers:
       rax = 0x000000012310eeb0
       rbx = 0x000000012310eca0
       rcx = 0x000000012310eeb0
       rdx = 0x0000000000000003
       rdi = 0x00000000000000b1
       rsi = 0x0000000000080000
       rbp = 0x00007ffeefbfcaf0
       rsp = 0x00007ffeefbfcad0
        r8 = 0x0000000000000ae4
        r9 = 0x0000000000000b13
       r10 = 0x00000000fff80000
       r11 = 0x0000000115ecd1f0
       r12 = 0x0000000000000003
       r13 = 0x0000000000000003
       r14 = 0x0000000000000003
       r15 = 0x000000012310f1b0
       rip = 0x000000010525c983  xla_extension.so`std::__1::vector<tfrt::internal::EventCount::Waiter, std::__1::allocator<tfrt::internal::EventCount::Waiter> >::vector(unsigned long) + 99
    rflags = 0x0000000000010202
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000

(lldb) x/4w 0x000000012310eeb0
0x12310eeb0: 0x00000000 0x00000000 0x00000000 0x00000000
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x000000010525c983 xla_extension.so`std::__1::vector<tfrt::internal::EventCount::Waiter, std::__1::allocator<tfrt::internal::EventCount::Waiter> >::vector(unsigned long) + 99
    frame #1: 0x000000010525c4d6 xla_extension.so`tfrt::internal::WorkQueueBase<tfrt::internal::NonBlockingWorkQueue<tfrt::internal::StdThreadingEnvironment> >::WorkQueueBase(tfrt::internal::QuiescingState*, llvm::StringRef, int) + 326
    frame #2: 0x00000001052592c8 xla_extension.so`tfrt::MultiThreadedWorkQueue::MultiThreadedWorkQueue(int, int) + 88
    frame #3: 0x000000010525b74f xla_extension.so`tfrt::CreateMultiThreadedWorkQueue(int, int) + 47
    frame #4: 0x0000000102ae44c0 xla_extension.so`xla::GetTfrtCpuClient(bool) + 112
    frame #5: 0x0000000102832f69 xla_extension.so`void pybind11::cpp_function::initialize<xla::pybind11_init_xla_extension(pybind11::module_&)::$_11, tensorflow::StatusOr<std::__1::shared_ptr<xla::PyClient> >, bool, pybind11::name, pybind11::scope, pybind11::sibling, pybind11::arg_v>(xla::pybind11_init_xla_extension(pybind11::module_&)::$_11&&, tensorflow::StatusOr<std::__1::shared_ptr<xla::PyClient> > (*)(bool), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::arg_v const&)::'lambda'(pybind11::detail::function_call&)::__invoke(pybind11::detail::function_call&) + 185
    frame #6: 0x000000010280f84e xla_extension.so`pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 3166
    frame #7: 0x000000010013592f libpython3.7m.dylib`_PyMethodDef_RawFastCallKeywords + 639
    frame #8: 0x000000010026a72c libpython3.7m.dylib`call_function + 364

I think vmovaps requires 32-byte alignment, but 0x...b0 is only 16-byte aligned.
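The arithmetic behind this observation can be checked directly. This is a small illustrative sketch (the helper is_aligned is not from the original comment) that uses the rcx value from the register dump above, since rcx is the destination of the faulting vmovaps.

```python
def is_aligned(address, alignment):
    """Return True if `address` is a multiple of `alignment` (a power of two)."""
    return (address & (alignment - 1)) == 0


# Destination address of the faulting store, taken from the register dump above
rcx = 0x000000012310EEB0

print(is_aligned(rcx, 16))  # True: the low four bits of 0x...b0 are zero
print(is_aligned(rcx, 32))  # False: bit 4 is set
print(rcx % 32)             # 16: the address is 16 bytes past a 32-byte boundary
```

An aligned 256-bit store to such an address raises a general protection fault, which is consistent with the EXC_I386_GPFLT stop reason shown by lldb.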

This problem only appears to reproduce under the newer TFRT runtime. If you use jax from head, you can set JAX_CPU_BACKEND_VARIANT=stream_executor which works around the problem.

I’ll keep debugging, this is most odd.
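As a sketch of how the workaround mentioned above might be applied from a test harness: the environment variable name and value come from the comment, while the helper names are hypothetical. Per the comment, this only applies to jax built from head at the time; whether any released version honors the variable is not stated here and should be treated as an assumption.

```python
import os
import subprocess
import sys


def env_with_cpu_backend(variant="stream_executor"):
    """Return a copy of the current environment with JAX_CPU_BACKEND_VARIANT set."""
    env = dict(os.environ)
    env["JAX_CPU_BACKEND_VARIANT"] = variant
    return env


def run_with_workaround(script):
    """Run `script` in a child Python process with the workaround variable set."""
    return subprocess.run([sys.executable, script], env=env_with_cpu_backend())
```

Setting the variable for a child process, rather than mutating os.environ in the parent, keeps it in place before jax is imported in the child and leaves the parent environment untouched.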

2 reactions
zhangqiaorjc commented, Jul 7, 2021

Now that the fix is in, I’m building a new jaxlib release.
