Lack of AVX in Docker image leading to Tensorflow crash?

With the default versions of numpy (1.13.1) and tensorflow (1.8.0) given in the Dockerfile, importing TensorFlow fails with Illegal instruction (core dumped). Downgrading TensorFlow to 1.5.0 fixes the import, so I believe the problem is a lack of AVX support in the Docker image (see https://github.com/tensorflow/tensorflow/issues/17411). However, the code doesn’t seem to be backward compatible: with tensorflow 1.5.0 (and numpy 1.15.1, since 1.13.1 wasn’t compatible), I then get:

tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool

Here is my original stack trace via GDB (from the unmodified Dockerfile):

#0  0x00007fffce85dfd0 in std::pair<std::__detail::_Node_iterator<std::pair<tensorflow::Stri\
ngPiece const, std::function<bool (tensorflow::Variant*)> >, false, true>, bool> std::_Hasht\
able<tensorflow::StringPiece, std::pair<tensorflow::StringPiece const, std::function<bool (t\
ensorflow::Variant*)> >, std::allocator<std::pair<tensorflow::StringPiece const, std::functi\
on<bool (tensorflow::Variant*)> > >, std::__detail::_Select1st, std::equal_to<tensorflow::St\
ringPiece>, tensorflow::hash<tensorflow::StringPiece>, std::__detail::_Mod_range_hashing, st\
d::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hash\
table_traits<true, false, true> >::_M_emplace<std::pair<tensorflow::StringPiece, std::functi\
on<bool (tensorflow::Variant*)> > >(std::integral_constant<bool, true>, std::pair<tensorflow\
::StringPiece, std::function<bool (tensorflow::Variant*)> >&&) ()
   from /opt/conda/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.\
so
#1  0x00007fffce8618e5 in tensorflow::UnaryVariantOpRegistry::RegisterDecodeFn(std::string c\
onst&, std::function<bool (tensorflow::Variant*)> const&) ()
   from /opt/conda/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.\
so
#2  0x00007fffce83d95c in tensorflow::variant_op_registry_fn_registration::UnaryVariantDecod\
eRegistration<tensorflow::Tensor>::UnaryVariantDecodeRegistration(std::string const&) ()
   from /opt/conda/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.\
so
#3  0x00007fffce7a91f5 in _GLOBAL__sub_I_tensor.cc ()
   from /opt/conda/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.\
so
#4  0x00007ffff7de885a in call_init (l=<optimized out>, argc=argc@entry=2,
    argv=argv@entry=0x7fffffffec18, env=env@entry=0x555555e59d40) at dl-init.c:72
#5  0x00007ffff7de896b in call_init (env=0x555555e59d40, argv=0x7fffffffec18, argc=2,
    l=<optimized out>) at dl-init.c:30
#6  _dl_init (main_map=main_map@entry=0x5555566dfde0, argc=2, argv=0x7fffffffec18,
    env=0x555555e59d40) at dl-init.c:120
#7  0x00007ffff7decf18 in dl_open_worker (a=a@entry=0x7fffffff6380) at dl-open.c:575
#8  0x00007ffff7de8704 in _dl_catch_error (objname=objname@entry=0x7fffffff6370,
    errstring=errstring@entry=0x7fffffff6378, mallocedp=mallocedp@entry=0x7fffffff636f,
    operate=operate@entry=0x7ffff7decb30 <dl_open_worker>, args=args@entry=0x7fffffff6380)
    at dl-error.c:187

How did everyone else fix this (or is this somehow specific to my setup)?

Note: I’ve only worked with PyTorch, not Tensorflow/Keras
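
For anyone wanting to confirm the AVX diagnosis before changing TensorFlow versions, here is a minimal sketch that checks whether the CPU advertises AVX. It assumes a Linux x86 host where /proc/cpuinfo has a "flags" line; the helper names (cpu_flags, has_avx, read_cpuinfo) are made up for illustration and are not part of TensorFlow or this repository. A container shares the host's CPU, so running this inside the Docker container should report the same flags as the machine underneath.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Assumption: x86 Linux lists features on lines whose key is "flags".
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """True if the exact flag 'avx' is listed (other flags such as 'avx2' are not matched)."""
    return "avx" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, returning an empty string if it is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    text = read_cpuinfo()
    if not text:
        print("Could not read /proc/cpuinfo; AVX support unknown")
    elif has_avx(text):
        print("CPU reports AVX")
    else:
        print("CPU does not report AVX")
```

If the flag is missing, that would be consistent with the crash above: 1.5.0 imports fine while 1.8.0 dies with an illegal instruction during library initialization. In that case the likely options are staying on an older TensorFlow, building it from source without AVX, or running on a machine whose CPU supports AVX.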

Top GitHub Comments

1 reaction

rachel-1 commented, Aug 29, 2018

I was also banking on Docker magic. That being said, I tried from a Google cloud machine (with a CPU that supports AVX) and it did work like magic. I’m happy with just using a secondary machine and calling it from Paperspace (the one without AVX support) via the API (which is super handy by the way).

I’m a bit busy at the moment, so no PR from me, but do keep me posted. Happy to help test any updates.

0 reactions

bdwyer2 commented, Aug 29, 2018

You’re probably right. I assumed Docker would magically take care of this kind of thing but I guess not.

This is something we should fix. Would you be interested in doing a PR for this? If not I can put one together but it won’t be until late next week.
