Lack of AVX in Docker image leading to Tensorflow crash?
With the default versions of numpy (1.13.1) and tensorflow (1.8.0) given in the Dockerfile, importing TensorFlow fails with the message "Illegal instruction (core dumped)". Downgrading TensorFlow to 1.5.0 fixes the import issue, so I believe the problem is a lack of AVX support in the Docker image (see https://github.com/tensorflow/tensorflow/issues/17411). However, the code doesn't seem to be backward compatible: with tensorflow 1.5.0 (and numpy 1.15.1, since 1.13.1 wasn't compatible) I then get:
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool
Here is my original stack trace via GDB (from the unmodified Dockerfile):
#0 0x00007fffce85dfd0 in std::pair<std::__detail::_Node_iterator<std::pair<tensorflow::StringPiece const, std::function<bool (tensorflow::Variant*)> >, false, true>, bool> std::_Hashtable<tensorflow::StringPiece, std::pair<tensorflow::StringPiece const, std::function<bool (tensorflow::Variant*)> >, std::allocator<std::pair<tensorflow::StringPiece const, std::function<bool (tensorflow::Variant*)> > >, std::__detail::_Select1st, std::equal_to<tensorflow::StringPiece>, tensorflow::hash<tensorflow::StringPiece>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_emplace<std::pair<tensorflow::StringPiece, std::function<bool (tensorflow::Variant*)> > >(std::integral_constant<bool, true>, std::pair<tensorflow::StringPiece, std::function<bool (tensorflow::Variant*)> >&&) ()
    from /opt/conda/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#1 0x00007fffce8618e5 in tensorflow::UnaryVariantOpRegistry::RegisterDecodeFn(std::string const&, std::function<bool (tensorflow::Variant*)> const&) ()
    from /opt/conda/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#2 0x00007fffce83d95c in tensorflow::variant_op_registry_fn_registration::UnaryVariantDecodeRegistration<tensorflow::Tensor>::UnaryVariantDecodeRegistration(std::string const&) ()
    from /opt/conda/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#3 0x00007fffce7a91f5 in _GLOBAL__sub_I_tensor.cc ()
    from /opt/conda/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#4 0x00007ffff7de885a in call_init (l=<optimized out>, argc=argc@entry=2,
    argv=argv@entry=0x7fffffffec18, env=env@entry=0x555555e59d40) at dl-init.c:72
#5 0x00007ffff7de896b in call_init (env=0x555555e59d40, argv=0x7fffffffec18, argc=2,
    l=<optimized out>) at dl-init.c:30
#6 _dl_init (main_map=main_map@entry=0x5555566dfde0, argc=2, argv=0x7fffffffec18,
    env=0x555555e59d40) at dl-init.c:120
#7 0x00007ffff7decf18 in dl_open_worker (a=a@entry=0x7fffffff6380) at dl-open.c:575
#8 0x00007ffff7de8704 in _dl_catch_error (objname=objname@entry=0x7fffffff6370,
    errstring=errstring@entry=0x7fffffff6378, mallocedp=mallocedp@entry=0x7fffffff636f,
    operate=operate@entry=0x7ffff7decb30 <dl_open_worker>, args=args@entry=0x7fffffff6380)
    at dl-error.c:187
How did everyone else fix this (or is this somehow specific to my setup)?
Note: I've only worked with PyTorch, not TensorFlow/Keras.
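One way to check the AVX hypothesis from inside the container is to look at the CPU feature flags the Linux kernel reports in /proc/cpuinfo. This is a minimal sketch that assumes an x86 Linux machine, where each logical CPU has a "flags" line; the function names cpu_flags and host_has_avx are hypothetical helpers, not part of the repository or of TensorFlow.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags from /proc/cpuinfo-style text (x86 Linux).

    Each logical CPU has a line such as "flags : fpu sse sse2 avx ...";
    the union of the flags over all CPUs is returned as a set of strings.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Exact key match, so lines like "vmx flags" are not picked up.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def host_has_avx(cpuinfo_path="/proc/cpuinfo"):
    """Return True if the kernel reports the plain AVX flag for this machine."""
    with open(cpuinfo_path) as f:
        return "avx" in cpu_flags(f.read())
```

If host_has_avx() returns False on the machine that crashes and True on one where the import works, that supports the diagnosis. TensorFlow's 1.6.0 release notes state that prebuilt binaries from that release on use AVX instructions, which would be consistent with 1.5.0 importing while 1.8.0 crashes.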
I was also banking on Docker magic. That being said, I tried it on a Google Cloud machine (with a CPU that supports AVX) and it did work like magic. I'm happy just using that as a secondary machine and calling it via the API from Paperspace (the machine without AVX support), which is super handy, by the way.
I’m a bit busy at the moment, so no PR from me, but do keep me posted. Happy to help test any updates.
You're probably right. I assumed Docker would magically take care of this kind of thing, but I guess not.
This is something we should fix. Would you be interested in doing a PR for this? If not I can put one together but it won’t be until late next week.
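Docker does not emulate or translate CPU instructions: processes in a container execute directly on the host's processor, so a TensorFlow build that uses AVX fails on a host without AVX regardless of the image. As a hedged sketch of a friendlier failure mode (classify_exit, run_probe and probe_import are hypothetical names, not part of the repository or of TensorFlow), the import can be attempted in a child interpreter and its exit status inspected. On POSIX systems, subprocess reports a child killed by signal N as return code -N, so -SIGILL marks the "Illegal instruction" case.

```python
import signal
import subprocess
import sys


def classify_exit(returncode):
    """Map a child interpreter's exit status to a coarse outcome label."""
    if returncode == 0:
        return "ok"
    # On POSIX, subprocess reports "killed by signal N" as returncode == -N.
    # SIGILL is the signal behind "Illegal instruction (core dumped)".
    if returncode == -signal.SIGILL:
        return "illegal-instruction"
    return "failed"


def run_probe(code):
    """Run `code` in a fresh Python process so a crash cannot kill the caller."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return classify_exit(result.returncode)


def probe_import(module_name):
    """Try `import module_name` in a child process and classify the outcome."""
    return run_probe("import " + module_name)
```

A container entrypoint could, for example, call probe_import("tensorflow") and, on "illegal-instruction", print a message pointing at the AVX requirement instead of leaving the user with a bare core dump.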