Kernel freeze at tf.keras.Sequential.fit()
What I did?
Link to Colab: https://colab.research.google.com/drive/1g6BFapSuG0-WCQzxlrDsPKCcmaGemB9f?usp=sharing
Please request access with the email connected to your GitHub account - I'll accept it. The notebook is related to my graduation project, and I don't want the work to go fully public yet.
I created a custom layer with a quantum circuit built in quantum_circuit() to represent an 8x8 image: 4 readout qubits, each with two H gates, connected by ZZ**(param) gates to 16 data qubits per readout (an 8x8 extension of what can be found in the MNIST Classification example).
The image is divided into four 4x4 pieces, each connected to a single readout qubit.
The data is encoded similarly to the example (an X gate if normalized_color > 0.5); a minimal sketch of both pieces follows.
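A minimal Cirq sketch of what the encoding and quantum_circuit() build (a sketch only: the helper name convert_to_circuit, the readout placement, and the exact gate ordering are illustrative assumptions modeled on the MNIST tutorial, not the exact notebook code):

import cirq
import numpy as np
import sympy

def convert_to_circuit(image):
    # Threshold encoding from the example: X gate if normalized_color > 0.5.
    qubits = cirq.GridQubit.rect(8, 8)
    circuit = cirq.Circuit()
    for i, value in enumerate(np.ndarray.flatten(image)):
        if value > 0.5:
            circuit.append(cirq.X(qubits[i]))
    return circuit

def quantum_circuit():
    # 64 data qubits in an 8x8 grid; four readout qubits, one per 4x4 piece.
    data_qubits = cirq.GridQubit.rect(8, 8)
    readouts = [cirq.GridQubit(-1, i) for i in range(4)]
    circuit = cirq.Circuit(cirq.H(r) for r in readouts)   # first H layer
    thetas = sympy.symbols('theta0:64')
    for i, q in enumerate(data_qubits):
        piece = 2 * (q.row // 4) + (q.col // 4)           # which 4x4 piece q belongs to
        circuit.append(cirq.ZZ(q, readouts[piece]) ** thetas[i])
    circuit.append(cirq.H(r) for r in readouts)           # second H layer
    return circuit, [cirq.Z(r) for r in readouts]

model_circuit, model_readout = quantum_circuit()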
I attached a softmax layer directly to the quantum one for classification, using a tf.keras.Sequential model, since I want to extend it further - up to all 10 digits.
qnn_model = tf.keras.Sequential([
    tf.keras.Input(shape=(), dtype=tf.string, name='q_input'),
    tfq.layers.PQC(model_circuit, model_readout, name='quantum'),
    tf.keras.layers.Dense(2, activation=tf.keras.activations.softmax, name='softmax'),
])
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
quantum (PQC)                (None, 4)                 64
_________________________________________________________________
softmax (Dense)              (None, 2)                 10
=================================================================
Total params: 74
Trainable params: 74
Non-trainable params: 0
_________________________________________________________________
I compiled the model and tried to fit it; a sketch of that step follows.
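For reference, a hedged sketch of the compile/fit call (the loss, optimizer, and the x_train_tfcirc/y_train names are assumptions in the spirit of the MNIST example, not the exact notebook code):

qnn_model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=['accuracy'])

history = qnn_model.fit(
    x_train_tfcirc, y_train,   # circuits serialized with tfq.convert_to_tensor, one-hot labels
    batch_size=32,
    epochs=10,
    validation_data=(x_test_tfcirc, y_test))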
What was expected to happen?
The model should start iterating over the given number of epochs.
What happened?
Epoch 1/10 is displayed, but nothing else happens.
- On Colab, the kernel restarts, yielding the log that can be found in the Attachments section.
- In a local WSL2 environment, I encountered what I would call 'a kernel freeze': the cell appeared to be running, but nothing was happening - no CPU or RAM usage. The operation could not be interrupted; only a kernel restart worked.
Environment
tensorflow 2.3.1
tensorflow-quantum 0.4.0
for both:
- Google Colab
- Windows Subsystem for Linux 2 (Ubuntu 20.04.1 LTS; Windows 10 Pro, build 20270)
No GPU involved.
What I found out?
When I run the notebook with compressed_image_size = 4, everything works as intended. I've checked my quantum_circuit(), and it also seems to work as intended for the 8x8 version - it generates a circuit with the desired architecture.
When I tried to trace down the error, I found the following in data_adapter.py: enumerate_epochs() yields the correct epoch, but the tf.data.Iterator data_iterator hits AttributeErrors like

AttributeError: 'OwnedIterator' object has no attribute '_self_unconditional_checkpoint_dependencies'

in _checkpoint_dependencies / _deferred_dependencies, and

AttributeError: 'OwnedIterator' object has no attribute '_self_name_based_restores'

in _name_based_restores, and also:

AttributeError("'OwnedIterator' object has no attribute '_self_unconditional_checkpoint_dependencies'")
AttributeError("'OwnedIterator' object has no attribute '_self_unconditional_dependency_names'")
AttributeError("'OwnedIterator' object has no attribute '_self_update_uid'")
I’m not sure if this is relevant.
Attachments
colab-jupyter.log
Dec 15, 2020, 10:41:32 AM | WARNING | WARNING:root:kernel b6193863-8d44-476f-b8cc-eadbe7129967 restarted
Dec 15, 2020, 10:41:32 AM | INFO | KernelRestarter: restarting kernel (1/5), keep random ports
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.133076: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.133022: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1b91640 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.131837: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2199995000 Hz
Dec 15, 2020, 10:40:56 AM | WARNING | To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.125112: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.124271: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (0071d832075f): /proc/driver/nvidia/version does not exist
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.123595: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Dec 15, 2020, 10:40:56 AM | WARNING | 2020-12-15 09:40:56.109400: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
Dec 15, 2020, 10:40:53 AM | WARNING | 2020-12-15 09:40:53.250994: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Dec 15, 2020, 10:37:53 AM | WARNING | WARNING:root:kernel b6193863-8d44-476f-b8cc-eadbe7129967 restarted
Dec 15, 2020, 10:37:53 AM | INFO | KernelRestarter: restarting kernel (1/5), keep random ports
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.601416: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.601370: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x20c3640 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.600345: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2199995000 Hz
Dec 15, 2020, 10:36:24 AM | WARNING | To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.593357: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.592695: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (0071d832075f): /proc/driver/nvidia/version does not exist
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.592632: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Dec 15, 2020, 10:36:24 AM | WARNING | 2020-12-15 09:36:24.531111: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
Dec 15, 2020, 10:36:20 AM | WARNING | 2020-12-15 09:36:20.926549: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Dec 15, 2020, 10:36:01 AM | INFO | Adapting to protocol v5.1 for kernel b6193863-8d44-476f-b8cc-eadbe7129967
Dec 15, 2020, 10:33:42 AM | INFO | Adapting to protocol v5.1 for kernel b6193863-8d44-476f-b8cc-eadbe7129967
Dec 15, 2020, 10:33:41 AM | INFO | Kernel started: b6193863-8d44-476f-b8cc-eadbe7129967
Dec 15, 2020, 10:33:13 AM | INFO | Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Dec 15, 2020, 10:33:13 AM | INFO | http://172.28.0.2:9000/
Dec 15, 2020, 10:33:13 AM | INFO | The Jupyter Notebook is running at:
Dec 15, 2020, 10:33:13 AM | INFO | 0 active kernels
Dec 15, 2020, 10:33:13 AM | INFO | Serving notebooks from local directory: /
Dec 15, 2020, 10:33:13 AM | INFO | google.colab serverextension initialized.
Dec 15, 2020, 10:33:13 AM | INFO | Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
Dec 15, 2020, 10:33:13 AM | WARNING | Config option `delete_to_trash` not recognized by `ColabFileContentsManager`.
Comments
That’s awesome! Always happy to see more publications making use of TFQ!

No problem. So at first glance, I think you’ve solved your own problem in your comment on the side there. The compressed_image_size is too big with a value of 8. Quick review of quantum circuit simulation: simulating n qubits takes 2^n memory. So looking at your code:

compressed_image_size = 8 => compressed_image_shape = (8, 8)

Then in the line:

qubits = cirq.GridQubit.rect(*compressed_image_shape) => len(qubits) == 64

Mathing that out really quick gives a state vector with 2^64 complex amplitudes; with one amplitude taking 64 bits, that means you requested 147 exabytes of RAM. A bit too much 😃. In general, simulations cap out around 30 qubits unless you’ve got some serious hardware, and then you might be able to push things up to 35-40.

My guess is that the malloc call didn’t fail gracefully at that size, which is a bug we should probably look into. Does this help clear things up?
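A quick back-of-the-envelope check of that arithmetic, including why compressed_image_size = 4 works fine:

n_qubits_8 = 8 * 8                 # compressed_image_size = 8 -> 64 qubits
bytes_8 = (2 ** n_qubits_8) * 8    # 2^64 amplitudes at 64 bits (8 bytes) each
print(bytes_8 / 1e18)              # ~147.6 exabytes -- hopeless

n_qubits_4 = 4 * 4                 # compressed_image_size = 4 -> 16 qubits
bytes_4 = (2 ** n_qubits_4) * 8    # 2^16 amplitudes
print(bytes_4 / 1e3)               # ~524 kilobytes -- trivial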