RFC: Quantized Variables
This Request for Comments (RFC) outlines a proposed addition to the API of `larq`. It would be good to get feedback on this, so please ask questions if I need to elaborate more in some areas. A WIP implementation can be viewed at #312.
Objective
`larq` has a concept of quantized layers that define a quantized operation using quantized inputs and variables. Quantized layers are currently the only official way to interact with `larq`. For example, accessing the precision of quantized variables or inputs is cumbersome and requires checking `layer.quantized_latent_weights` and `layer.quantizers`. This often forces downstream code that needs access to variables to rely on private methods. We can fix this either by introducing convenience functions and a stable API on quantized layers to access quantized variables, or by extending the API with the concept of a quantized variable. This document elaborates on the latter, since it allows new use cases for `larq` and gives users more fine-grained control when using custom layers or no layers at all. The concept can be extended to `tf.Tensor` for quantized activations, which we can discuss in a future RFC.
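To make the difference concrete, here is a rough sketch (not taken from the RFC) of what this means for downstream code; the zip-based pairing of `layer.quantized_latent_weights` and `layer.quantizers` is an assumption about how such code has to combine the two lists today, and the commented-out loop uses the proposed, not-yet-existing `QuantizedVariable` API:

```python
import larq as lq

layer = lq.layers.QuantDense(32, kernel_quantizer="ste_sign")
layer.build((None, 64))

# Today: pair up two parallel lists to find out which variable is quantized
# by which quantizer (assumed pairing, relies on semi-private attributes).
for variable, quantizer in zip(layer.quantized_latent_weights, layer.quantizers):
    print(variable.name, quantizer)

# Under this proposal the information would live on the variable itself:
# for variable in layer.weights:
#     print(variable.name, variable.precision, variable.latent_value())
```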
Design Proposal
There are a few ways we could implement this functionality. For the initial proposal I closely followed the implementation of `AutoCastVariable`, which only wraps `tf.Variable`.
While the variable acts like a quantized variable the TensorFlow data type will still be floating-point in order to be compatible with all TensorFlow operations. This is typically referred to as fake quantization.
```python
import tensorflow as tf


class QuantizedVariable(tf.Variable):
    def __init__(self, variable, quantizer=None, precision=None):
        """Creates a QuantizedVariable instance.

        # Arguments
            variable: A floating-point resource variable to wrap.
            quantizer: An optional quantizer to transform the floating-point
                variable to a fake quantized variable.
            precision: An optional integer defining the precision of the quantized
                variable. If `None`, `quantizer.precision` is used.
        """
        pass

    @property
    def precision(self):
        """Returns the integer precision of the quantized variable."""
        pass

    def latent_value(self):
        """Returns the latent value of the variable."""
        pass

    # Overload all operators to use the quantized version and delegate all assign
    # operations to the latent weight.
```
This variable would act like a quantized variable in the forward pass by overloading the default operators such as `__add__`, `__mul__`, etc., but would delegate updates via `self.assign*` to the underlying latent variable. This would make reasoning about quantized variables a lot easier and would give features like the model converter or model summary easy access to all the relevant information.
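As a minimal sketch of that delegation (attribute names like `latent_variable` are placeholders, not the final API), the overloaded operators would read the quantized value while `assign` writes to the wrapped variable:

```python
import tensorflow as tf


class QuantizedVariable(tf.Variable):
    # Sketch only: assumes `latent_variable` and `quantizer` are stored by
    # `__init__` as in the proposal above.

    def value(self):
        # Forward pass: return the fake-quantized value if a quantizer is set.
        latent = self.latent_variable.value()
        return self.quantizer(latent) if self.quantizer is not None else latent

    def __add__(self, other):
        # Operators act on the quantized value.
        return self.value() + other

    def assign(self, value, use_locking=False, name=None, read_value=True):
        # Updates are delegated to the underlying latent (full-precision) variable.
        return self.latent_variable.assign(value, use_locking, name, read_value)
```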
Open Questions
How can we handle variable saving?
For the purpose of resuming training, the latent variable should be saved to the checkpoint. For checkpoints we can get this behaviour by delegating `_gather_saveables_for_checkpoint` to the wrapped variable. Will this also work for Keras when saving to `.h5`, which might call `get_weights` during saving?
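A possible sketch of the checkpoint delegation (again assuming a `latent_variable` attribute; `_gather_saveables_for_checkpoint` is private TensorFlow API, so this is illustrative only):

```python
class QuantizedVariable(tf.Variable):
    # Continuation of the sketch above.

    def _gather_saveables_for_checkpoint(self):
        # Checkpoints should contain the latent (full-precision) value so that
        # training can be resumed exactly where it left off.
        return self.latent_variable._gather_saveables_for_checkpoint()
```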
How should we handle true (fake) quantized variables like in Bop?
The most straightforward approach would be to keep the current design and use `quantizer=None` and `precision=1`, though an alternative would be to introduce an extra class.
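Under the first option, and assuming the `QuantizedVariable` class sketched above, a Bop-style binary weight could be tagged like this:

```python
import tensorflow as tf

# Sketch: the latent weight is updated directly by the optimizer, so the
# variable carries no quantizer but still advertises a precision of 1.
latent = tf.Variable(tf.random.normal([3, 3]))
binary_kernel = QuantizedVariable(latent, quantizer=None, precision=1)
assert binary_kernel.precision == 1
```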
How can we support quantizers with trainable variables?
We currently don't have a quantizer with trainable variables in `larq` core, but we will probably add one in the near future (e.g. 8-bit). It would be good to see how this would work with this approach. See also #89.
Considered alternative implementations
Don’t quantize the value of the variable by default
This would have the advantage that we won't run into any problems with variable saving or other side effects we might have overlooked. However, it would make usability a bit worse, since we would need a context manager to define a scope within which the variable returns the quantized value.
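A rough sketch of such a context manager (the name `quantized_scope` and the thread-local flag are assumptions, not an existing API):

```python
import contextlib
import threading

_state = threading.local()


@contextlib.contextmanager
def quantized_scope():
    # Inside this scope QuantizedVariable.value() would return the quantized
    # value; outside of it, the latent floating-point value is returned.
    previous = getattr(_state, "quantize", False)
    _state.quantize = True
    try:
        yield
    finally:
        _state.quantize = previous
```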
Should we wrap an existing tf.Variable or create a new one?
I think the approach of wrapping an existing `tf.Variable` is easier to implement and maintain. To me, it also doesn't have major downsides from a user perspective.
Extend quantized layers with methods to return quantized versions of the variables
We could extend the `larq` layers to allow better access to the underlying variables. However, I think this would ultimately be a worse abstraction than introducing a new variable type.
Testing process
Since this touches a few Keras and TensorFlow internals, we need very good testing for it. Below I have highlighted some code paths that might have non-obvious interactions with the `QuantizedVariable` and need extra care (a sketch of one such test follows the list):
- Multi-GPU training and `tf.DistributedVariable`
- Keras float16 training using `AutoCastVariable`
- `TPUStrategy` (will only be supported by TensorFlow 2.1+, so no current concern)
- Model checkpointing and saving (see above)
- Gradient computation in eager, functional and legacy graph mode
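For the gradient case, a test could look roughly like the following sketch; it assumes the `QuantizedVariable` class proposed above and a straight-through-estimator quantizer such as `lq.quantizers.ste_sign`:

```python
import tensorflow as tf
import larq as lq


def test_gradient_flows_to_latent_variable():
    latent = tf.Variable([0.3, -0.7])
    variable = QuantizedVariable(latent, quantizer=lq.quantizers.ste_sign, precision=1)

    with tf.GradientTape() as tape:
        loss = tf.reduce_sum(variable * 2.0)

    # The straight-through estimator should pass gradients back to the latent weight.
    (gradient,) = tape.gradient(loss, [latent])
    assert gradient is not None
```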
Top GitHub Comments
From the point of view of a heavy Bop user: I like the `NoOpQuantizer` approach. I am in favor of `NoQuantizer` or `NoOpQuantizer` as the name (in that order); `FakeQuantizer` is indeed too confusing. It also looks pretty intuitive to me, and I cannot think of anything I want to do with Bop that would not work with this approach.
It looks like we all seem to prefer going for a `NoOpQuantizer` approach and we are only debating the name at this point. To summarise, with an optimizer like Bop, `NoOpQuantizer` doesn't do any quantization; it only sets `precision=1` on the variable and will include the relevant training metrics (see #402). The `is_binary` check in `Bop` would change accordingly.
@larq/core I am still very open to more suggestions on how to do this in a better way, but it would be good to come to a decision soon.
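As a hypothetical illustration only (not taken from the original comment; the function name and check below are assumptions), the `is_binary` change could look like this:

```python
# Hypothetical sketch: with NoOpQuantizer, the kernel is a QuantizedVariable
# whose forward-pass value is unchanged but whose precision is set to 1, so
# Bop can identify binary weights from the variable alone.
def is_binary(variable):
    return getattr(variable, "precision", None) == 1
```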