RFC: Quantized Variables

This Request for Comments (RFC) outlines a proposed addition to the larq API. It would be good to get feedback on this, so please ask questions if I need to elaborate more in some areas. A WIP implementation can be viewed at #312.

Objective

larq has a concept of quantized layers that define a quantized operation using quantized inputs and variables. Quantized layers are currently the only official way to interact with larq. For example, the precision of quantized variables or inputs is cumbersome to access and requires checking layer.quantized_latent_weights and layer.quantizers. As a result, downstream code that needs access to variables often has to rely on private methods. We can fix this either by introducing convenience functions and a stable API on quantized layers to access quantized variables, or by extending the API with the concept of a quantized variable. This document elaborates on the latter, since it enables new use cases for larq and gives users more fine-grained control when using custom layers or no layers at all. This concept can be extended to tf.Tensor for quantized activations, which we can discuss in a future RFC.
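
To illustrate the difference in access patterns, here is a hedged sketch: the first half uses the existing layer attributes named above, while the commented-out last line assumes the proposed QuantizedVariable attributes and is hypothetical.

import larq as lq

# Current situation: downstream code has to pair up layer-level lists to find
# out how a kernel is quantized.
layer = lq.layers.QuantDense(32, kernel_quantizer="ste_sign")
layer.build((None, 64))
for weight, quantizer in zip(layer.quantized_latent_weights, layer.quantizers):
    print(weight.name, quantizer)

# With the proposed API the same information would live on the variable itself
# (hypothetical, not yet implemented):
# print(layer.kernel.precision, layer.kernel.quantizer)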

Design Proposal

There are a few ways we could implement this functionality. For the initial proposal I closely followed the implementation of AutoCastVariable, which simply wraps a tf.Variable. While the variable acts like a quantized variable, its TensorFlow data type remains floating-point in order to stay compatible with all TensorFlow operations. This is typically referred to as fake quantization.

import tensorflow as tf

class QuantizedVariable(tf.Variable):
    def __init__(self, variable, quantizer=None, precision=None):
        """Creates a QuantizedVariable instance.

        # Arguments
        variable: A floating-point resource variable to wrap.
        quantizer: An optional quantizer to transform the floating-point
            variable to a fake quantized variable.
        precision: An optional integer defining the precision of the quantized
            variable. If `None`, `quantizer.precision` is used.
        """
        pass

    @property
    def precision(self):
        """Returns the integer precision of the quantized variable."""
        pass

    def latent_value(self):
        """Returns the latent value of the variable."""
        pass

    # Overload all operators to use the quantized version and delegate all assign
    # operations to the latent weight.

This variable would act like a quantized variable in the forward pass by overloading the default operators like __add__, __mul__, etc., but would delegate updates via self.assign* to the underlying latent variable. This would make reasoning about quantized variables a lot easier and would give features like the model converter or the model summary easy access to all relevant information.
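
As a rough illustration of this wrapping idea (a simplified sketch, not the actual implementation in #312): reads go through the quantizer, writes are forwarded to the latent variable.

import tensorflow as tf

class QuantizedVariableSketch:
    """Simplified stand-in for the proposed wrapper, with a single operator
    overload shown as an example."""

    def __init__(self, variable, quantizer=None, precision=None):
        self._variable = variable  # latent floating-point variable
        self.quantizer = quantizer
        if precision is None and quantizer is not None:
            precision = quantizer.precision
        self.precision = precision

    def value(self):
        # The forward pass sees the (fake) quantized value.
        value = self._variable.value()
        return self.quantizer(value) if self.quantizer is not None else value

    def latent_value(self):
        return self._variable.value()

    def assign(self, value, **kwargs):
        # Optimizer updates are applied to the latent weights.
        return self._variable.assign(value, **kwargs)

    def __mul__(self, other):
        return self.value() * other

    # __add__, __rmul__, __matmul__, etc. would be overloaded in the same way.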

Open Questions

How can we handle variable saving?

For the purpose of resuming training, the latent variable should be saved to the checkpoint. For checkpoints we can achieve this by delegating _gather_saveables_for_checkpoint to the wrapped variable. Will this also work for Keras when saving to .h5, which might call get_weights during saving?
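
For the checkpoint case, the delegation could look roughly like this (a sketch modeled on how AutoCastVariable handles it; _gather_saveables_for_checkpoint is a TensorFlow internal and may change between versions):

import tensorflow as tf

class QuantizedVariable(tf.Variable):
    # ... constructor and properties as sketched above ...

    def _gather_saveables_for_checkpoint(self):
        # Delegate to the wrapped latent variable so that checkpoints contain
        # the full-precision weights needed to resume training.
        return self._variable._gather_saveables_for_checkpoint()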

How should we handle true (fake) quantized variables like in Bop?

The most straightforward approach would be to keep the current design and use quantizer=None and precision=1, though an alternative would be to introduce an extra class.
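
Under the first option, a Bop-style binary variable would simply be constructed without a quantizer (a sketch assuming the QuantizedVariable proposed above and that it exposes the quantizer as an attribute):

import tensorflow as tf

latent = tf.Variable(tf.random.normal([3, 3]))
binary_kernel = QuantizedVariable(latent, quantizer=None, precision=1)

# The optimizer could then identify such variables without a dedicated class:
assert binary_kernel.precision == 1 and binary_kernel.quantizer is None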

How can we support quantizers with trainable variables?

We currently don't have a quantizer with trainable variables in larq core, but will probably add one in the near future (e.g. 8-bit). It would be good to see how this would work with this approach. See also #89.
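
As a purely hypothetical example (not an existing larq quantizer), an 8-bit quantizer with a learned scale would own a trainable variable that the QuantizedVariable wrapper has to track and expose:

import tensorflow as tf

class LearnedScaleQuantizer(tf.Module):
    """Hypothetical 8-bit quantizer that owns a trainable scale."""

    precision = 8

    def __init__(self):
        super().__init__()
        self.scale = tf.Variable(1.0, trainable=True)

    def __call__(self, x):
        scaled = x / self.scale
        # Straight-through estimator for the rounding op, so gradients can
        # flow to both the input and the learned scale.
        rounded = scaled + tf.stop_gradient(tf.round(scaled) - scaled)
        return tf.clip_by_value(rounded, -128.0, 127.0) * self.scale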

Alternative implementations considered

Don’t quantize the value of the variable by default

This would have the advantage that we wouldn't run into any problems with variable saving or other side effects we might have overlooked. However, it would make usability a bit worse, since we would need a context manager to define a scope within which the variable returns the quantized value.
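
A rough sketch of such a scope (all helper names below are hypothetical and not part of larq):

import contextlib

_SHOULD_QUANTIZE = False  # hypothetical module-level flag

@contextlib.contextmanager
def quantized_scope(quantize=True):
    global _SHOULD_QUANTIZE
    previous, _SHOULD_QUANTIZE = _SHOULD_QUANTIZE, quantize
    try:
        yield
    finally:
        _SHOULD_QUANTIZE = previous

# A QuantizedVariable.value() implementation would then only apply the
# quantizer when reading inside `with quantized_scope(): ...`.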

Should we wrap an existing tf.Variable or create a new one?

I think the approach of wrapping an existing tf.Variable is easier to implement and maintain. To me, it also doesn't have major downsides from a user perspective.

Extend quantized layers with methods to return quantized versions of the variables

We could extend the larq layers to allow better access to the underlying variables. However, I think that ultimately this would be a worse abstraction than introducing a new variable type.
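
For comparison, such a layer-level accessor might look like the sketch below; the attribute names kernel_quantizer and kernel are assumptions about the layer internals, not a confirmed API:

import larq as lq

class DenseWithQuantizedKernelAccessor(lq.layers.QuantDense):
    def quantized_kernel(self):
        # Layer-level accessor: usable, but every consumer still has to know
        # about the layer class instead of just holding a variable.
        return self.kernel_quantizer(self.kernel)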

Testing process

Since this touches a few Keras and TensorFlow internals, we need very good testing for it. Below I highlight some code paths that might have non-obvious interactions with the QuantizedVariable and need extra care:

  • Multi-GPU training and tf.DistributedVariable
  • Keras float16 training using AutoCastVariable
  • TPUStrategy (will only be supported by TensorFlow 2.1+, so not a current concern)
  • Model checkpointing and saving (see above)
  • Gradient computation in eager, functional and legacy graph mode (a minimal eager-mode sketch follows below)
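
For the last point, a minimal eager-mode check could look like this (a test sketch that assumes the proposed QuantizedVariable and uses the existing lq.quantizers.SteSign):

import tensorflow as tf
import larq as lq

def test_gradient_reaches_latent_variable():
    latent = tf.Variable([0.5, -0.3])
    var = QuantizedVariable(latent, quantizer=lq.quantizers.SteSign())

    with tf.GradientTape() as tape:
        loss = tf.reduce_sum(var * 2.0)  # goes through the overloaded __mul__

    (grad,) = tape.gradient(loss, [latent])
    # With the straight-through estimator the gradient should pass through to
    # the latent variable for values inside the clipping range.
    tf.debugging.assert_near(grad, [2.0, 2.0])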

Top GitHub Comments

MariaHeuss commented on Jan 31, 2020 (7 reactions)

From the point of view of a heavy Bop user: I like the NoOpQuantizer approach. I am in favor of NoQuantizer or NoOpQuantizer as the name (in this order); FakeQuantizer is indeed too confusing. It also looks pretty intuitive to me, and I cannot think of anything I want to do with Bop that would not work with this approach.

lgeiger commented on Jan 31, 2020 (4 reactions)

It looks like we all prefer going for a NoOpQuantizer approach and are only debating the name at this point. To summarise, the usage for an optimizer like Bop would be:

layer = lq.layers.QuantDense(
    32,
    input_quantizer=lq.quantizers.SteSign(),
    kernel_quantizer=lq.quantizers.NoOpQuantizer(precision=1),
)

Here, NoOpQuantizer doesn't do any quantization; it only sets precision=1 on the variable and includes the relevant training metrics (see #402).

The is_binary check in Bop would change to:

@staticmethod
def is_binary_variable(var):
    return var.precision == 1 and var.quantizer is None

@larq/core I am still very open to more suggestions on how to do this in a better way, but it would be good to come to a decision soon.
