Passing an explicit training=True to a DNNRankingNetwork instance when invoking its call() function produces different output during training
Hello Team,
Library versions:
- TensorFlow 2.5.0
- TensorFlow Ranking 0.4.2
TL;DR:
I trained two models (let's call them A and B) with the same fixed seed, the same training dataset input, and the same hyperparameters. On each re-training run, each model consistently and deterministically produces the same results during training. The only difference is that for model B, I explicitly pass training=True to the DNNRankingNetwork instance (basically when invoking its parent's call() function) while building the tf.keras.Model instance used for training.
Because I pass training=True when constructing model B, its training output is consistently and deterministically different from the output of model A.
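For concreteness, here is a minimal sketch of the two variants. The feature columns, input names, and shapes are hypothetical placeholders standing in for what tfr.keras.model.create_keras_model() builds internally; only the training=True argument differs between A and B.

```python
import tensorflow as tf
import tensorflow_ranking as tfr

# Hypothetical minimal feature setup; names and shapes are illustrative
# placeholders, not the real create_keras_model() plumbing.
context_feature_columns = {
    "query_length": tf.feature_column.numeric_column("query_length")}
example_feature_columns = {
    "document_score": tf.feature_column.numeric_column("document_score")}

network = tfr.keras.canned.DNNRankingNetwork(
    context_feature_columns=context_feature_columns,
    example_feature_columns=example_feature_columns,
    hidden_layer_dims=[64, 32])

keras_inputs = {
    "query_length": tf.keras.Input(shape=(1,), name="query_length"),
    "document_score": tf.keras.Input(shape=(None, 1), name="document_score")}
mask = tf.keras.Input(shape=(None,), dtype=tf.bool, name="mask")

# Model A: training is left as None and resolved by Keras at runtime.
logits_a = network(inputs=keras_inputs, mask=mask)
model_a = tf.keras.Model(inputs=[keras_inputs, mask], outputs=logits_a)

# Model B: training is pinned to True at graph-construction time.
logits_b = network(inputs=keras_inputs, mask=mask, training=True)
model_b = tf.keras.Model(inputs=[keras_inputs, mask], outputs=logits_b)
```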
Details
First, I noticed in https://github.com/tensorflow/ranking/blob/v0.4.2/tensorflow_ranking/python/keras/model.py#L31-L78 that when calling the DNNRankingNetwork instance (which is actually an instance of tf.keras.layers.Layer through the inheritance chain), you can pass an additional parameter training=True, e.g.:
network(inputs=keras_inputs, mask=mask, training=True)
According to the docs in the code, this boolean argument controls whether the layer runs in training or inference mode; by default it is set to None.
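To make that contract concrete, here is a standalone illustration with tf.keras.layers.Dropout (nothing TF-Ranking-specific, just the same Layer call convention):

```python
import tensorflow as tf

x = tf.ones((1, 4))
dropout = tf.keras.layers.Dropout(0.5)

# training=None: outside of fit(), Keras resolves this to inference mode,
# so dropout is a no-op.
print(dropout(x))
# training=False: explicit inference mode, also a no-op.
print(dropout(x, training=False))
# training=True: explicit training mode, units are zeroed and rescaled.
print(dropout(x, training=True))
```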
I tried to trace what it does and how it is used. When we pass an input of type TensorSpec to DNNRankingNetwork, we actually invoke its .call(..) under the hood. That in turn calls tensorflow_ranking/python/keras.EncodeListwiseFeatures.call(…), which calls tf.keras.layers.DenseFeatures.call(…).
By default, in DenseFeatures, the training arg is None, which causes the invocation:
training = backend.learning_phase()
Eventually, in DenseFeatures, the training param gets passed down to FeatureColumn.get_dense_tensor(..).
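Schematically, that fallback has the following shape (a paraphrase of the behavior described above, not the verbatim DenseFeatures source):

```python
import tensorflow as tf

def resolve_training(training):
    # An explicitly passed value wins; None falls back to the global
    # Keras learning phase (0/1, or a symbolic tensor in graph mode).
    if training is None:
        training = tf.keras.backend.learning_phase()
    return training
```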
The only place I see that sets the training param to something other than None is the score() definition in DNNRankingNetwork, which defaults to training=True.
That said, it seems that when the DNNRankingNetwork instance is called without the argument (i.e., network(inputs=keras_inputs, mask=mask)), training is None.
Therefore, DNNRankingNetwork's parent RankingNetwork.call(..) invokes UnivariateRankingNetwork.compute_logits(...) with training=None. compute_logits(..) then invokes listwise_scoring(...) with training=None as well, which in turn passes training=None to DNNRankingNetwork.score(...) (the scorer variable inside listwise_scoring(..)), thereby overriding score()'s default of training=True.
In other words, it seems that at runtime, training=None gets propagated to the score(..) of the DNNRankingNetwork instance. Could you please confirm / sanity check?
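A plain-Python model of that propagation makes the mechanics clear (hypothetical stand-ins for the real functions, just to show why score()'s default never applies):

```python
def score(training=True):
    # The default is True, but it only applies when the caller omits the
    # argument entirely.
    return training

def listwise_scoring(scorer, training=None):
    # Passing training explicitly, even as None, overrides the default.
    return scorer(training=training)

def compute_logits(training=None):
    return listwise_scoring(score, training=training)

print(compute_logits())  # -> None, not True
```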
Questions
It seems that passing training=True to the DNNRankingNetwork instance (when invoking its call() function) affects the training output.
- Is there any chance that DNNRankingNetwork's score(..) is called with training=None at runtime, given the current implementation of the function in dnn.py?
- Why is model.py not calling the DNNRankingNetwork instance with an explicit training=True when creating the tf.keras.Model instance?
- As a better practice, should the DNNRankingNetwork layer instance be called with an explicit training=True in general when creating a model instance for training? I am asking because eventually the __call__ function of keras.layers.Layer will be invoked, which has a non-trivial decision tree for handling the training variable to determine whether we are in training mode; check _set_training_mode(…) and _functional_construction_call(…). (A small probe of that resolution logic is sketched after this list.)
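Here is a small hypothetical probe layer (not part of TF-Ranking) showing what Keras resolves training to when the argument is left as None at build time:

```python
import tensorflow as tf

class TrainingProbe(tf.keras.layers.Layer):
    def call(self, inputs, training=None):
        # Python print fires when the call is (re)traced, showing the
        # training value Keras resolved for that trace.
        print("resolved training:", training)
        return inputs

inputs = tf.keras.Input(shape=(1,))
outputs = TrainingProbe()(inputs)     # no explicit training at build time
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="sgd", loss="mse")

x = tf.ones((4, 1))
model.fit(x, x, epochs=1, verbose=0)  # traces with training=True
model.predict(x, verbose=0)           # traces with training=False
```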
Comments
Training model A without passing an explicit training=True to the DNNRankingNetwork layer instance when calling it (i.e.: network(inputs=keras_inputs, mask=mask)) produced one set of per-epoch val loss values. Training model B, calling the DNNRankingNetwork layer instance with an explicit training=True, produced a different set of per-epoch values. The results are different JUST because I passed that argument in as True. (To remind: both models were trained with the same fixed seed, the same training dataset input, and the same hyperparameters, i.e., the training runs are deterministic, as per the TL;DR.) I expected that explicitly setting training=True would NOT produce a different output, since we should already be in training mode, hopefully determined correctly by the framework.

Hi @ramakumar1729, thank you. Did you have a chance to take a look?
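For reference, the comparison amounts to something like the following sketch, where build_model and train_ds are hypothetical placeholders: build_model would reconstruct the network and tf.keras.Model from scratch (as in the earlier snippet), passing training=True only when asked.

```python
import tensorflow as tf

def run(pin_training):
    tf.random.set_seed(42)                # identical fixed seed for both runs
    model = build_model(pin_training)     # hypothetical fresh-model builder
    model.compile(optimizer="adagrad", loss="mse")  # placeholder loss choice
    return model.fit(train_ds, epochs=5, verbose=0).history["loss"]

loss_a = run(pin_training=False)          # model A: training left as None
loss_b = run(pin_training=True)           # model B: training pinned to True

# With everything else held constant, any divergence between the two
# per-epoch loss curves is attributable to the pinned training=True.
print(list(zip(loss_a, loss_b)))
```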