Give sample weight for each row in training data

I see that make_loss_fn has a weights_feature_name argument here: https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/python/losses.py#L50

But I’m not sure what a training data row should look like if my data is in libsvm format. This is what a row currently looks like in my dataset:

0 qid:236145 1:3.4222834 2:7.563366 3:-0.48238873 4:1.
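To make the row layout concrete, here is a minimal sketch in plain Python that splits such a line into label, qid, features, and a weight. The helper name parse_libsvm_row and its weight_key parameter are hypothetical and not part of TensorFlow Ranking; the sketch only assumes that feature "4" carries the weight.

```python
def parse_libsvm_row(line, weight_key="4"):
    """Split one libsvm-style ranking row into its parts.

    Hypothetical helper for illustration only; not a TensorFlow Ranking API.
    """
    tokens = line.split()
    label = float(tokens[0])
    qid = None
    features = {}
    for token in tokens[1:]:
        key, _, value = token.partition(":")
        if key == "qid":
            qid = value
        else:
            features[key] = float(value)
    # Assumption: the weight is stored as an ordinary feature column.
    # Rows without that column fall back to a weight of 1.0.
    weight = features.get(weight_key, 1.0)
    return label, qid, features, weight


# Row copied from the question above.
row = "0 qid:236145 1:3.4222834 2:7.563366 3:-0.48238873 4:1."
label, qid, features, weight = parse_libsvm_row(row)
print(label, qid, weight, sorted(features))
```

The weight stays in the features dictionary here, so it can still be looked up by name; whether it should also be excluded from the model inputs is a separate design choice.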

Feature 4 is the sample weight (i.e. query frequency). I have created a ranking head like this:

ranking_head = tfr.head.create_ranking_head(
      loss_fn=tfr.losses.make_loss_fn(_LOSS, weights_feature_name='4'),
      eval_metric_fns=eval_metric_fns(),
      train_op_fn=_train_op_fn)

But I get the following error:

File "<ipython-input-32-0f108a655421>", line 1, in <module>
    ranker.train(input_fn=lambda: input_data_fn(_TRAIN_DATA_PATH), steps=25000)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1211, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1169, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/model.py", line 285, in _model_fn
    features=features, mode=mode, logits=logits, labels=labels)
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/head.py", line 196, in create_estimator_spec
    features=features, mode=mode, logits=logits, labels=labels)
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/head.py", line 146, in create_loss
    training_loss = self._loss_fn(labels, logits, features)
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/losses.py", line 156, in _loss_fn
    loss_ops.append(loss_fn(**kwargs))
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/losses.py", line 679, in _pairwise_logistic_loss
    _loss, labels, logits, weights, lambda_weight, reduction=reduction)
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/losses.py", line 586, in _pairwise_loss
    labels, logits, weights)
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/losses.py", line 481, in _sort_and_normalize
    weights = array_ops.ones_like(labels) * weights
  File "/Users/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 862, in binary_op_wrapper
    return func(x, y, name=name)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1129, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5042, in mul
    "Mul", x=x, y=y, name=name)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
    op_def=op_def)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Incompatible shapes: [64,3] vs. [64,3,1]
	 [[{{node head/pairwise_logistic_loss/mul}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](head/pairwise_logistic_loss/ones_like, IteratorGetNext:3)]]

[64,3] in the error message corresponds to [batch_size, list_size].
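The mismatch can be reproduced outside TensorFlow. The following is a minimal NumPy sketch, not TensorFlow Ranking code: it assumes the labels have shape [batch_size, list_size] and the weight feature arrives with an extra trailing dimension of size 1, as the error message suggests. Dropping that trailing dimension is shown as one possible way to make the shapes line up; where such a reshape would belong in the actual input pipeline is left open.

```python
import numpy as np

batch_size, list_size = 64, 3

# Shapes taken from the error message: [64,3] vs. [64,3,1].
labels = np.zeros((batch_size, list_size), dtype=np.float32)
weights = np.full((batch_size, list_size, 1), 2.0, dtype=np.float32)


def can_multiply(a, b):
    """Return True if a * b broadcasts, False if the shapes are incompatible."""
    try:
        a * b
        return True
    except ValueError:
        return False


# Mirrors `ones_like(labels) * weights` from the traceback.
mismatch_ok = can_multiply(np.ones_like(labels), weights)

# Assumption: removing the trailing size-1 axis gives per-item weights
# with the same shape as the labels.
squeezed = np.squeeze(weights, axis=2)
scaled = np.ones_like(labels) * squeezed

print(mismatch_ok, squeezed.shape, scaled.shape)
```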

What is the best way to give a query_id a specific weight in the training data?