Give sample weight for each row in training data


I see that make_loss_fn has a weights_feature_name argument here: https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/python/losses.py#L50

But I’m not sure what a training data row should look like when my data is in libsvm format. This is what a row currently looks like in my dataset:

0 qid:236145 1:3.4222834 2:7.563366 3:-0.48238873 4:1.
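To make the row layout concrete, here is a minimal sketch of how such a line splits into a label, a query id, features, and a weight. It assumes the weight is stored as an ordinary feature column (index 4 in this dataset); `LibsvmRow` and `parse_row` are hypothetical names used only for illustration and are not part of tf-ranking.

```python
from typing import Dict, NamedTuple


class LibsvmRow(NamedTuple):
    label: float
    qid: str
    features: Dict[str, float]
    weight: float


def parse_row(line: str, weight_key: str = "4") -> LibsvmRow:
    """Split one libsvm ranking row into label, qid, features, and weight."""
    tokens = line.split()
    label = float(tokens[0])
    qid = ""
    features: Dict[str, float] = {}
    for token in tokens[1:]:
        key, _, value = token.partition(":")
        if key == "qid":
            qid = value
        else:
            features[key] = float(value)
    # Assumption: the weight is an ordinary feature column.
    # Rows that lack it fall back to a neutral weight of 1.0.
    weight = features.pop(weight_key, 1.0)
    return LibsvmRow(label, qid, features, weight)


# Row copied verbatim from the post above.
row = parse_row("0 qid:236145 1:3.4222834 2:7.563366 3:-0.48238873 4:1.")
print(row.qid, row.weight, sorted(row.features))
```

Every document that shares a qid would carry the same weight value if the weight is meant to apply per query.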

Feature 4 is the sample weight (i.e. query frequency). I have created a ranking head like this:

ranking_head = tfr.head.create_ranking_head(
      loss_fn=tfr.losses.make_loss_fn(_LOSS, weights_feature_name='4'),
      eval_metric_fns=eval_metric_fns(),
      train_op_fn=_train_op_fn)

But I get the following error:

File "<ipython-input-32-0f108a655421>", line 1, in <module>
    ranker.train(input_fn=lambda: input_data_fn(_TRAIN_DATA_PATH), steps=25000)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1211, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1169, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/model.py", line 285, in _model_fn
    features=features, mode=mode, logits=logits, labels=labels)
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/head.py", line 196, in create_estimator_spec
    features=features, mode=mode, logits=logits, labels=labels)
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/head.py", line 146, in create_loss
    training_loss = self._loss_fn(labels, logits, features)
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/losses.py", line 156, in _loss_fn
    loss_ops.append(loss_fn(**kwargs))
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/losses.py", line 679, in _pairwise_logistic_loss
    _loss, labels, logits, weights, lambda_weight, reduction=reduction)
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/losses.py", line 586, in _pairwise_loss
    labels, logits, weights)
  File "/Users/lib/python2.7/site-packages/tensorflow_ranking/python/losses.py", line 481, in _sort_and_normalize
    weights = array_ops.ones_like(labels) * weights
  File "/Users/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 862, in binary_op_wrapper
    return func(x, y, name=name)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1129, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5042, in mul
    "Mul", x=x, y=y, name=name)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
    op_def=op_def)
  File "/Users/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Incompatible shapes: [64,3] vs. [64,3,1]
	 [[{{node head/pairwise_logistic_loss/mul}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](head/pairwise_logistic_loss/ones_like, IteratorGetNext:3)]]

[64,3] in the error message corresponds to [batch_size, list_size].
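The mismatch can be reproduced outside TensorFlow. The sketch below uses NumPy as a stand-in, since it aligns shapes from the trailing dimension in the same way. It assumes the weight feature arrives with an extra trailing dimension of size 1, which is what the [64,3,1] in the error suggests.

```python
import numpy as np

batch_size, list_size = 64, 3
labels = np.zeros((batch_size, list_size), dtype=np.float32)
# Assumed weight layout: [batch_size, list_size, 1].
weights = np.ones((batch_size, list_size, 1), dtype=np.float32)

try:
    # Mirrors `ones_like(labels) * weights` from the traceback.
    np.ones_like(labels) * weights
    broadcast_ok = True
except ValueError as err:
    # Shapes are aligned from the right: (64, 3) is compared against
    # (3, 1), so 64 is paired with 3 and the multiplication fails.
    broadcast_ok = False
    print("broadcast failed:", err)
```

The failure comes from the ranks differing, not from the weight values themselves.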

What is the best way to give a query_id a specific weight in the training data?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

xuanhuiwang commented, Aug 5, 2019 (1 reaction)

@PoorvaRane that should work for the losses that accept per-example weights (most of them do).

xuanhuiwang commented, Aug 2, 2019 (1 reaction)

Thanks for the help, @eggie5. This was an issue before; we fixed it in the latest release, and the code is here: https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/python/losses.py#L121.

You may want to upgrade your tf-ranking library or explicitly call tf.squeeze(…, axis=2) in your transform_fn.
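As a rough illustration of the squeeze workaround, the sketch below drops the trailing dimension from the weight feature only. It uses NumPy arrays in place of tensors so it runs without TensorFlow; `squeeze_weights` is a hypothetical helper, and the feature dictionary layout (one [batch_size, list_size, 1] entry per feature id) is an assumption based on the shapes in the error message. In a real pipeline the same step would use tf.squeeze inside the transform_fn.

```python
import numpy as np


def squeeze_weights(features, weights_key="4"):
    """Return a copy of `features` with the weight column reshaped to 2-D.

    Hypothetical helper: with TensorFlow this logic would live in the
    transform_fn and call tf.squeeze(..., axis=2) on the weight tensor.
    """
    out = dict(features)
    weights = out[weights_key]
    # Only squeeze when the trailing dimension really has size 1.
    if weights.ndim == 3 and weights.shape[2] == 1:
        out[weights_key] = np.squeeze(weights, axis=2)
    return out


batch_size, list_size = 64, 3
# Assumed layout: every feature is [batch_size, list_size, 1].
features = {
    name: np.ones((batch_size, list_size, 1), dtype=np.float32)
    for name in ("1", "2", "3", "4")
}
fixed = squeeze_weights(features)

labels = np.zeros((batch_size, list_size), dtype=np.float32)
# The multiplication from the traceback now has matching shapes.
weighted = np.ones_like(labels) * fixed["4"]
print(fixed["4"].shape, fixed["1"].shape, weighted.shape)
```

The other feature columns keep their trailing dimension, and the input dictionary is left unmodified.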
