Regression: custom batches no longer supported

Describe the bug: In joeynmt<2.0.0, the TrainManager accepted a batch_class argument to handle custom batches. In joeynmt 2.0.0 that argument is no longer needed, because make_data_iter calls a collate_fn, which can return any batch object.

This is a regression: it used to be possible to use this repository as a library, but now one has to edit the library code directly in order to use a custom make_data_iter.

The call to make_data_iter is hard-coded in both training and prediction.

Possible Solutions:

For training:

  1. pass a data_iter callable to the TrainManager
  2. add a method to the TrainManager that subclasses can override:

     def make_data_iter(self, **kwargs):
         return make_data_iter(**kwargs)

For prediction:

  1. pass this as an argument to the predict method?
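
To make the options above concrete, here is a minimal sketch in plain Python. It is not joeynmt's code: the TrainManager, predict, and make_data_iter below are simplified stand-ins with made-up signatures, and data_iter_fn, dict_batches, and CustomTrainManager are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Iterable, List, Optional


def make_data_iter(dataset: List[Any], batch_size: int) -> Iterable[Any]:
    """Stand-in for the library default (assumed signature, not joeynmt's)."""
    for i in range(0, len(dataset), batch_size):
        yield dataset[i:i + batch_size]


class TrainManager:
    """Simplified, hypothetical trainer; only the batching hook is shown."""

    def __init__(self, data_iter_fn: Optional[Callable[..., Iterable[Any]]] = None):
        # Training option 1: the caller injects a custom iterator factory.
        self._data_iter_fn = data_iter_fn

    def make_data_iter(self, **kwargs: Any) -> Iterable[Any]:
        # Training option 2: subclasses can override this method.
        if self._data_iter_fn is not None:
            return self._data_iter_fn(**kwargs)
        return make_data_iter(**kwargs)

    def train_epoch(self, dataset: List[Any], batch_size: int) -> List[Any]:
        # A real trainer would run a training step per batch; here the
        # batches are simply collected so the effect is visible.
        return list(self.make_data_iter(dataset=dataset, batch_size=batch_size))


def predict(
    dataset: List[Any],
    batch_size: int,
    data_iter_fn: Callable[..., Iterable[Any]] = make_data_iter,
) -> List[Any]:
    # Prediction option: the iterator factory is an argument with a default.
    return list(data_iter_fn(dataset=dataset, batch_size=batch_size))


def dict_batches(dataset: List[Any], batch_size: int) -> Iterable[dict]:
    """Hypothetical custom batching: yields dicts instead of plain lists."""
    for i in range(0, len(dataset), batch_size):
        chunk = dataset[i:i + batch_size]
        yield {"items": chunk, "size": len(chunk)}


class CustomTrainManager(TrainManager):
    def make_data_iter(self, **kwargs: Any) -> Iterable[Any]:
        return dict_batches(**kwargs)
```

With either training option, TrainManager(data_iter_fn=dict_batches) and CustomTrainManager() would produce the same custom batches, without any change to the library code itself.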

What solutions would you prefer/approve if any?

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 17 (8 by maintainers)

Top GitHub Comments

1 reaction
AmitMY commented, Aug 31, 2022

I moved it to two files as @may- and @juliakreutzer both thought “three classes for data-related functions might be confusing”.

I’d argue for merging and releasing as is (to support the functionality), and if down the line some restructuring needs to be done, that’s fine.

1 reaction
may- commented, Aug 27, 2022

@AmitMY

Do we need to separate the load_data() func from data.py? I wonder if we really need both data.py and data_loader.py. If it’s just because of the circular import problem, maybe we could move make_data_iter() from data.py to datasets.py? make_data_iter() is now actually a member of the dataset class, and the other sampler funcs and the collate func belong with make_data_iter(). What do you think about this?

I did a nasty hack to avoid a circular import before, and I regret it… 🤕 https://github.com/joeynmt/joeynmt/blob/38fcd3a2b19ee657f355cd9c6833c19cf29fb703/joeynmt/helpers.py#L29-L31 I should have put log_data_info() in the data.py file instead.
