Regression: custom batches no longer supported

Describe the bug:

In joeynmt<2.0.0, the TrainManager used to receive a batch_class to handle custom batches. In joeynmt=2.0.0, it is no longer needed, because make_data_iter calls a collate_fn that is responsible for building the batch and can return whatever it likes.

This created a regression: it used to be possible to use this repository as a library, but now one has to make hard-coded changes in the code in order to have a custom make_data_iter.

The call to make_data_iter is hard-coded in both training and prediction.

Possible Solutions:

For training:

- have a data_iter callable passed to the TrainManager
- in the TrainManager, have a method so we can extend it:

def make_data_iter(self, **kwargs):
  return make_data_iter(**kwargs)
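To make the second training option concrete, here is a minimal sketch of the hook pattern. It assumes nothing about the real joeynmt classes: the TrainManager below, the module-level make_data_iter with its dataset and batch_size parameters, and CustomTrainManager are all hypothetical stand-ins used only for illustration.

```python
from typing import Any, Dict, Iterator, List


def make_data_iter(dataset: List[Any], batch_size: int) -> Iterator[List[Any]]:
    """Hypothetical stand-in for the library's default iterator factory."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]


class TrainManager:
    """Simplified stand-in, not the real joeynmt TrainManager."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size

    def make_data_iter(self, **kwargs: Any) -> Iterator[Any]:
        # Extension hook: by default, delegate to the module-level function.
        return make_data_iter(**kwargs)

    def train(self, dataset: List[Any]) -> List[Any]:
        # Training always goes through the hook, so subclasses can swap it out.
        batches = self.make_data_iter(dataset=dataset, batch_size=self.batch_size)
        return list(batches)


class CustomTrainManager(TrainManager):
    """User code: overrides the hook without touching the library source."""

    def make_data_iter(self, **kwargs: Any) -> Iterator[Dict[str, Any]]:
        # Wrap every default batch in a custom batch structure.
        for batch in super().make_data_iter(**kwargs):
            yield {"items": batch, "size": len(batch)}
```

With this shape, a library user only subclasses and overrides one method; the training loop itself stays unchanged.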

For prediction:

pass the data_iter callable as an argument to the predict method?
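A rough sketch of what passing the iterator factory into prediction could look like. The predict function here is not joeynmt's real predict: its signature, the data_iter_fn parameter name, default_data_iter, and the upper-casing "model" are all assumptions made up for this example.

```python
from typing import Callable, Iterator, List

# Type of an iterator factory: (dataset, batch_size) -> batches
DataIterFn = Callable[[List[str], int], Iterator[List[str]]]


def default_data_iter(dataset: List[str], batch_size: int) -> Iterator[List[str]]:
    """Hypothetical default: fixed-size batches in dataset order."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]


def predict(
    dataset: List[str],
    batch_size: int,
    data_iter_fn: DataIterFn = default_data_iter,
) -> List[str]:
    """Stand-in for a predict function that accepts a custom iterator factory."""
    outputs: List[str] = []
    for batch in data_iter_fn(dataset, batch_size):
        # Placeholder for model inference: upper-case each input.
        outputs.extend(item.upper() for item in batch)
    return outputs


def length_sorted_iter(dataset: List[str], batch_size: int) -> Iterator[List[str]]:
    """User-supplied factory: batch the longest inputs first."""
    ordered = sorted(dataset, key=len, reverse=True)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]
```

Callers who do nothing get the default behaviour, while `predict(data, 2, data_iter_fn=length_sorted_iter)` swaps in custom batching without editing library code.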

Which of these solutions, if any, would you prefer or approve?

Issue Analytics

Created a year ago
Comments: 17 (8 by maintainers)

Top GitHub Comments

AmitMY commented, Aug 31, 2022

I moved it to two files as @may- and @juliakreutzer both thought “three classes for data-related functions might be confusing”.

I’d argue for merging and releasing as is (to support the functionality), and if down the line some restructuring needs to be done, that’s fine.

may- commented, Aug 27, 2022

@AmitMY

Do we need to separate the load_data() func from data.py? I wonder if we really need both data.py and data_loader.py. If it’s just because of the circular import problem, we could move make_data_iter() from data.py to datasets.py, maybe? make_data_iter() is now actually a member of the dataset class. And other sampler funcs + collate func belong to this make_data_iter(). What do you think about this?

I used a nasty hack to avoid a circular import before, and I regret it… 🤕 https://github.com/joeynmt/joeynmt/blob/38fcd3a2b19ee657f355cd9c6833c19cf29fb703/joeynmt/helpers.py#L29-L31 I should have put log_data_info() in the data.py file instead.