[🐛BUG] Benchmark preset error with sequential dataset
Hi, I am trying to use a sequential benchmark dataset, but I get the following error:
```
Traceback (most recent call last):
  File "C:/Users/apein/Documents/projects/humrecsys/seq_rec/run.py", line 68, in <module>
    main(args=args)
  File "C:/Users/apein/Documents/projects/humrecsys/seq_rec/run.py", line 42, in main
    dataset = create_dataset(config)
  File "C:\Users\apein\.virtualenvs\humrecsys-Aky0O99T\lib\site-packages\recbole\data\utils.py", line 41, in create_dataset
    return SequentialDataset(config)
  File "C:\Users\apein\.virtualenvs\humrecsys-Aky0O99T\lib\site-packages\recbole\data\dataset\sequential_dataset.py", line 38, in __init__
    self._benchmark_presets()
  File "C:\Users\apein\.virtualenvs\humrecsys-Aky0O99T\lib\site-packages\recbole\data\dataset\sequential_dataset.py", line 146, in _benchmark_presets
    self.inter_feat[self.item_list_length_field] = self.inter_feat[self.item_id_list_field].agg(len)
AttributeError: 'SequentialDataset' object has no attribute 'item_id_list_field'
```
This error only appears while using a sequential model. General models work fine with the benchmark dataset.
Issue Analytics
- Created: 2 years ago
- Comments: 7 (4 by maintainers)
Hi, I understand that you want to ensure that session ids don't overlap between the train set, validation set, and test set. Currently, we suggest pre-splitting the dataset into `xxx.train.inter`, `xxx.valid.inter`, and `xxx.test.inter` and loading them via `benchmark_filename` (for one example, see session_based_rec_example; you can also download the processed `diginetica-session` dataset [download] to inspect it in detail). The "1. group by user, 2. split" pipeline is mainly designed for general collaborative filtering methods or sequential recommendation methods (especially leave-one-out evaluation). For session-based recommendation, we recommend following the approach in the above example.
Sorry, I have to reopen this. I was wondering about the data splitting: according to the documentation, the `group_by` evaluation parameter, if set to `user`, groups the dataset by user (or by session, in the sequential case) before splitting it into train, valid, and test sets. To me this would imply that, e.g., a session id from the valid set is not part of the train set. However, this is not the case! In the `split_by_ratio` function of `Dataset`, each session gets split across the three sets according to the ratio. Is this intended? What's the point of grouping then? How can I ensure the data is split session-wise only?
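For context, the behavior described above can be illustrated with a simplified pure-Python sketch (not RecBole's actual `split_by_ratio` implementation): grouping keeps each session's interactions together and in order, but the ratio is then applied *within* each group, so every session contributes rows to all three splits.

```python
from collections import defaultdict

def split_by_ratio_grouped(interactions, ratios=(0.8, 0.1, 0.1)):
    """Split (session_id, item) pairs per group, mimicking how a
    grouped ratio split distributes EACH group's rows across
    train/valid/test. Simplified sketch, not RecBole's code."""
    groups = defaultdict(list)
    for session_id, item in interactions:
        groups[session_id].append((session_id, item))

    train, valid, test = [], [], []
    for rows in groups.values():
        n = len(rows)
        n_train = int(n * ratios[0])
        n_valid = int(n * ratios[1])
        # The split boundaries fall INSIDE each session's rows:
        train.extend(rows[:n_train])
        valid.extend(rows[n_train:n_train + n_valid])
        test.extend(rows[n_train + n_valid:])
    return train, valid, test

# Two sessions with 10 interactions each:
data = [(s, i) for s in ("A", "B") for i in range(10)]
train, valid, test = split_by_ratio_grouped(data)

# Each session appears in all three splits, which is exactly
# the overlap observed in the question:
assert {s for s, _ in train} == {"A", "B"}
assert {s for s, _ in valid} == {"A", "B"}
assert {s for s, _ in test} == {"A", "B"}
```

A split with no session overlap would instead have to assign *whole groups* to one split each, which is what the pre-split `benchmark_filename` approach achieves.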