[🐛BUG] Benchmark preset error with sequential dataset


Hi, I am trying to use a sequential benchmark dataset, but I get the following error:

Traceback (most recent call last):
  File "C:/Users/apein/Documents/projects/humrecsys/seq_rec/run.py", line 68, in <module>
    main(args=args)
  File "C:/Users/apein/Documents/projects/humrecsys/seq_rec/run.py", line 42, in main
    dataset = create_dataset(config)
  File "C:\Users\apein\.virtualenvs\humrecsys-Aky0O99T\lib\site-packages\recbole\data\utils.py", line 41, in create_dataset
    return SequentialDataset(config)
  File "C:\Users\apein\.virtualenvs\humrecsys-Aky0O99T\lib\site-packages\recbole\data\dataset\sequential_dataset.py", line 38, in __init__
    self._benchmark_presets()
  File "C:\Users\apein\.virtualenvs\humrecsys-Aky0O99T\lib\site-packages\recbole\data\dataset\sequential_dataset.py", line 146, in _benchmark_presets
    self.inter_feat[self.item_list_length_field] = self.inter_feat[self.item_id_list_field].agg(len)
AttributeError: 'SequentialDataset' object has no attribute 'item_id_list_field'

This error only appears while using a sequential model. General models work fine with the benchmark dataset.
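The failing line builds an item-list-length column by applying `len` to the column named by `item_id_list_field`, so the preset expects each loaded row to already carry a list of items. The attribute is missing, which suggests it is only created when a matching list column exists in the loaded files. That is an inference from the traceback, not something confirmed in the issue. The sketch below is a hypothetical, simplified stand-in (plain dicts instead of RecBole's data structures, an assumed `_list` suffix, a made-up `ToySequentialDataset` class), not RecBole's actual `_benchmark_presets` code. It only reproduces how such a conditional attribute leads to this `AttributeError`.

```python
# Hypothetical, simplified stand-in for a benchmark-preset step; NOT RecBole's code.
# Assumption: list columns are named "<field>" + LIST_SUFFIX, e.g. "item_id_list".
LIST_SUFFIX = "_list"


class ToySequentialDataset:
    def __init__(self, inter_feat, length_field="item_length"):
        # inter_feat maps column name -> list of per-row values
        self.inter_feat = inter_feat
        self.item_list_length_field = length_field
        self._benchmark_presets()

    def _benchmark_presets(self):
        # Register "<field>_list_field" only for fields whose list column was loaded.
        for field in list(self.inter_feat):
            if field + LIST_SUFFIX in self.inter_feat:
                setattr(self, f"{field}_list_field", field + LIST_SUFFIX)
        # Raises AttributeError if no "item_id_list" column exists, because
        # self.item_id_list_field was never set above.
        self.inter_feat[self.item_list_length_field] = [
            len(seq) for seq in self.inter_feat[self.item_id_list_field]
        ]


ds = ToySequentialDataset({
    "session_id": [1, 1],
    "item_id_list": [[10], [10, 11]],
    "item_id": [11, 12],
})
print(ds.inter_feat["item_length"])  # [1, 2]
```

Passing only `session_id` and `item_id` columns to the toy class fails the same way the traceback does, because no list column was ever registered.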

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:7 (4 by maintainers)

Top GitHub Comments

1 reaction
hyp1231 commented, Dec 3, 2021

Hi, I understand that you want to ensure that session ids don't overlap between the train, validation and test sets. Currently, we suggest pre-splitting the dataset into xxx.train.inter, xxx.valid.inter and xxx.test.inter and loading it via benchmark_filename (see, for example, session_based_rec_example; you can also download the processed diginetica-session dataset [download] to inspect it in detail).
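A minimal sketch of such a pre-split, assuming raw rows of `(session_id, item_id, timestamp)`: whole sessions are shuffled and assigned to exactly one split, and each session is then written as "prefix of items -> next item" rows. The helper names `split_sessions` and `write_benchmark_files` are hypothetical, and the header `session_id:token`, `item_id_list:token_seq`, `item_id:token` (space-separated sequences) is an assumption modeled on RecBole's atomic-file convention. Compare it against the downloaded diginetica-session files before relying on it.

```python
import os
import random
from collections import defaultdict


def split_sessions(rows, ratios=(0.8, 0.1, 0.1), seed=42):
    """Assign whole sessions to train/valid/test so no session id is shared.

    rows: iterable of (session_id, item_id, timestamp) tuples.
    Returns {split_name: {session_id: [item ids in time order]}}.
    """
    sessions = defaultdict(list)
    for session_id, item_id, ts in rows:
        sessions[session_id].append((ts, item_id))

    ids = sorted(sessions)
    random.Random(seed).shuffle(ids)  # reproducible session-level shuffle
    n_train = int(len(ids) * ratios[0])
    n_valid = int(len(ids) * ratios[1])
    groups = {
        "train": ids[:n_train],
        "valid": ids[n_train:n_train + n_valid],
        "test": ids[n_train + n_valid:],
    }
    return {
        name: {sid: [item for _, item in sorted(sessions[sid])] for sid in sids}
        for name, sids in groups.items()
    }


def write_benchmark_files(splits, out_dir, dataset):
    """Write <dataset>.<split>.inter files (assumed header, see lead-in)."""
    os.makedirs(out_dir, exist_ok=True)
    for name, sessions in splits.items():
        path = os.path.join(out_dir, f"{dataset}.{name}.inter")
        with open(path, "w", encoding="utf-8") as f:
            f.write("session_id:token\titem_id_list:token_seq\titem_id:token\n")
            for sid, items in sessions.items():
                # Every prefix of the session predicts the item that follows it.
                for i in range(1, len(items)):
                    prefix = " ".join(str(x) for x in items[:i])
                    f.write(f"{sid}\t{prefix}\t{items[i]}\n")
```

Because the shuffle happens at the session level rather than the row level, a session id can only ever appear in one of the three files.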

The "1. group by user, 2. split" pipeline is mainly designed for general collaborative filtering methods or sequential recommendation methods (especially with leave-one-out evaluation). For session-based recommendation, we recommend following the approach in the example above.
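For contrast, leave-one-out evaluation, which the maintainer mentions for sequential recommendation, deliberately keeps each user in every split and holds out only that user's latest interactions. This is why grouping by user does not keep users apart. The function below is a plain-Python sketch of the general technique with a hypothetical name (`leave_one_out`) and an assumed `(user_id, item_id, timestamp)` row layout. It is not RecBole's implementation.

```python
from collections import defaultdict


def leave_one_out(rows):
    """Per-user leave-one-out split on (user_id, item_id, timestamp) rows.

    For each user, the latest interaction goes to test, the second latest
    to valid, and the rest to train. Users with fewer than 3 interactions
    end up with an empty train part (and possibly an empty valid part).
    """
    history = defaultdict(list)
    for user_id, item_id, ts in rows:
        history[user_id].append((ts, item_id))

    splits = {"train": [], "valid": [], "test": []}
    for user_id, events in history.items():
        items = [item for _, item in sorted(events)]
        splits["train"] += [(user_id, i) for i in items[:-2]]
        splits["valid"] += [(user_id, i) for i in items[-2:-1]]
        splits["test"] += [(user_id, i) for i in items[-1:]]
    return splits
```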

0 reactions
pintonos commented, Dec 3, 2021

Sorry, I have to reopen this. I was wondering about the data splitting. According to the documentation, the group_by evaluation parameter, if set to user, groups the dataset by user (or by session when used sequentially) before splitting it into train, valid and test sets. To me this implies that, e.g., a session id from the valid set is not part of the train set. However, this is not the case! In the split_by_ratio function of Dataset, each session gets split across all 3 sets. Is this intended? What's the point of grouping then? How can I ensure that the data is split session-wise only?
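The behavior described above can be reproduced with a small model: group the rows by session, then cut each session's time-ordered rows by ratio. Every sufficiently long session then lands in all three sets. The helpers `split_within_groups` and `shared_sessions` are hypothetical. This is a simplified model of the reported behavior, and RecBole's actual `split_by_ratio` may round split sizes differently. `shared_sessions` is a quick way to check whether any session id leaks across splits.

```python
from collections import defaultdict


def split_within_groups(rows, ratios=(0.8, 0.1, 0.1)):
    """Group (session_id, item_id, timestamp) rows by session, then cut
    each session's time-ordered rows by ratio (simplified model)."""
    groups = defaultdict(list)
    for session_id, item_id, ts in rows:
        groups[session_id].append((ts, session_id, item_id))

    splits = {"train": [], "valid": [], "test": []}
    for events in groups.values():
        events.sort()  # time order within the session
        n_train = int(len(events) * ratios[0])
        n_valid = int(len(events) * ratios[1])
        splits["train"] += events[:n_train]
        splits["valid"] += events[n_train:n_train + n_valid]
        splits["test"] += events[n_train + n_valid:]
    return splits


def shared_sessions(splits):
    """Return the session ids that occur in more than one split."""
    seen = defaultdict(set)
    for name, events in splits.items():
        for _, session_id, _ in events:
            seen[session_id].add(name)
    return {sid for sid, names in seen.items() if len(names) > 1}


rows = [(s, s * 100 + k, k) for s in range(2) for k in range(10)]
print(shared_sessions(split_within_groups(rows)))  # {0, 1}
```

With 10 interactions per session, each session contributes 8 rows to train, 1 to valid and 1 to test, so both session ids are shared. A session-wise split should make `shared_sessions` return an empty set.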
