[🐛BUG] Benchmark preset error with sequential dataset
Hi, I am trying to use a sequential benchmark dataset, but I get the following error:
```
Traceback (most recent call last):
  File "C:/Users/apein/Documents/projects/humrecsys/seq_rec/run.py", line 68, in <module>
    main(args=args)
  File "C:/Users/apein/Documents/projects/humrecsys/seq_rec/run.py", line 42, in main
    dataset = create_dataset(config)
  File "C:\Users\apein\.virtualenvs\humrecsys-Aky0O99T\lib\site-packages\recbole\data\utils.py", line 41, in create_dataset
    return SequentialDataset(config)
  File "C:\Users\apein\.virtualenvs\humrecsys-Aky0O99T\lib\site-packages\recbole\data\dataset\sequential_dataset.py", line 38, in __init__
    self._benchmark_presets()
  File "C:\Users\apein\.virtualenvs\humrecsys-Aky0O99T\lib\site-packages\recbole\data\dataset\sequential_dataset.py", line 146, in _benchmark_presets
    self.inter_feat[self.item_list_length_field] = self.inter_feat[self.item_id_list_field].agg(len)
AttributeError: 'SequentialDataset' object has no attribute 'item_id_list_field'
```
This error only appears while using a sequential model. General models work fine with the benchmark dataset.
Issue Analytics
- Created: 2 years ago
- Comments: 7 (4 by maintainers)
Hi, I understand that you want to ensure that session ids don't overlap between the train set, validation set, and test set. Currently, we suggest pre-splitting the dataset into `xxx.train.inter`, `xxx.valid.inter`, and `xxx.test.inter` and loading them via `benchmark_filename` (for one example, see session_based_rec_example; you can also download the processed `diginetica-session` dataset [download] to inspect it in detail). The "1. group by user, 2. split" pipeline is mainly designed for general collaborative filtering methods or sequential recommendation methods (especially leave-one-out evaluation). For session-based recommendation, we recommend following the approach in the above example.
Sorry, I have to reopen this. I was wondering about the data splitting: according to the documentation, the `group_by` evaluation parameter, if set to `user`, groups the dataset by user (or by session, in the sequential case) before splitting it into train, valid, and test sets. To me this would imply that, e.g., a session id from the valid set is not part of the train set. However, this is not the case! In the `split_by_ratio` function of `Dataset`, each session gets split across the three sets according to the ratio. Is this intended? What's the point of grouping then? How can I ensure the data is split session-wise only?
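For context, the behavior described above can be illustrated with a simplified pure-Python sketch (not RecBole's actual `split_by_ratio` implementation): grouping keeps each session's interactions together and in order, but the ratio is then applied *within* each group, so every session contributes rows to all three splits.

```python
from collections import defaultdict

def split_by_ratio_grouped(interactions, ratios=(0.8, 0.1, 0.1)):
    """Split (session_id, item) pairs per group, mimicking how a
    grouped ratio split distributes EACH group's rows across
    train/valid/test. Simplified sketch, not RecBole's code."""
    groups = defaultdict(list)
    for session_id, item in interactions:
        groups[session_id].append((session_id, item))

    train, valid, test = [], [], []
    for rows in groups.values():
        n = len(rows)
        n_train = int(n * ratios[0])
        n_valid = int(n * ratios[1])
        # The split boundaries fall INSIDE each session's rows:
        train.extend(rows[:n_train])
        valid.extend(rows[n_train:n_train + n_valid])
        test.extend(rows[n_train + n_valid:])
    return train, valid, test

# Two sessions with 10 interactions each:
data = [(s, i) for s in ("A", "B") for i in range(10)]
train, valid, test = split_by_ratio_grouped(data)

# Each session appears in all three splits, which is exactly
# the overlap observed in the question:
assert {s for s, _ in train} == {"A", "B"}
assert {s for s, _ in valid} == {"A", "B"}
assert {s for s, _ in test} == {"A", "B"}
```

A split with no session overlap would instead have to assign *whole groups* to one split each, which is what the pre-split `benchmark_filename` approach achieves.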