[🐛BUG] module 'recbole.data.dataset' has no attribute 'GRU4RecDataset' , KeyError: 'session_id'
Hi, I tried to run GRU4Rec. This is the full error:
Traceback (most recent call last):
File "C:\Users\Administrator\RecBole\recbole\data\utils.py", line 35, in create_dataset
return getattr(importlib.import_module('recbole.data.dataset'), config['model'] + 'Dataset')(config)
AttributeError: module 'recbole.data.dataset' has no attribute 'GRU4RecDataset'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2898, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'session_id'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "run_recbole.py", line 25, in <module>
run_recbole(model=args.model, dataset=args.dataset, config_file_list=config_file_list)
File "C:\Users\Administrator\RecBole\recbole\quick_start\quick_start.py", line 38, in run_recbole
dataset = create_dataset(config)
File "C:\Users\Administrator\RecBole\recbole\data\utils.py", line 40, in create_dataset
return SequentialDataset(config)
File "C:\Users\Administrator\RecBole\recbole\data\dataset\sequential_dataset.py", line 38, in __init__
super().__init__(config, saved_dataset=saved_dataset)
File "C:\Users\Administrator\RecBole\recbole\data\dataset\dataset.py", line 100, in __init__
self._from_scratch()
File "C:\Users\Administrator\RecBole\recbole\data\dataset\dataset.py", line 113, in _from_scratch
self._data_processing()
File "C:\Users\Administrator\RecBole\recbole\data\dataset\dataset.py", line 152, in _data_processing
self._data_filtering()
File "C:\Users\Administrator\RecBole\recbole\data\dataset\dataset.py", line 172, in _data_filtering
self._filter_nan_user_or_item()
File "C:\Users\Administrator\RecBole\recbole\data\dataset\dataset.py", line 607, in _filter_nan_user_or_item
dropped_inter = self.inter_feat.index[self.inter_feat[field].isnull()]
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2906, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2900, in get_loc
raise KeyError(key) from err
KeyError: 'session_id'
This is my config file:
# Atomic File Format
field_separator: "\t"
seq_separator: " "
# Common Features
USER_ID_FIELD: session_id
ITEM_ID_FIELD: item_id
TIME_FIELD: timestamp
seq_len: ~
# Label for Point-wise DataLoader
LABEL_FIELD: label
threshold: ~
# NegSample Prefix for Pair-wise DataLoader
NEG_PREFIX: neg_
# Selectively Loading
load_col:
    inter: [session_id, item_id, timestamp, PatientLocationID, GenderID, AgeGroup, JobGroup]
# the others
unload_col: ~
additional_feat_suffix: ~
# Filtering
rm_dup_inter: ~
max_user_inter_num: ~
min_user_inter_num: 0
max_item_inter_num: ~
min_item_inter_num: 0
lowest_val: ~
highest_val: ~
equal_val: ~
not_equal_val: ~
drop_filter_field : True
# Preprocessing
fields_in_same_space: ~
fill_nan: True
preload_weight: ~
drop_preload_weight: True
normalize_field: ~
normalize_all: True
# Sequential Model Needed
ITEM_LIST_LENGTH_FIELD: item_length
LIST_SUFFIX: _list
MAX_ITEM_LIST_LENGTH: 50
POSITION_FIELD: position_id
# Benchmark .inter
benchmark_filename: ~
# general
gpu_id: 0
use_gpu: True
seed: 2020
state: INFO
reproducibility: True
data_path: 'dataset/Ofek'
checkpoint_dir: 'saved'
# training settings
epochs: 300
train_batch_size: 2048
learner: adam
learning_rate: 0.001
training_neg_sample_num: 1
eval_step: 1
stopping_step: 10
# evaluation settings
eval_setting: RO_RS,full
group_by_user: True
split_ratio: [0.8,0.1,0.1]
leave_one_num: 2
real_time_process: True
metrics: ["Recall", "MRR","NDCG","Hit","Precision"]
topk: [5]
valid_metric: MRR@5
eval_batch_size: 4096
And this is the header of my inter file:
JobGroup:token item_id:token PatientLocationID:token GenderID:token AgeGroup:token timestamp:float session_id:token
Where is my mistake?
Thanks!!
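A note on reading the traceback above: the first AttributeError is raised at line 35 of create_dataset, and the later frame at line 40 of the same function returns SequentialDataset(config). That pattern suggests the missing 'GRU4RecDataset' attribute is caught and handled by falling back to the generic sequential dataset, so the KeyError: 'session_id' is likely the real failure. The sketch below reproduces only that fallback pattern. The dataset_module object and the minimal SequentialDataset class are hypothetical stand-ins, not RecBole's actual code, which may differ.

```python
import types


# Hypothetical stand-in for recbole.data.dataset: it defines only the
# generic class, not a model-specific 'GRU4RecDataset'.
class SequentialDataset:
    def __init__(self, config):
        self.config = config


dataset_module = types.SimpleNamespace(SequentialDataset=SequentialDataset)


def create_dataset(config):
    """Try '<model>Dataset' first, then fall back to the generic class."""
    try:
        # Raises AttributeError when no model-specific dataset class exists.
        return getattr(dataset_module, config["model"] + "Dataset")(config)
    except AttributeError:
        # Any error raised from here on is reported as
        # "During handling of the above exception, another exception occurred".
        return dataset_module.SequentialDataset(config)


dataset = create_dataset({"model": "GRU4Rec"})
print(type(dataset).__name__)  # SequentialDataset
```

If this reading is right, the attribute error in the issue title is harmless on its own, and the thing to debug is why the loaded interaction table has no session_id column.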
Issue Analytics
- State:
- Created 3 years ago
- Comments:7 (4 by maintainers)
Thanks, I will check. What is the separator in the first row of the inter file? Is it \t? When you say "sequential models don't support ratio-based splitting strategy currently", should I remove this configuration from the YAML file? And how should I load train/test data there?
Please refer to #632.
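On the separator question above: the config sets field_separator to "\t", so one way to check the file is to split its first line on tabs and see whether session_id comes out as its own column. The snippet below is a minimal sketch of that check. parse_header is a hypothetical helper, not part of RecBole, and the two sample header strings are shortened illustrations. Whether the real file uses spaces or tabs cannot be told from the header as pasted here, because the page rendering may have collapsed tabs.

```python
def parse_header(header_line, field_separator="\t"):
    """Split a 'name:type' header line into a {field_name: field_type} dict."""
    fields = {}
    for column in header_line.rstrip("\n").split(field_separator):
        # Split each column at the first ':' into name and type.
        name, _, field_type = column.partition(":")
        fields[name] = field_type
    return fields


# Shortened, made-up headers: one tab-separated, one space-separated.
tab_header = "item_id:token\ttimestamp:float\tsession_id:token"
space_header = "item_id:token timestamp:float session_id:token"

print("session_id" in parse_header(tab_header))    # True
print("session_id" in parse_header(space_header))  # False
```

With a space-separated header and a tab separator, the whole line is read as one column named item_id, so a later lookup of session_id would fail in the same way as the KeyError in the traceback. To test a real file, pass its first line to the helper, for example `parse_header(open(path, encoding="utf-8").readline())`, where path points at the .inter file.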