[🐛BUG] module 'recbole.data.dataset' has no attribute 'GRU4RecDataset', KeyError: 'session_id'

Hi, I tried to run GRU4Rec, and this is the full error:


Traceback (most recent call last):
  File "C:\Users\Administrator\RecBole\recbole\data\utils.py", line 35, in create_dataset
    return getattr(importlib.import_module('recbole.data.dataset'), config['model'] + 'Dataset')(config)
AttributeError: module 'recbole.data.dataset' has no attribute 'GRU4RecDataset'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2898, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'session_id'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_recbole.py", line 25, in <module>
    run_recbole(model=args.model, dataset=args.dataset, config_file_list=config_file_list)
  File "C:\Users\Administrator\RecBole\recbole\quick_start\quick_start.py", line 38, in run_recbole
    dataset = create_dataset(config)
  File "C:\Users\Administrator\RecBole\recbole\data\utils.py", line 40, in create_dataset
    return SequentialDataset(config)
  File "C:\Users\Administrator\RecBole\recbole\data\dataset\sequential_dataset.py", line 38, in __init__
    super().__init__(config, saved_dataset=saved_dataset)
  File "C:\Users\Administrator\RecBole\recbole\data\dataset\dataset.py", line 100, in __init__
    self._from_scratch()
  File "C:\Users\Administrator\RecBole\recbole\data\dataset\dataset.py", line 113, in _from_scratch
    self._data_processing()
  File "C:\Users\Administrator\RecBole\recbole\data\dataset\dataset.py", line 152, in _data_processing
    self._data_filtering()
  File "C:\Users\Administrator\RecBole\recbole\data\dataset\dataset.py", line 172, in _data_filtering
    self._filter_nan_user_or_item()
  File "C:\Users\Administrator\RecBole\recbole\data\dataset\dataset.py", line 607, in _filter_nan_user_or_item
    dropped_inter = self.inter_feat.index[self.inter_feat[field].isnull()]
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2900, in get_loc
    raise KeyError(key) from err
KeyError: 'session_id'

This is my config file:

# Atomic File Format
field_separator: "\t"
seq_separator: " "

# Common Features
USER_ID_FIELD: session_id 
ITEM_ID_FIELD: item_id
TIME_FIELD: timestamp
seq_len: ~

# Label for Point-wise DataLoader
LABEL_FIELD: label
threshold: ~

# NegSample Prefix for Pair-wise DataLoader
NEG_PREFIX: neg_

# Selectively Loading
load_col:
    inter: [session_id, item_id, timestamp, PatientLocationID,GenderID,AgeGroup, JobGroup]
    # the others
unload_col: ~
additional_feat_suffix: ~

# Filtering
rm_dup_inter: ~
max_user_inter_num: ~
min_user_inter_num: 0
max_item_inter_num: ~
min_item_inter_num: 0
lowest_val: ~
highest_val: ~
equal_val: ~
not_equal_val: ~
drop_filter_field : True

# Preprocessing
fields_in_same_space: ~
fill_nan: True
preload_weight: ~
drop_preload_weight: True
normalize_field: ~
normalize_all: True

# Sequential Model Needed
ITEM_LIST_LENGTH_FIELD: item_length
LIST_SUFFIX: _list
MAX_ITEM_LIST_LENGTH: 50
POSITION_FIELD: position_id


# Benchmark .inter
benchmark_filename: ~

# general
gpu_id: 0
use_gpu: True
seed: 2020
state: INFO
reproducibility: True
data_path: 'dataset/Ofek'
checkpoint_dir: 'saved'

# training settings
epochs: 300
train_batch_size: 2048
learner: adam
learning_rate: 0.001
training_neg_sample_num: 1
eval_step: 1
stopping_step: 10

# evaluation settings
eval_setting: RO_RS,full
group_by_user: True
split_ratio: [0.8,0.1,0.1]
leave_one_num: 2
real_time_process: True
metrics: ["Recall", "MRR","NDCG","Hit","Precision"]
topk: [5]
valid_metric: MRR@5
eval_batch_size: 4096

and this is the header of my inter file:

JobGroup:token item_id:token PatientLocationID:token GenderID:token AgeGroup:token timestamp:float session_id:token

Where is my mistake? Thanks!!
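The traceback shows pandas failing to find a `session_id` column in the loaded interaction table, even though the header lists `session_id:token`. One possible cause (an assumption, not something confirmed in this thread) is that the header columns are not separated by the configured `field_separator` (`"\t"`), so the whole header is read as a single column. A minimal sketch for checking this follows; `missing_fields` is a hypothetical helper written for illustration and is not part of RecBole.

```python
def missing_fields(header_line, required, sep="\t"):
    """Return the required field names that are absent from an atomic-file header.

    Each header column is expected to look like name:type, e.g. session_id:token.
    `sep` should match field_separator in the config (assumed to be a tab here).
    """
    columns = header_line.rstrip("\r\n").split(sep)
    # Keep only the part before the first colon as the field name.
    names = {col.split(":", 1)[0] for col in columns}
    return [field for field in required if field not in names]


required = ["session_id", "item_id", "timestamp"]

# Header with tab-separated columns: every required field is found.
header_tabs = "\t".join([
    "JobGroup:token", "item_id:token", "PatientLocationID:token",
    "GenderID:token", "AgeGroup:token", "timestamp:float", "session_id:token",
])

# Same header with spaces instead of tabs: it splits into one column only.
header_spaces = header_tabs.replace("\t", " ")

print(missing_fields(header_tabs, required))
print(missing_fields(header_spaces, required))
```

If the second call reports missing fields for the real file's first line, the header (and likely the data rows) would need to be re-saved with tabs, or `field_separator` changed to match the file.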

Issue Analytics

Created 3 years ago
Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction

mayaKaplansky commented, Dec 28, 2020

Thanks, I will check. What is the separator in the first row of the inter file? Is it \t? When you say "sequential models don't support ratio-based splitting strategy currently", should I remove this configuration from the YAML file? And how should I load the train/test data there?

0 reactions

2017pxy commented, Jan 15, 2021

Please refer to #632.