'BertDataBunch' object has no attribute 'model_type'
I have been following the tutorials concerning Fast-Bert: https://pypi.org/project/fast-bert/ and https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/92668
My goal is to do binary text classification. Therefore, my label.csv has only two labels and I set multi_label to False.
When executing BertLearner.from_pretrained_model, I am receiving the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-240-ef4cead1d6f0> in <module>
16 loss_scale = args['loss_scale'],
17 multi_gpu = True,
---> 18 multi_label = False)
~/.local/lib/python3.6/site-packages/fast_bert/learner_cls.py in from_pretrained_model(dataBunch, pretrained_path, output_dir, metrics, device, logger, finetuned_wgts_path, multi_gpu, is_fp16, loss_scale, warmup_steps, fp16_opt_level, grad_accumulation_steps, multi_label, max_grad_norm, adam_epsilon, logging_steps, freeze_transformer_layers)
131 model_state_dict = None
132
--> 133 model_type = dataBunch.model_type
134
135 if torch.cuda.is_available():
AttributeError: 'BertDataBunch' object has no attribute 'model_type'
What I have tried so far is passing model_type = 'bert' to the BertDataBunch call. This has not helped. I am quite sure that my .csv files are in the right format, but of course this could also be a source of the problem. PATH and imported modules should be fine.
Attached you find my code:
import torch
from box import Box
from pytorch_pretrained_bert.tokenization import BertTokenizer
from fast_bert.data import BertDataBunch

# Default args. If the GPU runs out of memory while training, decrease the
# training batch size.
args = Box({
    "run_text": "tweet sentiment",
    "task_name": "Tweet Sentiment",
    "max_seq_length": 512,
    "do_lower_case": True,
    "train_batch_size": 8,
    "learning_rate": 6e-5,
    "num_train_epochs": 12.0,
    "warmup_proportion": 0.002,
    "local_rank": -1,
    "gradient_accumulation_steps": 1,
    "fp16": True,
    "loss_scale": 128
})

device = torch.device('cuda')

# Check if multiple GPUs are available
if torch.cuda.device_count() > 1:
    multi_gpu = True
else:
    multi_gpu = False

# The tokenizer object is used to split the text into tokens used in training
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=args['do_lower_case'])
# Databunch
databunch = BertDataBunch(DATA_PATH, LABEL_PATH,
                          tokenizer=tokenizer,
                          train_file='X_train.csv',
                          val_file='X_test.csv',
                          label_file='label.csv',
                          text_col='text',
                          label_col='label',
                          bs=args['train_batch_size'],
                          maxlen=args['max_seq_length'],
                          multi_gpu=True,
                          multi_label=False,
                          model_type='bert')
databunch.save()
num_labels = len(databunch.labels)
num_labels
# Set logger
import logging
import sys
logfile = str(LOG_PATH/'log-{}-{}.txt'.format(run_start_time, args["run_text"]))
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(name)s - %(message)s',
    datefmt='%m/%d/%Y %H:%M:%S',
    handlers=[
        logging.FileHandler(logfile),
        logging.StreamHandler(sys.stdout)
    ])
logger = logging.getLogger()
logger.info(args)
When executing this cell, the error happens:
from fast_bert.learner_cls import BertLearner
from fast_bert.metrics import accuracy
# Choose the metrics used for the error function in training
metrics = []
metrics.append({'name': 'accuracy', 'function': accuracy})
learner = BertLearner.from_pretrained_model(databunch,
                                            pretrained_path="bert-base-uncased",
                                            metrics=metrics,
                                            device=device,
                                            logger=logger,
                                            output_dir=OUTPUT_DIR,
                                            finetuned_wgts_path=None,
                                            is_fp16=args['fp16'],
                                            loss_scale=args['loss_scale'],
                                            multi_gpu=True,
                                            multi_label=False)
Thank you for your help!
Issue Analytics
- Created 3 years ago
- Comments: 7 (3 by maintainers)
Top GitHub Comments
I finally found the issue. I loaded BertDataBunch with from fast_bert.data import BertDataBunch instead of from fast_bert.data_cls import BertDataBunch. Now everything works. Thank you very much @lingdoc, and sorry that I did not spot this mistake earlier! The problem was probably that I was following an outdated tutorial. Your comment led me to the mistake. So thank you again!
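For anyone hitting the same AttributeError, the fix boils down to the import line. A minimal sketch (module paths assume fast-bert's split between the legacy data module and the classification-specific one):

```python
# Wrong (legacy module): this BertDataBunch does not set a `model_type`
# attribute, so BertLearner.from_pretrained_model raises the AttributeError.
# from fast_bert.data import BertDataBunch

# Correct (classification module): this BertDataBunch stores `model_type`,
# which BertLearner.from_pretrained_model reads from the databunch.
from fast_bert.data_cls import BertDataBunch
```

The rest of the code above can stay unchanged; only the import needs to point at fast_bert.data_cls.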
I had the same problem and finally solved it. Many thanks!