hyperparameter_search raytune: ModuleNotFoundError: No module named 'datasets_modules'
Environment info
- transformers version: 4.4.2
- Platform: Linux-4.15.0-142-generic-x86_64-with-glibc2.10
- Python version: 3.8.8
- PyTorch version (GPU?): 1.6.0 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help
Information
Model I am using (Bert, XLNet …): Bert (neuralmind/bert-base-portuguese-cased)
The problem arises when using:
- [ ] the official example scripts: (give details below)
- [ x ] my own modified scripts: (give details below)
The task I am working on is:
- [ x ] an official GLUE/SQuAD task: (give the name)
- [ x ] my own task or dataset: (give details below)
I’m running a modified run_ner example to use trainer.hyperparameter_search with Ray Tune. I’m using my own datasets, but I have run into the same issue with other GLUE scripts and official GLUE datasets, just like other people reported here:
https://discuss.huggingface.co/t/using-hyperparameter-search-in-trainer/785/34 https://discuss.huggingface.co/t/using-hyperparameter-search-in-trainer/785/35 Colab from @piegu
At first I was using the run_ner and transformers version from the current 4.6.0-dev branch, but I ran into the same issue as reported here: #11249
So I downgraded transformers to 4.4.2 and ray to 1.2.0 (in a fresh conda environment) and made the necessary adjustments to the run_ner script so that it is compatible with 4.4.2.
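For reference, a minimal sanity check along these lines confirms that the downgraded packages are the ones actually being imported (just a sketch; it only compares the version prefixes against the pins mentioned above):

import ray
import transformers

# Sketch: confirm the pinned versions described above are actually in use.
assert transformers.__version__.startswith("4.4"), transformers.__version__
assert ray.__version__.startswith("1.2"), ray.__version__
print("transformers", transformers.__version__, "| ray", ray.__version__)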
To reproduce
Steps to reproduce the behavior:
This is the full code from the script:
#!/usr/bin/env python
# coding: utf-8
import json
import logging
import os
import sys
import copy
from dataclasses import dataclass, field
from typing import Optional, Dict, Any
import numpy as np
from datasets import ClassLabel, load_dataset, load_metric
from ray import tune
from ray.tune.integration.wandb import WandbLogger
from ray.tune.logger import DEFAULT_LOGGERS
from ray.tune.schedulers import PopulationBasedTraining
import transformers
from transformers import (
AutoConfig,
AutoModelForTokenClassification,
AutoTokenizer,
DataCollatorForTokenClassification,
HfArgumentParser,
PreTrainedTokenizerFast,
Trainer,
TrainingArguments,
set_seed,
)
from transformers.trainer_utils import get_last_checkpoint, is_main_process
from transformers.utils import check_min_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.4.0")
logger = logging.getLogger(__name__)
@dataclass
class RayArguments:
"""[summary]
"""
time_budget_h: str = field(
metadata={"help": "Time budget in hours."}
)
@dataclass
class ModelArguments:
"""
Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
"""
model_name_or_path: str = field(
metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"}
)
config_name: Optional[str] = field(
default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
)
tokenizer_name: Optional[str] = field(
default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"}
)
cache_dir: Optional[str] = field(
default=None,
metadata={"help": "Where do you want to store the pretrained models downloaded from huggingface.co"},
)
model_revision: str = field(
default="main",
metadata={"help": "The specific model version to use (can be a branch name, tag name or commit id)."},
)
use_auth_token: bool = field(
default=False,
metadata={
"help": "Will use the token generated when running `transformers-cli login` (necessary to use this script "
"with private models)."
},
)
@dataclass
class DataTrainingArguments:
"""
Arguments pertaining to what data we are going to input our model for training and eval.
"""
task_name: Optional[str] = field(default="ner", metadata={"help": "The name of the task (ner, pos...)."})
dataset_name: Optional[str] = field(
default=None, metadata={"help": "The name of the dataset to use (via the datasets library)."}
)
dataset_config_name: Optional[str] = field(
default=None, metadata={"help": "The configuration name of the dataset to use (via the datasets library)."}
)
train_file: Optional[str] = field(
default=None, metadata={"help": "The input training data file (a csv or JSON file)."}
)
validation_file: Optional[str] = field(
default=None,
metadata={"help": "An optional input evaluation data file to evaluate on (a csv or JSON file)."},
)
test_file: Optional[str] = field(
default=None,
metadata={"help": "An optional input test data file to predict on (a csv or JSON file)."},
)
overwrite_cache: bool = field(
default=False, metadata={"help": "Overwrite the cached training and evaluation sets"}
)
preprocessing_num_workers: Optional[int] = field(
default=None,
metadata={"help": "The number of processes to use for the preprocessing."},
)
pad_to_max_length: bool = field(
default=False,
metadata={
"help": "Whether to pad all samples to model maximum sentence length. "
"If False, will pad the samples dynamically when batching to the maximum length in the batch. More "
"efficient on GPU but very bad for TPU."
},
)
max_train_samples: Optional[int] = field(
default=None,
metadata={
"help": "For debugging purposes or quicker training, truncate the number of training examples to this "
"value if set."
},
)
max_val_samples: Optional[int] = field(
default=None,
metadata={
"help": "For debugging purposes or quicker training, truncate the number of validation examples to this "
"value if set."
},
)
max_test_samples: Optional[int] = field(
default=None,
metadata={
"help": "For debugging purposes or quicker training, truncate the number of test examples to this "
"value if set."
},
)
label_all_tokens: bool = field(
default=False,
metadata={
"help": "Whether to put the label for one word on all tokens of generated by that word or just on the "
"one (in which case the other tokens will have a padding index)."
},
)
return_entity_level_metrics: bool = field(
default=False,
metadata={"help": "Whether to return all the entity levels during evaluation or just the overall ones."},
)
def __post_init__(self):
if self.dataset_name is None and self.train_file is None and self.validation_file is None:
raise ValueError("Need either a dataset name or a training/validation file.")
else:
if self.train_file is not None:
extension = self.train_file.split(".")[-1]
assert extension in ["csv", "json"], "`train_file` should be a csv or a json file."
if self.validation_file is not None:
extension = self.validation_file.split(".")[-1]
assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file."
self.task_name = self.task_name.lower()
def compute_objective(metrics: Dict[str, float]) -> float:
"""
The default objective to maximize/minimize when doing an hyperparameter search. It is the evaluation loss if no
metrics are provided to the :class:`~transformers.Trainer`, the sum of all metrics otherwise.
Args:
metrics (:obj:`Dict[str, float]`): The metrics returned by the evaluate method.
Return:
:obj:`float`: The objective to minimize or maximize
"""
metrics = copy.deepcopy(metrics)
loss = metrics.pop("eval_loss", None)
_ = metrics.pop("epoch", None)
# Remove speed metrics
speed_metrics = [m for m in metrics.keys() if m.endswith("_runtime") or m.endswith("_samples_per_second")]
for sm in speed_metrics:
_ = metrics.pop(sm, None)
return loss if len(metrics) == 0 else sum(metrics.values())
def main():
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments, RayArguments))
model_args, data_args, training_args, ray_args = parser.parse_args_into_dataclasses()
# Detecting last checkpoint.
last_checkpoint = None
if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir:
last_checkpoint = get_last_checkpoint(training_args.output_dir)
if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0:
raise ValueError(
f"Output directory ({training_args.output_dir}) already exists and is not empty. "
"Use --overwrite_output_dir to overcome."
)
elif last_checkpoint is not None:
logger.info(
f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change "
"the `--output_dir` or add `--overwrite_output_dir` to train from scratch."
)
# Setup logging
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
handlers=[logging.StreamHandler(sys.stdout)],
)
logger.setLevel(logging.INFO if is_main_process(training_args.local_rank) else logging.WARN)
# Log on each process the small summary:
logger.warning(
f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
+ f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
)
# Set the verbosity to info of the Transformers logger (on main process only):
if is_main_process(training_args.local_rank):
transformers.utils.logging.set_verbosity_info()
transformers.utils.logging.enable_default_handler()
transformers.utils.logging.enable_explicit_format()
logger.info("Training/evaluation parameters %s", training_args)
# Set seed before initializing model.
set_seed(training_args.seed)
# Get the datasets: you can either provide your own CSV/JSON/TXT training and evaluation files (see below)
# or just provide the name of one of the public datasets available on the hub at https://huggingface.co/datasets/
# (the dataset will be downloaded automatically from the datasets Hub).
#
# For CSV/JSON files, this script will use the column called 'text' or the first column if no column called
# 'text' is found. You can easily tweak this behavior (see below).
#
# In distributed training, the load_dataset function guarantee that only one local process can concurrently
# download the dataset.
if data_args.dataset_name is not None:
# Downloading and loading a dataset from the hub.
datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name)
else:
data_files = {}
if data_args.train_file is not None:
data_files["train"] = data_args.train_file
if data_args.validation_file is not None:
data_files["validation"] = data_args.validation_file
if data_args.test_file is not None:
data_files["test"] = data_args.test_file
extension = data_args.train_file.split(".")[-1]
datasets = load_dataset(extension, data_files=data_files)
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html.
if training_args.do_train:
column_names = datasets["train"].column_names
features = datasets["train"].features
else:
column_names = datasets["validation"].column_names
features = datasets["validation"].features
text_column_name = "tokens" if "tokens" in column_names else column_names[0]
label_column_name = (
f"{data_args.task_name}_tags" if f"{data_args.task_name}_tags" in column_names else column_names[1]
)
# In the event the labels are not a `Sequence[ClassLabel]`, we will need to go through the dataset to get the
# unique labels.
def get_label_list(labels):
unique_labels = set()
for label in labels:
unique_labels = unique_labels | set(label)
label_list = list(unique_labels)
label_list.sort()
return label_list
if isinstance(features[label_column_name].feature, ClassLabel):
label_list = features[label_column_name].feature.names
# No need to convert the labels since they are already ints.
label_to_id = {i: i for i in range(len(label_list))}
else:
label_list = get_label_list(datasets["train"][label_column_name])
label_to_id = {l: i for i, l in enumerate(label_list)}
num_labels = len(label_list)
# Load pretrained model and tokenizer
#
# Distributed training:
# The .from_pretrained methods guarantee that only one local process can concurrently
# download model & vocab.
config = AutoConfig.from_pretrained(
model_args.config_name if model_args.config_name else model_args.model_name_or_path,
num_labels=num_labels,
finetuning_task=data_args.task_name,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
tokenizer = AutoTokenizer.from_pretrained(
model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
cache_dir=model_args.cache_dir,
use_fast=True,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
model_max_length=512
)
model = AutoModelForTokenClassification.from_pretrained(
model_args.model_name_or_path,
from_tf=bool(".ckpt" in model_args.model_name_or_path),
config=config,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
# Tokenizer check: this script requires a fast tokenizer.
if not isinstance(tokenizer, PreTrainedTokenizerFast):
raise ValueError(
"This example script only works for models that have a fast tokenizer. Checkout the big table of models "
"at https://huggingface.co/transformers/index.html#bigtable to find the model types that meet this "
"requirement"
)
# Preprocessing the dataset
# Padding strategy
padding = "max_length" if data_args.pad_to_max_length else False
# Tokenize all texts and align the labels with them.
def tokenize_and_align_labels(examples):
tokenized_inputs = tokenizer(
examples[text_column_name],
padding=padding,
truncation=True,
# We use this argument because the texts in our dataset are lists of words (with a label for each word).
is_split_into_words=True,
)
labels = []
for i, label in enumerate(examples[label_column_name]):
word_ids = tokenized_inputs.word_ids(batch_index=i)
previous_word_idx = None
label_ids = []
for word_idx in word_ids:
# Special tokens have a word id that is None. We set the label to -100 so they are automatically
# ignored in the loss function.
if word_idx is None:
label_ids.append(-100)
# We set the label for the first token of each word.
elif word_idx != previous_word_idx:
label_ids.append(label_to_id[label[word_idx]])
# For the other tokens in a word, we set the label to either the current label or -100, depending on
# the label_all_tokens flag.
else:
label_ids.append(label_to_id[label[word_idx]] if data_args.label_all_tokens else -100)
previous_word_idx = word_idx
labels.append(label_ids)
tokenized_inputs["labels"] = labels
return tokenized_inputs
if training_args.do_train:
if "train" not in datasets:
raise ValueError("--do_train requires a train dataset")
train_dataset = datasets["train"]
if data_args.max_train_samples is not None:
train_dataset = train_dataset.select(range(data_args.max_train_samples))
train_dataset = train_dataset.map(
tokenize_and_align_labels,
batched=True,
num_proc=data_args.preprocessing_num_workers,
load_from_cache_file=not data_args.overwrite_cache,
)
if training_args.do_eval:
if "validation" not in datasets:
raise ValueError("--do_eval requires a validation dataset")
eval_dataset = datasets["validation"]
if data_args.max_val_samples is not None:
eval_dataset = eval_dataset.select(range(data_args.max_val_samples))
eval_dataset = eval_dataset.map(
tokenize_and_align_labels,
batched=True,
num_proc=data_args.preprocessing_num_workers,
load_from_cache_file=not data_args.overwrite_cache,
)
if training_args.do_predict:
if "test" not in datasets:
raise ValueError("--do_predict requires a test dataset")
test_dataset = datasets["test"]
if data_args.max_test_samples is not None:
test_dataset = test_dataset.select(range(data_args.max_test_samples))
test_dataset = test_dataset.map(
tokenize_and_align_labels,
batched=True,
num_proc=data_args.preprocessing_num_workers,
load_from_cache_file=not data_args.overwrite_cache,
)
# Data collator
data_collator = DataCollatorForTokenClassification(tokenizer, pad_to_multiple_of=8 if training_args.fp16 else None)
# Metrics
metric = load_metric("seqeval")
def compute_metrics(p):
predictions, labels = p
predictions = np.argmax(predictions, axis=2)
# Remove ignored index (special tokens)
true_predictions = [
[label_list[p] for (p, l) in zip(prediction, label) if l != -100]
for prediction, label in zip(predictions, labels)
]
true_labels = [
[label_list[l] for (p, l) in zip(prediction, label) if l != -100]
for prediction, label in zip(predictions, labels)
]
results = metric.compute(predictions=true_predictions, references=true_labels)
if data_args.return_entity_level_metrics:
# Unpack nested dictionaries
final_results = {}
for key, value in results.items():
if isinstance(value, dict):
for n, v in value.items():
final_results[f"{key}_{n}"] = v
else:
final_results[key] = value
return final_results
else:
return {
"precision": results["overall_precision"],
"recall": results["overall_recall"],
"f1": results["overall_f1"],
"accuracy": results["overall_accuracy"],
}
def model_init():
model = AutoModelForTokenClassification.from_pretrained(
model_args.model_name_or_path,
from_tf=bool(".ckpt" in model_args.model_name_or_path),
config=config,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
)
return model
class CustomTrainer(Trainer):
def __init__(self, *args, **kwargs):
super(CustomTrainer, self).__init__(*args, **kwargs)
def _hp_search_setup(self, trial: Any):
# Drop the "wandb" entry injected by hp_space_fn: it is only meant for the
# WandbLogger and is not a TrainingArguments field, so leaving it in would
# make the base _hp_search_setup raise an AttributeError.
try:
trial.pop('wandb', None)
except AttributeError:
pass
super(CustomTrainer, self)._hp_search_setup(trial)
# Initialize our Trainer
trainer = CustomTrainer(
model_init=model_init,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset if training_args.do_eval else None,
compute_metrics=compute_metrics,
tokenizer=tokenizer,
data_collator=data_collator,
)
# Hyperparameter Search
def hp_space_fn(*args, **kwargs):
config = {
"seed": tune.choice([42, 43, 44]),
"weight_decay": tune.choice([0.0, 0.1, 0.2, 0.3]),
"adam_epsilon": tune.choice([1e-6, 1e-7, 1e-8]),
"max_grad_norm": tune.choice([1.0, 2.0]),
"warmup_steps": tune.choice([50, 100, 500, 1000]),
"learning_rate": tune.choice([2e-5, 3e-5, 4e-5, 5e-5]),
"num_train_epochs": tune.quniform(0.0, 8.0, 0.5),
}
wandb_config = {
"wandb": {
"project": "hf-ner-testing",
"api_key": os.environ.get("API_KEY"),
"log_config": True
}
}
config.update(wandb_config)
return config
# Convert the hour budget from the command line into seconds for Ray Tune.
time_budget_s = 60 * 60 * int(ray_args.time_budget_h)
best_run = trainer.hyperparameter_search(
direction="maximize",
backend="ray",
scheduler=PopulationBasedTraining(
time_attr='time_total_s',
metric='eval_f1',
mode='max',
perturbation_interval=600.0
),
hp_space=hp_space_fn,
loggers=DEFAULT_LOGGERS + (WandbLogger,),
time_budget_s=time_budget_s,
keep_checkpoints_num=1,
checkpoint_score_attr='eval_f1',
compute_objective=compute_objective
)
output_params_file = os.path.join(
training_args.output_dir,
"best_run.json"
)
with open(output_params_file, "w") as f:
json.dump(
best_run.hyperparameters,
f,
indent=4)
return best_run
if __name__ == "__main__":
main()
And these are the args I used for running it:
--model_name_or_path neuralmind/bert-base-portuguese-cased
--train_file train.json
--validation_file dev.json
--output_dir output
--do_train
--do_eval
--evaluation_strategy steps
--per_device_train_batch_size=2
--per_device_eval_batch_size=2
--time_budget_h 2
This is the full output log:
/media/discoD/anaconda3/envs/transformers/bin/python /media/discoD/pycharm-community-2019.2/plugins/python-ce/helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 127.0.0.1 --port 38419 --file /media/discoD/repositorios/transformers_pedro/examples/pytorch/token-classification/run_ner_hp_search_442.py --model_name_or_path neuralmind/bert-base-portuguese-cased --train_file train.json --validation_file dev.json --output_dir transformers-hp --do_train --do_eval --evaluation_strategy steps --per_device_train_batch_size=2 --per_device_eval_batch_size=2 --time_budget_h 2
Connected to pydev debugger (build 211.7142.13)
05/03/2021 08:10:04 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
05/03/2021 08:10:04 - INFO - __main__ - Training/evaluation parameters TrainingArguments(output_dir=transformers-hp, overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=IntervalStrategy.STEPS, prediction_loss_only=False, per_device_train_batch_size=2, per_device_eval_batch_size=2, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0, warmup_steps=0, logging_dir=runs/May03_08-10-04_user-XPS-8700, logging_strategy=IntervalStrategy.STEPS, logging_first_step=False, logging_steps=500, save_strategy=IntervalStrategy.STEPS, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level=O1, fp16_backend=auto, fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=transformers-hp, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, report_to=['tensorboard', 'wandb'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, _n_gpu=1)
05/03/2021 08:10:04 - WARNING - datasets.builder - Using custom data configuration default-438421c06175ed26
05/03/2021 08:10:04 - WARNING - datasets.builder - Reusing dataset json (/home/user/.cache/huggingface/datasets/json/default-438421c06175ed26/0.0.0/83d5b3a2f62630efc6b5315f00f20209b4ad91a00ac586597caee3a4da0bef02)
[INFO|configuration_utils.py:463] 2021-05-03 08:10:06,050 >> loading configuration file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/config.json from cache at /home/user/.cache/huggingface/transformers/e716e2151985ba669e7197b64cdde2552acee146494d40ffaf0688a3f152e6ed.18a0b8b86f3ebd4c8a1d8d6199178feae9971ff5420f1d12f0ed8326ffdff716
[INFO|configuration_utils.py:499] 2021-05-03 08:10:06,063 >> Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"directionality": "bidi",
"finetuning_task": "ner",
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1",
"2": "LABEL_2",
"3": "LABEL_3",
"4": "LABEL_4",
"5": "LABEL_5",
"6": "LABEL_6",
"7": "LABEL_7",
"8": "LABEL_8",
"9": "LABEL_9",
"10": "LABEL_10",
"11": "LABEL_11",
"12": "LABEL_12"
},
"initializer_range": 0.02,
"intermediate_size": 3072,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1,
"LABEL_10": 10,
"LABEL_11": 11,
"LABEL_12": 12,
"LABEL_2": 2,
"LABEL_3": 3,
"LABEL_4": 4,
"LABEL_5": 5,
"LABEL_6": 6,
"LABEL_7": 7,
"LABEL_8": 8,
"LABEL_9": 9
},
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"output_past": true,
"pad_token_id": 0,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"position_embedding_type": "absolute",
"transformers_version": "4.4.2",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 29794
}
[INFO|configuration_utils.py:463] 2021-05-03 08:10:06,767 >> loading configuration file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/config.json from cache at /home/user/.cache/huggingface/transformers/e716e2151985ba669e7197b64cdde2552acee146494d40ffaf0688a3f152e6ed.18a0b8b86f3ebd4c8a1d8d6199178feae9971ff5420f1d12f0ed8326ffdff716
[INFO|configuration_utils.py:499] 2021-05-03 08:10:06,777 >> Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"directionality": "bidi",
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"output_past": true,
"pad_token_id": 0,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"position_embedding_type": "absolute",
"transformers_version": "4.4.2",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 29794
}
[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,936 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/vocab.txt from cache at /home/user/.cache/huggingface/transformers/aa6d50227b77416b26162efcf0cc9e9a702d13920840322060a2b41a44a8aff4.af25fb1e29ad0175300146695fd80069be69b211c52fa5486fa8aae2754cc814
[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,936 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/tokenizer.json from cache at None
[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,937 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/added_tokens.json from cache at /home/user/.cache/huggingface/transformers/9188d297517828a862f4e0b0700968574ca7ad38fbc0832c409bf7a9e5576b74.5cc6e825eb228a7a5cfd27cb4d7151e97a79fb962b31aaf1813aa102e746584b
[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,937 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/special_tokens_map.json from cache at /home/user/.cache/huggingface/transformers/eecc45187d085a1169eed91017d358cc0e9cbdd5dc236bcd710059dbf0a2f816.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d
[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,938 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/tokenizer_config.json from cache at /home/user/.cache/huggingface/transformers/f1a9ba41d40e8c6f5ba4988aa2f7702c3b43768183e4b82483e04f2848841ecf.a6c00251b9344c189e2419373d6033016d0cd3d87ea59f6c86069046ac81956d
[INFO|modeling_utils.py:1051] 2021-05-03 08:10:10,709 >> loading weights file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/pytorch_model.bin from cache at /home/user/.cache/huggingface/transformers/1e42c907c340c902923496246dae63e33f64955c529720991b7ec5543a98e442.fa492fca6dcee85bef053cc60912a211feb1f7173129e4eb1a5164e817f2f5f2
[WARNING|modeling_utils.py:1158] 2021-05-03 08:10:13,606 >> Some weights of the model checkpoint at neuralmind/bert-base-portuguese-cased were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:1169] 2021-05-03 08:10:13,607 >> Some weights of BertForTokenClassification were not initialized from the model checkpoint at neuralmind/bert-base-portuguese-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 7/7 [00:02<00:00, 3.06ba/s]
100%|██████████| 2/2 [00:00<00:00, 3.13ba/s]
[INFO|modeling_utils.py:1051] 2021-05-03 08:10:19,160 >> loading weights file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/pytorch_model.bin from cache at /home/user/.cache/huggingface/transformers/1e42c907c340c902923496246dae63e33f64955c529720991b7ec5543a98e442.fa492fca6dcee85bef053cc60912a211feb1f7173129e4eb1a5164e817f2f5f2
[WARNING|modeling_utils.py:1158] 2021-05-03 08:10:22,280 >> Some weights of the model checkpoint at neuralmind/bert-base-portuguese-cased were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:1169] 2021-05-03 08:10:22,280 >> Some weights of BertForTokenClassification were not initialized from the model checkpoint at neuralmind/bert-base-portuguese-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|trainer.py:482] 2021-05-03 08:10:24,327 >> The following columns in the training set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: ner_tags, tokens.
[INFO|trainer.py:482] 2021-05-03 08:10:24,334 >> The following columns in the evaluation set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: ner_tags, tokens.
[INFO|integrations.py:184] 2021-05-03 08:10:24,396 >> No `resources_per_trial` arg was passed into `hyperparameter_search`. Setting it to a default value of 1 CPU and 1 GPU for each trial.
2021-05-03 08:10:25,807 INFO services.py:1172 -- View the Ray dashboard at http://127.0.0.1:8265
2021-05-03 08:10:27,788 WARNING function_runner.py:540 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.
== Status ==
Memory usage on this node: 21.2/31.4 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 1/8 CPUs, 1/1 GPUs, 0.0/7.67 GiB heap, 0.0/2.64 GiB objects (0/1.0 accelerator_type:GTX)
Result logdir: /home/user/ray_results/_inner_2021-05-03_08-10-27
Number of trials: 1/20 (1 RUNNING)
+--------------------+----------+-------+----------------+-----------------+-----------------+--------------------+--------+----------------+----------------+
| Trial name | status | loc | adam_epsilon | learning_rate | max_grad_norm | num_train_epochs | seed | warmup_steps | weight_decay |
|--------------------+----------+-------+----------------+-----------------+-----------------+--------------------+--------+----------------+----------------|
| _inner_2a8cd_00000 | RUNNING | | 1e-06 | 4e-05 | 2 | 3 | 42 | 500 | 0 |
+--------------------+----------+-------+----------------+-----------------+-----------------+--------------------+--------+----------------+----------------+
wandb: Currently logged in as: pvcastro (use `wandb login --relogin` to force relogin)
2021-05-03 08:10:31,794 ERROR trial_runner.py:616 -- Trial _inner_2a8cd_00000: Error processing event.
Traceback (most recent call last):
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 586, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 609, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
return func(*args, **kwargs)
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 1456, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train_buffered() (pid=4311, ip=172.16.9.2)
File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/trainable.py", line 167, in train_buffered
result = self.train()
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/trainable.py", line 226, in train
result = self.step()
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 366, in step
self._report_thread_runner_error(block=True)
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 512, in _report_thread_runner_error
raise TuneError(
ray.tune.error.TuneError: Trial raised an exception. Traceback:
ray::ImplicitFunc.train_buffered() (pid=4311, ip=172.16.9.2)
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 248, in run
self._entrypoint()
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 315, in entrypoint
return self._trainable_func(self.config, self._status_reporter,
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 576, in _trainable_func
output = fn()
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 651, in _inner
inner(config, checkpoint_dir=None)
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 644, in inner
fn_kwargs[k] = parameter_registry.get(prefix + k)
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/registry.py", line 167, in get
return ray.get(self.references[k])
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
return func(*args, **kwargs)
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 245, in deserialize_objects
self._deserialize_object(data, metadata, object_ref))
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 192, in _deserialize_object
return self._deserialize_msgpack_data(data, metadata_fields)
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 170, in _deserialize_msgpack_data
python_objects = self._deserialize_pickle5_data(pickle5_data)
File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 158, in _deserialize_pickle5_data
obj = pickle.loads(in_band, buffers=buffers)
ModuleNotFoundError: No module named 'datasets_modules'
(pid=4311) 2021-05-03 08:10:31,755 ERROR function_runner.py:254 -- Runner Thread raised error.
(pid=4311) Traceback (most recent call last):
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 248, in run
(pid=4311) self._entrypoint()
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 315, in entrypoint
(pid=4311) return self._trainable_func(self.config, self._status_reporter,
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 576, in _trainable_func
(pid=4311) output = fn()
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 651, in _inner
Result for _inner_2a8cd_00000:
{}
(pid=4311) inner(config, checkpoint_dir=None)
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 644, in inner
(pid=4311) fn_kwargs[k] = parameter_registry.get(prefix + k)
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/registry.py", line 167, in get
(pid=4311) return ray.get(self.references[k])
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
(pid=4311) return func(*args, **kwargs)
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 1448, in get
(pid=4311) values, debugger_breakpoint = worker.get_objects(
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 319, in get_objects
(pid=4311) return self.deserialize_objects(data_metadata_pairs,
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 282, in deserialize_objects
(pid=4311) return context.deserialize_objects(data_metadata_pairs, object_refs)
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 245, in deserialize_objects
(pid=4311) self._deserialize_object(data, metadata, object_ref))
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 192, in _deserialize_object
(pid=4311) return self._deserialize_msgpack_data(data, metadata_fields)
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 170, in _deserialize_msgpack_data
(pid=4311) python_objects = self._deserialize_pickle5_data(pickle5_data)
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 158, in _deserialize_pickle5_data
(pid=4311) obj = pickle.loads(in_band, buffers=buffers)
(pid=4311) ModuleNotFoundError: No module named 'datasets_modules'
(pid=4311) Exception in thread Thread-2:
(pid=4311) Traceback (most recent call last):
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/threading.py", line 932, in _bootstrap_inner
(pid=4311) self.run()
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 267, in run
(pid=4311) raise e
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 248, in run
(pid=4311) self._entrypoint()
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 315, in entrypoint
(pid=4311) return self._trainable_func(self.config, self._status_reporter,
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 576, in _trainable_func
(pid=4311) output = fn()
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 651, in _inner
(pid=4311) inner(config, checkpoint_dir=None)
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 644, in inner
(pid=4311) fn_kwargs[k] = parameter_registry.get(prefix + k)
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/registry.py", line 167, in get
(pid=4311) return ray.get(self.references[k])
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
(pid=4311) return func(*args, **kwargs)
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 1448, in get
(pid=4311) values, debugger_breakpoint = worker.get_objects(
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 319, in get_objects
(pid=4311) return self.deserialize_objects(data_metadata_pairs,
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 282, in deserialize_objects
(pid=4311) return context.deserialize_objects(data_metadata_pairs, object_refs)
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 245, in deserialize_objects
(pid=4311) self._deserialize_object(data, metadata, object_ref))
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 192, in _deserialize_object
(pid=4311) return self._deserialize_msgpack_data(data, metadata_fields)
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 170, in _deserialize_msgpack_data
(pid=4311) python_objects = self._deserialize_pickle5_data(pickle5_data)
(pid=4311) File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 158, in _deserialize_pickle5_data
(pid=4311) obj = pickle.loads(in_band, buffers=buffers)
(pid=4311) ModuleNotFoundError: No module named 'datasets_modules'
Problem at: /media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/integration/wandb.py 197 run
python-BaseException
CondaError: KeyboardInterrupt
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
Pedro Vitor Quinta de Castro wrote:

@amogkam I believe the datasets_modules path is added upon load_dataset, which can occur before the creation of the Trainer. To support this, I think we need to allow the custom path to be added before the invocation of each trial.

Reply: ah yes! will put on todo list.
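For anyone hitting this in the meantime, one possible user-side workaround (a sketch only, not verified against this exact setup) is to stop capturing the module-level seqeval metric in compute_metrics and instead build it lazily inside each process, so that nothing defined in the dynamically generated datasets_modules package has to be pickled from the driver. The helper below is hypothetical (my own name, not a transformers or datasets API):

from datasets import load_metric

_seqeval_metric = None

def get_seqeval_metric():
    # Hypothetical helper: create the metric lazily, once per process, so the
    # class defined under the dynamically generated `datasets_modules` package
    # is constructed inside the Ray worker instead of being unpickled there.
    global _seqeval_metric
    if _seqeval_metric is None:
        _seqeval_metric = load_metric("seqeval")
    return _seqeval_metric

In the script above, compute_metrics would then call get_seqeval_metric().compute(...) instead of using the module-level metric. I have not confirmed whether this alone is enough, since load_dataset may also register entries under datasets_modules, as discussed above.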