
hyperparameter_search raytune: ModuleNotFoundError: No module named 'datasets_modules'

Environment info

  • transformers version: 4.4.2
  • Platform: Linux-4.15.0-142-generic-x86_64-with-glibc2.10
  • Python version: 3.8.8
  • PyTorch version (GPU?): 1.6.0 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help

@richardliaw, @amogkam

Information

Model I am using (Bert, XLNet …): Bert (neuralmind/bert-base-portuguese-cased)

The problem arises when using:

  • the official example scripts: (give details below)
  • [ x ] my own modified scripts: (give details below)

The tasks I am working on are:

  • [ x ] an official GLUE/SQuAD task: (give the name)
  • [ x ] my own task or dataset: (give details below)

I’m running a modified run_ner example that uses trainer.hyperparameter_search with Ray Tune. I’m using my own datasets, but I have run into the same issue with other GLUE scripts and official GLUE datasets, just like other people did here:

  • https://discuss.huggingface.co/t/using-hyperparameter-search-in-trainer/785/34
  • https://discuss.huggingface.co/t/using-hyperparameter-search-in-trainer/785/35
  • Colab from @piegu

At first I was using the run_ner script and transformers from the current 4.6.0-dev branch, but I ran into the same issue reported here: #11249

So I downgraded to transformers 4.4.2 and ray 1.2.0 (in a fresh conda environment) and made the necessary adjustments to the run_ner script so that it works with 4.4.2.
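
A quick way to confirm the pinned versions from inside the fresh environment (a minimal sanity-check sketch, not part of the run_ner script itself):

# Minimal version check for the environment described above.
import ray
import transformers

assert transformers.__version__ == "4.4.2", transformers.__version__
assert ray.__version__ == "1.2.0", ray.__version__
print("transformers", transformers.__version__, "| ray", ray.__version__)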

To reproduce

Steps to reproduce the behavior:

This is the full code from the script:

#!/usr/bin/env python
# coding: utf-8


import json
import logging
import os
import sys
import copy

from dataclasses import dataclass, field
from typing import Optional, Dict, Any

import numpy as np
from datasets import ClassLabel, load_dataset, load_metric

from ray import tune
from ray.tune.integration.wandb import WandbLogger
from ray.tune.logger import DEFAULT_LOGGERS
from ray.tune.schedulers import PopulationBasedTraining

import transformers
from transformers import (
    AutoConfig,
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    HfArgumentParser,
    PreTrainedTokenizerFast,
    Trainer,
    TrainingArguments,
    set_seed,
)
from transformers.trainer_utils import get_last_checkpoint, is_main_process
from transformers.utils import check_min_version

# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.4.0")

logger = logging.getLogger(__name__)


@dataclass
class RayArguments:
    """[summary]
    """

    time_budget_h: str = field(
        metadata={"help": "Time budget in hours."}
    )


@dataclass
class ModelArguments:
    """
    Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
    """

    model_name_or_path: str = field(
        metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"}
    )
    config_name: Optional[str] = field(
        default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
    )
    tokenizer_name: Optional[str] = field(
        default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"}
    )
    cache_dir: Optional[str] = field(
        default=None,
        metadata={"help": "Where do you want to store the pretrained models downloaded from huggingface.co"},
    )
    model_revision: str = field(
        default="main",
        metadata={"help": "The specific model version to use (can be a branch name, tag name or commit id)."},
    )
    use_auth_token: bool = field(
        default=False,
        metadata={
            "help": "Will use the token generated when running `transformers-cli login` (necessary to use this script "
                    "with private models)."
        },
    )


@dataclass
class DataTrainingArguments:
    """
    Arguments pertaining to what data we are going to input our model for training and eval.
    """

    task_name: Optional[str] = field(default="ner", metadata={"help": "The name of the task (ner, pos...)."})
    dataset_name: Optional[str] = field(
        default=None, metadata={"help": "The name of the dataset to use (via the datasets library)."}
    )
    dataset_config_name: Optional[str] = field(
        default=None, metadata={"help": "The configuration name of the dataset to use (via the datasets library)."}
    )
    train_file: Optional[str] = field(
        default=None, metadata={"help": "The input training data file (a csv or JSON file)."}
    )
    validation_file: Optional[str] = field(
        default=None,
        metadata={"help": "An optional input evaluation data file to evaluate on (a csv or JSON file)."},
    )
    test_file: Optional[str] = field(
        default=None,
        metadata={"help": "An optional input test data file to predict on (a csv or JSON file)."},
    )
    overwrite_cache: bool = field(
        default=False, metadata={"help": "Overwrite the cached training and evaluation sets"}
    )
    preprocessing_num_workers: Optional[int] = field(
        default=None,
        metadata={"help": "The number of processes to use for the preprocessing."},
    )
    pad_to_max_length: bool = field(
        default=False,
        metadata={
            "help": "Whether to pad all samples to model maximum sentence length. "
                    "If False, will pad the samples dynamically when batching to the maximum length in the batch. More "
                    "efficient on GPU but very bad for TPU."
        },
    )
    max_train_samples: Optional[int] = field(
        default=None,
        metadata={
            "help": "For debugging purposes or quicker training, truncate the number of training examples to this "
                    "value if set."
        },
    )
    max_val_samples: Optional[int] = field(
        default=None,
        metadata={
            "help": "For debugging purposes or quicker training, truncate the number of validation examples to this "
                    "value if set."
        },
    )
    max_test_samples: Optional[int] = field(
        default=None,
        metadata={
            "help": "For debugging purposes or quicker training, truncate the number of test examples to this "
                    "value if set."
        },
    )
    label_all_tokens: bool = field(
        default=False,
        metadata={
            "help": "Whether to put the label for one word on all tokens of generated by that word or just on the "
                    "one (in which case the other tokens will have a padding index)."
        },
    )
    return_entity_level_metrics: bool = field(
        default=False,
        metadata={"help": "Whether to return all the entity levels during evaluation or just the overall ones."},
    )

    def __post_init__(self):
        if self.dataset_name is None and self.train_file is None and self.validation_file is None:
            raise ValueError("Need either a dataset name or a training/validation file.")
        else:
            if self.train_file is not None:
                extension = self.train_file.split(".")[-1]
                assert extension in ["csv", "json"], "`train_file` should be a csv or a json file."
            if self.validation_file is not None:
                extension = self.validation_file.split(".")[-1]
                assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file."
        self.task_name = self.task_name.lower()


def compute_objective(metrics: Dict[str, float]) -> float:
    """
    The default objective to maximize/minimize when doing an hyperparameter search. It is the evaluation loss if no
    metrics are provided to the :class:`~transformers.Trainer`, the sum of all metrics otherwise.
    Args:
        metrics (:obj:`Dict[str, float]`): The metrics returned by the evaluate method.
    Return:
        :obj:`float`: The objective to minimize or maximize
    """
    metrics = copy.deepcopy(metrics)
    loss = metrics.pop("eval_loss", None)
    _ = metrics.pop("epoch", None)
    # Remove speed metrics
    speed_metrics = [m for m in metrics.keys() if m.endswith("_runtime") or m.endswith("_samples_per_second")]
    for sm in speed_metrics:
        _ = metrics.pop(sm, None)
    return loss if len(metrics) == 0 else sum(metrics.values())


def main():
    parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments, RayArguments))
    model_args, data_args, training_args, ray_args = parser.parse_args_into_dataclasses()

    # Detecting last checkpoint.
    last_checkpoint = None
    if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir:
        last_checkpoint = get_last_checkpoint(training_args.output_dir)
        if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0:
            raise ValueError(
                f"Output directory ({training_args.output_dir}) already exists and is not empty. "
                "Use --overwrite_output_dir to overcome."
            )
        elif last_checkpoint is not None:
            logger.info(
                f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change "
                "the `--output_dir` or add `--overwrite_output_dir` to train from scratch."
            )

    # Setup logging
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        handlers=[logging.StreamHandler(sys.stdout)],
    )
    logger.setLevel(logging.INFO if is_main_process(training_args.local_rank) else logging.WARN)

    # Log on each process the small summary:
    logger.warning(
        f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
        + f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
    )
    # Set the verbosity to info of the Transformers logger (on main process only):
    if is_main_process(training_args.local_rank):
        transformers.utils.logging.set_verbosity_info()
        transformers.utils.logging.enable_default_handler()
        transformers.utils.logging.enable_explicit_format()
    logger.info("Training/evaluation parameters %s", training_args)

    # Set seed before initializing model.
    set_seed(training_args.seed)

    # Get the datasets: you can either provide your own CSV/JSON/TXT training and evaluation files (see below)
    # or just provide the name of one of the public datasets available on the hub at https://huggingface.co/datasets/
    # (the dataset will be downloaded automatically from the datasets Hub).
    #
    # For CSV/JSON files, this script will use the column called 'text' or the first column if no column called
    # 'text' is found. You can easily tweak this behavior (see below).
    #
    # In distributed training, the load_dataset function guarantee that only one local process can concurrently
    # download the dataset.
    if data_args.dataset_name is not None:
        # Downloading and loading a dataset from the hub.
        datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name)
    else:
        data_files = {}
        if data_args.train_file is not None:
            data_files["train"] = data_args.train_file
        if data_args.validation_file is not None:
            data_files["validation"] = data_args.validation_file
        if data_args.test_file is not None:
            data_files["test"] = data_args.test_file
        extension = data_args.train_file.split(".")[-1]
        datasets = load_dataset(extension, data_files=data_files)
    # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
    # https://huggingface.co/docs/datasets/loading_datasets.html.

    if training_args.do_train:
        column_names = datasets["train"].column_names
        features = datasets["train"].features
    else:
        column_names = datasets["validation"].column_names
        features = datasets["validation"].features
    text_column_name = "tokens" if "tokens" in column_names else column_names[0]
    label_column_name = (
        f"{data_args.task_name}_tags" if f"{data_args.task_name}_tags" in column_names else column_names[1]
    )

    # In the event the labels are not a `Sequence[ClassLabel]`, we will need to go through the dataset to get the
    # unique labels.
    def get_label_list(labels):
        unique_labels = set()
        for label in labels:
            unique_labels = unique_labels | set(label)
        label_list = list(unique_labels)
        label_list.sort()
        return label_list

    if isinstance(features[label_column_name].feature, ClassLabel):
        label_list = features[label_column_name].feature.names
        # No need to convert the labels since they are already ints.
        label_to_id = {i: i for i in range(len(label_list))}
    else:
        label_list = get_label_list(datasets["train"][label_column_name])
        label_to_id = {l: i for i, l in enumerate(label_list)}
    num_labels = len(label_list)

    # Load pretrained model and tokenizer
    #
    # Distributed training:
    # The .from_pretrained methods guarantee that only one local process can concurrently
    # download model & vocab.
    config = AutoConfig.from_pretrained(
        model_args.config_name if model_args.config_name else model_args.model_name_or_path,
        num_labels=num_labels,
        finetuning_task=data_args.task_name,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
        use_fast=True,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
        model_max_length=512
    )
    model = AutoModelForTokenClassification.from_pretrained(
        model_args.model_name_or_path,
        from_tf=bool(".ckpt" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
    )

    # Tokenizer check: this script requires a fast tokenizer.
    if not isinstance(tokenizer, PreTrainedTokenizerFast):
        raise ValueError(
            "This example script only works for models that have a fast tokenizer. Checkout the big table of models "
            "at https://huggingface.co/transformers/index.html#bigtable to find the model types that meet this "
            "requirement"
        )

    # Preprocessing the dataset
    # Padding strategy
    padding = "max_length" if data_args.pad_to_max_length else False

    # Tokenize all texts and align the labels with them.
    def tokenize_and_align_labels(examples):
        tokenized_inputs = tokenizer(
            examples[text_column_name],
            padding=padding,
            truncation=True,
            # We use this argument because the texts in our dataset are lists of words (with a label for each word).
            is_split_into_words=True,
        )
        labels = []
        for i, label in enumerate(examples[label_column_name]):
            word_ids = tokenized_inputs.word_ids(batch_index=i)
            previous_word_idx = None
            label_ids = []
            for word_idx in word_ids:
                # Special tokens have a word id that is None. We set the label to -100 so they are automatically
                # ignored in the loss function.
                if word_idx is None:
                    label_ids.append(-100)
                # We set the label for the first token of each word.
                elif word_idx != previous_word_idx:
                    label_ids.append(label_to_id[label[word_idx]])
                # For the other tokens in a word, we set the label to either the current label or -100, depending on
                # the label_all_tokens flag.
                else:
                    label_ids.append(label_to_id[label[word_idx]] if data_args.label_all_tokens else -100)
                previous_word_idx = word_idx

            labels.append(label_ids)
        tokenized_inputs["labels"] = labels
        return tokenized_inputs

    if training_args.do_train:
        if "train" not in datasets:
            raise ValueError("--do_train requires a train dataset")
        train_dataset = datasets["train"]
        if data_args.max_train_samples is not None:
            train_dataset = train_dataset.select(range(data_args.max_train_samples))
        train_dataset = train_dataset.map(
            tokenize_and_align_labels,
            batched=True,
            num_proc=data_args.preprocessing_num_workers,
            load_from_cache_file=not data_args.overwrite_cache,
        )

    if training_args.do_eval:
        if "validation" not in datasets:
            raise ValueError("--do_eval requires a validation dataset")
        eval_dataset = datasets["validation"]
        if data_args.max_val_samples is not None:
            eval_dataset = eval_dataset.select(range(data_args.max_val_samples))
        eval_dataset = eval_dataset.map(
            tokenize_and_align_labels,
            batched=True,
            num_proc=data_args.preprocessing_num_workers,
            load_from_cache_file=not data_args.overwrite_cache,
        )

    if training_args.do_predict:
        if "test" not in datasets:
            raise ValueError("--do_predict requires a test dataset")
        test_dataset = datasets["test"]
        if data_args.max_test_samples is not None:
            test_dataset = test_dataset.select(range(data_args.max_test_samples))
        test_dataset = test_dataset.map(
            tokenize_and_align_labels,
            batched=True,
            num_proc=data_args.preprocessing_num_workers,
            load_from_cache_file=not data_args.overwrite_cache,
        )

    # Data collator
    data_collator = DataCollatorForTokenClassification(tokenizer, pad_to_multiple_of=8 if training_args.fp16 else None)

    # Metrics
    metric = load_metric("seqeval")

    def compute_metrics(p):
        predictions, labels = p
        predictions = np.argmax(predictions, axis=2)

        # Remove ignored index (special tokens)
        true_predictions = [
            [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
            for prediction, label in zip(predictions, labels)
        ]
        true_labels = [
            [label_list[l] for (p, l) in zip(prediction, label) if l != -100]
            for prediction, label in zip(predictions, labels)
        ]

        results = metric.compute(predictions=true_predictions, references=true_labels)
        if data_args.return_entity_level_metrics:
            # Unpack nested dictionaries
            final_results = {}
            for key, value in results.items():
                if isinstance(value, dict):
                    for n, v in value.items():
                        final_results[f"{key}_{n}"] = v
                else:
                    final_results[key] = value
            return final_results
        else:
            return {
                "precision": results["overall_precision"],
                "recall": results["overall_recall"],
                "f1": results["overall_f1"],
                "accuracy": results["overall_accuracy"],
            }

    def model_init():
        model = AutoModelForTokenClassification.from_pretrained(
            model_args.model_name_or_path,
            from_tf=bool(".ckpt" in model_args.model_name_or_path),
            config=config,
            cache_dir=model_args.cache_dir,
            revision=model_args.model_revision,
            use_auth_token=True if model_args.use_auth_token else None,
        )
        return model

    class CustomTrainer(Trainer):

        def __init__(self, *args, **kwargs):
            super(CustomTrainer, self).__init__(*args, **kwargs)

        def _hp_search_setup(self, trial: Any):
            # Drop the "wandb" entry that hp_space_fn adds for the Ray WandbLogger,
            # so the base Trainer does not try to map it onto a TrainingArguments
            # field when it applies the trial's hyperparameters.
            try:
                trial.pop('wandb', None)
            except AttributeError:
                pass
            super(CustomTrainer, self)._hp_search_setup(trial)

    # Initialize our Trainer
    trainer = CustomTrainer(
        model_init=model_init,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer,
        data_collator=data_collator,
    )

    # Hyperparameter Search
    def hp_space_fn(*args, **kwargs):
        config = {
            "seed": tune.choice([42, 43, 44]),
            "weight_decay": tune.choice([0.0, 0.1, 0.2, 0.3]),
            "adam_epsilon": tune.choice([1e-6, 1e-7, 1e-8]),
            "max_grad_norm": tune.choice([1.0, 2.0]),
            "warmup_steps": tune.choice([50, 100, 500, 1000]),
            "learning_rate": tune.choice([2e-5, 3e-5, 4e-5, 5e-5]),
            "num_train_epochs": tune.quniform(0.0, 8.0, 0.5),
        }
        wandb_config = {
            "wandb": {
                "project": "hf-ner-testing",
                "api_key": os.environ.get("API_KEY"),
                "log_config": True
            }
        }
        config.update(wandb_config)
        return config

    time_budget_h = 60 * 60 * int(ray_args.time_budget_h)

    best_run = trainer.hyperparameter_search(
        direction="maximize",
        backend="ray",
        scheduler=PopulationBasedTraining(
            time_attr='time_total_s',
            metric='eval_f1',
            mode='max',
            perturbation_interval=600.0
        ),
        hp_space=hp_space_fn,
        loggers=DEFAULT_LOGGERS + (WandbLogger,),
        time_budget_s=time_budget_h,
        keep_checkpoints_num=1,
        checkpoint_score_attr='eval_f1',
        compute_objective=compute_objective
    )

    output_params_file = os.path.join(
        training_args.output_dir,
        "best_run.json"
    )

    with open(output_params_file, "w") as f:
        json.dump(
            best_run.hyperparameters,
            f,
            indent=4)

    return best_run


if __name__ == "__main__":
    main()

And these are the args I used for running it:

--model_name_or_path neuralmind/bert-base-portuguese-cased
--train_file train.json
--validation_file dev.json
--output_dir output
--do_train
--do_eval
--evaluation_strategy steps
--per_device_train_batch_size=2
--per_device_eval_batch_size=2
--time_budget_h 2

This is the full output log:

/media/discoD/anaconda3/envs/transformers/bin/python /media/discoD/pycharm-community-2019.2/plugins/python-ce/helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 127.0.0.1 --port 38419 --file /media/discoD/repositorios/transformers_pedro/examples/pytorch/token-classification/run_ner_hp_search_442.py --model_name_or_path neuralmind/bert-base-portuguese-cased --train_file train.json --validation_file dev.json --output_dir transformers-hp --do_train --do_eval --evaluation_strategy steps --per_device_train_batch_size=2 --per_device_eval_batch_size=2 --time_budget_h 2
Connected to pydev debugger (build 211.7142.13)
05/03/2021 08:10:04 - WARNING - __main__ -   Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
05/03/2021 08:10:04 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir=transformers-hp, overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=IntervalStrategy.STEPS, prediction_loss_only=False, per_device_train_batch_size=2, per_device_eval_batch_size=2, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0, warmup_steps=0, logging_dir=runs/May03_08-10-04_user-XPS-8700, logging_strategy=IntervalStrategy.STEPS, logging_first_step=False, logging_steps=500, save_strategy=IntervalStrategy.STEPS, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level=O1, fp16_backend=auto, fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=transformers-hp, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, report_to=['tensorboard', 'wandb'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, _n_gpu=1)
05/03/2021 08:10:04 - WARNING - datasets.builder -   Using custom data configuration default-438421c06175ed26
05/03/2021 08:10:04 - WARNING - datasets.builder -   Reusing dataset json (/home/user/.cache/huggingface/datasets/json/default-438421c06175ed26/0.0.0/83d5b3a2f62630efc6b5315f00f20209b4ad91a00ac586597caee3a4da0bef02)
[INFO|configuration_utils.py:463] 2021-05-03 08:10:06,050 >> loading configuration file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/config.json from cache at /home/user/.cache/huggingface/transformers/e716e2151985ba669e7197b64cdde2552acee146494d40ffaf0688a3f152e6ed.18a0b8b86f3ebd4c8a1d8d6199178feae9971ff5420f1d12f0ed8326ffdff716
[INFO|configuration_utils.py:499] 2021-05-03 08:10:06,063 >> Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "finetuning_task": "ner",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5",
    "6": "LABEL_6",
    "7": "LABEL_7",
    "8": "LABEL_8",
    "9": "LABEL_9",
    "10": "LABEL_10",
    "11": "LABEL_11",
    "12": "LABEL_12"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_10": 10,
    "LABEL_11": 11,
    "LABEL_12": 12,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5,
    "LABEL_6": 6,
    "LABEL_7": 7,
    "LABEL_8": 8,
    "LABEL_9": 9
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "position_embedding_type": "absolute",
  "transformers_version": "4.4.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 29794
}

[INFO|configuration_utils.py:463] 2021-05-03 08:10:06,767 >> loading configuration file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/config.json from cache at /home/user/.cache/huggingface/transformers/e716e2151985ba669e7197b64cdde2552acee146494d40ffaf0688a3f152e6ed.18a0b8b86f3ebd4c8a1d8d6199178feae9971ff5420f1d12f0ed8326ffdff716
[INFO|configuration_utils.py:499] 2021-05-03 08:10:06,777 >> Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "position_embedding_type": "absolute",
  "transformers_version": "4.4.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 29794
}

[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,936 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/vocab.txt from cache at /home/user/.cache/huggingface/transformers/aa6d50227b77416b26162efcf0cc9e9a702d13920840322060a2b41a44a8aff4.af25fb1e29ad0175300146695fd80069be69b211c52fa5486fa8aae2754cc814
[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,936 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/tokenizer.json from cache at None
[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,937 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/added_tokens.json from cache at /home/user/.cache/huggingface/transformers/9188d297517828a862f4e0b0700968574ca7ad38fbc0832c409bf7a9e5576b74.5cc6e825eb228a7a5cfd27cb4d7151e97a79fb962b31aaf1813aa102e746584b
[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,937 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/special_tokens_map.json from cache at /home/user/.cache/huggingface/transformers/eecc45187d085a1169eed91017d358cc0e9cbdd5dc236bcd710059dbf0a2f816.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d
[INFO|tokenization_utils_base.py:1702] 2021-05-03 08:10:09,938 >> loading file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/tokenizer_config.json from cache at /home/user/.cache/huggingface/transformers/f1a9ba41d40e8c6f5ba4988aa2f7702c3b43768183e4b82483e04f2848841ecf.a6c00251b9344c189e2419373d6033016d0cd3d87ea59f6c86069046ac81956d
[INFO|modeling_utils.py:1051] 2021-05-03 08:10:10,709 >> loading weights file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/pytorch_model.bin from cache at /home/user/.cache/huggingface/transformers/1e42c907c340c902923496246dae63e33f64955c529720991b7ec5543a98e442.fa492fca6dcee85bef053cc60912a211feb1f7173129e4eb1a5164e817f2f5f2
[WARNING|modeling_utils.py:1158] 2021-05-03 08:10:13,606 >> Some weights of the model checkpoint at neuralmind/bert-base-portuguese-cased were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:1169] 2021-05-03 08:10:13,607 >> Some weights of BertForTokenClassification were not initialized from the model checkpoint at neuralmind/bert-base-portuguese-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 7/7 [00:02<00:00,  3.06ba/s]
100%|██████████| 2/2 [00:00<00:00,  3.13ba/s]
[INFO|modeling_utils.py:1051] 2021-05-03 08:10:19,160 >> loading weights file https://huggingface.co/neuralmind/bert-base-portuguese-cased/resolve/main/pytorch_model.bin from cache at /home/user/.cache/huggingface/transformers/1e42c907c340c902923496246dae63e33f64955c529720991b7ec5543a98e442.fa492fca6dcee85bef053cc60912a211feb1f7173129e4eb1a5164e817f2f5f2
[WARNING|modeling_utils.py:1158] 2021-05-03 08:10:22,280 >> Some weights of the model checkpoint at neuralmind/bert-base-portuguese-cased were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:1169] 2021-05-03 08:10:22,280 >> Some weights of BertForTokenClassification were not initialized from the model checkpoint at neuralmind/bert-base-portuguese-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|trainer.py:482] 2021-05-03 08:10:24,327 >> The following columns in the training set  don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: ner_tags, tokens.
[INFO|trainer.py:482] 2021-05-03 08:10:24,334 >> The following columns in the evaluation set  don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: ner_tags, tokens.
[INFO|integrations.py:184] 2021-05-03 08:10:24,396 >> No `resources_per_trial` arg was passed into `hyperparameter_search`. Setting it to a default value of 1 CPU and 1 GPU for each trial.
2021-05-03 08:10:25,807	INFO services.py:1172 -- View the Ray dashboard at http://127.0.0.1:8265
2021-05-03 08:10:27,788	WARNING function_runner.py:540 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.
== Status ==
Memory usage on this node: 21.2/31.4 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 1/8 CPUs, 1/1 GPUs, 0.0/7.67 GiB heap, 0.0/2.64 GiB objects (0/1.0 accelerator_type:GTX)
Result logdir: /home/user/ray_results/_inner_2021-05-03_08-10-27
Number of trials: 1/20 (1 RUNNING)
+--------------------+----------+-------+----------------+-----------------+-----------------+--------------------+--------+----------------+----------------+
| Trial name         | status   | loc   |   adam_epsilon |   learning_rate |   max_grad_norm |   num_train_epochs |   seed |   warmup_steps |   weight_decay |
|--------------------+----------+-------+----------------+-----------------+-----------------+--------------------+--------+----------------+----------------|
| _inner_2a8cd_00000 | RUNNING  |       |          1e-06 |           4e-05 |               2 |                  3 |     42 |            500 |              0 |
+--------------------+----------+-------+----------------+-----------------+-----------------+--------------------+--------+----------------+----------------+


wandb: Currently logged in as: pvcastro (use `wandb login --relogin` to force relogin)
2021-05-03 08:10:31,794	ERROR trial_runner.py:616 -- Trial _inner_2a8cd_00000: Error processing event.
Traceback (most recent call last):
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 586, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 609, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
    return func(*args, **kwargs)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 1456, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train_buffered() (pid=4311, ip=172.16.9.2)
  File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/trainable.py", line 167, in train_buffered
    result = self.train()
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/trainable.py", line 226, in train
    result = self.step()
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 366, in step
    self._report_thread_runner_error(block=True)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 512, in _report_thread_runner_error
    raise TuneError(
ray.tune.error.TuneError: Trial raised an exception. Traceback:
ray::ImplicitFunc.train_buffered() (pid=4311, ip=172.16.9.2)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 248, in run
    self._entrypoint()
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 315, in entrypoint
    return self._trainable_func(self.config, self._status_reporter,
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 576, in _trainable_func
    output = fn()
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 651, in _inner
    inner(config, checkpoint_dir=None)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 644, in inner
    fn_kwargs[k] = parameter_registry.get(prefix + k)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/registry.py", line 167, in get
    return ray.get(self.references[k])
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
    return func(*args, **kwargs)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 245, in deserialize_objects
    self._deserialize_object(data, metadata, object_ref))
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 192, in _deserialize_object
    return self._deserialize_msgpack_data(data, metadata_fields)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 170, in _deserialize_msgpack_data
    python_objects = self._deserialize_pickle5_data(pickle5_data)
  File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 158, in _deserialize_pickle5_data
    obj = pickle.loads(in_band, buffers=buffers)
ModuleNotFoundError: No module named 'datasets_modules'
(pid=4311) 2021-05-03 08:10:31,755	ERROR function_runner.py:254 -- Runner Thread raised error.
(pid=4311) Traceback (most recent call last):
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 248, in run
(pid=4311)     self._entrypoint()
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 315, in entrypoint
(pid=4311)     return self._trainable_func(self.config, self._status_reporter,
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 576, in _trainable_func
(pid=4311)     output = fn()
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 651, in _inner
Result for _inner_2a8cd_00000:
  {}
  
(pid=4311)     inner(config, checkpoint_dir=None)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 644, in inner
(pid=4311)     fn_kwargs[k] = parameter_registry.get(prefix + k)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/registry.py", line 167, in get
(pid=4311)     return ray.get(self.references[k])
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
(pid=4311)     return func(*args, **kwargs)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 1448, in get
(pid=4311)     values, debugger_breakpoint = worker.get_objects(
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 319, in get_objects
(pid=4311)     return self.deserialize_objects(data_metadata_pairs,
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 282, in deserialize_objects
(pid=4311)     return context.deserialize_objects(data_metadata_pairs, object_refs)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 245, in deserialize_objects
(pid=4311)     self._deserialize_object(data, metadata, object_ref))
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 192, in _deserialize_object
(pid=4311)     return self._deserialize_msgpack_data(data, metadata_fields)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 170, in _deserialize_msgpack_data
(pid=4311)     python_objects = self._deserialize_pickle5_data(pickle5_data)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 158, in _deserialize_pickle5_data
(pid=4311)     obj = pickle.loads(in_band, buffers=buffers)
(pid=4311) ModuleNotFoundError: No module named 'datasets_modules'
(pid=4311) Exception in thread Thread-2:
(pid=4311) Traceback (most recent call last):
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/threading.py", line 932, in _bootstrap_inner
(pid=4311)     self.run()
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 267, in run
(pid=4311)     raise e
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 248, in run
(pid=4311)     self._entrypoint()
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 315, in entrypoint
(pid=4311)     return self._trainable_func(self.config, self._status_reporter,
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 576, in _trainable_func
(pid=4311)     output = fn()
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 651, in _inner
(pid=4311)     inner(config, checkpoint_dir=None)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/function_runner.py", line 644, in inner
(pid=4311)     fn_kwargs[k] = parameter_registry.get(prefix + k)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/registry.py", line 167, in get
(pid=4311)     return ray.get(self.references[k])
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
(pid=4311)     return func(*args, **kwargs)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 1448, in get
(pid=4311)     values, debugger_breakpoint = worker.get_objects(
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 319, in get_objects
(pid=4311)     return self.deserialize_objects(data_metadata_pairs,
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/worker.py", line 282, in deserialize_objects
(pid=4311)     return context.deserialize_objects(data_metadata_pairs, object_refs)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 245, in deserialize_objects
(pid=4311)     self._deserialize_object(data, metadata, object_ref))
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 192, in _deserialize_object
(pid=4311)     return self._deserialize_msgpack_data(data, metadata_fields)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 170, in _deserialize_msgpack_data
(pid=4311)     python_objects = self._deserialize_pickle5_data(pickle5_data)
(pid=4311)   File "/media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/serialization.py", line 158, in _deserialize_pickle5_data
(pid=4311)     obj = pickle.loads(in_band, buffers=buffers)
(pid=4311) ModuleNotFoundError: No module named 'datasets_modules'
Problem at: /media/discoD/anaconda3/envs/transformers/lib/python3.8/site-packages/ray/tune/integration/wandb.py 197 run
python-BaseException

CondaError: KeyboardInterrupt


Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 22 (6 by maintainers)

Top GitHub Comments

2 reactions
richardliaw commented on Jun 25, 2021

ah yes! will put on todo list.

On Fri, Jun 25, 2021 at 8:14 AM Pedro Vitor Quinta de Castro <@.***> wrote:

@richardliaw @amogkam anyone working on this?


1 reaction
richardliaw commented on May 5, 2021

@amogkam I believe the datasets_modules path is added upon load_dataset, which can occur before the creation of the Trainer.

To support this, I think we need to allow the custom path to be added before the invocation of each trial.
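
The shape of that workaround might look something like the sketch below. This is only an illustration of the idea, not the fix that eventually landed: the helper name is made up, the cache location is an assumption based on where the datasets library keeps its dynamically generated modules (HF_MODULES_CACHE, typically ~/.cache/huggingface/modules), and, judging from the traceback above, it would need to run on every Ray worker before the trial’s arguments are deserialized, which is exactly the hook that is missing today.

import importlib
import os
import sys


def make_datasets_modules_importable():
    """Make the dynamically generated datasets_modules package importable.

    load_dataset() writes generated dataset/metric scripts into a
    datasets_modules package under the datasets modules cache and adds that
    directory to sys.path in the driver process only, so a Ray worker that
    unpickles objects referencing it raises ModuleNotFoundError.
    """
    modules_cache = os.environ.get(
        "HF_MODULES_CACHE",
        os.path.join(os.path.expanduser("~"), ".cache", "huggingface", "modules"),
    )
    if modules_cache not in sys.path:
        sys.path.insert(0, modules_cache)
    # Import eagerly so that later unpickling can resolve the package.
    importlib.import_module("datasets_modules")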
