Improvement to error handling in subclasses

Feature request

I encountered a fascinating (though very frustrating 😃) scenario involving error handling in transformers. If you subclass a tokenizer that relies on sentencepiece without having sentencepiece installed, you get an unhelpful error message that sends you down a lot of rabbit holes.

Consider this minimal example:

from transformers import MBartTokenizer


class CustomMBartTokenizer(MBartTokenizer):
    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        inst = super().from_pretrained(*args, **kwargs)
        # Do other stuff with it...
        return inst


a = CustomMBartTokenizer.from_pretrained("facebook/mbart-large-cc25")

If you run this in a new environment where sentencepiece is not installed, you get the following error:

AttributeError: 'super' object has no attribute 'from_pretrained'

This error message had me comparing Windows vs. Linux and Python 3.8 vs. 3.9 vs. 3.10, because I could not figure out why the code worked on my home machine but not on our cluster. In the end, the reason was that sentencepiece was not yet installed on the cluster, but the error message does not show that. It seems that the sentencepiece error is either not shown or does not stop execution, which then leads to the class not being initialized successfully. Admittedly, I have not dug much further.

Motivation

The error message does not seem to propagate correctly when subclassing a tokenizer. The message saying that sentencepiece is missing and needs to be installed is never shown. Instead, the user gets a vague error message about the from_pretrained call.

While this may be an exceptional case, I have found that subclassing tokenizers for a specific task is common in research.

Your contribution

Unfortunately, I do not have the time to figure out the exact cause, so I am posting this here for posterity. For anyone running into this issue: you probably just need to make sure all necessary third-party libraries (such as sentencepiece) are installed.
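One way to catch this early is to check for the required packages before touching the tokenizer at all. The following is only a minimal sketch using the standard library; `missing_packages` and `require_packages` are hypothetical helper names, not part of transformers.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_packages(names):
    """Raise a clear ImportError if any of the given packages is not installed."""
    missing = missing_packages(names)
    if missing:
        raise ImportError(
            "Missing required packages: "
            + ", ".join(missing)
            + ". Install them before loading the tokenizer."
        )
```

Calling `require_packages(["sentencepiece"])` at the top of the script that defines the subclass would then fail with a message naming the missing package, instead of the AttributeError above.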

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Reactions: 1
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

2 reactions
mariosasko commented, Nov 9, 2022

I think DummyObject should override __getattribute__ instead of __getattr__ to get the expected error.

EDIT:

Tested locally, and it works. I’ve linked a PR with the fix.
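To illustrate why that change matters, here is a simplified, self-contained sketch. It is not the actual transformers code: `MissingBackendError`, `GetattrDummy`, `GetattributeDummy`, and the helper functions are hypothetical stand-ins. The point is that a metaclass `__getattr__` only runs when normal lookup on the class fails, so a subclass that defines its own `from_pretrained` never triggers it, and `super()` does not consult it either. A metaclass `__getattribute__` intercepts the very first access on the subclass.

```python
class MissingBackendError(ImportError):
    """Hypothetical stand-in for the 'backend not installed' error."""


def _fail(cls):
    raise MissingBackendError(f"{cls.__name__} requires the sentencepiece library.")


class GetattrDummy(type):
    # Fallback only: called when normal attribute lookup on the class fails.
    def __getattr__(cls, key):
        if key.startswith("_"):
            raise AttributeError(key)
        _fail(cls)


class GetattributeDummy(type):
    # Called for every attribute access on the class, found or not.
    def __getattribute__(cls, key):
        if key.startswith("_"):
            return super().__getattribute__(key)
        _fail(cls)


def make_subclass(metaclass):
    """Build a placeholder tokenizer class and a user subclass of it."""

    class DummyTokenizer(metaclass=metaclass):
        pass

    class CustomTokenizer(DummyTokenizer):
        @classmethod
        def from_pretrained(cls, *args, **kwargs):
            return super().from_pretrained(*args, **kwargs)

    return CustomTokenizer


def error_for(metaclass):
    """Return the name of the exception raised by the subclass call, if any."""
    try:
        make_subclass(metaclass).from_pretrained("some-checkpoint")
    except Exception as exc:
        return type(exc).__name__
    return None


print(error_for(GetattrDummy))       # AttributeError
print(error_for(GetattributeDummy))  # MissingBackendError
```

With `__getattr__`, the subclass method is found normally and the failure only surfaces inside `super()`, as a generic AttributeError. With `__getattribute__`, the access to `from_pretrained` on the subclass itself raises the informative error.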

0 reactions
jacwalte commented, Nov 9, 2022

Thanks @mariosasko!

@jacwalte That is the expected behavior. sentencepiece is not installed by default because not all tokenizers need it. You’ll get the error message if you need it for your use-case and then you just have to install it manually. Or you can install transformers with the extra sentencepiece, transformers[sentencepiece]. The problem in my case was that the error message did not show up. This has now been quickly fixed by @mariosasko!

Thanks! - will update the requirements with that
