
allow objects to define their own tokenize

See original GitHub issue

Recently, I found it convenient for objects to define their own tokenization, basically by checking whether each object has a tokenize attribute. Would this be something that could be supported by dask? If this has already been discussed, I apologize in advance; I only searched very quickly.

Thanks!

example:

    from dask.base import tokenize as dask_tokenize

    def custom_tokenize(*args, **kwargs):
        # check whether each argument/kwarg handles its own tokenization
        argslist = list()
        for arg in args:
            if hasattr(arg, 'tokenize'):
                argslist.append(arg.tokenize())
            else:
                argslist.append(arg)

        kwargsdict = dict()
        for key, val in kwargs.items():
            if hasattr(val, 'tokenize'):
                kwargsdict[key] = val.tokenize()
            else:
                kwargsdict[key] = val

        # delegate the final hash to dask's tokenize (calling this
        # wrapper itself here would recurse forever, and dask's
        # tokenize has no pure= parameter; pure belongs to dask.delayed)
        return dask_tokenize(*argslist, **kwargsdict)

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 11 (10 by maintainers)

Top GitHub Comments

1 reaction
IPetrik commented, Jun 13, 2019

For those like me coming here after Oct 2017, the method __dask_tokenize__ was added. (http://docs.dask.org/en/latest/custom-collections.html#deterministic-hashing)
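A minimal sketch of that hook, assuming a hypothetical Point class (the `__dask_tokenize__` method itself is the protocol documented at the link above):

```python
from dask.base import tokenize

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __dask_tokenize__(self):
        # Return something dask can hash deterministically,
        # e.g. a tuple of plain values identifying this object.
        return (type(self).__name__, self.x, self.y)

# Equal points produce the same token; different points do not.
assert tokenize(Point(1, 2)) == tokenize(Point(1, 2))
assert tokenize(Point(1, 2)) != tokenize(Point(3, 4))
```

dask calls `__dask_tokenize__` when it normalizes an object for hashing, so this replaces the hasattr-based wrapper proposed in the issue.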

1 reaction
jcrist commented, May 19, 2017

> maybe this should be added to the documentation?

I’m not 100% set on this being public api yet, but I wouldn’t expect it to change anytime soon so you should be fine relying on it.


