beartype performance is substantially slower than no beartype

See original GitHub issue

Hello,

I was quite interested when I found this stackoverflow answer about beartype… As a POC, I cooked up a performance test using beartype, Enthought traits, traitlets, and plain-ole-python-ducktyping…

But… I found that beartype was pretty slow in my test… As an attempt to be as fair as possible, I used assert to enforce types in my duck-typed function (ref: main_duck_assert())…

I also confess that Enthought traits are compiled, so the Enthought traits data below is mostly just an FYI…

Running my comparison 100,000 times…

$ python test_type.py
timeit duck getattr time: 0.0395 seconds
timeit duck assert  time: 0.0417 seconds
timeit traits       time: 0.0633 seconds
timeit traitlets    time: 0.5236 seconds
timeit bear         time: 0.0782 seconds
$

Question: Am I doing something wrong with bear-typing (ref: my POC code below)? Is there a way to improve the “beartyped” performance?

My rig…

  • Linux under VMware (running on a Lenovo T430); kernel version 4.19.0-12-amd64
  • Python 3.7.0
  • beartype version 0.8.1
  • Enthought traits version 6.3.0
  • traitlets version 5.1.0

from beartype import beartype

from traits.api import HasTraits as eHasTraits
from traits.api import Unicode as eUnicode
from traits.api import Int as eInt

from traitlets import HasTraits as tHasTraitlets
from traitlets import Unicode as tUnicode
from traitlets import Integer as tInteger

from timeit import timeit

def main_duck_getattr(arg01="__undefined__", arg02=0):
    """Proof of concept code implenting duck-typed args and getattr"""
    getattr(arg01, "capitalize")  # Type-checking with attributes
    getattr(arg02, "to_bytes")    # Type-checking with attributes

    str_len = len(arg01) + arg02
    getattr(str_len, "to_bytes")
    return ("duck_bar", str_len,)

def main_duck_assert(arg01="__undefined__", arg02=0):
    """Proof of concept code implenting duck-typed args and assert"""
    assert isinstance(arg01, str)
    assert isinstance(arg02, int)

    str_len = len(arg01) + arg02
    assert isinstance(str_len, int)
    return ("duck_bar", str_len,)


class MainTraits(eHasTraits):
    """Proof of concept code implenting Enthought traits args"""
    arg01 = eUnicode()
    arg02 = eInt()
    def __init__(self, *args, **kwargs):
        super(MainTraits, self).__init__(*args, **kwargs)

    def run(self, arg01="__undefined__", arg02=0):
        self.arg01 = arg01
        self.arg02 = arg02
        self.str_len = len(self.arg01) + self.arg02
        return ("traits_bar", self.str_len)

class MainTraitlets(tHasTraitlets):
    """Proof of concept code implenting traitlets args"""
    arg01 = tUnicode()
    arg02 = tInteger()
    def __init__(self, *args, **kwargs):
        super(MainTraitlets, self).__init__(*args, **kwargs)

    def run(self, arg01="__undefined__", arg02=0):
        self.arg01 = arg01
        self.arg02 = arg02
        self.str_len = len(self.arg01) + self.arg02
        return ("traitlets_bar", self.str_len)

@beartype
def main_bear(arg01: str = "__undefined__", arg02: int = 0) -> tuple:
    """Proof of concept code implenting bear-typed args"""
    str_len = len(arg01) + arg02
    return ("bear_bar", str_len,)

if __name__ == "__main__":
    num_loops = 100000

    duck_result_getattr = timeit('main_duck_getattr("foo", 1)', setup="from __main__ import main_duck_getattr", number=num_loops)
    print("timeit duck getattr time:", round(duck_result_getattr, 4), "seconds")

    duck_result_assert = timeit('main_duck_assert("foo", 1)', setup="from __main__ import main_duck_assert", number=num_loops)
    print("timeit duck assert  time:", round(duck_result_assert, 4), "seconds")

    traits_result = timeit('mm.run("foo", 1)', setup="from __main__ import MainTraits;mm = MainTraits()", number=num_loops)
    print("timeit traits       time:", round(traits_result, 4), "seconds")

    traitlets_result = timeit('tt.run("foo", 1)', setup="from __main__ import MainTraitlets;tt = MainTraitlets()", number=num_loops)
    print("timeit traitlets    time:", round(traitlets_result, 4), "seconds")

    bear_result = timeit('main_bear("foo", 1)', setup="from __main__ import main_bear", number=num_loops)
    print("timeit bear         time:", round(bear_result, 4), "seconds")

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

3 reactions
leycec commented, Oct 9, 2021

I found that beartype was much slower than no bear-typing in my test…

Hah-hah! I love fielding questions like this, because overly scrupulous fixation on efficiency is my middle name(s).

Thankfully, according to the wizened sages of old and our own timeit timings, @beartype is still as blazing fast at call time as it always was. In general, @beartype adds anywhere from 1µsec (i.e., 10⁻⁶ seconds) in the worst case to 0.01µsec (i.e., 10⁻⁸ seconds) in the best case of call-time overhead to each decorated callable. This superficially seems reasonable – but is it?

Let’s delve deeper.

Formulaic Formulas: They’re Back in Fashion

First, let’s formalize how exactly we arrive at the call-time overheads above.

Given any pair of reasonably fair timings (which yours absolutely are) between an undecorated callable and its equivalent @beartype-decorated callable, let:

  • n be the number of times (i.e., loop iterations) each callable is repetitiously called.
  • γ be the total time in seconds of all calls to that undecorated callable.
  • λ be the total time in seconds of all calls to that @beartype-decorated callable.

Then the call-time overhead Δ(n, γ, λ) added by @beartype to each call is:

Δ(n, γ, λ) = λ/n - γ/n

Plugging in n = 100000, γ = 0.0435s, and λ = 0.0823s from your excellent timings, we see that @beartype on average adds call-time overhead of 0.388µsec to each decorated call: e.g.,

Δ(100000, 0.0435s, 0.0823s) = 0.0823s/100000 - 0.0435s/100000
Δ(100000, 0.0435s, 0.0823s) = 3.8800000000000003e-07s
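
For concreteness, here’s that arithmetic as a runnable Python sketch (the helper name per_call_overhead is purely illustrative; the totals are the ones quoted above):

def per_call_overhead(n, undecorated_s, decorated_s):
    """Average per-call overhead in seconds: Δ(n, γ, λ) = λ/n - γ/n."""
    return decorated_s / n - undecorated_s / n

# Duck-typed assert vs. @beartype totals quoted above.
print(per_call_overhead(100000, 0.0435, 0.0823))  # ≈ 3.88e-07 seconds, i.e. ≈ 0.388µsec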

Again, this superficially seems reasonable – but is it? Let’s delve deeper.

Function Call Overhead: The New Glass Ceiling

Next, the added cost of calling @beartype-decorated callables is a residual artifact of the added cost of stack frames (i.e., function and method calls) in Python. The mere act of calling any pure-Python callable adds a measurable overhead – even if the body of that callable is just a noop doing absolutely nothing. This is the minimal cost of Python function calls.

Since Python decorators almost always add at least one additional stack frame (typically as a closure call) to the call stack of each decorated call, this measurable overhead is the minimal cost of doing business with Python decorators. Even the fastest possible Python decorator necessarily pays that cost.
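
To see that baseline directly, here’s a minimal sketch measuring the cost of calling a do-nothing function (noop is a hypothetical name; absolute numbers vary by machine and interpreter):

from timeit import timeit

def noop():
    pass  # Deliberately empty: anything we measure is pure call overhead.

num_loops = 100000
call_cost = timeit('noop()', setup='from __main__ import noop', number=num_loops)
no_call_cost = timeit('None', number=num_loops)  # Baseline: evaluate a bare constant.
print('per-call cost:', (call_cost - no_call_cost) / num_loops, 'seconds')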

Our quandary thus becomes: “Is 0.01–1µsec of call-time overhead reasonable, or is this sufficiently embarrassing as to bring multigenerational shame upon our entire extended family tree, including that second cousin twice-removed who never sends a kitsch greeting card featuring Santa playing with mischievous kittens at Christmas time?”

We can answer that by inspecting the theoretical maximum efficiency of a pure-Python decorator that performs minimal work: wrapping the decorated callable in a closure that simply defers to it. (This excludes the identity decorator – i.e., a decorator that merely returns the decorated callable unmodified – which performs no work whatsoever.) The fastest meaningful pure-Python decorator is thus:

def fastest_decorator(func):
    def fastest_wrapper(*args, **kwargs): return func(*args, **kwargs)
    return fastest_wrapper

By replacing @beartype with @fastest_decorator in your awesome snippet, we can expose the minimal cost of Python decoration:

$ python3.7 <<EOF
from timeit import timeit
def fastest_decorator(func):
    def fastest_wrapper(*args, **kwargs): return func(*args, **kwargs)
    return fastest_wrapper

@fastest_decorator
def main_decorated(arg01: str = "__undefined__", arg02: int = 0) -> tuple:
    """Proof of concept code implenting bear-typed args"""
    assert isinstance(arg01, str)
    assert isinstance(arg02, int)

    str_len = len(arg01) + arg02
    assert isinstance(str_len, int)
    return ("bear_bar", str_len,)

def main_undecorated(arg01="__undefined__", arg02=0):
    """Proof of concept code implenting duck-typed args"""
    assert isinstance(arg01, str)
    assert isinstance(arg02, int)

    str_len = len(arg01) + arg02
    assert isinstance(str_len, int)
    return ("duck_bar", str_len,)

if __name__ == "__main__":
    num_loops = 100000

    decorated_result = timeit('main_decorated("foo", 1)', setup="from __main__ import main_decorated", number=num_loops)
    print("timeit decorated time:  ", round(decorated_result, 4), "seconds")    
                                   
    undecorated_result = timeit('main_undecorated("foo", 1)', setup="from __main__ import main_undecorated", number=num_loops)
    print("timeit undecorated time:", round(undecorated_result, 4), "seconds")    
EOF
timeit decorated time:   0.1185 seconds
timeit undecorated time: 0.0889 seconds

Again, plugging in n = 100000, γ = 0.0889s, and λ = 0.1185s from these new timings, we see that @fastest_decorator on average adds call-time overhead of 0.3µsec to each decorated call: e.g.,

Δ(100000, 0.0889s, 0.1185s) = 0.1185s/100000 - 0.0889s/100000
Δ(100000, 0.0889s, 0.1185s) = 2.959999999999998e-07s

Holy Balls of Flaming Dumpster Fires

Holy balls, people. I’m actually astounded myself.

Above, we saw that @beartype on average only adds call-time overhead of 0.388µsec to each decorated call. But 0.388µsec - 0.3µsec = 0.088µsec, so @beartype only adds 0.1µsec (generously rounding up) of additional call-time overhead above and beyond that necessarily added by the fastest possible Python decorator.
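
Spelled out with the unrounded deltas (a quick sanity check; the variable names are just for illustration and all values come from the timings above):

bear_delta = 0.0823 / 100000 - 0.0435 / 100000     # ≈ 3.88e-07s: @beartype vs. duck-typed assert
fastest_delta = 0.1185 / 100000 - 0.0889 / 100000  # ≈ 2.96e-07s: @fastest_decorator vs. undecorated
print(bear_delta - fastest_delta)                  # ≈ 9.2e-08s, i.e. roughly 0.1µsec per call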

Not only is @beartype within the same order of magnitude as the fastest possible Python decorator, it’s effectively indistinguishable from the fastest possible Python decorator on a per-call basis.

Of course, even a negligible time delta accumulated over 10,000 function calls becomes slightly less negligible. Still, it’s pretty clear that @beartype remains the fastest possible runtime type-checker for now and all eternity. Amen.

But, but… That’s Not Good Enough!

Yeah. None of us are pleased with the performance of the official CPython interpreter anymore, are we? CPython is that geriatric old man down the street that everyone puts up with because they’ve seen Pixar’s Up! and he means well and he didn’t really mean to beat your equally geriatric 20-year-old tomcat with a cane last week. Really, that cat had it comin’.

If @beartype still isn’t ludicrously speedy enough for you under CPython, we also officially support PyPy3 – where you’re likely to extract even more ludicrous speed.

Does that fully satisfy your thirsty cravings for break-neck performance? If so, feel free to toggle that Close button. If not, I’d be happy to hash it out over a casual presentation of further timings, fake math, and Unicode abuse.

tl;dr

@beartype (and every other runtime type checker) will always be negligibly slower than hard-coded inlined runtime type-checking, thanks to the negligible (but surprisingly high) cost of Python function calls. Where this is unacceptable, PyPy3 is your code’s new BFFL.

0 reactions
leycec commented, Oct 13, 2021

You’ve gone above and beyond the bear call of duty. Now, I can only inundate you with my prophetic memes.

[meme images: “bad idea is bad”, “relevant meme hurts me”, “yah, you said something”]
