Speed up trim_zeros

import numpy as np

# 100k leading zeros, 100k random values, 100k trailing zeros
a = np.hstack([
    np.zeros((100_000,)),
    np.random.uniform(size=(100_000,)),
    np.zeros((100_000,)),
])
np.trim_zeros(a)

Here the call to trim_zeros takes about 50ms.

Looking at the implementation of trim_zeros, it is implemented in the most obvious and unoptimized way imaginable (a for loop looking at each item separately).
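
For readers who have not opened the source, a per-element scan of the kind described above could look roughly like the sketch below. It is an illustration written for this page, not a copy of NumPy's code, and the name trim_zeros_loop is hypothetical.

```python
import numpy as np


def trim_zeros_loop(filt, trim="fb"):
    """Illustrative per-element scan; not NumPy's actual source."""
    trim = trim.upper()
    first = 0
    if "F" in trim:
        # Walk forward one element at a time until a non-zero value appears
        for value in filt:
            if value != 0.0:
                break
            first += 1
    last = len(filt)
    if "B" in trim:
        # Walk backward one element at a time until a non-zero value appears
        for value in filt[::-1]:
            if value != 0.0:
                break
            last -= 1
    return filt[first:last]
```

Every comparison here runs in the Python interpreter, so the cost grows with the number of leading and trailing zeros rather than being handled in one vectorized pass.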

I think there should be a warning in the documentation about the fact that it’s entirely unoptimized and may be horrendously slow, or we should strive to improve performance.

As an implementation idea to improve performance, I prototyped a “block-wise” trim function to be used before trim_zeros:

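def fast_trim_zeros(filt, trim='fb'):
    # Cheaply drop whole blocks of zeros, then let np.trim_zeros finish the job
    filt = trim_zeros_block(filt, trim)
    return np.trim_zeros(filt, trim)


def trim_zeros_block(filt, trim='fb', block_size=1024):
    """Trim whole blocks of zeros from the ends of a 1-D NumPy array"""
    trim = trim.upper()
    first = 0
    if 'F' in trim:
        # Scan forward block by block; stop at the first block with a non-zero
        for i in range(0, len(filt), block_size):
            if np.any(filt[i:i+block_size] != 0.):
                first = i
                break
    last = len(filt)
    if 'B' in trim:
        # Start at len(filt), not len(filt)-1, so the final element is
        # inspected and is never cut off by the slice below
        for i in range(len(filt), 0, -block_size):
            if np.any(filt[max(i-block_size, 0):i] != 0.):
                last = i
                break
    return filt[first:last]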

A call to fast_trim_zeros takes about 2 ms, so it is roughly 25x faster.
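
Timings like these depend on the machine and the NumPy version, so it can help to re-measure locally. The following is a minimal sketch of one way to do that with the standard timeit module; the helper name compare and its parameters are assumptions made for this example, and it first checks that the candidate returns the same result as np.trim_zeros.

```python
import timeit

import numpy as np


def compare(candidate, reference=np.trim_zeros, n=100_000, repeat=5):
    """Return (reference_seconds, candidate_seconds), best of `repeat` runs."""
    rng = np.random.default_rng(0)
    # Same layout as the example in the post: zeros, random values, zeros
    a = np.hstack([np.zeros(n), rng.uniform(size=n), np.zeros(n)])

    # A faster function is only useful if it returns the same result
    assert np.array_equal(candidate(a), reference(a))

    t_ref = min(timeit.repeat(lambda: reference(a), number=1, repeat=repeat))
    t_new = min(timeit.repeat(lambda: candidate(a), number=1, repeat=repeat))
    return t_ref, t_new


# Example call, assuming fast_trim_zeros from the post is already defined:
# t_ref, t_new = compare(fast_trim_zeros)
# print(f"np.trim_zeros: {t_ref * 1e3:.2f} ms, candidate: {t_new * 1e3:.2f} ms")
```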

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (8 by maintainers)

Top GitHub Comments

BvB93 commented, Jul 9, 2020 (2 reactions)

As an implementation idea to improve performance, I prototyped a “block-wise” trim function to be used before trim_zeros

How about converting the passed object into a boolean array and then using np.argmax() to find the first/last non-zero element? With your previously defined example array I’m seeing an increase in execution speed of ~2 orders of magnitude (398 µs versus 37 ms).

import numpy as np

def trim_zeros(filt, trim='fb'):
    # Non-zero elements become True, zeros become False
    a = np.asanyarray(filt, dtype=bool)
    if a.ndim != 1:
        raise ValueError('trim_zeros requires an array of exactly one dimension')

    trim_upper = trim.upper()
    len_a = len(a)
    i = j = None

    if len_a == 0:  # argmax() raises on an empty array
        return filt[len_a:]

    if 'F' in trim_upper:
        i = a.argmax()  # index of the first non-zero element
        if not a[i]:  # i.e. all elements of `filt` evaluate to `False`
            return filt[len_a:]

    if 'B' in trim_upper:
        j = len_a - a[::-1].argmax()  # one past the last non-zero element
        if not a[j - 1]:  # i.e. all elements of `filt` evaluate to `False`
            return filt[len_a:]

    return filt[i:j]
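
The index arithmetic in this approach is easier to follow on a tiny array. The snippet below is a small illustration (the sample values are made up): argmax() on a boolean array returns the position of the first True, and applying it to the reversed array gives the distance of the last True from the end.

```python
import numpy as np

a = np.array([0.0, 0.0, 1.5, 0.0, 2.0, 0.0])
mask = a.astype(bool)  # [False, False, True, False, True, False]

first = mask.argmax()                   # index of the first True
last = len(mask) - mask[::-1].argmax()  # one past the index of the last True

# Interior zeros are kept; only the leading and trailing ones are dropped
assert a[first:last].tolist() == [1.5, 0.0, 2.0]

# Caveat: for an all-False array argmax() still returns 0,
# so the element it points at has to be checked explicitly
all_zero = np.zeros(4, dtype=bool)
assert all_zero.argmax() == 0 and not all_zero[0]
```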

BvB93 commented, Jul 11, 2020 (1 reaction)

Shall I create a pull request with the implementation as proposed above?
