Performance Optimizations
- Faker version: master branch
- OS: macOS
I am a Performance Engineer at Salesforce.org and I use Faker within my tool Snowfakery to generate hundreds of millions of rows of data. Some teams within my company have avoided Faker because they say it is too slow.
I’m a huge fan of Faker, but despite everything else Snowfakery does (including writing to a SQL database), Faker is often the bottleneck. Luckily, this seems easy to fix: I investigated and found some quick performance wins.
If the person creating the Faker() object had more influence on the random_element method, they could get a gigantic speedup. I understand that some users need the weighted-distribution behaviour that random_element provides, but my users do not, so I wish I could turn it off.
Steps to reproduce
from unittest.mock import patch
import random
from collections import OrderedDict
import faker
import timeit


def bench():
    # Time 100,000 first_name() calls on a fresh Faker instance.
    f = faker.Faker()
    print(timeit.timeit(lambda: f.first_name(), number=100000))


def fast_random_element(self, choices):
    # Simple optimization: ignore weights and pick uniformly.
    if isinstance(choices, OrderedDict):
        return random.choice(tuple(choices.keys()))
    else:
        return random.choice(choices)


def fast_random_element_2(self, choices):
    # Same as above, but cache the flattened key tuple on the OrderedDict
    # so it only has to be built once per choices collection.
    if isinstance(choices, OrderedDict):
        if not hasattr(choices, "_cached_choice_list"):
            setattr(choices, "_cached_choice_list", tuple(choices.keys()))
        choices = choices._cached_choice_list
    return random.choice(choices)


def all_bench():
    print("Warmup")
    bench()
    print("Normal")
    bench()
    with patch("faker.providers.BaseProvider.random_element", fast_random_element):
        print("Simple optimization - No caching")
        bench()
    with patch("faker.providers.BaseProvider.random_element", fast_random_element_2):
        print("With caching")
        bench()
    print("Baseline again")
    bench()


all_bench()
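Note that both patched runs replace BaseProvider.random_element for every provider, so the speedup applies to essentially any generated field that goes through random_element, not just first_name().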
Expected behaviour
Faker should be roughly as fast for simple element choices as calling random.choice in plain Python.
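For concreteness, "as fast as plain Python" means something like the following raw random.choice baseline over the same underlying name data. This is a minimal sketch that assumes the en_US person provider exposes its name data as a first_names attribute (true on recent versions, but treat it as an assumption):

import random
import timeit

from faker.providers.person.en_US import Provider as PersonProvider

# Raw-Python baseline for comparison with the Faker timings below.
# Iterating first_names yields the names whether it is a mapping of
# name -> weight or a plain sequence.
names = tuple(PersonProvider.first_names)
print(timeit.timeit(lambda: random.choice(names), number=100000))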
Actual behavior
These times are in seconds:
- Faker (baseline): 6.72312176
- Simple optimization (no caching): 1.888246217999999
- With caching: 0.3221195019999996
I am willing to submit a PR for this but I might need some guidance about where you would want to store the “weighted or fast” flag. Perhaps pass it from the Faker() constructor to the Provider objects?
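For illustration, here is a rough, hypothetical sketch of where such a flag could live. SketchProvider and use_weighting are made-up names, not existing Faker API, and the weighted branch approximates Faker's current behaviour with random.choices rather than its internal helpers:

import random
from collections import OrderedDict


class SketchProvider:
    """Hypothetical provider showing where a 'weighted or fast' flag could live."""

    def __init__(self, use_weighting=True):
        # The flag would be passed down from the Faker() constructor.
        self.use_weighting = use_weighting

    def random_element(self, elements=("a", "b", "c")):
        if isinstance(elements, OrderedDict):
            if self.use_weighting:
                # Weighted path, roughly what Faker does today.
                return random.choices(
                    tuple(elements.keys()),
                    weights=tuple(elements.values()),
                    k=1,
                )[0]
            # Fast path: the caller opted out of weighting, so drop the weights.
            elements = tuple(elements.keys())
        return random.choice(elements)

With something like this, a constructor argument such as Faker(use_weighting=False) could wire the fast path through to every provider while keeping weighted selection as the default.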
Top GitHub Comments
@prescod That sounds too magical. I’d rather keep it simple and shift the responsibility to the user.
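If the responsibility stays with the user, one way to take it today is the approach from the benchmark above, applied permanently at application startup rather than inside a context manager. A sketch, assuming the user is happy to drop weighting everywhere:

import random
from collections import OrderedDict

from faker.providers import BaseProvider


def unweighted_random_element(self, elements=("a", "b", "c")):
    # Ignore any weights and pick uniformly, as in fast_random_element above.
    if isinstance(elements, OrderedDict):
        elements = tuple(elements.keys())
    return random.choice(elements)


# Replace the method globally for all providers.
BaseProvider.random_element = unweighted_random_element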
I just ran “pytest” without tox.