Performance Optimizations

  • Faker version: Master branch

  • OS: Mac OS

I am a Performance Engineer at Salesforce.org and I use Faker within my tool Snowfakery to generate hundreds of millions of rows of data. Some teams within my company have avoided Faker because they say it is too slow.

I’m a huge fan of Faker, but it is true that despite the complexity of Snowfakery overall (including writing to a SQL database), Faker is often the bottleneck. Luckily, it seems easy to fix. I investigated and found some very quick wins in terms of performance.

If the person creating the Faker() object could have more influence on the random_element method, they could get a gigantic speedup. I understand that some people in some cases need the sophisticated distribution features of the underlying Faker library, but my users do not, so I wish I could turn them off.

Steps to reproduce

import random
import timeit
from collections import OrderedDict
from unittest.mock import patch

import faker


def bench():
    # Time 100,000 first_name() calls on a fresh Faker instance.
    f = faker.Faker()
    print(timeit.timeit(lambda: f.first_name(), number=100000))


def fast_random_element(self, choices):
    # Ignore the weights and pick uniformly from the keys.
    if isinstance(choices, OrderedDict):
        return random.choice(tuple(choices.keys()))
    return random.choice(choices)


def fast_random_element_2(self, choices):
    # Same as above, but build the key tuple only once per OrderedDict.
    if isinstance(choices, OrderedDict):
        if not hasattr(choices, "_cached_choice_list"):
            choices._cached_choice_list = tuple(choices.keys())
        choices = choices._cached_choice_list
    return random.choice(choices)


def all_bench():
    print("Warmup")
    bench()
    print("Normal")
    bench()
    with patch("faker.providers.BaseProvider.random_element", fast_random_element):
        print("Simple optimization - No caching")
        bench()
    with patch("faker.providers.BaseProvider.random_element", fast_random_element_2):
        print("With caching")
        bench()
    print("Baseline again")
    bench()


all_bench()

Expected behaviour

Faker should be roughly as fast for simple element choices as Python itself.
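For reference, "as fast as Python itself" could be measured with a baseline like the sketch below. It times a plain random.choice over a small tuple with the same call count as the benchmark above. The NAMES tuple and the pick helper are made up for illustration; Faker's real name lists are much longer, so treat the result as a rough lower bound only.

```python
import random
import timeit

# Hypothetical sample data, not taken from Faker.
NAMES = ("Alice", "Bob", "Carol", "Dave", "Eve")


def pick():
    # A uniform choice with no weighting and no provider machinery.
    return random.choice(NAMES)


elapsed = timeit.timeit(pick, number=100_000)
print(f"{elapsed:.3f} s for 100,000 random.choice calls")
```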

Actual behavior

These times are in seconds:

Faker
6.72312176
Simpler optimization
1.888246217999999
With caching
0.3221195019999996
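
To put the timings above in relative terms, the short calculation below derives the per-call cost and the speedup over unpatched Faker. It uses only the three numbers reported above and the 100,000 calls from the benchmark; it works out to roughly 3.6x for the simple patch and roughly 21x with caching.

```python
# Timings in seconds, copied from the results above (100,000 calls each).
CALLS = 100_000
timings = {
    "Faker": 6.72312176,
    "Simpler optimization": 1.888246217999999,
    "With caching": 0.3221195019999996,
}

baseline = timings["Faker"]
# Speedup relative to unpatched Faker, and cost per call in microseconds.
speedups = {label: baseline / seconds for label, seconds in timings.items()}
per_call_us = {label: seconds / CALLS * 1e6 for label, seconds in timings.items()}

for label in timings:
    print(f"{label}: {per_call_us[label]:.1f} us/call, {speedups[label]:.1f}x")
```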

I am willing to submit a PR for this but I might need some guidance about where you would want to store the “weighted or fast” flag. Perhaps pass it from the Faker() constructor to the Provider objects?
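
One possible shape for such a flag is sketched below, purely as an illustration. SketchProvider and its use_weighting parameter are hypothetical names, not part of Faker's API, and the class stands alone instead of subclassing anything in Faker. With the flag off it takes the cached uniform path from the benchmark; with it on it uses the standard library's random.choices to honour the weights.

```python
import random
from collections import OrderedDict


class SketchProvider:
    """Hypothetical stand-in for a provider; not Faker's actual class."""

    def __init__(self, use_weighting=True, seed=None):
        # use_weighting is an assumed flag name, passed in by the caller.
        self.use_weighting = use_weighting
        self._rng = random.Random(seed)

    def random_element(self, choices):
        if not isinstance(choices, OrderedDict):
            return self._rng.choice(choices)
        if self.use_weighting:
            # Weighted path: treat the dict values as relative weights.
            keys = tuple(choices.keys())
            weights = tuple(choices.values())
            return self._rng.choices(keys, weights=weights, k=1)[0]
        # Fast path: cache the key tuple on the OrderedDict itself.
        cached = getattr(choices, "_cached_choice_list", None)
        if cached is None:
            cached = tuple(choices.keys())
            choices._cached_choice_list = cached
        return self._rng.choice(cached)


names = OrderedDict([("Ana", 0.7), ("Bo", 0.2), ("Cy", 0.1)])
fast = SketchProvider(use_weighting=False, seed=1)
weighted = SketchProvider(use_weighting=True, seed=1)
print(fast.random_element(names), weighted.random_element(names))
```

Note that the cached tuple goes stale if the OrderedDict is modified after the first call, so this sketch assumes the choice tables are effectively read-only.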

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 11 (8 by maintainers)

Top GitHub Comments

1 reaction
fcurella commented, Nov 30, 2020

I’m thinking of a case where there’s no need for a realistic distribution, but one would still like to avoid rare entities occurring often in the data set.

@prescod That sounds too magical. I’d rather keep it simple and shift the responsibility to the user.

1 reaction
prescod commented, Nov 27, 2020

I just ran “pytest” without tox.
