Hello!

I was benchmarking fastjsonschema against other Python JSON Schema implementations, and at some point fastjsonschema used so much memory (~30 GB) that I killed the process.

Schema: official Swagger 2.0 JSON Schema
Instance: Kubernetes API definition from APIs.guru
Interpreter: CPython 3.9.1, with platform.platform() reporting Linux-5.10.11-arch1-1-x86_64-with-glibc2.32

Other implementations (jsonschema and jsonschema-rs) didn’t show high memory usage on the same input.

Dependencies:

pip install fastjsonschema==2.14.5 requests==2.25.1 PyYAML==5.4.1

Code to reproduce:

import fastjsonschema
import requests
import yaml
import time

try:
    from yaml import CSafeLoader as Loader
except ImportError:
    from yaml import SafeLoader as Loader

SCHEMA_URL = "https://raw.githubusercontent.com/OAI/OpenAPI-Specification/master/schemas/v2.0/schema.json"
INSTANCE_URL = "https://raw.githubusercontent.com/APIs-guru/openapi-directory/master/APIs/kubernetes.io/v1.10.0/swagger.yaml"
ITERATIONS_NUMBER = 10

SCHEMA = requests.get(SCHEMA_URL).json()
INSTANCE = yaml.load(requests.get(INSTANCE_URL).content, Loader=Loader)

validate = fastjsonschema.compile(SCHEMA)

for _ in range(ITERATIONS_NUMBER):
    start = time.time()
    validate(INSTANCE)
    print(f"Iteration time: {time.time() - start}")

Output:

Iteration time: 0.1041560173034668
Iteration time: 0.12305545806884766
Iteration time: 0.16190171241760254
Iteration time: 0.2381150722503662
Iteration time: 0.438570499420166
Iteration time: 0.8084230422973633
Iteration time: 1.4931752681732178
Iteration time: 2.8855996131896973
Iteration time: 5.7769176959991455
Iteration time: 11.246583461761475
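The timings keep climbing even though the same instance is validated every time, which suggests the instance itself may be changing between calls. A minimal sketch for checking that, assuming the validator is a plain callable: take a deep copy before the call and compare afterwards. `mutates_input`, `fills_defaults` and `read_only` are hypothetical helpers for illustration, not part of fastjsonschema.

```python
import copy
from typing import Any, Callable


def mutates_input(validate: Callable[[Any], Any], instance: Any) -> bool:
    """Return True if validate(instance) changed `instance` in place."""
    snapshot = copy.deepcopy(instance)  # independent copy taken before validation runs
    validate(instance)
    return instance != snapshot


# Hypothetical stand-ins (not fastjsonschema code) used to exercise the helper.
def fills_defaults(instance: dict) -> dict:
    instance.setdefault("extra", [])  # mimics a validator that inserts a default value
    return instance


def read_only(instance: dict) -> dict:
    return instance  # leaves the instance untouched


if __name__ == "__main__":
    print(mutates_input(fills_defaults, {}))
    print(mutates_input(read_only, {"swagger": "2.0"}))
```

In the reproduction above, `print(mutates_input(validate, INSTANCE))` would report whether a single call to the compiled validator modified `INSTANCE`.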

Here is the memory usage, plotted with memory-profiler==0.58.0:

[Figure: fastjsonschema memory usage]
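memory-profiler reports memory for the whole process. A stdlib-only cross-check is `tracemalloc`, which traces allocations made through Python's allocator, so a reading taken after each call shows whether memory keeps growing with the iteration count. This is a sketch under that assumption; `memory_after_each_call` and `doubling_validator` are hypothetical names, and tracing slows the validator down noticeably, so it is not suitable for timing.

```python
import tracemalloc
from typing import Any, Callable, List


def memory_after_each_call(validate: Callable[[Any], Any], instance: Any, iterations: int) -> List[int]:
    """Return the bytes currently traced by tracemalloc after each validate() call."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    try:
        readings = []
        for _ in range(iterations):
            validate(instance)
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
        return readings
    finally:
        if started_here:  # only stop tracing if this function started it
            tracemalloc.stop()


# Hypothetical validator that doubles a list stored in the instance on every call.
def doubling_validator(instance: dict) -> dict:
    items = instance.setdefault("items", [0])
    items.extend(list(items))
    return instance


if __name__ == "__main__":
    for i, size in enumerate(memory_after_each_call(doubling_validator, {}, 12), start=1):
        print(f"after call {i}: {size} bytes")
```

Against the reproduction, `memory_after_each_call(validate, INSTANCE, ITERATIONS_NUMBER)` would be the corresponding call; a steadily rising list points at memory retained between calls rather than a one-off spike.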

For comparison, here is the memory usage of jsonschema==3.2.0.

The code is almost the same (the validator is likewise created once, before the loop):

import jsonschema
import requests
import yaml

try:
    from yaml import CSafeLoader as Loader
except ImportError:
    from yaml import SafeLoader as Loader

SCHEMA_URL = "https://raw.githubusercontent.com/OAI/OpenAPI-Specification/master/schemas/v2.0/schema.json"
INSTANCE_URL = "https://raw.githubusercontent.com/APIs-guru/openapi-directory/master/APIs/kubernetes.io/v1.10.0/swagger.yaml"
ITERATIONS_NUMBER = 10

SCHEMA = requests.get(SCHEMA_URL).json()
INSTANCE = yaml.load(requests.get(INSTANCE_URL).content, Loader=Loader)

validator = jsonschema.validators.validator_for(SCHEMA)(SCHEMA)

for _ in range(ITERATIONS_NUMBER):
    validator.validate(INSTANCE)

[Figure: jsonschema memory usage]

Unfortunately, I didn’t track down what exactly causes the leak. My assumption is that repeatedly validating the same instance should not keep increasing memory usage.
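If validation does modify its input, each iteration of the benchmark runs against a possibly different instance. A hedged sketch of a timing loop that keeps iterations independent by validating a fresh deep copy each time; `time_on_fresh_copies` is a hypothetical helper, and it uses `time.perf_counter`, which is intended for measuring short intervals, instead of `time.time`.

```python
import copy
import time
from typing import Any, Callable, List


def time_on_fresh_copies(validate: Callable[[Any], Any], instance: Any, iterations: int) -> List[float]:
    """Time validate() on an untouched deep copy of `instance` in every iteration."""
    timings = []
    for _ in range(iterations):
        fresh = copy.deepcopy(instance)  # copying happens outside the timed region
        start = time.perf_counter()
        validate(fresh)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    def fills_list(inst: dict) -> dict:
        # Hypothetical validator that appends to the instance on every call.
        inst.setdefault("x", []).append(1)
        return inst

    for t in time_on_fresh_copies(fills_list, {}, 3):
        print(f"Iteration time: {t}")
```

Deep-copying a document as large as the Kubernetes spec is itself slow, which is why the copy is made before the timer starts.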

Cheers

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

horejsek commented on Feb 1, 2021:

I just pushed a new version with a use_default option. It still defaults to True for compatibility, but you can set it to False.

horejsek commented on Feb 1, 2021:

For now, you could recursively walk the schema and remove all default keys from the third-party schema. I’ll make #65 a higher priority.
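The interim workaround described in that comment, recursively removing `default` keys from the schema, can be sketched as follows. It is slightly more careful than deleting every key named `default`, because a schema can contain properties or definitions that are literally named `default`, and those names must survive. This assumes draft-04-style keywords; `strip_defaults`, `NAME_MAPS` and `DATA_KEYWORDS` are hypothetical names, not fastjsonschema API.

```python
import copy
from typing import Any

# Keywords whose values map user-chosen names (property or definition names)
# to subschemas; a key called "default" inside them is a name, not the keyword.
NAME_MAPS = {"properties", "patternProperties", "definitions", "dependencies", "$defs"}
# Keywords whose values are plain instance data rather than subschemas.
DATA_KEYWORDS = {"enum", "const", "examples"}


def strip_defaults(schema: Any) -> Any:
    """Return a copy of `schema` with every "default" keyword removed."""
    if isinstance(schema, list):
        return [strip_defaults(item) for item in schema]
    if not isinstance(schema, dict):
        return schema  # strings, numbers, booleans, None
    result = {}
    for key, value in schema.items():
        if key == "default":
            continue  # drop the keyword whose value would be filled into the instance
        if key in DATA_KEYWORDS:
            result[key] = copy.deepcopy(value)  # data, never rewritten
        elif key in NAME_MAPS and isinstance(value, dict):
            # Keep every name (even "default"); strip inside each subschema.
            result[key] = {name: strip_defaults(sub) for name, sub in value.items()}
        else:
            result[key] = strip_defaults(value)
    return result
```

With the reproduction, this would be used as `validate = fastjsonschema.compile(strip_defaults(SCHEMA))`. It only rewrites the schema document passed in; subschemas that fastjsonschema resolves through remote `$ref`s (the Swagger schema points into the draft-04 meta-schema) are left as they are. On releases that include the `use_default` option, passing `use_default=False` to `fastjsonschema.compile` should be the simpler route.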
