Does apache/avro perform better on PyPy?
I performed some tests to compare the performance of fastavro and apache/avro on PyPy and CPython. Below is a summary of the results. I hope the contributors can confirm whether this aligns with their expectations.
AVRO SCHEMA
BIG_SCHEMA_OLD = {
    "type": "record",
    "name": "test_record",
    "fields": [
        {
            "type": ["null", "int"],
            "name": "union_field"
        },
        {
            "type": ["null", "int"],
            "name": "union_field_null",
            "default": None
        },
        {
            "type": ["null", "int"],
            "name": "union_field_101",
            "default": 101
        },
        {
            "type": "boolean",
            "name": "bool_field"
        },
        {
            "type": "boolean",
            "name": "bool_field_F",
            "default": False
        },
        {
            "type": "string",
            "name": "string_field"
        },
        {
            "type": "string",
            "name": "string_field_foo",
            "default": "foo❤"
        },
        {
            "type": "bytes",
            "name": "bytes_field"
        },
        {
            "type": "bytes",
            "name": "bytes_field_bar",
            "default": "bar"
        },
        {
            "type": "int",
            "name": "int_field"
        },
        {
            "type": "int",
            "name": "int_field_1",
            "default": 1
        },
        {
            "type": "long",
            "name": "long_field"
        },
        {
            "type": "long",
            "name": "long_field_42",
            "default": 42
        },
        {
            "type": "float",
            "name": "float_field"
        },
        {
            "type": "float",
            "name": "float_field_p75",
            "default": 0.75
        },
        {
            "type": "double",
            "name": "double_field"
        },
        {
            "type": "double",
            "name": "double_field_pi",
            "default": 3.14
        }
    ]
}
Number of repetitions: 100000

Reader/writer types used:
- I am using schemaless_reader and schemaless_writer for these tests.
- Providing a reader_schema to schemaless_reader results in significantly poorer performance.
Benchmark on PyPy
- avro_reader and avro_writer correspond to apache/avro.
- fastavro_reader and fastavro_writer correspond to fastavro's schemaless_reader and schemaless_writer.
- The unit is seconds for the total number of repetitions.
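The benchmark code behind these numbers is not included in the issue. A minimal sketch of the measurement described above, assuming plain wall-clock timing: call each function a fixed number of times and report the total in seconds. The names time_total and run_suite and the stand-in workloads are hypothetical; in the issue, the timed calls were the apache/avro and fastavro reader/writer functions.

```python
import time
from typing import Callable, Dict

REPETITIONS = 100_000  # matches the repetition count reported above


def time_total(func: Callable[[], object], repetitions: int = REPETITIONS) -> float:
    """Return the wall-clock seconds spent calling func() `repetitions` times."""
    start = time.perf_counter()
    for _ in range(repetitions):
        func()
    return time.perf_counter() - start


def run_suite(
    cases: Dict[str, Callable[[], object]], repetitions: int = REPETITIONS
) -> Dict[str, float]:
    """Time every named case and return {name: total_seconds}."""
    return {name: time_total(func, repetitions) for name, func in cases.items()}


if __name__ == "__main__":
    # Hypothetical stand-ins; in the issue these were the apache/avro and
    # fastavro (schemaless_reader / schemaless_writer) calls.
    cases = {
        "noop": lambda: None,
        "sum_small_list": lambda: sum([1, 2, 3]),
    }
    print(run_suite(cases, repetitions=1_000))
```

Because the totals cover all repetitions, dividing by the repetition count gives an approximate per-call cost.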
With reader_schema in schemaless_reader:
{
"avro_reader": 1.5274810791015625,
"avro_writer": 1.061816930770874,
"fastavro_reader": 8.852604866027832,
"fastavro_writer": 2.948662042617798
}
Without reader_schema in schemaless_reader:
{
"avro_reader": 1.5450429916381836,
"avro_writer": 1.0277588367462158,
"fastavro_reader": 1.7646219730377197,
"fastavro_writer": 2.8703200817108154
}
Benchmark on py27 (CPython 2.7)
With reader_schema in schemaless_reader:
{
"avro_reader": 15.267684936523438,
"avro_writer": 15.52902102470398,
"fastavro_reader": 13.891213178634644,
"fastavro_writer": 5.328428030014038
}
Without reader_schema in schemaless_reader:
{
"avro_reader": 15.217741966247559,
"avro_writer": 15.30265998840332,
"fastavro_reader": 4.296072006225586,
"fastavro_writer": 5.25685715675354
}
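To make the comparison explicit, the short script below derives ratios directly from the totals reported above; no new measurements are involved. The function names reader_schema_penalty and fastavro_vs_avro are hypothetical helpers introduced only for this calculation.

```python
# Total seconds for 100000 repetitions, copied from the results reported above.
RESULTS = {
    "pypy": {
        "with_reader_schema": {
            "avro_reader": 1.5274810791015625,
            "avro_writer": 1.061816930770874,
            "fastavro_reader": 8.852604866027832,
            "fastavro_writer": 2.948662042617798,
        },
        "without_reader_schema": {
            "avro_reader": 1.5450429916381836,
            "avro_writer": 1.0277588367462158,
            "fastavro_reader": 1.7646219730377197,
            "fastavro_writer": 2.8703200817108154,
        },
    },
    "py27": {
        "with_reader_schema": {
            "avro_reader": 15.267684936523438,
            "avro_writer": 15.52902102470398,
            "fastavro_reader": 13.891213178634644,
            "fastavro_writer": 5.328428030014038,
        },
        "without_reader_schema": {
            "avro_reader": 15.217741966247559,
            "avro_writer": 15.30265998840332,
            "fastavro_reader": 4.296072006225586,
            "fastavro_writer": 5.25685715675354,
        },
    },
}


def reader_schema_penalty(interpreter: str) -> float:
    """Factor by which fastavro_reader slows down when a reader_schema is passed."""
    runs = RESULTS[interpreter]
    return (runs["with_reader_schema"]["fastavro_reader"]
            / runs["without_reader_schema"]["fastavro_reader"])


def fastavro_vs_avro(interpreter: str, op: str) -> float:
    """fastavro time / apache/avro time without reader_schema (> 1 means fastavro is slower)."""
    runs = RESULTS[interpreter]["without_reader_schema"]
    return runs["fastavro_" + op] / runs["avro_" + op]


for interpreter in ("pypy", "py27"):
    print(
        interpreter,
        "reader_schema penalty:", round(reader_schema_penalty(interpreter), 2),
        "reader ratio:", round(fastavro_vs_avro(interpreter, "reader"), 2),
        "writer ratio:", round(fastavro_vs_avro(interpreter, "writer"), 2),
    )
```

On the reported numbers, passing reader_schema makes fastavro's reader about 5x slower on PyPy and about 3.2x slower on py27. Without it, fastavro is still slower than apache/avro on PyPy (about 1.14x for reading and 2.79x for writing), while on py27 it takes roughly 0.28x (reading) and 0.34x (writing) of apache/avro's time.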
Two things stood out to me:
- Using reader_schema causes a significant performance drop. Can we do something about that, or is it a penalty we have to pay?
- I would expect fastavro to perform better than apache/avro on PyPy.

Happy to hear from others. We should definitely add some regression tests to fastavro to catch performance issues between releases.
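The issue does not say how such regression tests would work. A minimal sketch, assuming baseline timings from a previous release are stored as a {name: seconds} dict: flag any benchmark that got slower by more than a tolerance. The function name find_regressions and the 10% default tolerance are hypothetical. The demonstration reuses the PyPy numbers above purely as sample data.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.10,
) -> List[str]:
    """Return the names of benchmarks whose current time exceeds the
    baseline time by more than `tolerance` (a fraction, 0.10 = 10%)."""
    regressed = []
    for name, base_seconds in baseline.items():
        now_seconds = current.get(name)
        # Benchmarks missing from the current run are skipped, not flagged.
        if now_seconds is not None and now_seconds > base_seconds * (1 + tolerance):
            regressed.append(name)
    return sorted(regressed)


# Demonstration only: treat the PyPy run without reader_schema as the
# "baseline" and the run with reader_schema as the "current" release.
baseline = {
    "avro_reader": 1.5450429916381836,
    "avro_writer": 1.0277588367462158,
    "fastavro_reader": 1.7646219730377197,
    "fastavro_writer": 2.8703200817108154,
}
current = {
    "avro_reader": 1.5274810791015625,
    "avro_writer": 1.061816930770874,
    "fastavro_reader": 8.852604866027832,
    "fastavro_writer": 2.948662042617798,
}
print(find_regressions(baseline, current))  # ['fastavro_reader']
```

Wall-clock timings are noisy, so in practice the tolerance has to be loose enough to absorb normal run-to-run variation.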
Issue created 5 years ago; 11 comments (2 by maintainers).
Thanks for all of the results. I don’t normally run with PyPy, so I’m afraid I can’t really say why the pypy2 performance is worse than the standard avro library (and why this flips on pypy3). Patches to improve PyPy performance would be welcome, but I’m probably not going to be able to contribute them.

Good point. I updated the benchmark code so that the writer/reader for apache/avro are now part of the benchmark. Surprisingly, this didn’t change the benchmarking results. I also fixed the number of rounds and iterations for pytest-benchmark, which now gives more reproducible runs.
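The updated benchmark code is not shown in the thread. In pytest-benchmark, fixing rounds and iterations is typically done with benchmark.pedantic(target, rounds=..., iterations=...). The sketch below is a stdlib-only analogue built on timeit.repeat so it runs without pytest; the function name fixed_rounds, its default counts, and the workload are assumptions.

```python
import statistics
import timeit
from typing import Callable, Dict


def fixed_rounds(
    func: Callable[[], object], rounds: int = 5, iterations: int = 1000
) -> Dict[str, float]:
    """Run `rounds` rounds of `iterations` calls each and summarize
    the per-call time in seconds across rounds."""
    # timeit.repeat accepts a zero-argument callable as the statement and
    # returns one total time per round.
    totals = timeit.repeat(func, repeat=rounds, number=iterations)
    per_call = [total / iterations for total in totals]
    return {
        "min": min(per_call),
        "median": statistics.median(per_call),
        "max": max(per_call),
    }


if __name__ == "__main__":
    # Hypothetical workload standing in for an Avro encode/decode call.
    print(fixed_rounds(lambda: sorted(range(100)), rounds=5, iterations=1000))
```

Pinning both counts means every run does the same amount of work, and comparing the minimum or median across rounds is less sensitive to one-off interference than a single total.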