How to force install the C version? Even slower than the avro package on Linux CentOS 7.
I installed with pip install fastavro==1.0.0.post1 on both Mac and Linux. I have a simple test that writes the same dict 1000 times and compares fastavro with avro and json.
import io
import json
import time

import avro.io
import avro.schema
import fastavro

# ContextStringIO (a confluent-kafka helper) is just a BytesIO that works
# as a context manager; io.BytesIO serves the same purpose here.
ContextStringIO = io.BytesIO

# Placeholder schema and record for illustration; the originals are not shown.
schema_dict = {"type": "record", "name": "Row", "fields": [{"name": "id", "type": "long"}]}
row = {"id": 1}
fastavro_schema = fastavro.parse_schema(schema_dict)
schema = avro.schema.Parse(json.dumps(schema_dict))  # avro.schema.parse in newer releases

start = time.time()
n = 1000
for _ in range(n):
    with ContextStringIO() as f:
        fastavro.schemaless_writer(f, fastavro_schema, row)
end = time.time()
print('fastavro', 1000 * (end - start))

start = time.time()
for _ in range(n):
    writer = avro.io.DatumWriter(schema)
    with ContextStringIO() as f:
        writer.write(row, avro.io.BinaryEncoder(f))
end = time.time()
print('avro', 1000 * (end - start))

start = time.time()
for _ in range(n):
    with ContextStringIO() as f:
        f.write(json.dumps(row).encode())
end = time.time()
print('json', 1000 * (end - start))
Results on Mac:
fastavro 45.388221740722656
avro 327.98099517822266
json 12.218952178955078
fastavro is about 7 times faster than avro, which is very good.
On Linux:
fastavro 479.9647331237793
avro 462.5232219696045
json 13.955354690551758
fastavro is even slower than avro, and about 10 times slower than on the Mac (while avro is only about 1.5 times slower).
I inspected the source of schemaless_writer from IPython.
Mac (uses the C version):
In [21]: from fastavro import schemaless_writer
In [22]: schemaless_writer??
Docstring: <no docstring>
Type: builtin_function_or_method
Linux (uses the pure Python version):
In [1]: from fastavro import schemaless_writer
In [2]: schemaless_writer??
Signature: schemaless_writer(fo, schema, record)
Source:
def schemaless_writer(fo, schema, record):
    """Write a single record without the schema or header information

    Parameters
    ----------
    fo: file-like
        Output file
    schema: dict
        Schema
    record: dict
        Record to write

    Example::

        parsed_schema = fastavro.parse_schema(schema)
        with open('file.avro', 'rb') as fp:
            fastavro.schemaless_writer(fp, parsed_schema, record)

    Note: The ``schemaless_writer`` can only write a single record.
    """
    named_schemas = {}
    schema = parse_schema(schema, _named_schemas=named_schemas)
    encoder = BinaryEncoder(fo)
    write_data(encoder, record, schema, named_schemas, "")
    encoder.flush()
File: ~/.pyenv/versions/weibo/lib/python3.6/site-packages/fastavro/_write_py.py
Type: function
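The same check can be scripted without IPython. This is a minimal sketch, assuming fastavro keeps its split between a compiled module and the pure-Python fallback in fastavro._write_py (the module the File: path above points at); the exact module names are taken from that layout and may differ between releases:

import inspect
import fastavro

impl = fastavro.schemaless_writer
# The compiled (Cython) build shows up as a builtin function, while the
# pure-Python fallback is an ordinary function from fastavro._write_py.
print(impl.__module__)          # e.g. 'fastavro._write' vs 'fastavro._write_py'
print(inspect.isbuiltin(impl))  # True when the C version is active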
I also attempted to install via FASTAVRO_USE_CYTHON=1 pip install fastavro, but that didn't work either.
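If I read fastavro's build setup correctly, FASTAVRO_USE_CYTHON is only consulted by setup.py at build time, so it has no effect when pip reuses a prebuilt wheel or a cached install. Forcing a source build would look something like this (a sketch, untested on CentOS 7):
FASTAVRO_USE_CYTHON=1 pip install --no-binary fastavro --force-reinstall fastavro==1.0.0.post1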
I don't use pyenv so I'm not exactly sure why that file is missing, but looking at https://github.com/pyenv/pyenv/issues/917 and https://github.com/pytorch/pytorch/issues/997 it seems like you might need to pass --enable-shared when you build the environment.
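With pyenv, configure flags are normally passed through PYTHON_CONFIGURE_OPTS when the interpreter is built, so the rebuild would look something like this (a sketch; the version number is illustrative):
env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.6.1
After that, reinstalling fastavro into the rebuilt environment should let the compiled extension import instead of silently falling back to pure Python.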
Yeah, I reinstalled the Python environment, and it worked. Thanks for helping me address the problem!