How to force install the C version? Even slower than the avro package on Linux CentOS 7.

I installed fastavro with pip install fastavro==1.0.0.post1 on both Mac and Linux. I have a simple test that writes the same dict 1000 times and compares fastavro with avro and json.

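import json
import time

import avro.io
import fastavro

# ContextStringIO, fastavro_schema, schema and row are defined elsewhere in the test.

# fastavro: write the same record n times with schemaless_writer
start = time.time()
n = 1000
for _ in range(n):
    with ContextStringIO() as f:
        fastavro.schemaless_writer(f, fastavro_schema, row)
end = time.time()
print('fastavro', 1000 * (end - start))  # elapsed milliseconds

# avro: write the same record n times with DatumWriter
start = time.time()
for _ in range(n):
    writer = avro.io.DatumWriter(schema)
    with ContextStringIO() as f:
        writer.write(row, avro.io.BinaryEncoder(f))
end = time.time()
print('avro', 1000 * (end - start))

# json: baseline using json.dumps
start = time.time()
for _ in range(n):
    with ContextStringIO() as f:
        f.write(json.dumps(row).encode())
end = time.time()
print('json', 1000 * (end - start))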
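
For anyone reproducing the comparison, the three timing loops can be folded into one helper. This is only a sketch, not the code from the issue: `time_ms`, `write_json` and the sample `row` are hypothetical names, and `io.BytesIO` stands in for the author's `ContextStringIO`, whose definition is not shown. It uses `time.perf_counter()`, which is generally better suited to short interval measurements than `time.time()`.

```python
import io
import json
import time


def time_ms(fn, n=1000):
    """Call fn() n times and return the total elapsed time in milliseconds."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return 1000 * (time.perf_counter() - start)


def write_json(row):
    """Serialize row as JSON into a fresh in-memory buffer (the json baseline)."""
    with io.BytesIO() as f:
        f.write(json.dumps(row).encode())


# Hypothetical sample record; the row used in the issue is not shown.
row = {"id": 1, "name": "example"}
print("json", time_ms(lambda: write_json(row)))
```

The fastavro and avro writers could be timed the same way by passing a function that wraps each write call.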

Result on Mac:

fastavro 45.388221740722656
avro 327.98099517822266
json 12.218952178955078

fastavro is about 7 times faster than avro, which is very good.

Result on Linux:

fastavro 479.9647331237793
avro 462.5232219696045
json 13.955354690551758

Here fastavro is even slower than avro, and about 10 times slower than on Mac (while avro is only about 1.5 times slower).

I tried to view the source code of schemaless_writer from IPython.

Mac (uses the C version):

In [21]: from fastavro import schemaless_writer

In [22]: schemaless_writer??
Docstring: <no docstring>
Type:      builtin_function_or_method

Linux (uses the pure Python version):

In [1]: from fastavro import schemaless_writer

In [2]: schemaless_writer??
Signature: schemaless_writer(fo, schema, record)
Source:
def schemaless_writer(fo, schema, record):
    """Write a single record without the schema or header information

    Parameters
    ----------
    fo: file-like
        Output file
    schema: dict
        Schema
    record: dict
        Record to write


    Example::

        parsed_schema = fastavro.parse_schema(schema)
        with open('file.avro', 'rb') as fp:
            fastavro.schemaless_writer(fp, parsed_schema, record)

    Note: The ``schemaless_writer`` can only write a single record.
    """
    named_schemas = {}
    schema = parse_schema(schema, _named_schemas=named_schemas)

    encoder = BinaryEncoder(fo)
    write_data(encoder, record, schema, named_schemas, "")
    encoder.flush()
File:      ~/.pyenv/versions/weibo/lib/python3.6/site-packages/fastavro/_write_py.py
Type:      function
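
The same check can be scripted outside IPython. The sketch below is only an illustration: `describe_callable` and `report_fastavro` are hypothetical helper names, and it relies on the observation from the output above that the compiled function has type `builtin_function_or_method`, while the pure Python one is an ordinary `function`.

```python
import inspect
import types


def describe_callable(func):
    """Return (is_pure_python, module_name) for a callable.

    A plain Python function is an instance of types.FunctionType;
    functions from compiled extension modules are not.
    """
    is_pure_python = isinstance(func, types.FunctionType)
    module = inspect.getmodule(func)
    return is_pure_python, getattr(module, "__name__", None)


def report_fastavro():
    """Print which implementation of schemaless_writer was imported."""
    try:
        from fastavro import schemaless_writer
    except ImportError:
        print("fastavro is not installed")
        return
    pure, module_name = describe_callable(schemaless_writer)
    print("pure Python" if pure else "compiled", module_name)
```

Calling `report_fastavro()` in each environment should show the difference seen above without needing IPython.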

I also tried installing with FASTAVRO_USE_CYTHON=1 pip install fastavro, but it didn’t work.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10

Top GitHub Comments

1 reaction
DeanThompson commented, Sep 8, 2020

Yeah, I reinstalled the Python environment, and it worked. Thanks for helping me to address the problem!

0 reactions
scottbelden commented, Sep 8, 2020

I don’t use pyenv so I’m not exactly sure why that file is missing, but looking at https://github.com/pyenv/pyenv/issues/917 and https://github.com/pytorch/pytorch/issues/997 it seems like you might need to pass --enable-shared when you start the environment.
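
One way to check whether an interpreter was built with --enable-shared is to query its build configuration. This is a sketch under assumptions: it takes the missing file to be the shared libpython discussed in the linked pyenv issue, and `built_with_shared_libpython` is a hypothetical helper name. `Py_ENABLE_SHARED` and `LDLIBRARY` are build variables that `sysconfig` exposes on Unix-like platforms, and they may be unset elsewhere.

```python
import sysconfig


def built_with_shared_libpython():
    """Return True if this interpreter reports a build with --enable-shared.

    Py_ENABLE_SHARED is typically 1 for shared builds and 0 otherwise;
    it may be None on platforms that do not define it.
    """
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


print("shared libpython:", built_with_shared_libpython())
# LDLIBRARY names the Python library this build links against, if defined.
print("library name:", sysconfig.get_config_var("LDLIBRARY"))
```

If this prints False inside the pyenv environment, rebuilding that Python version with --enable-shared, as the comment above suggests, would be the next thing to try.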
