Add low-level read and skip APIs

The official Avro Python library supports low-level APIs for reading and skipping individual fields in an Avro-encoded byte stream. For example, it has functions like read_enum, skip_enum, skip_null, read_null, and so on. I don’t think that fastavro has anything like this, so this issue is a feature request to add those functions.
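The read/skip split described above can be illustrated with a from-scratch sketch of a few Avro binary-encoding primitives. Per the Avro spec, int/long values and enum indices are zigzag-encoded varints, strings are length-prefixed UTF-8, doubles are 8 little-endian bytes, and null takes zero bytes. The function names below imitate the official library's read_*/skip_* naming, but they are hypothetical helpers written for this example, not fastavro's or avro's actual functions or signatures. Skipping is cheaper because a varint only needs its continuation bits scanned, and a string only needs a seek past its payload, with no value reconstruction or UTF-8 decoding.

```python
import io
import struct
from typing import BinaryIO, List

# Hypothetical, hand-rolled primitives for the Avro binary encoding. The names
# mirror the read_*/skip_* style of the official avro library, but these are
# NOT fastavro's (or avro's) actual functions.


def read_long(fo: BinaryIO) -> int:
    """Decode an Avro int/long: a zigzag-encoded base-128 varint."""
    shift = accum = 0
    while True:
        b = fo.read(1)
        if not b:
            raise EOFError("truncated varint")
        accum |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:  # high bit clear: last byte of the varint
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zigzag: 0,1,2,3 -> 0,-1,1,-2


def skip_long(fo: BinaryIO) -> None:
    """Skip an int/long by scanning continuation bits, without decoding it."""
    while True:
        b = fo.read(1)
        if not b:
            raise EOFError("truncated varint")
        if not b[0] & 0x80:
            return


def read_null(fo: BinaryIO) -> None:
    return None  # null is encoded as zero bytes


def skip_null(fo: BinaryIO) -> None:
    pass  # nothing to consume


def read_string(fo: BinaryIO) -> str:
    length = read_long(fo)  # length prefix, then UTF-8 bytes
    return fo.read(length).decode("utf-8")


def skip_string(fo: BinaryIO) -> None:
    fo.seek(read_long(fo), io.SEEK_CUR)  # jump over the payload (same for bytes)


def read_double(fo: BinaryIO) -> float:
    return struct.unpack("<d", fo.read(8))[0]  # 8 bytes, little-endian


def skip_double(fo: BinaryIO) -> None:
    fo.seek(8, io.SEEK_CUR)


def read_enum(fo: BinaryIO, symbols: List[str]) -> str:
    return symbols[read_long(fo)]  # enum = zero-based int index into symbols


def skip_enum(fo: BinaryIO) -> None:
    skip_long(fo)


# Hand-built buffer: long 150, string "M31", enum index 2, double 1.5
buf = io.BytesIO(b"\xac\x02" + b"\x06M31" + b"\x04" + struct.pack("<d", 1.5))
skip_long(buf)            # don't need the long
name = read_string(buf)   # "M31"
skip_enum(buf)            # don't need the enum
value = read_double(buf)  # 1.5
```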

I have a use case for this. I am scanning through and indexing very large amounts of astronomical data which have a large and complicated schema. Out of the ~180 fields, I actually only care about 5 of them for my purposes. An encoded message might be about 100KB, and I’m interested in about 20 or 30 of those bytes.

Parsing the entire document with avro-python takes a while: about 10 ms. But by using skip methods and only reading the fields I care about, I’m able to get that down to about 0.3 ms. This is a substantial difference for me!
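To show why skipping pays off when only a handful of fields matter, here is a sketch that walks a flat record schema in order and decodes only the requested fields. It assumes a made-up schema representation (a list of (name, type) pairs), primitive field types only (no nested records, arrays, maps, or unions), and hypothetical helper names such as read_selected. It is not how fastavro or the avro library structures its readers.

```python
import io
import struct
from typing import BinaryIO, Dict, List, Set, Tuple


def _read_long(fo: BinaryIO) -> int:
    """Zigzag varint decode (Avro int/long)."""
    shift = accum = 0
    while True:
        byte = fo.read(1)[0]  # IndexError on a truncated stream; fine for a sketch
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (accum >> 1) ^ -(accum & 1)
        shift += 7


def _skip_long(fo: BinaryIO) -> None:
    """Consume varint bytes until one has its continuation bit clear."""
    while fo.read(1)[0] & 0x80:
        pass


def _read_bytes(fo: BinaryIO) -> bytes:
    return fo.read(_read_long(fo))


def _skip_bytes(fo: BinaryIO) -> None:
    fo.seek(_read_long(fo), io.SEEK_CUR)  # seek past the payload, no copy/decode


# Hypothetical per-type decoders and skippers for flat records of primitives.
READERS = {
    "null": lambda fo: None,
    "boolean": lambda fo: fo.read(1) == b"\x01",
    "int": _read_long,
    "long": _read_long,
    "float": lambda fo: struct.unpack("<f", fo.read(4))[0],
    "double": lambda fo: struct.unpack("<d", fo.read(8))[0],
    "bytes": _read_bytes,
    "string": lambda fo: _read_bytes(fo).decode("utf-8"),
}
SKIPPERS = {
    "null": lambda fo: None,
    "boolean": lambda fo: fo.seek(1, io.SEEK_CUR),
    "int": _skip_long,
    "long": _skip_long,
    "float": lambda fo: fo.seek(4, io.SEEK_CUR),
    "double": lambda fo: fo.seek(8, io.SEEK_CUR),
    "bytes": _skip_bytes,
    "string": _skip_bytes,
}


def read_selected(
    fo: BinaryIO, fields: List[Tuple[str, str]], wanted: Set[str]
) -> Dict[str, object]:
    """Walk a record's fields in schema order; decode only the wanted ones."""
    out: Dict[str, object] = {}
    for name, avro_type in fields:
        if name in wanted:
            out[name] = READERS[avro_type](fo)
        else:
            SKIPPERS[avro_type](fo)  # advance the stream without building a value
    return out


# Hypothetical flat schema and a hand-encoded record matching it:
# id=42, notes="abcde", ra=10.68, flag=True
fields = [("id", "long"), ("notes", "string"), ("ra", "double"), ("flag", "boolean")]
payload = b"\x54" + b"\x0aabcde" + struct.pack("<d", 10.68) + b"\x01"
stream = io.BytesIO(payload)
result = read_selected(stream, fields, {"ra", "flag"})
```

Because the Avro binary encoding carries no per-field offsets, every field before the last wanted one still has to be stepped over in order; the saving comes from never turning the skipped fields into Python values, and from a skipped string or bytes field costing a single seek.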

Now, using fastavro to read the entire object takes about 0.8 ms, which is quite good, but my hand-tuned code that skips fields using the official Avro library is still about 2x faster.

In alignment with fastavro’s goals, then, I think adding support for low-level skip and read APIs would be helpful. In really performance-critical or extreme circumstances, it makes it possible to write much faster code, which occasionally is needed.

What do you think about adding skip methods and exposing the low-level read functions, which are already written, as a public API?

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
scottbelden commented, Jan 29, 2021

@spenczar I put together what I think should be an initial working implementation. I haven’t spent any time to do profiling or performance testing yet, but I’m hoping it should be pretty easy for you to re-run your test with the new changes.

Wheels are here:
  • Windows: https://github.com/fastavro/fastavro/actions/runs/521802879
  • Linux: https://github.com/fastavro/fastavro/actions/runs/521802875
  • OSX: https://github.com/fastavro/fastavro/actions/runs/521802874

As you can see here, the test cases that make the manual calls to skip_* show that it would be painful for users to do that themselves, and the fact that the interfaces differ depending on whether or not the Cython modules are being used is less than ideal. So I’m hoping that the simple approach of using the subschema reader will be sufficient.

1 reaction
scottbelden commented, Jan 28, 2021

All that being said, I can take a stab at putting in the skip_ functions over the weekend. Then, once the CI builds and wheels are available to install, perhaps you could add them to your timing analysis.

