Improved AsyncParser API


Version

4.6.0-SNAPSHOT

Issues

  • Closing the iterators returned by AsyncParser does not abort the parsing process. In fact, repeatedly abandoning iterators causes parsing threads to pile up silently.
  • AsyncParser’s default chunk size of 100K tuples introduces a long delay that makes it unsuitable for content probing, although the 100K chunk size gives me 5-10% better throughput than significantly lower values.
  • The EltStreamRDF class is private. As mentioned in JENA-2309, those events would be useful in a Hadoop/Spark setting to scan for prefixes and stop the parser once only data is being seen.
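The prefix-scanning use case from the last item can be sketched without Jena. The `Event` type and the `collectLeadingPrefixes` helper below are hypothetical names made up for illustration; they are not Jena's EltStreamRDF or any other Jena API. The sketch also assumes that the first data event marks the end of the prefix section, which is a heuristic, since Turtle allows prefix declarations after data.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical prefix probe over a stream of parser events; not Jena's API. */
final class PrefixProbe {
    /** Hypothetical event: either a prefix declaration or an opaque data item. */
    static final class Event {
        final String prefix; // prefix label, or null for a data event
        final String iri;    // namespace IRI, or null for a data event
        final String data;   // data payload, or null for a prefix event

        private Event(String prefix, String iri, String data) {
            this.prefix = prefix;
            this.iri = iri;
            this.data = data;
        }

        static Event prefix(String prefix, String iri) { return new Event(prefix, iri, null); }
        static Event data(String data) { return new Event(null, null, data); }
        boolean isPrefix() { return prefix != null; }
    }

    /** Collects prefix declarations and stops consuming at the first data event. */
    static Map<String, String> collectLeadingPrefixes(Iterator<Event> events) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        while (events.hasNext()) {
            Event e = events.next();
            if (!e.isPrefix()) {
                break; // assumption: the prefix section ends at the first data event
            }
            prefixes.put(e.prefix, e.iri);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                Event.prefix("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
                Event.prefix("ex", "http://example.org/"),
                Event.data("ex:s ex:p ex:o"),
                Event.prefix("late", "http://example.org/late#")); // never reached
        System.out.println(collectLeadingPrefixes(events).keySet());
    }
}
```

With a parser whose iterator can be cancelled, the caller would close the iterator right after the probe returns, so that no parsing work continues in the background.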

Improvements

PR https://github.com/apache/jena/pull/1478 adds the following improvements:

  • Changed AsyncParser API to return IteratorCloseables whose close() method actually cancels parsing.
  • Added a public EvtStreamRDF interface for the parsing events. The existing private EltStreamRDF class remains as the internal data object. The naming is up for discussion 😃
  • Added a Builder that gives control over chunk and queue sizes. The builder can create “low-level” IteratorCloseables as well as Java Streams. The latter allow for convenient use with try-with-resources. The following snippet is from a new test case:
try (Stream<Triple> s = AsyncParser.of(Channels.newInputStream(channel), Lang.TURTLE, null)
        .setChunkSize(100).streamTriples().limit(expectedLimit)) {
    // ...
}
  • If a parser fails, all remaining parsers are still started, but with a destination in an ‘aborted’ state, so that they can close their resources.
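The core idea behind the first and third improvements, a chunked producer thread whose work is cancelled by close(), can be sketched with plain JDK classes. This is a minimal illustration under stated assumptions, not the code from the PR: `CancellableChunkIterator` and `awaitProducerStopped` are hypothetical names, and a plain `Iterator` stands in for the RDF parser.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.stream.Stream;

/** Hypothetical stand-in for an async parser iterator; this is not Jena's code. */
final class CancellableChunkIterator<T> implements Iterator<T>, AutoCloseable {
    private final List<T> end = new ArrayList<>(); // end-of-input marker, compared by identity
    private final BlockingQueue<List<T>> queue;
    private final Thread producer;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean finished = false;

    CancellableChunkIterator(Iterator<T> source, int chunkSize, int queueSize) {
        this.queue = new ArrayBlockingQueue<>(queueSize);
        this.producer = new Thread(() -> produce(source, chunkSize), "chunk-producer");
        this.producer.setDaemon(true);
        this.producer.start();
    }

    /** Runs on the producer thread: batches items into chunks until done or interrupted. */
    private void produce(Iterator<T> source, int chunkSize) {
        try {
            List<T> chunk = new ArrayList<>(chunkSize);
            while (!Thread.currentThread().isInterrupted() && source.hasNext()) {
                chunk.add(source.next());
                if (chunk.size() >= chunkSize) {
                    queue.put(chunk); // blocks while the queue is full
                    chunk = new ArrayList<>(chunkSize);
                }
            }
            if (Thread.currentThread().isInterrupted()) {
                return; // cancelled by close()
            }
            if (!chunk.isEmpty()) {
                queue.put(chunk);
            }
            queue.put(end);
        } catch (InterruptedException e) {
            // close() interrupted a blocking put: stop producing
        }
    }

    @Override
    public boolean hasNext() {
        while (!finished && !current.hasNext()) {
            try {
                List<T> chunk = queue.take();
                if (chunk == end) {
                    finished = true;
                } else {
                    current = chunk.iterator();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                finished = true;
            }
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    /** Cancels the producer instead of leaving its thread behind. */
    @Override
    public void close() {
        finished = true;
        current = Collections.emptyIterator();
        producer.interrupt();
    }

    /** Test helper: waits up to the timeout and reports whether the producer has stopped. */
    boolean awaitProducerStopped(long timeoutMillis) {
        try {
            producer.join(Math.max(1, timeoutMillis)); // join(0) would wait forever
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !producer.isAlive();
    }

    public static void main(String[] args) {
        Iterator<Integer> endless = Stream.iterate(0, i -> i + 1).iterator();
        try (CancellableChunkIterator<Integer> it = new CancellableChunkIterator<>(endless, 100, 4)) {
            System.out.println("first element: " + it.next());
        } // close() runs here and interrupts the producer
    }
}
```

In this sketch, a small chunk size shortens the wait for the first element, because the consumer sees nothing until a full chunk has been queued. That is the latency trade-off the chunk size setting in the builder is meant to expose.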

Are you interested in contributing a solution yourself?

Yes

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
Aklakan commented, Sep 19, 2022

I updated the tests. I just started running the Windows and Mac tests on my GitHub account and I am now waiting for the results.

Update: Both green for me 🎉

0 reactions
afs commented, Sep 19, 2022

There are a couple of test cases that spawn many (100) threads and subsequently test that all have terminated. This might be a problem in GitHub CI pipelines, which AFAIK only use 2 cores. It would be possible to decrease the number, e.g. to 20 or 10.

Yes - probably a good idea to reduce the number. Many threads should not be a problem though they may go away in a very patchy manner. But if the test is good when using 10-20, it makes the suite a little bit more general.
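A thread-termination check of the kind discussed here might look like the following sketch. It is not the test from the PR: `ThreadTerminationCheck` and `countSurvivors` are hypothetical names, the workers simply block until interrupted, and the worker count is a parameter so that a CI run can use 10 or 20 instead of 100.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical termination check; the worker count is configurable for small CI machines. */
final class ThreadTerminationCheck {
    /**
     * Starts n workers that block until interrupted, interrupts all of them,
     * and returns how many are still alive once the shared timeout has elapsed.
     */
    static int countSurvivors(int n, long timeoutMillis) {
        List<Thread> workers = new ArrayList<>();
        CountDownLatch started = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                started.countDown();
                try {
                    Thread.sleep(Long.MAX_VALUE); // stands in for blocking parser work
                } catch (InterruptedException e) {
                    // cancelled: fall through and let the thread end
                }
            }, "worker-" + i);
            t.setDaemon(true);
            t.start();
            workers.add(t);
        }
        try {
            started.await(timeoutMillis, TimeUnit.MILLISECONDS);
            for (Thread t : workers) {
                t.interrupt();
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            for (Thread t : workers) {
                long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                t.join(Math.max(1, remaining)); // join(0) would wait forever
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int alive = 0;
        for (Thread t : workers) {
            if (t.isAlive()) {
                alive++;
            }
        }
        return alive;
    }

    public static void main(String[] args) {
        System.out.println("survivors: " + countSurvivors(10, 5000));
    }
}
```

Joining against one shared deadline, rather than giving each thread its own timeout, keeps the worst-case duration of the check bounded even when threads finish unevenly on a machine with few cores.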
