Improved AsyncParser API
Version
4.6.0-SNAPSHOT
Issues
- Closing the iterators returned by AsyncParser does not abort the parsing process. As a result, repeatedly abandoning iterators causes parsing threads to pile up silently.
- AsyncParser’s default chunk size of 100K tuples introduces a long delay, which makes it unsuitable for content probing. That said, the 100K chunk size gives me 5-10% better throughput than significantly lower values.
- The EltStreamRDF class is private. As mentioned in JENA-2309, its events would be useful in a Hadoop/Spark setting to scan for prefixes, so that the parser can be stopped once only data is seen.
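The prefix-scanning use case from the last point can be sketched in plain Java. This is only an illustration, not Jena code: `PrefixScanSketch`, `Event`, and `scanPrefixes` are hypothetical stand-ins for whatever event type the parser exposes, and the sketch assumes that prefix declarations precede the data, so the first data event is taken as the signal to stop.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; Event is NOT Jena's event type, just a stand-in. */
final class PrefixScanSketch {

    /** A parse event: either a prefix declaration or a data item (triple or quad). */
    static final class Event {
        final String prefix; // null for data events
        final String iri;    // null for data events
        final String data;   // null for prefix events

        private Event(String prefix, String iri, String data) {
            this.prefix = prefix;
            this.iri = iri;
            this.data = data;
        }

        static Event prefix(String prefix, String iri) {
            return new Event(prefix, iri, null);
        }

        static Event data(String data) {
            return new Event(null, null, data);
        }

        boolean isPrefix() {
            return prefix != null;
        }
    }

    /**
     * Collects prefix declarations and stops at the first data event.
     * Assumption: prefixes declared after the first data item are ignored.
     */
    static Map<String, String> scanPrefixes(Iterator<Event> events) {
        Map<String, String> result = new LinkedHashMap<>();
        while (events.hasNext()) {
            Event e = events.next();
            if (!e.isPrefix()) {
                break; // first data event: stop consuming the event source
            }
            result.put(e.prefix, e.iri);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                Event.prefix("ex", "http://example.org/"),
                Event.prefix("foaf", "http://xmlns.com/foaf/0.1/"),
                Event.data("ex:s ex:p ex:o"),
                Event.prefix("late", "http://example.org/late/"));
        System.out.println(scanPrefixes(events.iterator()).keySet());
    }
}
```

With a real parser behind the iterator, breaking out of the loop only helps if abandoning or closing the iterator also stops the parser thread, which is the first issue listed above.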
Improvements
PR https://github.com/apache/jena/pull/1478 adds the following improvements:
- Changed AsyncParser API to return IteratorCloseables whose close() method actually cancels parsing.
- Added a public EvtStreamRDF interface for the parsing events. The existing private EltStreamRDF class remains as the internal data object. The naming is up for discussion 😃
- Added a Builder that gives control over chunk and queue sizes. The builder can create “low-level” IteratorCloseables as well as Java Streams. The latter allows for convenient use with try-with-resources. The following snippet is from a new test case:
  try (Stream<Triple> s = AsyncParser.of(Channels.newInputStream(channel), Lang.TURTLE, null)
          .setChunkSize(100).streamTriples().limit(expectedLimit)) {
      // ...
  }
- If a parser fails, all remaining parsers are still started, but with a destination in ‘aborted state’, so that they can close their resources.
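The general idea behind a closeable iterator whose close() cancels the producer can be sketched with the JDK alone. The class below is a hypothetical illustration, not the code from the PR: `CancellableChunkIterator`, its constructor parameters, and `awaitProducerStopped` are invented names. A background thread batches items into chunks and hands them over through a bounded queue; close() interrupts that thread so it does not linger after the consumer gives up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.stream.Stream;

/** Hypothetical sketch of a chunked, cancellable async iterator (not Jena's code). */
final class CancellableChunkIterator<T> implements Iterator<T>, AutoCloseable {
    private final BlockingQueue<List<T>> queue;
    private final List<T> endMarker = new ArrayList<>(); // compared by identity
    private final Thread producer;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean finished = false;

    CancellableChunkIterator(Iterator<T> source, int chunkSize, int queueSize) {
        this.queue = new ArrayBlockingQueue<>(queueSize);
        this.producer = new Thread(() -> produce(source, chunkSize), "chunk-producer");
        this.producer.setDaemon(true);
        this.producer.start();
    }

    /** Runs on the producer thread: batch items and block while the queue is full. */
    private void produce(Iterator<T> source, int chunkSize) {
        try {
            List<T> chunk = new ArrayList<>(chunkSize);
            while (!Thread.currentThread().isInterrupted() && source.hasNext()) {
                chunk.add(source.next());
                if (chunk.size() >= chunkSize) {
                    queue.put(chunk);
                    chunk = new ArrayList<>(chunkSize);
                }
            }
            if (!chunk.isEmpty()) {
                queue.put(chunk);
            }
            queue.put(endMarker);
        } catch (InterruptedException e) {
            // Cancelled through close(): let the thread terminate.
        }
    }

    @Override
    public boolean hasNext() {
        while (!current.hasNext()) {
            if (finished) {
                return false;
            }
            try {
                List<T> chunk = queue.take();
                if (chunk == endMarker) {
                    finished = true;
                    return false;
                }
                current = chunk.iterator();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                finished = true;
                return false;
            }
        }
        return true;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    /** Cancels production; remaining buffered items are discarded. */
    @Override
    public void close() {
        finished = true;
        current = Collections.emptyIterator();
        producer.interrupt();
    }

    /** Waits up to the given time and reports whether the producer thread has ended. */
    boolean awaitProducerStopped(long millis) {
        try {
            producer.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !producer.isAlive();
    }

    public static void main(String[] args) {
        // An endless source: without cancellation the producer would never finish.
        Iterator<Integer> endless = Stream.iterate(0, i -> i + 1).iterator();
        CancellableChunkIterator<Integer> it = new CancellableChunkIterator<>(endless, 100, 2);
        for (int i = 0; i < 5; i++) {
            System.out.println(it.next());
        }
        it.close();
        System.out.println("producer stopped: " + it.awaitProducerStopped(5000));
    }
}
```

In this sketch a smaller chunk size shortens the wait for the first item because the first chunk is handed over sooner, at the cost of more queue operations per item, which mirrors the latency versus throughput trade-off described in the issues above.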
Are you interested in contributing a solution yourself?
Yes
Top GitHub Comments
I updated the tests. I just started running the Windows and macOS tests on my GitHub account and am now waiting for the results.
Update: Both green for me 🎉
Yes - probably a good idea to reduce the number. Many threads should not be a problem, though they may go away in a very patchy manner. But if the test is good when using 10-20, it makes the suite a little more general.