
Ingesting stream of entities in parallel


First of all, thanks a lot for the great work! When saving entities, the PgBulkInsert and BulkProcessor classes both use synchronized blocks. In particular, when saving a parallel stream with PgBulkInsert, the saveEntitySynchronized method seems to be a bottleneck for the consumers. Would it make sense to have several threads/connections copying data into the database in parallel? If so, what would be the recommended way to do that?
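To make the question concrete, here is a minimal sketch of the pattern being asked about: split the entities into batches and hand each batch to its own worker, so no two threads share a writer. This is not PgBulkInsert's API. `ParallelIngest`, `partition`, `ingest`, and the `batchWriter` callback are hypothetical names for illustration; in a real setup the callback would presumably open its own database connection and run one bulk copy per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, not part of PgBulkInsert.
final class ParallelIngest {

    // Splits the input into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    // Runs batchWriter once per batch on a fixed pool of worker threads and
    // returns the total number of entities handed to the writer.
    // Assumption: batchWriter uses its own connection per call, so the
    // workers never contend on a shared, synchronized writer.
    static <T> int ingest(List<T> items, int batchSize, int workers,
                          Consumer<List<T>> batchWriter) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<T> batch : partition(items, batchSize)) {
                results.add(pool.submit(() -> {
                    batchWriter.accept(batch);
                    return batch.size();
                }));
            }
            int written = 0;
            for (Future<Integer> result : results) {
                written += result.get(); // rethrows a worker failure, if any
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this actually helps depends on where the time goes: if the database or the network is the limit, more connections may not speed anything up.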

Issue Analytics

Created 5 years ago
Comments: 22 (22 by maintainers)

Top GitHub Comments

bchapuis commented, Jan 6, 2019 (2 reactions)

@bytefish sure, with pleasure, let me just clean it and make it a bit more generic. 😄

bytefish commented, Jan 7, 2019 (1 reaction)

Great job! I am also currently experimenting with importing large-scale datasets in parallel with .NET: https://github.com/bytefish/GermanWeatherDataExample. It’s a little different in C#, but… my problem is that the database writes the data fast enough, while the CSV reader and mapping are too slow, no matter how much I have optimized them. If I find a solution that scales, I will let you know.
