
Trill multicast vs publish


Hello,

I’d like to experiment with the Trill library a little and was wondering what the best approach is for a use case with a single ‘source stream’ (for example trading data, such as individual trades) and many queries against that stream, where the first operator in each query would be a Where that narrows the trades down to an individual instrument. In the writing queries guide I read that Multicast is necessary for such a use case, but when looking at the source code I also found Publish. I’m not really familiar with Rx, so it is a little confusing to me when to use Multicast vs Publish. What would you suggest? Or is it perhaps better to create a separate stream for each Where instead of sharing a single one, so that Multicast is not necessary? Do you know of any guidelines or lessons learned about this? I’d love to read more, but couldn’t find anything in the docs.

I’ve also seen mentions of partitioning in the source code. Is this only related to the group operator, or is it something that could also be useful for my use case?

I’ll be setting a low batch size (< 5, maybe less). Are there any settings I could tweak for very near real-time queries, sacrificing throughput to get the lowest latency possible?

Thanks a lot!

Issue Analytics

State:
Created 5 years ago
Comments:13 (7 by maintainers)

Top GitHub Comments

1 reaction

cybertyche commented, Feb 21, 2019

The “partitioned” versions of the operators that you see in the code have to do with a feature called Partitioned Streams. If you ingress data as PartitionedStreamEvents or specify a partition lambda at ingress, you essentially turn this feature on.

What the feature does is allow Trill to handle multiple timelines - one per key - instead of a single one. Handling disorder, for instance, becomes a per-partition concern. Ordinarily, time in Trill is considered a global construct that is uniform across all data that is seen. With partitions, each individual partition is allowed to progress time individually. The downside is that you cannot then query across partitions; whatever query you specify is applied per-partition.

For example, consider a scenario where you have 10k sensors measuring temperature, and you want to find the maximum temperature per sensor per day. Without partitions, each sensor’s data is measured against a single advancing timeline, and the disorder policies are applied globally. That means that if 100 of those 10k sensors are lagging well behind, then you will either not see results until they have caught up, or that lagging data will be dropped or have its timestamps adjusted.

However, given that the query is returning answers per-sensor, there is really no reason for one sensor’s data being behind or ahead to impact any other sensor’s data. That’s what partitions allow - each sensor will have its own timeline that is not impacted by any other.
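The difference can be sketched with a toy model in plain Python. This is not Trill's API: the function names, the event tuples, and the simplified "drop anything older than the latest time seen" disorder policy are all assumptions made for illustration, and the per-day windowing from the example is left out to keep it short.

```python
from collections import defaultdict

def max_per_sensor_global(events):
    """Single timeline: an event is late if it is older than ANY earlier event."""
    high_water = float("-inf")
    result = {}
    for sensor, time, temp in events:
        if time < high_water:
            continue  # late relative to the global timeline, so dropped
        high_water = time
        result[sensor] = max(result.get(sensor, temp), temp)
    return result

def max_per_sensor_partitioned(events):
    """One timeline per sensor: lateness is judged only against the same sensor."""
    high_water = defaultdict(lambda: float("-inf"))
    result = {}
    for sensor, time, temp in events:
        if time < high_water[sensor]:
            continue  # late relative to this sensor's own timeline
        high_water[sensor] = time
        result[sensor] = max(result.get(sensor, temp), temp)
    return result

# (sensor, timestamp, temperature); sensor "b" lags far behind sensor "a".
events = [
    ("a", 100, 20.0),
    ("b", 10, 35.0),
    ("a", 101, 21.0),
    ("b", 11, 36.0),
]

global_result = max_per_sensor_global(events)
partitioned_result = max_per_sensor_partitioned(events)
```

In this model the global version loses every reading from the lagging sensor, while the partitioned version keeps both sensors because each one advances its own clock.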

1 reaction

badrishc commented, Feb 20, 2019

Multicast would be the best fit when you have a source that you would like to use to feed a fixed number (known a priori) of receiver sub-queries. The source is subscribed to exactly once; Trill ingress, batching, and/or columnarization occur exactly once; and the same data is fed to all the multicast subscribers. The Subscribe to the source happens as soon as the required number of Subscribe operations have been performed on the Multicast endpoint.

Publish is the dynamic version: you create a Publish endpoint that anyone can dynamically Subscribe to, even at runtime. The (single) Subscribe to upstream occurs when you call Connect on the endpoint. Any new subscribers after a Connect simply receive the stream starting from that point forward. Note that because such a subscriber latches on to the stream mid-stream, the user needs to be careful not to use end edges, because then you could have an end edge without a corresponding start edge, which would be a malformed stream. For this reason, I would avoid Publish unless you know what you are doing.
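The end-edge hazard can be illustrated with a small plain-Python model. It is only a sketch: `PublishedStream`, `unmatched_end_edges`, and the `("start", id)` / `("end", id)` tuples are hypothetical names chosen for this example and do not correspond to Trill types.

```python
class PublishedStream:
    """Toy publish endpoint: a subscriber only sees events emitted after it joins."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        received = []
        self.subscribers.append(received)
        return received

    def emit(self, event):
        for received in self.subscribers:
            received.append(event)

def unmatched_end_edges(received):
    """Return the ids of end edges that arrive without a preceding start edge."""
    open_ids = set()
    unmatched = []
    for kind, event_id in received:
        if kind == "start":
            open_ids.add(event_id)
        elif event_id in open_ids:
            open_ids.remove(event_id)
        else:
            unmatched.append(event_id)
    return unmatched

stream = PublishedStream()
early = stream.subscribe()
stream.emit(("start", 1))   # only the early subscriber sees this start edge
late = stream.subscribe()   # joins mid-stream
stream.emit(("end", 1))     # both subscribers see the end edge

early_unmatched = unmatched_end_edges(early)
late_unmatched = unmatched_end_edges(late)
```

The early subscriber receives a well-formed start/end pair, whereas the late subscriber receives an end edge with no matching start, which is the malformed stream described above.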

A third option is to use neither, just call Subscribe on the source separately for each query. This results in multiple Subscribe calls being made to the source, which may be more expensive (as each Subscribe will have its own Trill ingress, batching, etc.), but in this case, the source becomes responsible for generating a correct stream to each subscriber independently.
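To make the cost trade-off concrete, here is a minimal plain-Python model comparing a multicast-style fan-out with separate subscriptions. It is an illustration under stated assumptions, not Trill code: `Source`, `Multicast`, `where_instrument`, and the trade tuples are hypothetical, and `subscribe_count` stands in for the per-subscription ingress and batching work.

```python
class Source:
    """Toy source that counts how many times it has been subscribed to."""

    def __init__(self, trades):
        self.trades = trades
        self.subscribe_count = 0

    def subscribe(self, on_next):
        self.subscribe_count += 1
        for trade in self.trades:
            on_next(trade)

class Multicast:
    """Subscribes upstream once, when the expected number of subscribers is reached."""

    def __init__(self, source, expected):
        self.source = source
        self.expected = expected
        self.observers = []

    def subscribe(self, on_next):
        self.observers.append(on_next)
        if len(self.observers) == self.expected:
            self.source.subscribe(self._fan_out)

    def _fan_out(self, trade):
        for on_next in self.observers:
            on_next(trade)

def where_instrument(symbol, sink):
    """Build a callback that keeps only trades for one instrument."""
    def on_next(trade):
        if trade[0] == symbol:
            sink.append(trade)
    return on_next

trades = [("MSFT", 100.0), ("AAPL", 200.0), ("MSFT", 101.0)]

# Option 1: one shared subscription, fanned out to two filters.
shared = Source(trades)
fan_out = Multicast(shared, expected=2)
msft, aapl = [], []
fan_out.subscribe(where_instrument("MSFT", msft))
fan_out.subscribe(where_instrument("AAPL", aapl))

# Option 3: each filter subscribes to the source on its own.
separate = Source(trades)
msft_separate, aapl_separate = [], []
separate.subscribe(where_instrument("MSFT", msft_separate))
separate.subscribe(where_instrument("AAPL", aapl_separate))
```

Both approaches yield the same filtered results in this model; the difference is that the shared source is subscribed to once and the separate one twice, so the per-subscription work grows with the number of queries.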
