Investigate vizceral output
Vizceral is a streaming aggregation tool similar to the service dependency graph we have now (but prettier and more powerful).
https://github.com/Netflix/vizceral
There have been a few looks into this. I played around with it a bit, toying with a custom version of our dependency graph linker, or using jq to translate stored traces. I also thought maybe this could be done online with a custom Kafka or Elasticsearch 5 pipeline. Or maybe something in between, like a 1-minute-interval hack of our Spark job. There are also rumors of a Spark alternative to the normal Zipkin collector (ahem @mansu ahem 😃 )
Here is a summary of notes from the last time this came up, when I chatted with @tramchamploo:
So, just to play with things, you could use a dependency linker like the zipkin-dependencies job does, except windowed into minutes, not days. Use the Zipkin api and GET /api/v1/traces
with a timestamp and lookback of your choosing (e.g. 1 minute). With a custom linker, you can emit Vizceral data directly, or into a new index for the experiment, like zipkin-vizceral-yyyy-MM-dd-HHmm. In other words, it is like the existing Spark job, but writing Vizceral format and much more frequently.
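A minimal sketch of that query window and index naming. The class and method names here are hypothetical (not from any Zipkin module); the endTs/lookback parameters are the v1 trace-query API's epoch-millisecond parameters, and the index suffix follows the zipkin-vizceral-yyyy-MM-dd-HHmm pattern suggested above:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds the GET /api/v1/traces query for one minute of
// data, and the per-minute experiment index name (zipkin-vizceral-yyyy-MM-dd-HHmm).
public class VizceralWindow {
  static final DateTimeFormatter INDEX_SUFFIX =
      DateTimeFormatter.ofPattern("yyyy-MM-dd-HHmm").withZone(ZoneOffset.UTC);

  /** Query path covering the minute ending at endTs (epoch millis). */
  public static String traceQuery(long endTsMillis) {
    return "/api/v1/traces?endTs=" + endTsMillis + "&lookback=60000";
  }

  /** Index name for the minute bucket containing the given epoch millis. */
  public static String indexName(long epochMillis) {
    return "zipkin-vizceral-" + INDEX_SUFFIX.format(Instant.ofEpochMilli(epochMillis));
  }

  public static void main(String[] args) {
    long ts = 1500000000000L; // 2017-07-14T02:40:00Z
    System.out.println(traceQuery(ts)); // /api/v1/traces?endTs=1500000000000&lookback=60000
    System.out.println(indexName(ts));  // zipkin-vizceral-2017-07-14-0240
  }
}
```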
To dig deeper, you’d want some “partition” or grouping command like a groupBy, in order to group the traces into minutes… so before this flatMap here: https://github.com/openzipkin/zipkin-dependencies/blob/master/elasticsearch/src/main/java/zipkin/dependencies/elasticsearch/ElasticsearchDependenciesJob.java#L116 This would be the thing that buckets traces into epoch minutes.
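The bucketing step itself is tiny: Zipkin span timestamps are epoch microseconds, so truncating to the minute gives a groupBy key. This is an illustrative sketch, not code from the job:

```java
// Minimal sketch of the bucketing step: truncate an epoch-microsecond span
// timestamp to its epoch minute, so a later groupBy collects each minute's
// traces together. Names here are illustrative, not from zipkin-dependencies.
public class MinuteBucketer {
  static final long MICROS_PER_MINUTE = 60_000_000L;

  /** Truncates an epoch-microsecond timestamp to an epoch-minute bucket key. */
  public static long epochMinute(long timestampMicros) {
    return timestampMicros / MICROS_PER_MINUTE;
  }

  public static void main(String[] args) {
    long t1 = 1_500_000_000_000_000L;  // exactly on a minute boundary
    long t2 = t1 + 59_999_999L;        // 59.999999s later: same bucket
    long t3 = t1 + MICROS_PER_MINUTE;  // next minute: different bucket
    System.out.println(epochMinute(t1) == epochMinute(t2)); // true
    System.out.println(epochMinute(t1) == epochMinute(t3)); // false
  }
}
```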
In order to get the service relationships, you need to walk the trace tree. To generate the tree, you need to merge multiple documents (which constitute a trace) to tell which pieces are a client or server call. This is what the DependencyLinker does.
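A greatly simplified stand-in for what the linker does (the real DependencyLinker also merges shared client/server spans and handles incomplete traces; the Span record and method names below are hypothetical):

```java
import java.util.*;

// Simplified sketch of trace-tree linking: given the spans of one merged
// trace, follow parent->child edges and count caller->callee service pairs.
public class SimpleLinker {
  record Span(String id, String parentId, String service) {}

  /** Returns call counts keyed "caller->callee" for one trace. */
  public static Map<String, Long> link(List<Span> trace) {
    Map<String, String> serviceById = new HashMap<>();
    for (Span s : trace) serviceById.put(s.id(), s.service());

    Map<String, Long> links = new LinkedHashMap<>();
    for (Span s : trace) {
      String parentService = serviceById.get(s.parentId());
      // Skip the root span and local (same-service) children.
      if (parentService == null || parentService.equals(s.service())) continue;
      links.merge(parentService + "->" + s.service(), 1L, Long::sum);
    }
    return links;
  }

  public static void main(String[] args) {
    List<Span> trace = List.of(
        new Span("a", null, "frontend"),
        new Span("b", "a", "backend"),
        new Span("c", "a", "backend"),
        new Span("d", "b", "db"));
    System.out.println(link(trace)); // {frontend->backend=2, backend->db=1}
  }
}
```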
So basically, by bucketing offline data into 1-minute intervals (based on the root span’s timestamp), you can get pretty good feedback. It will be mostly correct, as most traces last less than a minute. By using the api and a variation of our linker, you’d get a good head start, which can of course be refactored later if/when a real-time ingestion pipeline exists.
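The last step would be serializing each minute's links into Vizceral's traffic document. The field names below (nodes, connections, metrics.normal) are my recollection of the Vizceral data format and should be checked against the Netflix repo before relying on them; everything else is a hypothetical sketch:

```java
import java.util.*;

// Sketch: render one minute's caller->callee counts as a Vizceral-style
// traffic document. Field names are an assumption based on Vizceral's docs.
public class VizceralEmitter {
  record Link(String source, String target, long count) {}

  public static String toJson(String name, List<Link> links) {
    LinkedHashSet<String> nodeNames = new LinkedHashSet<>();
    for (Link l : links) { nodeNames.add(l.source()); nodeNames.add(l.target()); }

    StringJoiner nodes = new StringJoiner(",");
    for (String n : nodeNames) nodes.add("{\"name\":\"" + n + "\"}");

    StringJoiner conns = new StringJoiner(",");
    for (Link l : links) {
      conns.add("{\"source\":\"" + l.source() + "\",\"target\":\"" + l.target()
          + "\",\"metrics\":{\"normal\":" + l.count() + "}}");
    }
    return "{\"name\":\"" + name + "\",\"nodes\":[" + nodes
        + "],\"connections\":[" + conns + "]}";
  }

  public static void main(String[] args) {
    System.out.println(toJson("minute-demo",
        List.of(new Link("frontend", "backend", 42), new Link("backend", "db", 7))));
  }
}
```

Each minute's document could then be written to the experiment index named after that bucket.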
Issue Analytics
- Created: 7 years ago
- Comments: 10 (2 by maintainers)
Top GitHub Comments
Sorry for the delay, guys. We were busy open sourcing our (Pinterest) Spark collector for Zipkin and couldn’t share much here.
As part of a hackathon, we were able to modify this Spark collector to build a service dependency graph on the streaming data and push the graph to Vizceral, in real time. I’ll clean up the code and share it here, hopefully in the next couple of weeks.
This issue was moved to openzipkin/zipkin-spark-streaming#1