Inquiry: SNB Basic + SNB Composite Merge Foreign?

Hi again 😄

In an experiment to support our database’s multi-model functionality, we are trying to combine the edges generated from the SNB Basic dataset with the files generated from the SNB CompositeMergeForeign dataset.

~We are getting inconsistent results~, and wondered if there is any consideration of supporting this with the datagen, or if by any chance this is already possible?

For example, we want a data model where both the post_hasCreator_person relationship and creator attribute in the Post document exist.
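
As a rough illustration of the combined model, the sketch below derives post_hasCreator_person edge records from Post documents that already carry a creator attribute, so that both representations exist side by side. It is a minimal sketch, not something the datagen provides: the function name, the `id` field, and the `Post`/`Person` collection names are assumptions made for the example, while `_from`/`_to` follow ArangoDB's edge-document convention.

```python
from typing import Any, Dict, Iterable, List


def derive_has_creator_edges(
    posts: Iterable[Dict[str, Any]],
    fk_field: str = "creator",           # MergeForeign-style attribute named in the issue
    from_collection: str = "Post",       # hypothetical collection name
    to_collection: str = "Person",       # hypothetical collection name
) -> List[Dict[str, str]]:
    """Build one edge record per post from its foreign-key attribute.

    The post documents are left untouched, so the attribute and the
    derived edge both remain available to queries.
    """
    edges = []
    for post in posts:
        creator_id = post.get(fk_field)
        if creator_id is None:
            # No creator recorded: nothing to link.
            continue
        edges.append(
            {
                "_from": f"{from_collection}/{post['id']}",
                "_to": f"{to_collection}/{creator_id}",
            }
        )
    return edges


posts = [
    {"id": 1, "creator": 10},
    {"id": 2, "creator": None},
    {"id": 3, "creator": 11},
]
edges = derive_has_creator_edges(posts)
```

Doing this as a post-processing step is exactly the kind of extra work the request hopes to avoid, which is why native support in the datagen is being asked about.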

Happy to move this conversation to the datagen repo if that makes more sense.

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 24 (12 by maintainers)

Top GitHub Comments

szarnyasg commented, Jun 21, 2022

@aMahanna I transferred the issue to the (new, Spark-based) Datagen’s repository. I skimmed your suggestion and it seems doable in Datagen, although it will not have high priority in our development plans.

This week I’m travelling/have other duties – I will take a look next week.

aMahanna commented, Jun 21, 2022

Hi Gabor,

We’ve been evaluating the various SNB datasets available in an attempt to support our database’s multi-model functionality.

We found that using a combination of the Basic & MergeForeign datasets substantially increases our query performance and better suits our data model. Our request would be to have the datagen natively support the data model outlined below, or suggest a way to do so if it already exists. As it stands now, modelling the data in this way requires a lot of pre/post processing (as suggested above), which we believe will count against us if we were to have the benchmark audited.

In particular, we have one query that benefits from the Basic dataset (IC8), one that benefits from the MergeForeign dataset (IC3 Sub-Query A), and another that benefits from a combination of both (IC3 Sub-Query B).


Understanding that you may not be familiar with AQL (ArangoDB Query Language), this IC8 query relies on the edge relationships only available in the Basic dataset (e.g., post_hasCreator_person, comment_hasCreator_person, etc.).

FOR commentReply IN 2..2 INBOUND @personId post_hasCreator_person, comment_hasCreator_person, comment_replyOf_post, comment_replyOf_comment
    SORT commentReply.creationDate DESC, commentReply._id
    LIMIT 20
    FOR creator IN 1..1 OUTBOUND commentReply comment_hasCreator_person
        RETURN {
            id: creator._id,
            firstName: creator.firstName,
            lastName: creator.lastName,
            commentId: commentReply._id,
            commentCreationDate: commentReply.creationDate,
            commentContent: commentReply.content
        }

The alternative approach is to rely solely on the MergeForeign attributes (i.e., creator, replyOfPost, replyOfComment). Since none of the edge relationships mentioned above are included in MergeForeign, switching to these attributes would make the query 6x slower than the current implementation. On the other hand, sticking to a Basic-only data model poses its own challenges, as seen below.
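
To make the attribute-only formulation concrete, here is a minimal in-memory sketch of the recent-replies query (IC8) that uses only the creator, replyOfPost and replyOfComment attributes. It is an illustration under assumptions, not the AQL actually benchmarked: the function name, the `id` field, and the dict-based inputs are hypothetical, and it says nothing about performance in a real database.

```python
from typing import Any, Dict, List


def recent_replies(
    person_id: Any,
    persons: Dict[Any, Dict[str, Any]],   # hypothetical: person documents keyed by id
    posts: List[Dict[str, Any]],
    comments: List[Dict[str, Any]],
    limit: int = 20,
) -> List[Dict[str, Any]]:
    # Messages authored by the start person, found via the creator attribute.
    post_ids = {p["id"] for p in posts if p["creator"] == person_id}
    comment_ids = {c["id"] for c in comments if c["creator"] == person_id}

    # Comments that reply to any of those messages, found via the reply attributes.
    replies = [
        c
        for c in comments
        if c.get("replyOfPost") in post_ids or c.get("replyOfComment") in comment_ids
    ]

    # Newest first, ties broken by ascending id (two stable sorts).
    replies.sort(key=lambda c: c["id"])
    replies.sort(key=lambda c: c["creationDate"], reverse=True)

    result = []
    for c in replies[:limit]:
        creator = persons[c["creator"]]
        result.append(
            {
                "id": c["creator"],
                "firstName": creator["firstName"],
                "lastName": creator["lastName"],
                "commentId": c["id"],
                "commentCreationDate": c["creationDate"],
                "commentContent": c["content"],
            }
        )
    return result
```

Every step here is a lookup by attribute value rather than an edge traversal, which is the trade-off the paragraph above describes.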


We’ve noticed peak performance in IC3 when a combination of Basic SNB edge relationships and MergeForeign SNB attributes is used within the same query.

IC3 Sub-Query A

A portion of IC3 relies on the MergeForeign attribute for efficient query performance.

FOR friend IN 1..2 ANY @personId person_knows_person OPTIONS {bfs: true, uniqueVertices:"global"}
    FILTER NOT IN [countryXKey, countryYKey]
    RETURN {id:, place:}

Attempting to do this using the Basic SNB person_isLocatedIn_place edge relationship makes the query 70x slower.

IC3 Sub-Query B

Another portion of IC3 relies on two MergeForeign attributes, while also benefiting from the post_hasCreator_person and comment_hasCreator_person relationships (found only in the Basic SNB dataset).

FOR message IN 1..1 INBOUND friend post_hasCreator_person,comment_hasCreator_person
    FILTER IN [countryXKey, countryYKey]
    RETURN message

Attempting to do this using the Basic SNB post_isLocatedIn_place and comment_isLocatedIn_place edge relationships makes the query 30x slower.


As far as we can tell, the current datagen utility doesn’t support this, and so we feel that this leaves out the multi-model graph capabilities offered by our database. We are not looking to manipulate the data in a way that specifically favours us, but instead looking for the LDBC datagen to better support the functionality of multi-model graph databases.

Would it be possible to have the datagen support this data model out of the box (assuming it doesn’t already)?
