RFC/Feature: Nebula Graph as Backend Storage

Similar to https://github.com/amundsen-io/amundsen/issues/526, this RFC proposes adding backend support for Nebula Graph, an open-source distributed graph database. It stands out as a linearly scalable, cloud-native, open-source (Apache 2.0) graph database, and it speaks both OpenCypher and nGQL.

Background

From what I have observed, many teams in the community have adopted Nebula Graph as their graph infrastructure because of its excellent OLTP performance on huge data volumes. However, each of them has independently reinvented the wheel, building its own metadata service or lineage system on top of Nebula.

Seeing the ever-growing effort these teams spend on modeling metadata, writing hooks to wire up different data sources, and so on (e.g., some maintain their own giant fork of Apache Atlas with Nebula Graph as the backend and are basically unable to upstream it), I would like to help bring these efforts together and let more Nebula users start managing their metadata without pain.

Then I found Amundsen, an elegant, community-driven, and beloved project (well done!!!), and I have been working on bringing Amundsen to the Nebula Graph community.

Expected Behavior or Use Case

It should behave the same as the existing Neo4j, AWS Neptune, and Apache Atlas backends.

Service or Ingestion ETL

Metadata:

  • Nebula Proxy

Databuilder:

  • Nebula Extractor
  • Nebula Search Data Extractor
  • Nebula CSV Loader
  • Nebula CSV Publisher
  • Nebula Serializer
  • Nebula Sample Data Loader

       ┌────────────────────────┐ ┌────────────────────────────────────────┐
       │                        │ │                                        │
       │ Frontend :5000         │ │ Metadata Sources                       │
       ├────────────────────────┤ │ ┌────────┐ ┌─────────┐ ┌─────────────┐ │
       │ Metaservice :5001      │ │ │        │ │         │ │             │ │
       │ ┌──────────────┐       │ │ │ Foo DB │ │ Bar App │ │ X Dashboard │ │
  ┌────┼── Nebula Proxy │       │ │ │        │ │         │ │             │ │
  │    │ └──────────────┘       │ │ │        │ │         │ │             │ │
  │    │                        │ │ │        │ │         │ │             │ │
  │    ├────────────────────────┤ │ └────────┘ └─────┬───┘ └─────────────┘ │
┌─┼───── Searchservice :5002    │ │                  │                     │
│ │    └────────────────────────┘ └──────────────────┼─────────────────────┘
│ │                                                  │
│ │    ┌─────────────────────────────────────────────┼───────────────────────┐
│ │    │                                             │                       │
│ │    │ Databuilder     ┌───────────────────────────┘                       │
│ │    │                 │                                                   │
│ │    │ ┌───────────────▼────────────────┐ ┌──────────────────────────────┐ │
│ │ ┌──┼─► Extractor of Sources           ├─► nebula_search_data_extractor │ │
│ │ │  │ └───────────────┬────────────────┘ └──────────────┬───────────────┘ │
│ │ │  │                 │                                 │                 │
│ │ │  │ ┌───────────────▼────────────────┐ ┌──────────────▼───────────────┐ │
│ │ │  │ │ Loader filesystem_csv_nebula   │ │ Loader Elastic FS loader     │ │
│ │ │  │ └───────────────┬────────────────┘ └──────────────┬───────────────┘ │
│ │ │  │                 │                                 │                 │
│ │ │  │ ┌───────────────▼────────────────┐ ┌──────────────▼───────────────┐ │
│ │ │  │ │ Publisher nebula_csv_publisher │ │ Publisher Elasticsearch      │ │
│ │ │  │ └───────────────┬────────────────┘ └──────────────┬───────────────┘ │
│ │ │  │                 │                                 │                 │
│ │ │  └─────────────────┼─────────────────────────────────┼─────────────────┘
│ │ │                    │                                 │
│ │ │                    │                                 │
│ │ └────────────────┐   │                                 │
│ │                  │   │                                 │
│ │    ┌─────────────┼───►─────────────────────────┐ ┌─────▼─────┐
│ │    │ Nebula Graph│   │                         │ │           │
│ └────┼─────┬───────┴───┼───────────┐     ┌─────┐ │ │           │
│      │     │           │           │     │MetaD│ │ │           │
│      │ ┌───▼──┐    ┌───▼──┐    ┌───▼──┐  └─────┘ │ │           │
│ ┌────┼─►GraphD│    │GraphD│    │GraphD│          │ │           │
│ │    │ └──────┘    └──────┘    └──────┘  ┌─────┐ │ │           │
│ │    │ :9669                             │MetaD│ │ │  Elastic  │
│ │    │ ┌────────┐ ┌────────┐ ┌────────┐  └─────┘ │ │  Search   │
│ │    │ │        │ │        │ │        │          │ │  Cluster  │
│ │    │ │StorageD│ │StorageD│ │StorageD│  ┌─────┐ │ │  :9200    │
│ │    │ │        │ │        │ │        │  │MetaD│ │ │           │
│ │    │ └────────┘ └────────┘ └────────┘  └─────┘ │ │           │
│ │    │                                           │ │           │
│ │    ├───────────────────────────────────────────┤ │           │
│ └───── Nebula Studio :7001                       │ │           │
│      └───────────────────────────────────────────┘ └─────▲─────┘
│                                                          │
└──────────────────────────────────────────────────────────┘
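
The Databuilder part of the diagram chains an extractor, a CSV loader, and a publisher. As a rough illustration only, the Python sketch below mimics that flow with hypothetical functions (extract, load_to_csv, publish, and ngql_str are illustrative names, not Amundsen's actual classes): records are grouped into one CSV per label, and each CSV row is then turned into an nGQL INSERT VERTEX statement that uses the record's key as the vertex ID. It only builds statement strings, it does not talk to a Nebula cluster, and its string escaping is simplified.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

# Hypothetical record shape: {"label": tag name, "key": vertex ID, ...properties}
Record = Dict[str, str]


def extract() -> Iterator[Record]:
    """Stand-in extractor: yields node records from a fake metadata source."""
    yield {"label": "Table", "key": "hive://gold.test_schema/test_table1",
           "name": "test_table1"}
    yield {"label": "Column", "key": "hive://gold.test_schema/test_table1/col1",
           "name": "col1", "type": "string"}


def load_to_csv(records: Iterable[Record]) -> Dict[str, str]:
    """Loader stage: group records by label and render one CSV text per label."""
    grouped: Dict[str, List[Record]] = {}
    for rec in records:
        grouped.setdefault(rec["label"], []).append(rec)
    out: Dict[str, str] = {}
    for label, rows in grouped.items():
        # The label becomes the file (tag) name, so it is not written as a column.
        fields = [f for f in rows[0] if f != "label"]
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
        out[label] = buf.getvalue()
    return out


def ngql_str(value: str) -> str:
    """Render a double-quoted nGQL string literal with backslashes and quotes escaped."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def publish(csv_by_label: Dict[str, str]) -> List[str]:
    """Publisher stage: turn every CSV row into an INSERT VERTEX statement."""
    statements: List[str] = []
    for label, text in csv_by_label.items():
        for row in csv.DictReader(io.StringIO(text)):
            vid = row.pop("key")
            # Backticks let names such as Table or type be used as identifiers.
            props = ", ".join(f"`{name}`" for name in row)
            values = ", ".join(ngql_str(v) for v in row.values())
            statements.append(
                f"INSERT VERTEX `{label}`({props}) VALUES {ngql_str(vid)}:({values});"
            )
    return statements
```

Calling publish(load_to_csv(extract())) returns one statement per extracted record. In a real deployment the publisher would presumably send such statements to graphd (port 9669 in the diagram) through a Nebula client instead of returning them.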

Possible Implementation

Because Nebula Graph uses a directed property graph model and supports OpenCypher, the implementation largely follows what the community has already done for the great Neo4j.

The main difference is that Nebula Graph is schema-ful: data cannot be inserted before the graph schema (its tags and edge types) has been created. To decouple Nebula schema creation from the data model, my current proposal is to create or alter the Nebula Graph schema on demand in the Nebula CSV Publisher.
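
The on-demand schema step could be sketched roughly as follows. This is a hypothetical helper (schema_statements and csv_header are illustrative names, not part of Amundsen or of any Nebula client): given the header of a CSV file produced by the loader and the schema already known to exist, it returns the CREATE ... IF NOT EXISTS or ALTER ... ADD nGQL statements to run before inserting data. It assumes the vertex-ID column is named key and declares every property as string; a real publisher might learn the existing schema from SHOW TAGS and DESCRIBE TAG.

```python
import csv
import io
from typing import Dict, List, Tuple


def csv_header(csv_text: str) -> List[str]:
    """Return the column names from the first row of a CSV document."""
    return next(csv.reader(io.StringIO(csv_text)))


def schema_statements(
    kind: str,
    name: str,
    header: List[str],
    existing: Dict[str, List[str]],
    skip: Tuple[str, ...] = ("key",),
) -> List[str]:
    """Return the nGQL DDL needed before rows with `header` can be written.

    kind:     "TAG" (vertices) or "EDGE" (relationships).
    existing: schema already known to exist, as name -> property names.
    skip:     CSV columns that are IDs rather than properties (assumed names).
    Every property is declared as `string`, which is a simplification.
    """
    if kind not in ("TAG", "EDGE"):
        raise ValueError(f"kind must be 'TAG' or 'EDGE', got {kind!r}")
    props = [col for col in header if col not in skip]
    if name not in existing:
        # Unknown tag/edge type: create it with every column as a property.
        cols = ", ".join(f"`{p}` string" for p in props)
        return [f"CREATE {kind} IF NOT EXISTS `{name}`({cols});"]
    missing = [p for p in props if p not in existing[name]]
    if not missing:
        return []  # the schema already covers this CSV; nothing to run
    # Known tag/edge type with new columns: add only the missing properties.
    cols = ", ".join(f"`{p}` string" for p in missing)
    return [f"ALTER {kind} `{name}` ADD ({cols});"]
```

Generating DDL from the CSV header keeps the model definitions free of Nebula-specific schema code, which is the decoupling the proposal aims for. Note that Nebula Graph applies schema changes asynchronously (its manual suggests waiting about two heartbeat cycles before writing), so a publisher built this way would also need to wait or retry after running the DDL.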

I will create my draft PR here: #1817. I have tested it with Docker Compose, and it works from the frontend for all the functions that the Neo4j backend already supports.

My branch πŸ‘‰πŸ»: https://github.com/wey-gu/amundsen/tree/amundsen_nebula_graph

docker-compose -f docker-amundsen-nebula.yml build
docker-compose -f docker-amundsen-nebula.yml up -d

# wait for 90 seconds after all containers are up
cd databuilder
python3 -m venv venv
source venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
python3 setup.py install
python3 example/scripts/sample_data_loader_nebula.py

# try to visit this from your browser!
http://localhost:5000/table_detail/gold/hive/test_schema/test_table1
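
Instead of a fixed 90-second wait, one could poll until the services accept connections. A minimal sketch using only the Python standard library follows; wait_for_ports is a hypothetical helper, and the ports are the ones shown in the architecture diagram. An open TCP port is only a rough readiness signal, since a service may accept connections before it can actually serve queries.

```python
import socket
import time
from typing import Iterable, List, Tuple


def wait_for_ports(
    endpoints: Iterable[Tuple[str, int]],
    timeout: float = 90.0,
    interval: float = 2.0,
) -> bool:
    """Poll (host, port) pairs until all of them accept TCP connections.

    Returns True as soon as every endpoint is reachable, or False once
    `timeout` seconds have passed with some endpoints still unreachable.
    """
    pending: List[Tuple[str, int]] = list(endpoints)
    deadline = time.monotonic() + timeout
    while True:
        still_down = []
        for host, port in pending:
            try:
                # A successful connect only proves that the port is open.
                with socket.create_connection((host, port), timeout=1.0):
                    pass
            except OSError:
                still_down.append((host, port))
        pending = still_down
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_for_ports([("localhost", p) for p in (5000, 5001, 5002, 9669, 9200)]) returns True once the frontend, metadata, search, graphd, and Elasticsearch ports all accept connections, or False after 90 seconds.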
  • For now, I assume example/scripts/sample_data_loader_nebula.py is used to bootstrap the schema before any cluster is brought up. Please advise if there is a better solution.
  • I learned how to contribute from the documentation and the codebase, so I may have misunderstood some things; please correct or teach me if possible 😃
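
The URL opened in the last step of the setup points at a table from the sample data. Assuming Amundsen's usual {database}://{cluster}.{schema}/{table} table-key format and the /table_detail/{cluster}/{database}/{schema}/{table} frontend route that the URL follows, the mapping between the two can be sketched as below. table_key and table_detail_url are hypothetical helpers, and the regular expression assumes the cluster name contains no dots.

```python
import re
from urllib.parse import quote

# Assumed key layout: database://cluster.schema/table
TABLE_KEY_RE = re.compile(
    r"^(?P<db>[^:/]+)://(?P<cluster>[^./]+)\.(?P<schema>[^/]+)/(?P<table>.+)$"
)


def table_key(database: str, cluster: str, schema: str, table: str) -> str:
    """Build a table key in the database://cluster.schema/table layout."""
    return f"{database}://{cluster}.{schema}/{table}"


def table_detail_url(key: str, base_url: str = "http://localhost:5000") -> str:
    """Map a table key to the frontend's table_detail page URL."""
    m = TABLE_KEY_RE.match(key)
    if m is None:
        raise ValueError(f"unexpected table key: {key!r}")
    # Note the order change: the URL puts the cluster before the database.
    parts = [m["cluster"], m["db"], m["schema"], m["table"]]
    return base_url.rstrip("/") + "/table_detail/" + "/".join(
        quote(p, safe="") for p in parts
    )
```

With these assumptions, table_detail_url(table_key("hive", "gold", "test_schema", "test_table1")) produces the same URL as the one visited above.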

In the coming days, I will prepare some articles and videos and explore some real data-source pipelines to help people in the Nebula Graph community (for now, most of them are Chinese, as am I! I am in lockdown in Shanghai these days T__T).

Could you kindly help with advice or a review?

Thanks so much!

Why add yet another graph database that speaks Cypher to Amundsen?

Wey: I love Neo4j, too! I just hope Nebula Graph lovers (who, as far as I know, surely love Neo4j too) get a chance to enjoy Amundsen's amazing offerings on their Nebula Graph clusters.

Example Screenshots (if appropriate):

[Screenshots: Screen Shot 2022-04-15 at 6:01:11 PM; Screen Shot 2022-04-15 at 6:03:47 PM]

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

2 reactions
wey-gu commented, Apr 22, 2022

Sorry, I just realized that we have an RFC workflow in the RFC repo; I will create a PR there later.

1 reaction
feng-tao commented, Apr 21, 2022

thanks