RFC/Feature: Nebula Graph as Backend Storage
Similar to https://github.com/amundsen-io/amundsen/issues/526, this proposes adding backend support for Nebula Graph, an open-source (Apache 2.0), distributed graph database that stands out for being linearly scalable and cloud native, and that speaks openCypher and nGQL.
Background
From what I have observed, many teams in the community have adopted Nebula Graph as their graph infrastructure because of its excellent OLTP capability on huge data volumes. However, each of them has independently reinvented the wheel by building its own metadata service or lineage system on top of Nebula.
I know how much duplicated effort goes into modeling the metadata, writing hooks for different data sources, wiring everything together, and so on (for example, some teams maintain their own giant fork of Apache Atlas with Nebula Graph as the backend and are basically unable to upstream it). I would like to help bring these efforts together and enable more Nebula users to start managing their metadata without pain.
Then I found Amundsen, an elegant, community-driven, and beloved project (well done!), and I have been working to bring Amundsen to the Nebula Graph community.
Expected Behavior or Use Case
It should work the same way as it does for Neo4j, AWS Neptune, and Apache Atlas.
Service or Ingestion ETL
Metadata:
- Nebula Proxy
Databuilder:
- Nebula Extractor
- Nebula Search Data Extractor
- Nebula CSV Loader
- Nebula CSV Publisher
- Nebula Serializer
- Nebula Sample Data Loader
Architecture overview (the original ASCII diagram, summarized as text):
Services:
- Frontend :5000
- Metaservice :5001, containing the Nebula Proxy, which connects to Nebula Graph through GraphD :9669
- Searchservice :5002, which connects to the Elasticsearch cluster :9200
Metadata Sources (feeding Databuilder):
- Foo DB
- Bar App
- X Dashboard
Databuilder:
- Graph pipeline: Extractor of Sources -> Loader filesystem_csv_nebula -> Publisher nebula_csv_publisher -> Nebula Graph
- Search pipeline: nebula_search_data_extractor -> Loader Elastic FS loader -> Publisher Elasticsearch -> Elasticsearch cluster :9200
Nebula Graph cluster:
- GraphD x3 (:9669)
- StorageD x3
- MetaD x3
- Nebula Studio :7001
Possible Implementation
Because Nebula Graph uses a directed property graph model and supports openCypher, the implementation simply follows what the community has already done for Neo4j.
The one difference is that Nebula Graph is schema-ful: data cannot be inserted before the graph schema has been created. To keep Nebula schema creation decoupled from the models, my current proposal is to create or alter the Nebula Graph schema on demand in the Nebula CSV Publisher.
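To illustrate the "create/alter schema when needed" idea, here is a minimal sketch that derives nGQL DDL statements from a sample record. It is not the code in the PR: the function names, the type mapping, and the example tag and edge names are assumptions made for illustration only, and the sketch only builds statement strings rather than talking to a cluster.

```python
from typing import Dict

# Assumed mapping from Python value types to nGQL property types.
NGQL_TYPES: Dict[type, str] = {
    str: "string",
    bool: "bool",
    int: "int64",
    float: "double",
}


def _prop_defs(sample: Dict[str, object]) -> str:
    """Render "`name` type" pairs; unknown Python types fall back to string."""
    return ", ".join(
        f"`{name}` {NGQL_TYPES.get(type(value), 'string')}"
        for name, value in sample.items()
    )


def _check_kind(kind: str) -> None:
    if kind not in ("TAG", "EDGE"):
        raise ValueError(f"kind must be 'TAG' or 'EDGE', got {kind!r}")


def create_statement(kind: str, name: str, sample: Dict[str, object]) -> str:
    """Build an idempotent CREATE TAG / CREATE EDGE statement (hypothetical helper)."""
    _check_kind(kind)
    return f"CREATE {kind} IF NOT EXISTS `{name}`({_prop_defs(sample)});"


def alter_statement(kind: str, name: str, new_props: Dict[str, object]) -> str:
    """Build an ALTER statement that adds properties not yet in the schema."""
    _check_kind(kind)
    return f"ALTER {kind} `{name}` ADD ({_prop_defs(new_props)});"


# Illustrative usage with made-up records:
# create_statement("TAG", "Table", {"name": "test_table1", "is_view": False})
# alter_statement("EDGE", "COLUMN", {"sort_order": 1})
```

A publisher following this approach would run such statements before inserting vertices and edges. Nebula Graph applies schema changes asynchronously, so it would probably also need to wait or retry briefly before the first insert.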
I will open my draft PR here: #1817. It has been tested with Docker Compose and works from the Frontend for all the functions that Neo4j already supports.
My branch: https://github.com/wey-gu/amundsen/tree/amundsen_nebula_graph
docker-compose -f docker-amundsen-nebula.yml build
docker-compose -f docker-amundsen-nebula.yml up -d
# wait for 90 seconds after all containers are up
cd databuilder
python3 -m venv venv
source venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
python3 setup.py install
python3 example/scripts/sample_data_loader_nebula.py
# try to visit this from your browser!
http://localhost:5000/table_detail/gold/hive/test_schema/test_table1
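Instead of waiting a fixed 90 seconds, one could poll the service ports until they accept connections. The sketch below is an assumption-laden helper, not part of the branch: `wait_for_ports` is a hypothetical name, and the ports in the usage comment are the ones shown in the architecture overview above. An open TCP port only shows that a process is listening, not that the service is fully initialized, so a short extra delay may still be needed.

```python
import socket
import time
from typing import Iterable, List, Tuple

Endpoint = Tuple[str, int]


def wait_for_ports(
    endpoints: Iterable[Endpoint], timeout: float = 90.0, interval: float = 1.0
) -> bool:
    """Return True once every endpoint accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    pending: List[Endpoint] = list(endpoints)
    while pending:
        still_pending: List[Endpoint] = []
        for host, port in pending:
            try:
                # A successful connect means something is listening on the port.
                with socket.create_connection((host, port), timeout=interval):
                    pass
            except OSError:
                still_pending.append((host, port))
        pending = still_pending
        if pending:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
    return True


# Illustrative usage (ports assumed from the architecture overview):
# ready = wait_for_ports([
#     ("localhost", 9669),  # Nebula GraphD
#     ("localhost", 9200),  # Elasticsearch
#     ("localhost", 5000),  # Frontend
# ])
```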
- For now, I assume example/scripts/sample_data_loader_nebula.py is used to bootstrap the schema before any cluster is brought up. Please advise if there is a better solution. I learned from the documentation and the codebase in order to contribute, so I may have misunderstood some things; kindly correct me if so.
In the coming days, I will prepare some articles and videos and explore some real data-source pipelines to help people in the Nebula Graph community (for now, most of them are Chinese, as am I; I am in lockdown in Shanghai these days T__T).
Could you kindly help with advice/review?
Thanks so much!
Why add yet another Cypher-speaking graph database to Amundsen?
Wey: I love Neo4j, too! I just hope that Nebula Graph lovers (who, as far as I know, certainly love Neo4j as well) get a chance to enjoy Amundsen's amazing offerings on their Nebula Graph clusters.
Example Screenshots (if appropriate):


Sorry, I just realised that there is an RFC process in the RFC repo; I will create a PR there later.
Thanks