
DGL Operator: Leverage DGL on K8s


This is Xiaoyu Zhai from Qihoo 360 AI Infra. Recently, our AI/ML teams have had internal demand for the DGL/DGL-KE frameworks, so we have just kicked off research on distributed DGL training.

Native distributed DGL training works at the machine level: you need to manually set up the IP config, grant passwordless SSH access, use copy_files.py to dispatch your partition data, and use launch.py to invoke your training. What we want to offer our users instead is fully automated distributed training and, most importantly, workloads that can be orchestrated on K8s. So we decided to develop a "DGL Operator" to run DGL training on K8s. It provides the distributed scaffolding for ML engineers, so they only need to work on the partition script and the training script.
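Two of the manual steps described in the previous paragraph, writing the IP config and deciding which machine receives which partition, are the kind of bookkeeping an operator takes over. The following is a minimal sketch under stated assumptions, not the DGL Operator's actual code: it assumes the operator already knows its worker pods' IPs and names, assumes an ip_config.txt format of one machine IP per line optionally followed by a port, and assigns exactly one graph partition per pod. The helper names, pod IPs, and pod names are hypothetical.

```python
from typing import Dict, List, Optional


def render_ip_config(pod_ips: List[str], port: Optional[int] = None) -> str:
    """Render an ip_config.txt body (ASSUMED format: one IP per line, optional port)."""
    lines = [f"{ip} {port}" if port is not None else ip for ip in pod_ips]
    # Every line, including the last, ends with a newline.
    return "".join(f"{line}\n" for line in lines)


def assign_partitions(pod_names: List[str], num_parts: int) -> Dict[int, str]:
    """Map each partition ID to the pod that should hold it (one partition per pod)."""
    if len(pod_names) != num_parts:
        raise ValueError("expected exactly one pod per partition")
    return {part_id: pod for part_id, pod in enumerate(pod_names)}


if __name__ == "__main__":
    # Hypothetical pod IPs and names, as an operator might read them from the K8s API.
    print(render_ip_config(["10.0.0.11", "10.0.0.12"]), end="")
    print(assign_partitions(["dgl-worker-0", "dgl-worker-1"], 2))
```

Keeping these helpers as pure functions means they can be tested without a cluster; a real operator would additionally have to write the file into each pod and copy the matching partition data, which this sketch does not attempt.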

The first version of DGL Operator will be finished by the end of this month, and we would be glad to open source the project so that more developers can get involved in DGL or use DGL on K8s. The main question of this issue is: is dmlc willing to host our project? I noticed that dmlc does not usually host Golang projects, but that's OK; we could also contribute this Operator to the Kubeflow community (XGBoost Operator is hosted by Kubeflow).

Looking forward to your response; feel free to ping me.

Ref: XGBoost Operator Design

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
ryantd commented, Apr 23, 2021

After talking with @zheng-da and holding an internal discussion in our team, we decided to contribute the DGL Operator repo to the Kubeflow community, because 1) DGL Operator is a Golang project and a piece of Kubernetes infrastructure, so contributing it to Kubeflow may reach more Golang and Kubernetes engineers; and 2) the Kubeflow community has many experienced Golang and Kubernetes engineers who can work together to improve its stability and high-level design.

We have already submitted the proposal to the Kubeflow community; please let me know if there are any issues or concerns.

Proposal PR: https://github.com/kubeflow/community/pull/512
Reader-friendly version of the proposal: https://github.com/ryantd/community/blob/dgl-operator/proposals/dgl-operator-proposal.md

0 reactions
github-actions[bot] commented, Mar 10, 2022

This issue is closed due to lack of activity. Feel free to reopen it if you still have questions.


Top Results From Across the Web

Qihoo360/dgl-operator - GitHub
The DGL Operator makes it easy to run Deep Graph Library (DGL) graph neural network distributed or non-distributed training on Kubernetes.
DGL Operator and Graph Training - Xiaoyu Zhai - YouTube
This session is DGL Operator and Graph Training - Xiaoyu Zhai.
Developers - DGL Operator: Leverage DGL on K8s - - Bountysource
So we decide to develop a "DGL Operator", to leverage DGL training on K8s. It can cover distributed scaffolding tools for ML engineers,...
DGL with Kubernetes - Questions - Deep Graph Library
Hi, We have a community contributed DGL operator which you can found at GitHub - Qihoo360/dgl-operator: The DGL Operator makes it easy to...
ArangoDB-DGL Adapter | Data Science | Manual
The ArangoDB-DGL Adapter exports graphs from ArangoDB into Deep Graph Library (DGL), a Python package for graph neural networks, and vice-versa.
