Streamline cluster initialization

gnt-cluster init <name> should yield a mostly-functional cluster without needing to pass additional arguments or follow up with 20 gnt-cluster modify commands. There are some areas that can be improved here:

  • New clusters should default to using KVM. Its use is much more widespread than Xen nowadays.
  • We should reassess hypervisor parameter defaults. A few things that come to mind are kernel_path, cpu_type, disk_aio/disk_cache/disk_type, vhost_net/vnet_hdr, serial_console/serial_speed, all of DRBD’s params that can’t even fill a 1Gbps line by default, etc.
  • We could also manage the master netdev ourselves, using a dummy interface just for the master IP address. I’ve used this approach on a number of clusters and find it nice and clean.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 3
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction
cwseys commented, Apr 5, 2021

I’m not convinced by cpu_type: host as a default though. Certainly it enables maximum CPU features, but it limits migration options on a heterogeneous cluster. I think it should be opt-in.

One idea is to scan for the intersection of all CPU instructions available on every node in the cluster. Newly started or rebooted VMs could then be given this subset and safely migrate throughout the whole cluster.

'gnt-node add' and 'gnt-node remove' could trigger this scan and update the subset. It should warn if the subset of available instructions is decreasing (because of the addition of an older node).

C.
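
The scan proposed above could be sketched roughly as follows. This is only an illustration, not Ganeti code: the function names are hypothetical, and it assumes each node's CPU features are taken from the `flags` line of `/proc/cpuinfo` (x86 Linux), which is only an approximation of "available instructions". How the resulting subset would be passed to the hypervisor is not specified in the thread and is left out.

```python
from typing import Dict, Set


def parse_cpuinfo_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text.

    Assumption: x86 Linux layout, where each CPU has a 'flags : ...' line.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def common_cpu_flags(flags_by_node: Dict[str, Set[str]]) -> Set[str]:
    """Intersect the flag sets of all nodes; no nodes gives an empty set."""
    flag_sets = list(flags_by_node.values())
    if not flag_sets:
        return set()
    return set.intersection(*flag_sets)


def flags_lost_by_adding(
    flags_by_node: Dict[str, Set[str]], new_node_flags: Set[str]
) -> Set[str]:
    """Flags that would drop out of the common subset if a node were added.

    A non-empty result is the case where the comment suggests warning.
    """
    return common_cpu_flags(flags_by_node) - new_node_flags


if __name__ == "__main__":
    # Hypothetical two-node cluster; the flag sets are shortened for brevity.
    nodes = {
        "node1": {"sse2", "avx", "avx2"},
        "node2": {"sse2", "avx"},
    }
    print(sorted(common_cpu_flags(nodes)))
    lost = flags_lost_by_adding(nodes, {"sse2"})
    if lost:
        print("warning: adding this node would remove:", sorted(lost))
```

Keeping the intersection as a pure function over per-node flag sets means the same check could run on both node addition and node removal, with only the input dictionary changing.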

0 reactions
candlerb commented, Apr 3, 2021

Various features have been added over time, and Google took a very conservative approach of not enabling them by default, so that a new cluster always behaved identically to an old one. By comparison, libvirt (for example) has been happy to enable these features by default.

Key ones:

  • vhost networking (introduced in 2.1.3, over 10 years ago!):

    gnt-cluster modify -H kvm:vhost_net=true
    
  • User-initiated shutdown (2.10/2.11 I think, and for xen in 2.12):

    gnt-cluster modify --user-shutdown=true
    gnt-cluster modify -H kvm:user_shutdown=true
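
For anyone scripting these overrides after `gnt-cluster init`, the hypervisor parameters could be kept in one dictionary and rendered into a single `-H` argument. The sketch below is a hypothetical helper, not part of Ganeti. It assumes that several parameters for one hypervisor may be joined with commas in one `-H` option, and it only builds the command without running it.

```python
import shlex
from typing import Dict, List, Union

ParamValue = Union[bool, int, str]


def hv_modify_command(hypervisor: str, params: Dict[str, ParamValue]) -> List[str]:
    """Build an argv list for 'gnt-cluster modify -H <hypervisor>:k=v,...'.

    Booleans are rendered as 'true'/'false', matching the commands above.
    """
    def render(value: ParamValue) -> str:
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    joined = ",".join(f"{key}={render(value)}" for key, value in params.items())
    return ["gnt-cluster", "modify", "-H", f"{hypervisor}:{joined}"]


if __name__ == "__main__":
    # The two KVM settings mentioned in the comment above.
    argv = hv_modify_command("kvm", {"vhost_net": True, "user_shutdown": True})
    # Print instead of executing, so the command can be reviewed first.
    print(shlex.join(argv))
```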
    

I’m not convinced by cpu_type: host as a default though. Certainly it enables maximum CPU features, but it limits migration options on a heterogeneous cluster. I think it should be opt-in.

Personally I set vnc_bind_address=0.0.0.0 to allow consoles without SSH tunnelling, given that the management nodes are on private addresses behind a firewall. However, that might not be appropriate for everyone.
