
Better support for exposing the Alluxio service outside the K8s cluster


Is your feature request related to a problem? Please describe.

It is becoming more common for users to host the Alluxio service on K8s while some external applications need to access the Alluxio cluster from outside the K8s cluster.

In the current state, the users need to:

  1. Change the master K8s Services so they are reachable from outside the K8s cluster. This typically means switching the Service type to NodePort or exposing it through an Ingress.
  2. Somehow expose the worker pods externally. This is much harder than step 1 because worker pods are dynamic and have no associated Services. One way is to set hostNetwork=true on all workers so that clients talk to the worker nodes directly.
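
For step 1, here is a minimal sketch of a master Service switched to NodePort. The Service name, the selector labels (app: alluxio, role: alluxio-master), and the nodePort values are assumptions for illustration only. Ports 19998 and 19999 are Alluxio's default master RPC and web UI ports.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: alluxio-master-external   # hypothetical name
spec:
  type: NodePort
  selector:
    app: alluxio                  # assumed pod labels; match your chart's labels
    role: alluxio-master
  ports:
  - name: rpc                     # names are required when a Service has several ports
    protocol: TCP
    port: 19998                   # default Alluxio master RPC port
    targetPort: 19998
    nodePort: 31998               # hypothetical; must lie in the NodePort range (30000-32767 by default)
  - name: web
    protocol: TCP
    port: 19999                   # default Alluxio master web UI port
    targetPort: 19999
    nodePort: 31999               # hypothetical
```

An external client would then reach the master at any node's IP on port 31998. No Service-level equivalent exists for the workers, which is why step 2 is harder.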

Describe the solution you’d like

We need one solution that:

  1. Makes the master pods accessible from outside the cluster
  2. Makes the worker pods accessible from outside the cluster
  3. Ideally, controls both with a single switch

The biggest challenge is the worker pods. One possible solution is to combine StatefulSet-deployed workers with a per-worker Service that sets externalTrafficPolicy: Local. Each Service selects its worker pod by name, and that name is deterministic because a StatefulSet gives its pods stable ordinal names (worker-0, worker-1, ...).

apiVersion: v1
kind: Service
metadata:
  name: worker-0
spec:
  type: NodePort
  externalTrafficPolicy: Local
  selector:
    statefulset.kubernetes.io/pod-name: worker-0
  ports:
  - protocol: TCP
    port: 29999        # default Alluxio worker RPC port (19998 is the master RPC port)
    targetPort: 29999
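
The per-worker Service relies on the stable pod names that a StatefulSet assigns. Below is a minimal sketch of the worker StatefulSet. It assumes a StatefulSet named worker (so its pods are worker-0, worker-1, ...), a hypothetical headless governing Service named worker-headless, illustrative labels, and the alluxio/alluxio image. Volumes, environment, and the worker's actual start command are omitted.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: worker                       # pods become worker-0, worker-1, worker-2
spec:
  serviceName: worker-headless       # hypothetical headless Service governing the pods
  replicas: 3
  selector:
    matchLabels:
      app: alluxio                   # illustrative labels
      role: alluxio-worker
  template:
    metadata:
      labels:
        app: alluxio
        role: alluxio-worker
    spec:
      containers:
      - name: alluxio-worker
        image: alluxio/alluxio       # pin a specific tag in practice
        ports:
        - name: rpc
          containerPort: 29999       # default Alluxio worker RPC port
        - name: web
          containerPort: 30000       # default Alluxio worker web UI port
```

The StatefulSet controller labels each pod with statefulset.kubernetes.io/pod-name, which is the label the per-worker Service selects on. With this sketch, one such Service would be needed per replica.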

The worker pods also need pod anti-affinity defined so that no two worker pods are scheduled on the same node.
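
Here is a minimal sketch of that anti-affinity rule, placed under the worker pod template's spec. It assumes the worker pods carry the illustrative label role: alluxio-worker. It uses the standard kubernetes.io/hostname node label as the topology key, so the node is the unit of exclusion.

```yaml
# Goes under spec.template.spec of the worker StatefulSet
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          role: alluxio-worker               # assumed worker pod label
      topologyKey: kubernetes.io/hostname    # at most one matching pod per node
```

Because the rule is "required" rather than "preferred", a replica that finds no free eligible node stays Pending. The replica count should therefore not exceed the number of eligible nodes.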

The master pods can be exposed similarly.

Describe alternatives you’ve considered

Deploy all master and worker pods with hostNetwork and access the Alluxio pods by node IP. As of Alluxio v2.8, this is the cleanest approach. The drawbacks are that hostNetwork requires admin privileges and may cause port collisions with other services on the same node.
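
For comparison, the hostNetwork alternative comes down to a few pod-spec fields. The sketch below is minimal. The container name and image are placeholders. The dnsPolicy value is the one Kubernetes provides for pods on the host network that still need to resolve cluster Service names.

```yaml
# Fragment of a master or worker pod template spec
spec:
  hostNetwork: true                    # pod shares the node's network namespace and IP
  dnsPolicy: ClusterFirstWithHostNet   # keep resolving in-cluster Service names
  containers:
  - name: alluxio-worker               # placeholder
    image: alluxio/alluxio             # placeholder; pin a tag in practice
    ports:
    - containerPort: 29999             # binds directly on the node; collides if another process already uses it
```

The admin-privilege requirement typically comes from Pod Security admission: its baseline and restricted levels disallow host namespaces such as hostNetwork.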

Urgency

MEDIUM. There are existing use cases for this setup.


Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
nirav-chotai commented, Jul 26, 2022

The solution should work whether or not hostNetwork is enabled.

An init container for the workers should collect the pod metadata and run an init script that contacts the master, so that each worker registers itself.

          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace

and

            - register-worker
            - --ip
            - $(POD_IP)
            - --k8s-namespace
            - $(POD_NAMESPACE)

Something like the above.

This will work regardless of whether I enable hostNetwork or not.

      hostNetwork: {{ $hostNetwork }}
      hostPID: {{ $hostPID }}
      dnsPolicy: {{ .Values.worker.dnsPolicy | default ($hostNetwork | ternary "ClusterFirstWithHostNet" "ClusterFirst") }}
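
Putting the two fragments together gives the minimal init-container sketch below. register-worker is the command proposed in this comment, not an existing Alluxio CLI command. The container name, image, and alluxio entry point are hypothetical. The env entries use the Kubernetes Downward API. Kubernetes expands $(POD_IP) and $(POD_NAMESPACE) in args from the container's own env.

```yaml
# Fragment of the worker pod template spec
initContainers:
- name: register-worker              # hypothetical
  image: alluxio/alluxio             # hypothetical; any image shipping the registration tool
  env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP      # equals the node IP when hostNetwork is true
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  command: ["alluxio"]               # hypothetical entry point
  args:
  - register-worker                  # proposed subcommand; does not exist today
  - --ip
  - $(POD_IP)
  - --k8s-namespace
  - $(POD_NAMESPACE)
```

With hostNetwork: true, status.podIP reports the node's IP, so the same fragment yields a reachable address in either mode.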

0 reactions
jiacheliu3 commented, Jul 16, 2022

No, the masters don’t have an equivalent because we use a Service to handle name resolution. Clients talk to the Service, so they don’t need to know the pod names. But yeah, we currently don’t have a uniform definition of which hostnames map to which use cases (inside or outside the K8s cluster, etc.). The existing configs were added more on demand. If there’s a chance to unify all of them, I’m totally in 😃

