Alluxio worker on K8s does not support shared storage
Is your feature request related to a problem? Please describe.
As of Alluxio 2.5, the Alluxio workers assume the tiered storage directories (disks) are exclusively owned by the Alluxio worker. For example, if the tier 0 dir is `/data/alluxio`, then the Alluxio worker writes to `/data/alluxio/alluxioworker`.
For that reason, Alluxio workers on K8s only support the PV types `local` and `hostPath`, because we want the worker tiered storage to be exclusive (not shared by other workers). Also, the worker pods are deployed with a `DaemonSet`, which means we cannot define separate PVs for each worker pod. We can only use one `local` or `hostPath` PV, which each worker translates to its own local path.
In K8s there is a use case where the Alluxio workers run on physical machines that do not have local storage. In that case each machine is backed by storage PVs from Ceph, which are read-write-many. Under this circumstance it is hard to define a tiered storage because:
- If we point the workers at this shared Ceph PV, each worker will think the shared disk is dedicated to it, so the behavior is undefined. Every worker assumes that whatever is under `alluxioworker/` belongs to it alone, but in fact all the workers are writing files under `alluxioworker/`.
- It is hard to define a dedicated PV for each worker, because the workers are deployed with a `DaemonSet`, and all pods created from the `DaemonSet` share the same PVCs.
Describe the solution you’d like
We can enable the worker tiered storage to be shared directories by figuring out a way to distinguish the `alluxioworker/` dir between workers. One naive idea is to change the data dir to `alluxioworker-UUID/`.
There may be confusion in handling lost workers and their recreation.
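A minimal sketch of that naive idea (the class and method names below are hypothetical, not Alluxio's actual code): each worker derives its own subdirectory under the shared tier path instead of the fixed `alluxioworker/` name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

public class SharedTierLayout {
    // Today (Alluxio 2.5): every worker writes to the same fixed subdirectory,
    // e.g. /data/alluxio/alluxioworker, which collides on a read-write-many PV.
    static Path currentWorkerDir(String tierPath) {
        return Paths.get(tierPath, "alluxioworker");
    }

    // Naive idea from this issue: suffix the directory with a per-worker UUID,
    // e.g. /data/alluxio/alluxioworker-3f8a..., so workers on a shared PV do not clash.
    // Open question: the UUID must survive pod restarts, or lost-worker cleanup must
    // garbage-collect orphaned alluxioworker-<UUID>/ dirs after recreation.
    static Path uniqueWorkerDir(String tierPath) {
        return Paths.get(tierPath, "alluxioworker-" + UUID.randomUUID());
    }
}
```

The generated UUID would likely need to be persisted somewhere (for example on the shared PV itself) so a restarted worker can find its old directory rather than always creating a new one.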
Describe alternatives you’ve considered
Urgency MEDIUM
This will enable many more deployments in K8s. Worker locality is lost, but that can be acceptable for use cases with low locality requirements.
Additional context
Top GitHub Comments
@jiacheliu3, you are right. Ceph clusters are managed by Rook within the same Kubernetes cluster where Alluxio is running. In terms of data locality, we only lose node-level locality; the data is still within the same data center, on other nodes in the infrastructure where disks are available. With around 100Gbps of network, not having data local to the node is not going to have much impact.
In the use case for @nirav-chotai, the PV is from Ceph and it can be resized. I think Ceph is managing a pool of PVs (Nirav, please correct me if I’m wrong).
Yes, a loss of locality is inevitable in this setup, but without local storage on the worker physical nodes, that’s the best one can do I guess.