WorkerOutOfSpaceException when setting max_readahead=0 with Alluxio JNI Fuse

Alluxio Version: https://github.com/cheyang/alluxio/tree/branch-2.3-fuse-pod-for-non-root

Describe the bug: I started Alluxio with Alluxio JNI Fuse on a 4-node k8s cluster, with the following fuse opts:

2020-09-29 05:46:31,172 INFO  AlluxioFuse - Mounting AlluxioJniFuseFileSystem: mount point="/alluxio-mnt/default/imagenet/alluxio-fuse", OPTIONS="[-obig_writes, -okernel_cache, -oro, -omax_read=131072, -omax_readahead=0, -oattr_timeout=7200, -oentry_timeout=7200, -ononempty, -oallow_other, -omax_write=131072]"
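
For readers comparing mount settings, here is a minimal sketch that pulls the options out of the mount log line above into a dictionary. The helper name `parse_fuse_options` is hypothetical and not part of Alluxio; the only input is the log line as quoted in this issue.

```python
import re

def parse_fuse_options(log_line: str) -> dict:
    """Extract the -o options from an AlluxioFuse 'Mounting ...' log line.

    Hypothetical helper for illustration; not an Alluxio API.
    """
    match = re.search(r'OPTIONS="\[(.*?)\]"', log_line)
    if match is None:
        return {}
    options = {}
    for token in match.group(1).split(","):
        token = token.strip()
        if token.startswith("-o"):
            token = token[2:]
        key, _, value = token.partition("=")
        # Flags without a value (e.g. big_writes) are stored as True.
        options[key] = value if value else True
    return options

# The mount line exactly as it appears in the alluxio-fuse log.
LINE = (
    '2020-09-29 05:46:31,172 INFO  AlluxioFuse - Mounting AlluxioJniFuseFileSystem: '
    'mount point="/alluxio-mnt/default/imagenet/alluxio-fuse", '
    'OPTIONS="[-obig_writes, -okernel_cache, -oro, -omax_read=131072, '
    '-omax_readahead=0, -oattr_timeout=7200, -oentry_timeout=7200, '
    '-ononempty, -oallow_other, -omax_write=131072]"'
)

opts = parse_fuse_options(LINE)
print("max_readahead =", opts["max_readahead"])
print("max_read      =", int(opts["max_read"]) // 1024, "KiB")
```

With this mount line, readahead is disabled (`max_readahead=0`) and `max_read` works out to 128 KiB.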

When I ran my ResNet50 job, which reads the ImageNet dataset through Alluxio JNI Fuse, I got the following exceptions:

In the alluxio-fuse pod:

....
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
2020-09-29 05:55:32,450 INFO  AlluxioJniFuseFileSystem - read: cached 0 bytes, missed 0 bytes, ratio 0.0
2020-09-29 05:55:34,264 WARN  AlluxioFileInStream - Failed to close input stream for block 3875536896: Timeout waiting for response after 30000ms. (Zero Copy GrpcDataReader)
2020-09-29 05:55:35,241 WARN  AlluxioJniFuseFileSystem - read(path=/imagenet/imagenet/train/train-00835-of-01024,buf=java.nio.DirectByteBuffer[pos=4096 lim=4096 cap=4096],size=4096,offset=0) returned 4096 in 112442 ms (>=1000 ms)
2020-09-29 05:55:36,297 WARN  AlluxioJniFuseFileSystem - read(path=/imagenet/imagenet/train/train-00368-of-01024,buf=java.nio.DirectByteBuffer[pos=4096 lim=4096 cap=4096],size=4096,offset=16777216) returned 4096 in 3940 ms (>=1000 ms)
2020-09-29 05:55:40,438 ERROR ResourceLeakDetector - LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 
Created at:
	io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:349)
	io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
	io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
	io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:115)
	io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:529)
	io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
	io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
	io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
	io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:387)
	io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
	io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	java.lang.Thread.run(Thread.java:748)
2020-09-29 05:55:40,439 ERROR ResourceLeakDetector - LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 
Created at:
	io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:349)
	io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
	io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
	io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:115)
	io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:529)
	io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
	io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
	io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
	io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
	io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
	io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
	io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	java.lang.Thread.run(Thread.java:748)
2020-09-29 05:55:41,777 WARN  AlluxioFileInStream - Failed to close input stream for block 6979321856: Timeout waiting for response after 30000ms. (Zero Copy GrpcDataReader)
2020-09-29 05:55:41,779 ERROR AlluxioJniFuseFileSystem - Failed to read /imagenet/imagenet/train/train-00981-of-01024,4096,0: 
alluxio.exception.status.DeadlineExceededException: Timeout waiting for response after 30000ms. (Zero Copy GrpcDataReader)
	at alluxio.client.block.stream.GrpcBlockingStream.receive(GrpcBlockingStream.java:159)
	at alluxio.client.block.stream.GrpcDataMessageBlockingStream.receiveDataMessage(GrpcDataMessageBlockingStream.java:84)
	at alluxio.client.block.stream.GrpcDataReader.readChunk(GrpcDataReader.java:129)
	at alluxio.client.block.stream.BlockInStream.readChunk(BlockInStream.java:392)
	at alluxio.client.block.stream.BlockInStream.readInternal(BlockInStream.java:268)
	at alluxio.client.block.stream.BlockInStream.read(BlockInStream.java:264)
	at alluxio.client.file.AlluxioFileInStream.read(AlluxioFileInStream.java:187)
	at alluxio.fuse.AlluxioJniFuseFileSystem.readInternal(AlluxioJniFuseFileSystem.java:350)
	at alluxio.fuse.AlluxioJniFuseFileSystem.lambda$read$4(AlluxioJniFuseFileSystem.java:322)
	at alluxio.fuse.AlluxioFuseUtils.call(AlluxioFuseUtils.java:276)
	at alluxio.fuse.AlluxioJniFuseFileSystem.read(AlluxioJniFuseFileSystem.java:322)
	at alluxio.jnifuse.AbstractFuseFileSystem.readCallback(AbstractFuseFileSystem.java:150)
2020-09-29 05:55:41,784 WARN  AlluxioJniFuseFileSystem - read(path=/imagenet/imagenet/train/train-00981-of-01024,buf=java.nio.DirectByteBuffer[pos=0 lim=4096 cap=4096],size=4096,offset=0) returned -5 in 120068 ms (>=1000 ms)
2020-09-29 05:55:41,786 INFO  AlluxioJniFuseFileSystem - release(fd=43,entries=240)
2020-09-29 05:55:41,799 WARN  AlluxioFileInStream - Failed to close input stream for block 9193914368: Timeout waiting for response after 30000ms. (Zero Copy GrpcDataReader)
2020-09-29 05:55:41,800 ERROR AlluxioJniFuseFileSystem - Failed to read /imagenet/imagenet/train/train-00572-of-01024,4096,0: 
alluxio.exception.status.DeadlineExceededException: Timeout waiting for response after 30000ms. (Zero Copy GrpcDataReader)
	at alluxio.client.block.stream.GrpcBlockingStream.receive(GrpcBlockingStream.java:159)
	at alluxio.client.block.stream.GrpcDataMessageBlockingStream.receiveDataMessage(GrpcDataMessageBlockingStream.java:84)
	at alluxio.client.block.stream.GrpcDataReader.readChunk(GrpcDataReader.java:129)
	at alluxio.client.block.stream.BlockInStream.readChunk(BlockInStream.java:392)
	at alluxio.client.block.stream.BlockInStream.readInternal(BlockInStream.java:268)
	at alluxio.client.block.stream.BlockInStream.read(BlockInStream.java:264)
	at alluxio.client.file.AlluxioFileInStream.read(AlluxioFileInStream.java:187)
	at alluxio.fuse.AlluxioJniFuseFileSystem.readInternal(AlluxioJniFuseFileSystem.java:350)
	at alluxio.fuse.AlluxioJniFuseFileSystem.lambda$read$4(AlluxioJniFuseFileSystem.java:322)
	at alluxio.fuse.AlluxioFuseUtils.call(AlluxioFuseUtils.java:276)
	at alluxio.fuse.AlluxioJniFuseFileSystem.read(AlluxioJniFuseFileSystem.java:322)
	at alluxio.jnifuse.AbstractFuseFileSystem.readCallback(AbstractFuseFileSystem.java:150)
2020-09-29 05:55:41,800 WARN  AlluxioFileInStream - Failed to close input stream for block 13404995584: Timeout waiting for response after 30000ms. (Zero Copy GrpcDataReader)
2020-09-29 05:55:41,800 WARN  AlluxioJniFuseFileSystem - read(path=/imagenet/imagenet/train/train-00572-of-01024,buf=java.nio.DirectByteBuffer[pos=0 lim=4096 cap=4096],size=4096,offset=0) returned -5 in 120067 ms (>=1000 ms)
2020-09-29 05:55:41,800 ERROR AlluxioJniFuseFileSystem - Failed to read /imagenet/imagenet/train/train-00840-of-01024,4096,0: 
alluxio.exception.status.DeadlineExceededException: Timeout waiting for response after 30000ms. (Zero Copy GrpcDataReader)
	at alluxio.client.block.stream.GrpcBlockingStream.receive(GrpcBlockingStream.java:159)
	at alluxio.client.block.stream.GrpcDataMessageBlockingStream.receiveDataMessage(GrpcDataMessageBlockingStream.java:84)
	at alluxio.client.block.stream.GrpcDataReader.readChunk(GrpcDataReader.java:129)
	at alluxio.client.block.stream.BlockInStream.readChunk(BlockInStream.java:392)
	at alluxio.client.block.stream.BlockInStream.readInternal(BlockInStream.java:268)
	at alluxio.client.block.stream.BlockInStream.read(BlockInStream.java:264)
	at alluxio.client.file.AlluxioFileInStream.read(AlluxioFileInStream.java:187)
	at alluxio.fuse.AlluxioJniFuseFileSystem.readInternal(AlluxioJniFuseFileSystem.java:350)
	at alluxio.fuse.AlluxioJniFuseFileSystem.lambda$read$4(AlluxioJniFuseFileSystem.java:322)
	at alluxio.fuse.AlluxioFuseUtils.call(AlluxioFuseUtils.java:276)
	at alluxio.fuse.AlluxioJniFuseFileSystem.read(AlluxioJniFuseFileSystem.java:322)
	at alluxio.jnifuse.AbstractFuseFileSystem.readCallback(AbstractFuseFileSystem.java:150)
2020-09-29 05:55:41,800 WARN  AlluxioJniFuseFileSystem - read(path=/imagenet/imagenet/train/train-00840-of-01024,buf=java.nio.DirectByteBuffer[pos=0 lim=4096 cap=4096],size=4096,offset=0) returned -5 in 120073 ms (>=1000 ms)
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
2020-09-29 05:55:41,838 INFO  AlluxioJniFuseFileSystem - read: cached 0 bytes, missed 0 bytes, ratio 0.0
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
fuse: max_idle_threads: 10
2020-09-29 05:55:41,909 INFO  AlluxioJniFuseFileSystem - read: cached 0 bytes, missed 0 bytes, ratio 0.0
....
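
To get an overview of the slow and failed reads in a log like the one above, a small script can extract the return value and latency from each `read(...)` warning. This is only a sketch: the regular expression is written against the three sample lines copied from this log, and the names `READ_RE` and `parse_reads` are made up for illustration. A negative return value is a negated errno; `-5` corresponds to `EIO` on Linux.

```python
import re

# Pattern written against the WARN lines in this issue; other Alluxio
# versions may format the message differently.
READ_RE = re.compile(
    r"read\(path=(?P<path>[^,]+),.*?size=(?P<size>\d+),offset=(?P<offset>\d+)\)"
    r" returned (?P<ret>-?\d+) in (?P<ms>\d+) ms"
)

SAMPLE = [
    "2020-09-29 05:55:35,241 WARN  AlluxioJniFuseFileSystem - read(path=/imagenet/imagenet/train/train-00835-of-01024,buf=java.nio.DirectByteBuffer[pos=4096 lim=4096 cap=4096],size=4096,offset=0) returned 4096 in 112442 ms (>=1000 ms)",
    "2020-09-29 05:55:36,297 WARN  AlluxioJniFuseFileSystem - read(path=/imagenet/imagenet/train/train-00368-of-01024,buf=java.nio.DirectByteBuffer[pos=4096 lim=4096 cap=4096],size=4096,offset=16777216) returned 4096 in 3940 ms (>=1000 ms)",
    "2020-09-29 05:55:41,784 WARN  AlluxioJniFuseFileSystem - read(path=/imagenet/imagenet/train/train-00981-of-01024,buf=java.nio.DirectByteBuffer[pos=0 lim=4096 cap=4096],size=4096,offset=0) returned -5 in 120068 ms (>=1000 ms)",
]

def parse_reads(lines):
    """Return one dict per slow-read warning found in the given log lines."""
    reads = []
    for line in lines:
        m = READ_RE.search(line)
        if m:
            reads.append({
                "path": m["path"],
                "size": int(m["size"]),
                "offset": int(m["offset"]),
                "ret": int(m["ret"]),
                "ms": int(m["ms"]),
            })
    return reads

reads = parse_reads(SAMPLE)
failed = [r for r in reads if r["ret"] < 0]
slowest = max(reads, key=lambda r: r["ms"])
print(f"{len(reads)} slow reads, {len(failed)} failed, slowest took {slowest['ms']} ms")
```

In the excerpt, the failed 4 KB reads take about 120 seconds each, which spans several of the 30000 ms gRPC timeouts reported by `GrpcDataReader`.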

In the alluxio-worker pod, I got many WorkerOutOfSpaceException errors:

2020-09-29 05:53:57,114 ERROR TieredBlockStore - Failed to free space. Min contiguous requested: 335544320, Min available requested: 335544320, Blocks iterated: 0, Blocks removed: 0, Space freed: 0
2020-09-29 05:53:57,123 ERROR TieredBlockStore - Allocation failure. Options: AllocateOptions{Location=BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>}, Size=335544320, ForceLocation=false, EvictionAllowed=true, UseReservedSpace=false}. Error: alluxio.exception.WorkerOutOfSpaceException: Failed to free 335544320 bytes space at location 
2020-09-29 05:53:57,124 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 16055795712, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00035-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 16055795712
2020-09-29 05:53:57,124 ERROR TieredBlockStore - Failed to free space. Min contiguous requested: 335544320, Min available requested: 335544320, Blocks iterated: 0, Blocks removed: 0, Space freed: 0
2020-09-29 05:53:57,127 ERROR TieredBlockStore - Allocation failure. Options: AllocateOptions{Location=BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>}, Size=335544320, ForceLocation=false, EvictionAllowed=true, UseReservedSpace=false}. Error: alluxio.exception.WorkerOutOfSpaceException: Failed to free 335544320 bytes space at location 
2020-09-29 05:53:57,130 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 14831058944, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00843-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 14831058944
2020-09-29 05:53:57,131 ERROR TieredBlockStore - Failed to free space. Min contiguous requested: 335544320, Min available requested: 335544320, Blocks iterated: 0, Blocks removed: 0, Space freed: 0
2020-09-29 05:53:57,131 ERROR TieredBlockStore - Allocation failure. Options: AllocateOptions{Location=BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>}, Size=335544320, ForceLocation=false, EvictionAllowed=true, UseReservedSpace=false}. Error: alluxio.exception.WorkerOutOfSpaceException: Failed to free 335544320 bytes space at location 
2020-09-29 05:53:57,131 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 8975810560, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00074-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 8975810560
2020-09-29 05:53:57,131 ERROR TieredBlockStore - Failed to free space. Min contiguous requested: 335544320, Min available requested: 335544320, Blocks iterated: 0, Blocks removed: 0, Space freed: 0
2020-09-29 05:53:57,131 ERROR TieredBlockStore - Allocation failure. Options: AllocateOptions{Location=BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>}, Size=335544320, ForceLocation=false, EvictionAllowed=true, UseReservedSpace=false}. Error: alluxio.exception.WorkerOutOfSpaceException: Failed to free 335544320 bytes space at location 
2020-09-29 05:53:57,132 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 4378853376, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00323-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 4378853376
2020-09-29 05:53:57,132 ERROR TieredBlockStore - Failed to free space. Min contiguous requested: 335544320, Min available requested: 335544320, Blocks iterated: 0, Blocks removed: 0, Space freed: 0
2020-09-29 05:53:57,132 ERROR TieredBlockStore - Allocation failure. Options: AllocateOptions{Location=BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>}, Size=335544320, ForceLocation=false, EvictionAllowed=true, UseReservedSpace=false}. Error: alluxio.exception.WorkerOutOfSpaceException: Failed to free 335544320 bytes space at location 
2020-09-29 05:53:57,132 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 8959033344, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00674-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 8959033344
2020-09-29 05:53:57,132 ERROR TieredBlockStore - Failed to free space. Min contiguous requested: 335544320, Min available requested: 335544320, Blocks iterated: 0, Blocks removed: 0, Space freed: 0
2020-09-29 05:53:57,132 ERROR TieredBlockStore - Allocation failure. Options: AllocateOptions{Location=BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>}, Size=335544320, ForceLocation=false, EvictionAllowed=true, UseReservedSpace=false}. Error: alluxio.exception.WorkerOutOfSpaceException: Failed to free 335544320 bytes space at location 
2020-09-29 05:53:57,132 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 10301210624, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00828-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 10301210624
2020-09-29 05:53:57,132 ERROR TieredBlockStore - Failed to free space. Min contiguous requested: 335544320, Min available requested: 335544320, Blocks iterated: 0, Blocks removed: 0, Space freed: 0
2020-09-29 05:53:57,132 ERROR TieredBlockStore - Allocation failure. Options: AllocateOptions{Location=BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>}, Size=335544320, ForceLocation=false, EvictionAllowed=true, UseReservedSpace=false}. Error: alluxio.exception.WorkerOutOfSpaceException: Failed to free 335544320 bytes space at location 
2020-09-29 05:53:57,133 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 6694109184, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00469-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 6694109184
2020-09-29 05:53:57,133 ERROR TieredBlockStore - Failed to free space. Min contiguous requested: 335544320, Min available requested: 335544320, Blocks iterated: 0, Blocks removed: 0, Space freed: 0
2020-09-29 05:53:57,133 ERROR TieredBlockStore - Allocation failure. Options: AllocateOptions{Location=BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>}, Size=335544320, ForceLocation=false, EvictionAllowed=true, UseReservedSpace=false}. Error: alluxio.exception.WorkerOutOfSpaceException: Failed to free 335544320 bytes space at location 
2020-09-29 05:53:57,133 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 6509559808, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00324-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 6509559808
2020-09-29 05:53:57,133 ERROR TieredBlockStore - Failed to free space. Min contiguous requested: 335544320, Min available requested: 335544320, Blocks iterated: 0, Blocks removed: 0, Space freed: 0
2020-09-29 05:53:57,133 ERROR TieredBlockStore - Allocation failure. Options: AllocateOptions{Location=BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>}, Size=335544320, ForceLocation=false, EvictionAllowed=true, UseReservedSpace=false}. Error: alluxio.exception.WorkerOutOfSpaceException: Failed to free 335544320 bytes space at location 
2020-09-29 05:53:57,133 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 4546625536, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00530-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 4546625536
2020-09-29 05:53:57,133 ERROR TieredBlockStore - Failed to free space. Min contiguous requested: 335544320, Min available requested: 335544320, Blocks iterated: 0, Blocks removed: 0, Space freed: 0
2020-09-29 05:53:57,133 ERROR TieredBlockStore - Allocation failure. Options: AllocateOptions{Location=BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>}, Size=335544320, ForceLocation=false, EvictionAllowed=true, UseReservedSpace=false}. Error: alluxio.exception.WorkerOutOfSpaceException: Failed to free 335544320 bytes space at location 
2020-09-29 05:53:57,133 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 1610612736, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00958-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 1610612736
2020-09-29 05:53:57,134 ERROR TieredBlockStore - Failed to free space. Min contiguous requested: 335544320, Min available requested: 335544320, Blocks iterated: 0, Blocks removed: 0, Space freed: 0
2020-09-29 05:53:57,134 ERROR TieredBlockStore - Allocation failure. Options: AllocateOptions{Location=BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>}, Size=335544320, ForceLocation=false, EvictionAllowed=true, UseReservedSpace=false}. Error: alluxio.exception.WorkerOutOfSpaceException: Failed to free 335544320 bytes space at location 
2020-09-29 05:53:57,134 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 2113929216, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00321-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 2113929216
2020-09-29 05:53:57,134 ERROR TieredBlockStore - Failed to free space. Min contiguous requested: 335544320, Min available requested: 335544320, Blocks iterated: 0, Blocks removed: 0, Space freed: 0
2020-09-29 05:53:57,134 ERROR TieredBlockStore - Allocation failure. Options: AllocateOptions{Location=BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>}, Size=335544320, ForceLocation=false, EvictionAllowed=true, UseReservedSpace=false}. Error: alluxio.exception.WorkerOutOfSpaceException: Failed to free 335544320 bytes space at location 
2020-09-29 05:53:57,134 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 3472883712, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00659-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 3472883712
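
Every allocation failure in the worker log above asks for the same amount of space. The sketch below counts the requests by tier and size and converts the size to MiB. The regular expression is written against the `UnderFileSystemBlockReader` warnings quoted in this issue, and `summarize_allocations` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the "Failed to allocate N bytes on BlockStoreLocation{TierAlias=..."
# part of the worker warnings shown in this issue.
ALLOC_RE = re.compile(
    r"Failed to allocate (\d+) bytes on BlockStoreLocation\{TierAlias=(\w+)"
)

SAMPLE = [
    "2020-09-29 05:53:57,124 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 16055795712, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00035-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 16055795712",
    "2020-09-29 05:53:57,130 WARN  UnderFileSystemBlockReader - Failed to update block writer for UFS block [blockId: 14831058944, ufsPath: /underFSStorage/imagenet/imagenet/train/train-00843-of-01024, offset: 0]: Failed to allocate 335544320 bytes on BlockStoreLocation{TierAlias=MEM, DirIndex=<Any>, MediumType=<Any>} to create blockId 14831058944",
]

def summarize_allocations(lines):
    """Count failed allocation requests, keyed by (tier alias, bytes requested)."""
    counts = Counter()
    for line in lines:
        m = ALLOC_RE.search(line)
        if m:
            counts[(m.group(2), int(m.group(1)))] += 1
    return counts

summary = summarize_allocations(SAMPLE)
for (tier, size), count in summary.items():
    print(f"{tier}: {count} failed requests of {size} bytes ({size / 2**20:.0f} MiB)")
```

Each failed request is for 335544320 bytes, which is exactly 320 MiB, and all of them target the MEM tier.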

However, when I checked the tiered store capacity, it was far from full:

$ alluxio fsadmin report capacity
Worker Name      Last Heartbeat   Storage       MEM
192.168.1.15     0                capacity      50.00GB
                                  used          10.07GB (20%)
192.168.1.17     0                capacity      50.00GB
                                  used          6.25GB (12%)
192.168.1.11     0                capacity      50.00GB
                                  used          9.98GB (19%)
192.168.1.16     0                capacity      50.00GB
                                  used          4896.00MB (9%)
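
As a sanity check on the numbers, the sketch below compares the free space on each worker in the report above with the 335544320 bytes requested in the worker log. It assumes the `GB` and `MB` suffixes are binary units (2^30 and 2^20 bytes); that is an assumption, although the conclusion holds for decimal units as well. The helper names are hypothetical.

```python
import re

# Assumption: the report uses binary units.
UNITS = {"MB": 2**20, "GB": 2**30}

# Size of each failed allocation, taken from the worker log.
REQUEST_BYTES = 335544320

REPORT = """\
Worker Name      Last Heartbeat   Storage       MEM
192.168.1.15     0                capacity      50.00GB
                                  used          10.07GB (20%)
192.168.1.17     0                capacity      50.00GB
                                  used          6.25GB (12%)
192.168.1.11     0                capacity      50.00GB
                                  used          9.98GB (19%)
192.168.1.16     0                capacity      50.00GB
                                  used          4896.00MB (9%)
"""

def to_bytes(text: str) -> float:
    """Convert a value such as '50.00GB' or '4896.00MB' to bytes."""
    m = re.fullmatch(r"([\d.]+)(MB|GB)", text)
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(m.group(1)) * UNITS[m.group(2)]

def free_space(report: str) -> dict:
    """Map each worker address to its free MEM bytes (capacity minus used)."""
    free = {}
    worker = capacity = None
    for line in report.splitlines():
        fields = line.split()
        if "capacity" in fields:
            worker, capacity = fields[0], to_bytes(fields[-1])
        elif "used" in fields and worker is not None:
            free[worker] = capacity - to_bytes(fields[1])
    return free

free = free_space(REPORT)
for worker, free_bytes in free.items():
    fits = free_bytes > REQUEST_BYTES
    print(f"{worker}: {free_bytes / 2**30:.2f} GB free, fits 320 MiB request: {fits}")
```

Every worker reports far more free MEM space than a single request needs. The check only confirms the mismatch described in this issue; it does not explain why the allocation fails.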

All the logs are here: diagnose_fluid_1601359323.tar.gz

To Reproduce: Set up a UFS with a large dataset such as ImageNet, mount it onto Alluxio, and set up Alluxio JNI Fuse with the options mentioned above. This may reproduce the problem.

Expected behavior: Everything works fine, with no WorkerOutOfSpaceException.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

TrafalgarZZZ commented, Oct 6, 2020 (1 reaction)

@apc999 Yes, in our case it is.

TrafalgarZZZ commented, Nov 16, 2020 (0 reactions)

The commit 2c41226 fixed this problem. I’ll close this issue. @apc999 @LuQQiu Thanks for your help!
