Performance problem in segment build
We found that during segment completion, segment build takes a long time. For one table, building an immutable segment now takes more than half an hour, where it used to take around one minute. After some investigation, it turned out that the root cause is this PR: https://github.com/apache/pinot/pull/7595
More specifically, the issue is the refactoring of BaseChunkSVForwardIndexWriter.
Previously, a single in-memory byte buffer was reused to compress each chunk, and its contents were then written to the index file:
sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
_dataFile.write(_compressedBuffer, _dataOffset);
_compressedBuffer.clear();
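The reusable-buffer pattern above can be sketched in a self-contained form roughly like this (java.util.zip.Deflater stands in for Pinot's ChunkCompressor, and the class/method names here are illustrative, not Pinot's actual API):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.Deflater;

public class ReusableBufferWriter {

  // Compress numChunks chunks into one reusable direct buffer and write each
  // compressed chunk to the file at its running offset. Returns bytes written.
  static long writeChunks(Path indexFile, int numChunks) throws IOException {
    // One compressed buffer, allocated up front and reused for every chunk.
    ByteBuffer compressedBuffer = ByteBuffer.allocateDirect(64 * 1024);
    Deflater deflater = new Deflater(); // stand-in for Pinot's ChunkCompressor
    long dataOffset = 0;
    try (FileChannel dataFile = FileChannel.open(indexFile, StandardOpenOption.WRITE)) {
      for (int chunk = 0; chunk < numChunks; chunk++) {
        byte[] uncompressed = new byte[8 * 1024]; // ~1000 longs per chunk
        deflater.reset();
        deflater.setInput(uncompressed);
        deflater.finish();
        byte[] tmp = new byte[4096];
        while (!deflater.finished()) {
          int n = deflater.deflate(tmp);
          compressedBuffer.put(tmp, 0, n); // compress into the reused buffer
        }
        compressedBuffer.flip();
        while (compressedBuffer.hasRemaining()) {
          // Positional write: no mapping, just a plain pwrite-style call.
          dataOffset += dataFile.write(compressedBuffer, dataOffset);
        }
        compressedBuffer.clear(); // reuse the same buffer for the next chunk
      }
    }
    deflater.end();
    return dataOffset;
  }

  public static void main(String[] args) throws IOException {
    Path indexFile = Files.createTempFile("fwd-index", ".buf");
    System.out.println("wrote " + writeChunks(indexFile, 3) + " bytes");
    Files.delete(indexFile);
  }
}
```

The key property is that the only per-chunk costs are the compression itself and one positional write; no new OS-level resources are created per chunk.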
After writing a chunk, the byte buffer is cleared and the same object is reused in the next writeChunk call. After the refactoring, the reusable byte buffer is gone: on every writeChunk call, a small region of the index file is memory-mapped into a new MappedByteBuffer, and the chunk data is compressed directly into that mapped buffer, which in turn is automatically written into the index file.
int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset,
maxCompressedSize, ByteOrder.BIG_ENDIAN, "forward index chunk")) {
ByteBuffer view = compressedBuffer.toDirectByteBuffer(0, maxCompressedSize);
sizeWritten = _chunkCompressor.compress(_chunkBuffer, view);
}
This may look better since it avoids an extra byte buffer for compression, but because each chunk is very small - 1000 values * data type size (8 bytes for long) - memory mapping degrades performance [1]. We experimented with segments of the problematic table and found that even on SSD, segment build takes more than 30% longer. On HDD it's much worse, more than 30x slower (about one minute using the interim byte buffer vs. more than 30 minutes with memory mapping).
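The gap between the two approaches can be demonstrated with a simple micro-benchmark (this is an illustrative sketch, not Pinot code; chunk size and count are made-up values chosen to mimic many small chunks):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapVsBufferWrite {
  static final int CHUNK = 8 * 1024; // ~1000 longs, as in the issue
  static final int CHUNKS = 2000;

  // New approach: map a small region of the file per chunk and write through it.
  static long writeViaMmap(Path file) throws IOException {
    long start = System.nanoTime();
    try (FileChannel ch = FileChannel.open(file,
        StandardOpenOption.READ, StandardOpenOption.WRITE)) {
      long offset = 0;
      byte[] chunk = new byte[CHUNK];
      for (int i = 0; i < CHUNKS; i++) {
        // A fresh MappedByteBuffer per chunk: each call is an mmap syscall,
        // and the mapping is torn down only later by GC/unmapping.
        MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_WRITE, offset, CHUNK);
        mapped.put(chunk);
        offset += CHUNK;
      }
    }
    return System.nanoTime() - start;
  }

  // Old approach: one reusable direct buffer plus plain positional writes.
  static long writeViaBuffer(Path file) throws IOException {
    long start = System.nanoTime();
    ByteBuffer buf = ByteBuffer.allocateDirect(CHUNK);
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
      long offset = 0;
      byte[] chunk = new byte[CHUNK];
      for (int i = 0; i < CHUNKS; i++) {
        buf.clear();
        buf.put(chunk);
        buf.flip();
        while (buf.hasRemaining()) {
          offset += ch.write(buf, offset);
        }
      }
    }
    return System.nanoTime() - start;
  }

  public static void main(String[] args) throws IOException {
    Path a = Files.createTempFile("mmap", ".bin");
    Path b = Files.createTempFile("buf", ".bin");
    System.out.printf("mmap per chunk: %d ms%n", writeViaMmap(a) / 1_000_000);
    System.out.printf("reused buffer:  %d ms%n", writeViaBuffer(b) / 1_000_000);
    Files.delete(a);
    Files.delete(b);
  }
}
```

The absolute numbers depend heavily on the OS and storage, but the per-chunk mmap/unmap overhead is paid on every one of the many small chunks, which matches the Oracle guidance quoted below.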
[1] From the Oracle documentation: "For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory." https://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#map(java.nio.channels.FileChannel.MapMode, long, long)
Issue Analytics
- Created: 2 years ago
- Reactions: 1
- Comments: 14 (14 by maintainers)
Top GitHub Comments
@sajjad-moradi yes, but it requires field-level config. I will open an issue to discuss making it the default.
The PR is merged. Closing the issue.