Performance problem in segment build
We found that during segment completion, segment build takes a long time. For one table, building an immutable segment now takes more than half an hour, where it used to take around one minute. After some investigation, it turned out that the root cause is this PR: https://github.com/apache/pinot/pull/7595
More specifically, the issue is the refactoring of BaseChunkSVForwardIndexWriter.
Previously, a single in-memory byte buffer was reused to compress each chunk, and its contents were then written to the index file:
sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
_dataFile.write(_compressedBuffer, _dataOffset);
_compressedBuffer.clear();
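The reusable-buffer pattern above can be sketched in a self-contained form roughly like this (java.util.zip.Deflater stands in for Pinot's ChunkCompressor, and the class/method names here are illustrative, not Pinot's actual API):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.Deflater;

public class ReusableBufferWriter {

  // Compress numChunks chunks into one reusable direct buffer and write each
  // compressed chunk to the file at its running offset. Returns bytes written.
  static long writeChunks(Path indexFile, int numChunks) throws IOException {
    // One compressed buffer, allocated up front and reused for every chunk.
    ByteBuffer compressedBuffer = ByteBuffer.allocateDirect(64 * 1024);
    Deflater deflater = new Deflater(); // stand-in for Pinot's ChunkCompressor
    long dataOffset = 0;
    try (FileChannel dataFile = FileChannel.open(indexFile, StandardOpenOption.WRITE)) {
      for (int chunk = 0; chunk < numChunks; chunk++) {
        byte[] uncompressed = new byte[8 * 1024]; // ~1000 longs per chunk
        deflater.reset();
        deflater.setInput(uncompressed);
        deflater.finish();
        byte[] tmp = new byte[4096];
        while (!deflater.finished()) {
          int n = deflater.deflate(tmp);
          compressedBuffer.put(tmp, 0, n); // compress into the reused buffer
        }
        compressedBuffer.flip();
        while (compressedBuffer.hasRemaining()) {
          // Positional write: no mapping, just a plain pwrite-style call.
          dataOffset += dataFile.write(compressedBuffer, dataOffset);
        }
        compressedBuffer.clear(); // reuse the same buffer for the next chunk
      }
    }
    deflater.end();
    return dataOffset;
  }

  public static void main(String[] args) throws IOException {
    Path indexFile = Files.createTempFile("fwd-index", ".buf");
    System.out.println("wrote " + writeChunks(indexFile, 3) + " bytes");
    Files.delete(indexFile);
  }
}
```

The key property is that the only per-chunk costs are the compression itself and one positional write; no new OS-level resources are created per chunk.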
After writing a chunk, the byte buffer is cleared and the same object is reused in the next writeChunk call. After the refactoring, the reusable byte buffer is gone: on every writeChunk call, a small region of the index file is memory-mapped into a new MappedByteBuffer, and the chunk data is compressed directly into that mapped buffer, which in turn is automatically written into the index file.
int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset,
maxCompressedSize, ByteOrder.BIG_ENDIAN, "forward index chunk")) {
ByteBuffer view = compressedBuffer.toDirectByteBuffer(0, maxCompressedSize);
sizeWritten = _chunkCompressor.compress(_chunkBuffer, view);
}
This may look better since it avoids an extra byte buffer for compression, but because each chunk is very small - 1000 values * data type size (8 bytes for long) - memory mapping degrades performance [1]. We experimented with segments of the problematic table and found that even on SSD, segment build takes more than 30% longer. On HDD it's much worse, more than 30x slower (about one minute using the interim byte buffer vs. more than 30 minutes with memory mapping).
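The gap between the two approaches can be demonstrated with a simple micro-benchmark (this is an illustrative sketch, not Pinot code; chunk size and count are made-up values chosen to mimic many small chunks):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapVsBufferWrite {
  static final int CHUNK = 8 * 1024; // ~1000 longs, as in the issue
  static final int CHUNKS = 2000;

  // New approach: map a small region of the file per chunk and write through it.
  static long writeViaMmap(Path file) throws IOException {
    long start = System.nanoTime();
    try (FileChannel ch = FileChannel.open(file,
        StandardOpenOption.READ, StandardOpenOption.WRITE)) {
      long offset = 0;
      byte[] chunk = new byte[CHUNK];
      for (int i = 0; i < CHUNKS; i++) {
        // A fresh MappedByteBuffer per chunk: each call is an mmap syscall,
        // and the mapping is torn down only later by GC/unmapping.
        MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_WRITE, offset, CHUNK);
        mapped.put(chunk);
        offset += CHUNK;
      }
    }
    return System.nanoTime() - start;
  }

  // Old approach: one reusable direct buffer plus plain positional writes.
  static long writeViaBuffer(Path file) throws IOException {
    long start = System.nanoTime();
    ByteBuffer buf = ByteBuffer.allocateDirect(CHUNK);
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
      long offset = 0;
      byte[] chunk = new byte[CHUNK];
      for (int i = 0; i < CHUNKS; i++) {
        buf.clear();
        buf.put(chunk);
        buf.flip();
        while (buf.hasRemaining()) {
          offset += ch.write(buf, offset);
        }
      }
    }
    return System.nanoTime() - start;
  }

  public static void main(String[] args) throws IOException {
    Path a = Files.createTempFile("mmap", ".bin");
    Path b = Files.createTempFile("buf", ".bin");
    System.out.printf("mmap per chunk: %d ms%n", writeViaMmap(a) / 1_000_000);
    System.out.printf("reused buffer:  %d ms%n", writeViaBuffer(b) / 1_000_000);
    Files.delete(a);
    Files.delete(b);
  }
}
```

The absolute numbers depend heavily on the OS and storage, but the per-chunk mmap/unmap overhead is paid on every one of the many small chunks, which matches the Oracle guidance quoted below.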
[1] From the Oracle documentation: "For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory." https://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#map(java.nio.channels.FileChannel.MapMode, long, long)
Issue Analytics
- Created: 2 years ago
- Reactions: 1
- Comments: 14 (14 by maintainers)
Top GitHub Comments
@sajjad-moradi yes, but it requires field-level config. I will open an issue to discuss making it the default.
The PR is merged. Closing the issue.