
Performance problem in segment build

See original GitHub issue

We found out that segment build takes a long time during segment completion: for one table, building an immutable segment now takes more than half an hour, while it used to take around one minute. After some investigation, it turned out that the root cause is this PR: https://github.com/apache/pinot/pull/7595. More specifically, the issue is the refactoring of BaseChunkSVForwardIndexWriter. Previously, a single in-memory byte buffer was reused to compress each chunk before writing its content to the index file:

      sizeToWrite = _chunkCompressor.compress(_chunkBuffer, _compressedBuffer);
      _dataFile.write(_compressedBuffer, _dataOffset);
      _compressedBuffer.clear();
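The snippet above can be expanded into a runnable sketch of the old pattern, assuming synthetic chunk payloads and a plain copy standing in for Pinot's ChunkCompressor (the class and method names here are ours, not Pinot's):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Pre-refactoring pattern: one direct ByteBuffer is allocated up front,
// every chunk is "compressed" into it, the result is written to the index
// file at the running offset, and the buffer is cleared for reuse.
public class ReusableBufferWriter {
  static final int CHUNK_SIZE = 1000 * Long.BYTES;   // ~8 KB, as in the issue

  static long writeChunks(int numChunks) {
    try {
      Path indexFile = Files.createTempFile("fwd-index", ".buf");
      ByteBuffer compressedBuffer = ByteBuffer.allocateDirect(CHUNK_SIZE);
      try (FileChannel dataFile =
          FileChannel.open(indexFile, StandardOpenOption.WRITE)) {
        long dataOffset = 0;
        for (int i = 0; i < numChunks; i++) {
          byte[] chunkData = new byte[CHUNK_SIZE];   // synthetic chunk payload
          java.util.Arrays.fill(chunkData, (byte) i);
          compressedBuffer.put(chunkData);           // stand-in for compress()
          compressedBuffer.flip();
          dataOffset += dataFile.write(compressedBuffer, dataOffset);
          compressedBuffer.clear();                  // same buffer, next chunk
        }
      }
      return Files.size(indexFile);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(writeChunks(4));              // prints 32000 (4 * 8000)
  }
}
```

The key point is that no per-chunk allocation or system call beyond the single positional write is needed.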

After writing the chunk, the byte buffer gets cleared and the same object is reused in the next writeChunk call. After the refactoring, the reusable byte buffer is gone: in every writeChunk call, a small part of the index file gets memory mapped into a new MappedByteBuffer, and the chunk data is compressed into that mapped byte buffer, which in turn gets written automatically into the index file.

    int maxCompressedSize = _chunkCompressor.maxCompressedSize(_chunkBuffer.limit());
    try (PinotDataBuffer compressedBuffer = PinotDataBuffer.mapFile(_file, false, _dataOffset,
        maxCompressedSize, ByteOrder.BIG_ENDIAN, "forward index chunk")) {
      ByteBuffer view = compressedBuffer.toDirectByteBuffer(0, maxCompressedSize);
      sizeWritten = _chunkCompressor.compress(_chunkBuffer, view);
    }

This may look better since it doesn't need an extra byte buffer for compression, but because each chunk is very small - 1000 values * data type size, i.e. 8 KB for longs - memory mapping degrades performance [1]. We experimented with the segments of the problematic table and found that even on an SSD, segment build takes more than 30% longer. On an HDD it's much worse: more than 30x slower (about one minute with the interim byte buffer vs. more than 30 minutes with memory mapping).

[1] From the Oracle documentation: "For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory." https://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#map(java.nio.channels.FileChannel.MapMode, long, long)
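The contrast above can be reproduced on synthetic data. The sketch below (our own names and payloads, no real compressor) implements both strategies: (a) compress into one reused direct buffer and write it at an offset, vs. (b) map a fresh small MappedByteBuffer per chunk and write straight into the mapping. Both produce byte-identical files; only the per-chunk map/unmap overhead differs, which is what the 30%-to-30x slowdown comes from:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkWriteStrategies {
  static final int CHUNK_SIZE = 1000 * Long.BYTES;   // ~8 KB per chunk

  static byte[] chunk(int i) {
    byte[] b = new byte[CHUNK_SIZE];
    java.util.Arrays.fill(b, (byte) i);
    return b;
  }

  // (a) the old pattern: one reused buffer, explicit positional writes
  static byte[] reusedBuffer(int numChunks) {
    try {
      Path f = Files.createTempFile("reused", ".buf");
      ByteBuffer buf = ByteBuffer.allocateDirect(CHUNK_SIZE);
      try (FileChannel ch = FileChannel.open(f, StandardOpenOption.WRITE)) {
        long offset = 0;
        for (int i = 0; i < numChunks; i++) {
          buf.put(chunk(i)).flip();
          offset += ch.write(buf, offset);
          buf.clear();
        }
      }
      return Files.readAllBytes(f);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // (b) the refactored pattern: a new small memory mapping per chunk
  static byte[] mappedPerChunk(int numChunks) {
    try {
      Path f = Files.createTempFile("mapped", ".buf");
      try (FileChannel ch = FileChannel.open(f,
          StandardOpenOption.READ, StandardOpenOption.WRITE)) {
        for (int i = 0; i < numChunks; i++) {
          MappedByteBuffer view =
              ch.map(FileChannel.MapMode.READ_WRITE, (long) i * CHUNK_SIZE, CHUNK_SIZE);
          view.put(chunk(i));                        // write via the mapping
        }
      }
      return Files.readAllBytes(f);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // both strategies yield the same bytes on disk
    System.out.println(java.util.Arrays.equals(reusedBuffer(8), mappedPerChunk(8)));
  }
}
```

Timing the two loops over a few thousand chunks (not shown) is enough to see the gap the issue describes, since each ch.map call costs an mmap system call plus page-table work that dwarfs an 8 KB write.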

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 14 (14 by maintainers)

Top GitHub Comments

1 reaction
richardstartin commented, Dec 20, 2021

@sajjad-moradi yes, but it requires field level config. I will open an issue to discuss making it default.

0 reactions
sajjad-moradi commented, Jan 5, 2022

The PR is merged. Closing the issue.
