Adding batch operation APIs for PinotFS

Currently, PinotFS does not make it easy to add batch operation APIs.

For some use cases, such as SegmentDeletionManager, one has to iterate over all files, check each for deletion, and then perform the actual delete. This doesn’t seem to be an issue with localFS, but many cloud file systems offer more efficient batch APIs that significantly reduce remote request overhead.

Proposal:

  1. Add delete / copy / move (List<URI> segmentUris, ...) APIs alongside their single-URI variants.
  2. Give them default implementations that fall back to looping over each URI individually.
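
A minimal sketch of what the proposal could look like, for the delete case only. It assumes a simplified, hypothetical interface (`BatchCapableFS`, `deleteBatch`, `InMemoryFS`, and `BulkFS` are illustrative names, not part of Pinot); the real PinotFS class has more methods and may differ in its signatures. The default method loops over the single-URI call, and a file system with a native bulk API overrides it.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a PinotFS-like API (not the real class).
interface BatchCapableFS {
  // Existing single-URI variant.
  boolean delete(URI segmentUri, boolean forceDelete) throws IOException;

  // Proposed batch variant: by default, falls back to one call per URI.
  default boolean deleteBatch(List<URI> segmentUris, boolean forceDelete) throws IOException {
    boolean allDeleted = true;
    for (URI uri : segmentUris) {
      allDeleted &= delete(uri, forceDelete);
    }
    return allDeleted;
  }
}

// In-memory fake that counts simulated remote requests; it inherits the looping default.
class InMemoryFS implements BatchCapableFS {
  final List<URI> deleted = new ArrayList<>();
  int requests = 0;

  @Override
  public boolean delete(URI segmentUri, boolean forceDelete) {
    requests++; // one simulated round trip per segment
    deleted.add(segmentUri);
    return true;
  }
}

// Fake for a store with a native bulk API: the whole list costs one simulated request.
class BulkFS extends InMemoryFS {
  @Override
  public boolean deleteBatch(List<URI> segmentUris, boolean forceDelete) {
    requests++; // one simulated round trip for the entire batch
    deleted.addAll(segmentUris);
    return true;
  }
}
```

With this shape, existing implementations keep working unchanged, and callers such as a deletion manager can always call the batch method. Copy and move could follow the same pattern.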

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

walterddr commented, Jan 25, 2022

For this issue, the concern is more the multiple round trips per segment when we operate on the entire table.

Even if deleteSegment only deletes, without the move, on a large enough table it will still take tens of minutes to complete, because Pinot has to issue delete(URI segmentUri) sequentially, segment after segment, instead of leveraging the underlying PinotFS implementation, which might have a much more efficient way to batch-operate on a list of segment URIs.

Jackie-Jiang commented, Jan 25, 2022

@mcvsubbu Which config are you referring to? I found a config for the retention days before deleting the segment from the deleted segment dir, but didn’t find one to skip moving the segment to the deleted segment dir. In SegmentDeletionManager.removeSegmentFromStore(), the segment is always moved.
