Adding batch operation APIs for PinotFS
Currently, PinotFS does not make it easy to extend with batch operation APIs.
For some use cases, such as SegmentDeletionManager, one has to iterate over all files, check each one for deletion, and then perform the actual delete. This doesn't seem to be an issue with the local FS, but many cloud file systems offer more efficient batch APIs that significantly reduce remote request overhead.
Propose to:
- add delete/copy/move(List<URI> segmentUris, ...) APIs alongside their single-URI variants.
- make the default implementations fall back to looping over each URI individually.
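A minimal Java sketch of the proposal, assuming a simplified interface. BatchFS, InMemoryFS, and the parallel-list signature of the batch move are hypothetical and are not Pinot's actual PinotFS API. The sketch shows Java default methods supplying the looping fallback, so existing file systems need no changes while a cloud implementation can override the batch methods. A batch copy would follow the same pattern as move and is left out to keep the sketch short.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for PinotFS: the real interface has more
// methods, and its exact signatures may differ.
interface BatchFS {
    // Single-URI operations that every implementation already provides.
    boolean delete(URI uri, boolean forceDelete) throws IOException;
    boolean move(URI srcUri, URI dstUri, boolean overwrite) throws IOException;

    // Batch variants. The defaults loop over the single-URI calls, so existing
    // implementations keep working unchanged; a cloud FS can override them with
    // its native bulk API. Each loop continues past a failed call and returns
    // true only if every individual call succeeded.
    default boolean delete(List<URI> uris, boolean forceDelete) throws IOException {
        boolean allSucceeded = true;
        for (URI uri : uris) {
            allSucceeded &= delete(uri, forceDelete);
        }
        return allSucceeded;
    }

    // Assumes destinations come as a parallel list: srcUris.get(i) moves to dstUris.get(i).
    default boolean move(List<URI> srcUris, List<URI> dstUris, boolean overwrite) throws IOException {
        if (srcUris.size() != dstUris.size()) {
            throw new IllegalArgumentException("source and destination lists differ in size");
        }
        boolean allSucceeded = true;
        for (int i = 0; i < srcUris.size(); i++) {
            allSucceeded &= move(srcUris.get(i), dstUris.get(i), overwrite);
        }
        return allSucceeded;
    }
}

// In-memory FS that implements only the single-URI methods, so it inherits the
// looping batch defaults. singleCalls stands in for the number of remote requests.
class InMemoryFS implements BatchFS {
    final Set<URI> files = new HashSet<>();
    int singleCalls = 0;

    @Override
    public boolean delete(URI uri, boolean forceDelete) {
        singleCalls++;
        return files.remove(uri);
    }

    @Override
    public boolean move(URI srcUri, URI dstUri, boolean overwrite) {
        singleCalls++;
        if (!files.contains(srcUri) || (files.contains(dstUri) && !overwrite)) {
            return false;
        }
        files.remove(srcUri);
        files.add(dstUri);
        return true;
    }
}

public class BatchFSDemo {
    public static void main(String[] args) throws IOException {
        InMemoryFS fs = new InMemoryFS();
        List<URI> segments = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            segments.add(URI.create("mem:///table/segment_" + i));
        }
        fs.files.addAll(segments);
        // Resolves to the List<URI> overload, which falls back to three single deletes.
        System.out.println(fs.delete(segments, true) + " " + fs.singleCalls); // prints "true 3"
    }
}
```

One open design choice is how to report partial failure: this sketch keeps going and returns a single boolean, whereas a real API might prefer to return the URIs that failed so the caller can retry them.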
Issue Analytics
- Created 2 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
For this issue, the concern is more about the one round trip per segment when we operate on the entire table.
Even if deleteSegment only does the deletion, without the move, on a large enough table it will still take tens of minutes to complete, since Pinot has to issue delete(URI segmentUri) sequentially, segment after segment, instead of leveraging the underlying PinotFS implementation, which might have a much more efficient way to batch-operate on a list of segment URIs.
@mcvsubbu Which config are you referring to? I found a config for the number of retention days before a segment is deleted from the deleted-segments dir, but I didn't find one to skip moving the segment to the deleted-segments dir. In SegmentDeletionManager.removeSegmentFromStore(), the segment is always moved.
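The point above is that the underlying PinotFS implementation could turn many sequential delete(URI segmentUri) calls into a few bulk requests. Below is a minimal sketch of what such an override might do internally, assuming a store with a per-request key limit: partition the URIs and issue one bulk request per chunk. The names partition, deleteInChunks, and the bulkDelete callback are hypothetical; the callback stands in for a real SDK call so the sketch runs without cloud access. The limit of 1,000 is an assumption modeled on Amazon S3's DeleteObjects, which accepts at most 1,000 keys per request; other stores have different limits.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of a chunked batch delete. The bulk request is passed in as a
// Consumer so the example does not depend on any real cloud SDK.
public class ChunkedDelete {

    // Splits uris into consecutive sublists of at most maxPerRequest elements.
    static List<List<URI>> partition(List<URI> uris, int maxPerRequest) {
        if (maxPerRequest <= 0) {
            throw new IllegalArgumentException("maxPerRequest must be positive");
        }
        List<List<URI>> chunks = new ArrayList<>();
        for (int start = 0; start < uris.size(); start += maxPerRequest) {
            int end = Math.min(start + maxPerRequest, uris.size());
            chunks.add(new ArrayList<>(uris.subList(start, end)));
        }
        return chunks;
    }

    // Issues one (simulated) bulk request per chunk and returns the request count.
    static int deleteInChunks(List<URI> uris, int maxPerRequest, Consumer<List<URI>> bulkDelete) {
        List<List<URI>> chunks = partition(uris, maxPerRequest);
        for (List<URI> chunk : chunks) {
            bulkDelete.accept(chunk);
        }
        return chunks.size();
    }

    public static void main(String[] args) {
        List<URI> segments = new ArrayList<>();
        for (int i = 0; i < 2_500; i++) {
            segments.add(URI.create("s3://bucket/table/segment_" + i));
        }
        // Assumed per-request limit, modeled on S3 DeleteObjects.
        int requests = deleteInChunks(segments, 1_000,
                chunk -> System.out.println("bulk delete of " + chunk.size() + " URIs"));
        System.out.println(requests + " requests instead of " + segments.size());
    }
}
```

For 2,500 segments this prints three bulk deletes (1,000, 1,000, and 500 URIs) and reports 3 requests instead of 2,500 sequential ones.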