SegmentNameGenerator: Extend interface to accept input file name
Following a Slack discussion: https://apache-pinot.slack.com/archives/C011C9JHN7R/p1624443889243500
Use case: some tables in Pinot are used in conjunction with IdSet filtering or lookups, and some of them don’t have a time column.
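- Hence, the existing segment name generation strategies (i.e. time-based and fixed) do not allow simple segment replacement whenever the data in those “dimension” tables changes (say, the data for ID<X> changed), because replacing a segment requires pushing a new segment under the same, predictable name.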
Proposition: allow segment name generation to be based on the input file names, so that segments can be named after a user-provided ID contained in the file paths.
E.g., the input files
basedir/id1/file.parquet
basedir/id2/file.parquet
would generate the segments
<table_name>_id1.segment
<table_name>_id2.segment
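A minimal sketch of the proposed naming rule, written in Java since Pinot is a Java project. It assumes the ID is always the name of the input file's immediate parent directory; the class InputDirNaming and its methods are hypothetical, not Pinot APIs. It produces the <table_name>_<id> part of the names above, and it rejects two input files that would map to the same name (for example basedir/id1/a.parquet and basedir/id1/b.parquet), since a silent collision would let one segment overwrite the other.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed rule: an input file
 * basedir/<id>/file.parquet becomes a segment named <tableName>_<id>.
 */
final class InputDirNaming {
  private InputDirNaming() {}

  /** Derives "<tableName>_<parent directory name>" from one input file path. */
  static String segmentNameFor(String tableName, String inputFile) {
    Path parent = Paths.get(inputFile).getParent();
    if (parent == null || parent.getFileName() == null) {
      throw new IllegalArgumentException("No parent directory in: " + inputFile);
    }
    return tableName + "_" + parent.getFileName();
  }

  /**
   * Maps each segment name to the input file it came from, failing if two
   * files would produce the same segment name.
   */
  static Map<String, String> segmentNames(String tableName, List<String> inputFiles) {
    Map<String, String> segmentToFile = new LinkedHashMap<>();
    for (String file : inputFiles) {
      String name = segmentNameFor(tableName, file);
      String previous = segmentToFile.putIfAbsent(name, file);
      if (previous != null) {
        throw new IllegalStateException(
            "Files " + previous + " and " + file + " both map to segment " + name);
      }
    }
    return segmentToFile;
  }

  public static void main(String[] args) {
    Map<String, String> names = segmentNames("myTable",
        List.of("basedir/id1/file.parquet", "basedir/id2/file.parquet"));
    // Prints one "file -> segment" line per input file.
    names.forEach((segment, file) -> System.out.println(file + " -> " + segment));
  }
}
```

Keying on the parent directory rather than the file name matches the layout in the example, where every file is called file.parquet and only the directory carries the ID.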
Currently, the SegmentNameGenerator interface doesn’t receive the input file name, so it is not possible to implement a strategy like the one above.
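One possible shape for the extension, sketched under assumptions: FileAwareNameGenerator below is a hypothetical stand-in, not Pinot's actual SegmentNameGenerator, and its method signatures are illustrative only. A default overload that also receives the input file path lets existing generators keep working unchanged, while new generators opt in by overriding it.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical stand-in for an extended segment name generator interface. */
interface FileAwareNameGenerator {
  /** Existing-style method: name derived from sequence id and time range only. */
  String generateSegmentName(int sequenceId, Object minTimeValue, Object maxTimeValue);

  /**
   * Proposed overload that also receives the input file path. The default
   * ignores the path, so generators that only implement the method above still work.
   */
  default String generateSegmentName(int sequenceId, Object minTimeValue, Object maxTimeValue,
      String inputFilePath) {
    return generateSegmentName(sequenceId, minTimeValue, maxTimeValue);
  }
}

/** Legacy-style generator: implements only the original method. */
final class FixedNameGenerator implements FileAwareNameGenerator {
  private final String segmentName;

  FixedNameGenerator(String segmentName) {
    this.segmentName = segmentName;
  }

  @Override
  public String generateSegmentName(int sequenceId, Object minTimeValue, Object maxTimeValue) {
    return segmentName;
  }
}

/** File-aware generator: names the segment after the input file's parent directory. */
final class ParentDirNameGenerator implements FileAwareNameGenerator {
  private final String tableName;

  ParentDirNameGenerator(String tableName) {
    this.tableName = tableName;
  }

  // Without an input path there is no ID to use, so fall back to the sequence id.
  @Override
  public String generateSegmentName(int sequenceId, Object minTimeValue, Object maxTimeValue) {
    return tableName + "_" + sequenceId;
  }

  @Override
  public String generateSegmentName(int sequenceId, Object minTimeValue, Object maxTimeValue,
      String inputFilePath) {
    Path parent = inputFilePath == null ? null : Paths.get(inputFilePath).getParent();
    if (parent == null || parent.getFileName() == null) {
      return generateSegmentName(sequenceId, minTimeValue, maxTimeValue);
    }
    return tableName + "_" + parent.getFileName();
  }
}

final class NameGeneratorDemo {
  public static void main(String[] args) {
    String file = "basedir/id1/file.parquet";
    FileAwareNameGenerator fixed = new FixedNameGenerator("dim_segment");
    FileAwareNameGenerator byDir = new ParentDirNameGenerator("myTable");
    System.out.println(fixed.generateSegmentName(0, null, null, file)); // dim_segment
    System.out.println(byDir.generateSegmentName(0, null, null, file)); // myTable_id1
  }
}
```

With this shape, the caller would only need to pass the current input file path to the four-argument method; a fixed-name generator simply ignores it.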
Note: if you know of any other way to reach this goal, please feel free to suggest it!
Top GitHub Comments
Hi @MrNeocore - yes, the scope of this issue is mapping n input files to n segments. I was pointing out that there’s one other place in the code where segments are created (SegmentProcessorFramework), but that code is (a) out of scope, and (b) doesn’t currently use the SegmentNameGenerator support in any case, though there’s a TODO comment in the code about that.

How about being able to define things via:
or, for our use case:
We could also support formatting of the input dates via, say, ${minTimeValue:yyyy-MM}, but maybe that’s a bridge too far.
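A sketch of how a template such as ${tableName}_${minTimeValue:yyyy-MM} could be resolved. Only the ${name:pattern} idea comes from the comment above; the tableName variable, the SegmentNameTemplate class, and the assumption that minTimeValue is an epoch-milliseconds number formatted in UTC are all hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical resolver for segment name templates. Assumes time values are
 * epoch milliseconds and formats them in UTC.
 */
final class SegmentNameTemplate {
  // Matches ${name} or ${name:pattern}; the pattern may not contain '}'.
  private static final Pattern PLACEHOLDER =
      Pattern.compile("\\$\\{([A-Za-z0-9_]+)(?::([^}]+))?\\}");

  private SegmentNameTemplate() {}

  static String resolve(String template, Map<String, Object> values) {
    Matcher m = PLACEHOLDER.matcher(template);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String name = m.group(1);
      String format = m.group(2);
      Object value = values.get(name);
      if (value == null) {
        throw new IllegalArgumentException("No value for placeholder: " + name);
      }
      String text;
      if (format == null) {
        text = value.toString();
      } else if (value instanceof Number) {
        // Interpret the number as epoch millis and format it with the given pattern in UTC.
        Instant instant = Instant.ofEpochMilli(((Number) value).longValue());
        text = DateTimeFormatter.ofPattern(format).withZone(ZoneOffset.UTC).format(instant);
      } else {
        throw new IllegalArgumentException("Date format needs an epoch-millis value: " + name);
      }
      // quoteReplacement keeps '$' or '\' in the value from being treated as group references.
      m.appendReplacement(out, Matcher.quoteReplacement(text));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, Object> values = Map.of(
        "tableName", "myTable",
        "minTimeValue", 1624406400000L); // 2021-06-23T00:00:00Z
    System.out.println(resolve("${tableName}_${minTimeValue:yyyy-MM}", values)); // myTable_2021-06
  }
}
```

Formatting only when a pattern is present keeps plain placeholders cheap, and failing on a missing value avoids silently producing names like myTable_ that would collide across inputs.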