Pulsar SQL: support user defined indexes

Is your feature request related to a problem? Please describe.
Currently, no index is used when querying a topic with Presto. __publish_time__ can be treated as an index because of the way ledgers store data, but it is not a real one.
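To illustrate why __publish_time__ behaves like an index while other fields do not, here is a minimal sketch. It assumes entries are appended in non-decreasing publish-time order; the in-memory `entries` list and both function names are hypothetical and are not Pulsar or Presto APIs.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for a topic: (publish_time_ms, payload) in append order.
# Assumption: publish times never decrease as entries are appended.
entries = [
    (1000, "a"),
    (1005, "b"),
    (1010, "c"),
    (1020, "d"),
    (1030, "e"),
]

def scan_by_publish_time(entries, start_ms, end_ms):
    """Return payloads with start_ms <= publish_time <= end_ms.

    Because the times are sorted, binary search finds the range
    boundaries without reading every entry.
    """
    times = [t for t, _ in entries]
    lo = bisect_left(times, start_ms)
    hi = bisect_right(times, end_ms)
    return [payload for _, payload in entries[lo:hi]]

def scan_by_payload(entries, wanted):
    """A predicate on any other field has no ordering to exploit,
    so every entry has to be read."""
    return [payload for _, payload in entries if payload == wanted]
```

The first function can skip everything outside the time range; the second is the full scan that a user-defined index would be meant to avoid.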

Describe the solution you’d like
The AvroSchema used to publish to a topic should come with an index definition. Could we then have a managed ledger for indexes that references the regular managed ledgers or message IDs? The Pulsar Presto implementation could then be configured to use the user-defined indexes from the schema. (This is a suggestion to start the discussion; as @jerrypeng and I discussed, it is a large discussion to have.)
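As a rough sketch of the idea, an index could map a field value to the IDs of the messages that contain it. Everything below is hypothetical and in-memory: `MessageId`, `FieldIndex`, and the `user_id` field are illustrative names, not Pulsar classes, and a real implementation would persist the index (for example in its own managed ledger) rather than in a dict.

```python
from collections import defaultdict
from typing import NamedTuple

class MessageId(NamedTuple):
    """Hypothetical position of a message: which ledger, which entry."""
    ledger_id: int
    entry_id: int

class FieldIndex:
    """Hypothetical secondary index: field value -> message positions."""

    def __init__(self, field):
        self.field = field
        self._positions = defaultdict(list)

    def add(self, message_id, record):
        # Index the record only if it carries the indexed field.
        if self.field in record:
            self._positions[record[self.field]].append(message_id)

    def lookup(self, value):
        # Return a copy so callers cannot mutate the index.
        return list(self._positions.get(value, []))

index = FieldIndex("user_id")
index.add(MessageId(7, 0), {"user_id": "u1", "amount": 10})
index.add(MessageId(7, 1), {"user_id": "u2", "amount": 5})
index.add(MessageId(8, 0), {"user_id": "u1", "amount": 3})
```

A query filtering on `user_id` could call `index.lookup("u1")` and read only the returned positions instead of scanning the whole topic.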

Describe alternatives you’ve considered
There are probably multiple ways to do it; feel free to suggest your point of view.

Additional context
The goal is to reduce query runtime.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

2 reactions
sijie commented, May 15, 2020

I don’t think it is a good idea to add an index definition to the schema definition. The schema definition defines the structure of the original data. The index definition depends on the schema definition but it is different from the original data. So the index definition should be associated with the storage that is used for storing the index data. For example, if we are using another managed ledger for storing the index, then the index definition should be the schema definition of the managed ledger. Does that make sense?

1 reaction
golden-yang commented, Dec 18, 2021

Is there any progress on this issue? Support for indexes in Pulsar SQL would be a very meaningful feature.

One way is to support it natively. The other, I think, is to achieve it through tiered storage, for example by combining it with a data lake using Apache Hudi or similar.

I saw some articles about combining Hudi and Pulsar. Is there any progress? @sijie
