Fixing query options
See original GitHub issueThis issue is created follow up #8880.
Background
Currently, Pinot SQL OPTION
keyword is a REGEX match that is allow ANYWHERE in the SQL string. This causes SQLi issues.
Backward-Compatibility Issue
#8880 proposed an alternative syntax of using OPTION similar to SQL setStatement (see: https://calcite.apache.org/docs/reference.html)
However, this creates a backward-incompatible change towards the Pinot SQL syntax.
- it hasn’t been design-reviewed by the community;
- it changes the syntax to be a “statement” instead of a “clause” (e.g. https://docs.microsoft.com/en-us/sql/t-sql/queries/option-clause-transact-sql?view=sql-server-ver16)
Thus we propose to hold off the syntax change until we come to a consensus on which syntax to use.
SQLi Resolution
- #8905 proposed a temporary solution to only allow
OPTION
keyword at end of SQL string. - also marked the REGEX syntax as deprecated (although we have no easy way to inform end-user in the query result)
Alternative Syntax Change
Note, for the syntax code segment: <
, >
is used to quote reserved keywords or reserved operators.
As STATEMENT
Pinot query now only allows a single STATEMENT. So there’s no difference setting OPTIONS as “clause” associated with the statement, or associates the OPTION with the entire SQL statement list context.
Potentially acceptable syntax listed below:
- Standard
SET statement
setStatement:
<SET> identifier <=> literal <;>
identifier:
simpleIdentifier | <">quoted.complex.identifier<">
literal:
stringLiteral | integerLiteral | doubleLiteral | booleanLiteral
PRO: standard SET statement is default supported in Calcite, also supported by major SQL engines:
- https://dev.mysql.com/doc/refman/8.0/en/set-statement.html,
- https://www.postgresql.org/docs/current/sql-set.html CON: standard SET statement only allows a single key-value, for multiple key-value options, multiple set statement is needed
- Special
OPTION
keyword
optionStatement:
<OPTION> <(> optionKVPair [<,> optionKVPair ]* <)> <;>
optionKVPair:
identifier <=> literal
identifier:
simpleIdentifier | <">quoted.complex.identifier<">
literal:
stringLiteral | integerLiteral | doubleLiteral | booleanLiteral
PRO: similar usage of current OPTION
clause.
CON: I couldn’t find a commonly used SQL system that supports this statement type, not easy for users understand the syntax.
AS CLAUSE
We can also extend Calcite’s syntax to support OPTION
clause in SELEC statement
the syntax will be similar to:
select:
<SELECT> [ <ALL> | <DISTINCT> ]
{ * | projectItem [<,> projectItem ]* }
<FROM> tableExpression
[ <WHERE> booleanExpression ]
[ <GROUP BY> { groupItem [<,> groupItem ]* } ]
[ <HAVING> booleanExpression ]
[ <OPTIONS> <(> optionKVPair [<,> optionKVPair ]* <)> ]
<;>
optionKVPair:
identifier <=> literal
identifier:
simpleIdentifier | <">quoted.complex.identifier<">
literal:
stringLiteral | integerLiteral | doubleLiteral | booleanLiteral
PRO: This is almost identical to current Pinot OPTION syntax CON: for each statement we need to extend and add OPTION syntax (e.g. INSERT, CREATE, DELETE, and other DML/DQL we add in the future); also I think requires us to alter Calcite’s core syntax parsing extension template.
Closing Thoughts
- Please comment/reply on this issue to share your understanding, and if any of the descriptions I posted is incorrect or imprecise.
- if there’s any alternative syntax not listed above. please kindly share in the comment as well. I will incorporate in the alternative syntax list.
Issue Analytics
- State:
- Created a year ago
- Reactions:7
- Comments:7 (7 by maintainers)
Top GitHub Comments
looks like
<SET> key = value
is the way to go. implementing it soonI’m also incline to
SET
, asSET
is more commonly adopted by other data platforms, which requires less learning curves for users to leverage.And I think what Jackie mentioned for
SET
in Presto is that Presto usesSET
to set runtime parameters, which is similar to what Postgres does.