
Sampling rows using TABLESAMPLE raises a ParseException

See original GitHub issue

The SQL:2003 standard defines a TABLESAMPLE clause for executing a query on only a (random) subset of rows (supported, e.g., in SQL Server and PostgreSQL). However, parsing a query that contains such a clause either produces an incorrect JSON parse tree (if no table alias is given) or raises a ParseException (if an alias is given).

Consider the following example query: SELECT * FROM foo TABLESAMPLE bernoulli (20) WHERE a < 42

Parsing it via parse("SELECT * FROM foo TABLESAMPLE bernoulli (20) WHERE a < 42") mistakes the TABLESAMPLE for an alias:

{'select': '*',
 'from': {'value': 'foo', 'name': {'TABLESAMPLE': 'bernoulli'}},
 'where': {'lt': ['a', 42]}}

If the query is modified to use an alias:

parse("SELECT * FROM foo f TABLESAMPLE bernoulli (20) WHERE f.a < 42"),

parsing it raises a

ParseException: Expecting {union} | {intersect} | {except} | {minus} | {order by} | {offset} | {fetch} | {limit} | {union} | {intersect} | {except} | {minus} | {StringEnd}, found "TABLESAMPL" (at char 20), (line:1, col:21).

EDIT [22-06-09]: I corrected a copy/paste error in the example query. If the corrected version (without parentheses around bernoulli and with an added sampling percentage) is parsed, both versions (with and without a table alias) raise a ParseException.
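
For reference, the behaviour described above can be reproduced with a short script. This is a minimal sketch that assumes the parse function shown above comes from the mo-sql-parsing package; the package name is not stated explicitly in this thread, so treat the import as an assumption:

from mo_sql_parsing import parse  # assumption: the library this issue was filed against

queries = [
    "SELECT * FROM foo TABLESAMPLE bernoulli (20) WHERE a < 42",      # without a table alias
    "SELECT * FROM foo f TABLESAMPLE bernoulli (20) WHERE f.a < 42",  # with a table alias
]

for sql in queries:
    try:
        # the issue reports either a tree that treats TABLESAMPLE as an alias
        # or a pyparsing-style ParseException, depending on the query
        print(parse(sql))
    except Exception as exc:
        print(f"failed to parse {sql!r}: {exc}")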

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

1 reaction
klahnakoski commented, Jun 11, 2022

This is done. Feel free to open another issue if you find another problem, or even if you have a question.

Thank you for your help on this issue.

0 reactions
rbergm commented, Jun 9, 2022

It seems like I made a copy/paste error in the initial example query. I edited the issue description to use the corrected version. I am terribly sorry for the confusion.

The progress is looking really good! I was already able to parse my test queries successfully!

Also, if I can help you out with anything regarding this issue, just let me know. Although I am not really familiar with the tech stack, maybe there is still something left for me to do.
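
For anyone checking whether an installed release already contains the fix discussed above, a quick sanity check is to parse one of the example queries and see whether it still raises. This is only a sketch (again assuming the mo-sql-parsing package) and makes no assumption about the exact shape of the resulting parse tree:

from mo_sql_parsing import parse  # assumption: the library this issue was filed against

sql = "SELECT * FROM foo f TABLESAMPLE bernoulli (20) WHERE f.a < 42"
try:
    tree = parse(sql)
    print("parsed OK:", tree)     # a release containing the fix should reach this branch
except Exception as exc:
    print("still failing:", exc)  # older releases raise a ParseException here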


Top Results From Across the Web

  • SAMPLE / TABLESAMPLE - Snowflake Documentation: Returns a subset of rows sampled randomly from the specified table. The following sampling methods are supported: Sample a fraction of a table, ...
  • How is `sample` different from `TABLESAMPLE` in Spark?: TABLESAMPLE (num_rows ROWS) is not a simple random sample but instead is implemented using LIMIT. So the answer whether sample and TABLESAMPLE ...
  • Sampling Queries - Spark 3.3.1 Documentation: The TABLESAMPLE statement is used to sample the table. It supports the following sampling methods: TABLESAMPLE (x ROWS): Sample the table down ...
  • spark/PlanParserSuite.scala at master · apache/spark - GitHub: Parser test cases for rules defined in [[CatalystSqlParser]] / [[AstBuilder]]. ... Hive compatibility: Missing parameter raises ParseException.
  • Got pyspark.sql.utils.ParseException error when read ... - Reddit: Hello I've create a Glue job that simply read data from DynamoDB that use AWS Glue DynamoDB export connector as source, do some ...
