
sqlglot seems to remove adjacent comments


The following query,

Select * from Table /*comment 1*/
/*comment 2*/

becomes

Select * from Table /*comment 2*/

after we run it through the parser:

expression_tree = parse_one(query)

This is unexpected behaviour: the first comment is silently dropped.

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

mpf82 commented, Nov 21, 2022 (1 reaction)

Again, this is just my preference, but I’d say yes: in “normal” mode the comments should be printed side by side, so that SELECT * FROM table /*comment 1*/ /*comment 2*/ looks the same after parsing and printing.

For pretty mode, it’s probably easiest to also output the comments using /* … */ notation; that way you do not have to worry about multi-line strings inside comments. The only difference from normal mode would be that each comment gets a line break prepended.

Let’s talk code.

Initial statement: SELECT * FROM tbl /*line1\nline2\nline3*/ /*another comment*/ where 1=1 -- comment at the end

Current output (10.0.8), with “another comment” being swallowed:

SELECT * FROM tbl /* line1
line2
line3 */ WHERE 1 = 1 /* comment at the end */

Suggestion for normal mode:

SELECT * FROM tbl /* line1
line2
line3 */ /* another comment */ WHERE 1 = 1 /* comment at the end */

Suggestion for pretty mode:

SELECT * FROM tbl
/* line1
line2
line3 */
/* another comment */
WHERE 1 = 1
/* comment at the end */
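The two suggestions differ only in what is placed before each comment: a space in normal mode, a line break in pretty mode. A minimal sketch of that rule, assuming the comments have already been attached to the token they follow; `render_comments` is a hypothetical helper for illustration, not part of sqlglot's API:

```python
def render_comments(comments: list[str], pretty: bool = False) -> str:
    """Render comments as /* ... */ blocks, keeping their original order.

    Normal mode puts a space before each comment (side by side);
    pretty mode puts a line break before each comment instead.
    """
    sep = "\n" if pretty else " "
    return "".join(f"{sep}/* {text} */" for text in comments)


# Comments attached to the token `tbl` and to the end of the initial statement.
tbl_comments = ["line1\nline2\nline3", "another comment"]
end_comments = ["comment at the end"]

normal = (
    "SELECT * FROM tbl" + render_comments(tbl_comments)
    + " WHERE 1 = 1" + render_comments(end_comments)
)
pretty = (
    "SELECT * FROM tbl" + render_comments(tbl_comments, pretty=True)
    + "\nWHERE 1 = 1" + render_comments(end_comments, pretty=True)
)
print(normal)
print(pretty)
```

Run as is, the two printed strings match the normal-mode and pretty-mode suggestions above character for character.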

To extend the idea, you could also remember whether the original comment was a block or an inline comment. This would require either a Comment class that stores an inline/block attribute, or an educated guess based on whether the comment text contains a \n.

I guess the Comment class would be the best way, since (in pretty mode) it could print the comment very much as it originally was, but it’s also the most work.
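Both options can be sketched together. The `Comment` class and its `is_block` field below are hypothetical (they are not sqlglot's actual types); `guess` implements the newline heuristic, which holds because a `--` comment cannot span more than one line:

```python
from dataclasses import dataclass


@dataclass
class Comment:
    """Hypothetical comment object that remembers its original syntax."""
    text: str
    is_block: bool  # True for /* ... */, False for -- ...

    @classmethod
    def guess(cls, text: str) -> "Comment":
        # Educated guess: a "--" comment ends at the line break, so any
        # comment whose text contains "\n" must have been a block comment.
        return cls(text, is_block="\n" in text)

    def to_sql(self) -> str:
        # Re-emit the comment in its original style.
        return f"/* {self.text} */" if self.is_block else f"-- {self.text}"


print(Comment.guess("line1\nline2\nline3").to_sql())
print(Comment.guess("another comment").to_sql())
```

The heuristic is only half reliable: `Comment.guess("another comment")` produces an inline comment even though the original was written as `/*another comment*/`, which is why recording the attribute when the comment is tokenized would reproduce the input more faithfully.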

For normal mode, always using block comments, in their original order, is probably best, so

-- comment 1
-- comment 2
-- comment 3
SELECT * FROM foo

would print as

/* comment 1 */ /* comment 2 */ /* comment 3 */ SELECT * FROM foo

With the above-mentioned changes, the output in pretty mode could be identical to the input, since the comment object would remember its inline attribute.
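The proposed printing rules for comments that precede a statement can be sketched as follows, assuming a hypothetical `Comment` object whose `is_block` attribute was recorded when the comment was tokenized; `render_leading` is likewise an illustrative name, not sqlglot's API:

```python
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    is_block: bool  # hypothetical attribute recorded by the tokenizer


def render_leading(comments: list[Comment], statement: str, pretty: bool = False) -> str:
    """Print the comments that precede a statement, in their original order."""
    if not pretty:
        # Normal mode: always block comments, side by side on one line.
        parts = [f"/* {c.text} */" for c in comments]
        return " ".join(parts + [statement])
    # Pretty mode: one comment per line, restoring the original syntax.
    lines = [f"/* {c.text} */" if c.is_block else f"-- {c.text}" for c in comments]
    return "\n".join(lines + [statement])


leading = [Comment(f"comment {i}", is_block=False) for i in (1, 2, 3)]
normal = render_leading(leading, "SELECT * FROM foo")
pretty = render_leading(leading, "SELECT * FROM foo", pretty=True)
print(normal)
print(pretty)
```

With the three `--` comments from the example, normal mode yields `/* comment 1 */ /* comment 2 */ /* comment 3 */ SELECT * FROM foo`, while pretty mode reproduces the original input line for line.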


By the way, it seems comments are currently not supported “everywhere”, e.g.

SELECT /* a */ * FROM /* b */ tbl WHERE /* c */ 1=2

parses to

(SELECT expressions:
  (STAR ), from:
  (FROM expressions:
    (TABLE this:
      (IDENTIFIER this: tbl, quoted: False))), where:     
  (WHERE this:
    (EQ this:
      (LITERAL this: 1, is_string: False), expression:    
      (LITERAL this: 2, is_string: False))), comment:  a )

and prints as

/* a */ SELECT * FROM tbl WHERE 1 = 2

I didn’t have time to look at the code to figure out if there’s a reason for this behaviour, just wanted to mention it.

GeorgeSittas commented, Nov 16, 2022 (1 reaction)

Hello SudarshanVS,

The logic differs depending on whether you place the comment on the same line as the previous token or on a new line; that’s why we get different outputs. However, you have a point, since the tokenizer is not expected to overwrite the previous token’s comment, if it already has one.

I’ll post a fix so that the two cases are consistent and we always keep the first comment.
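To illustrate the point about not overwriting, here is a toy tokenizer that keeps each token's comments in a list and appends to it, so a comment on the same line and a comment on the next line both survive. It is hypothetical and not sqlglot's actual tokenizer: it assumes tokens are separated by whitespace and only recognizes /* ... */ comments. Note that it keeps every comment, which is closer to the side-by-side behaviour suggested in the comment above than to a fix that keeps only the first one.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    text: str
    comments: list[str] = field(default_factory=list)


# Either a /* ... */ block comment (group 1) or a whitespace-delimited word (group 2).
TOKEN_RE = re.compile(r"/\*(.*?)\*/|(\S+)", re.DOTALL)


def tokenize(sql: str) -> list[Token]:
    tokens: list[Token] = []
    leading: list[str] = []  # comments that appear before the first token
    for match in TOKEN_RE.finditer(sql):
        comment, word = match.group(1), match.group(2)
        if comment is None:
            tokens.append(Token(word, leading))
            leading = []
        elif tokens:
            # Append instead of assigning, so an earlier comment on the
            # same token (same line or not) is never overwritten.
            tokens[-1].comments.append(comment.strip())
        else:
            leading.append(comment.strip())
    return tokens


tokens = tokenize("Select * from Table /*comment 1*/\n/*comment 2*/")
print(tokens[-1])  # Token(text='Table', comments=['comment 1', 'comment 2'])
```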

