Handling escape symbols

From https://acss.io/#pseudo-classes:

Each line should parse as one word (i.e. one identifier):

C\(\#0280ae\)
C\(brandColor\)
C\(\#fff\)\:h:hover

This can be verified with https://rawgit.com/tabatkins/parse-css/master/example.html
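Each line is a single escaped name because of two steps in CSS Syntax Module Level 3, "consume a name" and "consume an escaped code point". A backslash followed by anything other than a newline escapes the next character, and only hex digits after the backslash are decoded as a code point. The TypeScript below is a simplified sketch of those two steps. The function names are hypothetical and this is not PostCSS's tokenizer. For the third line, the name consumed is C(#fff):h, and the unescaped :hover that follows is left for the next token.

```typescript
// Simplified sketch of CSS Syntax Level 3 "consume a name" and
// "consume an escaped code point". Function names are hypothetical; this is
// not PostCSS's tokenizer API. Simplifications: every code unit >= U+0080
// counts as a name character, and input preprocessing (CRLF, NULL) is skipped.

function isHexDigit(c: string): boolean {
  return /^[0-9a-fA-F]$/.test(c);
}

function isNameChar(c: string): boolean {
  return /^[A-Za-z0-9_-]$/.test(c) || c.charCodeAt(0) >= 0x80;
}

// A backslash starts a valid escape unless a newline follows it.
function isValidEscape(input: string, pos: number): boolean {
  return input[pos] === "\\" && input[pos + 1] !== "\n";
}

// `pos` is the index just after the backslash.
function consumeEscape(input: string, pos: number): { ch: string; end: number } {
  if (pos >= input.length) return { ch: "\uFFFD", end: pos }; // EOF after "\"
  let hex = "";
  let i = pos;
  while (i < input.length && hex.length < 6 && isHexDigit(input[i])) {
    hex += input[i++];
  }
  if (hex.length > 0) {
    // One whitespace character after a hex escape belongs to the escape.
    if (i < input.length && /[ \t\n\r\f]/.test(input[i])) i++;
    const cp = parseInt(hex, 16);
    const ok = cp !== 0 && cp <= 0x10ffff && (cp < 0xd800 || cp > 0xdfff);
    return { ch: String.fromCodePoint(ok ? cp : 0xfffd), end: i };
  }
  // Any other character is taken literally: "\(" -> "(", "\#" -> "#", "\:" -> ":".
  const cp = input.codePointAt(pos)!;
  return { ch: String.fromCodePoint(cp), end: pos + (cp > 0xffff ? 2 : 1) };
}

function consumeName(input: string, start: number): { value: string; end: number } {
  let value = "";
  let i = start;
  while (i < input.length) {
    if (isNameChar(input[i])) {
      value += input[i++];
    } else if (isValidEscape(input, i)) {
      const esc = consumeEscape(input, i + 1);
      value += esc.ch;
      i = esc.end;
    } else {
      break; // e.g. an unescaped ":" ends the name
    }
  }
  return { value, end: i };
}

for (const line of ["C\\(\\#0280ae\\)", "C\\(brandColor\\)", "C\\(\\#fff\\)\\:h:hover"]) {
  const { value, end } = consumeName(line, 0);
  console.log(JSON.stringify(value), "rest:", JSON.stringify(line.slice(end)));
}
```

Because "(", "#", ")" and ":" are not hex digits, each escape in the examples yields the literal character, so the whole run stays inside one name.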

This issue continues from https://github.com/shellscape/postcss-values-parser/issues/93

With thanks and credit to @nex3 and @ai for identifying this issue.

Update: I have started updating the tokenizer, but I may need assistance as I either integrate or abandon the current multi-character tokens. I don’t see how those tokens necessarily benefit speed, and any risk of inaccuracy seems too steep a price to pay.

@ai, thank you for the wonderful documentation at https://github.com/postcss/postcss/blob/7.0.27/docs/architecture.md#tokenizer--libtokenizees6-

Top GitHub Comments

ai commented, May 15, 2020

Quoting jonathantneal: “Andrey, please let me know if I’m pestering with these updates or if I can make them more helpful.”

A new performance breakthrough is awesome 😍.

Let’s change the tokenizer in 8.0.

Could it be used in the safe parser or the SCSS parser without changes? I forked the current tokenizer and can fork the new one too if it turns out to be impossible to customise.

jonathantneal commented, May 15, 2020

Andrey, please let me know if I’m pestering with these updates or if I can make them more helpful.

While analyzing the “slower” parts of the tokenizer, I found that eagerly peeking at the next character improves overall performance. I have rewritten the tokenizer to take advantage of this.
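The lookahead idea can be sketched as follows: the tokenizer reads the code unit after the current one and picks a branch (comment, at-word, word) before consuming anything, rather than consuming and backtracking. This is a hypothetical TypeScript illustration, not the rewritten tokenizer. The at-keyword check is simplified to one character, whereas the CSS Syntax spec examines up to three code points.

```typescript
// Hypothetical sketch of one-character lookahead in a CSS tokenizer.
// The branch is chosen from the current and next code unit only.

const SLASH = 0x2f; // "/"
const ASTERISK = 0x2a; // "*"
const AT = 0x40; // "@"
const BACKSLASH = 0x5c; // "\"
const NEWLINE = 0x0a; // "\n"

type Kind = "comment" | "at-word" | "word" | "other";

// Simplified name-start test: ASCII letter, "_", or any non-ASCII code unit.
function isNameStart(code: number): boolean {
  return (
    (code >= 0x41 && code <= 0x5a) || // A-Z
    (code >= 0x61 && code <= 0x7a) || // a-z
    code === 0x5f || // _
    code >= 0x80
  );
}

function peekKind(input: string, pos: number): Kind {
  const c = input.charCodeAt(pos);
  const next = input.charCodeAt(pos + 1); // NaN when past the end
  if (c === SLASH && next === ASTERISK) return "comment";
  if (c === AT && isNameStart(next)) return "at-word";
  // A backslash starts a word (an escape) unless a newline follows it.
  if (c === BACKSLASH && next !== NEWLINE) return "word";
  if (isNameStart(c)) return "word";
  return "other"; // numbers, strings, delimiters, etc. handled elsewhere
}

for (const s of ["/* x */", "/ 2", "@media", "@ media", "\\(", "C\\(brandColor\\)"]) {
  console.log(JSON.stringify(s), peekKind(s, 0));
}
```

Comparisons against NaN are always false, so reading past the end of the input needs no separate bounds check in this sketch.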

I have also added two fields to each token: the opening and closing distances between the token’s meaningful value and its delimiters. “Delimiter” is a poor term; it refers to the split between, for example, the @ and media in an @media At-Identifier token, the 2 and em in a 2em Number token, or the opening ", hello, and closing " in a "hello" String token.
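The two extra token fields can be pictured with a small sketch. It assumes a token stores its start and end offsets plus two hypothetical integers, here called open and close, for the distances from each edge to the meaningful value; the real field names in csstools/tokenizer may differ. The three tokens mirror the examples in the comment.

```typescript
// Sketch of a token that records how far its meaningful value sits from its
// delimiters. The field names `open` and `close` are hypothetical; they are
// not taken from csstools/tokenizer.

interface Token {
  type: "at-identifier" | "number" | "string";
  start: number; // index of the token's first character in the source
  end: number; // index just past the token's last character
  open: number; // characters between `start` and the value, e.g. 1 for "@"
  close: number; // characters between the value and `end`, e.g. 2 for "em"
}

// Split a token into [opening delimiter, value, closing delimiter] by slicing
// the source on demand; the token itself stores only integers.
function parts(source: string, t: Token): [string, string, string] {
  return [
    source.slice(t.start, t.start + t.open),
    source.slice(t.start + t.open, t.end - t.close),
    source.slice(t.end - t.close, t.end),
  ];
}

const source = '@media 2em "hello"';
const tokens: Token[] = [
  { type: "at-identifier", start: 0, end: 6, open: 1, close: 0 },
  { type: "number", start: 7, end: 10, open: 0, close: 2 },
  { type: "string", start: 11, end: 18, open: 1, close: 1 },
];

for (const t of tokens) {
  console.log(t.type, JSON.stringify(parts(source, t)));
}
```

One plausible benefit of this layout is that tokens stay as plain numbers, and the value or delimiter text is sliced out only when a consumer asks for it.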

Anyway, I think you’ll really like these results!

Compressing PostCSS Tokenizer...

PostCSS Tokenizer Development:       1910 B
PostCSS Tokenizer Development (min):  638 B
PostCSS Tokenizer Development (web):  639 B

Collecting PostCSS Tokenizer Benchmarks...

PostCSS Tokenizer Development:       58721 tokens in 8 ms (1.0 times faster)
PostCSS Tokenizer Development (min): 58721 tokens in 8 ms (1.0 times faster)
PostCSS Tokenizer 7.0.30:            49548 tokens in 8 ms


Compressing PostCSS Parser...

PostCSS Parser Development:       1369 B
PostCSS Parser Development (min):  836 B
PostCSS Parser Development (web):  805 B

Collecting PostCSS Parser Benchmarks...

PostCSS Experimental Parser:       56024 nodes in 10 ms (1.6 times faster)
PostCSS Parser 7.0.30:              6240 nodes in 15 ms
PostCSS + Selector + Value Parser: 28491 nodes in 86 ms (5.5 times slower)

— From https://github.com/csstools/tokenizer#collecting-postcss-parser-benchmarks
