UTF-8 support in PHP labels

See original GitHub issue

Hi.

The PHP manual says that a label may contain any byte from 0x80 to 0xff, so any multi-byte UTF-8 character is valid in variable names and in any other label, such as function names.

So this is valid PHP code:

function AñadirAcción($acción) 
{ 
   $░▒▓█Coração_do_Æsir😝█▓▒░ = $acción;
}

That weird variable name is unlikely 😃, but it is valid, and Spanish, Portuguese, and other non-English speakers can use native-language labels in PHP.

EnlighterJS uses a simple regex, \$[A-Z_][\w]*, for (at least) variables. It does not match those characters, so the code above is highlighted incorrectly (the original issue showed this in a screenshot).
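
A minimal sketch of the mismatch, runnable in Node.js or a browser console. Only the pattern comes from the issue; the `g` and `i` flags and the sample string are assumptions made for illustration.

```javascript
// Pattern quoted in the issue; the "gi" flags are an assumption.
const oldVariablePattern = /\$[A-Z_][\w]*/gi;

// Sample PHP source using non-ASCII names (illustrative only).
const phpSource = 'function AñadirAcción($acción) { $total = $acción; }';

// In JavaScript, \w only covers [A-Za-z0-9_], so a match stops
// at the first accented letter of a variable name.
const oldMatches = phpSource.match(oldVariablePattern);
console.log(oldMatches); // [ '$acci', '$total', '$acci' ]
```

Each `$acción` is cut off at `$acci`, which is consistent with the partial highlighting described above.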

The PHP manual suggests this regex for a valid PHP label: ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$
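
To see why that byte range covers every multi-byte UTF-8 character, here is a small sketch that applies the manual's rule to the UTF-8 bytes of a string. The helper names `utf8Bytes` and `isPhpLabelByBytes` are hypothetical and are not part of PHP or EnlighterJS.

```javascript
// TextEncoder always encodes to UTF-8; it is built into browsers and Node.js.
const encoder = new TextEncoder();

// Hypothetical helper: the UTF-8 bytes of a string as a plain array.
function utf8Bytes(text) {
  return Array.from(encoder.encode(text));
}

// Hypothetical helper: the manual's label rule, checked byte by byte.
function isPhpLabelByBytes(text) {
  const bytes = utf8Bytes(text);
  if (bytes.length === 0) return false;
  // Letters are a-z, A-Z, underscore, and every byte from 0x80 to 0xff.
  const isLetter = (b) =>
    b >= 0x80 || b === 0x5f || (b >= 0x41 && b <= 0x5a) || (b >= 0x61 && b <= 0x7a);
  const isDigit = (b) => b >= 0x30 && b <= 0x39;
  return isLetter(bytes[0]) && bytes.slice(1).every((b) => isLetter(b) || isDigit(b));
}

console.log(utf8Bytes('ñ'));   // [ 195, 177 ], i.e. 0xC3 0xB1
console.log(utf8Bytes('😝'));  // [ 240, 159, 152, 157 ], i.e. 0xF0 0x9F 0x98 0x9D
console.log(isPhpLabelByBytes('░▒▓█Coração_do_Æsir😝█▓▒░')); // true
console.log(isPhpLabelByBytes('1abc')); // false
```

Every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so the rule accepts any non-ASCII character without knowing anything about Unicode.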

Thanks!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

AndiDittrich commented, Jul 19, 2020 (1 reaction)

Keep in mind that a regular expression works on characters, not on bytes, so this requires a much more generic expression (posted as an image in the original issue).
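
The expression from that image is not available here. As an illustration of the character-versus-byte point only, the sketch below compares the manual's pattern pasted unchanged into JavaScript with a code-point-based variant. The name `codePointPattern` and the choice of the range U+0080 to U+10FFFF are assumptions, not the maintainer's expression.

```javascript
// The manual's pattern, pasted unchanged into JavaScript.
// Here \x80-\xff means the characters U+0080 to U+00FF, not bytes.
const bytePattern = /^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$/;

// Assumed character-based equivalent: any code point from U+0080 up
// counts as a letter. The "u" flag makes the regex work on code points
// instead of UTF-16 code units.
const codePointPattern = /^[a-zA-Z_\u{80}-\u{10FFFF}][a-zA-Z0-9_\u{80}-\u{10FFFF}]*$/u;

const labels = ['acción', '░▒▓█Coração_do_Æsir😝█▓▒░', '1abc'];
for (const label of labels) {
  console.log(label, bytePattern.test(label), codePointPattern.test(label));
}
```

The first label is accepted by both patterns because its accented letter falls inside U+0080 to U+00FF. The second is accepted only by the code-point pattern, and the third is rejected by both.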

But generally this PHP behaviour is only a side effect of the parser and can be treated as a kind of bug…

I’ve changed the function and variable regexes to the following expressions to match UTF-8 characters:

// variables
{
    regex: /\$[^\s=;()]+/gim,
    type: 'k7'
},

// global function calls
{
    regex: /\b([^\s(]+)\s*\(/gm,
    type: 'm0'
},
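
A quick way to try the two expressions above outside EnlighterJS is sketched below. The constant names and the `callNamesIn` helper are hypothetical; only the two patterns and the sample PHP code come from this thread.

```javascript
// The two patterns from the comment above, copied unchanged.
const variablePattern = /\$[^\s=;()]+/gim;
const functionCallPattern = /\b([^\s(]+)\s*\(/gm;

// The sample PHP code from the issue.
const phpSource = [
  'function AñadirAcción($acción)',
  '{',
  '   $░▒▓█Coração_do_Æsir😝█▓▒░ = $acción;',
  '}',
].join('\n');

// Hypothetical helper: collect the captured name of every call-like match.
function callNamesIn(text) {
  return [...text.matchAll(functionCallPattern)].map((m) => m[1]);
}

const variables = phpSource.match(variablePattern);
console.log(variables);
// [ '$acción', '$░▒▓█Coração_do_Æsir😝█▓▒░', '$acción' ]

console.log(callNamesIn(phpSource)); // [ 'AñadirAcción' ]
console.log(callNamesIn('Ñandú();')); // [ 'andú' ]
```

Both sample names are now matched in full. One limitation seems worth noting: JavaScript's `\b` only treats ASCII letters, digits, and the underscore as word characters, so a function name that starts with a non-ASCII letter loses that first character, as the last line shows.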

AndiDittrich commented, Jul 18, 2020 (1 reaction)

Hi @drmad ,

It’s not a big issue to change this. Most of the EnlighterJS languages don’t support Unicode.
