UTF-8 support in PHP labels
Hi.
According to the PHP manual, labels (variable names, function names, and so on) may contain any byte from 0x80 to 0xff, so any multi-byte UTF-8 character is valid in them.
So this is valid PHP code:
function AñadirAcción($acción)
{
    $░▒▓█Coração_do_Æsir😝█▓▒░ = $acción;
}
That weird variable name is unlikely in real code 😃, but it is valid. Spanish, Portuguese, and other non-English speakers can write PHP labels in their native languages.
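The manual states the rule over bytes, not characters. To see why it accepts all of these names, a hypothetical helper can apply that byte-level rule to the UTF-8 bytes of each name. Below is a JavaScript sketch. `isValidPhpLabel` is illustrative only and is not part of PHP or EnlighterJS.

```javascript
// Hypothetical helper: applies the PHP manual's byte-level label rule
// ([a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*) to the UTF-8 bytes of a name.
function isValidPhpLabel(name) {
  const bytes = new TextEncoder().encode(name); // UTF-8 encoding of the name
  if (bytes.length === 0) return false;
  const isStart = (b) =>
    (b >= 0x41 && b <= 0x5a) || // A-Z
    (b >= 0x61 && b <= 0x7a) || // a-z
    b === 0x5f ||               // _
    b >= 0x80;                  // any byte 0x80-0xff (every non-ASCII UTF-8 byte)
  const isPart = (b) => isStart(b) || (b >= 0x30 && b <= 0x39); // also 0-9
  return isStart(bytes[0]) && bytes.slice(1).every(isPart);
}

console.log(isValidPhpLabel("AñadirAcción"));              // true
console.log(isValidPhpLabel("░▒▓█Coração_do_Æsir😝█▓▒░")); // true
console.log(isValidPhpLabel("2fast"));                     // false
console.log(isValidPhpLabel("my-var"));                    // false
```

Because UTF-8 encodes every non-ASCII character using only bytes in the 0x80 to 0xff range, the box-drawing characters and the emoji pass just like ñ and ó.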
EnlighterJS uses a simple regex, \$[A-Z_][\w]*, for (at least) variables. That regex doesn't include those characters, so the code above is highlighted incorrectly.
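The sketch below reproduces the failure in JavaScript. The pattern is the one quoted in the issue. The `g` and `i` flags are an assumption, because the issue quotes only the pattern and not how EnlighterJS applies it.

```javascript
// Pattern quoted in the issue; the "gi" flags are an assumption.
const enlighterVar = /\$[A-Z_][\w]*/gi;

const php = "$░▒▓█Coração_do_Æsir😝█▓▒░ = $acción;";

// "$░..." is skipped entirely (░ is not in [A-Z_]), and "$acción" is cut
// short at "ó", because \w only covers ASCII letters, digits and "_".
console.log(php.match(enlighterVar)); // [ '$acci' ]
```

Without the `i` flag, `$acción` would not match at all, because `[A-Z_]` excludes lowercase letters.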
The PHP manual suggests this regex for a valid PHP label: ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$
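Porting that regex to a JavaScript highlighter has a catch. In the manual, `\x80-\xff` is a range of bytes. A JavaScript regex instead matches UTF-16 code units, or code points when the `u` flag is set. A literal copy therefore covers only U+0080 to U+00FF (Latin-1), as this sketch shows:

```javascript
// The PHP manual's label regex, copied literally into a JavaScript regex.
// Here \x80-\xff means the code points U+0080-U+00FF, not bytes.
const literalPhpLabel = /^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$/;

console.log(literalPhpLabel.test("acción"));  // true  (ó is U+00F3)
console.log(literalPhpLabel.test("Coração")); // true  (ç and ã are in U+0080-U+00FF)
console.log(literalPhpLabel.test("Æsir"));    // true  (Æ is U+00C6)
console.log(literalPhpLabel.test("░▒▓█"));    // false (U+2591 and up are outside the range)
```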
Thanks!
Issue analytics: created 3 years ago; 6 comments (4 by maintainers).
Top GitHub Comments
Keep in mind that the regular expression works on characters, not on bytes; this requires a much more generic expression:
But generally, this PHP behaviour is only a side effect of the parser and can be treated as a kind of bug…
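The maintainer's expression is not preserved in this copy of the thread. The sketch below is one possible way to move the manual's byte-level rule to a character-level JavaScript regex. It is an illustration only and not EnlighterJS's actual code. UTF-8 encodes every non-ASCII character with bytes in the 0x80 to 0xff range and every ASCII character below 0x80. So "any byte from 0x80 to 0xff" becomes "any code point from U+0080 up".

```javascript
// Character-level form of PHP's byte rule (an illustrative sketch):
// [A-Za-z_] or any non-ASCII code point to start, then [A-Za-z0-9_] or
// any non-ASCII code point. The "u" flag makes the regex work on code
// points, so an emoji (a surrogate pair in UTF-16) counts as one character.
const phpVariable = /\$[A-Za-z_\u{80}-\u{10FFFF}][\w\u{80}-\u{10FFFF}]*/gu;

const php = "$░▒▓█Coração_do_Æsir😝█▓▒░ = $acción;";

// Matches both "$░▒▓█Coração_do_Æsir😝█▓▒░" and "$acción" in full.
console.log(php.match(phpVariable));
```

A Unicode-letter class such as `\p{L}` may look more natural, but it would be stricter than PHP. It rejects ░ and 😝, which PHP accepts.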
I've changed the function and variable regex to the following expression to match UTF-8 characters:
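That expression is also not preserved here. The following is a hypothetical sketch of the function-name side, not the maintainer's change. The names `LABEL_CHAR`, `LABEL_START` and `functionName` are illustrative. The same character classes can be reused. One catch is that `\b` only understands ASCII `\w`, so it finds no word boundary before a name such as `ñandú`. A lookbehind marks the start of the name instead.

```javascript
// Illustrative character classes: one label character, and one start character.
const LABEL_CHAR = String.raw`[\w\u{80}-\u{10FFFF}]`;
const LABEL_START = String.raw`[A-Za-z_\u{80}-\u{10FFFF}]`;

// A name not preceded by another label character or "$", followed by "(".
const functionName = new RegExp(
  String.raw`(?<!${LABEL_CHAR}|\$)(${LABEL_START}${LABEL_CHAR}*)\s*\(`,
  "gu"
);

const php = "function ñandú($x) { return AñadirAcción($x); }";
const names = [...php.matchAll(functionName)].map((m) => m[1]);
console.log(names); // [ 'ñandú', 'AñadirAcción' ]
```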
Hi @drmad,
It's not a big issue to change this; most of the EnlighterJS languages don't support Unicode.