Cannot analyze symbols with Chinese names

The analyzer cannot handle Chinese function names or variable names. Please add support for analyzing them. LuaJIT accepts Chinese function and variable names encoded as GBK or UTF-8.

Example:

    function 中文函数名(参数1, 参数2)
        local 中文变量 = "Chinese variable name"
    end
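To illustrate what such support could involve, here is a minimal sketch of a byte-level identifier scanner. It assumes a LuaJIT-like rule in which any byte >= 0x80 counts as an identifier character, so the scanner never needs to decode the text. The helper names (`is_ident_start`, `is_ident_part`, `scan_identifiers`) are hypothetical, and Python is used only for illustration; this is not the analyzer's actual code.

```python
from typing import List


def is_ident_start(byte: int) -> bool:
    # ASCII letter, underscore, or any byte >= 0x80 (assumed LuaJIT-like rule).
    return (
        byte == 0x5F
        or 0x41 <= byte <= 0x5A
        or 0x61 <= byte <= 0x7A
        or byte >= 0x80
    )


def is_ident_part(byte: int) -> bool:
    # Identifier continuation additionally allows ASCII digits.
    return is_ident_start(byte) or 0x30 <= byte <= 0x39


def scan_identifiers(source: bytes) -> List[bytes]:
    """Return every identifier-like byte run (keywords included).

    Hypothetical helper: it does not skip strings, comments, or numbers.
    """
    names = []
    i = 0
    while i < len(source):
        if is_ident_start(source[i]):
            start = i
            while i < len(source) and is_ident_part(source[i]):
                i += 1
            names.append(source[start:i])
        else:
            i += 1
    return names


snippet = "function 中文函数名(参数1, 参数2) local 中文变量 = 1 end"
for encoding in ("utf-8", "gbk"):
    found = scan_identifiers(snippet.encode(encoding))
    print(encoding, [name.decode(encoding) for name in found])
```

Because the rule works on raw bytes, the same scanner handles the UTF-8 and GBK encodings of this snippet without knowing which one is in use.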

Issue Analytics

Created 6 years ago
Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction

ghost commented, May 20, 2018

@fstirlitz

That at least makes it simple to implement; I worried we might have to import the Unicode character database to check character properties or something.

I was able to convert a UnicodeData.txt into a lightweight 2MB string map of general categories recently. It’s something like '²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²¯ªªª¬ªªª¦§ª«ª¥ªªCCCCCCCCCCªª«««ªª ¦ª§D!!!!!!!!!!!!!!!!!!!!!!!!!!¦«§«²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²²¯ª¬¬¬¬®®®!¨«²®®«¤¤!®ª [more GCs]'.

package hydroper.unicode
{
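	/**
	 * General-category codes for Unicode code points, plus a lookup into a
	 * precomputed string map that holds one character per code point.
	 */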
	public class UnicodeType
	{
		public static const LETTER_UPPERCASE: uint = ' '.charCodeAt(0);
		public static const LETTER_LOWERCASE: uint = '!'.charCodeAt(0);
		public static const LETTER_TITLECASE: uint = '#'.charCodeAt(0);
		public static const LETTER_MODIFIER: uint = '$'.charCodeAt(0);
		public static const LETTER_OTHER: uint = '%'.charCodeAt(0);

		public static const MARK_NON_SPACING: uint = 'A'.charCodeAt(0);
		public static const MARK_SPACING_COMBINING: uint = 'B'.charCodeAt(0);
		public static const MARK_ENCLOSING: uint = 0xA3;

		public static const NUMBER_DECIMAL_DIGIT: uint = 'C'.charCodeAt(0);
		public static const NUMBER_LETTER: uint = '&'.charCodeAt(0);
		public static const NUMBER_OTHER: uint = 0xA4;

		public static const PUNCTUATION_CONNECTOR: uint = 'D'.charCodeAt(0);
		public static const PUNCTUATION_DASH: uint = 0xA5;
		public static const PUNCTUATION_OPEN: uint = 0xA6;
		public static const PUNCTUATION_CLOSE: uint = 0xA7;
		public static const PUNCTUATION_INITIAL_QUOTE: uint = 0xA8;
		public static const PUNCTUATION_FINAL_QUOTE: uint = 0xA9;
		public static const PUNCTUATION_OTHER: uint = 0xAA;

		public static const SYMBOL_MATH: uint = 0xAB;
		public static const SYMBOL_CURRENCY: uint = 0xAC;
		public static const SYMBOL_MODIFIER: uint = 0xAD;
		public static const SYMBOL_OTHER: uint = 0xAE;

		public static const SEPARATOR_SPACE: uint = 0xAF;
		public static const SEPARATOR_LINE: uint = 0xB0;
		public static const SEPARATOR_PARAGRAPH: uint = 0xB1;

		public static const OTHER_CONTROL: uint = 0xB2;
		public static const OTHER_FORMAT: uint = 0xB3;
		public static const OTHER_SURROGATE: uint = 0xB4;
		public static const OTHER_PRIVATE_USE: uint = 0xB5;
		public static const OTHER_NOT_ASSIGNED: uint = 0xB6;

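		// The ~2MB lookup string is elided here. The character at index cp
		// encodes the general category of code point cp, so getType() will
		// not compile until this constant is supplied.
		// private static const data: String = /* ... 2MB ...  */;

		// Returns the general-category code (one of the constants above) for cp.
		[Inline]
		public static function getType(cp: uint): uint
		{
			return data.charCodeAt(cp);
		}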
	}
}
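The conversion from UnicodeData.txt described above could be sketched roughly as follows. The sketch is in Python rather than ActionScript, and its one-character encoding (`BASE` plus the category's position in `CATEGORIES`) is an assumption made for illustration; it is not the encoding used by the constants in the class above. The `SAMPLE` lines only imitate the UnicodeData.txt layout (semicolon-separated fields: code point, name, general category, then further fields), including the `First>`/`Last>` convention for ranges, and are not a complete or authoritative extract.

```python
CATEGORIES = [
    "Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd", "Nl",
    "No", "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Sm", "Sc",
    "Sk", "So", "Zs", "Zl", "Zp", "Cc", "Cf", "Cs", "Co", "Cn",
]
BASE = 0x20  # hypothetical offset so that every code is a printable character
CODE_OF = {cat: chr(BASE + i) for i, cat in enumerate(CATEGORIES)}


def build_category_map(lines, size):
    """Build a string whose character at index cp encodes cp's category."""
    table = [CODE_OF["Cn"]] * size  # code points not listed stay unassigned
    range_start = None
    for line in lines:
        fields = line.split(";")
        if len(fields) < 3:
            continue
        cp, name, cat = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            range_start = cp  # remember where the range begins
            continue
        first = cp
        if name.endswith(", Last>") and range_start is not None:
            first = range_start  # fill the whole First..Last range
        range_start = None
        for c in range(first, min(cp, size - 1) + 1):
            table[c] = CODE_OF[cat]
    return "".join(table)


def category_of(table, cp):
    # Single indexing step, analogous to data.charCodeAt(cp) above.
    return CATEGORIES[ord(table[cp]) - BASE]


SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
    "9FFF;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;",
]
table = build_category_map(SAMPLE, 0xA000)
print(category_of(table, ord("A")), category_of(table, ord("中")))
```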

There’s also the UnicodeSet tool for that, but it outputs a pattern-like set instead, with range elements. Range checking is generally slower than indexing into a string literal map.
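A small sketch of the two lookup styles, limited to the Basic Multilingual Plane and built from Python's standard `unicodedata` module instead of UnicodeData.txt. The names `STRING_MAP`, `RANGE_STARTS`, `category_by_index`, and `category_by_range` are hypothetical, and no timing is measured here; the sketch only shows that the indexed lookup is a single step while the range form needs a search.

```python
import bisect
import unicodedata

LIMIT = 0x10000  # assumption: cover only the BMP to keep the example small
BASE = 0x20      # hypothetical offset so that codes are printable characters

all_cats = [unicodedata.category(chr(cp)) for cp in range(LIMIT)]
CATS = sorted(set(all_cats))
CODE_OF = {cat: chr(BASE + i) for i, cat in enumerate(CATS)}

# Style 1: one character per code point, looked up by direct indexing.
STRING_MAP = "".join(CODE_OF[cat] for cat in all_cats)

# Style 2: run-length ranges (the start of each run plus its category).
RANGE_STARTS = []
RANGE_CATS = []
for cp, cat in enumerate(all_cats):
    if not RANGE_CATS or RANGE_CATS[-1] != cat:
        RANGE_STARTS.append(cp)
        RANGE_CATS.append(cat)


def category_by_index(cp):
    # One indexing step into the string map.
    return CATS[ord(STRING_MAP[cp]) - BASE]


def category_by_range(cp):
    # Binary search for the last run that starts at or before cp.
    return RANGE_CATS[bisect.bisect_right(RANGE_STARTS, cp) - 1]


print(len(STRING_MAP), "map entries vs", len(RANGE_STARTS), "ranges")
print(category_by_index(ord("中")), category_by_range(ord("中")))
```

The range table is far smaller than the string map, so the choice is a trade of memory against lookup steps.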

1 reaction

fstirlitz commented, Aug 4, 2017

Implemented in 71729404772a771e588a6c0ca2c70d3db7f9f254. Consider it unstable, however. I might still revisit the encoding issue.
