Author does not support having an emoji in it
If I have a message that looks like this:
DD-MM-YYYY hh-mm - Alfonso 🤓: Lorem ipsum
The function get_message_author cannot interpret the emoji, since the regex patterns do not take emojis into account. I’ve found an expression that (supposedly) matches any emoji: https://www.regextester.com/106421
As I see it, there are two options:
- Create more regex rules to incorporate the use of emojis in the author.
- Delete all emojis from the author except if it is made up only of emojis.
I think the better solution is to delete all emojis from the author. It is easier to do and yields cleaner data, since downstream code no longer needs to take emojis into account. What do you think @joweich?
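The second option could be sketched roughly as follows. This is only an illustration, not code from chat-miner: the helper name `strip_emojis_from_author` is hypothetical, and the character ranges in `EMOJI_PATTERN` are a rough, non-exhaustive assumption rather than the expression from the link above.

```python
import re

# Rough, non-exhaustive emoji ranges (an assumption for illustration only).
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\U00002600-\U000027BF"  # miscellaneous symbols and dingbats
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flags)
    "\U0000FE0F"             # variation selector-16
    "\U0000200D"             # zero-width joiner
    "]+"
)


def strip_emojis_from_author(author: str) -> str:
    """Remove emojis from an author name unless nothing else would remain."""
    stripped = EMOJI_PATTERN.sub("", author).strip()
    # Keep the original name when it consists only of emojis.
    return stripped if stripped else author.strip()
```

With this sketch, an author such as "Alfonso 🤓" would be reduced to "Alfonso", while an author made up only of emojis would be left unchanged.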
Top GitHub Comments
@alfonso46674 thank you for picking this up! With #35, I attempted to simplify the message parsing while increasing the robustness of the author detection. In a nutshell, we now only use a regex to detect whether a line in the log starts with a date and is therefore a new message. As you mentioned, the split between timestamp and author is done via either ' - ' or '] ' (note the spaces). This approach seems to be stable for all message formats we have seen so far. Author and message body are always divided by ':', so there’s no need to overcomplicate things. Please note that this comes with a slight decrease in performance.

FYI, it seems that https://github.com/joweich/chat-miner/pull/35 fixed the emoji not being recognized in the author. Because the author is now taken from whatever sits between timestamp_author_sep (which could be '-' or ']') and the ':' character, instead of being matched with regular expressions, emojis are processed correctly. I’ll add a couple of test lines to cover the problem. Should this issue be closed and a new one be opened as an enhancement for converting the emojis to Unicode values? Or should we stop here?
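For illustration, the separator-based approach described in these comments might look roughly like the sketch below. It is not the code from the pull request: the function name `split_message` and the constant `TIMESTAMP_AUTHOR_SEPS` are hypothetical, and it assumes the line has already been identified as the start of a new message.

```python
from typing import Optional, Tuple

# Separators between timestamp and author, as described in the thread.
TIMESTAMP_AUTHOR_SEPS = (" - ", "] ")


def split_message(line: str) -> Optional[Tuple[str, str, str]]:
    """Split a log line into (timestamp, author, body), or None if it does not fit."""
    # Pick the separator that occurs earliest in the line.
    positions = [(line.find(sep), sep) for sep in TIMESTAMP_AUTHOR_SEPS if sep in line]
    if not positions:
        return None
    index, sep = min(positions)
    timestamp = line[:index].lstrip("[")
    rest = line[index + len(sep):]
    # The first colon after the separator divides author and body.
    author, colon, body = rest.partition(":")
    if not colon:
        return None  # e.g. a system message without an author
    return timestamp, author.strip(), body.strip()
```

Because the author is simply everything between the separator and the first colon, no character class has to anticipate emojis or other unusual characters in the name.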