sphinxsearch and astral characters
Workaround: solved using code from here:
var pool = mysql.createPool({
  host: '10.0.3.77',
  port: 9306,
  connectionLimit: 10,
  // Decode STRING fields from the raw bytes as UTF-8 ourselves
  typeCast: function (field, next) {
    if (field.type === 'STRING') {
      return field.buffer().toString('utf-8');
    }
    return next();
  }
})
I’m using mysql2 to connect to sphinx (that’s a search engine that works over the mysql 4.1 protocol, although the sql syntax differs quite a bit). So far I have been unable to reproduce the following issue with a standard mysql server.
When I send text there and get it back, astral characters (U+10000 and up, represented as surrogate pairs) get replaced with four U+FFFD each.
I assume this is a bug in node-mysql2 because node-mysql works correctly in this exact case.
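To make the symptom concrete, here is a small sketch of how one astral character maps to two UTF-16 code units and four UTF-8 bytes. The last step is only an assumption about how four U+FFFD could arise (a decoder that rejects each of the four bytes separately); it is not taken from the mysql2 source.

```javascript
// U+1F639, the astral character from the report.
const cat = '😹';

// JavaScript strings are UTF-16, so one astral character takes two code units.
const codeUnits = Array.from({ length: cat.length }, (_, i) =>
  cat.charCodeAt(i).toString(16)
);

// The same character takes four bytes in UTF-8.
const utf8Bytes = Buffer.from(cat, 'utf8');

// Assumption for illustration: a decoder that rejects every one of those
// bytes emits one U+FFFD per byte. Decoding each byte on its own mimics that.
const perByte = Array.from(utf8Bytes, (b) =>
  Buffer.from([b]).toString('utf8')
).join('');

console.log(codeUnits);                        // UTF-16 code units as hex
console.log(utf8Bytes.toString('hex'));        // UTF-8 bytes as hex
console.log(perByte === '\ufffd'.repeat(4));   // one U+FFFD per byte
```

The four-byte count matches the four replacement characters in the mysql2 output, which hints that the bytes are being rejected one at a time rather than lost in transit.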
Source code:
//var mysql = require('mysql')
var mysql = require('mysql2')
var pool = mysql.createPool({
  host: '10.0.3.77',
  port: 9306,
  connectionLimit: 10
})
pool.getConnection(function (err, connection) {
  if (err) throw err
  connection.query(`CALL SNIPPETS(('test 😹 αβγ'), 'forum_posts', 'whatever')`,
    function (err, response) {
      if (err) throw err
      console.log(response)
    }
  )
})
Output with mysql module:
[ RowDataPacket { snippet: 'test 😹 αβγ' } ]
Output with mysql2 module:
[ TextRow { snippet: 'test ���� αβγ' } ]
Here’s network traffic:
00000000 43 00 00 00 0a 32 2e 33 2e 32 2d 69 64 36 34 2d C....2.3 .2-id64-
00000010 62 65 74 61 20 28 3f 3f 3f 29 00 01 00 00 00 01 beta (?? ?)......
00000020 02 03 04 05 06 07 08 00 08 82 21 02 00 00 00 00 ........ ..!.....
00000030 00 00 00 00 00 00 00 00 00 00 01 02 03 04 05 06 ........ ........
00000040 07 08 09 0a 0b 0c 0d .......
00000000 23 00 00 01 cf f3 82 00 00 00 00 00 e0 00 00 00 #....... ........
00000010 00 00 e0 01 00 00 00 00 90 37 25 03 00 00 00 00 ........ .7%.....
00000020 98 36 25 03 00 00 00 .6%....
00000047 07 00 00 02 00 00 00 00 00 00 00 ........ ...
00000027 3f 00 00 00 03 43 41 4c 4c 20 53 4e 49 50 50 45 ?....CAL L SNIPPE
00000037 54 53 28 28 27 74 65 73 74 20 f0 9f 98 b9 20 ce TS(('tes t .... .
00000047 b1 ce b2 ce b3 27 29 2c 20 27 66 6f 72 75 6d 5f .....'), 'forum_
00000057 70 6f 73 74 73 27 2c 20 27 77 68 61 74 65 76 65 posts', 'whateve
00000067 72 27 29 r')
00000052 01 00 00 01 01 24 00 00 02 03 64 65 66 00 00 00 .....$.. ..def...
00000062 07 73 6e 69 70 70 65 74 07 73 6e 69 70 70 65 74 .snippet .snippet
00000072 0c 21 00 ff 00 00 00 fe 00 00 00 00 00 05 00 00 .!...... ........
00000082 03 fe 00 00 00 00 11 00 00 04 10 74 65 73 74 20 ........ ...test
00000092 f0 9f 98 b9 20 ce b1 ce b2 ce b3 05 00 00 05 fe .... ... ........
000000A2 00 00 00 00 ....
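The capture can be checked by hand. The sketch below parses the column-definition and row payloads copied from the dump above, assuming the standard protocol 4.1 column-definition layout; `readShortString` is a hypothetical helper written for this example, not part of mysql or mysql2. It suggests the server sends valid UTF-8 and labels the column with charset 33 (utf8_general_ci), so the damage would happen during client-side decoding.

```javascript
// Payloads copied from the capture above, without the 4-byte packet headers.
const columnDef = Buffer.from(
  '03646566' + '00' + '00' + '00' + // catalog "def", empty schema/table/org_table
  '07736e6970706574' +              // name "snippet"
  '07736e6970706574' +              // org_name "snippet"
  '0c' +                            // length of the fixed-size fields
  '2100' +                          // character set, little-endian
  'ff000000' +                      // column length
  'fe' +                            // column type
  '0000' + '00' + '0000',           // flags, decimals, filler
  'hex'
);
const row = Buffer.from('107465737420f09f98b920ceb1ceb2ceb3', 'hex');

// Hypothetical helper: reads a length-encoded string whose length fits in one
// byte (< 251), which is all this capture needs.
function readShortString(buf, offset) {
  const length = buf[offset];
  const start = offset + 1;
  return { bytes: buf.subarray(start, start + length), next: start + length };
}

// Skip catalog, schema, table, org_table, name and org_name.
let offset = 0;
for (let i = 0; i < 6; i++) {
  offset = readShortString(columnDef, offset).next;
}
offset += 1; // skip the 0x0c length byte

const charset = columnDef.readUInt16LE(offset);   // character set id
const columnType = columnDef[offset + 2 + 4];      // after charset and column length
const snippet = readShortString(row, 0).bytes.toString('utf8');

console.log({ charset, columnType, snippet });
```

Decoding the row bytes as plain UTF-8 gives back the original text, which is consistent with the typeCast workaround at the top of the post.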
Edit: added workaround on the top of the post
Edit 2: opened a bug report against sphinx - http://sphinxsearch.com/bugs/view.php?id=2607
Issue Analytics
- Created 7 years ago
- Comments: 19 (12 by maintainers)
Done http://sphinxsearch.com/bugs/view.php?id=2607. But, to be honest, they fix public reports veeery sloooow.
It’s better to find the simplest workaround. Doing
.query("set character_set_results 'utf8mb4'")
after each connection is not cool. An option in createPool would be fine, if it helps.
PS. now we use a temporary kludge - encode astrals as entities 😃
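The entity kludge is not shown in the thread, so the following is only a sketch of what it might look like. The helper names `encodeAstrals` and `decodeAstrals` and the `&#x...;` format are assumptions made for illustration.

```javascript
// Hypothetical helpers; the names and the &#x...; format are assumptions.

// Replace every code point above U+FFFF with a hexadecimal character reference.
function encodeAstrals(text) {
  return text.replace(/[\u{10000}-\u{10FFFF}]/gu, (ch) =>
    '&#x' + ch.codePointAt(0).toString(16) + ';'
  );
}

// Turn astral references back into characters; leave everything else as is.
function decodeAstrals(text) {
  return text.replace(/&#x([0-9a-f]+);/gi, (match, hex) => {
    const codePoint = parseInt(hex, 16);
    if (codePoint < 0x10000 || codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

const encoded = encodeAstrals('test 😹 αβγ');
const decoded = decodeAstrals(encoded);
console.log(encoded);
console.log(decoded);
```

One caveat of this approach: text that already contained such a reference before encoding would also be decoded on the way back, so it only fits data where that cannot happen or does not matter.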
@rlidwka I guess a simple hackish (not very future-proof) way to handle this for you might be this
This would force mysql2 to decode fields with encoding 33 as utf8
Do you know if sphinx server respects connection time encoding flags? What are results if you connect like this: