MySQL: Use of utf8_bin collation prevents use of emojis and normal Unicode characters
Fields that are SAUnicode, SAUnicodeLarge, SAUnicodeXL, and SAText have overrides for MySQL which set <size> COLLATE utf8_bin. But utf8 is problematic: in MySQL it's actually 3-byte UTF-8, not standard 4-byte UTF-8, which MySQL calls utf8mb4. Using utf8 means these fields can't store emojis or any other characters outside the Basic Multilingual Plane (including some less common CJK characters).
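To make the byte-width problem concrete, here is a minimal Python sketch (not code from this project). It shows that an emoji encodes to 4 bytes in UTF-8, which is why a 3-byte charset rejects it. The helper name needs_utf8mb4 and the sample display names are hypothetical, chosen only for illustration.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character lies outside the Basic Multilingual Plane.

    Code points above U+FFFF take 4 bytes in UTF-8, so they do not fit in
    MySQL's 3-byte utf8 charset. (Hypothetical helper, for illustration.)
    """
    return any(ord(ch) > 0xFFFF for ch in text)


# Hypothetical display names: ASCII, Latin with accent, CJK, and one with an emoji.
samples = ["example", "naïve", "日本語", "Example User 🎉"]

for name in samples:
    # Width in bytes of the widest single character once encoded as UTF-8.
    widest = max(len(ch.encode("utf-8")) for ch in name)
    print(f"{name!r}: widest char = {widest} byte(s), "
          f"needs utf8mb4 = {needs_utf8mb4(name)}")
```

A check like this could be used to scan existing data for values that would fail on a utf8 column, though fixing the collation is the actual remedy.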
These fields should be switched to use the utf8mb4_bin collation, which means they'll have a utf8mb4 charset. Using utf8mb4 is already the recommended config as of e6e0a10a.
Wikimedia's downstream task is https://phabricator.wikimedia.org/T282271, where someone had an emoji in their display name, causing the import to fail.
Edited by legoktm