TL;DR: Use utf8mb4
instead of utf8
in MySQL/MariaDB to have proper unicode support
This is actually a bit of an older problem: MySQL’s utf8
isn’t UTF-8. utf8
in MySQL only supports three bytes per character, while the real UTF-8 needs up to four bytes. This results in some characters not being supported.
Proper UTF-8 support has been introduced as a workaround in 2010 (version 5.5.3[^1]) with a new character set called utf8mb4
.
Mathias Bynens explains in detail what the implications are. In a nutshell: You won’t be able to store all unicode characters, including the (very important) 💩-emoji. I’m not suggestion that you write your own web bot, that throws piles of poop around the web just to see, who used utf8
and has some data loss in the process.
[^1] I cannot find the actual changelog anymore, however this was confirmed by at least three independent articles (or at least they appear to be independent).
Migration suggestions
Remember to repair and optimize all databases after running any of those commands:
mysqlcheck -u root -p --auto-repair --optimize --all-databases
Switch database
Run this sequence for every database “database” you need to switch over
ALTER DATABASE database CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
Switch table
Convert a table db.table
ALTER TABLE `db.table` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Switch column of a table
Convert a single column (column
)
ALTER TABLE `db.table` MODIFY COLUMN column varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;