Proper unicode in MySQL (Use utf8mb4 instead of utf8)

TL;DR: Use utf8mb4 instead of utf8 in MySQL/MariaDB to have proper unicode support


This is actually a bit of an older problem: MySQL’s utf8 isn’t UTF-8. utf8 in MySQL only supports three bytes per character, while the real UTF-8 needs up to four bytes. This results in some characters not being supported.

Proper UTF-8 support has been introduced as a workaround in 2010 (version 5.5.3[^1]) with a new character set called utf8mb4.

Mathias Bynens explains in detail what the implications are. In a nutshell: You won’t be able to store all unicode characters, including the (very important) 💩-emoji. I’m not suggestion that you write your own web bot, that throws piles of poop around the web just to see, who used utf8 and has some data loss in the process.

[^1] I cannot find the actual changelog anymore, however this was confirmed by at least three independent articles (or at least they appear to be independent).

Migration suggestions

Remember to repair and optimize all databases after running any of those commands:

mysqlcheck -u root -p --auto-repair --optimize --all-databases

Switch database

Run this sequence for every database “database” you need to switch over

ALTER DATABASE database CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

Switch table

Convert a table db.table

ALTER TABLE `db.table` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Switch column of a table

Convert a single column (column)

ALTER TABLE `db.table` MODIFY COLUMN column varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;