Comment 6 for bug 925781

Brian Fraser (fraserbn) wrote :

Augh, I wrote a long reply here but then Firefox crashed :/

While I've not been able to reproduce this, I've finally figured out what is causing it. The "Incorrect string value" error shows up if you try inserting non-BMP characters into a column with charset utf8, because MySQL's utf8 is stupid and broken: it only covers the BMP, and text and varchar columns are truncated on insert at the first astral character, so it's not even a legal UTF-8 encoding. What I don't understand is why the loose- version works; maybe it disables encoding errors? (I'm told that it shouldn't, but...)
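
To check whether a given value would trip this, here is a minimal Python sketch that finds the astral (non-BMP) characters in a string, i.e. the code points above U+FFFF, which need 4 bytes in UTF-8 and so don't fit in MySQL's 3-byte utf8. The function names are mine, purely for illustration; nothing here is part of the toolkit.

```python
def astral_chars(text):
    """Return the characters in text that lie outside the BMP (above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def fits_mysql_utf8(text):
    """True if text contains only BMP characters, which MySQL's utf8 can store."""
    return not astral_chars(text)


if __name__ == "__main__":
    sample = "key-\U0001F600-1"          # contains one astral character
    print(astral_chars(sample))          # the offending character(s)
    print(fits_mysql_utf8(sample))       # whether a utf8 column could hold it
    print(len("\U0001F600".encode("utf-8")))  # bytes needed in real UTF-8
```

Running something like this over the boundary values of the chunked column should show whether astral characters are what's being rejected.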
The good news is that there are a few workarounds (I think. I've only tried the third):

Since you're on MySQL 5.5.3 or later, you can change the default-character-set to utf8mb4, which is the non-broken UTF-8 encoding. I don't think that all of the toolkit deals with utf8mb4 yet (pt-archiver won't recognize it as a valid charset for the --file option), but I'll update it soon so that it will. That should be about it.

You could also alter percona.checksums so that the lower_boundary and upper_boundary columns are CHARSET utf8mb4.
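
As a rough sketch of that second option, the snippet below just builds the ALTER TABLE statements you'd run. It assumes the checksum table is percona.checksums and that both boundary columns are TEXT; check your own table definition (SHOW CREATE TABLE) and adjust the type before running anything. The helper name is made up for this example.

```python
def boundary_alter_statements(table="percona.checksums",
                              columns=("lower_boundary", "upper_boundary"),
                              column_type="TEXT"):
    """Build one ALTER TABLE statement per boundary column.

    column_type is an assumption: MODIFY needs the full column definition,
    so it must match what the table actually uses.
    """
    return [
        f"ALTER TABLE {table} MODIFY {col} {column_type} CHARACTER SET utf8mb4"
        for col in columns
    ]


if __name__ == "__main__":
    for statement in boundary_alter_statements():
        print(statement + ";")
```

It only prints the statements, so you can look them over and run them by hand.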

Otherwise you could try the code in https://code.launchpad.net/~percona-toolkit-dev/percona-toolkit/possible-fix-925781-932327, which I've just reworked to deal with this (and other UTF-8 errors). Do keep in mind that it will invalidate your previous percona.checksums table, since it uses a different serialization for the lower and upper boundary columns.