This is a follow-up to the discussion with Alexander Barkov.
The UTF8MB4 charset can represent all known characters. However, other charsets may have so-called unassigned characters: byte combinations that are not mapped to any particular Unicode character. (These holes are used, e.g., for introducing new characters. For example, the euro sign was initially not present in charsets, and newer versions of those charsets later introduced it.)
As for histogram collection: the histogram collection code will try to convert unassigned characters to UTF-8. This conversion will fail, and the result will not be what we need. This needs to be fixed.
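To make the failure mode concrete, here is a minimal sketch in Python (not the server code). It uses Python's cp1252 codec purely as an example of a charset with holes; the helper names to_utf8 and unassigned_bytes are hypothetical and exist only for this illustration.

```python
from typing import List, Optional


def to_utf8(raw: bytes, charset: str) -> Optional[bytes]:
    """Convert raw bytes from `charset` to UTF-8.

    Returns None when the input contains a byte sequence that the
    charset does not map to any Unicode character.
    """
    try:
        return raw.decode(charset).encode("utf-8")
    except UnicodeDecodeError:
        return None


def unassigned_bytes(charset: str) -> List[int]:
    """List the single-byte values that a single-byte charset leaves unassigned."""
    return [b for b in range(256) if to_utf8(bytes([b]), charset) is None]


if __name__ == "__main__":
    # 0x80 is the euro sign in cp1252, so the conversion succeeds.
    print(to_utf8(b"\x80", "cp1252"))
    # 0x81 is a hole in cp1252, so the conversion fails.
    print(to_utf8(b"\x81", "cp1252"))
    print([hex(b) for b in unassigned_bytes("cp1252")])
```

Any code that assumes a column value can always be converted to UTF-8 has to decide what to do in the None case. The sketch only shows that the case exists; it does not suggest a particular fix.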
Conceptually, [VAR]BINARY data does not represent UTF-8 characters.
Technically, one can store it in UTF-8, because UTF-8 can encode the character (my_wc_t)NUM for every NUM in the [0x00, 0xFF] range.
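The byte-to-character mapping described above can be sketched as follows. This is an illustration in Python under the assumption that each byte NUM is treated as the code point with the same number; the function names binary_to_utf8 and utf8_to_binary are hypothetical and are not taken from the server code.

```python
def binary_to_utf8(data: bytes) -> bytes:
    """Encode each byte NUM as the UTF-8 form of code point NUM (0x00..0xFF)."""
    return "".join(chr(b) for b in data).encode("utf-8")


def utf8_to_binary(encoded: bytes) -> bytes:
    """Reverse binary_to_utf8: map each decoded code point back to one byte."""
    return bytes(ord(ch) for ch in encoded.decode("utf-8"))


if __name__ == "__main__":
    every_byte = bytes(range(256))
    stored = binary_to_utf8(every_byte)
    # Bytes 0x00..0x7F stay one byte long; 0x80..0xFF take two bytes each.
    print(len(stored))
    # The mapping is lossless for every possible byte value.
    print(utf8_to_binary(stored) == every_byte)
```

The round trip is lossless, but the stored form is not the original byte string: values at or above 0x80 grow to two bytes, so the UTF-8 text is only a carrier for the binary data and not a meaningful character string.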