Some comments...
First, The <?xml?> declaration is indeed optional, but it's always better to have it than to not have. I think we can all agree on this. Newer versions of the XML spec make it mandatory.
Secondly, character encoding used in the file must match what is declared in the <?xml?> (or implicitly declared by omission: if there is no encoding specified, then the encoding is being implicitly declared as UTF-8 or UTF-16). More info on encoding is here.
Remember "XML" is a well-defined thing. If you're going to claim to output "XML" then you have to follow the XML rules. Note that a "character" in XML is by definition a Unicode character.
In particular this statement worries me:
Shouldn't the encoding of the dump results match that of the database? If the database encoding is utf16, then the hard-coded utf8 is wrong.
This is a reasonable thing to do but only if it's consistent with the <?xml?> declaration. For example, it would be totally wrong to output an <?xml?> declaration encoding specified as UTF-8, yet encode the characters as UTF-16.
If you want to output UTF-16 that's fine but you must then include a byte-order mark (see reference above).
But mysql client only knows MariaDB name of the character set, there must be some mapping to find a corresponding XML encoding name
Yes - the encoding names must follow what is allowed for XML. What MySQL likes to call them is irrelevant.
So to summarize this:
- Including the <?xml?> declaration is a good thing and we should keep doing it
- As long as the <?xml?> declaration does not specify an encoding and there's no byte-order mark, then the file must be encoded in UTF-8
- Using a different encoding, e.g., same as the database itself, is reasonable, but if this is done, the <?xml?> declaration must explicitly specify that encoding using the proper "XML" name for it
Questions:
- Since currently there is no explicit encoding declared, can someone verify that MySQL currently does actually output XML using UTF-8 encoding, regardless of the encoding of the database?
- Are there any MySQL-supported encodings that can't 100% be mapped into UTF-8? If so then they would have to be output using a different encoding to preserve all the characters.
Hmm, well if it's possible for MySQL to output something that looks like XML but contains characters that are not encoded consistently with the <?xml?> declaration, then it's not outputting XML, it's outputting something else, in which case MySQL should be fixed to conform to the standard, or else stop calling the output "XML".
I'm not familiar with the internals, but from a coding point of view there may be a missing component. E.g., each MySQL encoding is going to need to know how to convert its character data to UTF-8 or UTF-16 (or perhaps throw an error if not possible). Then this layer would be applied before writing the XML output, etc.