  1. MariaDB Server
  2. MDEV-21228

Document mariadb-conv

    Details

    • Type: Task
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Fix Version/s: 10.5
    • Component/s: OTHER
    • Labels:
      None

      Description

      As part of MDEV-17088, a new tool, mariadb-conv, was implemented.

      mariadb-conv is a character set conversion utility similar to iconv or uconv.

      As of 10.5.1, mariadb-conv understands the following options:

        -f, --from=name     Specifies the encoding of the input.
        -t, --to=name       Specifies the encoding of the output.
        -c, --continue      Silently ignore conversion errors.
        --delimiter=name    Treat the specified characters as delimiters.
      

      The options -f and -t accept MariaDB character set names (the same names used in the SQL clause CHARACTER SET or in command-line options such as --default-character-set).

      Examples:

      • This command converts the file file.latin1.txt from latin1 to utf8.

        mariadb-conv -f latin1 -t utf8 file.latin1.txt
        

      • This command also converts file.latin1.txt from latin1 to utf8, but reads the input data from stdin.

        mariadb-conv -f latin1 -t utf8 < file.latin1.txt
        

      • This command uses mariadb-conv in a pipe:

        echo test | ./mariadb-conv -f utf8 -t ucs2 >file.ucs2.txt
        

      By default, mariadb-conv exits as soon as it encounters a conversion problem, for example:

      • the input byte sequence is not valid in the source character set
      • the character cannot be converted to the target character set

      The option -c makes mariadb-conv ignore such errors and write a question mark '?' in place of each byte of a bad input sequence or each unconvertible character.
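
      The difference between the default behaviour and -c can be illustrated with a small model. This is a sketch in Python, not mariadb-conv itself: the helper name convert is hypothetical, Python's own cp850 codec stands in for MariaDB's, and only the "character cannot be converted to the target character set" case is modelled.

```python
def convert(text: str, target: str, ignore_errors: bool = False) -> bytes:
    """Model of strict conversion versus conversion with -c (hypothetical helper)."""
    # "replace" writes '?' for each character the target cannot represent;
    # "strict" raises UnicodeEncodeError at the first such character.
    errors = "replace" if ignore_errors else "strict"
    return text.encode(target, errors=errors)


# With error ignoring, the Cyrillic letters become question marks.
print(convert("ß абв", "cp850", ignore_errors=True))

# Without it, the conversion stops at the first unconvertible character.
try:
    convert("ß абв", "cp850")
except UnicodeEncodeError as exc:
    print("conversion stopped:", exc.reason)
```

      In this model the strict call corresponds to mariadb-conv exiting on the first problem, and the ignore_errors=True call corresponds to running it with -c.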

      The option --delimiter=... makes mariadb-conv treat the specified characters as delimiters rather than the data to convert, so the input is treated as a combination of:

      • data chunks, which are converted according to the -f and -t options
      • delimiters, which are not converted and are copied from the input to the output as is.
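
      The chunk-and-delimiter model described above can be sketched as follows. This is an illustration in Python under stated assumptions, not the tool's actual implementation: the function names are hypothetical, and a latin1-to-utf8 conversion stands in for whatever -f and -t specify.

```python
from typing import Callable


def convert_with_delimiters(data: bytes, delimiters: bytes,
                            convert: Callable[[bytes], bytes]) -> bytes:
    """Convert data chunks with `convert`; copy delimiter bytes through as is."""
    out = bytearray()
    chunk = bytearray()
    for byte in data:
        if byte in delimiters:
            # A delimiter ends the current chunk: convert it, then copy the delimiter.
            if chunk:
                out += convert(bytes(chunk))
                chunk.clear()
            out.append(byte)
        else:
            chunk.append(byte)
    if chunk:
        # Convert whatever data follows the last delimiter.
        out += convert(bytes(chunk))
    return bytes(out)


def latin1_to_utf8(chunk: bytes) -> bytes:
    # Stand-in for the conversion selected by -f latin1 -t utf8.
    return chunk.decode("latin-1").encode("utf-8")


print(convert_with_delimiters(b"caf\xe9.txt\n", b".\n", latin1_to_utf8))
```

      Here the dot and the newline are never passed to the converter; only "café" (as latin1 bytes) and "txt" are.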

      Using mariadb-conv to list a MariaDB data directory

      As a side effect, mariadb-conv can be used to list MariaDB data directories in a human-readable form.

      Suppose you have created the following tables:

      SET NAMES utf8;
      CREATE OR REPLACE TABLE t1 (a INT);
      CREATE OR REPLACE TABLE ß (a INT);
      CREATE OR REPLACE TABLE абв (a INT);
      CREATE OR REPLACE TABLE 桌子 (a INT);
      

      The above script makes the server create the following files in the MariaDB data directory:

      @1j.frm
      @1j.ibd
      @684c@5b50.frm
      @684c@5b50.ibd
      @g0@h0@i0.frm
      @g0@h0@i0.ibd
      t1.frm
      t1.ibd
      

      It is not obvious which file stores which table, because MariaDB uses a special table-name-to-file-name encoding.

      This command on Linux (assuming a UTF-8 console):

      ls | mariadb-conv -f filename -t utf8 --delimiter=".\n"
      

      can print the table list in a readable way:

      ß.frm
      ß.ibd
      桌子.frm
      桌子.ibd
      абв.frm
      абв.ibd
      t1.frm
      t1.ibd
      

      Note that the option --delimiter=".\n" is needed to make mariadb-conv treat the dot character (which separates the encoded table name from the file extension) and the newline character (which separates lines) as delimiters rather than as data to convert. Without it, the conversion would fail.
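
      To give a feel for the file name encoding, the sketch below decodes one of its forms in Python. It rests on an assumption drawn from the listing above: a sequence of '@' followed by four hexadecimal digits stands for one Unicode code point, so @684c@5b50 maps to 桌子. The shorter forms such as @1j and @g0 use a different, table-based scheme that this sketch deliberately leaves untouched. The function name is hypothetical, and the sketch is not safe for arbitrary names; mariadb-conv -f filename remains the reliable way to decode them.

```python
import re

# '@' followed by exactly four lowercase hex digits, e.g. "@684c".
HEX_ESCAPE = re.compile(r"@([0-9a-f]{4})")


def decode_hex_escapes(name: str) -> str:
    """Decode only the @hhhh escapes; every other part of the name is kept as is."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


print(decode_hex_escapes("@684c@5b50"))  # the CJK table name from the listing
print(decode_hex_escapes("@g0@h0@i0"))   # compact form, left unchanged here
```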

      Windows users can use the following command to list the data directory in the ANSI text console:

      dir /b | mariadb-conv -c -f filename -t cp850 --delimiter=".\r\n"
      

      Note:

      • the option -t cp850 assumes a Western machine
      • the option -c is needed to ignore conversion errors for the Cyrillic and CJK characters, which cp850 cannot represent
      • --delimiter additionally needs the carriage return character \r, because lines in the dir /b output end with \r\n

              People

              Assignee:
              greenman Ian Gilfillan
              Reporter:
              bar Alexander Barkov