Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34959

Add an OUTFILE option to dump binary data using HEX encoding

    XMLWordPrintable

Details

    Description

      SELECT INTO OUTFILE handles very badly a mixuture of multibyte character data and binary data. See MDEV-34883 as and example.

      With utf8mb4 being the default character set since MDEV-19123 (11.6.0) the problem becomes even more critical.

      LOAD DATA INFILE uses @@character_set_database, which is now typically utf8mb4, as the character set of the loaded file by default.

      In order for text and binary data to co-exists in a dump file, let's add a way to dump binary data using HEX representation.
      Let's also fix LOAD DATA INFILE to detect and decode a hex representation.

      Let's add a new option as follows:

      SELECT ... INTO OUTFILE 'file_name'
              [CHARACTER SET charset_name]
              [binary_encoding]
              [export_options]
              
      binary_encoding: BINARY ENCODED USING HEX
      

      If this option is specified then columns of the data types BINARY, VARBINARY, BLOB, Geometry (and its variants) will be dumped using HEX encoding.

      Columns with hex encoded will have the "\X" prefix followed by HEX data.

      Let's also change LOAD DATA INFILE to automatically decode HEX representation if a field data in the file starts with "\X".

      This does not conflict with previous versions because SELECT INTO OUTFILE cannot produce a sequence of "\X" in the very beginning of a field data.

      For example, this script:

      CREATE TABLE t1 (
        name VARCHAR(10),
        p point NOT NULL
      ) CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
      INSERT INTO t1 VALUES ('point1', GeomFromText('POINT(37.646944 -75.761111)'));
      SELECT * INTO OUTFILE '$MYSQLTEST_VARDIR/tmp/t1' BINARY ENCODED USING HEX FROM t1;
      

      will produce a file with this content:

      point1 \X000000000101000000F52B9D0FCFD242400E68E90AB6F052C0
      

      Let's also change this command:

      mysqldump --tab --hex-blob
      

      to add the BINARY ENCODED USING HEX clause into the generated SELECT INTO OUTFILE.

      The change should possibly go to as early version as possible (10.5?), so users can dump their data in a safe way.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              bar Alexander Barkov
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.