Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-3442

NULL values are inserted into utf8 collate utf8_unicode_ci VARCHAR columns in batchinsert mode

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.3, 1.2.5
    • Fix Version/s: 22.08
    • Component/s: cpimport
    • Labels:
      None

      Description

      Consider the following test case:

      openxs@ao756:~$ mcsmysql -uroot test
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 18
      Server version: 10.3.16-MariaDB-log Columnstore 1.2.5-1
       
      Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      MariaDB [test]> use mysql
      Database changed
      MariaDB [mysql]> drop database test;
      Query OK, 5 rows affected (12,910 sec)
       
      MariaDB [mysql]> create database test charset=utf8;
      Query OK, 1 row affected (0,000 sec)
       
      MariaDB [mysql]> use test
      Database changed
      MariaDB [test]> create table t_c2_inno(
          -> ADDR_ID int NOT NULL,
          -> POST_CD varchar (30) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
          -> ADDR1 varchar (300) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
          -> ADDR2 varchar (300) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
          -> ADDR3 varchar (300) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
          -> UPDT_DT varchar (14) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL
      ) engine=innodb;
          -> ) engine=innodb;
      Query OK, 0 rows affected (0,520 sec)
       
      MariaDB [test]> insert into t_c2_inno(ADDR_ID,POST_CD,ADDR1,ADDR2,ADDR3,UPDT_DT)
          -> values('1000000001','07282','서울특별시 영등포구 선유서로','50, LG전자강 서빌딩 4층 (문래동6가)','','20170717112135');
      Query OK, 1 row affected (0,043 sec)
       
      MariaDB [test]> create table t_c2_cs(
          -> ADDR_ID int NOT NULL,
          -> POST_CD varchar (30) DEFAULT NULL,
          -> ADDR1 varchar (300) DEFAULT NULL,
      ADDR2 varchar (300) DEFAULT NULL,
          -> ADDR2 varchar (300) DEFAULT NULL,
          -> ADDR3 varchar (300) DEFAULT NULL,
          -> UPDT_DT varchar (14) DEFAULT NULL)
          -> engine=columnstore DEFAULT CHARSET=utf8;
      Query OK, 0 rows affected (4,878 sec)
       
      MariaDB [test]> show create table t_c2_cs\G
      *************************** 1. row ***************************
             Table: t_c2_cs
      Create Table: CREATE TABLE `t_c2_cs` (
        `ADDR_ID` int(11) NOT NULL,
        `POST_CD` varchar(30) DEFAULT NULL,
        `ADDR1` varchar(300) DEFAULT NULL,
        `ADDR2` varchar(300) DEFAULT NULL,
        `ADDR3` varchar(300) DEFAULT NULL,
        `UPDT_DT` varchar(14) DEFAULT NULL
      ) ENGINE=Columnstore DEFAULT CHARSET=utf8
      1 row in set (0,001 sec)
       
      MariaDB [test]> insert into t_c2_cs (select ADDR_ID,POST_CD,ADDR1,ADDR2,ADDR3,UPDT_DT from t_c2_inno);
      Query OK, 1 row affected (2,440 sec)
      Records: 1  Duplicates: 0  Warnings: 0
       
      MariaDB [test]> select * from t_c2_cs;
      +------------+---------+-------------------------------------------+-----------------------------------------------+-------+----------------+
      | ADDR_ID    | POST_CD | ADDR1                                     | ADDR2                                         | ADDR3 | UPDT_DT        |
      +------------+---------+-------------------------------------------+-----------------------------------------------+-------+----------------+
      | 1000000001 | 07282   | 서울특별시 영등포구 선유서로              | 50, LG전자강서빌딩 4층 (문래동6가)            | NULL  | 20170717112135 |
      +------------+---------+-------------------------------------------+-----------------------------------------------+-------+----------------+
      1 row in set (0,417 sec)
       
      MariaDB [test]> show session variables like 'infinidb_use_import_for_batchinsert';
      +-------------------------------------+-------+
      | Variable_name                       | Value |
      +-------------------------------------+-------+
      | infinidb_use_import_for_batchinsert | ON    |
      +-------------------------------------+-------+
      1 row in set (0,002 sec)
      

      So, with default batnchinsert mode ON and default collation used for the database and Columnsotre table we can successfully INSERT data selected form the InnoDB table.

      Now, if we force to use utf8_unicode_ci collation for a new table, we get NULL values inserted without any warning:

      MariaDB [test]> alter database test default character set utf8 collate utf8_unicode_ci;
      Query OK, 1 row affected (0,001 sec)
       
      MariaDB [test]> create table t_c2_ncse( ADDR_ID int  NOT NULL, POST_CD varchar (30)    DEFAULT NULL, ADDR1 varchar (300)  DEFAULT NULL, ADDR2 varchar (300) DEFAULT NULL, ADDR3 varchar (300)  DEFAULT NULL, UPDT_DT varchar (14) DEFAULT NULL ) engine=Columnstore;
      Query OK, 0 rows affected (4,665 sec)
       
      MariaDB [test]> insert into t_c2_ncse (select ADDR_ID,POST_CD,ADDR1,ADDR2,ADDR3,UPDT_DT from t_c2_inno);
      Query OK, 1 row affected (2,577 sec)
      Records: 1  Duplicates: 0  Warnings: 0
       
      MariaDB [test]> select * from  t_c2_ncse;                                       +------------+---------+-------+-------+-------+---------+
      | ADDR_ID    | POST_CD | ADDR1 | ADDR2 | ADDR3 | UPDT_DT |
      +------------+---------+-------+-------+-------+---------+
      | 1000000001 | 07282   | NULL  | NULL  | NULL  | NULL    |
      +------------+---------+-------+-------+-------+---------+
      1 row in set (0,443 sec)
       
      MariaDB [test]> delete from t_c2_ncse;
      Query OK, 1 row affected (1,211 sec)
       
      MariaDB [test]> set session infinidb_use_import_for_batchinsert=OFF;
      Query OK, 0 rows affected (0,016 sec)
       
      MariaDB [test]> insert into t_c2_ncse (select ADDR_ID,POST_CD,ADDR1,ADDR2,ADDR3,UPDT_DT from t_c2_inno);
      Query OK, 1 row affected (1,869 sec)
      Records: 1  Duplicates: 0  Warnings: 0
       
      MariaDB [test]> select * from  t_c2_ncse;
      +------------+---------+-------------------------------------------+-----------------------------------------------+-------+----------------+
      | ADDR_ID    | POST_CD | ADDR1                                     | ADDR2                                         | ADDR3 | UPDT_DT        |
      +------------+---------+-------------------------------------------+-----------------------------------------------+-------+----------------+
      | 1000000001 | 07282   | 서울특별시 영등포구 선유서로              | 50, LG전자강서빌딩 4층 (문래동6가)            | NULL  | 20170717112135 |
      +------------+---------+-------------------------------------------+-----------------------------------------------+-------+----------------+
      1 row in set (1,057 sec)
       
      MariaDB [test]> show create table t_c2_ncse\G
      *************************** 1. row ***************************
             Table: t_c2_ncse
      Create Table: CREATE TABLE `t_c2_ncse` (
        `ADDR_ID` int(11) NOT NULL,
        `POST_CD` varchar(30) COLLATE utf8_unicode_ci DEFAULT NULL,
        `ADDR1` varchar(300) COLLATE utf8_unicode_ci DEFAULT NULL,
        `ADDR2` varchar(300) COLLATE utf8_unicode_ci DEFAULT NULL,
        `ADDR3` varchar(300) COLLATE utf8_unicode_ci DEFAULT NULL,
        `UPDT_DT` varchar(14) COLLATE utf8_unicode_ci DEFAULT NULL
      ) ENGINE=Columnstore DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
      1 row in set (0,000 sec)
      

      Note that when we switch batchinsert, we get data inserted properly. For me it means that there is a bug with utf8_unicode_ci collation handling somewhere in cpimport or batchinsert mode.

        Attachments

          Activity

            People

            Assignee:
            David.Hall David Hall
            Reporter:
            valerii Valerii Kravchuk
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Dates

              Created:
              Updated:

                Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.