[MCOL-4222] Import from Table fails or writing empty values Created: 2020-07-30  Updated: 2022-05-06  Resolved: 2020-12-29

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 1.0.0, 1.5.2
Fix Version/s: 5.5.1

Type: Bug Priority: Critical
Reporter: Christian2 Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 2
Labels: None
Environment:

CentOS 7


Issue Links:
Relates
relates to MDEV-24298 SELECT_LEX::find_select_handler() fai... Closed
relates to MDEV-23778 Derived table handler loses data on c... Closed

 Description   

Hello.

After our latest upgrade from ColumnStore 1.2 to Server 1.5, a typical INSERT ... SELECT between two tables is not working anymore.

-- Test with a derived subquery: ERROR when there is no LIMIT in the derived table

insert into test_target_innodb (C_, A_, B_, D_, E_)
select C_, A_, B_, D_, E_
FROM (select 'text' as C_, A_, B_, D_, E_
      FROM test_base_columnstore  -- limit 5000
     ) derived;

ERROR: 
CAL0055: ERROR: ExeMgr has caught an exception. InetStreamSocket::readToMagic(): I/O error2.1: err = -1 e = 104: Connection reset by peer

Additional:
After repeating the insert a couple of times, the insert goes wrong, writing empty or NULL values without raising any warning or error, just writing garbage into the target table:

    select * from test_target_innodb order by rand();
    # row_id, C_, A_, B_, D_, E_
    '28904', '', NULL, NULL, NULL, NULL
    '12174', '', NULL, NULL, NULL, NULL
    '7356',  '', NULL, NULL, NULL, NULL

With a smaller result set (e.g. with a LIMIT) it works. It also works when both tables are InnoDB tables. However, the full-table insert is only 50K rows and a couple of columns, which is really nothing outstanding.

So far the errors come up with the default parameters from the upgrade. We've tried a lot of parameter variations (read_buffer_size, max_allowed_packet, etc.), unfortunately with no success.

The alternative approach without a derived table works fine with all tested larger result sets:

-- Test: works fine
insert into test_target_innodb (C_, A_, B_, D_, E_)
select 'text' as C_, A_, B_, D_, E_
FROM test_base_columnstore;

If source and target tables are both ColumnStore, we get the same result. The log entries:

Jul 30 19:14:05 127 cpimport.bin[20874]: 05.085591 |0|0|0| I 34 CAL0086: Initiating BulkLoad: -e 0 -s #007 -E #021 -R /tmp/columnstore_tmp_files/BrmRpt063019140520842.rpt -m 1 -P pm1-20842 -T SYSTEM -u7a2f86ab-8251-4753-9860-b0da6541a4df uii5 test_target_innodb
Jul 30 19:14:05 127 cpimport.bin[20874]: 05.165076 |0|0|0| I 34 CAL0081: Start BulkLoad: JobId-5359; db-uii5
Jul 30 19:14:07 127 ExeMgr[15996]: 07.310363 |7|0|0| D 16 CAL0042: End SQL statement
Jul 30 19:14:07 127 ExeMgr[15996]: 07.310529 |7|0|0| C 16 CAL0055: ERROR: ExeMgr has caught an exception. InetStreamSocket::readToMagic(): I/O error2.1: err = -1 e = 104: Connection reset by peer
Jul 30 19:14:07 127 writeenginesplit[20842]: 07.310631 |0|0|0| I 33 CAL0000: Send EOD message to All PMs
Jul 30 19:14:07 127 writeengineserver[15990]: 07.315008 |0|0|0| D 32 CAL0000: 5671 : onReceiveEOD : child ID = 20874
Jul 30 19:14:07 127 writeengineserver[15990]: 07.315128 |0|0|0| D 32 CAL0000: 5671 : Message Queue is empty; Stopping CF Thread
Jul 30 19:14:07 127 cpimport.bin[20874]: 07.430849 |0|0|0| I 34 CAL0083: BulkLoad: JobId-5359; finished loading table uii5.test_target_innodb; 0 rows inserted
Jul 30 19:14:07 127 cpimport.bin[20874]: 07.436008 |0|0|0| I 34 CAL0082: End BulkLoad: JobId-5359; status-SUCCESS
Jul 30 19:14:07 127 writeengineserver[15990]: 07.451677 |0|0|0| I 32 CAL0000: 5671 : cpimport exit on success

This test indicates that we don't have a problem with the structure of the data or with memory in general. It therefore seems to be a bug in result streaming from ColumnStore, or a parameter we haven't found.

Your help would be really appreciated.

Thanks a lot in advance.

-----------------------
Procedure to reproduce the data used above:

 CREATE TABLE `test_base_columnstore_stage` ( 
  `C_` varchar(100) DEFAULT NULL,
  `A_` varchar(100) DEFAULT NULL,
  `B_` varchar(100) DEFAULT NULL,
  `D_` varchar(100) DEFAULT NULL,
  `E_` varchar(100) DEFAULT NULL  
) ENGINE=innodb DEFAULT CHARSET=utf8 ;
 
 CREATE TABLE `test_base_columnstore` ( 
  `C_` varchar(100) DEFAULT NULL,
  `A_` varchar(100) DEFAULT NULL,
  `B_` varchar(100) DEFAULT NULL,
  `D_` varchar(100) DEFAULT NULL,
  `E_` varchar(100) DEFAULT NULL  
) ENGINE=columnstore DEFAULT CHARSET=utf8 ;
 
 
DELIMITER $$
CREATE PROCEDURE generate_demo()
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < 50000 DO
    INSERT INTO `test_base_columnstore_stage` (C_,A_,B_,D_,E_) VALUES (
      ROUND((RAND() * 100000),0), ROUND((RAND() * 100000),0), ROUND((RAND() * 100000),0), ROUND((RAND() * 100000),0),  
      1
    );
    SET i = i + 1;
  END WHILE;
  insert into test_base_columnstore select * from test_base_columnstore_stage;
END$$
DELIMITER ;
 
CALL generate_demo();



 Comments   
Comment by David Hall (Inactive) [ 2020-07-30 ]

It's refreshing to be given the DDL and data insertion method to properly attempt to reproduce the issue. Thanks so much!

Comment by Alexander Barkov [ 2020-09-21 ]

Reproducible with the following script

-- Creating tables
DROP TABLE IF EXISTS t1ib;
DROP TABLE IF EXISTS t1cs;
 
CREATE TABLE t1ib ( 
  `C_` varchar(100) DEFAULT NULL,
  `A_` varchar(100) DEFAULT NULL,
  `B_` varchar(100) DEFAULT NULL,
  `D_` varchar(100) DEFAULT NULL,
  `E_` varchar(100) DEFAULT NULL  
) ENGINE=innodb DEFAULT CHARSET=utf8 ;
 
 
CREATE TABLE t1cs ( 
  `C_` varchar(100) DEFAULT NULL,
  `A_` varchar(100) DEFAULT NULL,
  `B_` varchar(100) DEFAULT NULL,
  `D_` varchar(100) DEFAULT NULL,
  `E_` varchar(100) DEFAULT NULL  
) ENGINE=columnstore DEFAULT CHARSET=utf8 ;

-- Populating tables
DELIMITER $$
BEGIN NOT ATOMIC
  DECLARE i INT DEFAULT 0;
  TRUNCATE TABLE t1ib;
  TRUNCATE TABLE t1cs;
  START TRANSACTION;
  WHILE i < 50000 DO
    INSERT INTO t1ib (C_,A_,B_,D_,E_) VALUES (
             ROUND((RAND() * 100000),0),
             ROUND((RAND() * 100000),0),
             ROUND((RAND() * 100000),0),
             ROUND((RAND() * 100000),0),  
             1);
    SET i = i + 1;
  END WHILE;
  COMMIT;
END
$$
DELIMITER ;
 
INSERT INTO t1cs SELECT * FROM t1ib;

-- Make sure all records are there and C_ is never NULL
SELECT COUNT(*), SUM(t1ib.C_ IS NULL) FROM t1ib;
SELECT COUNT(*), SUM(t1cs.C_ IS NULL) FROM t1cs;

+----------+----------------------+
| COUNT(*) | SUM(t1ib.C_ IS NULL) |
+----------+----------------------+
|    50000 |                    0 |
+----------+----------------------+

+----------+----------------------+
| COUNT(*) | SUM(t1cs.C_ IS NULL) |
+----------+----------------------+
|    50000 |                    0 |
+----------+----------------------+

Tables were populated as expected.

-- Copy DERIVED t1cs to IB
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1ib;
INSERT INTO t2 (C_, A_, B_, D_, E_)
  SELECT C_, A_, B_, D_, E_  FROM
    (SELECT 'text' as C_, A_, B_, D_ ,E_ FROM t1cs) derived;
SELECT COUNT(*), SUM(C_ IS NULL) FROM t2;
DROP TABLE t2;

+----------+-----------------+
| COUNT(*) | SUM(C_ IS NULL) |
+----------+-----------------+
|    50000 |               0 |
+----------+-----------------+

Looks good.

-- Copy DERIVED t1ib to CS
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1cs;
INSERT INTO t2 (C_, A_, B_, D_, E_)
  SELECT C_, A_, B_, D_, E_  FROM
    (SELECT 'text' as C_, A_, B_, D_ ,E_ FROM t1ib) derived;
SELECT COUNT(*), SUM(C_ IS NULL) FROM t2;
DROP TABLE t2;

+----------+-----------------+
| COUNT(*) | SUM(C_ IS NULL) |
+----------+-----------------+
|    50000 |               0 |
+----------+-----------------+

Looks good.

-- Copy DERIVED t1cs to CS
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1cs;
INSERT INTO t2 (C_, A_, B_, D_, E_)
  SELECT C_, A_, B_, D_, E_  FROM
    (SELECT 'text' as C_, A_, B_, D_ ,E_ FROM t1cs) derived;
SELECT COUNT(*), SUM(C_ IS NULL) FROM t2;
DROP TABLE t2;

+----------+-----------------+
| COUNT(*) | SUM(C_ IS NULL) |
+----------+-----------------+
|    50000 |           50000 |
+----------+-----------------+

Looks wrong. C_ is not expected to be NULL in any records.

-- Copy t1cs -> CS (not using derived table syntax)
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1cs;
INSERT INTO t2 (C_, A_, B_, D_, E_)
  SELECT 'text' as C_, A_, B_, D_ ,E_ FROM t1cs;
SELECT COUNT(*), SUM(C_ IS NULL) FROM t2;
DROP TABLE t2;

+----------+-----------------+
| COUNT(*) | SUM(C_ IS NULL) |
+----------+-----------------+
|    50000 |               0 |
+----------+-----------------+

Looks OK.

Comment by Alexander Barkov [ 2020-09-21 ]

Interesting observation: with a fairly small LIMIT copying from DERIVED t1cs to CS works fine:

DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1cs;
INSERT INTO t2 (C_, A_, B_, D_, E_)
  SELECT C_, A_, B_, D_, E_  FROM
    (SELECT 'text' as C_, A_, B_, D_ ,E_ FROM t1cs LIMIT 10000) derived;
SELECT COUNT(*), SUM(C_ IS NULL) FROM t2;
DROP TABLE t2;

+----------+-----------------+
| COUNT(*) | SUM(C_ IS NULL) |
+----------+-----------------+
|    10000 |               0 |
+----------+-----------------+

However, when the LIMIT gets larger, it works badly again:

DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1cs;
INSERT INTO t2 (C_, A_, B_, D_, E_)
  SELECT C_, A_, B_, D_, E_  FROM
    (SELECT 'text' as C_, A_, B_, D_ ,E_ FROM t1cs LIMIT 15000) derived;
SELECT COUNT(*), SUM(C_ IS NULL) FROM t2;
DROP TABLE t2;

+----------+-----------------+
| COUNT(*) | SUM(C_ IS NULL) |
+----------+-----------------+
|    15000 |           15000 |
+----------+-----------------+

The exact threshold may vary.
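A later comment ties the failure to the point where the internal HEAP temporary table spills to Aria. If that is right, the threshold should be movable by shrinking the in-memory temp-table limits. A hypothetical probe (an assumption, not verified in this ticket):

```sql
-- Assumption: the threshold tracks the MEMORY->Aria spill of the
-- internal temporary table, governed by these session variables.
SET SESSION tmp_table_size = 1024 * 1024;
SET SESSION max_heap_table_size = 1024 * 1024;
-- With the limits lowered, even the LIMIT 10000 case above would be
-- expected to misbehave, if the spill is indeed the trigger.
```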

Comment by Alexander Barkov [ 2020-09-21 ]

Also repeatable with this simplified INSERT..SELECT statement:

DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1cs;
INSERT INTO t2
  SELECT *  FROM (SELECT * FROM t1cs) derived;
SELECT COUNT(*), SUM(C_ IS NULL) FROM t2;
DROP TABLE t2;

+----------+-----------------+
| COUNT(*) | SUM(C_ IS NULL) |
+----------+-----------------+
|    50000 |           50000 |
+----------+-----------------+

Comment by Alexander Barkov [ 2020-09-21 ]

Reproducible with the following script, with a table having only one column, but with more rows.

# Create and populate tables
DROP TABLE IF EXISTS t1ib;
DROP TABLE IF EXISTS t1cs;
 
CREATE TABLE t1ib ( 
  `C_` varchar(100) DEFAULT NULL
) ENGINE=innodb DEFAULT CHARSET=utf8 ;
 
 
CREATE TABLE t1cs ( 
  `C_` varchar(100) DEFAULT NULL
) ENGINE=columnstore DEFAULT CHARSET=utf8 ;
 
 
DELIMITER $$
BEGIN NOT ATOMIC
  DECLARE i INT DEFAULT 0;
  TRUNCATE TABLE t1ib;
  TRUNCATE TABLE t1cs;
  START TRANSACTION;
  WHILE i < 70000 DO
    INSERT INTO t1ib VALUES (i);
    SET i = i + 1;
  END WHILE;
  COMMIT;
END
$$
DELIMITER ;
 
INSERT INTO t1cs SELECT * FROM t1ib;

-- Copy DERIVED t1cs to IB - works badly
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1ib;
INSERT INTO t2 SELECT *  FROM (SELECT * FROM t1cs) derived;
SELECT COUNT(*), SUM(C_ IS NULL) FROM t2;
DROP TABLE t2;

+----------+-----------------+
| COUNT(*) | SUM(C_ IS NULL) |
+----------+-----------------+
|    70000 |           70000 |
+----------+-----------------+

Looks wrong.

-- Copy DERIVED t1ib to CS - works OK
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1cs;
INSERT INTO t2 SELECT * FROM (SELECT * FROM t1ib) derived;
SELECT COUNT(*), SUM(C_ IS NULL) FROM t2;
DROP TABLE t2;

+----------+-----------------+
| COUNT(*) | SUM(C_ IS NULL) |
+----------+-----------------+
|    70000 |               0 |
+----------+-----------------+

Looks good.

-- Copy DERIVED t1cs to CS - works badly
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1cs;
INSERT INTO t2 SELECT * FROM (SELECT * FROM t1cs) derived;
SELECT COUNT(*), SUM(C_ IS NULL) FROM t2;
DROP TABLE t2;

+----------+-----------------+
| COUNT(*) | SUM(C_ IS NULL) |
+----------+-----------------+
|    70000 |           70000 |
+----------+-----------------+

Looks wrong.

Comment by Alexander Barkov [ 2020-09-21 ]

Also reproducible with a NOT NULL column. Now the column in t2 gets assigned the default value instead of NULL:

# Create and populate tables
DROP TABLE IF EXISTS t1ib;
DROP TABLE IF EXISTS t1cs;
 
CREATE TABLE t1ib ( 
  `C_` varchar(100) NOT NULL DEFAULT 123
) ENGINE=innodb DEFAULT CHARSET=utf8 ;
 
 
CREATE TABLE t1cs ( 
  `C_` varchar(100) NOT NULL DEFAULT 123
) ENGINE=columnstore DEFAULT CHARSET=utf8 ;
 
 
DELIMITER $$
BEGIN NOT ATOMIC
  DECLARE i INT DEFAULT 0;
  TRUNCATE TABLE t1ib;
  TRUNCATE TABLE t1cs;
  START TRANSACTION;
  WHILE i < 70000 DO
    INSERT INTO t1ib VALUES (i);
    SET i = i + 1;
  END WHILE;
  COMMIT;
END
$$
DELIMITER ;
 
INSERT INTO t1cs SELECT * FROM t1ib;

-- Copy DERIVED t1cs to IB - Works badly
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1ib;
INSERT INTO t2 SELECT *  FROM (SELECT * FROM t1cs) derived;
SELECT * FROM t2 LIMIT 3;
DROP TABLE t2;

+-----+
| C_  |
+-----+
| 123 |
| 123 |
| 123 |
+-----+

Looks wrong.

-- Copy DERIVED t1ib to CS - works OK
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1cs;
INSERT INTO t2 SELECT * FROM (SELECT * FROM t1ib) derived;
SELECT * FROM t2 LIMIT 3;
DROP TABLE t2;

+----+
| C_ |
+----+
| 0  |
| 1  |
| 2  |
+----+

-- Copy DERIVED t1cs to CS - works badly
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1cs;
INSERT INTO t2 SELECT * FROM (SELECT * FROM t1cs) derived;
SELECT * FROM t2 LIMIT 3;
DROP TABLE t2;

+-----+
| C_  |
+-----+
| 123 |
| 123 |
| 123 |
+-----+

Comment by Alexander Barkov [ 2020-09-21 ]

The problem happens when the HEAP temporary table gets full and the execution switches to an Aria table.

A workaround is to disable derived table handlers:

SET columnstore_derived_handler=OFF;

After this SET command all scripts start to work as expected.
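A minimal session sketch showing the workaround in effect, re-running the simplified failing case from the earlier comment (t1cs assumed populated as shown there):

```sql
-- Disable the ColumnStore derived table handler for this session
SET columnstore_derived_handler = OFF;

DROP TABLE IF EXISTS t2;
CREATE TABLE t2 LIKE t1cs;
INSERT INTO t2 SELECT * FROM (SELECT * FROM t1cs) derived;
-- With the handler disabled, SUM(C_ IS NULL) is expected to be 0 again
SELECT COUNT(*), SUM(C_ IS NULL) FROM t2;
DROP TABLE t2;
```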

Comment by Alexander Barkov [ 2020-09-21 ]

The same problem is repeatable with the FederatedX storage engine using this MTR test:

--source have_federatedx.inc
--source include/federated.inc
 
connection default;
 
set global federated_pushdown=1;
 
connection slave;
 
CREATE TABLE federated.t1 (
  a varchar(100) NOT NULL default '123'
)
DEFAULT CHARSET=latin1;
 
CREATE TABLE federated.t2 LIKE federated.t1;
 
DELIMITER $$;
BEGIN NOT ATOMIC
  DECLARE i INT DEFAULT 0;
  START TRANSACTION;
  WHILE i < 70000 DO
    INSERT INTO federated.t1 VALUES (i);
    SET i = i + 1;
  END WHILE;
  COMMIT;
END
$$
 
DELIMITER ;$$
 
connection master;
 
--replace_result $SLAVE_MYPORT SLAVE_PORT
eval
CREATE TABLE federated.t1 (
  a varchar(100) NOT NULL default '123'
)
ENGINE="FEDERATED" DEFAULT CHARSET=latin1
CONNECTION='mysql://root@127.0.0.1:$SLAVE_MYPORT/federated/t1';
 
 
--replace_result $SLAVE_MYPORT SLAVE_PORT
eval
CREATE TABLE federated.t2 (
  a varchar(100) NOT NULL default '123'
)
ENGINE="FEDERATED" DEFAULT CHARSET=latin1
CONNECTION='mysql://root@127.0.0.1:$SLAVE_MYPORT/federated/t2';
 
--echo #
--echo # This correctly returns 70000
--echo #
 
SELECT COUNT(DISTINCT a) FROM federated.t1;
 
--echo #
--echo # This correctly returns 100
--echo #
 
INSERT INTO federated.t2 SELECT * FROM (SELECT * FROM federated.t1 LIMIT 100) derived;
SELECT COUNT(DISTINCT a) FROM federated.t2;
 
--echo #
--echo # This erroneously returns 1
--echo #
 
TRUNCATE TABLE federated.t2;
INSERT INTO federated.t2 SELECT * FROM (SELECT * FROM federated.t1 LIMIT 70000) derived;
SELECT COUNT(DISTINCT a) FROM federated.t2;
 
 
--echo #
--echo # This correctly returns 70000  
--echo #
 
SET global federated_pushdown=0;
TRUNCATE TABLE federated.t2;
INSERT INTO federated.t2 SELECT * FROM (SELECT * FROM federated.t1 LIMIT 70000) derived;
SELECT COUNT(DISTINCT a) FROM federated.t2;
 
source include/federated_cleanup.inc;

Comment by Alexander Barkov [ 2020-09-21 ]

The problem happens because create_internal_tmp_table_from_heap() is called with "start_recinfo" and "recinfo" pointing to some non-initialized memory (instead of the real table structure), so create_internal_tmp_table_from_heap() thinks the table has no data columns. Thus, instead of copying from HEAP Fields to Aria Fields, it writes records consisting of default values into Aria.

Thread 30 "mariadbd" hit Breakpoint 10, create_internal_tmp_table_from_heap (thd=0x7fff9800d268, table=0x7fff980cac40, 
    start_recinfo=0xa5a5a5a5a5a5a5a5, recinfo=0x7fff980e95b0, error=135, ignore_last_dupp_key_error=true, 
    is_duplicate=0x7fffdcaed83f) at /home/bar/maria-git/server.10.5.cs2/sql/sql_select.cc:19794
19794	  TABLE new_table;

The wrong table structure information seems to come from here:

void derived_handler::set_derived(TABLE_LIST *tbl)
{
  derived= tbl;
  table= tbl->table;
  unit= tbl->derived;
  select= unit->first_select();
  tmp_table_param= select->next_select() ?
                   ((select_unit *)(unit->result))->get_tmp_table_param() :
                   &select->join->tmp_table_param;
}

tmp_table_param gets assigned to &select->join->tmp_table_param, which is not properly initialized, it seems.

Comment by Alexander Barkov [ 2020-09-21 ]

Filed a server issue for this: MDEV-23778

Waiting for Igor to fix it. Then will add ColumnStore specific regression tests.

Comment by Roman [ 2020-11-27 ]

Although bar found a legitimate bug in the derived handler processing path, the real issue happens earlier, when the Select Handler fails to find a table handler capable of Select Handler processing. The linked Jira issue describes why MDB skips the Select Handler for the query.

Comment by Roman [ 2020-12-02 ]

The issue in MDB has been solved, so the case should now be tested.

Comment by Gregory Dorman (Inactive) [ 2020-12-29 ]

Test in what release? 10.5 latest GA is fine?

Comment by Daniel Lee (Inactive) [ 2020-12-29 ]

Build verified: 5.5.1-1

The Fix Version field was not previously populated and the ticket was not tested prior to the 5.5.1-1 release. I have just verified that the test case works in 5.5.1.

Generated at Thu Feb 08 02:48:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.