[MCOL-3434] Certain window function queries crash mysqld Created: 2019-08-15  Updated: 2020-04-10  Resolved: 2019-12-10

Status: Closed
Project: MariaDB ColumnStore
Component/s: MariaDB Server
Affects Version/s: 1.2.5
Fix Version/s: 1.2.6

Type: Bug Priority: Critical
Reporter: David Hall (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
PartOf
includes MDEV-19364 Server crashes on certain window func... Closed
Sprint: 2019-06

 Description   

In sql_select.cc optimize_stage2() line 2433, Columnstore has added:

  // @InfiniDB We don't need tmp table for vtable create phase. Plus
  // to build tmp table may corrupt some field table_name & db_name (for some reason)
  if (thd->infinidb_vtable.vtable_state == THD::INFINIDB_CREATE_VTABLE)
	need_tmp = false;

Removing this allows the query to run against InnoDB correcty. Investigating other things.



 Comments   
Comment by David Hall (Inactive) [ 2019-08-15 ]

MariaDB [test]> CREATE TABLE `t` ( `id` int(11) NOT NULL AUTO_INCREMENT, `cid` int(11) DEFAULT NULL, `d` datetime DEFAULT NULL, `c` decimal(12,4) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB;

Query OK, 0 rows affected (0,243 sec)

MariaDB [test]> insert into t(cid, d, c) values (1, '2018-01-01', 20000.05);

Query OK, 1 row affected (0,056 sec)

MariaDB [test]> insert into t(cid, d, c) select round(rand()*100), date_sub(now(), interval rand()*100 week), round(rand()*100000,2) from t;

Query OK, 1 row affected (0,051 sec)

Records: 1 Duplicates: 0 Warnings: 0

MariaDB [test]> insert into t(cid, d, c) select round(rand()*100), date_sub(now(), interval rand()*100 week), round(rand()*100000,2) from t;

Query OK, 2 rows affected (0,040 sec)

Records: 2 Duplicates: 0 Warnings: 0

...

\MariaDB [test]> insert into t(cid, d, c) select round(rand()*100), date_sub(now(), interval rand()*100 week), round(rand()*100000,2) from t;

Query OK, 32768 rows affected (0,794 sec)

Records: 32768 Duplicates: 0 Warnings: 0

MariaDB [test]> insert into t(cid, d, c) select round(rand()*100), date_sub(now(), interval rand()*100 week), round(rand()*100000,2) from t;

Query OK, 65536 rows affected (1,400 sec)

Records: 65536 Duplicates: 0 Warnings: 0

MariaDB [test]> select count from t;

----------

count

----------

131072

----------

1 row in set (0,070 sec)

MariaDB [test]> select distinct cid, week(d) as week, max(c) over (partition by cid, week(d)) as c_mx, sum(c) over (partition by cid, week(d)) as c_sum, count(c) over (partition by cid, week(d)) as c_cnt, max(week(d)) over (partition by year(d)) as w_max from t where year(d) = '2018';

ERROR 2013 (HY000): Lost connection to MySQL server during query

Comment by David Hall (Inactive) [ 2019-08-15 ]

From the support case, use the following to create a file to load data infile:
1|\N|option|outright|\N|300202|\N|\N|\N|2018-01-02 09:34:42|2018-01-02 09:34:42|\N|\N|S|0.8000|0.8000|-0.2176|\N|500|500|500|0.0000|0.0000|0|0|0|1|0|1|0|431028933|4310219|\N|\N|\N|\N|2018-01-02|\N|\N|\N|\N|2019-08-05 17:21:46|1|1|2018-01-02|SPX|index|37|SPXW180103P02650000|2650.0000|2018-01-03|P|Eur|100|11|\N|\N|\N|SPXW
2|\N|option|outright|\N|883338|\N|\N|\N|2018-01-02 09:41:02|2018-01-02 09:41:02|\N|\N|S|1.5500|1.5500|-0.0238|\N|500|500|350|0.0200|1000.0000|1|0|2|1|2|1|1|431027610|431021142|\N|\N|\N|\N|2018-01-16|\N|\N|\N|\N|2019-08-05 17:21:46|1|1|2018-01-02|SPX|index|37|SPX180119P02485000|2485.0000|2018-01-19|P|Eur|100|11|\N|\N|\N|SPX
3|\N|option|outright|\N|587724|\N|\N|\N|2018-01-02 11:29:08|2018-01-02 11:29:08|\N|\N|S|8.0000|8.0000|0.0385|\N|1000|1000|0|0.0000|0.0000|2|0|2|1|2|1|2|4310213280|43102106|\N|\N|\N|\N|2018-01-23|\N|\N|\N|\N|2019-08-05 17:21:46|1|1|2018-01-02|SPX|index|37|SPXW180126P02635000|2635.0000|2018-01-26|P|Eur|100|11|\N|\N|\N|SPXW

Comment by David Hall (Inactive) [ 2019-08-16 ]

Removing the noted lines breaks other queries. Houston, we have a problem. Specifically, working_tpch1_compareLogOnly/misc/MCOL-2267.sql

I found where we can ignore sorting Window function parameters for Columnstore runs. We handle it ourselves.

Comment by David Hall (Inactive) [ 2019-08-26 ]

Note: Putting the 131072 rows in as described in the original MCOL can take 10 minutes to run on InnoDB. Using fewer rows will still trigger the fault pre-patch.

Comment by David Hall (Inactive) [ 2019-08-26 ]

Be sure to merge Server PR #164 and regression test PR #130

Comment by David Hill (Inactive) [ 2019-09-24 ]

Question from a customer.. Any chance that the fix for this issue will also fix MCOL-645?

Comment by Daniel Lee (Inactive) [ 2019-12-10 ]

Build verified: 1.2.6-1

engine commit:
d4173ef

Reproduced the issue in 1.2.5-1

MariaDB [mytest]> select distinct cid, week(d) as week, max(c) over (partition by cid, week(d)) as c_mx, sum(c) over (partition by cid, week(d)) as c_sum, count(c) over (partition by cid, week(d)) as c_cnt, max(week(d)) over (partition by year(d)) as w_max from t where year(d) = '2018';
ERROR 2013 (HY000): Lost connection to MySQL server during query
MariaDB [mytest]> quit

Dec 10 15:12:56 localhost ProcessMonitor[13354]: 56.044173 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/mysqld State = ACTIVE PID = 13922
Dec 10 15:15:24 localhost ProcessMonitor[13354]: 24.437014 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/mysqld State = ACTIVE
Dec 10 15:15:24 localhost ProcessMonitor[13354]: 24.437090 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/mysqld State = ACTIVE PID = 16187

Verified that the test case is working correctly in 1.2.6-1

Generated at Thu Feb 08 02:42:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.