[MDEV-12788] Server hangs or Assertion `0' fails in Protocol::end_statement on SELECT with UNION Created: 2017-05-12  Updated: 2017-05-19  Resolved: 2017-05-19

Status: Closed
Project: MariaDB Server
Component/s: Scripts & Clients, Server
Affects Version/s: 10.2
Fix Version/s: 10.2.6

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Igor Babaev
Resolution: Fixed Votes: 0
Labels: None
Environment:

Linux ws 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1+deb8u2 (2017-03-07) x86_64 GNU/Linux ; Debian GNU/Linux 8.7 (jessie)


Issue Links:
Relates
relates to MDEV-12084 Test timeouts on CentOS - interaction... Open

 Description   

Test case

DROP TABLE IF EXISTS t1, t2;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
INSERT INTO t1 VALUES (1);
 
CREATE TABLE t2 (pk INT PRIMARY KEY) ENGINE=MyISAM;
INSERT INTO t2 VALUES (1),(2);
 
# This is over-simplified query where GROUP BY and aggregate were removed as unnecessary
 
SELECT 1
UNION ALL
SELECT i FROM (
  SELECT * FROM t1 WHERE i IN ( SELECT pk FROM t2 )
) AS sq HAVING i = 10
;
 
# ... but the problem is also reproducible with proper GROUP BY
 
SELECT 1, 2
UNION ALL
SELECT i, COUNT(*) FROM (
  SELECT * FROM t1 WHERE i IN ( SELECT pk FROM t2 )
) AS sq
GROUP BY i
HAVING i = 10
;

On a debug server, either of the SELECTs above causes assertion failure:

10.2 993323995648526781e5d22afbc77e260529a8b0

mysqld: /data/src/10.2/sql/protocol.cc:587: void Protocol::end_statement(): Assertion `0' failed.
170512 13:07:21 [ERROR] mysqld got signal 6 ;
 
#7  0x00007fc6d05ee312 in __GI___assert_fail (assertion=0x7fc6d36e2fb8 "0", file=0x7fc6d36e2d58 "/data/src/10.2/sql/protocol.cc", line=587, function=0x7fc6d36e38a0 <Protocol::end_statement()::__PRETTY_FUNCTION__> "void Protocol::end_statement()") at assert.c:101
#8  0x00007fc6d2bd1c1b in Protocol::end_statement (this=0x7fc6b8001098) at /data/src/10.2/sql/protocol.cc:587
#9  0x00007fc6d2c974f4 in dispatch_command (command=COM_QUERY, thd=0x7fc6b8000b00, packet=0x7fc6b80087a1 "", packet_length=107, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:2362
#10 0x00007fc6d2c943c8 in do_command (thd=0x7fc6b8000b00) at /data/src/10.2/sql/sql_parse.cc:1362
#11 0x00007fc6d2ddf115 in do_handle_one_connection (connect=0x7fc6d5e86360) at /data/src/10.2/sql/sql_connect.cc:1354
#12 0x00007fc6d2ddeea2 in handle_one_connection (arg=0x7fc6d5e86360) at /data/src/10.2/sql/sql_connect.cc:1260
#13 0x00007fc6d31f8cac in pfs_spawn_thread (arg=0x7fc6d5e60c10) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#14 0x00007fc6d22b8064 in start_thread (arg=0x7fc6caab3700) at pthread_create.c:309
#15 0x00007fc6d06a862d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

On a non-debug server, the query seemingly hangs. It is reproducible both with MySQL client and mysqltest. With the client, for example, it looks like this:

Connection where the query is executed -- never returns

MariaDB [test]> SELECT 1
    -> UNION ALL
    -> SELECT i FROM (
    ->   SELECT * FROM t1 WHERE i IN ( SELECT pk FROM t2 )
    -> ) AS sq HAVING i = 10
    -> ;

But in the processlist the query looks finished (thread 15)

MariaDB [(none)]> show processlist;
+----+-------------+-----------------+------+---------+------+-------------------------+------------------+----------+
| Id | User        | Host            | db   | Command | Time | State                   | Info             | Progress |
+----+-------------+-----------------+------+---------+------+-------------------------+------------------+----------+
|  1 | system user |                 | NULL | Daemon  | NULL |                         | NULL             |    0.000 |
|  2 | system user |                 | NULL | Daemon  | NULL |                         | NULL             |    0.000 |
|  3 | system user |                 | NULL | Daemon  | NULL |                         | NULL             |    0.000 |
|  4 | system user |                 | NULL | Daemon  | NULL |                         | NULL             |    0.000 |
|  5 | system user |                 | NULL | Daemon  | NULL | InnoDB shutdown handler | NULL             |    0.000 |
| 15 | root        | localhost:49808 | test | Sleep   |    6 |                         | NULL             |    0.000 |
| 16 | root        | localhost:49809 | NULL | Query   |    0 | init                    | show processlist |    0.000 |
+----+-------------+-----------------+------+---------+------+-------------------------+------------------+----------+
7 rows in set (0.00 sec)

Stack trace from the client (does not seem to change)

Thread 1 (Thread 0x7ffbb3a12740 (LWP 27992)):
#0  0x00007ffbb16c5ad0 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffbb3538e69 in poll (__timeout=-1, __nfds=1, __fds=0x7ffe360957e0) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2  pvio_socket_wait_io_or_timeout (pvio=pvio@entry=0x7ffbb3c9de80, is_read=is_read@entry=1 '\001', timeout=timeout@entry=-1) at /data/src/10.2/libmariadb/plugins/pvio/pvio_socket.c:499
#3  0x00007ffbb353926a in pvio_socket_read (pvio=0x7ffbb3c9de80, buffer=0x7ffbb3c9def0 "\001", length=16384) at /data/src/10.2/libmariadb/plugins/pvio/pvio_socket.c:300
#4  0x00007ffbb3527d97 in ma_pvio_read (pvio=pvio@entry=0x7ffbb3c9de80, buffer=0x7ffbb3c9def0 "\001", length=length@entry=16384) at /data/src/10.2/libmariadb/libmariadb/ma_pvio.c:251
#5  0x00007ffbb3527ec9 in ma_pvio_cache_read (pvio=0x7ffbb3c9de80, buffer=buffer@entry=0x7ffbb3ca22e0 "", length=length@entry=4) at /data/src/10.2/libmariadb/libmariadb/ma_pvio.c:293
#6  0x00007ffbb353c5b7 in ma_real_read (net=0x7ffbb3ac9440 <mysql>, complen=complen@entry=0x7ffe36095920) at /data/src/10.2/libmariadb/libmariadb/ma_net.c:376
#7  0x00007ffbb353cfed in ma_net_read (net=net@entry=0x7ffbb3ac9440 <mysql>) at /data/src/10.2/libmariadb/libmariadb/ma_net.c:430
#8  0x00007ffbb352452c in ma_net_safe_read (mysql=mysql@entry=0x7ffbb3ac9440 <mysql>) at /data/src/10.2/libmariadb/libmariadb/mariadb_lib.c:188
#9  0x00007ffbb3524c87 in mthd_my_read_rows (mysql=0x7ffbb3ac9440 <mysql>, mysql_fields=0x7ffbb3d381e8, fields=1) at /data/src/10.2/libmariadb/libmariadb/mariadb_lib.c:888
#10 0x00007ffbb3521000 in mysql_store_result (mysql=0x7ffbb3ac9440 <mysql>) at /data/src/10.2/libmariadb/libmariadb/mariadb_lib.c:2189
#11 0x00007ffbb3519190 in mysql_store_result_for_lazy (result=0x7ffe36095b28) at /data/src/10.2/client/mysql.cc:3002
#12 0x00007ffbb351bfc0 in com_go (buffer=<optimized out>, line=<optimized out>) at /data/src/10.2/client/mysql.cc:3283
#13 0x00007ffbb351d0bb in add_line (buffer=..., truncated=false, ml_comment=<synthetic pointer>, in_string=<synthetic pointer>, line_length=1, line=0x7ffbb3d8f6c0 ";") at /data/src/10.2/client/mysql.cc:2422
#14 read_and_execute (interactive=<optimized out>) at /data/src/10.2/client/mysql.cc:2124
#15 0x00007ffbb3515a39 in main (argc=4, argv=0x7ffbb3c9b4a0) at /data/src/10.2/client/mysql.cc:1291

Stack trace from the server, does not seem to change either

Thread 2 (Thread 0x7f903d7e5700 (LWP 27979)):
#0  0x00007f903b928aed in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f903e5291ed in poll (__timeout=28800000, __nfds=1, __fds=0x7f903d7e47f0) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2  vio_io_wait (vio=vio@entry=0x7f90408c0d38, event=event@entry=VIO_IO_EVENT_READ, timeout=28800000) at /data/src/10.2/vio/viosocket.c:945
#3  0x00007f903e529278 in vio_socket_io_wait (vio=vio@entry=0x7f90408c0d38, event=event@entry=VIO_IO_EVENT_READ) at /data/src/10.2/vio/viosocket.c:108
#4  0x00007f903e529371 in vio_read (vio=0x7f90408c0d38, buf=0x7f8fcc006be8 "\001", size=4) at /data/src/10.2/vio/viosocket.c:184
#5  0x00007f903ddb806e in my_real_read (net=0x7f8fcc000be8, complen=complen@entry=0x7f903d7e49c8, header=<optimized out>) at /data/src/10.2/sql/net_serv.cc:888
#6  0x00007f903ddb8dd9 in my_net_read_packet_reallen (net=0x7f8fcc000be8, read_from_server=<optimized out>, reallen=reallen@entry=0x7f903d7e4a18) at /data/src/10.2/sql/net_serv.cc:1158
#7  0x00007f903ddb8f0b in my_net_read_packet (net=<optimized out>, read_from_server=<optimized out>) at /data/src/10.2/sql/net_serv.cc:1142
#8  0x00007f903de46032 in do_command (thd=0x7f8fcc0009a8) at /data/src/10.2/sql/sql_parse.cc:1244
#9  0x00007f903df09904 in do_handle_one_connection (connect=connect@entry=0x7f9040072b88) at /data/src/10.2/sql/sql_connect.cc:1354
#10 0x00007f903df09aa4 in handle_one_connection (arg=0x7f9040072b88) at /data/src/10.2/sql/sql_connect.cc:1260
#11 0x00007f903d541064 in start_thread (arg=0x7f903d7e5700) at pthread_create.c:309
#12 0x00007f903b93162d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 1 (Thread 0x7f903ebd4740 (LWP 27946)):
#0  0x00007f903b928aed in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f903ddaf7b1 in poll (__timeout=-1, __nfds=2, __fds=0x7fff1264b450) at /usr/include/x86_64-linux-gnu/bits/poll2.h:41
#2  handle_connections_sockets () at /data/src/10.2/sql/mysqld.cc:6568
#3  0x00007f903ddb6eba in mysqld_main (argc=11, argv=0x7f9040029180) at /data/src/10.2/sql/mysqld.cc:6034
#4  0x00007f903b86ab45 in __libc_start_main (main=0x7f903dd96a80 <main(int, char**)>, argc=11, argv=0x7fff1264bb78, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff1264bb68) at libc-start.c:287
#5  0x00007f903ddaad18 in _start ()

Top doesn't show any significant activity either in client or in server.

Neither assertion failure nor the hang are reproducible on 10.1 server, but reproducible if I connect to 10.2 server using 10.1 client.



 Comments   
Comment by Sergei Golubchik [ 2017-05-15 ]

This is caused by the UNION ALL optimization (skipping temp table, class select_union_direct). The execution plan is

+------+--------------------+-------+-----------------+---------------+---------+---------+------+------+------------------------------------------------------+
| id   | select_type        | table | type            | possible_keys | key     | key_len | ref  | rows | Extra                                                |
+------+--------------------+-------+-----------------+---------------+---------+---------+------+------+------------------------------------------------------+
|    1 | PRIMARY            | NULL  | NULL            | NULL          | NULL    | NULL    | NULL | NULL | No tables used                                       |
|    2 | UNION              | NULL  | NULL            | NULL          | NULL    | NULL    | NULL | NULL | Impossible HAVING noticed after reading const tables |
|    4 | DEPENDENT SUBQUERY | t2    | unique_subquery | PRIMARY       | PRIMARY | 4       | func |    1 | Using index                                          |
+------+--------------------+-------+-----------------+---------------+---------+---------+------+------+------------------------------------------------------+

Now, select_union_direct::send_eof() sends EOF only for the last SELECT_LEX in the list, but the second select gets "Impossible HAVING", and select_union_direct::send_eof() for the last SELECT_LEX is not called at all.

As a result, query execution ends with no EOF packet ever sent.

Comment by Igor Babaev [ 2017-05-19 ]

This bug was introduced by this patch:
commit 2cfc450bf78c2d951729d1a0e8f731c0d987b1d5
Author: Igor Babaev <igor@askmonty.org> Tue Feb 9 12:35:59 2016
Committer: Igor Babaev <igor@askmonty.org> Tue Feb 9 12:35:59 2016

This is the consolidated patch for mdev-8646:
"Re-factor the code for post-join operations".

The patch mainly contains the code ported from mysql-5.6 and
created for two essential architectural changes:
1. WL#5558: Resolve ORDER BY execution method at the optimization stage
2. WL#6071: Inline tmp tables into the nested loops algorithm

More exactly the following code leads to this bug:

  if (having && const_table_map && !having->with_sum_func)
  {
    having->update_used_tables();
    having= having->remove_eq_conds(thd, &select_lex->having_value, true);
    if (select_lex->having_value == Item::COND_FALSE)
    {
      having= new (thd->mem_root) Item_int(thd, (longlong) 0,1);
      zero_result_cause= "Impossible HAVING noticed after reading const tables";
      error= 0;
      DBUG_RETURN(0);
   }

This code may cause a premature exit from the optimizer when not all subqueries are optimized. This is not good by itself and is not in line with handling other 'impossible' situations.
Yet this does not cause directly the problem. When flow comes to JOIN::exec_inner
the following code is executed:

  if (exec_const_cond && !(select_options & SELECT_DESCRIBE) &&
      !exec_const_cond->val_int())
    zero_result_cause= "Impossible WHERE noticed after reading const tables";
 

This code triggers optimization and execution of the subquery. As a result
subselect_single_select_engine::exec() is called. The function sees that the optimizer
changed the execution engine for subselect_uniquesubquery_engine and
returns with the code 1:

    if (item->engine_changed(this))
      DBUG_RETURN(1);

without performing

thd->lex->current_select= save_select;

what it does when returning in all other points.
This ultimately causes the reported failure.
So the cause of the problem is in the legacy code though the mentioned ported code should be corrected as well.

Comment by Igor Babaev [ 2017-05-19 ]

The following correction for subselect_single_select_engine::exec() resolves the problem.

       }
     }
     if (item->engine_changed(this))
+    {
+      thd->lex->current_select= save_select;
       DBUG_RETURN(1);
+    }
   }
   if (select_lex->uncacheable &&
       select_lex->uncacheable != UNCACHEABLE_EXPLAIN

With the above change we have:

ariaDB [test]> SELECT 1, 2
    -> UNION ALL
    -> SELECT i, COUNT(*) FROM (
    ->   SELECT * FROM t1 WHERE i IN ( SELECT pk FROM t2 )
    -> ) AS sq
    -> GROUP BY i
    -> HAVING i = 10;
+------+---+
| 1    | 2 |
+------+---+
|    1 | 2 |
+------+---+
MariaDB [test]> EXPLAIN EXTENDED
    -> SELECT 1, 2
    -> UNION ALL
    -> SELECT i, COUNT(*) FROM (
    ->   SELECT * FROM t1 WHERE i IN ( SELECT pk FROM t2 )
    -> ) AS sq
    -> GROUP BY i
    -> HAVING i = 10;
+------+--------------------+-------+-----------------+---------------+---------+---------+------+------+----------+------------------------------------------------------+
| id   | select_type        | table | type            | possible_keys | key     | key_len | ref  | rows | filtered | Extra                                                |
+------+--------------------+-------+-----------------+---------------+---------+---------+------+------+----------+------------------------------------------------------+
|    1 | PRIMARY            | NULL  | NULL            | NULL          | NULL    | NULL    | NULL | NULL |     NULL | No tables used                                       |
|    2 | UNION              | NULL  | NULL            | NULL          | NULL    | NULL    | NULL | NULL |     NULL | Impossible HAVING noticed after reading const tables |
|    4 | DEPENDENT SUBQUERY | t2    | unique_subquery | PRIMARY       | PRIMARY | 4       | func |    1 |   100.00 | Using index                                          |
+------+--------------------+-------+-----------------+---------------+---------+---------+------+------+----------+------------------------------------------------------+
 
MariaDB [test]> SHOW WARNINGS;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                       |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select 1 AS `1`,2 AS `2` union all select 1 AS `i`,count(0) AS `COUNT(*)` from dual where <in_optimizer>(1,<exists>(<primary_index_lookup>(<cache>(1) in t2 on PRIMARY))) group by 1 having 0 |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

and the main test suite passes.

However with this patch the subquery is still evaluated and this does not make sense.
To avoid this I've added the following change:

--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -1984,7 +1984,8 @@ JOIN::optimize_inner()
       having= new (thd->mem_root) Item_int(thd, (longlong) 0,1);
       zero_result_cause= "Impossible HAVING noticed after reading const tables";
       error= 0;
-      DBUG_RETURN(0);
+      select_lex->mark_const_derived(zero_result_cause);
+      goto setup_subq_exit;
     }
   }
 
@@ -3377,7 +3378,8 @@ void JOIN::exec_inner()
     condtions may be arbitrarily costly, and because the optimize phase
     might not have produced a complete executable plan for EXPLAINs.
   */
-  if (exec_const_cond && !(select_options & SELECT_DESCRIBE) &&
+  if (!zero_result_cause &&
+      exec_const_cond && !(select_options & SELECT_DESCRIBE) &&
       !exec_const_cond->val_int())
     zero_result_cause= "Impossible WHERE noticed after reading const tables";

It should be noticed that it would be enough to apply only this change to make the reported failing test case working because the subquery would be optimized at the optimization stage. So there would not be any execution of the subquery.

Comment by Igor Babaev [ 2017-05-19 ]

The fix for this bug was pushed into the 10.2 tree.

Generated at Thu Feb 08 08:00:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.