MariaDB Server / MDEV-35571

Connection hangs after query on a partitioned table with UNION and LIMIT ROWS EXAMINED

Details

    Description

      We have a couple of bug reports about a standard assertion failure in Protocol::end_statement upon queries with LIMIT ROWS EXAMINED (MDEV-22241, MDEV-29289). As usually happens with this assertion, nobody is fixing it.

      This is another attempt. The difference from the existing bugs is that here the connection hangs on a non-debug build (while a debug build produces the same assertion failure). Also, the patch from MDEV-29289 doesn't seem to help.

      The query itself is quick (as can be verified by executing it without LIMIT ROWS EXAMINED, for example). When the connection hangs, the query is shown neither in the process list nor in the stack trace. From the outside the connection appears to be idle; the query simply never returns a result set.

      --source include/have_partition.inc
      --source include/have_sequence.inc
       
      CREATE TABLE t1 (a INT);
      INSERT INTO t1 SELECT seq%25 FROM seq_1_to_100;
       
      CREATE TABLE t2 (b INT, c INT, KEY(b)) PARTITION BY HASH(c) PARTITIONS 12;
      INSERT INTO t2 SELECT seq, seq FROM seq_1_to_10;
       
      SELECT COUNT(*) FROM t1 JOIN t2 ON (b = a) UNION ALL SELECT COUNT(*) FROM t1 JOIN t2 ON (b = a) LIMIT ROWS EXAMINED 100;
       
      # Cleanup
      DROP TABLE t1, t2;
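
The two observations in the description can be checked by hand. The following is a minimal sketch, assuming the t1 and t2 tables from the test case still exist (that is, it is run before the cleanup step). The second statement is meant to be issued from a separate connection while the original SELECT is hanging.

```sql
-- Same query without LIMIT ROWS EXAMINED; per the description it returns quickly.
-- Each branch counts 40 rows: t2.b holds 1..10, and each of those values
-- occurs four times in t1 (seq%25 over seq_1_to_100).
SELECT COUNT(*) FROM t1 JOIN t2 ON (b = a)
UNION ALL
SELECT COUNT(*) FROM t1 JOIN t2 ON (b = a);

-- From a second connection, while the SELECT with LIMIT ROWS EXAMINED hangs:
-- per the description, the hanging connection shows no running query.
SHOW PROCESSLIST;
```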
      

      10.5 fdb6db6b47f1825eabffde76c29d9b94545f1ef4 non-debug, stack traces from the test case hanging on SELECT

      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
      0x00007ff41c91b05f in __GI___poll (fds=fds@entry=0x7ffc51829480, nfds=nfds@entry=3, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
      29	../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
       
      Thread 6 (Thread 0x7ff4168946c0 (LWP 2998338) "mariadbd"):
      #0  0x00007ff41c91b05f in __GI___poll (fds=fds@entry=0x7ff4168939d8, nfds=nfds@entry=1, timeout=timeout@entry=28800000) at ../sysdeps/unix/sysv/linux/poll.c:29
      #1  0x0000559301e4bd68 in poll (__timeout=28800000, __nfds=1, __fds=0x7ff4168939d8) at /usr/include/x86_64-linux-gnu/bits/poll2.h:39
      #2  vio_io_wait (vio=vio@entry=0x559304bbba68, event=event@entry=VIO_IO_EVENT_READ, timeout=28800000) at /data/bld/10.5-rel/vio/viosocket.c:1000
      #3  0x0000559301e4bee0 in vio_socket_io_wait (vio=vio@entry=0x559304bbba68, event=event@entry=VIO_IO_EVENT_READ) at /data/bld/10.5-rel/vio/viosocket.c:118
      #4  0x0000559301e4bfb3 in vio_read (vio=0x559304bbba68, buf=0x7ff404008108 "\001", size=4) at /data/bld/10.5-rel/vio/viosocket.c:199
      #5  0x0000559301dd2917 in my_real_read (net=0x7ff404000f10, complen=complen@entry=0x7ff416893c78, header=1 '\001') at /data/bld/10.5-rel/sql/net_serv.cc:996
      #6  0x0000559301dd3c80 in my_net_read_packet_reallen (net=0x7ff404000f10, read_from_server=<optimized out>, reallen=reallen@entry=0x7ff416893cc8) at /data/bld/10.5-rel/sql/net_serv.cc:1277
      #7  0x0000559301dd3d7d in my_net_read_packet (net=<optimized out>, read_from_server=<optimized out>) at /data/bld/10.5-rel/sql/net_serv.cc:1261
      #8  0x0000559301a5cdf9 in do_command (thd=thd@entry=0x7ff404000c68) at /data/bld/10.5-rel/sql/sql_parse.cc:1233
      #9  0x0000559301b6801d in do_handle_one_connection (connect=<optimized out>, put_in_cache=true) at /data/bld/10.5-rel/sql/sql_connect.cc:1386
      #10 0x0000559301b68485 in handle_one_connection (arg=arg@entry=0x559304c27ed8) at /data/bld/10.5-rel/sql/sql_connect.cc:1298
      #11 0x0000559301f1ca94 in pfs_spawn_thread (arg=0x559304bbba68) at /data/bld/10.5-rel/storage/perfschema/pfs.cc:2201
      #12 0x00007ff41c8a8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #13 0x00007ff41c92861c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
       
      Thread 5 (Thread 0x7ff4168df6c0 (LWP 2998332) "mariadbd"):
      #0  0x00007ff41c85bc02 in __GI___sigtimedwait (set=0x7ff4168ded10, info=0x7ff4168ded90, timeout=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:31
      #1  0x000055930196b33b in my_sigwait (code=<synthetic pointer>, sig=<synthetic pointer>, set=0x7ff4168ded10) at /data/bld/10.5-rel/include/my_pthread.h:193
      #2  signal_hand () at /data/bld/10.5-rel/sql/mysqld.cc:3003
      #3  0x0000559301f1ca94 in pfs_spawn_thread (arg=0x5593048c0df8) at /data/bld/10.5-rel/storage/perfschema/pfs.cc:2201
      #4  0x00007ff41c8a8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #5  0x00007ff41c92861c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
       
      Thread 4 (Thread 0x7ff41692a6c0 (LWP 2998331) "mariadbd"):
      #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x559302d018c8 <COND_manager+40>) at ./nptl/futex-internal.c:57
      #1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x559302d018c8 <COND_manager+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
      #2  0x00007ff41c8a4e0b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x559302d018c8 <COND_manager+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
      #3  0x00007ff41c8a7468 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x559302d018e0 <LOCK_manager>, cond=0x559302d018a0 <COND_manager>) at ./nptl/pthread_cond_wait.c:503
      #4  ___pthread_cond_wait (cond=0x559302d018a0 <COND_manager>, mutex=0x559302d018e0 <LOCK_manager>) at ./nptl/pthread_cond_wait.c:618
      #5  0x0000559301a45246 in inline_mysql_cond_wait (that=0x559302d018a0 <COND_manager>, mutex=0x559302d018e0 <LOCK_manager>, src_file=0x5593024a0b70 "/data/bld/10.5-rel/sql/sql_manager.cc", src_line=109) at /data/bld/10.5-rel/include/mysql/psi/mysql_thread.h:1222
      #6  handle_manager (arg=arg@entry=0x0) at /data/bld/10.5-rel/sql/sql_manager.cc:109
      #7  0x0000559301f1ca94 in pfs_spawn_thread (arg=0x5593048cd918) at /data/bld/10.5-rel/storage/perfschema/pfs.cc:2201
      #8  0x00007ff41c8a8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #9  0x00007ff41c92861c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
       
      Thread 3 (Thread 0x7ff41712b6c0 (LWP 2998330) "mariadbd"):
      #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x7ff41712ad70, op=393, expected=0, futex_word=0x559303535d88 <COND_checkpoint+40>) at ./nptl/futex-internal.c:57
      #1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x559303535d88 <COND_checkpoint+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x7ff41712ad70, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
      #2  0x00007ff41c8a4e0b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x559303535d88 <COND_checkpoint+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x7ff41712ad70, private=private@entry=0) at ./nptl/futex-internal.c:139
      #3  0x00007ff41c8a774c in __pthread_cond_wait_common (abstime=0x7ff41712ad70, clockid=0, mutex=0x559303535da0 <LOCK_checkpoint>, cond=0x559303535d60 <COND_checkpoint>) at ./nptl/pthread_cond_wait.c:503
      #4  ___pthread_cond_timedwait64 (cond=cond@entry=0x559303535d60 <COND_checkpoint>, mutex=mutex@entry=0x559303535da0 <LOCK_checkpoint>, abstime=abstime@entry=0x7ff41712ad70) at ./nptl/pthread_cond_wait.c:643
      #5  0x0000559301eb016c in inline_mysql_cond_timedwait (src_file=0x55930261e9d8 "/data/bld/10.5-rel/storage/maria/ma_servicethread.c", src_line=115, abstime=0x7ff41712ad70, mutex=0x559303535da0 <LOCK_checkpoint>, that=0x559303535d60 <COND_checkpoint>) at /data/bld/10.5-rel/include/mysql/psi/mysql_thread.h:1259
      #6  my_service_thread_sleep (control=control@entry=0x559302c6c7c0 <checkpoint_control>, sleep_time=sleep_time@entry=30000000000) at /data/bld/10.5-rel/storage/maria/ma_servicethread.c:115
      #7  0x0000559301ea85b9 in ma_checkpoint_background (arg=arg@entry=0x1e) at /data/bld/10.5-rel/storage/maria/ma_checkpoint.c:725
      #8  0x0000559301f1ca94 in pfs_spawn_thread (arg=0x5593048a2548) at /data/bld/10.5-rel/storage/perfschema/pfs.cc:2201
      #9  0x00007ff41c8a8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #10 0x00007ff41c92861c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
       
      Thread 2 (Thread 0x7ff41cd8e6c0 (LWP 2998329) "mariadbd"):
      #0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x7ff41cd8dde0, op=393, expected=0, futex_word=0x5593035d6ac8 <COND_timer+40>) at ./nptl/futex-internal.c:57
      #1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x5593035d6ac8 <COND_timer+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x7ff41cd8dde0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
      #2  0x00007ff41c8a4e0b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5593035d6ac8 <COND_timer+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x7ff41cd8dde0, private=private@entry=0) at ./nptl/futex-internal.c:139
      #3  0x00007ff41c8a774c in __pthread_cond_wait_common (abstime=0x7ff41cd8dde0, clockid=0, mutex=0x5593035d6ae0 <LOCK_timer>, cond=0x5593035d6aa0 <COND_timer>) at ./nptl/pthread_cond_wait.c:503
      #4  ___pthread_cond_timedwait64 (cond=cond@entry=0x5593035d6aa0 <COND_timer>, mutex=mutex@entry=0x5593035d6ae0 <LOCK_timer>, abstime=abstime@entry=0x7ff41cd8dde0) at ./nptl/pthread_cond_wait.c:643
      #5  0x000055930229485b in inline_mysql_cond_timedwait (that=0x5593035d6aa0 <COND_timer>, mutex=0x5593035d6ae0 <LOCK_timer>, src_file=0x55930269e630 "/data/bld/10.5-rel/mysys/thr_timer.c", src_line=322, abstime=0x7ff41cd8dde0) at /data/bld/10.5-rel/include/mysql/psi/mysql_thread.h:1259
      #6  timer_handler (arg=arg@entry=0x0) at /data/bld/10.5-rel/mysys/thr_timer.c:322
      #7  0x0000559301f1ca94 in pfs_spawn_thread (arg=0x5593047f9ed8) at /data/bld/10.5-rel/storage/perfschema/pfs.cc:2201
      #8  0x00007ff41c8a8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #9  0x00007ff41c92861c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
       
      Thread 1 (Thread 0x7ff41cc25280 (LWP 2998327) "mariadbd"):
      #0  0x00007ff41c91b05f in __GI___poll (fds=fds@entry=0x7ffc51829480, nfds=nfds@entry=3, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
      #1  0x0000559301973872 in poll (__timeout=-1, __nfds=3, __fds=0x7ffc51829480) at /usr/include/x86_64-linux-gnu/bits/poll2.h:39
      #2  handle_connections_sockets () at /data/bld/10.5-rel/sql/mysqld.cc:6297
      #3  0x00005593019747b8 in run_main_loop () at /data/bld/10.5-rel/sql/mysqld.cc:5323
      #4  mysqld_main (argc=<optimized out>, argv=<optimized out>) at /data/bld/10.5-rel/sql/mysqld.cc:5734
      #5  0x00007ff41c8461ca in __libc_start_call_main (main=main@entry=0x559301933080 <main(int, char**)>, argc=argc@entry=8, argv=argv@entry=0x7ffc51829908) at ../sysdeps/nptl/libc_start_call_main.h:58
      #6  0x00007ff41c846285 in __libc_start_main_impl (main=0x559301933080 <main(int, char**)>, argc=8, argv=0x7ffc51829908, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc518298f8) at ../csu/libc-start.c:360
      #7  0x0000559301968c91 in _start ()
      

      10.5 fdb6db6b47f1825eabffde76c29d9b94545f1ef4 debug

      mariadbd: /data/bld/10.5-debug/sql/protocol.cc:618: void Protocol::end_statement(): Assertion `0' failed.
      241205  1:31:03 [ERROR] mysqld got signal 6 ;
       
      #8  0x00007f0e2d445395 in __assert_fail_base (fmt=0x7f0e2d5b9a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55a3e94f8c78 "0", file=file@entry=0x55a3e94f8830 "/data/bld/10.5-debug/sql/protocol.cc", line=line@entry=618, function=function@entry=0x55a3e94f8bf0 "void Protocol::end_statement()") at ./assert/assert.c:92
      #9  0x00007f0e2d453e32 in __GI___assert_fail (assertion=0x55a3e94f8c78 "0", file=0x55a3e94f8830 "/data/bld/10.5-debug/sql/protocol.cc", line=618, function=0x55a3e94f8bf0 "void Protocol::end_statement()") at ./assert/assert.c:101
      #10 0x000055a3e8515bb3 in Protocol::end_statement (this=0x7f0e0c0013c8) at /data/bld/10.5-debug/sql/protocol.cc:618
      #11 0x000055a3e8659008 in dispatch_command (command=COM_QUERY, thd=0x7f0e0c000dc8, packet=0x7f0e0c00b759 "", packet_length=119, is_com_multi=false, is_next_command=false) at /data/bld/10.5-debug/sql/sql_parse.cc:2484
      #12 0x000055a3e8655656 in do_command (thd=0x7f0e0c000dc8) at /data/bld/10.5-debug/sql/sql_parse.cc:1375
      #13 0x000055a3e881d7d5 in do_handle_one_connection (connect=0x55a3ebdb70f8, put_in_cache=true) at /data/bld/10.5-debug/sql/sql_connect.cc:1386
      #14 0x000055a3e881d55d in handle_one_connection (arg=0x55a3ebd9a758) at /data/bld/10.5-debug/sql/sql_connect.cc:1298
      #15 0x000055a3e8d68cf2 in pfs_spawn_thread (arg=0x55a3ebcf8318) at /data/bld/10.5-debug/storage/perfschema/pfs.cc:2201
      #16 0x00007f0e2d4a8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #17 0x00007f0e2d52861c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
      

          Activity

            ycp Yuchen Pei added a comment - - edited

            The assertion failure seems unrelated to the partition engine, as it still occurs if we remove the PARTITION BY... clause from the table definition. Similarly, on a release build (of 10.5 2ab10fbec2dc086f3486c356064fed4d4c74ca7d), the hang happens with or without partitioning. Debugging also suggests that the problem is independent of partitioning.
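
Concretely, the non-partitioned variant amounts to the change below. This is a sketch, assuming the rest of the original test case (t1 and its INSERT) is left as is. Note that on an affected build the final statement is expected to assert or hang, as described.

```sql
-- t2 from the original test case with the PARTITION BY clause dropped
CREATE TABLE t2 (b INT, c INT, KEY(b));
INSERT INTO t2 SELECT seq, seq FROM seq_1_to_10;

-- Per the comment above, this still asserts (debug build) or hangs (release build)
SELECT COUNT(*) FROM t1 JOIN t2 ON (b = a)
UNION ALL
SELECT COUNT(*) FROM t1 JOIN t2 ON (b = a)
LIMIT ROWS EXAMINED 100;
```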

            ycp Yuchen Pei added a comment - - edited

            A minimal test case that results in an assertion failure in a debug build and a hang in a release build, at 10.5 2238c484e68d4f67648b2dc3f339b989a7a77c0c:

            create table t1 (a int);
            insert into t1 values (1), (2);
            select * from t1 UNION ALL select * from t1 LIMIT ROWS EXAMINED 1;
            DROP TABLE t1;
            

            Note that the failure does not occur if UNION ALL is replaced by UNION DISTINCT.
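
For comparison, the variant that does not fail differs only in the set operator. This is a sketch, assuming the t1 table from the minimal test case still exists (that is, it is run before the DROP TABLE). The SHOW WARNINGS call is an addition for inspection and is not part of the original test.

```sql
-- Same statement with UNION DISTINCT instead of UNION ALL;
-- per the note above, this neither asserts nor hangs.
SELECT * FROM t1 UNION DISTINCT SELECT * FROM t1 LIMIT ROWS EXAMINED 1;

-- Inspect any warning raised when the examined-rows limit is exceeded.
SHOW WARNINGS;
```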

            ycp Yuchen Pei added a comment - - edited

            Here's a patch fixing this issue:

            fb0515a5768 upstream/bb-10.5-mdev-35571 MDEV-35571 Check for LIMIT ROWS EXAMINED exceeded in UNION ALL
            

            It is a different issue from MDEV-29289, as that one is concerned with cleaning up after the execution.

            It is also different from MDEV-22241, which is concerned with the check of LIMIT ROWS EXAMINED during optimization (the present issue happens in the exec stage).

            I've pushed all three patches to a custom branch for demo purposes:

            01b6c442a72 upstream/bb-10.5-limit-rows-examined MDEV-22241 Check for LIMIT ROWS EXAMINED in optimization
            41b264e4029 MDEV-29289 uncleaned LIMIT ROWS EXAMINED
            fb0515a5768 upstream/bb-10.5-mdev-35571 MDEV-35571 Check for LIMIT ROWS EXAMINED exceeded in UNION ALL
            

            ycp Yuchen Pei added a comment -

            Hi sanja, ptal thanks

            fb0515a5768 upstream/bb-10.5-mdev-35571 MDEV-35571 Check for LIMIT ROWS EXAMINED exceeded in UNION ALL
            


            sanja Oleksandr Byelkin added a comment -

            OK to push

            ycp Yuchen Pei added a comment -

            Thanks for the review - pushed 432856c473feb92ddd69442a4c164ee88a0d28d7 to 10.5


            People

              ycp Yuchen Pei
              elenst Elena Stepanova