Details

    Description

      Problem found during RQG testing.
      The maximum runtime (300 s) of the RQG worker threads gets exceeded, even though
      a worker thread would exit as soon as this limit is reached and the next query
      would have to be generated. This suggests that the worker threads have been stuck
      for a long time inside the execution of a statement which has not finished yet.
      The processlist shows the following
      Id      User    Host    db      Command Time    State   Info    Progress
      1       system user             NULL    Daemon  NULL    InnoDB purge worker     NULL    0.000
      2       system user             NULL    Daemon  NULL    InnoDB purge coordinator        NULL    0.000
      3       system user             NULL    Daemon  NULL    InnoDB purge worker     NULL    0.000
      4       system user             NULL    Daemon  NULL    InnoDB purge worker     NULL    0.000
      5       system user             NULL    Daemon  NULL    InnoDB shutdown handler NULL    0.000
      8       root    localhost:56266 mysql   Sleep   671             NULL    0.000
      16      root    localhost:56332 test    Query   659     preparing for alter table       ALTER TABLE t_09 DROP INDEX `Marvão_ftidx1`, ADD FULLTEXT INDEX `MarVão_ftidx1` ( col_text ), LOCK    0.000
      17      root    localhost:56334 test    Query   655     setup   ALTER TABLE t_06 ADD FULLTEXT INDEX `Marvão_ftidx1` ( col_text ), LOCK = SHARED /* E_R Thread2 QNO     0.000
      18      root    localhost:56336 test    Query   643     setup   ALTER TABLE t_07 ADD FULLTEXT INDEX `MarvãO_ftidx1` ( col_text ), LOCK = SHARED /* E_R Thread3 QNO     0.000
      19      root    localhost:56340 test    Query   659     Unlocking tables        ALTER TABLE t_02 DROP INDEX `Marvão_ftidx1`, ADD FULLTEXT INDEX `Marvão_ftidx1` ( col_text ), LOCK    0.000
      20      root    localhost:56342 test    Query   657     Unlocking tables        ALTER TABLE t_04 DROP INDEX `Marvão_ftidx1`, ADD FULLTEXT INDEX `Marvão_ftidx1` ( col_text ), LOCK    0.000
      21      root    localhost:56344 test    Query   657     Unlocking tables        ALTER TABLE t_05 DROP INDEX `MarVão_ftidx1`, ADD FULLTEXT INDEX `Marvão_ftidx1` ( col_text ), LOCK    0.000
      23      root    localhost:56348 test    Query   659     committing alter table to storage engine        ALTER TABLE t_03 DROP INDEX `Marvão_ftidx1`, ADD FULLTEXT INDEX `Marvão_ftidx1` ( col_text ), LOCK    0.000
      24      root    localhost:56350 test    Query   653     Unlocking tables        ALTER TABLE t_08 DROP INDEX `MarvãO_ftidx1`, ADD FULLTEXT INDEX `MArvão_ftidx1` ( col_text ), LOCK    0.000
      25      root    localhost:56352 test    Query   659     preparing for alter table       ALTER TABLE t_10 DROP INDEX `MArvão_ftidx1`, ADD FULLTEXT INDEX `MarVão_ftidx1` ( col_text ), LOCK    0.000
      26      root    localhost:56354 test    Query   659     committing alter table to storage engine        ALTER TABLE t_01 DROP INDEX `MArvão_ftidx1`, ADD FULLTEXT INDEX `Marvão_ftidx1` ( col_text ), LOCK    0.000
      114     root    localhost       NULL    Query   0       init    SHOW PROCESSLIST        0.000
      RQG declares a "deadlock" in that case.

      The problem reported here might be a sibling of already reported bugs where
      the server does not react to "mysqladmin ... shutdown" or SIGTERM.

      Roughly the same effect was observed on 10.3 too.

      At the moment I do not know whether the storage engine InnoDB or the server is at fault,
      so please adjust the assignee if necessary.
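
      The runtime overrun follows from how such a worker loop checks its limit.
      Below is a minimal Python sketch of the behaviour described above (all names
      are hypothetical; the actual RQG workers are implemented in Perl): the limit
      is only checked between statements, so one statement that never returns can
      push the worker arbitrarily far past max_runtime.

```python
import time

def worker_loop(execute, generate_query, max_runtime=300, clock=time.monotonic):
    """Run generated queries until max_runtime is exceeded.

    The limit is checked only when the *next* query is about to be
    generated; a statement that blocks inside execute() is never
    interrupted, which matches the overrun described in this report.
    """
    start = clock()
    executed = 0
    while clock() - start < max_runtime:   # the only exit check
        execute(generate_query())          # may block indefinitely
        executed += 1
    return executed
```

      The `clock` parameter exists only so the sketch can be exercised
      deterministically; the point is the placement of the single exit check.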
      

      Attachments

        Issue Links

          Activity

            How to replay the problem?
            1. git clone https://github.com/mleich1/rqg --branch experimental RQG
            2. cd RQG
            3. tar xvzf <path_to>/MDEV-19529.tgz
                This unpacks MDEV-19529.sh, MDEV-19529.yy and a patched RQG reporter lib/GenTest/Reporter/Deadlock1.pm.
            4. ./MDEV-19529.sh <path to your MariaDB binaries like /work/10.1/bld_debug>
                 At the end of the shell script above there is a "tail -f ...." which shows you the RQG protocol "storage/SINGLE_RQG/rqg.log" on screen.
                 In case you see a final
                    # 2019-09-12T19:15:42 [83031] VERDICT: replay            <========
                    # 2019-09-12T19:15:42 [83031] DEBUG: 'du -sk /dev/shm/vardir/SINGLE_RQG | cut -f1' exited with value 0
                    # 2019-09-12T19:15:42 [83031] INFO: vardir_size : 1086236
                 then the RQG run has replayed the problem.
            5. After the run has finished, "storage/SINGLE_RQG/" will contain an "archive.tgz" with all remnants of the data directory.
                 This archive should also contain a core file.
                  Background:
                  As soon as the reporter believes it has identified a deadlock/server freeze
                         (the criterion which should kick in is: two threads have already been waiting longer than 120 s for their statement to finish),
                 the reporter sends the signals SIGHUP and SIGSEGV to the DB server process.
            6. storage/SINGLE_RQG/rqg.log already contains InnoDB monitor output and some backtraces.
                 In case you want to inspect the core with GDB on your own, then:
                  1. cd storage/SINGLE_RQG/ ; tar xvzf archive.tgz
                  2. gdb -c dev/shm/vardir/SINGLE_RQG/1/data/core <path_to>/mysqld
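
            The "deadlock" verdict follows the simple criterion named in the
            Background note above. A minimal Python sketch of it (function and
            variable names are hypothetical; the actual reporter is the Perl
            module lib/GenTest/Reporter/Deadlock1.pm):

```python
QUERY_LIFETIME_THRESHOLD = 120  # seconds, per the Background note

def stalled_connections(processlist, threshold=QUERY_LIFETIME_THRESHOLD):
    """Return rows whose statement has been running longer than threshold.

    Each row mirrors SHOW PROCESSLIST output: (id, command, time_seconds).
    """
    return [row for row in processlist
            if row[1] == "Query" and row[2] is not None and row[2] > threshold]

def looks_like_deadlock(processlist):
    # Declare a deadlock/server freeze once at least two connections
    # have exceeded the threshold.
    return len(stalled_connections(processlist)) >= 2
```

            Applied to the processlist in the description (nine "Query" rows
            stuck for 643-659 s), this criterion fires immediately.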
            

            mleich Matthias Leich added a comment

            The following stack traces show the main root cause for the hang of the FTS index code. It looks like a design flaw, and there is no common table between the
            threads.

            Thread 28 (Thread 0x7f6c14e22700 (LWP 83125)):
            #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
            #1  0x000055c9f65d1560 in os_cond_wait (cond=0x7f6be68f83c8, fast_mutex=0x7f6be68f8390) at /home/mleich/work/10.1/storage/xtradb/os/os0sync.cc:193
            #2  0x000055c9f65d18d6 in os_event_wait_low (event=0x7f6be68f8390, reset_sig_count=1) at /home/mleich/work/10.1/storage/xtradb/os/os0sync.cc:474
            #3  0x000055c9f67de251 in fts_optimize_remove_table (table=0x7f6be64b70f8) at /home/mleich/work/10.1/storage/xtradb/fts/fts0opt.cc:2667
            #4  0x000055c9f67cae98 in fts_drop_index (table=0x7f6be64b70f8, index=0x7f6be6527a78, trx=0x7f6be68b7d78)
                at /home/mleich/work/10.1/storage/xtradb/fts/fts0fts.cc:853
            #5  0x000055c9f655fa97 in commit_cache_norebuild (ctx=0x7f6be6825b48, table=0x7f6be64bfc08, trx=0x7f6be68b7d78)
                at /home/mleich/work/10.1/storage/xtradb/handler/handler0alter.cc:5645
            #6  0x000055c9f655cb78 in ha_innobase::commit_inplace_alter_table (this=0x7f6be8088820, altered_table=0x7f6be682d008, ha_alter_info=0x7f6c14e1e790, 
                commit=true) at /home/mleich/work/10.1/storage/xtradb/handler/handler0alter.cc:6199
            #7  0x000055c9f61f4573 in handler::ha_commit_inplace_alter_table (this=0x7f6be8088820, altered_table=0x7f6be682d008, ha_alter_info=0x7f6c14e1e790, 
                commit=true) at /home/mleich/work/10.1/sql/handler.cc:4292
            #8  0x000055c9f606bb5e in mysql_inplace_alter_table (thd=0x7f6bf0e2b008, table_list=0x7f6be68241f0, table=0x7f6be64bfc08, altered_table=0x7f6be682d008, 
                ha_alter_info=0x7f6c14e1e790, inplace_supported=HA_ALTER_INPLACE_SHARED_LOCK_AFTER_PREPARE, target_mdl_request=0x7f6c14e1e800, 
                alter_ctx=0x7f6c14e1f3b0) at /home/mleich/work/10.1/sql/sql_table.cc:7209
            #9  0x000055c9f6070710 in mysql_alter_table (thd=0x7f6bf0e2b008, new_db=0x7f6be68247e8 "test", new_name=0x0, create_info=0x7f6c14e1ffa0, 
                table_list=0x7f6be68241f0, alter_info=0x7f6c14e1ff10, order_num=0, order=0x0, ignore=false) at /home/mleich/work/10.1/sql/sql_table.cc:9077
            #10 0x000055c9f60e3458 in Sql_cmd_alter_table::execute (this=0x7f6be68248f8, thd=0x7f6bf0e2b008) at /home/mleich/work/10.1/sql/sql_alter.cc:334
            #11 0x000055c9f5faebcb in mysql_execute_command (thd=0x7f6bf0e2b008) at /home/mleich/work/10.1/sql/sql_parse.cc:5442
            #12 0x000055c9f5fb37a1 in mysql_parse (thd=0x7f6bf0e2b008, 
                rawbuf=0x7f6be6824020 "ALTER TABLE t_01 DROP INDEX ftidx1, ADD FULLTEXT INDEX ftidx1 ( col_text ), LOCK = SHARED, ALGORITHM = INPLACE /* E_R Thread3 QNO 234 CON_ID 9 */", length=145, parser_state=0x7f6c14e21230) at /home/mleich/work/10.1/sql/sql_parse.cc:7209
            #13 0x000055c9f5fa2953 in dispatch_command (command=COM_QUERY, thd=0x7f6bf0e2b008, 
                packet=0x7f6bf0e31009 "ALTER TABLE t_01 DROP INDEX ftidx1, ADD FULLTEXT INDEX ftidx1 ( col_text ), LOCK = SHARED, ALGORITHM = INPLACE /* E_R Thread3 QNO 234 CON_ID 9 */ ", packet_length=146) at /home/mleich/work/10.1/sql/sql_parse.cc:1499
            #14 0x000055c9f5fa1711 in do_command (thd=0x7f6bf0e2b008) at /home/mleich/work/10.1/sql/sql_parse.cc:1131
            #15 0x000055c9f60de893 in do_handle_one_connection (thd_arg=0x7f6bf0e2b008) at /home/mleich/work/10.1/sql/sql_connect.cc:1331
            #16 0x000055c9f60de5c4 in handle_one_connection (arg=0x7f6bf0e2b008) at /home/mleich/work/10.1/sql/sql_connect.cc:1242
            #17 0x00007f6c15b2d6da in start_thread (arg=0x7f6c14e22700) at pthread_create.c:456
            #18 0x00007f6c151d6d7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
             
            Thread 22 (Thread 0x7f6bee3fa700 (LWP 83101)):
            #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
            #1  0x000055c9f65d1560 in os_cond_wait (cond=0x7f6c144350b0, fast_mutex=0x7f6c14435078) at /home/mleich/work/10.1/storage/xtradb/os/os0sync.cc:193
            #2  0x000055c9f65d18d6 in os_event_wait_low (event=0x7f6c14435078, reset_sig_count=409) at /home/mleich/work/10.1/storage/xtradb/os/os0sync.cc:474
            #3  0x000055c9f669d2ee in sync_array_wait_event (arr=0x7f6c144bcb40, index=2) at /home/mleich/work/10.1/storage/xtradb/sync/sync0arr.cc:476
            #4  0x000055c9f66a33e8 in mutex_spin_wait (_mutex=0x7f6c14435078, high_priority=false, 
                file_name=0x55c9f6b03bc0 "/home/mleich/work/10.1/storage/xtradb/dict/dict0dict.cc", line=568)
                at /home/mleich/work/10.1/storage/xtradb/sync/sync0sync.cc:689
            #5  0x000055c9f675ee5b in mutex_enter_func (mutex=0x7f6c14435078, file_name=0x55c9f6b03bc0 "/home/mleich/work/10.1/storage/xtradb/dict/dict0dict.cc", 
                line=568, priority=IB_DEFAULT_PRIO) at /home/mleich/work/10.1/storage/xtradb/include/sync0sync.ic:331
            #6  0x000055c9f675efd0 in pfs_mutex_enter_func (mutex=0x7f6c14435078, 
                file_name=0x55c9f6b03bc0 "/home/mleich/work/10.1/storage/xtradb/dict/dict0dict.cc", line=568, priority=IB_DEFAULT_PRIO)
                at /home/mleich/work/10.1/storage/xtradb/include/sync0sync.ic:415
            #7  0x000055c9f6763041 in dict_table_close (table=0x7f6be85fcb78, dict_locked=0, try_drop=0)
                at /home/mleich/work/10.1/storage/xtradb/dict/dict0dict.cc:568
            #8  0x000055c9f67dde6c in fts_optimize_table_bk (slot=0x7f6bec814140) at /home/mleich/work/10.1/storage/xtradb/fts/fts0opt.cc:2486
            #9  0x000055c9f67de92f in fts_optimize_thread (arg=0x7f6c14445af8) at /home/mleich/work/10.1/storage/xtradb/fts/fts0opt.cc:2892
            #10 0x00007f6c15b2d6da in start_thread (arg=0x7f6bee3fa700) at pthread_create.c:456
            #11 0x00007f6c151d6d7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
            

            Thread 22 tries to close the table t_02. Pasting the stack traces for future reference.
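
            One plausible reading of the two traces — an assumption here, not
            verified against the code — is a lock-order inversion: the ALTER
            thread waits, while the dictionary mutex is unavailable to the FTS
            optimize thread, for an acknowledgement that only the optimize
            thread can give. A minimal Python model of that pattern (all names
            hypothetical; timeouts replace the real infinite waits so the demo
            terminates):

```python
import threading

dict_mutex = threading.Lock()       # stands in for dict_sys->mutex
removal_acked = threading.Event()   # stands in for the os_event waited on
                                    # in fts_optimize_remove_table
mutex_held = threading.Event()      # scaffolding: forces the problem ordering
progress = {"alter": False, "optimize": False}

def alter_thread():
    with dict_mutex:                     # the commit holds the dict mutex...
        mutex_held.set()
        # ...and waits for the optimize thread to acknowledge the removal.
        progress["alter"] = removal_acked.wait(timeout=0.5)

def fts_optimize_thread():
    mutex_held.wait()                    # start after ALTER owns the mutex
    # dict_table_close() needs the same mutex before the ack can be sent.
    if dict_mutex.acquire(timeout=0.2):
        removal_acked.set()
        dict_mutex.release()
        progress["optimize"] = True

threads = [threading.Thread(target=alter_thread),
           threading.Thread(target=fts_optimize_thread)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Neither thread makes progress: each waits on a resource the other owns.
```

            With real (untimed) waits, both threads block forever — matching
            the "preparing for alter table" / fts_optimize_thread pair above.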

            thiru Thirunarayanan Balathandayuthapani added a comment

            Patch is in bb-10.1-MDEV-19529

            thiru Thirunarayanan Balathandayuthapani added a comment
            mleich Matthias Leich added a comment (edited)

            When testing with RQG on MariaDB 10.1 + the patch mentioned above I got the
            usual fraction of known open bugs.
            


            Very nice fix! OK to push after considering my review comments.

            marko Marko Mäkelä added a comment

            People

              thiru Thirunarayanan Balathandayuthapani
              mleich Matthias Leich
              Votes: 4
              Watchers: 7

